Using non-alphanumeric characters in Sitemap URLs

development, encoding, google, links, php Comments Off on Using non-alphanumeric characters in Sitemap URLs

This article in Google Help explains how to deal with special characters in Sitemaps that you can submit to Webmaster tools in order to increase the number of indexed pages of your website.

The main point is: the URLs must contain ASCII symbols only.

It can be done this way:

  • (obvious) ampersand, both quotes and <> symbols must be encoded,
  • Unicode symbols must be encoded, eg. ü must be converted to %FC sequence,
  • URLs that you submit must follow the  RFC-3986

If you use PHP, pay attention to one thing: it seems rawurlencode should be used instead of the usual urlencode since it’s follows the RFC-3986 as stated in PHP documentation.

Delete Your Code #7: Right encoding

db, delete your code, development, encoding, mysql, php, server Comments Off on Delete Your Code #7: Right encoding

You might be surprised, but a right choise of the project text encoding can affect the project file size and amount of bugs.

To avoid bugs of wrong presentation of text on your page, make sure that all database entities and the application server (PHP) use the same encoding. That helps to forget about issues connected with text presentation.

On database side, make sure you set the correct encoding to:

  • database,
  • tables,
  • columns,
  • import-export tools parameters (a big source of wrong encoding bugs),
  • corresponding SQL server variables

On application server it’s usually just one query –

SET NAMES utf8

Pay attention, that utf8 might be not the best choise for your project: every non-English character needs 2-6 bytes of memory, so if you built a one-language (local) project with lots of database data, consider using a 1 byte encoding like windows-1251 and save about half of the space on server file system.

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in