Using non-alphanumeric characters in Sitemap URLs

development, encoding, google, links, php Comments Off on Using non-alphanumeric characters in Sitemap URLs

This article in Google Help explains how to deal with special characters in Sitemaps that you can submit to Webmaster tools in order to increase the number of indexed pages of your website.

The main point is: the URLs must contain ASCII symbols only.

It can be done this way:

  • (obvious) ampersand, both quotes and <> symbols must be encoded,
  • Unicode symbols must be encoded, eg. ü must be converted to %FC sequence,
  • URLs that you submit must follow the  RFC-3986

If you use PHP, pay attention to one thing: it seems rawurlencode should be used instead of the usual urlencode since it’s follows the RFC-3986 as stated in PHP documentation.

Install Sphinx on Mac

development, mac, server, sphinx Comments Off on Install Sphinx on Mac

To install Sphinx search on Mac, I had to find pieces of information here and there.

Here is my list how to sort it out:

1. Download a fresh stable source from Sphinx website to your /tmp folder. It’s a tar.gz file.

2. Go to /tmp folder and run this command to untar the arhieve:

tar -zxvf sphinx-2.0.6-release.tar.gz

Fix the file name since the version can change.

3. Go into the “sphinx-2.0.6-release” folder.

4. Fix this command by my comments below and then run:

./configure --enable-id64  --prefix=/usr/local --with-mysql=/usr/local/mysql-5.1.63-osx10.6-x86/ LDFLAGS="-arch i386"
  • --enable-id64 mean you want to support really long integers as document IDs, depends on your application needs
  • --with-mysql – fix the path to the file

5. Run this command:

sudo make

Check the response, there must be no errors. Mine ends like this:

...[scary looking C commands]...
Making all in test
make[1]: Nothing to be done for `all'.
Making all in doc
All docs are already pre-built by developer.
If you want to rebuild them, install docbook-xsl
and xsltproc and then run 'make docs' instead of simple 'make'.
make[1]: Nothing to be done for `all-am'.

6. Run this:

sudo make install

7. Test if all is OK. If you run this, you must get a list of parameters of this tool:

indexer

Good luck!

GIS: parent-child auto-detection. Part I. Data.

db, development, geo, gis, mysql, postgres Comments Off on GIS: parent-child auto-detection. Part I. Data.

In our project, we had to solve a task of definition of administrative districts nesting.

The task was to organize the administrative centers in a clear hierarchy like a Russian doll (nested principle).

Examples could be:

  • Spain → Costa del Sol → Marbella
  • Ukraine → Crimea → South Coast → Yalta

There are touristic destinations in bold, not the political areas.

russian doll

Also we wanted to correct the nesting errors in the current database.

In this article I’ll show you how I solved this problem by using the KML-files of borders polygons and Postgres database + Postgis extension.

The geo data, which we used for our project, is not a commercial one  — it’s user generated open source, and full of mistakes. For example, the most frequent case — there were lots of cities assigned directly to a country and not belonging to any of its regions or areas; we call those the orphaned cities.

Plus, our business is tourist destinations, so that administrative and political fragmentation of countries is not always appropriate, and sometimes we have to  manually add the tourist regions. For example, such an administrative region like “Southern Coast of Crimea”. Although there is no such thing on a political map, that’s what people google for: “home on the South Coast” and not “a house in Yalta, Gaspra, Gursuf, somewhere there“.

The question is how to automatically find a parent for such administrative region (Ukraine for Crimea), and nest all its children (like the city of Yalta and Sudak in Southern Coast of Crimea area).

By the way: a country is made up of regions, regions consist of areas, the areas are filled with subareas, subareas contain cities. A russian doll, really.

To solve this problem it was decided to use the data of the boundaries of the regions, and then to compare parents borders against children borders.

Preparing data

The database of boundaries of regions can be found free on the Internet. I used the files for Google Earth — they are in KML format (XML with coordinates), or KMZ (zipped KML), since it is convenient to preview in Google Earth.

Automatical convert of KMZ  into KML did not work — the Unicode-characters in the names got broken, so I opened the KMZ-files in Google Earth and saved as KML manually. Yes-yes, “It is unworthy of excellent men to lose hours like slaves in thelabour of calculationwhich could safelybe relegated to anyone else if machines were used“, but if was just faster to re-save it manually rather than find a tool to automate.

The other way to get the boundaries file, especially if you need a custom region like the South Coast — is to draw it manually in Google Earth and save as a KML file.

The files I got had boundary data for every region of every country, but there were no relationships between parents and children. For example, one file was called “all regions of Spain” and the other was “all areas of Spain”, and no relation between the regions and areas. In fact, this is the challenge that we have to solve, and now we have all the data prepared to crack the nut.

In next chapter we will talk about the database platform and the data loading. Stay tuned.

(full article in Russian — http://habrahabr.ru/post/164997/)

Fix “Danish to go” player position

development Comments Off on Fix “Danish to go” player position

You can always have the player visible in 3 easy steps.

1. Create a bookmark of current page in your browser (Google Chrome in my example, but it doesn’t really matter).
2. Right-click on the bookmark and choose “Edit…” menu item.
3. Replace the bookmark URL with this snippet of code and click Save:

javascript: $('#playerViewer').css('position', 'fixed').css('top', '5px').css('right', '100px');

Looks like this:

Edit the bookmark

So now, when you start to listen to audio on the Danish to go page, click the bookmark, and the player will be shown in top right position of your browser, no matter how fast you scroll. Magic ;]

PHPUnit: difference between assertEquals and assertSame

phpunit, TDD Comments Off on PHPUnit: difference between assertEquals and assertSame

It turned out to be easy:

  • assertEquals — check by value
  • assertSame — check by reference

So here is an example:


  $objectA = new StdClass;
  $objectB = new StdClass;
  $objectA->value = 42;
  $objectB->value = 42;
  $container = array($objectA);

  $this->assertEquals($objectA, $objectB); //true
  $this->assertSame($objectA, $objectB); //false

  $this->assertEquals($objectA, $container[0]); //true
  $this->assertSame($objectA, $container[0]); //true

Problems with download in InternetExplorer

development, php Comments Off on Problems with download in InternetExplorer

Don’t forget to include this simple header, otherwise Internet Explorer users over SSL will get a weird error message that the download cannot be done:

header('Pragma: public');

It’s better to make a filename nice and simple, avoid slashes in it and don’t quote it — again in respect to IE users.

The best way to go is just copy the list of headers to implement the download of a fil from PHP documentation page.

Delete Your Code #8: Use the default values

db, delete your code, development, mysql Comments Off on Delete Your Code #8: Use the default values

Default values — how trivial this hint might seem…

Although, simple things make life easier. If you have a huge amount of SQL updates in your system, having correct default values helps to build smaller queries.

Also, when playing with data in your PhpMyAdmin, you can afford to focus on what is really important, not on repeating the boring staff. Every time when you have a big table and need to paste a couple of new records manually, you or the pear developer will be thankful that the majority the fields can be just ommited in order to have the proper and meaningful value.

It can be zero for numeric types, current date and time value for timestamp, a most frequent and meaningful value for the ENUM field or a “magic constant” of default value if you have to use one in your application.

Email address RFC

development, fun Comments Off on Email address RFC

It’s was entertaining to dig into email address format while working on a corresponding task.

Here are a few facts that I didn’t expect to be allowed for email address:

  • the local part of an email address can contain spaces, and it must be quoted and escaped by a back slash like “\ “
  • the local part of an email address can contain comments! It’s put in parentheses and can be omitted. Example: “john(comment).smith@example.com” equals to “john.smith@example.com
  • domain part can have IP instead of domain. To do that, it must be put in square braces like “john@[192.168.1.1]”

Here are examples of VALID email addresses:

  • '@[10.10.10.10]
  • user@[IPv6:2001:db8:1ff::a0b:dbd0]
  • "much.more\ unusual"@example.com
  • "very.unusual.@.unusual.com"@example.com
  • "very.(),:;<>[]\".VERY.\"very@\\\ \"very\".unusual"@strange.example.com
  • 0@a
  • !#$%&'*+-/=?^_`{}|~@example.org
  • "()<>[]:;@,\\\"!#$%&'*+-/=?^_`{}|\ \ \ \ \ ~\ \ \ \ \ \ \ ?\ \ \ \ \ \ \ \ \ \ \ \ ^_`{}|~.a"@example.org
  • ""@example.org

Javascript: indexOf problem in empty array

development, jquery Comments Off on Javascript: indexOf problem in empty array

Funny, but indexOf method doesn’t work in an empty array in Internet Explorer, the method is just not defined. Eh!

Forget about that and always use jQuery wrapper –

$.inArray(42, [31, 42, 53]);

Delete Your Code #7: Right encoding

db, delete your code, development, encoding, mysql, php, server Comments Off on Delete Your Code #7: Right encoding

You might be surprised, but a right choise of the project text encoding can affect the project file size and amount of bugs.

To avoid bugs of wrong presentation of text on your page, make sure that all database entities and the application server (PHP) use the same encoding. That helps to forget about issues connected with text presentation.

On database side, make sure you set the correct encoding to:

  • database,
  • tables,
  • columns,
  • import-export tools parameters (a big source of wrong encoding bugs),
  • corresponding SQL server variables

On application server it’s usually just one query –

SET NAMES utf8

Pay attention, that utf8 might be not the best choise for your project: every non-English character needs 2-6 bytes of memory, so if you built a one-language (local) project with lots of database data, consider using a 1 byte encoding like windows-1251 and save about half of the space on server file system.

Delete Your Code #6: Admin Panel

delete your code, development, symfony Comments Off on Delete Your Code #6: Admin Panel

All of web applications I’ve built have an admin panel. Usually, that panel is a system to manage items: add a product, disable a user, delete a message.

All these create-update-delete functions are common, and interface for them can be standard.

That’s a way to kill some code – instead of creating a custom admin panel, you can build extremely fast a standard one.

One way to do that is to use admin generator tool of Symfony framework. As you can see in the screencast, it builds the admin interface automatically based on entities models.

Delete Your Code #5: Use foreign keys

db, delete your code Comments Off on Delete Your Code #5: Use foreign keys

A foreign key, this classic concept of relational databases, is extremely helpful in many ways.

To start with, it’s a watch dog of your database integrity — for example, it would be impossible to add user images records, if the user does not exist. No additional code, more control.

Another benefit is that all depending data is deleted cascadedly when the parent row gets killed. You remove a user — and voila: all images, comments, ratings and staff fly away too.

And third, it helps to autogenerate admin tools; for instance, in Symfony framework, the generator of admin interface relies on this info.

To be able to use foreign keys in your MySQL database, do 2 following steps:

  • make sure the two tables you are connecting, are driven by InnoDB
  • ensure that the columns you connect (user_id in our example) are of exactly the same type, including NULL and sign allowance.

Delete Your Code #4: Ambiguous terms

delete your code, development, ideas Comments Off on Delete Your Code #4: Ambiguous terms

Every application operates in a knowledge domain.

File browser works with files, directories and drives. Library software deals with books and editions. Geo tool must be aware of coordinates.

Idea is, developers can save time and lines of code if they agree the terms of the knowledge domain and be strict with it.

Of course, all of us use some terms — this rule is to use a well defined set of terms and to avoid synonyms.

Examples of these rule violations:

  • “folder” vs. “directory”;
  • “date” or “timestamp”;
  • “partner” a.k.a. “affiliate”;
  • “DNS Provider” as “product supplier”;
  • short for “longitude” is “lon” or “lng”?
  • does “website address” differ from “URL”?
  • “region”, “area” and “subarea” — in contrast to ADM1, ADM2 and ADM3;
  • the application I deal with at my job has: “brand”, “model”, “market” and “platform”, which are also often called “manufacturer”, “vehicle”, “locale” and “client” respectively.

If you apply this rule, you will not waste time creating the multiple mapper classes. More reusage, more OOP inheritance, more rapport. More motivation to work on a solid, well defined project with people, you talk the same language with. Isn’t this nice?

Detect if city is within area

db, development, geo 1 Comment »

We at SunnyRentals are working a lot with geo data.

The last task I was solving was inclusion of all cities in a touristic regions bounds automatically.

Yes, there are MySQL spatial data types, so region border and cities coordinates can be stored in native data types, but the processing functions are not implemented right: they only operate on MBRs (minimum bounding rectangles) to make things simpler…

So, let’s create our own MySQL function for that.

There are different approaches to distinguish if a point is within a given polygon. I prefered the way to iterate the edges of the polygon and check where the point relatively to the edge is – at right or at left; if the point is always at the same side, it lies within the border.

So we came to a fact that we need 2 functions:

  1. Distinguish the side at which the point lies from the line
  2. Check that for every edge in the polygon.

Here is the first function:

DELIMITER //

DROP FUNCTION `GetPointPositionOfLine`//
CREATE FUNCTION `GetPointPositionOfLine`( a POINT, b POINT, p POINT) RETURNS tinyint NO SQL
  COMMENT 'Distinguish the side at which the point lies from the line'
BEGIN
  DECLARE s DOUBLE;
  SET s = (X(b) - X(a))*(Y(p) - Y(a)) - (Y(b) - Y(a))*(X(p) - X(a));
  IF s > 0 THEN
    RETURN 1;
  END IF;
  IF s < 0 THEN
    RETURN -1;
  END IF;
  RETURN 0;
END //


All three params are of type POINT. Points a and b are the beginning and end of the line. Point p is the point we check.

Here is the main function:

DELIMITER //

DROP FUNCTION `IsPointWithin`//
CREATE FUNCTION `IsPointWithin`( ls LINESTRING, p POINT) RETURNS tinyint(1) NO SQL
BEGIN
  DECLARE i INT;
  DECLARE a, b POINT;
  DECLARE LineStringNumPoints INT;
  DECLARE Position, PrevPosition TINYINT;

  SET i = 1;
  SET PrevPosition = 0;
  SET LineStringNumPoints = NumPoints(ls);

  WHILE i <= LineStringNumPoints DO
    SET a = PointN( ls, i);
    IF i = LineStringNumPoints THEN
      SET b = StartPoint( ls);
    ELSE
      SET b = PointN( ls, i+1);
    END IF;

    SET Position = GetPointPositionOfLine( a, b, p);
    IF Position <> 0 THEN
      IF Position <> PrevPosition AND PrevPosition <> 0 THEN
        RETURN FALSE;
      END IF;
      SET PrevPosition = Position;
    END IF;
    SET i = i + 1;
  END WHILE;
  RETURN TRUE;
END //


It checks if a point p lies within a polygon of type LINESTRING.

Usage example. Given you have coordinates of cities in the separate float columns (to keep them readable) lat and lng of city table, and the polygon is given as a string too — but of course you can use a column of type LINESTRING:

SELECT * 
FROM `city` c
WHERE IsPointWithin(GeomFromText('LINESTRING(4 8, 15 16, 23 42)'), POINT(c.`lat`, c.`lng`));

The solution works pretty fast, especially if you first limit the cities by the rectangle that the polygon lies in.

Delete Your Code #3: shorter script and style tags

delete your code, development 1 Comment »

At my previous job a team leader was a firm believer that <script> tag must be always written in a full form like:

<script type="text/javascript">

When you have lots of javascript pieces, it becomes exhausting.

Good news, there is a way to tell to browser, that default script language is JavaScript.

To do that, inlcude a special META-tag in the HEAD of your HTML document. So if you have an inline JavaScript code for a onclick attribute of a tag, browser has no doubts it’s JavaScript.

Anyway, I include these lines in my projects HTML:

<meta http-equiv="Content-Script-Type" content="text/javascript" />

After that you can have just this:

<script>

By the way, since it’s http-equiv stuff, you can set it at server side.

P.S. There is the same trick for CSS, but it’s seems there are no alternatives to write styles, so it is useless.

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in