Major problems with geo in MySQL

boo, geo, mysql, postgres Comments Off on Major problems with geo in MySQL

Many development projects are started with MySQL on board — it’s free, stable and scalable.

At some point your project might need Geo features (a.k.a. “spatial calculations“): a great example would be to count a distance from where your user is to closest airport, Eiffel tower or hotel.

If you’re at this point, you are in a gray zone. Think twice if you want to stay with MySQL. You can save lots of hours and rude words if you make a right decision now.

Reason is simple: Geo calculations in MySQL are not implemented the way they should. OK, ok, count a distance between 2 points is an easy task and can be solved by a single function, but anything more complex just don’t work. For example, distinguishing if a point is inside a boundary, or calculation of two boundaries overlapping square.

That’s what MySQL website says:

MySQL originally implemented these functions such that they used object bounding rectangles and returned the same result as the corresponding MBR-based functions.

What does it mean? Look at this picture of an area in Egypt called Al Jizah.

Al Jizah

The shape is quite complex, but MySQL cannot process it, so instead of this a Minimal Bounding Rectangle is used — which is exactly the rectangle shape of the picture itself. I cannot find exact words to express how error-prone that is.

You can try to reinvent a wheel and write your own functions (like I did…), but they are very slow — iterating through 4 Kb of points of a polygon takes 1-2 seconds, so if you have hundreds of polygons to compare — say bye-bye to the product performance.

This is fixed only in versions after 5.6.1.

What to do? Use Postgres + special spatial extension called PostGis. It’s super-fast, works in multiple dimensions and does it RIGHT.

Database choice

complain, db, mysql, postgres Comments Off on Database choice

If you consider a database engine for the new project between MySQL and Postgres, choose Postgres.

Reasons are simple.

1. Postgres has stricter datatypes. If a field is integer, you cannot assign a string value to it. In the long run it makes the database data less error-prone. Frameworks+ORMs will make the code transparent anyway.

2. It supports JSON data type.

3. Postgres supports Geo calculations, why MySQL has a very limited support. Really. Many services and websites need geo calculations to count distances, proximity, etc. MySQL supports the most simple operations, but any sophisticated geo math is just not implemented in MySQL.

GIS: parent-child auto-detection. Part I. Data.

db, development, geo, gis, mysql, postgres Comments Off on GIS: parent-child auto-detection. Part I. Data.

In our project, we had to solve a task of definition of administrative districts nesting.

The task was to organize the administrative centers in a clear hierarchy like a Russian doll (nested principle).

Examples could be:

  • Spain → Costa del Sol → Marbella
  • Ukraine → Crimea → South Coast → Yalta

There are touristic destinations in bold, not the political areas.

russian doll

Also we wanted to correct the nesting errors in the current database.

In this article I’ll show you how I solved this problem by using the KML-files of borders polygons and Postgres database + Postgis extension.

The geo data, which we used for our project, is not a commercial one  — it’s user generated open source, and full of mistakes. For example, the most frequent case — there were lots of cities assigned directly to a country and not belonging to any of its regions or areas; we call those the orphaned cities.

Plus, our business is tourist destinations, so that administrative and political fragmentation of countries is not always appropriate, and sometimes we have to  manually add the tourist regions. For example, such an administrative region like “Southern Coast of Crimea”. Although there is no such thing on a political map, that’s what people google for: “home on the South Coast” and not “a house in Yalta, Gaspra, Gursuf, somewhere there“.

The question is how to automatically find a parent for such administrative region (Ukraine for Crimea), and nest all its children (like the city of Yalta and Sudak in Southern Coast of Crimea area).

By the way: a country is made up of regions, regions consist of areas, the areas are filled with subareas, subareas contain cities. A russian doll, really.

To solve this problem it was decided to use the data of the boundaries of the regions, and then to compare parents borders against children borders.

Preparing data

The database of boundaries of regions can be found free on the Internet. I used the files for Google Earth — they are in KML format (XML with coordinates), or KMZ (zipped KML), since it is convenient to preview in Google Earth.

Automatical convert of KMZ  into KML did not work — the Unicode-characters in the names got broken, so I opened the KMZ-files in Google Earth and saved as KML manually. Yes-yes, “It is unworthy of excellent men to lose hours like slaves in thelabour of calculationwhich could safelybe relegated to anyone else if machines were used“, but if was just faster to re-save it manually rather than find a tool to automate.

The other way to get the boundaries file, especially if you need a custom region like the South Coast — is to draw it manually in Google Earth and save as a KML file.

The files I got had boundary data for every region of every country, but there were no relationships between parents and children. For example, one file was called “all regions of Spain” and the other was “all areas of Spain”, and no relation between the regions and areas. In fact, this is the challenge that we have to solve, and now we have all the data prepared to crack the nut.

In next chapter we will talk about the database platform and the data loading. Stay tuned.

(full article in Russian — http://habrahabr.ru/post/164997/)

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in