Major problems with geo in MySQL

boo, geo, mysql, postgres Comments Off on Major problems with geo in MySQL

Many development projects are started with MySQL on board — it’s free, stable and scalable.

At some point your project might need Geo features (a.k.a. “spatial calculations“): a great example would be to count a distance from where your user is to closest airport, Eiffel tower or hotel.

If you’re at this point, you are in a gray zone. Think twice if you want to stay with MySQL. You can save lots of hours and rude words if you make a right decision now.

Reason is simple: Geo calculations in MySQL are not implemented the way they should. OK, ok, count a distance between 2 points is an easy task and can be solved by a single function, but anything more complex just don’t work. For example, distinguishing if a point is inside a boundary, or calculation of two boundaries overlapping square.

That’s what MySQL website says:

MySQL originally implemented these functions such that they used object bounding rectangles and returned the same result as the corresponding MBR-based functions.

What does it mean? Look at this picture of an area in Egypt called Al Jizah.

Al Jizah

The shape is quite complex, but MySQL cannot process it, so instead of this a Minimal Bounding Rectangle is used — which is exactly the rectangle shape of the picture itself. I cannot find exact words to express how error-prone that is.

You can try to reinvent a wheel and write your own functions (like I did…), but they are very slow — iterating through 4 Kb of points of a polygon takes 1-2 seconds, so if you have hundreds of polygons to compare — say bye-bye to the product performance.

This is fixed only in versions after 5.6.1.

What to do? Use Postgres + special spatial extension called PostGis. It’s super-fast, works in multiple dimensions and does it RIGHT.

GIS: parent-child auto-detection. Part I. Data.

db, development, geo, gis, mysql, postgres Comments Off on GIS: parent-child auto-detection. Part I. Data.

In our project, we had to solve a task of definition of administrative districts nesting.

The task was to organize the administrative centers in a clear hierarchy like a Russian doll (nested principle).

Examples could be:

  • Spain → Costa del Sol → Marbella
  • Ukraine → Crimea → South Coast → Yalta

There are touristic destinations in bold, not the political areas.

russian doll

Also we wanted to correct the nesting errors in the current database.

In this article I’ll show you how I solved this problem by using the KML-files of borders polygons and Postgres database + Postgis extension.

The geo data, which we used for our project, is not a commercial one  — it’s user generated open source, and full of mistakes. For example, the most frequent case — there were lots of cities assigned directly to a country and not belonging to any of its regions or areas; we call those the orphaned cities.

Plus, our business is tourist destinations, so that administrative and political fragmentation of countries is not always appropriate, and sometimes we have to  manually add the tourist regions. For example, such an administrative region like “Southern Coast of Crimea”. Although there is no such thing on a political map, that’s what people google for: “home on the South Coast” and not “a house in Yalta, Gaspra, Gursuf, somewhere there“.

The question is how to automatically find a parent for such administrative region (Ukraine for Crimea), and nest all its children (like the city of Yalta and Sudak in Southern Coast of Crimea area).

By the way: a country is made up of regions, regions consist of areas, the areas are filled with subareas, subareas contain cities. A russian doll, really.

To solve this problem it was decided to use the data of the boundaries of the regions, and then to compare parents borders against children borders.

Preparing data

The database of boundaries of regions can be found free on the Internet. I used the files for Google Earth — they are in KML format (XML with coordinates), or KMZ (zipped KML), since it is convenient to preview in Google Earth.

Automatical convert of KMZ  into KML did not work — the Unicode-characters in the names got broken, so I opened the KMZ-files in Google Earth and saved as KML manually. Yes-yes, “It is unworthy of excellent men to lose hours like slaves in thelabour of calculationwhich could safelybe relegated to anyone else if machines were used“, but if was just faster to re-save it manually rather than find a tool to automate.

The other way to get the boundaries file, especially if you need a custom region like the South Coast — is to draw it manually in Google Earth and save as a KML file.

The files I got had boundary data for every region of every country, but there were no relationships between parents and children. For example, one file was called “all regions of Spain” and the other was “all areas of Spain”, and no relation between the regions and areas. In fact, this is the challenge that we have to solve, and now we have all the data prepared to crack the nut.

In next chapter we will talk about the database platform and the data loading. Stay tuned.

(full article in Russian — http://habrahabr.ru/post/164997/)

Detect if city is within area

db, development, geo 1 Comment »

We at SunnyRentals are working a lot with geo data.

The last task I was solving was inclusion of all cities in a touristic regions bounds automatically.

Yes, there are MySQL spatial data types, so region border and cities coordinates can be stored in native data types, but the processing functions are not implemented right: they only operate on MBRs (minimum bounding rectangles) to make things simpler…

So, let’s create our own MySQL function for that.

There are different approaches to distinguish if a point is within a given polygon. I prefered the way to iterate the edges of the polygon and check where the point relatively to the edge is – at right or at left; if the point is always at the same side, it lies within the border.

So we came to a fact that we need 2 functions:

  1. Distinguish the side at which the point lies from the line
  2. Check that for every edge in the polygon.

Here is the first function:

DELIMITER //

DROP FUNCTION `GetPointPositionOfLine`//
CREATE FUNCTION `GetPointPositionOfLine`( a POINT, b POINT, p POINT) RETURNS tinyint NO SQL
  COMMENT 'Distinguish the side at which the point lies from the line'
BEGIN
  DECLARE s DOUBLE;
  SET s = (X(b) - X(a))*(Y(p) - Y(a)) - (Y(b) - Y(a))*(X(p) - X(a));
  IF s > 0 THEN
    RETURN 1;
  END IF;
  IF s < 0 THEN
    RETURN -1;
  END IF;
  RETURN 0;
END //


All three params are of type POINT. Points a and b are the beginning and end of the line. Point p is the point we check.

Here is the main function:

DELIMITER //

DROP FUNCTION `IsPointWithin`//
CREATE FUNCTION `IsPointWithin`( ls LINESTRING, p POINT) RETURNS tinyint(1) NO SQL
BEGIN
  DECLARE i INT;
  DECLARE a, b POINT;
  DECLARE LineStringNumPoints INT;
  DECLARE Position, PrevPosition TINYINT;

  SET i = 1;
  SET PrevPosition = 0;
  SET LineStringNumPoints = NumPoints(ls);

  WHILE i <= LineStringNumPoints DO
    SET a = PointN( ls, i);
    IF i = LineStringNumPoints THEN
      SET b = StartPoint( ls);
    ELSE
      SET b = PointN( ls, i+1);
    END IF;

    SET Position = GetPointPositionOfLine( a, b, p);
    IF Position <> 0 THEN
      IF Position <> PrevPosition AND PrevPosition <> 0 THEN
        RETURN FALSE;
      END IF;
      SET PrevPosition = Position;
    END IF;
    SET i = i + 1;
  END WHILE;
  RETURN TRUE;
END //


It checks if a point p lies within a polygon of type LINESTRING.

Usage example. Given you have coordinates of cities in the separate float columns (to keep them readable) lat and lng of city table, and the polygon is given as a string too — but of course you can use a column of type LINESTRING:

SELECT * 
FROM `city` c
WHERE IsPointWithin(GeomFromText('LINESTRING(4 8, 15 16, 23 42)'), POINT(c.`lat`, c.`lng`));

The solution works pretty fast, especially if you first limit the cities by the rectangle that the polygon lies in.

Reverse Geocoding

development, geo 2 Comments »

Reverse Geocoding is process opposite to Geocoding (when you get map coordinates by city/country given). So the idea is to get city name by coordinates on the map.

Why you may need it? For example, user is supposed to add a marker on the map, and you want to check that the marker is within a particular country or region.

Solutions:

  1. Google GeoCoder (JavaScript) can work with coordinates to retrieve location name. Advantage is that return information is translated to the preferred language of your browser. Here is an example you can play with.
  2. GeoNames.org offers several services, among which you can find Reverse Geocoding web service (REST or JSON) which can work with coordinates and postal codes. Advantage — you can download city/region/country names and POI with coordinates.

GeoNamesBy the way, here are results of benchmarking Google vs. GeoNames.

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in