GIS: parent-child auto-detection. Part I. Data.

db, development, geo, gis, mysql, postgres Add comments

In our project, we had to solve a task of definition of administrative districts nesting.

The task was to organize the administrative centers in a clear hierarchy like a Russian doll (nested principle).

Examples could be:

  • Spain → Costa del Sol → Marbella
  • Ukraine → Crimea → South Coast → Yalta

There are touristic destinations in bold, not the political areas.

russian doll

Also we wanted to correct the nesting errors in the current database.

In this article I’ll show you how I solved this problem by using the KML-files of borders polygons and Postgres database + Postgis extension.

The geo data, which we used for our project, is not a commercial one  — it’s user generated open source, and full of mistakes. For example, the most frequent case — there were lots of cities assigned directly to a country and not belonging to any of its regions or areas; we call those the orphaned cities.

Plus, our business is tourist destinations, so that administrative and political fragmentation of countries is not always appropriate, and sometimes we have to  manually add the tourist regions. For example, such an administrative region like “Southern Coast of Crimea”. Although there is no such thing on a political map, that’s what people google for: “home on the South Coast” and not “a house in Yalta, Gaspra, Gursuf, somewhere there“.

The question is how to automatically find a parent for such administrative region (Ukraine for Crimea), and nest all its children (like the city of Yalta and Sudak in Southern Coast of Crimea area).

By the way: a country is made up of regions, regions consist of areas, the areas are filled with subareas, subareas contain cities. A russian doll, really.

To solve this problem it was decided to use the data of the boundaries of the regions, and then to compare parents borders against children borders.

Preparing data

The database of boundaries of regions can be found free on the Internet. I used the files for Google Earth — they are in KML format (XML with coordinates), or KMZ (zipped KML), since it is convenient to preview in Google Earth.

Automatical convert of KMZ  into KML did not work — the Unicode-characters in the names got broken, so I opened the KMZ-files in Google Earth and saved as KML manually. Yes-yes, “It is unworthy of excellent men to lose hours like slaves in thelabour of calculationwhich could safelybe relegated to anyone else if machines were used“, but if was just faster to re-save it manually rather than find a tool to automate.

The other way to get the boundaries file, especially if you need a custom region like the South Coast — is to draw it manually in Google Earth and save as a KML file.

The files I got had boundary data for every region of every country, but there were no relationships between parents and children. For example, one file was called “all regions of Spain” and the other was “all areas of Spain”, and no relation between the regions and areas. In fact, this is the challenge that we have to solve, and now we have all the data prepared to crack the nut.

In next chapter we will talk about the database platform and the data loading. Stay tuned.

(full article in Russian — http://habrahabr.ru/post/164997/)

Comments are closed.

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in