MySQL random function

db, development, mysql Comments Off on MySQL random function

MySQL RAND() function returns values between 0 and 1. At times you need a random value between A and B, then you can use this function:

DELIMITER //
DROP  FUNCTION IF EXISTS RANDOM;//
CREATE FUNCTION RANDOM(minimum INT, maximum INT)
    RETURNS INT NO SQL NOT DETERMINISTIC
    COMMENT 'integer random value in the bounds given'
RETURN minimum + 
  ROUND(1000000 * RAND() % (maximum-minimum) );//
DELIMITER ;

You can call it like this to get a random integer value between 0 and 100:

SELECT RANDOM(0, 100);

It’s security, man!

anecdote, db, fun, story Comments Off on It’s security, man!

2 developers are talking:
— Hey, what’s the password for our production database?
— 12354.
— Hmm, why isn’t it just 12345 then?
— Gosh! It’s security, man!

Gosh

Database choice

complain, db, mysql, postgres Comments Off on Database choice

If you consider a database engine for the new project between MySQL and Postgres, choose Postgres.

Reasons are simple.

1. Postgres has stricter datatypes. If a field is integer, you cannot assign a string value to it. In the long run it makes the database data less error-prone. Frameworks+ORMs will make the code transparent anyway.

2. It supports JSON data type.

3. Postgres supports Geo calculations, why MySQL has a very limited support. Really. Many services and websites need geo calculations to count distances, proximity, etc. MySQL supports the most simple operations, but any sophisticated geo math is just not implemented in MySQL.

GIS: parent-child auto-detection. Part I. Data.

db, development, geo, gis, mysql, postgres Comments Off on GIS: parent-child auto-detection. Part I. Data.

In our project, we had to solve a task of definition of administrative districts nesting.

The task was to organize the administrative centers in a clear hierarchy like a Russian doll (nested principle).

Examples could be:

  • Spain → Costa del Sol → Marbella
  • Ukraine → Crimea → South Coast → Yalta

There are touristic destinations in bold, not the political areas.

russian doll

Also we wanted to correct the nesting errors in the current database.

In this article I’ll show you how I solved this problem by using the KML-files of borders polygons and Postgres database + Postgis extension.

The geo data, which we used for our project, is not a commercial one  — it’s user generated open source, and full of mistakes. For example, the most frequent case — there were lots of cities assigned directly to a country and not belonging to any of its regions or areas; we call those the orphaned cities.

Plus, our business is tourist destinations, so that administrative and political fragmentation of countries is not always appropriate, and sometimes we have to  manually add the tourist regions. For example, such an administrative region like “Southern Coast of Crimea”. Although there is no such thing on a political map, that’s what people google for: “home on the South Coast” and not “a house in Yalta, Gaspra, Gursuf, somewhere there“.

The question is how to automatically find a parent for such administrative region (Ukraine for Crimea), and nest all its children (like the city of Yalta and Sudak in Southern Coast of Crimea area).

By the way: a country is made up of regions, regions consist of areas, the areas are filled with subareas, subareas contain cities. A russian doll, really.

To solve this problem it was decided to use the data of the boundaries of the regions, and then to compare parents borders against children borders.

Preparing data

The database of boundaries of regions can be found free on the Internet. I used the files for Google Earth — they are in KML format (XML with coordinates), or KMZ (zipped KML), since it is convenient to preview in Google Earth.

Automatical convert of KMZ  into KML did not work — the Unicode-characters in the names got broken, so I opened the KMZ-files in Google Earth and saved as KML manually. Yes-yes, “It is unworthy of excellent men to lose hours like slaves in thelabour of calculationwhich could safelybe relegated to anyone else if machines were used“, but if was just faster to re-save it manually rather than find a tool to automate.

The other way to get the boundaries file, especially if you need a custom region like the South Coast — is to draw it manually in Google Earth and save as a KML file.

The files I got had boundary data for every region of every country, but there were no relationships between parents and children. For example, one file was called “all regions of Spain” and the other was “all areas of Spain”, and no relation between the regions and areas. In fact, this is the challenge that we have to solve, and now we have all the data prepared to crack the nut.

In next chapter we will talk about the database platform and the data loading. Stay tuned.

(full article in Russian — http://habrahabr.ru/post/164997/)

Delete Your Code #8: Use the default values

db, delete your code, development, mysql Comments Off on Delete Your Code #8: Use the default values

Default values — how trivial this hint might seem…

Although, simple things make life easier. If you have a huge amount of SQL updates in your system, having correct default values helps to build smaller queries.

Also, when playing with data in your PhpMyAdmin, you can afford to focus on what is really important, not on repeating the boring staff. Every time when you have a big table and need to paste a couple of new records manually, you or the pear developer will be thankful that the majority the fields can be just ommited in order to have the proper and meaningful value.

It can be zero for numeric types, current date and time value for timestamp, a most frequent and meaningful value for the ENUM field or a “magic constant” of default value if you have to use one in your application.

Delete Your Code #7: Right encoding

db, delete your code, development, encoding, mysql, php, server Comments Off on Delete Your Code #7: Right encoding

You might be surprised, but a right choise of the project text encoding can affect the project file size and amount of bugs.

To avoid bugs of wrong presentation of text on your page, make sure that all database entities and the application server (PHP) use the same encoding. That helps to forget about issues connected with text presentation.

On database side, make sure you set the correct encoding to:

  • database,
  • tables,
  • columns,
  • import-export tools parameters (a big source of wrong encoding bugs),
  • corresponding SQL server variables

On application server it’s usually just one query –

SET NAMES utf8

Pay attention, that utf8 might be not the best choise for your project: every non-English character needs 2-6 bytes of memory, so if you built a one-language (local) project with lots of database data, consider using a 1 byte encoding like windows-1251 and save about half of the space on server file system.

Delete Your Code #5: Use foreign keys

db, delete your code Comments Off on Delete Your Code #5: Use foreign keys

A foreign key, this classic concept of relational databases, is extremely helpful in many ways.

To start with, it’s a watch dog of your database integrity — for example, it would be impossible to add user images records, if the user does not exist. No additional code, more control.

Another benefit is that all depending data is deleted cascadedly when the parent row gets killed. You remove a user — and voila: all images, comments, ratings and staff fly away too.

And third, it helps to autogenerate admin tools; for instance, in Symfony framework, the generator of admin interface relies on this info.

To be able to use foreign keys in your MySQL database, do 2 following steps:

  • make sure the two tables you are connecting, are driven by InnoDB
  • ensure that the columns you connect (user_id in our example) are of exactly the same type, including NULL and sign allowance.

Detect if city is within area

db, development, geo 1 Comment »

We at SunnyRentals are working a lot with geo data.

The last task I was solving was inclusion of all cities in a touristic regions bounds automatically.

Yes, there are MySQL spatial data types, so region border and cities coordinates can be stored in native data types, but the processing functions are not implemented right: they only operate on MBRs (minimum bounding rectangles) to make things simpler…

So, let’s create our own MySQL function for that.

There are different approaches to distinguish if a point is within a given polygon. I prefered the way to iterate the edges of the polygon and check where the point relatively to the edge is – at right or at left; if the point is always at the same side, it lies within the border.

So we came to a fact that we need 2 functions:

  1. Distinguish the side at which the point lies from the line
  2. Check that for every edge in the polygon.

Here is the first function:

DELIMITER //

DROP FUNCTION `GetPointPositionOfLine`//
CREATE FUNCTION `GetPointPositionOfLine`( a POINT, b POINT, p POINT) RETURNS tinyint NO SQL
  COMMENT 'Distinguish the side at which the point lies from the line'
BEGIN
  DECLARE s DOUBLE;
  SET s = (X(b) - X(a))*(Y(p) - Y(a)) - (Y(b) - Y(a))*(X(p) - X(a));
  IF s > 0 THEN
    RETURN 1;
  END IF;
  IF s < 0 THEN
    RETURN -1;
  END IF;
  RETURN 0;
END //


All three params are of type POINT. Points a and b are the beginning and end of the line. Point p is the point we check.

Here is the main function:

DELIMITER //

DROP FUNCTION `IsPointWithin`//
CREATE FUNCTION `IsPointWithin`( ls LINESTRING, p POINT) RETURNS tinyint(1) NO SQL
BEGIN
  DECLARE i INT;
  DECLARE a, b POINT;
  DECLARE LineStringNumPoints INT;
  DECLARE Position, PrevPosition TINYINT;

  SET i = 1;
  SET PrevPosition = 0;
  SET LineStringNumPoints = NumPoints(ls);

  WHILE i <= LineStringNumPoints DO
    SET a = PointN( ls, i);
    IF i = LineStringNumPoints THEN
      SET b = StartPoint( ls);
    ELSE
      SET b = PointN( ls, i+1);
    END IF;

    SET Position = GetPointPositionOfLine( a, b, p);
    IF Position <> 0 THEN
      IF Position <> PrevPosition AND PrevPosition <> 0 THEN
        RETURN FALSE;
      END IF;
      SET PrevPosition = Position;
    END IF;
    SET i = i + 1;
  END WHILE;
  RETURN TRUE;
END //


It checks if a point p lies within a polygon of type LINESTRING.

Usage example. Given you have coordinates of cities in the separate float columns (to keep them readable) lat and lng of city table, and the polygon is given as a string too — but of course you can use a column of type LINESTRING:

SELECT * 
FROM `city` c
WHERE IsPointWithin(GeomFromText('LINESTRING(4 8, 15 16, 23 42)'), POINT(c.`lat`, c.`lng`));

The solution works pretty fast, especially if you first limit the cities by the rectangle that the polygon lies in.

MySQL Proxy

db, development, mysql Comments Off on MySQL Proxy

At times it’s necessary to know which queries the server runs right now.

Of course, you can try this statement:

SHOW PROCESSLIST

but if you need a more poweful tool, try MySQL Proxy.

Although it’s still an alpha version, it’s a handy tool to monitor, filter and manipulate your queries since there is Lua scripting language interpreter supported which is quite obvious in usage.

The MySQL Proxy (as can be understood from its title) can help you to see communication between a few MySQL servers and a few clients.

My usage pattern is that the Proxy listens to 4040 port on my local machine, and if I change the port setting in connection string of any application (my job project or even PhpMyAdmin), I can see the queries logged in a console window.

You can track lots of characteristics of queries — just take a look at example Lua scripts in the package that you’ve downloaded.

Bulk SQL loading

db, development, ideas, mysql Comments Off on Bulk SQL loading

If you want to load a list of SQL files into your database on Windows, you can create a .cmd file with such content and run it.
Place it in the same folder where your .sql files are.

@echo off
SET DATABASE=my_database
SET USER=root
SET PASS=1
SET FORMAT=*.sql

FOR /F "usebackq" %%i IN (`dir /on /b %FORMAT%`) DO echo %%i && mysql --user=%USER% --password=%PASS% --default-character-set=utf8 %DATABASE% < %%i

Make sure, mysql command can be run. If no, either set the full path to mysql command tool, or add the path to PATH environment variable.

Sphinx pre-query not inheriting

db, sphinx Comments Off on Sphinx pre-query not inheriting

Sphinx is a nice search engine and is used at our project hugely.

Although, I found a case when using a pre-fetch query is not obvious. The pre-fetch query is a query that is run before the data is got from database, so thus you can set character encoding, tune database caching, set variables, etc.

So, if we have 2 sources, one is parent for all other sources since it has database connection params, and second one inherits from the parent.

source www {
    type                    = mysql

    sql_user                = root
    sql_pass                = 

    sql_query_pre           = SET NAMES utf8    
}

source www_country : www {
    sql_query = SELECT * FROM country
}

All works fine – when www_country souce is used, the same sql_query_pre will be used because www_country is a child of parent source.

Although, if one day you decide to add a custom pre-query to the child source, it seems you loose the parent’s ones:

...
source www_country : www {
    sql_query_pre = SET @usefulVariable = 1

    sql_query = SELECT * FROM country 
}

In this case it seems you have to duplicate the parental pre-query in every child =(
So the right child is gonna be:

...
source www_country : www {
    sql_query_pre = SET NAMES utf8    
    sql_query_pre = SET @usefulVariable = 1

    sql_query = SELECT * FROM country 
}

Reverse geocoding in your app

db, development, mysql 1 Comment »

Reverse geocoding is a process of getting toponym name (city name) by its coordinates.

To make this, you can use a remote service like I have already described, or implement it yourself.

To do so you need two things:

  1. A database of cities with coordinates (having a population value at least for major cities is a big plus)
  2. A method to find a closest city

#1 is left for you, let’s talk about #2.

In the previous post I introduced the DISTANCE function (SQL) to count a distance between 2 points by coordinates. Let’s use it to find the biggest city closest to a given point:

DELIMITER //
DROP FUNCTION IF EXISTS REVERSE_GEOCODE;//
CREATE FUNCTION REVERSE_GEOCODE( latitude DOUBLE, longitude DOUBLE, radius DOUBLE ) 
    RETURNS DOUBLE READS SQL DATA DETERMINISTIC
    COMMENT 'Finds a city by coordinates given. Returns City ID.'
BEGIN
    DECLARE cityID INT(10) DEFAULT 0;
    DECLARE d DOUBLE DEFAULT 0; /* useless var */
    
    SELECT id, DISTANCE(`lat`, `lng`, latitude, longitude) AS dist
    INTO cityID, d
    FROM `geo_cities`
    WHERE lat BETWEEN FLOOR(latitude - 0.5) 
                  AND CEIL(latitude  + 0.5)
      AND lng BETWEEN FLOOR(longitude- 0.5)
                  AND CEIL(longitude + 0.5)
    HAVING dist < radius
    ORDER BY IF(population > 0, population, 1) * (1/IF(dist > 0, dist, 1)) DESC
    LIMIT 1;
    
    RETURN cityID;
END; //
DELIMITER ;

Well, this function orders cities by (population and distance) function — so that the biggest closest city is returned.

WHERE clause is supposed to shorten the number of cities which are looked through; otherwise we would have to search through whole world’s cities. In my solution the search is limited by a reasonably small rectangle.

In the next post I’ll tell you how to find a biggest city depending on current Google Maps zoom, so that it won’t give you a small city near Paris if you click on France at a world-level map view.

SQL function to count distance between two points

db, development, mysql 4 Comments »

Useful function.

DELIMITER //

DROP FUNCTION IF EXISTS DISTANCE; //

CREATE FUNCTION DISTANCE( lat1 DOUBLE, lon1 DOUBLE, lat2 DOUBLE, lon2 DOUBLE ) 
    RETURNS DOUBLE NO SQL DETERMINISTIC
    COMMENT 'counts distance (km) between 2 points on Earth surface'
BEGIN
    DECLARE dtor DOUBLE DEFAULT 57.295800;

    RETURN (6371 * acos(sin( lat1/dtor) * sin(lat2/dtor) +
        cos(lat1/dtor) * cos(lat2/dtor) * 
        cos(lon2/dtor - lon1/dtor)));
END; //

DELIMITER ;

In next post I will explain how I used this function to implement a reverse geocoding — it’s useful when you need to find a closest city by given coordinates.

Preferrable sorting in MySQL

db, development, mysql Comments Off on Preferrable sorting in MySQL

At times, some records in a recordset are more important than others — e.g. a featured product in a products list. In such a case you can make 2 requests to get the featured item first, then to load other N-1 items, and then display them.

If you want to make it by one move, you can use a feature of preferrable sorting — you can define what value and in which field goes first if such value is present.

For instance, you want to show an item with item_id=15 first:

SELECT * 
FROM items 
ORDER BY item_id IN (15) DESC

Which returns something like (featured item goes first):

+----------+------------+
| item_id  | name       |
+----------+------------+
| 15       | Product 15 |
| 32       | Product 32 |
| 31       | Product 31 |
| 30       | Product 30 |
| 29       | Product 29 |
| ...      | ...        |
+----------+------------+

Convert latin1 to utf8

db, development, mysql Comments Off on Convert latin1 to utf8

If you import unicode text into latin1 database column, the symbols would be screwed up — russian symbols become a shit like “залоговый депозит”.

To convert such quickly (as a test) you can use Lebedev’s convertor.

To convert the whole table do the following (thanks to dull.ru):

  1. The most important step: dump such data in a file with mysqldump
    mysqldump -u user -p --default-character-set=latin1 --skip-set-charset --no-create-info --extended-insert --complete-insert dbname table > dbname.sql

    If not the whole table data is bad, create another table (CREATE TABLE t2 LIKE t1) and copy wrong rows to the new table.

  2. Replace latin1 by utf8 in the file
    This can be done manually in your editor or by this command:

    sed -r 's/latin1/utf8/g' dbname.sql > dbname_utf.sql
  3. Convert your latin1-table to utf8 (and maybe truncate it):
    ALTER TABLE `table` CONVERT TO CHARACTER SET 'utf8';
  4. Import the utf8 data back to the table:
    mysql -u user -p --default-character-set=utf8 dbname < dbname_utf.sql

That’s it.

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in