Sometimes you need to split the processing of a large amount of data between several “agents”, for example:
- a long list of websites that your web crawlers must check for being alive (404 error check);
- a queue of photos to be resized or videos to be converted;
- articles that your editors must review;
- a catalogue of blog feeds that your system must import posts from.
The idea is simple:
1. Give a small piece of the big job to an agent.
2. Mark this piece as given to that agent (so that no other agent starts the same job) and remember a time stamp: either when the job was given or when the lock becomes obsolete (the agent is presumed dead, so the job can go to someone else).
3. When the work is done, go to step 1.
4. After some period of time (1 hour), check all the time stamps; if some agents did not cope with their jobs, mark those jobs as free so that others can pick them up.
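To make the four steps concrete, here is a minimal in-memory sketch in PHP. The class name `JobQueue` and its methods are hypothetical, the state lives in an array rather than in a database, and PHP 7.4 or later is assumed. Because it runs in a single process, it cannot show the concurrency problem discussed next; it only illustrates the bookkeeping.

```php
<?php
// In-memory model of the four steps; a real system would keep this state in a database.
final class JobQueue
{
    /** @var array<int, array{lockedBy: ?string, expiresAt: ?int, done: bool}> */
    private array $jobs = [];

    public function __construct(array $jobIds)
    {
        foreach ($jobIds as $id) {
            $this->jobs[$id] = ['lockedBy' => null, 'expiresAt' => null, 'done' => false];
        }
    }

    /** Steps 1 and 2: hand out up to $limit free jobs and mark them as taken. */
    public function claim(string $agent, int $limit, int $now, int $ttl = 3600): array
    {
        $claimed = [];
        foreach ($this->jobs as $id => $job) {
            if (count($claimed) >= $limit) {
                break;
            }
            if (!$job['done'] && $job['lockedBy'] === null) {
                $this->jobs[$id]['lockedBy'] = $agent;
                $this->jobs[$id]['expiresAt'] = $now + $ttl;
                $claimed[] = $id;
            }
        }
        return $claimed;
    }

    /** Step 3: the agent reports a finished job and may then claim more. */
    public function finish(int $id): void
    {
        $this->jobs[$id]['done'] = true;
    }

    /** Step 4: free every unfinished job whose lock has expired. */
    public function releaseExpired(int $now): int
    {
        $released = 0;
        foreach ($this->jobs as $id => $job) {
            if (!$job['done'] && $job['lockedBy'] !== null && $job['expiresAt'] <= $now) {
                $this->jobs[$id]['lockedBy'] = null;
                $this->jobs[$id]['expiresAt'] = null;
                $released++;
            }
        }
        return $released;
    }
}
```

An agent would call `claim()` in a loop, call `finish()` for each completed job, and a periodic task would call `releaseExpired()` with the current time.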
The problem lies between steps 1 and 2: after you have given a job to Agent 1 but before you have marked it as taken, Agent 2 may be given the same job. With many agents this really does happen. This situation is called concurrent read/write (a race condition).
To overcome this, a lock can be used.
In this article I will explain how to use locks in a Zend Framework project with a MySQL database.
First of all, the MySQL documentation says that SELECT ... FOR UPDATE can be used for this purpose. The first step is to select the records with that statement, and the second step is to mark them as locked. The requirements are to use the InnoDB storage engine and to wrap these two statements in a transaction.
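As a rough illustration of those two statements inside one transaction, here is a sketch using plain PDO. The table name `jobs`, its columns, the batch size of 5 and the one-hour interval are all assumptions made for the example, and PDO is assumed to be in exception mode (the default since PHP 8).

```php
<?php
// Hypothetical InnoDB table `jobs` with columns id, locked_by and expires_at.
const SELECT_FREE_SQL =
    'SELECT id FROM jobs WHERE locked_by IS NULL ORDER BY id LIMIT 5 FOR UPDATE';
const MARK_LOCKED_SQL =
    'UPDATE jobs SET locked_by = ?, expires_at = DATE_ADD(NOW(), INTERVAL 1 HOUR) WHERE id IN (%s)';

function claimJobs(PDO $db, string $agentId): array
{
    $db->beginTransaction();
    try {
        // Step 1: select free rows; FOR UPDATE keeps them locked until commit.
        $ids = $db->query(SELECT_FREE_SQL)->fetchAll(PDO::FETCH_COLUMN);
        if ($ids) {
            // Step 2: mark the selected rows as taken by this agent.
            $placeholders = implode(', ', array_fill(0, count($ids), '?'));
            $stmt = $db->prepare(sprintf(MARK_LOCKED_SQL, $placeholders));
            $stmt->execute(array_merge([$agentId], $ids));
        }
        $db->commit();
        return $ids;
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}
```

A second agent that runs the same SELECT while the first transaction is open has to wait until the commit, and by then the rows are already marked as taken.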
Happily, Zend_Db_Table_Select has a forUpdate() method that adds the FOR UPDATE clause to the SELECT statement. Zend_Db can handle transactions as well. Let’s try it!
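With Zend Framework 1 the same idea might look roughly like the sketch below. This is not the class from this article: the function name `fetchAndLock`, the single `id` primary key, the column names and the fixed one-hour interval are assumptions, and Zend Framework 1 is assumed to be on the include path.

```php
<?php
// Assumes Zend Framework 1 is loaded; the column names and the `id` key are assumptions.
function fetchAndLock(Zend_Db_Table_Abstract $table, $agentId, $limit = 5)
{
    $db = $table->getAdapter();
    $db->beginTransaction();
    try {
        // SELECT ... FOR UPDATE: the selected rows stay locked until commit or rollback.
        $select = $table->select()
            ->where('locked_by IS NULL')
            ->limit($limit)
            ->forUpdate();
        $rows = $table->fetchAll($select);

        $ids = [];
        foreach ($rows as $row) {
            $ids[] = $row->id;
        }
        if ($ids) {
            // One UPDATE for all selected rows; quoteInto() expands the array into a list.
            $table->update(
                [
                    'locked_by'  => $agentId,
                    'expires_at' => new Zend_Db_Expr('DATE_ADD(NOW(), INTERVAL 1 HOUR)'),
                ],
                $db->quoteInto('id IN (?)', $ids)
            );
        }
        $db->commit();
        return $rows;
    } catch (Exception $e) {
        $db->rollBack();
        throw $e;
    }
}
```

Note that the returned rows were read before the UPDATE, so their locked_by and expires_at values still hold the old contents.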
To lock a record, we need two columns:
- one to remember the ID of the agent that is processing the record (let’s call it ‘locked_by’);
- another to know when the lock becomes obsolete (let’s call it ‘expires_at’).
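Adding the two columns could look like the following sketch. The table name `articles` and the INT UNSIGNED type for locked_by are assumptions; DATETIME matches the column type used in this article.

```php
<?php
// Hypothetical `articles` table; adjust the name and the locked_by type to your schema.
const ADD_LOCK_COLUMNS_SQL = <<<'SQL'
ALTER TABLE articles
    ADD COLUMN locked_by  INT UNSIGNED NULL DEFAULT NULL,
    ADD COLUMN expires_at DATETIME     NULL DEFAULT NULL
SQL;

function addLockColumns(PDO $db): void
{
    // NULL in both columns means "nobody is working on this record".
    $db->exec(ADD_LOCK_COLUMNS_SQL);
}
```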
I wrote a class that inherits from Zend_Db_Table and fetches records while locking them.
If the table has a composite primary key (more than one column), the ActiveRecord approach is used: save() is called on every record. That is simple, but the drawback is multiple UPDATE queries. Otherwise, if it is an ordinary table with a single ID column as its primary key, the IDs are collected in a list and all records are updated by a single statement with IN in the WHERE clause (which is much faster).
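The composite-key branch might be sketched as follows. The function name `lockRowsOneByOne` is hypothetical, Zend Framework 1 is assumed, and the rowset is expected to come from a fetchAll() made inside an already open transaction.

```php
<?php
// Assumes Zend Framework 1; $rows comes from a fetchAll() made inside an open transaction.
function lockRowsOneByOne(Zend_Db_Table_Rowset_Abstract $rows, $agentId, $ttlSeconds = 3600)
{
    // Expiry is computed in PHP here, so PHP and MySQL clocks and time zones must agree.
    $expiresAt = date('Y-m-d H:i:s', time() + $ttlSeconds);
    foreach ($rows as $row) {
        $row->locked_by  = $agentId;
        $row->expires_at = $expiresAt;
        $row->save(); // one UPDATE per row, keyed by the full primary key
    }
    return count($rows);
}
```

The row objects already know their full primary key, which is why this path works for composite keys without building a WHERE clause by hand.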
TTL (‘Time to Live’) is the period of time for which a lock is valid. In my application the default is one hour. The TTL format is described in the MySQL documentation.
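Assuming the TTL is written in MySQL's interval syntax, a number followed by a unit such as `1 HOUR`, a small helper could turn it into the expression stored in expires_at. The function name `ttlToExpiryExpr` and the unit whitelist are assumptions for this sketch.

```php
<?php
/**
 * Turns a TTL such as "1 HOUR" into a MySQL expression for expires_at.
 * Only a whitelist of simple units is accepted, so the result is safe to embed in SQL.
 */
function ttlToExpiryExpr(string $ttl): string
{
    $units = ['SECOND', 'MINUTE', 'HOUR', 'DAY', 'WEEK', 'MONTH'];
    if (!preg_match('/^(\d+)\s+([A-Za-z]+)$/', trim($ttl), $m)
        || !in_array(strtoupper($m[2]), $units, true)) {
        throw new InvalidArgumentException("Unsupported TTL: $ttl");
    }
    return sprintf('DATE_ADD(NOW(), INTERVAL %d %s)', $m[1], strtoupper($m[2]));
}
```

Validating the TTL before it reaches the query matters because an interval cannot be passed as a single bound parameter; it has to be embedded in the SQL text.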
Now, how to use it.
Let’s imagine you have several editors who share a big list of articles and review them. My model class has a method fetchForUser() that returns no more than 5 articles for the current user (by the given user ID).
This is an Article table model, inherited from the class above. Usually such classes are located at
Note: if the editor refreshes the page, the expires_at field is refreshed from the current time as well.
As for step four of our algorithm (releasing all obsolete locks): create an action in your backend controller, call your table model’s releaseLocks() method in it, and have cron call that action periodically.
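What such a release step does can be sketched in a few lines. This standalone function is a guess at the behavior, not the releaseLocks() method of the class from this article; it assumes Zend Framework 1 and the locked_by and expires_at columns introduced earlier.

```php
<?php
// Assumes Zend Framework 1 and the locked_by / expires_at columns.
function releaseExpiredLocks(Zend_Db_Table_Abstract $table)
{
    // Clears every lock whose expiry time has passed;
    // returns the number of rows that were freed.
    return $table->update(
        ['locked_by' => null, 'expires_at' => null],
        'expires_at < NOW()'
    );
}
```

Rows that were never locked have NULL in expires_at, so the comparison skips them.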
To speed up lock releasing, create an index on the expires_at column. (For this reason I rejected a ‘locked_since’ column in favor of ‘expires_at’.)
P.S. In my database, date/time columns have the DATETIME type. If you use INT to store timestamps, convert the values to Unix time and back.
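On the MySQL side, UNIX_TIMESTAMP() and FROM_UNIXTIME() do this conversion. On the PHP side it could look like the sketch below; the function names are hypothetical, and both directions assume the values are kept in UTC.

```php
<?php
// Convert between Unix timestamps and MySQL DATETIME strings; both sides use UTC here.
function unixToDatetime(int $timestamp): string
{
    return gmdate('Y-m-d H:i:s', $timestamp);
}

function datetimeToUnix(string $datetime): int
{
    $dt = DateTimeImmutable::createFromFormat('Y-m-d H:i:s', $datetime, new DateTimeZone('UTC'));
    if ($dt === false) {
        throw new InvalidArgumentException("Not a DATETIME value: $datetime");
    }
    return $dt->getTimestamp();
}
```

If your server stores local time instead of UTC, pass the matching time zone in both functions, otherwise locks will expire early or late by the zone offset.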