Sharing work between agents


There are situations when you need to split the processing of a large amount of data between several “agents”, e.g.:

  • a long list of websites that your web crawlers must check for being alive (404 error check);
  • a queue of photos to be resized or videos to be converted;
  • articles that your editors must review;
  • a catalogue of blog feeds that your system must import posts from;
  • etc.

The idea is simple:

  1. Give a small piece of the big job to an agent.
  2. Mark this piece as given to that agent (so that no other agent starts the same job) and remember either the time when the job was given or the time when the lock becomes obsolete (if the agent is dead, the job can go to someone else).
  3. When the work is done, go to step #1.
  4. After some period of time (e.g. 1 hour) check all the time stamps, and if some agents didn’t cope with their jobs, mark those jobs as free so that others can pick them up.
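
The four steps above can be sketched with a plain in-memory list. This is only an illustration of the bookkeeping: the function names and the array layout are made up for the example, and there is no concurrency protection in it at all.

```php
<?php
// Hypothetical in-memory model of the job list; a real system keeps it in a database.
// Each job: array('id' => int, 'locked_by' => string|null, 'expires_at' => int|null)

/** Steps 1-2: hand out up to $count free jobs and mark them as taken. */
function claimJobs(array &$jobs, $agentId, $now, $ttl = 3600, $count = 2)
{
    $claimed = array();
    foreach ($jobs as &$job) {
        if (count($claimed) >= $count) {
            break;
        }
        if ($job['locked_by'] === null) {
            $job['locked_by'] = $agentId;
            $job['expires_at'] = $now + $ttl; // the moment the lock becomes obsolete
            $claimed[] = $job['id'];
        }
    }
    unset($job); // break the reference left by foreach
    return $claimed;
}

/** Step 4: free every job whose lock has expired. */
function releaseExpired(array &$jobs, $now)
{
    $released = 0;
    foreach ($jobs as &$job) {
        if ($job['locked_by'] !== null && $job['expires_at'] <= $now) {
            $job['locked_by'] = null;
            $job['expires_at'] = null;
            $released++;
        }
    }
    unset($job);
    return $released;
}

$jobs = array(
    array('id' => 1, 'locked_by' => null, 'expires_at' => null),
    array('id' => 2, 'locked_by' => null, 'expires_at' => null),
    array('id' => 3, 'locked_by' => null, 'expires_at' => null),
);

$first  = claimJobs($jobs, 'agent-1', 1000);   // takes jobs 1 and 2
$second = claimJobs($jobs, 'agent-2', 1000);   // only job 3 is left
$freed  = releaseExpired($jobs, 1000 + 3600);  // all three locks have expired
```

In a single PHP process nothing can slip in between finding a free job and marking it. With a shared database and several agents, something can.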

The problem lies between steps #1 and #2: you have given a job to Agent 1 and are about to mark it as taken, but what if Agent 2 is given the same job in the meantime? With many agents this really can happen. The situation is a race condition caused by concurrent reads and writes.

To overcome this a lock can be used.

In this article I will explain how to use locks in a Zend Framework project with a MySQL database.

First of all, the MySQL documentation says that SELECT ... FOR UPDATE can be used for this purpose. The first step is to select records with that statement, and the second is to mark them as locked. The requirements are to use the InnoDB storage engine and to wrap these two statements in a transaction.

Happily, Zend_Db_Table_Select has a special method forUpdate() that implements the SELECT ... FOR UPDATE statement. Zend_Db can cope with transactions as well. Let’s try it!
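
For readers without Zend Framework at hand, the same two steps could look roughly like this with plain PDO. It is a sketch under assumptions, not the code used later in this article: the table name article and the columns id, locked_by and expires_at are examples, and the connection is assumed to throw exceptions on errors.

```php
<?php
// Sketch only: table "article" and columns "id", "locked_by", "expires_at"
// are assumptions; the table must use the InnoDB engine.

/** Builds the locking SELECT; $limit is cast to int before it reaches the SQL. */
function buildLockSelect($limit)
{
    return 'SELECT id FROM article WHERE locked_by IS NULL'
         . ' LIMIT ' . (int) $limit . ' FOR UPDATE';
}

/**
 * Selects free rows and marks them as locked inside one transaction.
 * Assumes $pdo uses PDO::ERRMODE_EXCEPTION.
 */
function claimArticles(PDO $pdo, $agentId, $limit = 5)
{
    $pdo->beginTransaction();
    try {
        // Rows returned here stay locked until commit or rollback
        $ids = $pdo->query(buildLockSelect($limit))->fetchAll(PDO::FETCH_COLUMN);
        if ($ids) {
            $marks = implode(',', array_fill(0, count($ids), '?'));
            $update = $pdo->prepare(
                'UPDATE article SET locked_by = ?,'
                . ' expires_at = DATE_ADD(NOW(), INTERVAL 1 HOUR)'
                . " WHERE id IN ($marks)"
            );
            $update->execute(array_merge(array($agentId), $ids));
        }
        $pdo->commit();
        return $ids;
    } catch (Exception $e) {
        $pdo->rollBack(); // never leave the row locks hanging
        throw $e;
    }
}
```

A second agent running the same SELECT ... FOR UPDATE waits until the first transaction commits, and then no longer sees the rows as free.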

To lock a record, we need two fields:

  1. one to remember the ID of the agent that is processing this record (let’s call this column ‘locked_by’)
  2. another one to know the time when the lock becomes obsolete (let’s call this column ‘expires_at’)

I wrote a class that inherits from Zend_Db_Table and helps to fetch records and lock them at the same time.

<?php

class Koodix_Db_Table_Lockable extends Zend_Db_Table
{
    protected $_lockedByField = 'locked_by';
    protected $_expiresAtField = 'expires_at';
    protected $_TTL = '1 HOUR'; //time to live for lock

    public function fetchLocked( Zend_Db_Table_Select $select,
        $lockerID ) {

        $db = $this->getAdapter();
        $db->beginTransaction();

        $column = $db->quoteIdentifier( $this->_lockedByField );
        $select->forUpdate()
             ->where("$column=? OR $column IS NULL", $lockerID);

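        $data = $this->fetchAll($select);
        // fetchAll() returns a rowset object, so empty() would never be true
        if( count($data) == 0 ) {
            $db->rollBack(); // nothing to lock, so close the transaction
            return null;
        }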

        $expiresAt = new Zend_Db_Expr('DATE_ADD( NOW(),
            INTERVAL ' . $this->_TTL . ')');
        if( sizeof($this->_primary) > 1 ) {
            foreach( $data as $item ) {
                $item->{$this->_lockedByField} = $lockerID;
                $item->{$this->_expiresAtField} = $expiresAt;

                $item->save();
            }
        }
        else {
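            // use the real primary key column instead of a hardcoded 'id'
            $primaryKey = current($this->_primary);
            $arrIds = array();
            foreach( $data as $item ) {
                $arrIds[] = $item->{$primaryKey};
            }

            $this->update(
                array(
                    $this->_lockedByField => $lockerID,
                    $this->_expiresAtField => $expiresAt,
                ),
                // quoteInto() escapes every ID in the list
                $db->quoteInto(
                    $db->quoteIdentifier($primaryKey) . ' IN (?)', $arrIds
                )
            );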
        }

        $db->commit();
        return $data;
    }

    public function releaseLocks( ) {

        $db = $this->getAdapter();
        $column = $db->quoteIdentifier( $this->_expiresAtField );

        // free every record whose lock has expired
        return $this->update(
            array(
                $this->_lockedByField => null,
                $this->_expiresAtField => null,
            ),
            "$column <= NOW()"
        );
    }
}

If the table has a composite primary key (containing more than one column), the ActiveRecord approach is used: the save() method is called for every record. That’s simple, but the drawback is multiple update queries. Otherwise, if it is an ordinary table with a single ID column as the primary key, the IDs are collected in a list and all records are updated by a single statement with IN in the WHERE clause (which is much faster).

TTL (‘Time to Live’) is the period of time during which a lock is valid. In my application the default is one hour. The TTL uses the format of a MySQL INTERVAL expression (for example ‘30 MINUTE’ or ‘1 HOUR’); see the MySQL documentation.

And now how to use it.

Let’s imagine you have several editors who divide a big list of articles among themselves and review them. My model class has a method fetchForUser() that returns no more than 5 articles for the current user (by the given user ID).

This is an Article table model, inherited from the class above. Usually such classes are located at

application/default/models/ArticleTable.php
<?php
class ArticleTable extends Koodix_Db_Table_Lockable
{
    protected $_name = 'article';

    public function fetchForUser( $userId, $count=5 ) {

        $select = $this->select()
            ->where('reviewed = 0')
            ->order('expires_at DESC')
            ->order('date_imported DESC')
            ->limit( $count );

        return $this->fetchLocked($select, $userId);
    }
}

Note: if the editor refreshes the page, the expires_at field is renewed as well (current time plus TTL).

As for step four of our algorithm (releasing all obsolete locks): create an action in your backend controller, call the releaseLocks() method of your table model in it, and call that action periodically with cron.

To speed up the lock release, create an index on the expires_at column. (For this reason I rejected a ‘locked_since’ column in favor of ‘expires_at’.)

P.S. In my database the date/time columns have the DATETIME type. If you use INT to store timestamps, convert the values to Unix time and back.
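
A minimal sketch of that conversion on the PHP side. The helper names are made up for the example, and it assumes the DATETIME values are stored in UTC; adjust the time zone to your setup.

```php
<?php
// Assumption: the DATETIME values are stored in UTC.

/** 'YYYY-MM-DD HH:MM:SS' (MySQL DATETIME) -> Unix timestamp */
function datetimeToUnix($datetime)
{
    $dt = new DateTime($datetime, new DateTimeZone('UTC'));
    return (int) $dt->format('U');
}

/** Unix timestamp -> 'YYYY-MM-DD HH:MM:SS' */
function unixToDatetime($timestamp)
{
    return gmdate('Y-m-d H:i:s', $timestamp);
}

// Round trip: the value comes back unchanged
echo unixToDatetime(datetimeToUnix('2010-01-01 12:00:00')), "\n";
```

On the MySQL side the UNIX_TIMESTAMP() and FROM_UNIXTIME() functions do the same job.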

assembla.com – free development tools


I store all the sources of my development projects at assembla.com, a very cool service for developers that offers both free and paid plans.


Free users get almost the same scope of services, just limited by space:

  • projects (in free plan the source is open for anyone), milestones and tickets
  • SVN/Git repository (you can close/refer tickets by commit messages)
  • wiki with several mark-up languages
  • team collaboration tools
  • agile tools

I started by hosting the files of my Firefox add-ons there; SVN and a wiki make it a perfect choice.

Then I got so used to the handy and comfortable interface that I became a paid user. For a person from the ex-USSR that’s worth mentioning :)

And last but not least, the Assembla team is very open and keeps in touch with its users.

Several sites on single WordPress installation


I have a couple of other WordPress blogs on the same server besides this one. One day I realised that they ran 3 different WP versions and, as a result, had different admin areas, which is not handy. I decided to make them all use the same WordPress installation.

OK, first of all, I deleted the wp-admin and wp-includes folders and recreated them as symbolic links. Though the front end worked well, I couldn’t log in to the admin area: for some unknown reason the browser was redirected to the blog that served as the base for all the rest.

The investigation showed that the WordPress admin area is a separate sub-application, a thing in itself, and in my case it resolved the absolute path to its source as the path to the base blog. I wanted each blog to use its own folder, because that is where its config file and uploads folder live.

It took me some time to find a solution. It requires two steps.

First, I added this line to the top of .htaccess so that any PHP request to the blog (both the front end and the admin area scripts) runs the same script before it starts:

#fix for several sites on the same WP installation
php_value auto_prepend_file "/var/www/site_doc_root/prepend.php"

In this code /var/www/ is the root folder for all my sites, and the site_doc_root is the document root of the current site (folder where all its files are located).

OK, the 2nd step is the contents of the prepend.php script. It is easy: it just has to define the absolute path constant that is used all around WordPress:

<?php
define('ABSPATH', dirname(__FILE__).'/');

OK, after that I decided not to use one of the blogs as the source for the others, but to download a fresh copy of WordPress and make it the source of the symbolic links for all my blogs. This makes updating easier.

Then I deleted the wp-admin and wp-includes folders and some wp-* files and recreated them as symlinks. Pay attention to wp-config.php: don’t delete it, keep it unique for every site!

To make this task easier, I created a setup.sh file, pasted in the contents shown below, and ran this command

chmod 755 setup.sh

then I copied it into every site folder and launched it there:


ln -s /var/www/wordpress/wp-admin wp-admin
ln -s /var/www/wordpress/wp-includes wp-includes

ln -s /var/www/wordpress/wp-app.php wp-app.php
ln -s /var/www/wordpress/wp-atom.php wp-atom.php
ln -s /var/www/wordpress/wp-blog-header.php wp-blog-header.php
ln -s /var/www/wordpress/wp-comments-post.php wp-comments-post.php
ln -s /var/www/wordpress/wp-commentsrss2.php wp-commentsrss2.php
ln -s /var/www/wordpress/wp-config-sample.php wp-config-sample.php
ln -s /var/www/wordpress/wp-cron.php wp-cron.php
ln -s /var/www/wordpress/wp-feed.php wp-feed.php
ln -s /var/www/wordpress/wp-links-opml.php wp-links-opml.php
ln -s /var/www/wordpress/wp-load.php wp-load.php
ln -s /var/www/wordpress/wp-login.php wp-login.php
ln -s /var/www/wordpress/wp-mail.php wp-mail.php
ln -s /var/www/wordpress/wp-pass.php wp-pass.php
ln -s /var/www/wordpress/wp-rdf.php wp-rdf.php
ln -s /var/www/wordpress/wp-register.php wp-register.php
ln -s /var/www/wordpress/wp-rss.php wp-rss.php
ln -s /var/www/wordpress/wp-rss2.php wp-rss2.php
ln -s /var/www/wordpress/wp-settings.php wp-settings.php
ln -s /var/www/wordpress/wp-trackback.php wp-trackback.php
ln -s /var/www/wordpress/xmlrpc.php xmlrpc.php

That’s not all ;)

I decided to update my WordPress installation every one or two months.

To do that, in the /var/www/ folder (where all my sites reside) let’s create an update script update_wordpress.sh with the following contents:

# work in the folder that holds the wordpress/ source of the symlinks
cd /var/www || exit 1
wget --timestamping http://wordpress.org/latest.zip
unzip -o latest.zip

This will download a fresh copy of WordPress if it has changed (though the WordPress team doesn’t send the Last-Modified header for the file, I think one day they will) and unzip it into the /var/www/wordpress/ folder, which is the source for our symlinks.

Yes, you got it right — launching this script is all I need to update all my blogs.

Let’s make it periodic:

crontab -e

and then add this line to run the update process automatically every 1st day of every month at 9 AM:

0 9 1 * * /var/www/update_wordpress.sh 2>&1 | mail -s "WordPress updated" your@email.com

P.S. Of course, SVN checkout can be used for that purpose :)

Documents in repository


I rent a server in a data center. Why not use it to store my documents in an SVN repo as well?

Advantages:

  • access from anywhere, home and work, so I can forget about the flash drive.
  • an SVN client can show changes even for Microsoft Word documents.

Drawbacks:

  • a server must be in place. I already have one, so why not use it.
  • an SVN client must be installed on every machine where I am going to work with the documents. However, if I just need to read a document, there is a web interface to download it.

What more can you get from bug tracking?


Some thoughts about bug tracking.

Low priority bugs

Leave low priority tickets for newcomers: such bugs let you show them the internals of your product while they do something useful. Among such issues are all simple kinds of work with text (for example, fixing typos).

Underestimated issues

Find the issues that required much more time than was estimated (2-3 times or more). It can mean that the developer who worked on the issue faced something unusual and can share the experience gained with the rest of the team.

Bugs on paper

Make sure you (or your developers) don’t keep a list of bugs to fix on a sheet of paper. We do it out of fear of appearing to be unproductive developers, right? Just don’t do this. Report them.

Developer ratio

There are plugins for bug tracking systems (I saw one for Jira) that calculate an under/over-estimate ratio for every developer, i.e. how much the actual time spent differs from the estimate. All further estimates by the developer are multiplied by this ratio, so that the manager can find out what the real estimate is (the most accurate developers have a ratio equal to 1, of course). If you show the ratios to the developers, their estimates could become more accurate.
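
The arithmetic behind such a ratio might look like the sketch below. I don’t know how the plugin actually computes it, so this assumes the simplest variant (total actual hours divided by total estimated hours); the function name and the sample hours are made up.

```php
<?php
// Hypothetical helper; the hours below are made-up sample figures.

/** Ratio of actually spent time to estimated time over past tickets. */
function estimateRatio(array $tickets)
{
    $estimated = 0;
    $actual = 0;
    foreach ($tickets as $ticket) {
        $estimated += $ticket['estimated'];
        $actual += $ticket['actual'];
    }
    // with no history, trust the estimate as it is
    return $estimated > 0 ? $actual / $estimated : 1.0;
}

$history = array(
    array('estimated' => 4, 'actual' => 6),
    array('estimated' => 8, 'actual' => 12),
);

$ratio = estimateRatio($history);   // 18 / 12 = 1.5
$newEstimate = 10;                  // hours quoted by the developer
$expected = $ratio * $newEstimate;  // 15 hours the manager should plan for
```

A ratio above 1 means the developer tends to underestimate, and a ratio below 1 means the estimates are padded.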

By the way, do you know that Jira appeared in Star Wars? :)

RSS feed of error log


Your PHP application logs every error to the error log. Do you want to keep track of them in your favourite feed reader? Then read on!

First of all, you must tell your application to log errors to a log file. Two parameters must be set: what to log and where to save it. The first is done with the error_reporting setting, the second by setting the error_log PHP value:

error_reporting( E_ALL & ~E_NOTICE ); //all except notices
ini_set( 'error_log', 'temp/error.log' );

Put it at the beginning of your application, for example, at the top of the index.php file.

By the way, you can write something custom to the error log by calling the error_log function:

error_log('something');

OK, now, how to make a feed from your error log.

You can do this in a few ways.

First, let’s do it manually.

Manual solution

1. Remove the ini_set call described above. Then create an .htaccess file in the top folder of your application and place this line in it:

php_value error_log temp/error.log

It tells all scripts of your application to use this setting without calling the ini_set function manually.

2. Create a feed.php file. Place these lines inside it:

<?php
$contents = file_get_contents( ini_get( 'error_log' ) );
// escape the log text so that it cannot inject HTML into the page
echo nl2br( htmlspecialchars( $contents ) );

It’s dirty for now, but enough as a check.

3. Make sure that your error log file is not empty. Now you can access your feed.php file via browser and check that it’s showing you the contents of your error log.

4. Now you should create an RSS feed from the contents of the file. You can rely on your framework or use a custom solution, e.g. download the EasyRSS class and fill the feed with data from the error log (you would need to come up with a regexp to parse the date and error text from it).
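
A rough sketch of step 4 without any library. It assumes the usual PHP log line format “[date] message”; the function names, the channel title and the link are placeholders, and multi-line entries such as stack traces are simply skipped.

```php
<?php
// Sketch: assumes log lines look like "[date] message";
// the channel title and link are placeholders.

/** Splits raw error-log text into array('date' => ..., 'message' => ...) items. */
function parseErrorLog($contents)
{
    $items = array();
    foreach (explode("\n", $contents) as $line) {
        if (preg_match('/^\[([^\]]+)\]\s*(.+)$/', trim($line), $m)) {
            $items[] = array('date' => $m[1], 'message' => $m[2]);
        }
    }
    return $items;
}

/** Renders the parsed items as a minimal RSS 2.0 document. */
function buildRss(array $items, $title = 'Error log', $link = 'http://example.com/')
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?' . '>' . "\n"
         . '<rss version="2.0"><channel>'
         . '<title>' . htmlspecialchars($title) . '</title>'
         . '<link>' . htmlspecialchars($link) . '</link>'
         . '<description>' . htmlspecialchars($title) . '</description>';
    foreach ($items as $item) {
        $xml .= '<item><title>' . htmlspecialchars($item['message']) . '</title>';
        $time = strtotime($item['date']);
        if ($time !== false) { // keep the item even if the date cannot be parsed
            $xml .= '<pubDate>' . gmdate('D, d M Y H:i:s', $time) . ' GMT</pubDate>';
        }
        $xml .= '</item>';
    }
    return $xml . '</channel></rss>';
}

$sample = "[12-Mar-2010 10:15:30] PHP Warning:  Division by zero in /var/www/index.php on line 7\n";
$feed = buildRss(parseErrorLog($sample));
```

In feed.php you would pass the real log contents instead of $sample and send a Content-Type: application/rss+xml header before echoing $feed.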

After that you could feed your feed reader with the feed address ;]

Ready-made solution

Download a ready-made PHP class that does it all for you: RSS Feed. Yes, it’s that simple. You can even protect your feed with a password.

Notes:

  • on the production server change your error_reporting value, for example, to log fatal errors only.
  • log errors in the catch() part of the exception handling mechanism:
    try{ ... }
    catch(Exception $e) {
      error_log( $e->getMessage() );
    }

Quick CSV import with visual mapping


Several years ago I created the PHP class Quick_CSV_Import to import CSV files into a database table very quickly (using the LOAD DATA INFILE statement). Judging by the feedback I received, the class became quite popular in India :]

Now I want to share a little application on the basis of that class.

“Quick CSV Import with Mapping” is a PHP example application that imports a CSV file into a database table with visual mapping of CSV columns to table columns.

CSV Mapping Master - click to download

Of course, you can get a copy of the application source – just go to the SVN repository.

Don’t forget to send me your feedback and donations ;]

P.S. If you are having problems with going to step 2 while using the app, try the following:

  1. Open Quick_CSV_import.class and find “LOAD DATA INFILE”
  2. Change it to “LOAD DATA LOCAL INFILE”

It’s connected with permissions on your Linux server. Thanks to Noel for the solution.

Deleting the code: time calculations


You can be (rich and healthy) or (poor and ill).

You can have (readable and laconic code) or (confusing one).

“Oh no, I’ve said too much. I haven’t said enough” © REM. The following two examples do the same, so choose either ;]

echo time() + (60 * 60 * 24 * 14); //14 days from now
echo strtotime( '14 day');

The second variant also allows things like ‘-2 years’.
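
A small check of both variants against a fixed base time. The base date and the UTC time zone are chosen just for the example.

```php
<?php
date_default_timezone_set('UTC'); // fixed zone so no DST shift affects the result

$base = mktime(0, 0, 0, 1, 1, 2010);          // 2010-01-01 00:00:00 UTC
$byHand = $base + 60 * 60 * 24 * 14;          // seconds counted manually
$byParser = strtotime('+14 days', $base);     // the same moment, 2010-01-15
$twoYearsAgo = strtotime('-2 years', $base);  // 2008-01-01

echo date('Y-m-d', $byParser), "\n";
echo date('Y-m-d', $twoYearsAgo), "\n";
```

In a time zone with daylight saving time the two variants can differ by an hour when the interval crosses a clock change, because strtotime() works with calendar days rather than fixed 86400-second steps.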

Want the same in MySQL? Not a problem:

SELECT DATE_ADD( NOW(), INTERVAL 14 DAY )

Deleting the code: Timestamp


In many cases your application must keep track of an entity’s creation or update time. Did you know that this can be done automatically?

Yes, I know, many frameworks offer this feature out of the box (for example, Symfony). A naming convention guarantees that a table column named ‘stamp_created’ is set to the entity creation date, and ‘stamp_updated’ to the time when any column of the entity was changed. This is done by the application’s server-side code (e.g. PHP).

The same can be done with a good database table definition (in MySQL).

  • Create a TIMESTAMP column with CURRENT_TIMESTAMP as default, and that’d be an auto-initializing ‘stamp_created’ field.
  • Create a TIMESTAMP column with the ON UPDATE CURRENT_TIMESTAMP directive, and that’d be an auto-updating ‘stamp_updated’ field.
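
A sketch of such a table definition, kept in a PHP string so it can be run through PDO. The table and column names are examples, and the connection details are placeholders.

```php
<?php
// Sketch: table and column names are examples; DSN and credentials are placeholders.
$ddl = <<<SQL
CREATE TABLE article (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    stamp_updated TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB
SQL;

// Uncomment to run it against a real server:
// $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
// $pdo->exec($ddl);
```

With this definition an INSERT or UPDATE that doesn’t mention stamp_updated still leaves the current time in it.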

Other options can be found in MySQL manual.

All this can be done in phpMyAdmin with the corresponding settings.

Drawback of this solution: in MySQL versions before 5.6.5 a table cannot have both ‘stamp_created’ and ‘stamp_updated’ fields maintained this way, so choose one pill, Neo.

Phonetic won!


Yes, my class Phonetic won first place in the PHPClasses Innovation Award.

Thanks to everyone who voted for me!

You can check a demo here.
