Class to parse email content recursively

emails, parsing, zend

I needed to parse emails coming into our system, and it turned out that there is no ready-made way to parse them recursively, even though multipart emails can be nested inside one another any number of times.

So here is a class that finds and returns the first non-multipart part of an email, no matter how deep the nesting is. You are welcome!

<?php

/**
 * a service to parse multipart and plain emails
 * supports base64 and quoted-printable formats
 * supports recursively nested parts (the most tricky part)
 * @author Alexander Skakunov
 * @since 2014-04-04
 */
class Service_Mail_Parser {

    /**
     * @param Zend_Mail_Message $message
     * @return string $content
     */
    public function parse(Zend_Mail_Message $message) {
        return $this->_getContent($message);
    }

    /**
     * gets content from the email object
     * @param Zend_Mail_Message $message
     * @return string $content
     * @throws Exception
     */
    protected function _getContent(Zend_Mail_Message $message) {
        if ($message->isMultipart()) {
            $content = $this->_getMultipartContent($message);
        }
        else {
            $content = $message->getContent();
        }

        if (empty($content)) {
            throw new Exception('Content is not parsable');
        }
        return $content;
    }

    /**
     * gets content from the multipart email
     * @param Zend_Mail_Message $message
     * @return string|null $content decoded content, or null if no part was found
     */
    protected function _getMultipartContent(Zend_Mail_Message $message) {
        $plainPart = $this->_getFirstPlainPart($message);
        if (empty($plainPart)) {
            return null;
        }

        switch (strtolower($plainPart->contentTransferEncoding)) {
            case 'base64':
                return base64_decode($plainPart->getContent());
            case 'quoted-printable':
                return quoted_printable_decode($plainPart->getContent());
            default:
                // 7bit, 8bit and binary parts need no decoding
                return $plainPart->getContent();
        }
    }

    /**
     * recursively (!) gets the first plain content from the multipart email
     * i.e. the one that is not multipart
     * @param Zend_Mail_Part $message
     * @return Zend_Mail_Part $part
     */
    protected function _getFirstPlainPart(Zend_Mail_Part $message) {
        if (!$message->isMultipart()) {
            return $message;
        }
        // parts are numbered from 1, so this descends into the first child
        $part = $message->getPart(1);
        return $this->_getFirstPlainPart($part);
    }
}

This is how to use it:

$service = new Service_Mail_Parser;
$content = $service->parse($message);
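
For comparison, here is a minimal sketch of the same "descend into the first part until it is no longer multipart" idea in Python, using the standard library email package instead of Zend_Mail. The function names first_plain_part and parse are made up for this illustration; this is not a port of the class above, just the same technique under those assumptions.

```python
from email import message_from_string
from email.message import Message


def first_plain_part(message: Message) -> Message:
    """Descend into the first child until a non-multipart part is reached."""
    while message.is_multipart():
        message = message.get_payload(0)  # first child part
    return message


def parse(raw: str) -> str:
    """Return the decoded text of the first non-multipart part of a raw email."""
    leaf = first_plain_part(message_from_string(raw))
    payload = leaf.get_payload(decode=True)  # undoes base64 / quoted-printable
    charset = leaf.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace")
```

A loop is used here instead of recursion, but the effect is the same: nesting depth does not matter.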

Grab a list of emails from a website with paging

development, ideas

A one-liner to grab a list of emails.

wget -q -O - "http://server.com/?page="{1..42} | grep -ioE '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' | sort -uf > emails.txt

Just replace the page URL and set the first and last page numbers: the {1..42} part means paging from page #1 to page #42. Of course, you need to look at how the site does its paging to form a proper final URL.

The sorted, de-duplicated results end up in the emails.txt file.

Yes, no phone numbers or first/last names are parsed. It is a fast and easy solution.
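
If you already have the page text in hand, the extraction step can be sketched in Python like this. It reuses the same regular expression as the grep call; the function name extract_emails is an assumption for this example, and downloading the pages is left out.

```python
import re
from typing import List

# Same pattern as in the grep call, matched case-insensitively.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)


def extract_emails(text: str) -> List[str]:
    """Return unique addresses (compared case-insensitively), sorted alphabetically."""
    unique = {}
    for match in EMAIL_RE.findall(text):
        unique.setdefault(match.lower(), match)  # keep the first spelling seen
    return [unique[key] for key in sorted(unique)]
```

Note that the {2,4} limit on the last domain label means longer top-level domains are not matched.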

MySQL stored procedures debugging

development, mysql

It seems there are no built-in ways to debug user-defined functions or procedures in MySQL. So if a function behaves badly, it’s hard to find out why.

Here is what I do in order to trace a function. Yes, it’s a bit ugly, but better than nothing.

1. Run this in your MySQL command line (or in phpMyAdmin with // as the delimiter; there is a field for it below the SQL window).

DELIMITER //

DROP PROCEDURE IF EXISTS Debug; //
CREATE PROCEDURE Debug(Message TEXT)
BEGIN
    CREATE TABLE IF NOT EXISTS _debug (
        `id` int(10) unsigned NOT NULL auto_increment,
        `msg` TEXT DEFAULT NULL,
        `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY  (`id`)
    );
    INSERT INTO _debug(`msg`)  VALUES(Message);
END; //

DROP PROCEDURE IF EXISTS ClearDebugMessages; //
CREATE PROCEDURE ClearDebugMessages()
BEGIN
    TRUNCATE TABLE _debug;
END; //

DELIMITER ;

2. In your routine, log a message like this:

CALL Debug('Debug message goes here');

3. A _debug table will appear in your database, holding the debug messages together with the creation time of each.

4. If you want to clear all the debug messages, make a call:

CALL ClearDebugMessages();

An alternative is to truncate the _debug table with phpMyAdmin.

MySQL random function

db, development, mysql

MySQL’s RAND() function returns a floating-point value v in the range 0 <= v < 1. At times you need a random integer between A and B; then you can use this function:

DELIMITER //
DROP  FUNCTION IF EXISTS RANDOM;//
CREATE FUNCTION RANDOM(minimum INT, maximum INT)
    RETURNS INT NO SQL NOT DETERMINISTIC
    COMMENT 'integer random value in the bounds given'
RETURN FLOOR(minimum + RAND() * (maximum - minimum + 1));//
DELIMITER ;

You can call it like this to get a random integer value between 0 and 100, both ends included:

SELECT RANDOM(0, 100);
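
To see how a value in [0, 1) can be stretched onto an integer range, here is a small Python sketch of one common mapping, floor(min + r * (max - min + 1)). It is an illustration of the idea, not necessarily the exact formula used in the SQL above; the function name random_int and its rand parameter are made up so that the source of randomness can be swapped out.

```python
import math
import random


def random_int(minimum: int, maximum: int, rand=random.random) -> int:
    """Map a float r in [0, 1) to an integer in [minimum, maximum], each equally likely."""
    # The range holds (maximum - minimum + 1) integers, so r is scaled by that
    # count and floored; the result can reach maximum but never exceed it.
    return math.floor(minimum + rand() * (maximum - minimum + 1))
```

With this mapping every integer in the range gets an equally wide slice of [0, 1), so the two ends are as likely as any value in the middle.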

It’s security, man!

anecdote, db, fun, story

Two developers are talking:
— Hey, what’s the password for our production database?
— 12354.
— Hmm, why isn’t it just 12345 then?
— Gosh! It’s security, man!


How to earn $50 on a bookmark

fun, ideas, story

Here is the story of how I sold a browser bookmark for $50.

A client contacted me: he wanted a Firefox add-on that would do a simple but important thing. The client needed to see contact details on a website, and to do this he had to click multiple “Show contact details” buttons. These buttons loaded the contact details via an AJAX call.

So he needed a solution to click these multiple buttons on the same page.

The second requirement was to make this feature password-protected. Once the password was entered, it would be “cached” for a long time. The client needed that so other people couldn’t reuse his code, and he knew that those users were not tech guys.

So, the budget was $50.

I told him that I could make a solution that would work not only in Firefox, but in any browser. He agreed.

So I created a bookmark and edited the URL.

As you may know, the usual HTTP link of a bookmark can be replaced with JavaScript code.

So instead of “http://server.com/my/link” you can type “javascript:alert('this is a message');”. Like this:

[Image: $50 bookmark]

So clicking this browser bookmark shows a JavaScript alert.

That’s what I used. The password was requested with the prompt() function and saved to a cookie. Then every button with the given title was clicked, one after another.

The client is happy. $50 for a bookmark.

Major problems with geo in MySQL

boo, geo, mysql, postgres

Many development projects are started with MySQL on board — it’s free, stable and scalable.

At some point your project might need Geo features (a.k.a. “spatial calculations”): a great example would be calculating the distance from where your user is to the closest airport, the Eiffel Tower or a hotel.

If you’re at this point, you are in a gray zone. Think twice about whether you want to stay with MySQL. You can save lots of hours and rude words if you make the right decision now.

The reason is simple: Geo calculations in MySQL are not implemented the way they should be. OK, ok, calculating the distance between 2 points is an easy task and can be solved by a single function, but anything more complex just doesn’t work. For example, determining whether a point is inside a boundary, or calculating the area where two boundaries overlap.

This is what the MySQL website says:

MySQL originally implemented these functions such that they used object bounding rectangles and returned the same result as the corresponding MBR-based functions.

What does it mean? Look at this picture of an area in Egypt called Al Jizah.

[Image: Al Jizah]

The shape is quite complex, but MySQL cannot process it, so a Minimum Bounding Rectangle (MBR) is used instead, which is exactly the rectangular shape of the picture itself. I cannot find exact words to express how error-prone that is.
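
To make the difference concrete, here is a small Python sketch that compares a bounding-rectangle test with a true point-in-polygon test (classic ray casting). The function names and the L-shaped sample polygon in the usage note are assumptions for illustration; this is not how MySQL or PostGIS implement it internally.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def in_bounding_rectangle(point: Point, polygon: List[Point]) -> bool:
    """MBR-style check: is the point inside the smallest axis-aligned rectangle?"""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x, y = point
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)


def in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count the edges crossed by a horizontal ray going right from the point."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside
```

For an L-shaped polygon with corners (0,0), (4,0), (4,1), (1,1), (1,4), (0,4), the point (3, 3) passes the rectangle check but fails the polygon check. That kind of false positive is what an MBR-only implementation produces.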

You can try to reinvent the wheel and write your own functions (like I did…), but they are very slow: iterating through 4 KB of polygon points takes 1-2 seconds, so if you have hundreds of polygons to compare, say bye-bye to the product performance.

This is fixed only in MySQL 5.6.1 and later.

What to do? Use Postgres with its spatial extension, PostGIS. It’s super-fast, works in multiple dimensions and does it RIGHT.

Sphinx DocumentID for complex queries

development

Every document in Sphinx has to have a unique Document ID. Usually it’s the database table’s primary key value.

But what if your table doesn’t have a primary key, or the source is a very complex query?

In that case we used this approach.
1. Create a table called _sequence with just one integer field called id.
2. Run this query:

DELIMITER //
DROP  FUNCTION IF EXISTS GET_NEW_ID; //
CREATE FUNCTION GET_NEW_ID( ) RETURNS INT MODIFIES SQL DATA
BEGIN
    UPDATE _sequence SET id = LAST_INSERT_ID(id+1);
    SET @id = LAST_INSERT_ID();
    IF 0 = @id THEN
        INSERT INTO _sequence VALUES (0); -- fix if there were no data to increment
    END IF;
    RETURN @id;
END; //
DELIMITER ;

3. In your Sphinx queries you can use this function to generate a unique document ID:

SELECT GET_NEW_ID() as id, title, ...

Database choice

complain, db, mysql, postgres

If you are choosing between MySQL and Postgres as the database engine for a new project, choose Postgres.

The reasons are simple.

1. Postgres has stricter datatypes. If a field is an integer, you cannot assign a string value to it. In the long run this makes the data less error-prone. Frameworks and ORMs will make the code transparent anyway.

2. It supports a JSON data type.

3. Postgres supports Geo calculations, while MySQL has very limited support. Really. Many services and websites need geo calculations to compute distances, proximity, etc. MySQL supports the simplest operations, but any sophisticated geo math is just not implemented in MySQL.

Ant build script to optimize images

ant, development, phpunit

My PHP projects are built with the help of the Apache Ant tool. It just runs a set of predefined applications for whatever purposes you define: to smoke-test your code, to prepare files for commit, to grab code stats, etc.

So, Ant without params runs a default ruleset, which can be a simple smoke test: PHP syntax check of modified files + PHPUnit tests.

One more reason I use it is the recommendation from Google PageSpeed Insights to optimize images: strip unnecessary technical information (EXIF) from all your JPG and PNG pictures.

Here is how I did it.

First, they recommend installing optipng to optimize PNG files and jpegoptim for JPEG files (look in the “Tools and parameter tuning” section).

So, here is my Ant script:

<?xml version="1.0" encoding="UTF-8"?>
<!-- http://ant.apache.org/manual/using.html -->

<project name="yasno.tv" default="build" basedir=".">
  <target name="build" depends="..., optimg" />
  ...

  <target name="optimg" description="Optimizes images in public folder">
    <apply executable="optipng" failonerror="true" description="Optimizes PNG files">
      <arg value="-o7" />
      <fileset dir="${basedir}/public/img/">
        <include name="**/*.png" />
        <modified>
          <param name="cache.cachefile" value="${basedir}/etc/build/cache.properties"/>
        </modified>
      </fileset>  
    </apply>

    <apply executable="jpegoptim" failonerror="true" description="Optimizes JPG files">
      <arg value="--strip-all" />
      <fileset dir="${basedir}/public/img/">
        <include name="**/*.jpg" />
        <modified>
          <param name="cache.cachefile" value="${basedir}/etc/build/cache.properties"/>
        </modified>
      </fileset>  
    </apply>
  </target>
    
</project>

It optimizes all image files in the public/img folder.

The etc/build/cache.properties file is the cache of the modified selector: it stores each file name together with a hash of the file content, so the script optimizes only the changed files.

As a result, the image files usually lose 10-20% of their size.

The command to run is:

ant optimg

or just

ant

if you want to run the whole ruleset.

Specifying CSS class of a Zend_Navigation li element

delete your code, development, php, zend

Zend Navigation is a really nice tool to handle menus.

You just specify an array of your menu items:

$pages = array(
    array(
        'label'      => 'Privacy Statement',
        'controller' => 'terms',
        'action'     => 'privacy-statement',
        'class'      => 'firstNav',
    ),
    array(
        'label'      => 'General Terms of Use',
        'controller' => 'terms',
        'action'     => 'general-terms-of-use',
    ),
);

and initialize the navigation object with it — and it works:

$container = new Zend_Navigation($pages);
$this->view->navigation($container)->menu()
    ->setUlClass('nav')
    ->setActiveClass('active');

The result is the following:

<ul class="nav">
    <li class="active">
        <a class="firstNav" href="/terms/privacy-statement">Privacy Statement</a>
    </li>
    <li>
        <a href="/terms/general-terms-of-use">General Terms of Use</a>
    </li>
</ul>

There are options to specify the CSS class of the whole UL tag. If you specify a CSS class for a menu item (“firstNav” in my example), it’s added to the A tag, not to the LI tag as required by the sliced design I have.

Googling shows that people try to work around that with jQuery fixes.

It seems there is a proper way to solve this; you just need to add this option:

$this->view->navigation()->menu()->addPageClassToLi(true);

Enjoy!

Google Authorship: how to set it up properly

google, SEO, site

It seems Google is trying to find a silver bullet to separate high-quality content from shit. The way they chose is to identify the author of this or that article. Having a known author suggests the text is better than anonymous stuff.

The new way to distinguish the authorship is called Google Authorship. Simply put, you can tell Google “Hey, I am the author of this content”.

[Image: Google Authorship example]

There are 2 roles of “authors”: authors and publishers. A site usually represents a single publisher, but can have multiple authors. For example, the New York Times or developers.org.ua is a publisher. Every article on those resources has its own author.

Set up

Both roles are set up via a Google+ profile. It’s a must, otherwise nothing works.

There are two and a half ways to tell Google who you are.

Way #1. Email on the same domain

Prove your relation to the website via an email address on the same domain. You can read Google’s guide on it first.

Way #2.1. A HYPER-link to Google+ profile

Add a usual hyperlink to your Google+ page with “?rel=author” at the end of the URL. The disadvantage is that you need a hyperlink to an external resource, which is not always a good idea.

Way #2.2. A link to Google+ profile

Add a link element (not a hyperlink); just put this tag anywhere on the page:

<link href="http://plus.google.com/...." rel="author" />

Test

You can test whether your setup worked with the help of the Google Structured Data Testing Tool.

Note that it takes a few days or weeks for this to appear in live search results, and Google says it’s optional. They don’t promise you will see your face there, but it has worked for many people.

What else can I do?

1. It seems Google shows your pic only if you are some kind of known specialist in the topic, but it worked without that for many other people.

2. Also they advise to claim your authorship only for guest articles, not for articles of your own blog.

3. Another thing to pay attention to is how many circles your Google+ profile is in. The more popular you are the better.

Good luck with it!

Another dozen of things Assembla could have fixed

assembla, complain

Assembla often reminds me of these wise words: “any fool can make things bigger and more complex” (I started this topic in a previous article, “10 reasons not to use Assembla”, where I even got a reply from one of their guys, but it seems things don’t change fast).

In general the project is one of the best on the market, but the devil is in the details, and those details just drive me crazy at times.

The project is not stuck and development is active, but it seems that Assembla development is focused on 2 things:

  • where to put a sidebar of ticket details page — left or right? (it jumps every 6 months)
  • tiny CSS changes of menu and buttons.

All in all, the project is a decade old, but still feels amateur at times. No offense.

Here is a list of things that irritate me the most.

Assembla spaces fail #1: Inactive open-source project

My piece of code called “CSV import with visual mapping” is quite popular — people download it (from Assembla repository), ask questions, request support. I don’t develop new features, people just use it as it is.

At the same time, Assembla thinks that because I didn’t visit the space and don’t commit new code, the space is not used. They marked it as inactive. There is no way to reactivate it apart from either buying a credit (no, thanks, it’s open source and you allow free spaces) or spending some time to create a new free space, recommit the code and update the links (that’s what I am going to do when they finally kill the old one).

Assembla spaces fail #2: cannot remove a space

Wanna kill a space? Maybe you want to switch to a cheaper plan this way? All you get clicking a “Delete this space” button is a “Space was successfully queued to be deleted” message and the space remaining forever.

Tickets fail #1: Parent story is NOT a parent ticket

This feature, announced not long ago, reminded me of those words about making things bigger and more complex.

You have to choose whether relations between tickets are “parent-child” or “parent story-child story”. A bit confusing, huh? Stories are made for Agile guys, the other option is for the rest.

To make it easier to tell the difference, child stories have a special icon =)

The behavior is different too: closing a parent story will also silently close all child tickets.

Having stories doesn’t help since…

…you cannot see tickets hierarchy (tickets fail #2)

Even if you spent some time defining ticket relations, it is not possible to make a ticket filter sort stories so that their children are still shown under the parent story. All tickets are equal!

[Image: assembla parent story]

Your filter becomes a mess of parents and different children. Agile.

Tickets fail #3: the details editor

Make a list (numbered or not) out of a few lines, or make a hyperlink out of a URL: just not possible. You get a template of a hyperlink and you are supposed to edit it to form the proper hyperlink. These things have worked for decades on other sites; making a hyperlink in Gmail is just sexy.

Don’t get me wrong. It’s kind of OK if this doesn’t work in a free plug-in, but tickets are the heart of Assembla as a project management tool and these guys charge money for it, yet it simply doesn’t work.

Tickets fail #4: the filter totals – NO WAY

You have a filter of tickets. Do you want to quickly find out the total estimated / spent time? Open tickets one by one and use your calculator.

Wiki fail #1: Editor

The WYSIWYG editor adds a lot of crappy tags. Many times, after saving changes (something complex like a big table with hyperlinks and new lines), what I got was a mess.

No preview button is available (anymore), but there is half a screen of Wiki format reference text, so you cannot just go down to the page bottom and click Save: scroll carefully so you don’t miss it!

Wiki fail #2: Never change the format!

OK, you realized that in the Assembla world what you see is not what you get. Yes, they let you change the format of Wiki pages from WYSIWYG to something more reliable like Textile or Markdown, but get ready: it applies to existing pages as well!

Yes, your pages are not readable anymore after that.

Subversion fail: No care about old customers

If you are stuck with Assembla and an old-school SVN+Trac repository (like we are), you cannot add a new nice Subversion repository: you are supposed to kill the old one first. You cannot do that without losing your code.

One workaround is to stop development for a few hours, export the old repository to a file, cross your fingers, kill the old one, add the new Subversion feature, then import the repo file.

Another way is to start with a fresh new space, although that might not be an option for those who have a simple plan.

Time feature fail #1: Date filter

Filter by one date (starting and ending the same day), and as a result you’ll see the time tracked for the day before that. Very “useful” when you are halfway through a report in a spreadsheet and realize that the dates are wrong.

So, to check what you did today so far, you have to set dates in the future (so called “tomorrow”). If you set the dates wrong, you get this self-explanatory error message:

[Image: assembla wrong date message]

Time feature fail #2: edit a time entry

Filter your time entries, then edit or delete a time entry — and voila, your filter is lost! Nice how this application “cares” about your time.

The main question I would ask the Assembla dev team: do you guys use your own tool for project management?

Domain Driven Design as billiards game

DDD, development, fun

I am becoming a fan of Domain Driven Design (DDD).

Here is a fun take on how DDD could look when you come to a pub to play billiards.

You are asked which game exactly you are going to play, so you say “pool”. As a result you get a pool table and a pool rules agreement. This is an abstract factory.

The balls are numbered and colored. The black ball has a custom behavior. Since they all have an identity, they are entities.

As for the billiard cue, it’s a bit trickier. If the cues have different sizes and you prefer to play with your own one, it’s an entity. Although, if you are drunk and don’t care about the stick you are playing with, it’s a value object.

A small piece of chalk is a value object: any one works. The adjustment triangle is a value object too.

The rules of pool are a domain service.

The kick, the chalk break, the balls setup: all these form an infrastructure service.

A wall shelf for the balls is a repository.
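
The entity versus value object distinction can be sketched in a few lines of Python. The class names Chalk and Ball and their fields are made up for this example; it is just one simple way to express the idea, not a full DDD implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chalk:
    """Value object: defined only by its attributes, so any equal piece will do."""
    color: str


class Ball:
    """Entity: identified by its number, whatever its current state is."""

    def __init__(self, number: int, pocketed: bool = False):
        self.number = number
        self.pocketed = pocketed

    def __eq__(self, other):
        # Two Ball objects are the same ball if their identity (number) matches.
        return isinstance(other, Ball) and self.number == other.number

    def __hash__(self):
        return hash(self.number)
```

Two blue chalks are interchangeable, while ball #8 stays ball #8 whether it is on the table or in a pocket.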

Using non-alphanumeric characters in Sitemap URLs

development, encoding, google, links, php

This article in Google Help explains how to deal with special characters in Sitemaps that you can submit to Webmaster tools in order to increase the number of indexed pages of your website.

The main point is: the URLs must contain ASCII symbols only.

It can be done this way:

  • (obvious) ampersands, both kinds of quotes and the < and > symbols must be entity-escaped,
  • non-ASCII symbols must be percent-encoded in the encoding your web server uses, e.g. ü becomes %FC in ISO-8859-1 or %C3%BC in UTF-8,
  • URLs that you submit must follow RFC 3986.

If you use PHP, pay attention to one thing: it seems rawurlencode should be used instead of the usual urlencode, since it follows RFC 3986 as stated in the PHP documentation.
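
To illustrate the two encoding steps outside PHP, here is a minimal Python sketch using only the standard library. The helper names encode_path and xml_escape are assumptions for this example, and the example.com URL in the usage note is a placeholder.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def encode_path(path: str, encoding: str = "utf-8") -> str:
    """Percent-encode a URL path, leaving the '/' separators intact."""
    return quote(path, safe="/", encoding=encoding)


def xml_escape(url: str) -> str:
    """Entity-escape &, <, > and both quote characters for use inside the sitemap XML."""
    return escape(url, {'"': "&quot;", "'": "&apos;"})
```

For example, encode_path("/ümlat.php") gives "/%C3%BCmlat.php", the same call with encoding="latin-1" gives "/%FCmlat.php", and xml_escape turns the & in "http://www.example.com/%C3%BCmlat.php&q=name" into &amp;.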
