Grab a list of emails from a website with paging

development, ideas No Comments »

A one-liner to grab a list of emails.

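# http://example.com/page/ is only a placeholder URL; sort -f makes the -u dedup case-insensitive
wget -q -O - http://example.com/page/{1..42} | grep -ioE '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' | sort -uf > emails.txt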

Just replace the page URL and set the first and last page numbers: the {1..42} part means pages 1 through 42. Of course, you should investigate how the site builds its paging links to form a proper final URL.

The sorted results are in the emails.txt file.

Yes, no phone numbers or first/last names are parsed. It is a fast and easy solution.
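
For those who prefer to post-process the downloaded pages in a script, here is a minimal JavaScript sketch of the same idea. It reuses the regular expression from the one-liner; the function name extractEmails is made up for this example, and unlike sort it lower-cases every address before removing duplicates.

```javascript
// Collect unique email addresses from a chunk of text (for example, a downloaded HTML page).
function extractEmails(text) {
  // Same pattern as in the grep call; note that {2,4} skips top-level domains longer than 4 letters.
  const pattern = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/gi;
  const matches = text.match(pattern) || [];
  // Lower-case first so that Bob@Example.com and bob@example.com count as one address.
  const unique = new Set(matches.map(function (m) { return m.toLowerCase(); }));
  return Array.from(unique).sort();
}

console.log(extractEmails('Write to Bob@Example.com or bob@example.com, also alice@example.org.'));
```

The pattern is deliberately simple, so it will miss exotic but valid addresses.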

It’s security, man!

anecdote, db, fun, story No Comments »

2 developers are talking:
— Hey, what’s the password for our production database?
— 12354.
— Hmm, why isn’t it just 12345 then?
— Gosh! It’s security, man!


How to earn $50 on a bookmark

fun, ideas, story No Comments »

I’m going to tell the story of how I sold a browser bookmark for $50.

A client contacted me. He wanted a Firefox add-on that would do a simple but important thing: to see contact details on a website, he had to click multiple “Show contact details” buttons, and each button loaded the contact details via an AJAX call.

So he needed a solution to click all these buttons on the same page.

The second requirement was to make this feature password protected. Once the password was entered, it had to be “cached” for a long time. The client needed that so other people couldn’t reuse his code, and he knew that those users were not tech guys.

So, the budget was $50.

I told him that I could make a solution that would work not only in Firefox, but in any browser. He agreed.

So I created a bookmark and edited the URL.

As you may know, you can replace a bookmark’s usual HTTP link with JavaScript code.

So instead of a regular address you can type “javascript:alert('this is a message');”. Like this:

$50 bookmark

So clicking this browser bookmark shows a JavaScript alert.

That’s what I used. The password was requested with a prompt() call and saved to a cookie. Then every button with a given title was clicked, one after another, as a queue.

The client is happy. $50 for a bookmark.
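
The original bookmark code is not shown here, so what follows is only a rough sketch of how such a bookmarklet could be put together. The cookie name, the password, the delay and the function names are all made up for illustration, and a password checked in the browser is of course a speed bump for non-technical users, not real protection.

```javascript
// Hypothetical cookie name; the original one is not known.
const UNLOCK_COOKIE = 'bookmarklet_unlocked';

// True when the cookie string (normally document.cookie) already carries the unlock flag.
function isUnlocked(cookieString) {
  return cookieString.split(';').some(function (part) {
    return part.trim() === UNLOCK_COOKIE + '=1';
  });
}

// Click every element with the given title, one after another.
// `doc` is normally `document` and `schedule` is normally `setTimeout`.
function clickButtonsByTitle(doc, title, delayMs, schedule) {
  const buttons = Array.from(doc.querySelectorAll('[title="' + title + '"]'));
  buttons.forEach(function (button, index) {
    schedule(function () { button.click(); }, index * delayMs);
  });
  return buttons.length;
}

// In the browser the pieces could be wired together roughly like this:
// if (isUnlocked(document.cookie) || prompt('Password?') === 'secret') {
//   document.cookie = UNLOCK_COOKIE + '=1; max-age=31536000';
//   clickButtonsByTitle(document, 'Show contact details', 500, setTimeout);
// }
```

To use it as a bookmark, the whole thing would be squeezed into one line and prefixed with “javascript:”.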

Major problems with geo in MySQL

boo, geo, mysql, postgres No Comments »

Many development projects are started with MySQL on board — it’s free, stable and scalable.

At some point your project might need Geo features (a.k.a. “spatial calculations”): a great example would be calculating the distance from where your user is to the closest airport, the Eiffel Tower or a hotel.

If you’re at this point, you are in a gray zone. Think twice about whether you want to stay with MySQL. You can save lots of hours and rude words if you make the right decision now.

The reason is simple: Geo calculations in MySQL are not implemented the way they should be. OK, ok, calculating the distance between 2 points is an easy task and can be solved by a single function, but anything more complex just doesn’t work. For example, determining whether a point is inside a boundary, or calculating the area where two boundaries overlap.

Here is what the MySQL website says:

MySQL originally implemented these functions such that they used object bounding rectangles and returned the same result as the corresponding MBR-based functions.

What does it mean? Look at this picture of an area in Egypt called Al Jizah.

Al Jizah

The shape is quite complex, but MySQL cannot process it, so a Minimum Bounding Rectangle (MBR) is used instead, which is exactly the rectangular frame of the picture itself. I cannot find exact words to express how error-prone that is.

You can try to reinvent the wheel and write your own functions (like I did…), but they are very slow: iterating through 4 KB of polygon points takes 1-2 seconds, so if you have hundreds of polygons to compare, say bye-bye to the product performance.

This is fixed only in versions after 5.6.1.

What to do? Use Postgres with its spatial extension called PostGIS. It’s super-fast, works in multiple dimensions and does it RIGHT.
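
To see why a bounding rectangle is so error-prone, here is a minimal sketch in JavaScript (it is neither MySQL nor PostGIS code) that compares an MBR check with a real point-in-polygon test. The L-shaped polygon and the function names are made up for illustration. The ray-casting test is the common textbook technique, and it does not handle points lying exactly on a border.

```javascript
// Minimum bounding rectangle check: roughly what an MBR-based function answers.
function inBoundingBox(point, polygon) {
  const xs = polygon.map(function (p) { return p[0]; });
  const ys = polygon.map(function (p) { return p[1]; });
  return point[0] >= Math.min.apply(null, xs) && point[0] <= Math.max.apply(null, xs) &&
         point[1] >= Math.min.apply(null, ys) && point[1] <= Math.max.apply(null, ys);
}

// Ray casting: flip the "inside" flag each time a horizontal ray from the point crosses an edge.
function inPolygon(point, polygon) {
  const x = point[0], y = point[1];
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const xi = polygon[i][0], yi = polygon[i][1];
    const xj = polygon[j][0], yj = polygon[j][1];
    const crosses = (yi > y) !== (yj > y) &&
      x < (xj - xi) * (y - yi) / (yj - yi) + xi;
    if (crosses) { inside = !inside; }
  }
  return inside;
}

// Hypothetical L-shaped area; the point sits in the empty corner of the "L".
const lShape = [[0, 0], [4, 0], [4, 1], [1, 1], [1, 4], [0, 4]];
const point = [3, 3];
console.log(inBoundingBox(point, lShape)); // true
console.log(inPolygon(point, lShape));     // false
```

The point is reported as “inside” by the rectangle check although it is clearly outside the real shape.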

Specifying CSS class of a Zend_Navigation li element

delete your code, development, php, zend No Comments »

Zend Navigation is a really nice tool to handle menus.

You just specify an array of your menu items:

$pages = array(
    array(
        'label'      => 'Privacy Statement',
        'controller' => 'terms',
        'action'     => 'privacy-statement',
        'class'      => 'firstNav',
    ),
    array(
        'label'      => 'General Terms of Use',
        'controller' => 'terms',
        'action'     => 'general-terms-of-use',
    ),
);

and initialize the navigation object with it — and it works:

$container = new Zend_Navigation($pages);

The result is the following:

<ul class="nav">
    <li class="active">
        <a class="firstNav" href="/terms/privacy-statement">Privacy Statement</a>
    </li>
    <li>
        <a href="/terms/general-terms-of-use">General Terms of Use</a>
    </li>
</ul>

There are options to specify the CSS class of the whole UL tag. But if you specify the CSS class of a menu item (“firstNav” in my example), it’s added to the A tag, not to the LI tag as required by the sliced design I have.

Googling shows that people try to work around that with jQuery fixes.

It seems there is a proper way to solve this; you just need to add this option:



Google Authorship: how to setup properly

google, SEO, site No Comments »

It seems Google is trying to find a silver bullet to separate high-quality content from shit. The way they chose is to identify the author of each article: a text with a known author is assumed to be better than anonymous stuff.

The new way to distinguish the authorship is called Google Authorship. Simply put, you can tell Google “Hey, I am the author of this content”.

Google Authorship example

There are 2 roles of “authors”: authors and publishers. A site usually represents a single publisher, but can have multiple authors. For example, New York Times is a publisher. Every article on such a resource has its own author.

Set up

Both roles are set up via a Google+ profile. It’s a must, otherwise nothing works.

There are two and a half ways to tell Google who you are.

Way #1. Email on the same domain

Prove your relation to the website via an email address on the same domain. You can read Google’s guide on it first.

Way #2.1. A HYPER-link to Google+ profile

Add a regular hyperlink to your Google+ page with “?rel=author” at the end of the URL. The disadvantage is that you need a hyperlink to an external resource, which is not always a good idea.

Way #2.2. A link to Google+ profile

Add a link, not a “hyper” one: just put this tag anywhere on the page:

<link href="" rel="author" />


You can test whether your setup works with the help of the Google Structured Data Testing Tool.

Note that it takes a few days or weeks for that to appear in live search results, and Google says it’s optional. They don’t promise you will see your face there, but it has worked for many people.

What else can I do?

1. It seems Google shows your pic only if you are some kind of known specialist on the topic, but it worked without that for many other people.

2. They also advise claiming your authorship only for guest articles, not for articles on your own blog.

3. Another thing to pay attention to is how many circles your Google+ profile is in. The more popular you are the better.

Good luck with it!

Another dozen of things Assembla could have fixed

assembla, complain No Comments »

Assembla often reminds me of these wise words: “any fool can make things bigger and more complex”. (I started this topic in a previous article, “10 reasons not to use Assembla”, where I even got a reply from one of their guys, but it seems things don’t change fast.)

In general the product is one of the best on the market, but the devil is in the details, and those details drive me crazy at times.

The project is not stuck, development is active, but it seems that the focus of the Assembla development is on 2 things:

  • where to put a sidebar of ticket details page — left or right? (it jumps every 6 months)
  • tiny CSS changes of menu and buttons.

All in all, the project is a decade old, but still feels amateur at times. No offense.

Here is a list of things that irritate me the most.

Assembla spaces fail #1: Inactive open-source project

My piece of code called “CSV import with visual mapping” is quite popular — people download it (from Assembla repository), ask questions, request support. I don’t develop new features, people just use it as it is.

At the same time, Assembla thinks that because I don’t visit the space and don’t commit new code, the space is not used. They marked it as inactive. There is no way to reactivate it apart from either buying a credit (no, thanks, it’s open source and you allow free spaces) or spending some time to create a new free space, recommit the code and update the links (that’s what I’m going to do when they finally kill the old one).

Assembla spaces fail #2: cannot remove a space

Wanna kill a space? Maybe you want to switch to a cheaper plan this way? All you get after clicking the “Delete this space” button is a “Space was successfully queued to be deleted” message, and the space remains forever.

Tickets fail #1: Parent story is NOT a parent ticket

This feature, announced not so long ago, reminded me of those words about making things bigger and more complex.

You have to choose whether relations between tickets are “parent/child” or “parent story/child story”. A bit confusing, huh? Stories are made for Agile guys, the other option is for the rest.

To make it easier to tell the difference, child stories have a special icon =)

The behavior is different too: closing a parent story will also silently close all child tickets.

Having stories doesn’t help since…

…you cannot see tickets hierarchy (tickets fail #2)

Even if you spent some time defining ticket relations, it is not possible to make a ticket filter sort stories so that their children are still shown under the parent story. All tickets are equal!

assembla parent story

Your filter becomes a mess of parents and different children. Agile.

Tickets fail #3: the details editor

Make a list (numbered or not) out of a few lines, make a hyperlink out of a URL: just not possible. You get a template of a hyperlink and you are supposed to edit it in order to form the proper hyperlink. These things have worked for decades on other sites; making a hyperlink in Gmail is just sexy.

Don’t get me wrong. It’s kind of OK if it doesn’t work in a free plug-in, but the tickets are the heart of Assembla as a project management tool, these guys charge money for it, and it simply doesn’t work.

Tickets fail #4: the filter totals – NO WAY

You have a filter of tickets. Do you want to quickly find out the total estimated / spent time? Open tickets one by one and use your calculator.

Wiki fail #1: Editor

The WYSIWYG editor adds a lot of crappy tags. Many times I have seen that after changes are saved (something complex like a big table with hyperlinks and new lines) what you get is a mess.

No preview button is available (anymore), but there is half a screen of Wiki format reference text, so you cannot just jump to the bottom of the page and click Save. Scroll carefully not to miss it!

Wiki fail #2: Never change the format!

OK, you realized that in the Assembla world what you see is not what you get. Yes, they allow you to change the format of Wiki pages from WYSIWYG to something more reliable like Textile or Markdown, but get ready: it applies to existing pages as well!

Yes, your pages are not readable anymore after that.

Subversion fail: No care about old customers

If you are stuck with Assembla and an old-school SVN+Trac repository (like we are), you cannot add a nice new Subversion repository: you are supposed to kill the old one first. You cannot do that without losing your code.

One workaround is to stop development for a few hours, export the old repository to a file, cross your fingers, kill the old one, add the new Subversion feature, then import the repo file.

Another way is to start with a fresh new space, although that might not be an option for those who have a simple plan.

Time feature fail #1: Date filter

Filter by a single date (start and end on the same day), and as a result you’ll see the time tracked for the day before that. Very “useful” when you are halfway through a report in a spreadsheet and realize that the dates are wrong.

So, to check what you did today so far, you have to set dates in the future (so called “tomorrow”). If you set the dates wrong, you get this self-explanatory error message:

assembla wrong date message

Time feature fail #2: edit a time entry

Filter your time entries, then edit or delete a time entry — and voila, your filter is lost! Nice how this application “cares” about your time.

The main question I would ask the Assembla dev team: do you guys use your own tool for project management?

Domain Driven Design as billiards game

DDD, development, fun No Comments »

I am becoming a fan of Domain Driven Design (DDD).

Here is a fun idea of how DDD could look if you come to a pub to play a game of billiards.

You are asked which game exactly you are going to play, so you say “pool”. As a result you get a pool table and a pool rules agreement. This is an abstract factory.

The balls are numbered and colored. The black ball has custom behavior. Since they all have an identity, they are entities.

As for the billiard cue, it’s a bit trickier. If the cues have different sizes and you prefer to play with your own one, it’s an entity. However, if you are drunk and don’t care about the stick you are playing with, it’s a value object.

A small piece of chalk is a value object: any one works. The racking triangle is a value object too.

The rules of pool are a domain service.

The kick, the chalk break, the balls setup: all these form an infrastructure service.

A wall shelf for the balls is a repository.
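
For those who prefer code to pubs, here is a tiny sketch of the same analogy in JavaScript. All class and method names are invented for this example; it only shows the difference between an entity (compared by identity), a value object (compared by its attributes) and a repository (the shelf that stores and finds entities).

```javascript
// Entity: a ball is identified by its number, whatever its other attributes are.
class Ball {
  constructor(number, color) {
    this.number = number;
    this.color = color;
  }
  sameAs(other) {
    return other instanceof Ball && other.number === this.number;
  }
}

// Value object: immutable, and any chalk of the same color is as good as another.
class Chalk {
  constructor(color) {
    this.color = color;
    Object.freeze(this);
  }
  equals(other) {
    return other instanceof Chalk && other.color === this.color;
  }
}

// Repository: the wall shelf keeps the balls and hands them out by number.
class BallShelf {
  constructor() {
    this.balls = new Map();
  }
  add(ball) {
    this.balls.set(ball.number, ball);
  }
  findByNumber(number) {
    return this.balls.get(number) || null;
  }
}

const shelf = new BallShelf();
shelf.add(new Ball(8, 'black'));
console.log(shelf.findByNumber(8).color);                     // black
console.log(new Chalk('blue').equals(new Chalk('blue')));     // true
```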

Using non-alphanumeric characters in Sitemap URLs

development, encoding, google, links, php No Comments »

This article in Google Help explains how to deal with special characters in Sitemaps that you can submit to Webmaster tools in order to increase the number of indexed pages of your website.

The main point is: the URLs must contain ASCII symbols only.

It can be done this way:

  • (obvious) ampersand, both quotes and <> symbols must be encoded,
  • Unicode symbols must be encoded, e.g. ü must be converted to the %FC sequence (that is its ISO-8859-1 form; in UTF-8 it becomes %C3%BC),
  • URLs that you submit must follow RFC 3986.

If you use PHP, pay attention to one thing: it seems rawurlencode should be used instead of the usual urlencode, since it follows RFC 3986 as stated in the PHP documentation.
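
Here is a minimal JavaScript sketch of both steps, just to illustrate the idea. The function name toSitemapLoc and the sample URL are made up. It assumes a UTF-8 site and an input URL that is not percent-encoded yet (otherwise the % signs would be encoded twice).

```javascript
// Prepare a URL for a sitemap <loc> element.
function toSitemapLoc(url) {
  // encodeURI percent-encodes non-ASCII characters as UTF-8 and also encodes ", < and >.
  return encodeURI(url)
    // What is left for XML entity escaping: the ampersand and the single quote.
    .replace(/&/g, '&amp;')
    .replace(/'/g, '&apos;');
}

console.log(toSitemapLoc('http://www.example.com/\u00fcmlat.php?q=name&x=1'));
// http://www.example.com/%C3%BCmlat.php?q=name&amp;x=1
```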

Install Sphinx on Mac

development, mac, server, sphinx No Comments »

To install Sphinx search on Mac, I had to find pieces of information here and there.

Here is my list how to sort it out:

1. Download a fresh stable source from Sphinx website to your /tmp folder. It’s a tar.gz file.

2. Go to the /tmp folder and run this command to untar the archive:

tar -zxvf sphinx-2.0.6-release.tar.gz

Adjust the file name since the version can change.

3. Go into the “sphinx-2.0.6-release” folder.

4. Adjust this command according to my comments below and then run it:

./configure --enable-id64  --prefix=/usr/local --with-mysql=/usr/local/mysql-5.1.63-osx10.6-x86/ LDFLAGS="-arch i386"
  • --enable-id64 means you want to support really long (64-bit) integers as document IDs; whether you need it depends on your application
  • --with-mysql: adjust the path so that it points to your MySQL installation

5. Run this command:

sudo make

Check the response, there must be no errors. Mine ends like this:

...[scary looking C commands]...
Making all in test
make[1]: Nothing to be done for `all'.
Making all in doc
All docs are already pre-built by developer.
If you want to rebuild them, install docbook-xsl
and xsltproc and then run 'make docs' instead of simple 'make'.
make[1]: Nothing to be done for `all-am'.

6. Run this:

sudo make install

7. Test if all is OK. If you run this, you must get a list of parameters of this tool:


Good luck!

Fix “Danish to go” player position

development No Comments »

You can always have the player visible in 3 easy steps.

1. Create a bookmark of current page in your browser (Google Chrome in my example, but it doesn’t really matter).
2. Right-click on the bookmark and choose “Edit…” menu item.
3. Replace the bookmark URL with this snippet of code and click Save:

javascript: $('#playerViewer').css('position', 'fixed').css('top', '5px').css('right', '100px');

Looks like this:

Edit the bookmark

So now, when you start listening to audio on the Danish to go page, click the bookmark, and the player will stay in the top right corner of your browser, no matter how fast you scroll. Magic ;]
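
The snippet above relies on jQuery being loaded by the page. In case it is not, here is a sketch of the same trick in plain JavaScript. The function name pinPlayer is made up; the #playerViewer id is taken from the snippet above.

```javascript
// Pin the player to the top right corner of the viewport.
// `doc` is normally `document`; returns false when the player element is not found.
function pinPlayer(doc) {
  const player = doc.querySelector('#playerViewer');
  if (!player) {
    return false;
  }
  player.style.position = 'fixed';
  player.style.top = '5px';
  player.style.right = '100px';
  return true;
}

// As a bookmark it could be wrapped like this (on one line):
// javascript:(function(){ /* pinPlayer body here */ })();
```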

PHPUnit: difference between assertEquals and assertSame

phpunit, TDD No Comments »

It turned out to be easy:

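  • assertEquals: checks by value (a loose comparison, like ==)
  • assertSame: checks identity (same type and value, like ===; for objects it must be the very same instance)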

So here is an example:

  $objectA = new StdClass;
  $objectB = new StdClass;
  $objectA->value = 42;
  $objectB->value = 42;
  $container = array($objectA);

  $this->assertEquals($objectA, $objectB); // passes: same class and property values
  $this->assertSame($objectA, $objectB); // fails: two different instances

  $this->assertEquals($objectA, $container[0]); // passes
  $this->assertSame($objectA, $container[0]); // passes: the very same instance

Problems with download in InternetExplorer

development, php No Comments »

Don’t forget to include this simple header, otherwise Internet Explorer users over SSL will get a weird error message that the download cannot be done:

header('Pragma: public');

It’s better to keep the filename nice and simple, avoid slashes in it and not quote it, again for the sake of IE users.

The best way to go is just to copy the list of headers for a file download from the PHP documentation page.

Delete Your Code #8: Use the default values

db, delete your code, development, mysql No Comments »

Default values — how trivial this hint might seem…

Still, simple things make life easier. If you have a huge number of SQL updates in your system, having correct default values helps to build smaller queries.

Also, when playing with data in your PhpMyAdmin, you can afford to focus on what is really important, not on repeating the boring stuff. Every time you have a big table and need to insert a couple of new records manually, you or a peer developer will be thankful that the majority of the fields can simply be omitted and still get a proper and meaningful value.

It can be zero for numeric types, the current date and time for a timestamp, the most frequent and meaningful value for an ENUM field, or a “magic constant” default value if you have to use one in your application.

Email address RFC

development, fun No Comments »

It was entertaining to dig into the email address format while working on a related task.

Here are a few things that I didn’t expect to be allowed in an email address:

  • the local part of an email address can contain spaces, but then it must be quoted and each space escaped by a backslash like “\ ”
  • the local part of an email address can contain comments! A comment is put in parentheses and can be omitted. Example: “john(comment)” equals to “
  • the domain part can be an IP address instead of a domain name. To do that, it must be put in square brackets like “john@[]”

Here are examples of VALID email addresses:

  • '@[]
  • user@[IPv6:2001:db8:1ff::a0b:dbd0]
  • "much.more\ unusual"
  • ""
  • "very.(),:;<>[]\".VERY.\"very@\\\ \"very\".unusual"
  • 0@a
  • !#$%&'*+-/=?^_`{}|
  • "()<>[]:;@,\\\"!#$%&'*+-/=?^_`{}|\ \ \ \ \ ~\ \ \ \ \ \ \ ?\ \ \ \ \ \ \ \ \ \ \ \ ^_`{}|~.a"
  • ""
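
One practical consequence: since a quoted local part may contain “@”, you cannot split an address at the first “@”. Here is a small JavaScript sketch that splits at the last one instead. The function name and the example.com addresses are made up, and it is not a validator: comments and exotic domain literals are not handled.

```javascript
// Split an address into local part and domain at the LAST "@".
// Returns null when either side would be empty.
function splitAddress(address) {
  const at = address.lastIndexOf('@');
  if (at < 1 || at === address.length - 1) {
    return null;
  }
  return { local: address.slice(0, at), domain: address.slice(at + 1) };
}

console.log(splitAddress('"very@unusual"@example.com'));
console.log(splitAddress('user@[IPv6:2001:db8:1ff::a0b:dbd0]'));
```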