Friday, September 30, 2011

Annoying select.error

I needed to move from Gevent back to wsgiref. The change was quite minimal thanks to the WSGI abstraction.

But to my surprise, and much to my dismay, the server kept crashing with select.error on CentOS 5.6. I believe it exhibits this behavior on other POSIX-compliant OSes too. Some searching brought me to an issue filed back in February 2010.

Among the messages, Antoine Pitrou said select was wrapped in a try/finally block in Python 2.6, presumably to support signals and interrupted system calls. Why, then, do I still see the interrupted system call error?

In the end, I had to put a try/except clause, similar to what was suggested in the second message (copied from Twisted), around serve_forever. The code looks quite funny with a while True loop wrapped around serve_forever. You get what I mean?
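Roughly, the wrapper looks like the sketch below. This is a minimal illustration, not the exact code from the issue or from Twisted. The helper name serve_until_done is made up for this post, and server stands for any object with a serve_forever method, such as the one returned by wsgiref.simple_server.make_server.

```python
import errno
import select


def serve_until_done(server):
    """Call server.serve_forever(), restarting it when select() is interrupted.

    `server` is assumed to be any object exposing serve_forever().
    """
    while True:
        try:
            server.serve_forever()
            return  # serve_forever() returned normally, so we are done
        except select.error as e:
            # The first element of args is the errno value.
            if e.args[0] != errno.EINTR:
                raise  # a real error, not an interrupted system call
            # EINTR: a signal interrupted select(); loop and serve again
```

Only EINTR is swallowed; every other error still propagates, so real failures are not hidden by the loop.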

Thursday, September 29, 2011

My first PyPy try-out

My first PyPy try-out was not so smooth.

I have a CentOS 5.6 64-bit server. Of course the OS comes with Python 64-bit. I, however, wanted to run some 32-bit modules so I compiled Python 2.6 32-bit on my own.

I grabbed the PyPy 1.6 source and tried to translate it with the 32-bit Python installation. After about 12 hours, I had to hold Ctrl and type C to kill the process. But it wasn't terminated! A kill -9, with extreme prejudice, was required.

The source installation was a no-go for me so I tried with a pre-built package. It didn't work out of the box on CentOS due to different library naming. Things such as libssl, libcrypto, libbz2 needed new symlinks. With those shiny new symlinks, pypy ran breezily through simple interactive tests.

Then I tried running Hsalf with PyPy. And it crashed! A new bug was filed. My initial investigation showed that PyPy missed an opcode in its translation process. This opcode is used in Python Imaging Library to bridge between its Python module and its C extension.

Now, I'm waiting for a new PyPy build with that opcode plugged. A cursory look at the PyPy source code shows much promise. I surely hope it delivers.

Wednesday, September 28, 2011

Intel AMT KVM and free viewer

Intel's latest chips (most Core i5 and Core i7 chips) support remote KVM functionality based on the VNC protocol. Basically, the chip runs an embedded VNC server that can talk out-of-band, over an optional TLS layer, with a supported VNC client.

The problem is that supported VNC clients are usually costly. The one that is often recommended is RealVNC Viewer Plus, which costs $100 a license.

Intel, however, provides one free viewer in their AMT Software Development Kit. It is the Integrated Viewer Application sample.

Tuesday, September 27, 2011

Efika MX and H264 video tag

Just a quick post to say that I have managed to get Chromium to play an HTML5 video tag with an H264 source.

The trick is to install extra codecs for Chromium.

apt-get install chromium-codecs-ffmpeg-extra

If you do that, it will ask for your permission to remove the chromium-codecs-ffmpeg package. It's okay to do so. That's all, folks.

Monday, September 26, 2011

Extracting Screen Video with Hsalf

The Hsalf repository has a new utility script. You can use this script to extract Screen Video frames to separate image files or to dump raw RGB values to stdout.

First, you need to list out all Screen Video streams: -i screen.swf
Found Screen Video stream ID 2

Then, extract that stream (2): -i screen.swf -s 2 -o output_name

By default, the output format is PNG.

You will have many files whose names are output_name concatenated with a number and a proper extension (.png), such as output_name00001.png.

You can change the output format with -f switch. Use jpg if you want to write JPEG files, or rgb if you want to dump raw RGB data. -i screen.swf -s 2 -o output_name -f jpg

If the output file name is a lone hyphen (-), stdout will be used. This is useful if you want to pipe the pictures to video encoders such as FFmpeg or MEncoder. -i screen.swf -s 2 -o - -f rgb | ffmpeg -r 24 -s 640x480 -f rawvideo -pix_fmt rgb24 -i - screen.mp4

The above command dumps Screen Video frames as raw RGB data and sends the output through a pipe to FFmpeg to produce an MPEG-4 video file.
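If you would rather drive FFmpeg from Python than from the shell, the same options can be assembled as an argument list. The sketch below is only an illustration: both helper names are hypothetical, the FFmpeg options are the ones from the command above, and the frame size relies on rgb24 using three bytes per pixel.

```python
def rgb24_frame_size(width, height):
    """Bytes in one raw rgb24 frame: three bytes (R, G, B) per pixel."""
    return width * height * 3


def ffmpeg_rawvideo_args(width, height, fps, output):
    """Build an FFmpeg argument list that reads raw rgb24 frames from stdin."""
    return [
        'ffmpeg',
        '-r', str(fps),                      # input frame rate
        '-s', '%dx%d' % (width, height),     # frame dimensions
        '-f', 'rawvideo',                    # no container, just pixels
        '-pix_fmt', 'rgb24',
        '-i', '-',                           # read from stdin
        output,
    ]
```

The list can be handed to subprocess.Popen with stdin=subprocess.PIPE, and each frame written to the pipe must be exactly rgb24_frame_size(width, height) bytes long.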

Friday, September 23, 2011

Light is no longer impassable

Scientists at CERN have found that neutrinos may travel faster than light in a vacuum. If confirmed, this breaks one of the pillars of modern physics!

This is serious. Too serious. We will have to wait for confirmation from other scientists.

Wednesday, September 21, 2011

Anti-virus and Windows 8

As anyone could have guessed, Windows 8 promises to include Microsoft Security Essentials out of the box. This is a major step in the right direction for Microsoft because Windows is still the most popular target for viruses. This move will certainly help curb most malicious code from infecting Joe's PC.

Also, it raises the bar for other AV vendors. The question is: can your products beat something free, something usable, something common, something built-in and enabled by default? I reckon some AV vendors will have to move up the ladder and leave this ground to Microsoft. Maybe they'll shift their focus to "managing" or cooperating with Security Essentials instead of competing with it.

We shall see.

Tuesday, September 20, 2011

Fansipan & Python 3

Some inconsiderate soul pasted the damn sticker on this landmark. And some inconsiderate soul put this metal object in this peaceful nature. Could we not use some wood, or rock? Duh!

Python rocks, though.

Monday, September 19, 2011

Extracting MP3 from SWF with Hsalf

With revision 13a5495aa50b, Hsalf provides a tool to extract MP3 from SWF.

The gist of this tool is these few simple lines:

from StringIO import StringIO  # Python 2

# open swf file
fi = swf.SwfFile(inp)
# open output file
fo = open(outp, 'wb')
# iterate through all tags
for tag in fi.iter_body():
  # is this a SoundStreamBlock?
  if isinstance(tag, swf.SoundStreamBlockTag):
    # strip the Flash-related headers from the tag data
    data = StringIO(tag.sound_data)
    mp3 = swf.Mp3StreamSoundData().deserialize(data)
    data = mp3.sound_data.frames
    # write out the pure MP3 frames (assumes frames is a byte string)
    fo.write(data)
# done, close all files
fo.close()

Basically, we find all SoundStreamBlock tags. These tags hold the data of embedded sound streams. With each tag, we take the data, strip off the Flash-related headers, and write out pure MP3 frames to the output file.

That's quite simple, yea?

Friday, September 16, 2011

Is Flash dying?

With Windows 8 going as HTML5-only as possible, will Flash lose more ground?

Since Microsoft cannot compete against Adobe with Silverlight, they are making the right move by dropping both Flash and Silverlight and opting for HTML5.

Luckily, I've been working on Hsalf, which will hopefully make my transition from Flash to HTML5 less troublesome. I already have some ideas on how to play video with plain JavaScript. Maybe one day I'll make a demo.

Thursday, September 15, 2011

Game of Life Contest

The Python for Vietnamese Group is holding a contest taking Game of Life as its theme.

The best performing code will walk away with a prize equivalent to SGD 250!

Come join the fun!

Wednesday, September 14, 2011


It is truly sad to read this.

Granted, the press can exaggerate, suppress, or generally distort the truth, but this incident really is a case of contemptible indifference.

This article reminds me of a story in Superfreakonomics.

If only everyone had the same sense of responsibility toward the people around them as they have toward themselves.

Tuesday, September 13, 2011

Python automatic int to long conversion

The language says that if a value fits within an int, it should be an int; otherwise, it is automatically promoted to a long.

Let's take a look at these examples on a 32-bit CPython. They could make great interview questions ;):

>>> type(2**31)
<type 'long'>
>>> type(-2**31)
<type 'long'>
>>> type((-2)**31)
<type 'int'>

The first value is obviously out of range: two to the power of thirty-one is one greater than what can be represented by a 32-bit int. The second one seems to be an int at first, but operator precedence requires the power to be evaluated first, so we get the negation of a long, which is still a long. The final value is within bounds.

Just a side note: I wanted to get the minimum 32-bit integer value without using sys.maxint (because it may be different on a 64-bit interpreter) while working on Hsalf. And these were the experiments ;).
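One way to spell those bounds that does not depend on the interpreter's word size is sketched below. The constant and function names are made up for illustration and are not taken from Hsalf.

```python
import struct

# Written with explicit parentheses so precedence cannot bite.
INT32_MIN = -(1 << 31)      # -2147483648
INT32_MAX = (1 << 31) - 1   # 2147483647


def fits_int32(value):
    """Return True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def pack_int32(value):
    """Pack value as a little-endian signed 32-bit integer."""
    if not fits_int32(value):
        raise ValueError('%r does not fit in 32 bits' % (value,))
    return struct.pack('<i', value)
```

The numeric values are the same whichever type the interpreter picks for them, so the range check works on both 32-bit and 64-bit builds.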

Monday, September 12, 2011

Introducing Hsalf, the reversed Flash

I am delighted to introduce a Python library to read and write Flash file formats: Hsalf.

This library is the result of my attempt at extracting Screen Video from an SWF file. Naturally, it supports the SWF format and is capable of dumping embedded video frames :-). More complete tag support will be added eventually.

Friday, September 9, 2011

Game of Life

I was asked an interesting question today: what do I understand about John Conway's Game of Life?

Game of Life is the name of a board simulation. The board is a matrix of NxM cells, each of which can be either on or off. Initially, some cells on the board are on and the rest are off. Each simulation step produces a new board state. The rules are:
  1. A cell's neighbors are the cells surrounding it. There are at most eight of them.
  2. If a cell is surrounded by exactly three cells that are on, it will be turned on.
  3. If a cell is surrounded by exactly two cells that are on, it will remain what it is currently.
  4. Otherwise, the cell will be turned off.

Eric Raymond uses a pattern of this game called a Glider as the Hacker Emblem.

My answer was that this simulation was truly Life. When you live in an overcrowded place (more than 3), you get crushed and you die. When you live in a deserted place (fewer than 2), you also die. Only when you are surrounded by enough people (2 or 3) do you remain unchanged. And when the community is just right (3), new life is born. Generation after generation. Things come and go.

How about you? What do you feel about the simulation?
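For the curious, the four rules above translate almost line for line into code. This is a minimal sketch, assuming the board is a list of lists holding 0 (off) and 1 (on) and that cells beyond the edge count as off; the function name step is my own choice.

```python
def step(board):
    """Return the next board state for a list-of-lists board of 0/1 cells."""
    rows, cols = len(board), len(board[0])
    new_board = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Rule 1: count the (at most eight) neighbors that are on.
            live = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the cell itself
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        live += board[nr][nc]
            if live == 3:
                new_board[r][c] = 1            # Rule 2: turned on
            elif live == 2:
                new_board[r][c] = board[r][c]  # Rule 3: stays as it is
            # Rule 4: otherwise the cell is off (already 0)
    return new_board
```

A row of three live cells on a 3x3 board flips between horizontal and vertical on every call, which makes a handy sanity check.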

Thursday, September 8, 2011

A fail case of optimization

I had a piece of code like this:

if (is_enemy(side, new_r, new_c)) {
    if (tolower(board[new_r][new_c]) == 'n') {
        return true;
    }
}

I optimized away the second if statement like this:

if (is_enemy(side, new_r, new_c)) {
    return tolower(board[new_r][new_c]) == 'n';
}

On the surface, this optimized version seems equivalent to the original version. Integrated into the bigger picture, however, it fails terribly. The original code sits inside a loop like this:

for (int i = 0; i < 8; ++i) {
    if (is_enemy(side, new_r, new_c)) {
        if (tolower(board[new_r][new_c]) == 'n') {
            return true;
        }
    }
}

The optimized version returns as soon as it meets the first enemy piece, even when that piece is not an 'n', so the remaining iterations never get a chance to run.
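The difference is easy to reproduce in a few lines of Python. This is a simplified sketch, not the original program: each candidate square is reduced to a (piece, is_enemy) pair, and both function names are made up.

```python
def finds_enemy_n(squares):
    """Original logic: keep scanning until an enemy 'n' is found."""
    for piece, enemy in squares:
        if enemy:
            if piece.lower() == 'n':
                return True
    return False


def finds_enemy_n_broken(squares):
    """'Optimized' logic: returns on the first enemy piece, 'n' or not."""
    for piece, enemy in squares:
        if enemy:
            return piece.lower() == 'n'
    return False
```

With an enemy 'p' on the first square and an enemy 'n' on the second, the original version finds the 'n' while the broken version gives up at the 'p' and reports nothing.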

Wednesday, September 7, 2011

Screen Video in Flash SWF

I had a chance to play with Flash two days ago. I wanted to extract a frame from an SWF file, version 7.

[03c]        10 DEFINEVIDEOSTREAM defines id 0001 (79 frames, 466x311 codec 0x03)
                -=> 01 00 4f 00 d2 01 37 01 00 03
[01a]         6 PLACEOBJECT2 places id 0001 at depth 0001
                -=> 06 01 00 01 00 00
[03d]    182562 VIDEOFRAME adds information to id 0001 (frame 0) 352x288 P-frame deblock 1  quant: 14 
                -=> 01 00 00 00 13 31 d2 31 37 0f ee 78 da d5 5a 59
                -=> 93 db c6 11 9e 1e 80 dc 55 25 cf a9 24 92 ac d5
                -=> 1e bc 00 82 20 0e 72 57 eb 54 1e 52 f9 ff bf 20

Dumping with swfdump -d produced something like the above.

After skimming through Adobe's SWF File Format Specification version 10, I got the meanings of two related tags DEFINEVIDEOSTREAM and VIDEOFRAME.

DEFINEVIDEOSTREAM says I am going to create a video from embedded data. The codec that I am going to use is 0x03, which is the identifier for Screen Video codec.

VIDEOFRAME augments the other tag with actual data of a frame. This is what I am going after.

The specification says VIDEOFRAME consists of the tag header, two bytes to identify the stream that was defined with DEFINEVIDEOSTREAM, two bytes to identify the frame in that stream, and then the payload. So, with the above dump, the first two bytes (01 00) identify stream 0001, the next two (00 00) identify frame 0 in this stream, and the rest, from 13 31 d2 31 37 onward, is the payload. The payload is to be interpreted according to the codec that was defined in DEFINEVIDEOSTREAM.
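In code, that layout amounts to the short sketch below. It assumes the tag header has already been stripped, so body starts at the stream ID. The function name split_videoframe is hypothetical and is not part of Hsalf. SWF stores multi-byte integers in little-endian order, hence the '<HH' format.

```python
import struct


def split_videoframe(body):
    """Split a VIDEOFRAME tag body into (stream_id, frame_num, payload).

    `body` is the tag content after the tag header, as a byte string.
    """
    # Two little-endian unsigned 16-bit integers, then the codec payload.
    stream_id, frame_num = struct.unpack('<HH', body[:4])
    return stream_id, frame_num, body[4:]
```

Fed the first nine bytes of the dump above, it returns stream 1, frame 0, and a payload starting with 13 31 d2 31 37.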

I then read up on the Screen Video codec (from page 239). A SCREENVIDEOPACKET consists of 4 bits for block width, 12 bits for image width, 4 bits for block height, 12 bits for image height, and the remaining data for the image blocks. Using that to dissect 13 31, we get a block width of 1, which means the actual block width is 32 pixels, and an image width of 0x331 (817) pixels.

Obviously, that value isn't sound. The movie is not that wide. Since my interpretation followed the spec to the letter, there had to be something wrong with the spec.

So I checked out the SWF File Format version 7, in which the Screen Video codec was introduced, to see if it had a better explanation. Sure enough, it had different descriptions and discrepancies (I'll come back to this later) but nothing related to the problem I was facing. The tag formats stay the same in the two documents.

So I reread and reread, searched and searched on Screen Video, hoping to find some light. And light I found. In the introductory paragraph, the spec says:
In a keyframe, every block is sent. In an interframe, one or more blocks will contain no data...
Ahh, keyframe and interframe. What are they? How to determine if one is a keyframe? A search for keyframe in the v7 spec brought me to VIDEODATA tag. This tag belongs to FLV format, not SWF format. It says that the first nibble is a CodecID, the next nibble is FrameType and then comes the payload which could be SCREENVIDEOPACKET if the CodecID is 3. This information is not mentioned in v10 of the spec.

Applying this interpretation to the five bytes 13 31 d2 31 37 did not exactly yield a desirable result. The first nibble (1) is not a known CodecID. However, switching the order of CodecID and FrameType gave reasonable meaning to these values. FrameType 1 is a keyframe, and CodecID 3 is Screen Video. Then come the block width of 3, image width of 0x1d2 (466), block height of 3, and image height of 0x137 (311). Much more sensible. Continuing with that interpretation, I was able to decode the whole VIDEOFRAME packet.
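Here is that interpretation as a small sketch. The function name and the dictionary keys are my own, not Hsalf's API. The actual block size in pixels is computed as (stored value + 1) * 16, which matches the earlier reading of a stored 1 as 32 pixels.

```python
def parse_screen_video_header(payload):
    """Decode the first five bytes of a Screen Video VIDEOFRAME payload."""
    data = bytearray(payload[:5])
    # Byte 0: FrameType in the high nibble, CodecID in the low nibble.
    frame_type = data[0] >> 4
    codec_id = data[0] & 0x0F
    # Two big-endian 16-bit words: 4 bits of block size, 12 bits of image size.
    width_word = (data[1] << 8) | data[2]
    height_word = (data[3] << 8) | data[4]
    return {
        'frame_type': frame_type,
        'codec_id': codec_id,
        'block_width': ((width_word >> 12) + 1) * 16,
        'image_width': width_word & 0x0FFF,
        'block_height': ((height_word >> 12) + 1) * 16,
        'image_height': height_word & 0x0FFF,
    }
```

For 13 31 d2 31 37 this gives a keyframe (1) of codec 3 with 64x64 blocks and a 466x311 image, which agrees with the dimensions in DEFINEVIDEOSTREAM.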

Ultimately, I needed to produce an image out of these raw data. Here comes the discrepancy between v7 and v10. In v7, the blocks are arranged from top left to bottom right row by row. In each block, pixels are arranged from top left to bottom right row by row. In v10, the blocks are arranged from bottom left to top right row by row. In each block, pixels are arranged from bottom left to top right row by row.

Funnily, I followed a mixed approach at first: blocks were arranged from bottom left to top right, but within each block, pixels were arranged from top left to bottom right. The reason was that I read v10 first, then, while fixing the interpretation above, I switched to v7 and continued with it. So half of the idea came from v10 and the other half came from v7. In the end, the correct arrangement is the one depicted in v10: bottom up, left to right, similar to a BMP file.

Here is the first frame in that SWF file.

First frame extracted from screen.swf
If this whole post is rather too long for you, here are the takeaways:
  1. If CodecID is 3 in DEFINEVIDEOSTREAM, it is Screen Video.
  2. If it is Screen Video, the VIDEOFRAME packet, as documented in the spec, is wrong. In reality, it has two extra nibbles right before the SCREENVIDEOPACKET payload. The first nibble is FrameType (either 1 if this is a keyframe, or 2 if this is an interframe), the second is CodecID (which is 3).
  3. A Screen Video frame is divided into blocks with the first one located at the bottom left of the frame, going left to right, bottom to top.
  4. In each block, pixels are arranged bottom up, left to right.
<sarcastic>So much thanks to Adobe for opening up SWF file format.</sarcastic>
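To make takeaways 3 and 4 concrete, here is a sketch that lists where each block lands in an ordinary top-down image, in the order the blocks appear in the stream. The function name is hypothetical. It also assumes that the partial blocks, when the image size is not a multiple of the block size, sit on the right and top edges, which is how I read the bottom-up layout.

```python
def block_rects(image_w, image_h, block_w, block_h):
    """Return (x, y_top, width, height) for each block, in stream order.

    Coordinates are top-down (y grows downward), as in most image
    libraries, while the blocks themselves are visited bottom row first.
    """
    rects = []
    for y_bottom in range(0, image_h, block_h):        # bottom row first
        h = min(block_h, image_h - y_bottom)           # top row may be partial
        y_top = image_h - y_bottom - h                 # flip to top-down
        for x in range(0, image_w, block_w):           # left to right
            w = min(block_w, image_w - x)              # last column may be partial
            rects.append((x, y_top, w, h))
    return rects
```

For the 466x311 frame with 64x64 blocks, this yields 8 columns by 5 rows, 40 blocks in total. The pixel rows inside each block are stored bottom up as well, so they need the same flip when copied into the image.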

Tuesday, September 6, 2011

Hacked Certificate Authority

DigiNotar, a root CA, was compromised last week. That led to many rogue certificates being issued for respected names such as Google and Yahoo. It is assumed that these certificates were issued for the sake of carrying out man-in-the-middle attacks against these companies. The list of rogue certificates can be found on the Tor Project web site.

Of particular interest among the rogue certificates are two for *.*.com and *.*.org. RFC 2818 does not say if these certificates are valid. Accordingly, the interpretation of whether they are allowed to pass host verification is up to the browsers. Fortunately, most browsers do the logical thing: they require at least two concrete domain name components. This is in alignment with RFC 2109 for cookie management, too.
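The "at least two concrete components" rule can be illustrated with a few lines of Python. This is only a sketch of the idea, not how any particular browser implements host verification, and the function name is made up.

```python
def has_enough_concrete_labels(pattern, minimum=2):
    """Return True if the pattern ends in at least `minimum` wildcard-free labels."""
    labels = pattern.lower().rstrip('.').split('.')
    concrete = 0
    # Walk from the rightmost label and stop at the first wildcard.
    for label in reversed(labels):
        if '*' in label:
            break
        concrete += 1
    return concrete >= minimum
```

Under this rule *.example.com is acceptable, while *.*.com and *.*.org are rejected because only one concrete label follows the wildcards.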

Let's hope all browsers are sane.

Monday, September 5, 2011

__repr__ and unicode

A __repr__ method that returns a Unicode string does not work in Python 2.5, 2.6 and 2.7 when it is called implicitly or via the repr function.

class TestObject(object):
    def __repr__(self):
        return u'\u1234'
t = TestObject()
print (t.__repr__(), )
print (t, ) # same as print (repr(t), )

Running the above code produces

(Traceback (most recent call last):
  File "", line 6, in <module>
    print (t, )
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)

This bug was filed in 2009.

Thursday, September 1, 2011

Blogger has a new face

Sweet. Very Google-Plusy. Sweet.

Please, Google, could you integrate Blogger with Google Plus too?