Saturday, July 26, 2014

Quick and easy "software diversification"

The Economist published an article, "Divided we stand", about the work that Full Professor Michael Franz did at the University of California, Irvine to further secure software applications.

The gist of the technique is to compile the same source code into many variant binaries that perform the same function, but differ structurally, at the machine instruction level. For example: the sequence MOV EAX, EBX; XOR ECX, EDX can be rearranged into XOR ECX, EDX; MOV EAX, EBX, or made into MOV EAX, EBX; NOP; XOR ECX, EDX without affecting any functionality of the sequence. The team modified compilers (both LLVM clang and GCC) to automatically (and deterministically) introduce diversity (randomness) in the instruction scheduler. As such, exploit writers will have a harder time targeting all variants.

This immediately reminds me of a simple trick I used many years ago to achieve a similar effect: randomizing the linking order of object files. If you have the object files main.o, strcpy.o, and puts.o, you can create 6 (3 factorial) variants by linking them in different orders:

  1. main strcpy puts
  2. main puts strcpy
  3. strcpy main puts
  4. strcpy puts main
  5. puts main strcpy
  6. puts strcpy main

Here is a smaller example with two object files, f1.o and main.o:

$ gcc -o t1 f1.o main.o
$ gcc -o t2 main.o f1.o
$ nm t1
0000000100001040 S _NXArgc
0000000100001048 S _NXArgv
0000000100001058 S ___progname
0000000100000000 T __mh_execute_header
0000000100001050 S _environ
                 U _exit
0000000100000ec0 T _f1
0000000100000ed0 T _main
0000000100001000 s _pvars
                 U dyld_stub_binder
0000000100000e80 T start
$ nm t2
0000000100001040 S _NXArgc
0000000100001048 S _NXArgv
0000000100001058 S ___progname
0000000100000000 T __mh_execute_header
0000000100001050 S _environ
                 U _exit
0000000100000ef0 T _f1
0000000100000ec0 T _main
0000000100001000 s _pvars
                 U dyld_stub_binder
0000000100000e80 T start

In the first variant, f1 is at ~EC0 and main is at ~ED0. In the second variant, f1 is at ~EF0 and main is at ~EC0. There is a clear difference in the structure of the binaries but no functionality is affected.
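
The comparison above can also be automated. The following is a minimal sketch, not part of the original workflow: the helper names parse_nm and moved_symbols are made up for illustration, and the sample text is a subset of the nm listings shown earlier.

```python
def parse_nm(output):
    """Map symbol name -> address for every defined symbol in nm output."""
    symbols = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # undefined symbols ("U _exit") carry no address
        address, _kind, name = fields
        try:
            symbols[name] = int(address, 16)
        except ValueError:
            continue  # not an "address type name" line
    return symbols


def moved_symbols(first, second):
    """Return {name: (address_in_first, address_in_second)} for moved symbols."""
    return {
        name: (first[name], second[name])
        for name in first.keys() & second.keys()
        if first[name] != second[name]
    }


# Subset of the nm output for t1 and t2 shown above.
T1 = """\
                 U _exit
0000000100000ec0 T _f1
0000000100000ed0 T _main
0000000100000e80 T start
"""
T2 = """\
                 U _exit
0000000100000ef0 T _f1
0000000100000ec0 T _main
0000000100000e80 T start
"""

if __name__ == "__main__":
    moved = moved_symbols(parse_nm(T1), parse_nm(T2))
    for name, (before, after) in sorted(moved.items()):
        print(f"{name}: {before:#x} -> {after:#x}")
```

Only _f1 and _main are reported as moved; start and the other symbols keep their addresses.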

This trick is performed at the final stage (linking) in the whole build process. Therefore, intermediate object files can be reused without recompilation. Furthermore, no source code is required for this "diversification" process to happen.
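
Generating the variants is easy to script. Below is a minimal sketch, assuming gcc is the link driver and the outputs are named t1, t2, and so on; the function name link_commands and its parameters are hypothetical, and the sketch only builds the command lines without running them.

```python
import itertools
import shlex


def link_commands(objects, prefix="t", linker="gcc"):
    """Return one link command (as an argument list) per ordering of objects."""
    commands = []
    orderings = itertools.permutations(objects)
    for index, ordering in enumerate(orderings, start=1):
        commands.append([linker, "-o", f"{prefix}{index}", *ordering])
    return commands


if __name__ == "__main__":
    # Prints six commands, one per permutation of the three object files.
    for argv in link_commands(["main.o", "strcpy.o", "puts.o"]):
        print(shlex.join(argv))
```

For a large number of object files the factorial growth makes full enumeration impractical, so picking a few random orderings (for example with random.sample or random.shuffle) would be the more realistic choice.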

The clear tradeoff is granularity: reordering whole object files diversifies a binary much more coarsely than reordering individual instructions. In the context of Prof. Franz's work, which is mainly a defense against ROP exploits, I'll happily ignore that difference.

Oh, by the way, I did not use this trick to "secure" the application. It seems like the wrong tool for that purpose because of the distribution and debugging problems it creates.

Friday, March 7, 2014

Functor optimization

I have this piece of code that can be compiled with -On (n > 0) but cannot be compiled with -O0.

#include <iostream>

class Functor1 {
public:
    void operator()() const {
        std::cout << "Functor 1" << "\n";
    }
};

class Functor2 {
public:
    void operator()() const {
        std::cout << "Functor 2" << this << "\n";
    }
};

template <typename FunctorType>
class TemplateWithStaticMember {
public:
    TemplateWithStaticMember() {
        functor_();  // the constructor uses the static member
    }
    static const FunctorType functor_;  // THIS LINE!!!
};

/* Incomplete fix:
template <typename FunctorType>
const FunctorType TemplateWithStaticMember<FunctorType>::functor_; */

int main(int argc, char* argv[]) {
    TemplateWithStaticMember<Functor1> f1;
    // TemplateWithStaticMember<Functor2> f2;
}

Under GCC 4.8, when compiled with -O0, we get this linker error:

/tmp/ccFIY33S.o: In function `TemplateWithStaticMember<Functor1>::TemplateWithStaticMember()':
main.cpp:(.text._ZN24TemplateWithStaticMemberI8Functor1EC2Ev[_ZN24TemplateWithStaticMemberI8Functor1EC5Ev]+0xd): undefined reference to `TemplateWithStaticMember<Functor1>::functor_'
collect2: error: ld returned 1 exit status

At other optimization levels, the code can be compiled and executed just fine.

If we uncomment the second functor, the build always fails, regardless of the optimization level.

The reason is that our template declares a static constant member functor_ (at the line marked THIS LINE!!!) but never defines it. At higher optimization levels, the compiler sees that we only use the functor object to call a function, so it inlines the call and optimizes the functor object away. Without optimization, the generated code still refers to functor_, and the linker fails to find a definition for it.

When we use f2, its functor_'s operator() refers back to itself via this. That requires the functor object to actually be allocated. But because we have not defined any such object, the link fails at every optimization level.

I find this piece of code interesting because it is usually higher (not lower) optimization levels that make code fail. For example, Prof. John Regehr has blogged about undefined behavior under optimizations, and the STACK team at MIT published a paper about optimization-safe code.

Wednesday, December 11, 2013

BrowsePass as a Chrome app

I am pleased to announce the availability of the BrowsePass Chrome app.

This is an easier way to set BrowsePass up on the Google Chrome browser (as well as its open source cousin Chromium). The Chrome app removes the most cumbersome step in setting up BrowsePass: finding a web host for the source code. Everything else remains the same, including the convenience of opening your password database with a browser.

Though there is not yet an extension/add-on/app for other browsers (such as Firefox, Safari, Opera), they can still run BrowsePass normally.

As usual, the source code (including the Chrome app) is at BitBucket.

Tuesday, December 10, 2013

Chainload from GRUB to Chameleon boot loader

A quick note to myself. Assuming OS X is installed in the 2nd partition of the 1st hard disk, modify your /etc/grub.d/40_custom file to have:

menuentry "OS X" {
    insmod hfsplus
    set root="(hd0,gpt2)"
    chainloader /usr/standalone/i386/boot0
}

Then remember to run update-grub2.

The file /usr/standalone/i386/boot0 can be found in your hackintosh. It is the boot code from Chameleon. If you do not find this file, grab it from the Chameleon package.

Wednesday, November 27, 2013

Language definition file vietnam.ldf not found

Just a quick note to myself. VnTeX recently moved its vietnam.ldf file to babel's contrib as vietnamese.dtx. The move happened on April 14, 2013. The babel package in MiKTeX is currently dated March 23, 2013. The vntex package in MiKTeX is currently dated May 21, 2013. That is to say, the vntex package no longer provides vietnam.ldf, yet the babel package is still not up-to-date enough to have vietnamese.dtx.

The fix is to maintain your own vietnam.ldf file. The code (that was taken before the move) is pasted below.

% Copyright 2000-2005 Werner Lemberg .
% This file is part of vntex.  License: LPPL, version 1.3 or newer,
% according to
% vietnam.ldf
% written by Werner LEMBERG 
% History
%   1.0  2000/09/01
%     First version.
%   1.1  2001/05/26
%     Moved \endlinechar downwards.
%   post 1.1  ?
%     Don't check for dblaccnt.sty.
%     Add support for ucs.sty.
%     Don't define \captionsvietnam but load vncaps.tex.
%   1.2  2005/04/21
%     Add copyright message.
%     Minor clean-ups.

    [2005/04/21 v1.2 Vietnamese support from the babel system]


\ifx\l@vietnam \@undefined
  \adddialect\l@vietnam 0


  {\message{Loading definitions for the Vietnamese font encoding}}
  {\errhelp{I can't find the file `t5enc.def' for Vietnamese fonts}
   \errmessage{Since I do not know what the T5 encoding means^^J
               I can't typeset Vietnamese.^^J
               I stop here, while you get a suitable `t5enc.def' file}

  {\PackageWarning{babel}{No input encoding specified for Vietnamese}}

\endlinechar \m@ne

  \ifx \UnicodeCharFilter \@undefined
%   \UCSProtectionUnichar

\let\viet \viettext





\endlinechar `\^^M


% end of vietnam.ldf

Sunday, September 15, 2013

Introducing BrowsePass, a KeePass on the web

Update: There is now a Chrome app. If you use the app, you do not need to carry out steps 5 to 7 below.

Update: Please file bugs and feature requests at the project's page.

This post was initially intended to be named "How to have your own open source XXX" where XXX is your favorite commercial online password manager. Then I realized that would have been too misleading because I don't know jack about them. I have never used them. How can you trust closed source password management software?

In contrast, BrowsePass is a GPL licensed JavaScript application (and library) to open KeePass password databases in (modern) browsers. It is great when you're on the move and don't bring KeePass with you or cannot install it on your machine but you need to access a password in your vault.

Setting up BrowsePass is as easy as uploading all the files to your web hosting provider. I recommend MyDrive because it offers WebDAV and serves files with sane MIME types instead of forcing you to download them. You can use MyDrive as your HTTPS web host, as well as the remote storage for KeePass.

BrowsePass can read a local database file dropped into the page as well as a remote database file. In the second case, the browser's same-origin policy might prevent the remote file from loading, unless the remote server has enabled Cross Origin Resource Sharing (CORS).
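
If you host the database on a server you control, enabling CORS comes down to sending an Access-Control-Allow-Origin response header. Here is a minimal sketch using Python's standard http.server, assuming a plain static file server is enough for your setup. The class name CORSRequestHandler, the allowed_origin attribute, and the origin https://browsepass.example are all made up for illustration; this says nothing about how MyDrive or any other host is configured.

```python
import http.server


class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that adds a CORS header to every response."""

    # Hypothetical origin where the BrowsePass page is served from.
    allowed_origin = "https://browsepass.example"

    def end_headers(self):
        # Let pages from allowed_origin read files served by this handler.
        self.send_header("Access-Control-Allow-Origin", self.allowed_origin)
        super().end_headers()


def make_server(port=0):
    """Create a server on localhost; port 0 lets the OS pick a free port."""
    address = ("127.0.0.1", port)
    return http.server.ThreadingHTTPServer(address, CORSRequestHandler)


if __name__ == "__main__":
    server = make_server(8000)
    print("Serving the current directory on port", server.server_address[1])
    server.serve_forever()
```

Restricting the header to a single origin, rather than using "*", keeps other sites from fetching the database file through a visitor's browser.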

In summary, here are the steps to set BrowsePass up for reading, with KeePass for writing.

  1. Register an account with MyDrive.
  2. Create a database with KeePass.
  3. Upload your database to MyDrive. Let's say you have uploaded it as personal.kdbx.
  4. In KeePass, open URL From now on, whenever you save, your file on MyDrive will be updated.
  5. Download the latest version of BrowsePass.
  6. Unpack and upload all the files to MyDrive.
  7. Browse to
  8. You can either drag and drop a local KeePass database file to the second box, or type in a URL to your database in the first box. The URL can be either a relative one such as personal.kdbx or an absolute one such as

Here are some mandatory screenshots. I hope you'll like it.

How to read .NET GZipStream with Python zlib

Just a quick post, sort of a note to myself. The code below can be used to inflate (unzip, decompress) the byte string that was created by .NET GZipStream.

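import zlib

data = b'abcdef'  # placeholder: replace with the byte string written by GZipStream
data = data[10:]  # skip the first 10 bytes, the .gz file header
# Negative wbits selects a raw deflate stream (no zlib or gzip header).
decompressed_data = zlib.decompress(data, -zlib.MAX_WBITS)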
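
Skipping a fixed 10 bytes assumes the gzip header carries no optional fields (file name, comment, extra data). An alternative sketch, not from the original note, is to let zlib parse the gzip wrapper itself by adding 16 to the window bits. The function name inflate_gzip is made up, and the sample input is produced with Python's gzip module as a stand-in for real GZipStream output.

```python
import gzip
import zlib


def inflate_gzip(data: bytes) -> bytes:
    """Decompress a gzip stream, letting zlib handle the header and trailer."""
    # 16 + MAX_WBITS tells zlib to expect a gzip wrapper around the data.
    return zlib.decompress(data, 16 + zlib.MAX_WBITS)


if __name__ == "__main__":
    # Stand-in input; real data would come from .NET GZipStream.
    sample = gzip.compress(b"hello from a gzip stream")
    print(inflate_gzip(sample))
```

With this variant the header length no longer matters, so no slicing is needed.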