Saturday, July 26, 2014

Quick and easy "software diversification"

The Economist published an article Divided we stand about the work that Full Professor Michael Franz did at University of California at Irvine to further secure software applications.

The gist of the technique is to compile the same source code into many variant binaries that perform the same function, but differ structurally, at the machine instruction level. For example: the sequence MOV EAX, EBX; XOR ECX, EDX can be rearranged into XOR ECX, EDX; MOV EAX, EBX, or made into MOV EAX, EBX; NOP; XOR ECX, EDX without affecting any functionality of the sequence. The team modified compilers (both LLVM clang and GCC) to automatically (and deterministically) introduce diversity (randomness) in the instruction scheduler. As such, exploit writers will have a harder time targeting all variants.

This immediately reminds me of a simply trick I used many years ago to achieve a similar effect: Randomizing the linking order of object files. Consider this, if you have object main.o, strcpy.o, and puts.o, you can create 6 (3 factorial) variants by linking them in different permutation orders:

  1. main strcpy puts
  2. main puts strcpy
  3. strcpy main puts
  4. strcpy puts main
  5. puts main strcpy
  6. puts strcpy main
$ gcc -o t1 f1.o main.o
$ gcc -o t2 main.o f1.o
$ nm t1
0000000100001040 S _NXArgc
0000000100001048 S _NXArgv
0000000100001058 S ___progname
0000000100000000 T __mh_execute_header
0000000100001050 S _environ
                 U _exit
0000000100000ec0 T _f1
0000000100000ed0 T _main
0000000100001000 s _pvars
                 U dyld_stub_binder
0000000100000e80 T start
$ nm t2
0000000100001040 S _NXArgc
0000000100001048 S _NXArgv
0000000100001058 S ___progname
0000000100000000 T __mh_execute_header
0000000100001050 S _environ
                 U _exit
0000000100000ef0 T _f1
0000000100000ec0 T _main
0000000100001000 s _pvars
                 U dyld_stub_binder
0000000100000e80 T start

In the first variant, f1 is at ~EC0 and main is at ~ED0. In the second variant, f1 is at ~EF0 and main is at ~EC0. There is a clear difference in the structure of the binaries but no functionality is affected.

This trick is performed at the final stage (linking) in the whole build process. Therefore, intermediate object files can be reused without recompilation. Furthermore, no source code is required for this "diversification" process to happen.

Clear tradeoffs are in the granularity of the diversification. In the context of Prof Franz's work, which is mainly in defense against ROP exploit, I'll happily ignore such granularity.

Oh, by the way, I did not use this trick to "secure" the application. It seems like a wrong tool for that purpose due to distribution and debugging problems it creates.