Thursday, November 13, 2014

A short joke

Thought of this joke on the way home (based on another religiously-themed version from a colleague at work).

How can computer scientists help NASA find the next habitable planet? They do a star search.

Saturday, July 26, 2014

Quick and easy "software diversification"

The Economist published an article, "Divided we stand", about the work that Full Professor Michael Franz did at the University of California, Irvine to further secure software applications.

The gist of the technique is to compile the same source code into many variant binaries that perform the same function but differ structurally at the machine-instruction level. For example, the sequence MOV EAX, EBX; XOR ECX, EDX can be rearranged into XOR ECX, EDX; MOV EAX, EBX, or made into MOV EAX, EBX; NOP; XOR ECX, EDX without affecting the functionality of the sequence. The team modified compilers (both LLVM clang and GCC) to automatically (and deterministically) introduce diversity (randomness) in the instruction scheduler. As a result, exploit writers have a harder time targeting all variants.
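
To make the NOP variant concrete, here is a toy sketch. It works on a list of instruction strings rather than real machine code, and it is not how the modified compilers are implemented. The helper names insert_nops and strip_nops are made up for illustration, and the fixed seed stands in for the "deterministic randomness" mentioned above.

```cpp
#include <random>
#include <string>
#include <vector>

// Toy model: a "program" is a list of instruction strings, not real machine code.
// Hypothetical helper: after each instruction, insert a NOP with probability 1/2.
// The generator is seeded, so the same seed always reproduces the same variant.
std::vector<std::string> insert_nops(const std::vector<std::string>& program,
                                     unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::string> variant;
    for (const std::string& insn : program) {
        variant.push_back(insn);
        if (rng() % 2 == 0) {
            variant.push_back("NOP");
        }
    }
    return variant;
}

// Hypothetical helper: drop every NOP again, recovering the original sequence.
std::vector<std::string> strip_nops(const std::vector<std::string>& variant) {
    std::vector<std::string> program;
    for (const std::string& insn : variant) {
        if (insn != "NOP") {
            program.push_back(insn);
        }
    }
    return program;
}
```

Calling insert_nops with different seeds gives different layouts of the same instructions, and strip_nops on any of them gives back the original sequence, which is the "same function, different structure" property in miniature.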

This immediately reminds me of a simple trick I used many years ago to achieve a similar effect: randomizing the linking order of object files. Consider this: if you have the object files main.o, strcpy.o, and puts.o, you can create 6 (3 factorial) variants by linking them in different orders:

  1. main strcpy puts
  2. main puts strcpy
  3. strcpy main puts
  4. strcpy puts main
  5. puts main strcpy
  6. puts strcpy main

For example, with just two object files, f1.o and main.o, there are two link orders:

$ gcc -o t1 f1.o main.o
$ gcc -o t2 main.o f1.o
$ nm t1
0000000100001040 S _NXArgc
0000000100001048 S _NXArgv
0000000100001058 S ___progname
0000000100000000 T __mh_execute_header
0000000100001050 S _environ
                 U _exit
0000000100000ec0 T _f1
0000000100000ed0 T _main
0000000100001000 s _pvars
                 U dyld_stub_binder
0000000100000e80 T start
$ nm t2
0000000100001040 S _NXArgc
0000000100001048 S _NXArgv
0000000100001058 S ___progname
0000000100000000 T __mh_execute_header
0000000100001050 S _environ
                 U _exit
0000000100000ef0 T _f1
0000000100000ec0 T _main
0000000100001000 s _pvars
                 U dyld_stub_binder
0000000100000e80 T start

In the first variant, f1 is at ~EC0 and main is at ~ED0. In the second variant, f1 is at ~EF0 and main is at ~EC0. There is a clear difference in the structure of the binaries but no functionality is affected.

This trick is performed at the final stage (linking) in the whole build process. Therefore, intermediate object files can be reused without recompilation. Furthermore, no source code is required for this "diversification" process to happen.
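
Enumerating the link orders is easy to automate. Below is a minimal sketch, assuming the object files are already built and that plain "gcc -o <output> <objects...>" is the whole link command. The function name link_commands and the output prefix are made up for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns one "gcc -o <prefix><k> ..." command line per
// permutation of the given object files. It only builds the strings and does
// not run the linker.
std::vector<std::string> link_commands(std::vector<std::string> objects,
                                       const std::string& prefix) {
    std::vector<std::string> commands;
    // std::next_permutation visits every ordering only if we start from the
    // lexicographically smallest one, so sort first.
    std::sort(objects.begin(), objects.end());
    int variant = 1;
    do {
        std::string cmd = "gcc -o " + prefix + std::to_string(variant++);
        for (const std::string& obj : objects) {
            cmd += " " + obj;
        }
        commands.push_back(cmd);
    } while (std::next_permutation(objects.begin(), objects.end()));
    return commands;
}
```

For main.o, strcpy.o, and puts.o this yields six command lines, one per ordering. Keep in mind that the count grows factorially, so a real build would sample a few random orders rather than enumerate all of them.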

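The clear tradeoff is in the granularity of the diversification: shuffling the link order only moves whole object files around and leaves the instructions inside each one untouched. In the context of Prof. Franz's work, which is mainly a defense against ROP exploits, I'll happily ignore the coarser granularity.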

Oh, by the way, I did not use this trick to "secure" the application. It seems like the wrong tool for that purpose because of the distribution and debugging problems it creates.

Friday, March 7, 2014

Functor optimization

I have this piece of code that can be compiled with -On (n > 0) but cannot be compiled with -O0.

#include <iostream>

class Functor1 {
public:
    void operator()() const {
        std::cout << "Functor 1" << "\n";
    }
};

class Functor2 {
public:
    void operator()() const {
        std::cout << "Functor 2" << this << "\n";
    }
};

template <typename FunctorType>
class TemplateWithStaticMember {
public:
    TemplateWithStaticMember() {
        functor_();
    }
private:
    static const FunctorType functor_;  // THIS LINE!!!
};

/* Incomplete fix:
template <typename FunctorType>
const FunctorType TemplateWithStaticMember<FunctorType>::functor_; */

int main(int argc, char* argv[]) {
    TemplateWithStaticMember<Functor1> f1;
    // TemplateWithStaticMember<Functor2> f2;
}

Under GCC 4.8, when compiled with -O0, we get this error:

/tmp/ccFIY33S.o: In function `TemplateWithStaticMember<Functor1>::TemplateWithStaticMember()':
main.cpp:(.text._ZN24TemplateWithStaticMemberI8Functor1EC2Ev[_ZN24TemplateWithStaticMemberI8Functor1EC5Ev]+0xd): undefined reference to `TemplateWithStaticMember<Functor1>::functor_'
collect2: error: ld returned 1 exit status

At other optimization levels, the code can be compiled and executed just fine.

If we uncomment the second functor, the build always fails, regardless of optimization level.

The reason is that our template declares a static constant member functor_ (at the line marked with THIS LINE!!!) but never defines it. At higher optimization levels, the compiler finds out that we only use the functor object to call a function, so it inlines the function and optimizes away the functor object. Without optimization, the generated code still refers to functor_, and the linker fails to find a definition for it.

When we use f2, Functor2's operator() refers back to the functor object via this. That requires the functor object to actually be allocated. But because we have not defined any such object, the link fails no matter which optimization level we use.

I find this piece of code interesting because usually it is higher (not lower) optimization levels that make code fail. For example, Prof. John Regehr blogged about undefined behavior under optimizations, and the STACK team at MIT published a paper about optimization-safe code.