Friday, December 23, 2011

Pure message passing and fault tolerance

I just finished watching Joe Armstrong's talk on Systems That Never Stop. Around 38:00, he described how the general consensus has shifted:

198x: pure message passing (where all parameters are passed by value) was considered inefficient because one could pass a pointer instead of copying data.
200x: pure message passing is considered efficient because it allows massive parallelization.

That's a 180-degree flip! I think it makes sense. And it is very much in alignment with The Network Is The Computer.
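To make the pointer-versus-copy trade-off concrete, here is a minimal sketch of my own (not from the talk). The two `send_*` functions and the list-based mailboxes are hypothetical stand-ins; a real message-passing runtime such as Erlang's does the copying for you.

```python
import copy

def send_by_reference(mailbox, message):
    # The "198x" optimization: hand over a pointer, no copying.
    mailbox.append(message)

def send_by_value(mailbox, message):
    # Pure message passing: the receiver gets its own private copy.
    mailbox.append(copy.deepcopy(message))

shared_box, copied_box = [], []
order = {"items": ["a", "b"]}

send_by_reference(shared_box, order)
send_by_value(copied_box, order)

# The sender keeps mutating its data after sending.
order["items"].append("c")

shared_view = shared_box[0]["items"]   # receiver sees the sender's later change
copied_view = copied_box[0]["items"]   # receiver's data is unaffected
```

The copy costs time and memory, but afterwards the sender and receiver share nothing, so they can run on different cores or different machines without locks.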

His talk is highly recommended. Here are his six laws for keeping a system from failing, with the most important one bolded:

  1. Isolation: Processes must be totally separated from each other. A failure in one must not affect the others. (This is why pure message passing is required. You definitely don't want to pass a pointer from one process to another.)
  2. Concurrency: Spawn millions of processes!
  3. Failure detection: Failures must be remotely detectable.
  4. Failure identification: And failures must be analyzable, often after the fact, to find root causes.
  5. Live code upgrade: The system must be able to roll forward, or backward, without shutting down.
  6. Stable storage: If you store something, it should be there forever. This implies multiple copies and distribution.
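Laws 1, 3, and 4 can be sketched in a few lines. This is only my illustration, assuming ordinary OS processes as the unit of isolation (Erlang uses much lighter processes). `WORKER_SOURCE` and `run_worker` are hypothetical names: the worker receives its input as text on stdin (so nothing is shared), and the parent detects a crash from the exit code and keeps stderr for later analysis.

```python
import subprocess
import sys

# Hypothetical worker program, run in its own OS process.
WORKER_SOURCE = """
import sys
value = int(sys.stdin.read())          # message arrives by value, as text
if value < 0:
    raise ValueError("negative input") # simulated failure
print(value * 2)
"""

def run_worker(value):
    """Send one message to an isolated worker.

    Returns (exit_code, reply, error_text).
    """
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=str(value),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip(), result.stderr

ok = run_worker(21)       # worker succeeds
failed = run_worker(-1)   # worker crashes; the parent keeps running

# Failure detection: a non-zero exit code, observed from outside the worker.
worker_crashed = failed[0] != 0
# Failure identification: the captured traceback is available after the fact.
crash_report = failed[2]
```

The crash in the second worker never touches the parent's memory; the parent only learns about it through the exit code and the captured error text.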
