Monday, June 11, 2012

Invalidating cookie-backed session from the server

I joined a really interesting discussion about session management in web applications this week. One of the issues raised was how to invalidate a cookie-backed session. In this post, I am going to explain how I would tackle this problem. All comments are welcome.

Brief context

Every web developer knows that HTTP is a stateless protocol. The server keeps no memory of earlier requests, even when the underlying connection is reused, so each request appears to the web server as a new request.

It is easy, however, to relate requests to each other, essentially establishing a long session spanning many requests, via a mechanism called cookies. A cookie is a piece of information that the web server wants the user agent (the browser) to remember and pass back to the server in subsequent requests. A cookie is therefore set by the server but stored on the client, so the value of a cookie (or the lack thereof) that the server receives cannot be trusted.

To make a session more meaningful, a web application usually associates more information with a session than just its identifier. This associated data is often stored on the server so that, from a small session identifier, the web application can retrieve the larger session data. The browser knows only the session identifier and nothing about the associated data. Session identifiers are generally random and long enough to be hard to guess, and the associated data could be user objects, binary blobs, or almost anything else.

Sometimes a web application embeds the session data together with the session identifier in the cookie itself. This is called a cookie-backed session. The advantage is that no server-side storage is required. The disadvantage is that the server can no longer trust the session data, because it is controlled by the client.

Challenge

To ensure the integrity of a cookie-backed session, the server must have a way to tell whether a session has been invalidated before accepting its data. The challenge is to do exactly that without requiring as much server-side storage as other session back ends.

Solution

And here's a solution that I derived.

The session cookie would have two extra mandatory fields: timestamp and checksum. The timestamp is the moment in time at which the cookie was created or last updated. The checksum is a MAC (message authentication code) computed over the rest of the cookie with a key known only to the server.
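As a minimal sketch of such a cookie, the Python below builds and checks one using the standard hmac module. The field layout (session identifier, timestamp, data, checksum, separated by "|"), the names make_cookie and verify_cookie, the key, and the 30-minute lifetime are all assumptions made for illustration, not part of the scheme itself.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

# Assumptions for illustration only: a placeholder key and a 30-minute lifetime.
SECRET_KEY = b"replace-with-a-random-server-side-key"
COOKIE_LIFETIME = 30 * 60  # seconds


def _mac(payload: str) -> str:
    """HMAC-SHA256 of the payload, hex encoded."""
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_cookie(session_id: str, data: str, now: Optional[float] = None) -> str:
    """Build 'session_id|timestamp|data|checksum'. session_id must not contain '|'."""
    timestamp = int(time.time() if now is None else now)
    payload = f"{session_id}|{timestamp}|{data}"
    return f"{payload}|{_mac(payload)}"


def verify_cookie(cookie: str, now: Optional[float] = None) -> Optional[Tuple[str, int, str]]:
    """Return (session_id, timestamp, data), or None if the cookie is rejected."""
    payload, sep, checksum = cookie.rpartition("|")
    if not sep:
        return None  # no checksum field at all
    # Check the MAC first, in constant time, before trusting any other field.
    if not hmac.compare_digest(checksum.encode("utf-8"), _mac(payload).encode("utf-8")):
        return None
    session_id, timestamp_text, data = payload.split("|", 2)
    timestamp = int(timestamp_text)
    current = time.time() if now is None else now
    if current - timestamp > COOKIE_LIFETIME:
        return None  # older than the cookie lifetime
    return session_id, timestamp, data
```

For example, verify_cookie(make_cookie("abc123", "user=alice")) returns the three fields while the cookie is fresh, and None once the data has been altered or the lifetime has passed. This sketch only authenticates the data; it does not hide it from the client.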

The cookie is considered invalid when any of these conditions is true:
  1. The time elapsed since the timestamp exceeds the lifetime of the cookie.
  2. The MAC is not equal to the calculated value.
  3. The session identifier is in a list of invalidated session identifiers.
So the question boils down to whether I can check the third condition quickly, without requiring too much space or time.

And I can, by following these steps:
  1. I will have a set called invalidated_sessions that stores the session identifiers.
  2. I will have a first in first out queue called logbook whose elements are tuples of (timestamp, session identifier).
  3. When a session is invalidated, a new tuple is appended to the logbook. The tuple's timestamp is the session timestamp plus the cookie lifetime plus a small extra window to guard against race conditions, and its session identifier is that of the session being invalidated. The invalidated_sessions set also records the same session identifier.
  4. Before validating a session, the web application needs to prune expired entries from the logbook and invalidated_sessions. The application peeks at the first element of the queue to see whether its timestamp is older than the current time. If so, the corresponding entry in invalidated_sessions is removed, the element is popped off the queue, and the process is repeated.
  5. After its timestamp and MAC value have been validated, the session is matched against invalidated_sessions. The session is considered invalid if its identifier is in the set.
In short, I am fundamentally implementing a limited memcache in the web application.
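The steps above can be sketched in Python roughly as follows. The class name InvalidationLog, its method names, and the two constants are hypothetical and chosen only for illustration; the sketch also assumes each session identifier is invalidated at most once.

```python
import time
from collections import deque

# Assumptions for illustration only.
COOKIE_LIFETIME = 30 * 60  # seconds
GRACE_WINDOW = 60          # extra window against race conditions, in seconds


class InvalidationLog:
    """In-memory record of invalidated sessions that forgets them once they expire."""

    def __init__(self):
        self.invalidated_sessions = set()
        # FIFO queue of (expiry_timestamp, session_id) tuples.
        self.logbook = deque()

    def invalidate(self, session_id, session_timestamp):
        # Step 3: remember the session until its cookie could no longer be valid.
        expiry = session_timestamp + COOKIE_LIFETIME + GRACE_WINDOW
        self.logbook.append((expiry, session_id))
        self.invalidated_sessions.add(session_id)

    def prune(self, now=None):
        # Step 4: pop entries from the head of the queue while they are expired.
        now = time.time() if now is None else now
        while self.logbook and self.logbook[0][0] < now:
            _, session_id = self.logbook.popleft()
            self.invalidated_sessions.discard(session_id)

    def is_invalidated(self, session_id, now=None):
        # Step 5: call this only after the timestamp and MAC checks have passed.
        self.prune(now)
        return session_id in self.invalidated_sessions
```

Entries are queued in invalidation order rather than expiry order, so an expired entry may wait behind one that expires later. Under the at-most-once assumption this only delays its removal; it never removes an entry early.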

Limitation

When the web server goes down, this in-memory cache is lost, and hence every unexpired, untampered session is accepted as valid again, including sessions that had been invalidated.

The set is not shared across all instances of the web application. This can cause a session to be wrongly accepted when it is served by a different instance from the one on which it was invalidated.

After all, if this is just memcache, one might as well use the real memcache and take advantage of its cache expiration mechanism and sharing of data across all instances.

5 comments:

  1. How do you cope with leaked MAC key?

  2. I guess a leaked MAC key would lead to a change of MAC key which would lead to all sessions being invalidated.

    This causes a mild annoyance for valid users because they have to go through session creation again. And there is a window of opportunity in which someone can create arbitrary sessions with the leaked key.

    So I guess, I'd just go ahead and change the MAC key. The change in MAC key does not make invalidated sessions valid.

  3. I should have stated my question more clearly. Here it goes:

    One of the main benefits of using a cookie backed session scheme is scalability, i.e., you have so many web servers, and you want each of them to process requests independently. This is what is known as a "shared-nothing architecture".

    Of course you have to distribute your MAC key to all of your web servers. The more servers you have, the more likely it is that your MAC key will be stolen. It's obvious that anybody with the key would be able to impersonate any user, and you would have no way to detect him, would you?

    How would you modify your scheme to prevent, detect and respond to such an incident?

  4. The security of any cryptosystem usually relies only on the secrecy of its key. If I understand you right, the situation that you raised here throws out that one basic assumption. It is like asking how we can build a trusted architecture from nothing that we can trust.

    Nevertheless, there could be something done in this case to minimize the impact of such an incident. First, let's see what we control here. We control the hosts, and we control the associated data in the session. Now, let's cook up another scheme.

    First of all, we could use different keys on different hosts and rely on the load balancing to direct the request to this particular host. For this, we need to add another field in the cookie so that the LB device can route requests.

    Secondly, and less effectively, we can store __active__ session identifiers and some (short) associated data such as the timestamp in, say, memcache. An incoming request would be validated against these active records instead of the much smaller set of invalidated records that I described in the blog post. It would be much more difficult for an attacker to generate a valid session identifier (which is random and long enough) together with its associated data. As an optimization, one can use a (counting) Bloom filter to weed out invalid session identifiers.

    Thanks for bringing up an interesting discussion point.

  5. Well, the point here is to make the system secure against realistic threats; I don't really care much about the principles of anything. You want to use crypto, so you must take care of the case where your keys are compromised. It's that simple, isn't it? Anyway, I think you came up with a nice way to address my concern.
