Every web developer knows that HTTP is basically a stateless protocol. In its simplest form, the connection is terminated after one request has been served, and even when a connection is reused, each request appears to the web server as a new request.
It is easy, however, to relate requests to each other, essentially establishing a long session spanning many requests, via a mechanism called cookies. A cookie is a piece of information that the web server asks the user agent (the browser) to remember and pass back to the server in subsequent requests. A cookie is therefore set by the server but stored on the client, so the value of a cookie (or the lack thereof) that the server receives cannot be trusted.
In its simplest form, the cookie carries only a session identifier. To make a session more meaningful, a web application usually associates more information than just the identifier with a session. This associated data is often stored on the server so that, from a small session identifier, the web application can retrieve the larger session data. The browser knows only the session identifier and nothing about the associated data. Session identifiers are generally random and long enough to be impractical to guess, and the associated data could be user objects, binary blobs, or almost anything else.
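As a rough illustration of a server-side session, here is a minimal Python sketch. The names SESSIONS, create_session and load_session are made up for this example, and the plain dictionary stands in for whatever storage a real application would use.

```python
import secrets

# Hypothetical in-memory store; a real application would more likely
# use a database or a shared cache.
SESSIONS = {}

def create_session(data):
    """Keep the data on the server and return the identifier for the cookie."""
    # 32 random bytes encoded as URL-safe text: long enough to be unguessable.
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = data
    return session_id

def load_session(session_id):
    """Return the stored data, or None when the identifier is unknown."""
    return SESSIONS.get(session_id)
```

Only the value returned by create_session would travel in the cookie; the data itself never leaves the server.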
Sometimes a web application embeds the session data together with the session identifier in the cookie itself. This is called a cookie-backed session. The advantage is that no server-side storage is required. The disadvantage is that the server can no longer trust the session data, because it is controlled by the client.
To ensure the integrity of a cookie-backed session, the server must have a way to tell whether a session has been invalidated before accepting its data. The challenge is in doing exactly that without requiring as much server-side storage as other session back ends.
And here's a solution that I derived.
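The session cookie would have two extra mandatory fields: timestamp and checksum. The timestamp is the moment at which the cookie was created or last updated. The checksum is a MAC (message authentication code) computed over the whole cookie, excluding the checksum itself, with a key known only to the server.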
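One possible way to build such a cookie is sketched below in Python. The layout (a base64-encoded JSON payload, a dot, then the MAC), the field names sid, ts and data, and the choice of HMAC-SHA256 are all assumptions made for this example; the scheme itself does not depend on them.

```python
import base64
import hashlib
import hmac
import json
import time

def make_cookie(session_id, data, key, now=None):
    """Return 'payload.mac', where the MAC covers the whole payload.

    The layout and field names are assumptions of this sketch.
    """
    body = {
        "sid": session_id,
        # Moment the cookie was created or last updated.
        "ts": int(time.time() if now is None else now),
        "data": data,
    }
    # URL-safe base64 never contains '.', so '.' is a safe separator.
    payload = base64.urlsafe_b64encode(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).decode("ascii")
    mac = hmac.new(key, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return payload + "." + mac
```

The key must be a bytes value that only the server knows. Note that the payload is signed, not encrypted, so the client can still read it.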
The cookie is considered invalid when any of these conditions is true:
- The time elapsed since the timestamp exceeds the lifetime of the cookie.
- The MAC is not equal to the calculated value.
- The session identifier is in a list of invalidated session identifiers.
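The three checks could look roughly like the Python sketch below. It assumes a cookie of the form payload.mac, where the payload is base64-encoded JSON with the fields sid and ts and the MAC is a hex HMAC-SHA256 of the payload. That layout and the function name is_valid are assumptions of this example. The MAC is checked first here, since the timestamp cannot be trusted until the cookie is known to be untampered.

```python
import base64
import hashlib
import hmac
import json

def is_valid(cookie, key, now, lifetime, invalidated_sessions):
    """Apply the three checks to a 'payload.mac' cookie (assumed layout)."""
    payload, sep, mac = cookie.rpartition(".")
    if not sep:
        return False  # no separator, so this is not one of our cookies
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison of the received MAC with the calculated one.
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("ascii")):
        return False
    # The payload is authentic, so its fields can now be read safely.
    body = json.loads(base64.urlsafe_b64decode(payload))
    if now - body["ts"] > lifetime:
        return False  # older than the cookie lifetime
    return body["sid"] not in invalidated_sessions
```

Here now and lifetime are in seconds, and invalidated_sessions is any container that supports membership tests, such as a set.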
The list of invalidated session identifiers has to stay small, and I can keep it small by following these steps:
- I will have a set called invalidated_sessions that stores the identifiers of invalidated sessions.
- I will have a first-in, first-out queue called logbook whose elements are tuples of (timestamp, session identifier).
- When a session is invalidated, a new tuple is appended to the logbook. Its timestamp is the session timestamp plus the cookie lifetime plus some extra window to prevent race conditions, and its session identifier is that of the session being invalidated. The invalidated_sessions set also records the same session identifier.
- Before validating a session, the web application needs to prune expired entries from the logbook and invalidated_sessions. The application peeks at the first element of the queue to see whether its timestamp is older than the current time. If so, the corresponding entry in invalidated_sessions is removed, the element is popped off the queue, and the process is repeated.
- After its timestamp and MAC value have been validated, the session is matched against invalidated_sessions. The session is considered invalid if it is in the set.
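The steps above can be sketched in Python as follows. The class name InvalidationLog, its method names, and the default 60-second window are made up for this example; invalidated_sessions and logbook are the set and queue described above.

```python
from collections import deque

class InvalidationLog:
    """Hypothetical helper that tracks invalidated session identifiers."""

    def __init__(self, lifetime, window=60):
        self.lifetime = lifetime            # cookie lifetime in seconds
        self.window = window                # extra slack (assumed default)
        self.invalidated_sessions = set()
        self.logbook = deque()              # FIFO of (expiry, session_id)

    def invalidate(self, session_id, session_timestamp):
        # After this moment the cookie has expired on its own anyway,
        # so the entry no longer needs to be remembered.
        expiry = session_timestamp + self.lifetime + self.window
        self.logbook.append((expiry, session_id))
        self.invalidated_sessions.add(session_id)

    def prune(self, now):
        # Peek at the head of the queue and drop entries that have expired.
        while self.logbook and self.logbook[0][0] < now:
            _, session_id = self.logbook.popleft()
            self.invalidated_sessions.discard(session_id)

    def is_invalidated(self, session_id, now):
        self.prune(now)
        return session_id in self.invalidated_sessions
```

Because only the head of the queue is inspected, an entry that expires early can sit behind one that expires later. It is then kept a little longer than necessary, which errs on the side of rejecting the session rather than accepting it.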
This approach has drawbacks. When the web server goes down, this in-memory cache is lost, and hence every current, untampered session becomes valid again, including sessions that had been invalidated.
The set is not shared across all instances of the web application. This can cause a session to be wrongly accepted when it is served by a different instance from the one on which it was invalidated.
After all, if this is just memcache, one might as well use the real memcache and take advantage of its cache expiration mechanism and sharing of data across all instances.