Personal data in cookies

Personal data in cookies is bad. I accepted this for quite some time, until I actually stopped to think about it and realised how much simpler our scaling would be if we could store a lot more state in cookies: any server could then handle any request without consulting a shared session store.

So… conventional wisdom. This is bad:

myCookie=Andrew,andrew@trib.tv,123456,TE51 7NG,4,2009-05-22 17:43:06,138cb381a04dba00340ab450deea934;

This imaginary cookie contains my display name, email address, user identifier, postcode, user level, cookie generation time, and security signature.

In my experience, the first objection people raise to a cookie like this is that ‘the user could just change it! You’ve got their permissions level in there and everything. Are you mad!?’.

The ‘user could just change it’ argument

Yes, they could. So to solve this problem, you only need to make it impossible for a user to write a legitimate cookie value from scratch, or to edit a cookie such that it remains legitimate. This is trivially simple to do and validate, much more so than implementing a scalable session store at the server end.

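All you do is include two additional items within your cookie: the generation time of the cookie, and a signature string. The generation time lets a server refuse cookies older than whatever maximum age you choose, so an old or stolen cookie doesn’t stay valid forever. The signature string is the result of running the rest of the cookie (excluding the signature), AND a constant value NOT included in the cookie, through a hashing algorithm. Don’t use crc32 for this: it is a checksum rather than a cryptographic hash, and a user can forge a matching value without knowing your constant. MD5 and SHA-1 both have practical collision attacks, so for preference use HMAC (for example HMAC-SHA256), a construction designed specifically for combining a secret key with a message.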
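Taken literally, the scheme just described fits in a few lines of Python. This is a sketch, not code from the post: `naive_signature` is a hypothetical name, SHA-1 is the hash, and the secret is the placeholder ‘password’. It also shows a weakness of gluing the fields together with no separator, whichever hash you use: moving the last digit of the user identifier to the front of the postcode leaves the concatenated string, and so the signature, unchanged, while the cookie now claims to belong to user 12345.

```python
import hashlib


def naive_signature(fields: list[str], secret: str) -> str:
    """The scheme as described: concatenate the fields, append the secret, hash with SHA-1."""
    return hashlib.sha1(("".join(fields) + secret).encode("utf-8")).hexdigest()


fields = ["Andrew", "andrew@trib.tv", "123456", "TE51 7NG", "4", "2009-05-22 17:43:06"]

# Forged copy: user id 123456 becomes 12345, and the "6" moves to the start of the postcode.
# The concatenated string is identical, so the signature is identical too.
forged = ["Andrew", "andrew@trib.tv", "12345", "6TE51 7NG", "4", "2009-05-22 17:43:06"]

print(naive_signature(fields, "password"))
print(naive_signature(fields, "password") == naive_signature(forged, "password"))  # True
```

A real SHA-1 digest is 40 hex characters long. Signing the fields with an unambiguous separator between them, or encoding each one so the boundaries can’t move, closes the gap.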

Here’s an example, using the cookie above. The constant value not included in the cookie (normally a string of gibberish) will for our purposes be ‘password’ (obviously you should use something a lot more random than that). Sticking all the parameters of the cookie together, and adding the super-secure, super-secret constant on the end, we get:

Andrewandrew@trib.tv123456TE51 7NG42009-05-22 17:43:06password

Then we run this through our hashing algorithm of choice, and end up with:

138cb381a04dba00340ab450deea934

We then pop this on the end of the cookie before it is sent to the user in a Set-Cookie header. When that cookie comes back in a subsequent request (possibly to a different server), the server only needs to know the super-secret constant and recalculate the hash to verify that whoever wrote the cookie ALSO knows the super-secret constant; it should also check the generation time and reject cookies that are too old. The user doesn’t know the constant, so the user can’t write a cookie that we’re going to believe. No matter how much the user wants to get to user level 10, they’re just gonna have to earn it.
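Here is a sketch of both ends in Python, under assumptions that are not taken from the post: the function names and the one-day `MAX_AGE` are hypothetical, HMAC-SHA256 stands in for a bare hash, and the generation time is stored as Unix seconds rather than a date string. Each field is percent-encoded and joined with `|`, which keeps spaces and commas out of the cookie (RFC 6265 doesn’t allow either in a cookie value) and makes the field boundaries unambiguous. `hmac.compare_digest` is used so that response timing doesn’t reveal how much of a guessed signature was right.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote, unquote

DELIM = "|"             # hypothetical separator; quote(..., safe="") always escapes "|" inside a field
MAX_AGE = 24 * 60 * 60  # hypothetical policy: reject cookies generated more than a day ago


def _encode(fields: list[str]) -> str:
    """Percent-encode each field and join with DELIM (no spaces, commas or semicolons remain)."""
    return DELIM.join(quote(f, safe="") for f in fields)


def _sign(fields: list[str], secret: bytes) -> str:
    """HMAC-SHA256 over the encoded fields, as a hex string."""
    return hmac.new(secret, _encode(fields).encode("ascii"), hashlib.sha256).hexdigest()


def issue(fields: list[str], secret: bytes, now: Optional[float] = None) -> str:
    """Build the cookie value: the fields, the generation time, then the signature."""
    issued_at = str(int(time.time() if now is None else now))
    signed = fields + [issued_at]
    return _encode(signed) + DELIM + _sign(signed, secret)


def verify(value: str, secret: bytes, now: Optional[float] = None) -> Optional[list[str]]:
    """Return the original fields if the cookie is authentic and fresh, otherwise None."""
    *encoded, sig = value.split(DELIM)
    if not encoded or not sig.isascii():
        return None
    fields = [unquote(f) for f in encoded]
    # Constant-time comparison, so timing doesn't leak how many characters matched.
    if not hmac.compare_digest(_sign(fields, secret), sig):
        return None
    # The signature matched, so the last field is the integer timestamp written by issue().
    issued_at = int(fields[-1])
    current = time.time() if now is None else now
    if current - issued_at > MAX_AGE:
        return None
    return fields[:-1]


if __name__ == "__main__":
    secret = b"replace-with-a-long-random-secret"  # hypothetical; share it across all servers
    value = issue(["Andrew", "andrew@trib.tv", "123456", "TE51 7NG", "4"], secret)
    print("Set-Cookie: myCookie=" + value)
    print(verify(value, secret))
```

Any server holding the same secret can call `verify`; no session lookup is involved.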

But, wail the doubters, you can see all the data on the wire in plain text!

The ‘you can see it on the wire’ argument

When you put together the web page your user has requested – your shopping cart or social network or Tom Jones fan page or whatever – you are probably going to print a nice friendly message in the top right of the page that says ‘Hello Andrew!’ with links to exciting places such as ‘My Account’, ‘Preferences’ and ‘Log out’. When the user goes to the my account page, you’re going to include loads more details as well, like their email address, phone number, postcode, and so on. And of course because your app has a ton of JavaScript making life easy for the user, you’ve probably got their email address in a JavaScript variable somewhere so you can prepopulate ‘email a friend’ dialogue boxes and overlays.

So all the content that you object to including in a cookie, you are including in the HTML responses that you send back to the user. If it’s OK in the response, why not allow it in the request?

There is a possible sub-argument here: using DNS cache poisoning, an attacker could intercept a request more easily than they could intercept a response, because redirecting your domain to their own server hands them the user’s requests, cookies included. But if your site is at risk from attempts at DNS poisoning (which is a lot more difficult than it used to be), then you probably have bigger worries than just leaking some personal data.

Conclusion

In many cases, storing personal data in cookies is a great idea, and is not a privacy issue. If you’re storing something that you’re not intending to use in JavaScript, and want to make it less readable (though not actually secure), you could easily obfuscate it. Or you could make it actually secure using cheap symmetric encryption with a key known to all your servers; pick an authenticated mode such as AES-GCM, or keep signing the cookie, because encryption on its own doesn’t stop tampering. Either way, you still avoid the need to store the data in a server-side session store.
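For the obfuscation route, Python’s standard library is enough. The sketch below (hypothetical function names) base64url-encodes a value that has already been signed. Base64 is an encoding, not encryption: anyone can reverse it, so the signature must still be inside and still be checked. Actual encryption would need a cryptography library offering an authenticated cipher such as AES-GCM, which this sketch doesn’t attempt.

```python
import base64


def obfuscate(value: str) -> str:
    """Base64url-encode a (signed) cookie value. This hides it from a casual glance only:
    anyone can decode it, so it is not encryption."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def deobfuscate(token: str) -> str:
    """Reverse obfuscate(); the signature inside must still be verified afterwards."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


print(obfuscate("Andrew|andrew%40trib.tv|123456"))
```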
