Wed 7 Feb 2007
Why are we using hex strings as unique identifiers for certain resources in our system?
Here are two sample URLs for accessing the same resource. The first uses the primary key, which in this case is a simple auto-incrementing integer field. The second uses a big random number in hex format.
- integer auto_increment key: http://example.com/product/100
- large random integer encoded in hex: http://example.com/product/de34f46a3b7fea74
One reason is security. For things like sessions, you want to make a unique session ID ‘hard to guess’, so that it is difficult to brute-force the ID and gain access to an individual session.
The point of this post is to address the other reason for using a ‘large random integer encoded in hex’: obscurity. By *not* using a simple auto_increment field as a primary key, and instead using some sort of digest or random number encoded as a hex string, you are doing three things:
- Obscuring counts for resources from the public.
- Hiding rates of change of those counts from the public.
- Removing the ability for the public to easily enumerate items in the collection.
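As a rough illustration of the ‘large random integer encoded in hex’ idea, here is a minimal Python sketch. The function names `new_public_id` and `product_url` are made up for this example, and the 8-byte (64-bit) size is an assumption chosen to match the length of the sample URL above. It is not necessarily how our system generates its identifiers.

```python
import secrets


def new_public_id(num_bytes: int = 8) -> str:
    """Return a random identifier as a hex string (8 bytes -> 16 hex chars)."""
    # secrets draws from the OS's cryptographically secure random source.
    return secrets.token_hex(num_bytes)


def product_url(public_id: str) -> str:
    """Build a public URL from the identifier (example.com is a placeholder)."""
    return f"http://example.com/product/{public_id}"


public_id = new_public_id()
print(product_url(public_id))
```

Random values can collide, however unlikely that is at 64 bits, so the column that stores the identifier would still want a unique constraint and a retry on conflict.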
By exposing the value from a primary key field that uses a simple auto-increment for an id, you are disclosing information about the number of records in a particular table. This may or may not be a concern. If you don’t care whether your competition knows how many of a particular object (accounts, for example) you have created, this isn’t an issue.
Not only does an observer get an absolute count, they can also get the rate of change. Competitors can potentially monitor the primary keys and determine how many new accounts you signed up in a month (just an example). This doesn’t mean “rush to obscure this data”. Think about what you are disclosing. If it’s not a concern, move on.
You can also remove the ability for a user of the system to iterate over one of your collections. For example, if you include the email address in the profile for a user (even a rendered one), random identifiers make it more difficult for someone to collect all of your users’ email addresses directly through your system.
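To put a rough number on ‘more difficult’, here is a back-of-the-envelope sketch. It assumes identifiers are drawn uniformly at random from a 64-bit space and that the person guessing has nothing better than random guesses; the record count of one million is an invented figure for illustration.

```python
ID_BITS = 64  # assumed identifier size, matching a 16-character hex string


def hit_probability(n_records: int, id_bits: int = ID_BITS) -> float:
    """Chance that one random guess lands on an existing record."""
    return n_records / 2**id_bits


def expected_guesses_per_hit(n_records: int, id_bits: int = ID_BITS) -> float:
    """Average number of random guesses needed to find one record."""
    return 2**id_bits / n_records


n_records = 1_000_000  # hypothetical table size
print(f"chance per guess: {hit_probability(n_records):.3e}")
print(f"guesses per hit:  {expected_guesses_per_hit(n_records):.3e}")
```

With sequential integers, every guess between the first and last id is a hit, so walking the whole collection takes about as many requests as there are records.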
Questions:
Why don’t you just start the sequence out at some fixed number? This doesn’t stop someone from finding out the rate of change. Again, that may or may not be valuable information.
Why not start the sequence at some fixed number and increment by some random amount? This helps obscure the count, but an observer could still probably work out the range of the random increments. With enough data points they could compute the average increment and then determine the rate of change.
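The averaging argument can be sketched with a small simulation. Everything here is assumed for illustration: the starting value, the step range of 1 to 10, and the idea that the observer gets to see a run of 200 consecutive ids (for example by creating records back to back).

```python
import random


def simulate_ids(n_records, start, max_step, rng):
    """Ids that start at a fixed number and grow by a random step each time."""
    ids = []
    current = start
    for _ in range(n_records):
        current += rng.randint(1, max_step)
        ids.append(current)
    return ids


def estimate_records_between(first_id, last_id, sampled_gaps):
    """Estimate how many records were created between two observed ids."""
    avg_gap = sum(sampled_gaps) / len(sampled_gaps)
    return (last_id - first_id) / avg_gap


rng = random.Random(2007)  # fixed seed so the sketch is repeatable
ids = simulate_ids(10_000, start=500_000, max_step=10, rng=rng)

# The observer measures the gaps between 201 consecutive ids ...
sampled_gaps = [b - a for a, b in zip(ids[:200], ids[1:201])]

# ... and uses the average gap to turn an id difference into a record count.
estimate = estimate_records_between(ids[0], ids[-1], sampled_gaps)
actual = len(ids) - 1
print(f"estimated: {estimate:.0f}, actual: {actual}")
```

Repeating the same estimate at two points in time gives the rate of change, which is the information the random increments were meant to hide.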
Summary:
Think about what you are disclosing. If that data is valuable, consider using a different identifier in public.
Also, keep in mind that there are other ways of finding out counts of certain information. There are various reporting services out there that can measure your web presence and give data based on that.
One of my co-workers also points out that hex identifiers are nice to look at, ‘hexy’ even. That’s one nice hexy digest right there!
2007-11-23 Update: At the time of writing this I had not considered the performance concerns of using a non-auto-increment primary key with InnoDB. Toad from #facebook on irc.freenode.net mentions “data is stored in PK order in innodb, which means there’s a lot more page splitting going on when you do inserts”. He goes on to suggest that obfuscating the id in the URL only is a better approach. You can use a symmetric cipher to obfuscate the id when using it in a URL. There are some trade-offs with this method as well, though. You have to take steps to protect the key you use to encrypt the primary key; otherwise anyone who has it, including developers on your project, can work out your primary keys from the encrypted version. One other thing I’ve done is to not use the obfuscated unique id as the primary key, but to store it as a separate unique key on the same table, while using an auto-increment field as the primary key.
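To make the ‘obfuscate the id in the URL only’ idea concrete, here is a minimal sketch of a keyed, reversible mapping from a 64-bit id to a 16-character hex string. It is a small Feistel network that uses HMAC-SHA256 as the round function, picked only because it runs on Python’s standard library. It is not the method Toad described and it is not a reviewed cipher; for real use, a vetted block cipher from a cryptography library would be the safer choice. The names `obfuscate_id` and `reveal_id`, the round count, and the demo key are all assumptions for this example.

```python
import hashlib
import hmac

MASK32 = 0xFFFFFFFF
ROUNDS = 4  # assumed round count for this sketch


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: first 32 bits of HMAC-SHA256(round, half)."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def obfuscate_id(key: bytes, record_id: int) -> str:
    """Map an integer id to a 16-character hex token for use in a URL."""
    if not 0 <= record_id < 2**64:
        raise ValueError("record_id must fit in 64 bits")
    left, right = record_id >> 32, record_id & MASK32
    for r in range(ROUNDS):
        left, right = right, left ^ _round(key, r, right)
    return f"{(left << 32) | right:016x}"


def reveal_id(key: bytes, token: str) -> int:
    """Invert obfuscate_id: recover the integer id from the hex token."""
    if len(token) != 16:
        raise ValueError("token must be 16 hex characters")
    value = int(token, 16)
    left, right = value >> 32, value & MASK32
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, r, left), left
    return (left << 32) | right


key = b"demo key - keep the real one out of source control"
token = obfuscate_id(key, 100)
print(token, reveal_id(key, token))
```

The database keeps its auto-increment primary key, so InnoDB inserts stay in key order, and only the URL layer calls these two functions. As noted above, whoever holds the key can reverse every token.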
2 Responses to “Obfuscating IDs for Records”
February 28th, 2007 at 12:19 am
[...] I wrote about obfuscating the primary keys for your critical tables here: http://www.kanske.com/?p=7 [...]
March 6th, 2007 at 10:54 am
You the man. I never looked at it like that. I have some hidden form fields in my shit that has that info in it. Good looking out!