Why are we using hex strings as unique identifiers for certain resources in our system?

Here are two sample URLs for accessing the same resource. The first uses the primary key, which in this case is a simple auto-incrementing integer field. The second uses a big random number in hex format.

  • integer auto_increment key: http://example.com/product/100
  • large random integer encoded in hex: http://example.com/product/de34f46a3b7fea74

Random hex identifiers can be used for security. For things like sessions, you want to make a unique session ID ‘hard to guess’, so that it is difficult to brute-force the ID and gain access to someone else’s session.
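A minimal sketch of generating an identifier like the one in the second URL, assuming Python and its standard `secrets` module. The helper name `new_public_id` is hypothetical, and 8 random bytes (16 hex digits, 64 bits) is chosen only to match the length of the sample URL. Raise `num_bytes` if you need a longer token.

```python
import secrets


def new_public_id(num_bytes: int = 8) -> str:
    """Return a random identifier as a lowercase hex string (hypothetical helper).

    secrets.token_hex(n) draws n bytes from the OS's cryptographically
    secure random source and encodes each byte as two hex digits,
    so 8 bytes -> 16 characters (64 bits).
    """
    return secrets.token_hex(num_bytes)


product_id = new_public_id()
url = f"http://example.com/product/{product_id}"
print(url)  # a different 16-digit hex value on every run
```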

The point of this post is to address the other reason for using a ‘large random integer encoded in hex’: obscurity. By *not* using a simple auto_increment field as a primary key, and instead using some sort of digest or random number encoded as a hex string, you are doing three things:

  1. Obscuring resource counts from the public.
  2. Hiding the rates of change of those counts from the public.
  3. Removing the ability for the public to easily enumerate items in the collection.

By exposing the value of a primary key field that uses a simple auto-increment, you are disclosing information about the number of records in a particular table. This may or may not be a concern. If you don’t care whether your competition knows how many of a particular object (accounts, for example) you have created, this isn’t an issue.

Not only can an outsider get an absolute count, they can also get the rate of change. Competitors can potentially monitor the primary keys and determine how many new accounts you signed up in a month (just an example). This doesn’t mean “rush to obscure this data”. Think about what you are disclosing. If it’s not a concern, move on.
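To make the rate-of-change point concrete, here is a sketch in Python of the arithmetic a competitor could do with two auto-increment IDs observed a month apart (say, the IDs of two test accounts they signed up). The function name `records_per_day` and the ID values are hypothetical, made up for illustration.

```python
from datetime import date


def records_per_day(first_id: int, first_seen: date,
                    last_id: int, last_seen: date) -> float:
    """Estimate records created per day from two observed auto-increment IDs."""
    days = (last_seen - first_seen).days
    if days <= 0:
        raise ValueError("observations must be on different days")
    # Every insert bumps the counter by one, so the ID difference is
    # the number of records created between the two observations.
    return (last_id - first_id) / days


# Hypothetical observations: one test sign-up on June 1, another on July 1.
rate = records_per_day(10_000, date(2007, 6, 1), 13_000, date(2007, 7, 1))
print(f"{rate:.0f} new accounts per day")  # prints "100 new accounts per day"
```

Because only the difference between the two IDs matters, starting the sequence at some fixed offset does not change the result. If the sequence started near 1, the later ID on its own also approximates the total number of records.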

Random identifiers also remove the ability for a user of the system to iterate over one of your collections. For example, if you include the email address in a user’s profile (even one rendered as an image), it becomes much harder for someone to collect all of your users’ email addresses directly through your system.
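A rough back-of-the-envelope comparison, assuming 64-bit random IDs like the sample URL and a hypothetical collection of one million products. With auto-increment IDs, enumerating the collection takes about one request per record; with random IDs, each blind guess hits an existing record with probability n / 2^64.

```python
KEY_SPACE = 2 ** 64   # number of possible 64-bit IDs
products = 1_000_000  # hypothetical collection size

# Sequential IDs: request 1, 2, 3, ... and nearly every request is a hit.
sequential_requests = products

# Random IDs: each uniformly random guess hits an existing record
# with probability products / KEY_SPACE, so on average it takes
# KEY_SPACE / products guesses to find a single record.
p_hit = products / KEY_SPACE
expected_guesses_per_hit = KEY_SPACE / products

print(f"sequential: ~{sequential_requests:,} requests for the whole set")
# sequential: ~1,000,000 requests for the whole set
print(f"random: p(hit) = {p_hit:.2e}, ~{expected_guesses_per_hit:.2e} guesses per hit")
# random: p(hit) = 5.42e-14, ~1.84e+13 guesses per hit
```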

Questions:

Why not just start the sequence at some fixed number? This doesn’t stop someone from finding out the rate of change. Again, that may or may not be valuable information.

Why not start the sequence at some fixed number and increment by some random amount? This helps obscure things, but you could probably still work out the range of the randomness that was added. With enough data points you could compute an average increment and then determine the rate of change.
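A small simulation of the random-increment scheme in Python, seeded so it is repeatable. The starting offset of 50,000, the maximum step of 20, and the probe of 1,000 back-to-back records are all hypothetical parameters. It also assumes the observer can create records in quick succession with no one else inserting in between, so the gaps they see are exactly the random steps.

```python
import random


def make_sequence(start: int, max_step: int, rng: random.Random):
    """Yield IDs that start at a fixed offset and grow by a random
    step in [1, max_step] on every insert (hypothetical scheme)."""
    current = start
    while True:
        current += rng.randint(1, max_step)
        yield current


rng = random.Random(42)
ids = make_sequence(start=50_000, max_step=20, rng=rng)

# The observer creates 1,000 records back to back and measures the gaps.
probe = [next(ids) for _ in range(1_000)]
gaps = [b - a for a, b in zip(probe, probe[1:])]
mean_step = sum(gaps) / len(gaps)  # should land near (1 + 20) / 2 = 10.5

# Later, 5,000 records are inserted by real users; the observer then
# creates one more record and sees its ID (5,001 inserts in total).
before = probe[-1]
for _ in range(5_000):
    next(ids)
after = next(ids)

# Dividing the ID distance by the mean step estimates the number of inserts.
estimated = (after - before) / mean_step
print(f"mean step ~{mean_step:.2f}, estimated inserts ~{estimated:.0f}")
```

The random step only adds noise; the fixed offset cancels out entirely because the estimate depends on the difference between two IDs.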

Summary:

Think about what you are disclosing. If that data is valuable, consider using a different identifier in public.

Also, keep in mind that there are other ways of finding out counts of certain information. Various reporting services can measure your web presence and provide data based on it.

One of my co-workers also points out that they are nice to look at, ‘hexy’ even. That’s one nice hexy digest right there!

2007-11-23 Update: When I wrote this I had not considered the performance implications of using a non-auto-increment primary key with InnoDB. Toad from #facebook on irc.freenode.net mentions “data is stored in PK order in innodb, which means there’s a lot more page splitting going on when you do inserts”. He goes on to suggest that obfuscating the id only in the URL is a better approach. You can use a symmetric cipher to obfuscate the id when using it in a URL. This method has its trade-offs as well, though. You have to take steps to protect the key you use to encrypt the primary key; otherwise developers on your project will have the knowledge needed to recover your primary keys from the encrypted versions. One other thing I’ve done is to keep an auto-increment field as the primary key and store the obfuscated unique id as a separate unique key on the same table.
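Here is a minimal sketch of the URL-only obfuscation idea in Python, using only the standard library. Python’s standard library has no general-purpose block cipher, so this swaps in a small Feistel network whose round function is HMAC-SHA256. The result is a keyed, reversible permutation of 64-bit integers, but it is an illustration, not a reviewed cipher; for real use a vetted block cipher from a maintained cryptography library would be the safer choice. `SECRET_KEY`, `encode_id`, `decode_id`, and the choice of 4 rounds are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret: in practice load it from configuration that
# most developers cannot read, per the key-protection trade-off above.
SECRET_KEY = b"replace-with-a-real-secret"
ROUNDS = 4
MASK32 = 0xFFFFFFFF


def _round(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function: first 4 bytes of HMAC-SHA256(round number || half)."""
    msg = rnd.to_bytes(1, "big") + half.to_bytes(4, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encode_id(pk: int, key: bytes = SECRET_KEY) -> str:
    """Map a 64-bit primary key to a 16-digit hex token with a Feistel network."""
    if not 0 <= pk < 2 ** 64:
        raise ValueError("primary key must fit in 64 bits")
    left, right = pk >> 32, pk & MASK32
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return f"{(left << 32) | right:016x}"


def decode_id(token: str, key: bytes = SECRET_KEY) -> int:
    """Invert encode_id by undoing the rounds in reverse order."""
    value = int(token, 16)
    left, right = value >> 32, value & MASK32
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return (left << 32) | right


token = encode_id(100)
print(f"http://example.com/product/{token}")
assert decode_id(token) == 100
```

With this approach the table keeps its auto-increment primary key (so InnoDB still inserts in PK order), URLs carry `encode_id(pk)`, and request handlers call `decode_id(token)` to get the key back. Anyone holding `SECRET_KEY` can reverse the mapping, which is exactly the key-protection trade-off described in the update.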

I was looking into pricing for an all-in-one printer-fax-scanner-copier after a friend recommended the HP PSC 2410 Multifunction. Amazon had one listed here: HP PSC 2410 Multifunction.

What caught my eye was the product weight vs. what they listed as the shipping weight.

Product weight vs shipping weight
Does that mean that I’ll get 21 pounds of packing peanuts and other assorted packaging?


Driving north on the windward side of Oahu on Kamehameha Highway you will come across an area called Laie Bay. Just north of the Polynesian Cultural Center there is a stoplight at a shopping center. Turning right onto Anemoku and then right again onto Naupaka St will bring you to a parking area where you can wander out on the spit. You will probably find people fishing or having a picnic.


Holly

Originally uploaded by Kanske 2.

My parents took a picture of my cat Holly and Doug posted it. She’s got to be over 20 years old at this point. My dad says that if you give her a little push she’ll roll over on her side and drool on herself. She doesn’t go out hunting much anymore but still gets around okay. This picture was taken after much of her matted hair was shaved off.


Terd

Originally uploaded by Kanske 2.

Here’s my parents’ dog ‘Terd’. He looks so happy!

I picked up a new 2GB USB Flash Drive (aka Thumb Drive, whatever) and started copying files over to it. I had all of my documentation and keys copied over and thought, how handy would it be to have a copy of one of my Subversion repositories on here? What sounded like a good idea turned out to work, but wasn’t exactly fast.

I knew that it was going to be slow, but I figured it would be manageable.

First I started the process of checking out the repo to the flash drive. I’ll report those results later. Next I checked out a copy to my local hard drive.

Here are the results:

laptop:~/repos dustin$ time svn co https://svnhost.com/svn/repo/trunk repo
--snip 596 A entries--
Checked out revision 133.
real    0m36.646s
user    0m2.690s
sys     0m5.628s

The local checkout was lightning fast: less than 37 seconds. Meanwhile, only 53 entries have been checked out to the flash drive so far.

Okay, time to go clean the apartment a bit.

Back. It’s still going. So far I’ve cleaned up all of the trash and taken it out, washed the dishes, and cleaned the stove, sink, counter, etc. The bathroom is clean now and I’ve got a load of clothes going.

Oh look. Joy:

svn: REPORT request failed on '/svn/repo/!svn/vcc/default'
svn: REPORT of '/svn/repo/!svn/vcc/default': Could not read response body: Secure connection truncated (https://svnhost.com)
real    52m50.427s
user    0m2.777s
sys     0m13.751s

Almost 53 minutes, and it didn’t complete. There’s a lock, so it needs an svn cleanup.

laptop:/flash/repos dustin$ cd repo
laptop:/flash/repos dustin$ svn status

Not good. Minutes later, it reports that it needs a cleanup. Seven minutes later the cleanup finished. Time to update. Twelve minutes later the update finished. I now have a cleanly checked out copy of the repository.

So, after an hour and a half, I’ve got the repo checked out. Score!
