Welcome to WindowsClient.net | Sign in | Join
Wednesday, June 25, 2008 8:36 AM rendle

Varchar length stored in 3 bits!?

So I was reading Angry Hacker's 10 reasons why SQL 2008 is going to rock and one of the reasons was compression, and one of the things about compression was that:

For instance, length of the varchar will be stored in 3 bits.

That bothered me. 3 bits represent a value from 0 to 7. How can that store all the possible lengths of a varchar, up to 8192? I kind of had the problem running in background all last night (at the same time as learning Ruby and watching the BBC's Bill Gates documentary; who says men can't multitask?). Finally, this morning, I came up with a theory: it's a lookup table.

Looking at the database I primarily work with, we only actually use a handful of different varchar sizes. 80% of the text columns are varchar(254), due to a limitation of a legacy development system's ODBC implementation. There are a few 50s where people probably just forgot to change the default in the table designer. For some reason there are a lot of 40s. And there are a bunch of isolated odd lengths, which I'll probably do something about if purchasing approve my request for VSTS Database Edition. Point is, >95% of the columns are covered by the top 7 values. So those 7 values could be held in an int[7] somewhere, and the actual metadata stored as the index to that table. The few extraordinary values are stored as regular ints (you use the remaining 3-bit value to indicate that happening).

Of course, that's just my database. I don't know what other people's databases are like. But Microsoft do. Microsoft have got the Customer Experience Improvement Program. I'd always wondered about the kind of information that they collected; I'm guessing that distributions of data types and lengths is in there somewhere. They've got enough information about enough usage patterns to be able to do stuff like this. So now I'm thinking about adding my own CEIP to my company's applications, just to see if there are any obvious optimisations waiting to stare me in the face.

Disclaimer: as I say, this is purely guesswork. But if anyone from the SQL team reads this, maybe you could let me know whether I'm warm or freezing?

Comments

No Comments

Leave a Comment

(required) 
(required) 
(optional)
(required) 
 
Page view counter