Schema changes and row size

I was working on a stored procedure that was misbehaving on a rather massive and embarrassing scale. The code seemed solid. The data model made sense. And yet… Every time I ran this proc it crashed and burned. There were some odd things going on and I knew I needed to get under the hood for a bit.


It had been a while since I had done serious study on SQL Server Internals so I figured it was a good time to do so. I had a copy of Dmitri Korotkevithc’s Pro SQL Sever Internals and I thought I’d skim through it as I watched test runs of my proc failing miserably.


In the first chapter, I found something useful that I had either forgotten or never known. Mr. Korotkevitch writes: “Unfortunately, table alteration never decreases the size of a data row.” (Emphasis in original.) He quite generously provides some code to prove this point. If you start out with a table full of BIGINTs and then change them all to INTs or even TINYINTs, the rows will still be the same size.


The tables in my database are pretty big and I was wondering if there were some table alterations that might have given me some wasted space. So, I took his query and modified it a bit to ferret out any tables that might benefit from an ALTER TABLE REBUILD.


I ended up with the following query.



WITH [offset_to_expected]
AS (SELECT [t].[name] AS [table_name],
[c].[name] AS [column_name],
[pc].[leaf_offset] AS [offset_in_row],
[pc].[max_inrow_length] AS [max_length],
LAG([pc].[max_inrow_length], 1, 0)
PARTITION BY [t].[object_id]
ORDER BY [c].[column_id]) AS [prev_max_length],
LAG([pc].[leaf_offset], 1, -1)
PARTITION BY [t].[object_id]
ORDER BY [c].[column_id]) AS [prev_offset_in_row],
LAG([pc].[max_inrow_length], 1, 0) OVER( PARTITION BY [t].[object_id] ORDER BY [c].[column_id])
+ LAG([pc].[leaf_offset], 1, 0) OVER( PARTITION BY [t].[object_id] ORDER BY [c].[column_id]) AS [expectd_offset_in_row]
FROM [sys].[system_internals_partition_columns] AS [pc]
INNER JOIN [sys].[partitions] AS [p]
ON [pc].[partition_id] = [p].[partition_id]
INNER JOIN [sys].[columns] AS [c]
ON [c].[column_id] = [pc].[partition_column_id]
AND [c].[object_id] = [p].[object_id]
INNER JOIN [sys].[tables] AS [t]
ON [c].[object_id] = [t].[object_id]
WHERE [p].[index_id] = 1 --clustered index only
AND [pc].[leaf_offset] > 0 --simple case of fixed-length, in-row data
SELECT [ote].[table_name],
FROM [offset_to_expected] [ote]
WHERE [ote].[offset_in_row] > [ote].[expectd_offset_in_row]
AND [ote].[expectd_offset_in_row] > 0;



The basic principle is that we ae comparing the OFFSET of a column to the combination of OFFSET and MAX_LENGTH of the previous column in the table. To do this we need to use the Window Function LAG. Sadly, Window Functions are evaluated – logically, at least – after the FROM and WHERE clauses, so if we want to use the result of the OFFSET calculation in a filter, we need to put the results in some sort of temporary object.


Why are Window Functions evaluated after the FROM and WHERE clauses? Wouldn’t it be a lot more useful to evaluate them earlier so they could be used more flexibly? I’m no expert on the subject, but if you think about it a bit, you’ll see that Window Functions basically are giving you a way to compare the data in a given row to some subset of the total dataset returned. And how could you do that if the total dataset had not already been fully specified? And to do that, you’d basically be evaluating the FROM and WHERE clauses.


So, I decided to put the results of my LAG calculations in a CTE. You could easily use a temp table or some other object, but I tend to like the clarity of the CTE syntax. (If you have a huge data set, temp tables often perform better, but for the purposes of this demo, that’s not relevant.)


Now, I knew up front that the query had some limitations. For one, it only works for fixed-length, in-row data – eg, numbers, dates, fixed-length character data, etc. But I’m working on a data warehouse at the moment and most of the data is numeric, so that’s not a big deal for me. (If anyone cares to come up with a solution that would work for variable length data, I’d love to hear about it.)


Let’s see it in action. For demo purposes, create the following table in an empty database.


[col2] BIGINT,
[col3] INT,
[col4] INT,
[col5] TINYINT,
[col6] SMALLINT,
[col7] DATETIME,



I run the offset analysis query and get no rows back. There have been no schema changes so there’s nothing to cause the offsets to be unexpected. Now, I change some columns to a smaller data type.





And then I re-run the offset analysis query.



Now I see that the offsets are unexpected and that indicates that the table might benefit from a rebuild.


I execute a rebuild.




And now I can re-run the offset analysis and no results some back.


But there is another limitation. If I increase the size of a column then SQL Server may change the internal column order and that, too, will give me some unexpected offsets. To demonstrate this, I will create another table.


[col2] BIGINT,
[col3] INT,
[col4] INT,
[col5] TINYINT,
[col6] SMALLINT,
[col7] DATETIME,


I run the offset analysis query and nothing comes back. But if I increase the size of a column like so:

ALTER TABLE [dbo].[t2]

Then I re-run the query, I can see that the new table comes back.



Now this is a pretty serious limitation. If you are in an environment where you do a lot of schema changes using the ALTER TABLE syntax (as opposed to some tools like E/R Studio that rebuild tables to accommodate schema changes) then you might get a lot of false positives. But I encourage you to play around with the query anyway, if only to get you thinking about how the internal representation of your data might be very different than what you expect. Look up the system views and DMVs – learn what they do and see if you can come up with some of your own ideas. Maybe start with sys.partitions. And let me know what you discover!

So what system views, DMVs or DMFs that you’ve found useful?


This entry was posted in SQL, SQL Server Internals and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s