I was working on a stored procedure that was misbehaving on a rather massive and embarrassing scale. The code seemed solid. The data model made sense. And yet… Every time I ran this proc it crashed and burned. There were some odd things going on and I knew I needed to get under the hood for a bit.
It had been a while since I had done serious study on SQL Server Internals so I figured it was a good time to do so. I had a copy of Dmitri Korotkevithc’s Pro SQL Sever Internals and I thought I’d skim through it as I watched test runs of my proc failing miserably.
In the first chapter, I found something useful that I had either forgotten or never known. Mr. Korotkevitch writes: “Unfortunately, table alteration never decreases the size of a data row.” (Emphasis in original.) He quite generously provides some code to prove this point. If you start out with a table full of BIGINTs and then change them all to INTs or even TINYINTs, the rows will still be the same size.
The tables in my database are pretty big and I was wondering if there were some table alterations that might have given me some wasted space. So, I took his query and modified it a bit to ferret out any tables that might benefit from an ALTER TABLE REBUILD.
I ended up with the following query.
WITH [offset_to_expected] AS (SELECT [t].[name] AS [table_name], [t].[object_id], [c].[column_id], [c].[name] AS [column_name], [pc].[leaf_offset] AS [offset_in_row], [pc].[max_inrow_length] AS [max_length], LAG([pc].[max_inrow_length], 1, 0) OVER( PARTITION BY [t].[object_id] ORDER BY [c].[column_id]) AS [prev_max_length], LAG([pc].[leaf_offset], 1, -1) OVER( PARTITION BY [t].[object_id] ORDER BY [c].[column_id]) AS [prev_offset_in_row], LAG([pc].[max_inrow_length], 1, 0) OVER( PARTITION BY [t].[object_id] ORDER BY [c].[column_id]) + LAG([pc].[leaf_offset], 1, 0) OVER( PARTITION BY [t].[object_id] ORDER BY [c].[column_id]) AS [expectd_offset_in_row] FROM [sys].[system_internals_partition_columns] AS [pc] INNER JOIN [sys].[partitions] AS [p] ON [pc].[partition_id] = [p].[partition_id] INNER JOIN [sys].[columns] AS [c] ON [c].[column_id] = [pc].[partition_column_id] AND [c].[object_id] = [p].[object_id] INNER JOIN [sys].[tables] AS [t] ON [c].[object_id] = [t].[object_id] WHERE [p].[index_id] = 1 --clustered index only AND [pc].[leaf_offset] > 0 --simple case of fixed-length, in-row data ) SELECT [ote].[table_name], [ote].[object_id], [ote].[column_id], [ote].[column_name], [ote].[offset_in_row], [ote].[expectd_offset_in_row] FROM [offset_to_expected] [ote] WHERE [ote].[offset_in_row] > [ote].[expectd_offset_in_row] AND [ote].[expectd_offset_in_row] > 0;
The basic principle is that we ae comparing the OFFSET of a column to the combination of OFFSET and MAX_LENGTH of the previous column in the table. To do this we need to use the Window Function LAG. Sadly, Window Functions are evaluated – logically, at least – after the FROM and WHERE clauses, so if we want to use the result of the OFFSET calculation in a filter, we need to put the results in some sort of temporary object.
Why are Window Functions evaluated after the FROM and WHERE clauses? Wouldn’t it be a lot more useful to evaluate them earlier so they could be used more flexibly? I’m no expert on the subject, but if you think about it a bit, you’ll see that Window Functions basically are giving you a way to compare the data in a given row to some subset of the total dataset returned. And how could you do that if the total dataset had not already been fully specified? And to do that, you’d basically be evaluating the FROM and WHERE clauses.
So, I decided to put the results of my LAG calculations in a CTE. You could easily use a temp table or some other object, but I tend to like the clarity of the CTE syntax. (If you have a huge data set, temp tables often perform better, but for the purposes of this demo, that’s not relevant.)
Now, I knew up front that the query had some limitations. For one, it only works for fixed-length, in-row data – eg, numbers, dates, fixed-length character data, etc. But I’m working on a data warehouse at the moment and most of the data is numeric, so that’s not a big deal for me. (If anyone cares to come up with a solution that would work for variable length data, I’d love to hear about it.)
Let’s see it in action. For demo purposes, create the following table in an empty database.
CREATE TABLE [t1] ( [col1] BIGINT PRIMARY KEY, [col2] BIGINT, [col3] INT, [col4] INT, [col5] TINYINT, [col6] SMALLINT, [col7] DATETIME, [col8] SMALLDATETIME );
I run the offset analysis query and get no rows back. There have been no schema changes so there’s nothing to cause the offsets to be unexpected. Now, I change some columns to a smaller data type.
ALTER TABLE [t1] ALTER COLUMN [col3] SMALLINT; GO
And then I re-run the offset analysis query.
Now I see that the offsets are unexpected and that indicates that the table might benefit from a rebuild.
I execute a rebuild.
ALTER TABLE [dbo].[t1] REBUILD;
And now I can re-run the offset analysis and no results some back.
But there is another limitation. If I increase the size of a column then SQL Server may change the internal column order and that, too, will give me some unexpected offsets. To demonstrate this, I will create another table.
CREATE TABLE [t2] ( [col1] BIGINT PRIMARY KEY, [col2] BIGINT, [col3] INT, [col4] INT, [col5] TINYINT, [col6] SMALLINT, [col7] DATETIME, [col8] SMALLDATETIME );
I run the offset analysis query and nothing comes back. But if I increase the size of a column like so:
ALTER TABLE [dbo].[t2] ALTER COLUMN [col4] BIGINT;
Then I re-run the query, I can see that the new table comes back.
Now this is a pretty serious limitation. If you are in an environment where you do a lot of schema changes using the ALTER TABLE syntax (as opposed to some tools like E/R Studio that rebuild tables to accommodate schema changes) then you might get a lot of false positives. But I encourage you to play around with the query anyway, if only to get you thinking about how the internal representation of your data might be very different than what you expect. Look up the system views and DMVs – learn what they do and see if you can come up with some of your own ideas. Maybe start with sys.partitions. And let me know what you discover!
So what system views, DMVs or DMFs that you’ve found useful?