Stupid SQL Tricks

Random posts from a SQL Server geek

0 notes &

In with the new

I had a client ask about merging two companies in their ERP database. I needed to find which of the hundreds of tables would require some work. I’ve got to keep current as SQL Server changes its metadata schema, so I wrote this without using any of the compatibilty views:

SELECT OBJECT_NAME(i.object_id)
FROM sys.indexes AS i JOIN sys.index_columns AS c ON i.object_id = c.object_id AND i.index_id = c.index_id
JOIN sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE p.rows > 0 AND COL_NAME(c.object_id,c.column_id) = ‘CoID’ AND i.is_primary_key | i.is_unique_constraint = 1;

I needed to know which (populated) tables had the company ID in a candidate key.

0 notes &

It’s not nothing

I was giving a newbie developer some help today, and he sent me a code example like this:

IF EXISTS
(SELECT ID
 FROM dbo.MyTable
 WHERE ID = @ID)
BEGIN
 UPDATE dbo.MyTable SET Col1 = @Col1
END
ELSE
BEGIN
 INSERT dbo.MyTable (ID, Col1)
 VALUES (@ID, @Col1)
END;

(Actually, his code wasn’t even that good, but I cleaned up the issues I don’t want to address today.)  I would never use BEGIN…END where the “code blocks” are single statements; they’re not necessary and just add noise to the code.  The other issue I’ve included is his use of a column identifier when testing for existence; just let the optimizer decide what column (i.e., index) to use in a case like this:

IF EXISTS
(SELECT *
 FROM dbo.MyTable
 WHERE ID = @ID)
UPDATE dbo.MyTable SET Col1 = @Col1
WHERE Col1 <> @Col1
ELSE
INSERT dbo.MyTable (ID, Col1)
VALUES (@ID, @Col1);

Note that I also added a WHERE clause to the UPDATE statement so that the row is untouched if the value isn’t changing.  That will prevent blocking data and logging a transaction if the value isn’t changed.

I suggested he instead use the MERGE statement, assuming the application won’t be supported on SQL Server versions prior to SQL Server 2008:

MERGE dbo.MyTable m
USING
(SELECT @ID, @Col1) t (ID, Col1) ON m.ID = t.ID
WHEN MATCHED AND m.Col1 M <> t.Col1 THEN
UPDATE SET Col1 = t.Col1
WHEN NOT MATCHED THEN
INSERT (ID, Col1)
VALUES (t.ID, t.Col1);

I won’t cover the advantages of MERGE in this post.

That looks fine, but…  What if the table is something like this:

CREATE TABLE dbo.MyTable(
ID int PRIMARY KEY,
Phone varchar(15) NULL);

Yes; that’s an oversimplified, nonsense schema, but bear with me.

It’s always a good idea to explicitly define columns as nullable or not.  The PRIMARY KEY constraint on the first column means it is non-nullable.  The second column (Phone) is explicitly defined as nullable, because we may not know what the phone number is.  It may be unlisted, or perhaps we asked but received no answer.  In any case we want to distinguish among phone numbers that are blank (meaning no phone number) and phone numbers that are unknown.  In case you don’t know, NULL means missing, i.e., unknown, and one must use ternary logic with such values.  By using NULL when the value is unknown, we can make this distinction, e.g.:

SELECT ID
FROM dbo.MyTable
WHERE Phone = ”;

Will give different results from:

SELECT ID
FROM dbo.MyTable
WHERE Phone IS NULL;

(One cannot use WHERE Phone = NULL in SQL because that implies equivalence, and the language is strict enough to remind you by not allowing such syntax.)

So the first result set will be rows where the phone is known to not exist, and the second result set should be those rows where we don’t know what the phone number is (or even if there is one).  If you don’t need to distinguish information like this, just create your columns as NOT NULL.  Please.

Now look back at my “improvement” to the above UPDATE statement, rewriting it for the table:

UPDATE dbo.MyTable SET Phone = @Phone
WHERE Phone <> @Phone

What happens if we didn’t know the phone number, but now have gotten one?  I.e., the Phone column has the NULL value, but now we are updating it with a received phone number?  Well, the above code will not update the row.  Because the value is unknown, a predicate like NULL <> ‘2125551212’ will not return true (or false), but instead its result is also unknown (NULL).  If we don’t know the phone number, how can we say what it isn’t?

What if we get clever and restate our predicate?  If we want this to work where the value isn’t equal to the new value, we can instead ask for the rows where that’s the case:

UPDATE dbo.MyTable SET Phone = @Phone
WHERE NOT Phone = @Phone

No; that doesn’t work either.  We still are asking about whether a known value “equals” an unknown.  The negation operator with unknown (NULL) still returns unknown.  As Phone is unknown, it may actually be the same value as @Phone; we just don’t know.

The boolean logic of just Trues and Falses doesn’t work when NULLs are involved.  There is a third possibility where we don’t know if the result is true or false.  Here’s an example of ternary logic to work with NULLs:

UPDATE dbo.MyTable SET Phone = @Phone
WHERE Phone <> @Phone OR ((Phone IS NULL AND @Phone IS NOT NULL) OR (Phone IS NOT NULL AND @Phone IS NULL))

Slightly less typing:

UPDATE dbo.MyTable SET Phone = @Phone
WHERE Phone <> @Phone OR ((Phone IS NULL OR @Phone IS NULL) AND ISNULL(Phone, @Phone) IS NOT NULL)

In the real world I am a proponent of constraints in the schema.  For a table holding phone numbers, I would definitely use a CHECK constraint on the column:

ALTER TABLE dbo.MyTable
ADD CONSTRAINT ValidPhoneNumber CHECK (Phone = ” OR (LEN(Phone) > 9 AND Phone NOT LIKE ‘%[^0-9]%’));

That will prevent the insertion of many invalid values, but still allow empty strings for rows where there is no phone number.  Note that NULL values are also allowed (because we defined the column as nullable), even though we didn’t explicitly address NULLs in this constraint.

By having a hard and fast constraint on the column, we can also use a simpler form of ternary logic with our UPDATE statement:

UPDATE dbo.MyTable SET Phone = @Phone
WHERE ISNULL(Phone, ‘A’) <> ISNULL(@Phone, ‘A’)

Because the constraint prevents there being a value such as ’A’ in that column, we can substitute this impossible value for any NULLs in our predicate.

Please note that all the above ternary code also allows one to update a row with a known value to a null value, e.g., we had a phone number, but tried it and it didn’t work, so we need to change the value to unknown (NULL).

Also note that this type of logic isn’t limited to the case I began with; virtually any DML statement may need ternary logic when nullable columns are involved.

0 notes &

Column lists for INSERTs

I’ve been forced to develop on SQL Server 2000 recently, so I was reminded about one nice thing about Query Analyzer (QA).

I worked out my own form of pretty SQL code formatting many years ago, and I still like my style better than anything else I’ve seen.  Here’s an example:

SELECT a.Col1, b.Col2, c.Col3 FROM dbo.TableA a JOIN dbo.TableB b ON a.PKCol = b.FKCol JOIN (SELECT FKCol, SUM(ValCol) Tot FROM dbo.TableC GROUP BY PKCol) c ON b.PKCol = c.FKCol;

I like compact code (and I’m no C# developer), so I can see as much as possible on the screen without scrolling.  My single character indents work just fine with SQL, particularly with a fixed-pitch typeface.  Plus, this prolix post to the contrary, I tend to be terse.

I’m working with an ERP database, and there are close to a hundred columns in many of the tables, with virtually no DEFAULT constraints (or nullable columns).  This means that INSERT lists are huge and, particularly for a two-fingered typist like me, a pain to create.  In QA, I can just script the table to the clipboard “As Insert” and paste it into my code.  (Then I delete the useless VALUES clause and replace all the [s and ]s (identifier delimiters) with empty strings, but that’s just me…)  In SQL Server Management Studio (SSMS), the code beautifier insists on each column name on its own line, so I’d have to spend a lot of time removing all that extra whitespace.  Otherwise I’d spend more time scrolling up and down the editor window just seeing which column is next on the list.

In SSMS, one can expand the table object in the Object Explorer and see the Columns category.  Then drag the word Columns into the editor window and one has a nice, comma-delimited list of the columns (without delimiters).  That’s very useful to me with SELECT lists, but it doesn’t have the intelligence of the scripter, so I then must delete columns that cannot take inserts.  I wrote a little UDF…

CREATE FUNCTION dbo.InsertCols(@table sysname) RETURNS varchar(max) AS BEGIN; RETURN 'INSERT dbo.' + @table + STUFF((SELECT ', ' + CASE WHEN name LIKE '% %' THEN '[' + name + ']' ELSE name  END FROM sys.columns WHERE is_identity = 0 AND is_rowguidcol = 0 AND is_computed = 0 AND system_type_id <> 189 AND object_id = OBJECT_ID(@table) ORDER BY column_id FOR XML PATH('')),1,2,'(') + ')'; END;

This lists all insertable columns (system_type_id 189 is rowversion), delimits only the identifiers that have embedded spaces, and even saves me from having to type the beginning of the clause.

0 notes &

tumblrbot asked: WHERE WOULD YOU MOST LIKE TO VISIT ON YOUR PLANET?

The Great Barrier Reef, before one of us is gone.

0 notes &

Really stupid trick

Just to start things off, here’s a no-brainer. If you typically go into SQL Server Management Studio (yes; this blog is primarily about the Microsoft product) to open a query window and code, change your default database to tempdb (Security, Logins, choose you (or your group) and then Properties). That way you won’t be inadvertently creating test objects in master (the default default).

Any objects you create in tempdb will also self-destruct when you close the session, which can be a good thing. This is particularly useful with any local instances on your development machine (you know—the one with the Developer edition of SQL Server).

Alternatively, if this is your own server, create a “play” database and make that your default. Then all your clever DDL won’t be lost or misplaced.