I was giving a newbie developer some help today, and he sent me a code example like this:
IF EXISTS
(SELECT ID
FROM dbo.MyTable
WHERE ID = @ID)
BEGIN
UPDATE dbo.MyTable SET Col1 = @Col1
END
ELSE
BEGIN
INSERT dbo.MyTable (ID, Col1)
VALUES (@ID, @Col1)
END;
(Actually, his code wasn’t even that good, but I cleaned up the issues I don’t want to address today.) I would never use BEGIN…END where the “code blocks” are single statements; they’re not necessary and just add noise to the code. The other issue I’ve included is his use of a column identifier when testing for existence; just let the optimizer decide what column (i.e., index) to use in a case like this:
IF EXISTS
(SELECT *
FROM dbo.MyTable
WHERE ID = @ID)
UPDATE dbo.MyTable SET Col1 = @Col1
WHERE Col1 <> @Col1
ELSE
INSERT dbo.MyTable (ID, Col1)
VALUES (@ID, @Col1);
Note that I also added a WHERE clause to the UPDATE statement so that the row is untouched if the value isn’t changing. That will prevent blocking data and logging a transaction if the value isn’t changed.
I suggested he instead use the MERGE statement, assuming the application won’t be supported on SQL Server versions prior to SQL Server 2008:
MERGE dbo.MyTable m
USING
(SELECT @ID, @Col1) t (ID, Col1) ON m.ID = t.ID
WHEN MATCHED AND m.Col1 M <> t.Col1 THEN
UPDATE SET Col1 = t.Col1
WHEN NOT MATCHED THEN
INSERT (ID, Col1)
VALUES (t.ID, t.Col1);
I won’t cover the advantages of MERGE in this post.
That looks fine, but… What if the table is something like this:
CREATE TABLE dbo.MyTable(
ID int PRIMARY KEY,
Phone varchar(15) NULL);
Yes; that’s an oversimplified, nonsense schema, but bear with me.
It’s always a good idea to explicitly define columns as nullable or not. The PRIMARY KEY constraint on the first column means it is non-nullable. The second column (Phone) is explicitly defined as nullable, because we may not know what the phone number is. It may be unlisted, or perhaps we asked but received no answer. In any case we want to distinguish among phone numbers that are blank (meaning no phone number) and phone numbers that are unknown. In case you don’t know, NULL means missing, i.e., unknown, and one must use ternary logic with such values. By using NULL when the value is unknown, we can make this distinction, e.g.:
SELECT ID
FROM dbo.MyTable
WHERE Phone = ”;
Will give different results from:
SELECT ID
FROM dbo.MyTable
WHERE Phone IS NULL;
(One cannot use WHERE Phone = NULL in SQL because that implies equivalence, and the language is strict enough to remind you by not allowing such syntax.)
So the first result set will be rows where the phone is known to not exist, and the second result set should be those rows where we don’t know what the phone number is (or even if there is one). If you don’t need to distinguish information like this, just create your columns as NOT NULL. Please.
Now look back at my “improvement” to the above UPDATE statement, rewriting it for the table:
UPDATE dbo.MyTable SET Phone = @Phone
WHERE Phone <> @Phone
What happens if we didn’t know the phone number, but now have gotten one? I.e., the Phone column has the NULL value, but now we are updating it with a received phone number? Well, the above code will not update the row. Because the value is unknown, a predicate like NULL <> ‘2125551212’ will not return true (or false), but instead its result is also unknown (NULL). If we don’t know the phone number, how can we say what it isn’t?
What if we get clever and restate our predicate? If we want this to work where the value isn’t equal to the new value, we can instead ask for the rows where that’s the case:
UPDATE dbo.MyTable SET Phone = @Phone
WHERE NOT Phone = @Phone
No; that doesn’t work either. We still are asking about whether a known value “equals” an unknown. The negation operator with unknown (NULL) still returns unknown. As Phone is unknown, it may actually be the same value as @Phone; we just don’t know.
The boolean logic of just Trues and Falses doesn’t work when NULLs are involved. There is a third possibility where we don’t know if the result is true or false. Here’s an example of ternary logic to work with NULLs:
UPDATE dbo.MyTable SET Phone = @Phone
WHERE Phone <> @Phone OR ((Phone IS NULL AND @Phone IS NOT NULL) OR (Phone IS NOT NULL AND @Phone IS NULL))
Slightly less typing:
UPDATE dbo.MyTable SET Phone = @Phone
WHERE Phone <> @Phone OR ((Phone IS NULL OR @Phone IS NULL) AND ISNULL(Phone, @Phone) IS NOT NULL)
In the real world I am a proponent of constraints in the schema. For a table holding phone numbers, I would definitely use a CHECK constraint on the column:
ALTER TABLE dbo.MyTable
ADD CONSTRAINT ValidPhoneNumber CHECK (Phone = ” OR (LEN(Phone) > 9 AND Phone NOT LIKE ‘%[^0-9]%’));
That will prevent the insertion of many invalid values, but still allow empty strings for rows where there is no phone number. Note that NULL values are also allowed (because we defined the column as nullable), even though we didn’t explicitly address NULLs in this constraint.
By having a hard and fast constraint on the column, we can also use a simpler form of ternary logic with our UPDATE statement:
UPDATE dbo.MyTable SET Phone = @Phone
WHERE ISNULL(Phone, ‘A’) <> ISNULL(@Phone, ‘A’)
Because the constraint prevents there being a value such as ’A’ in that column, we can substitute this impossible value for any NULLs in our predicate.
Please note that all the above ternary code also allows one to update a row with a known value to a null value, e.g., we had a phone number, but tried it and it didn’t work, so we need to change the value to unknown (NULL).
Also note that this type of logic isn’t limited to the case I began with; virtually any DML statement may need ternary logic when nullable columns are involved.