Executive Summary:
Many Structured Query Language (SQL) functions seem similar at first glance but really aren’t. Learn the difference between SQL functions such as UNION and UNION ALL; INNER JOIN and OUTER JOIN; DISTINCT and GROUP BY; NVARCHAR, VARCHAR, and CHAR; and timestamp and GETDATE().
|
I frequently receive basic but interesting SQL questions
on my Web site. It’s heartening to see so
many junior DBAs and developers eager to thoroughly
understand the basics. In addition to helping new
technologists learn the fundamentals on my Web site,
I started this series to give monthly guidance in SQL
Server Magazine. I recommend SQL Server Books
Online (msdn2.microsoft.com/en-us/library/ms130214
.aspx) if you want to check the accuracy of definitions
and example syntax. I look forward to receiving questions
from readers. I’ll plan future columns based on your
concerns, so keep those emails coming!
Q: What is the difference between UNION and UNION
ALL?
A: Both UNION and UNION ALL combine result
sets of two SELECT statements. UNION performs a
SELECT DISTINCT operation on the final result set,
whereas UNION ALL combines both result sets without
removing any duplicate records.
For example, suppose Table 1’s result set is Red,
White,Blue,Orange, and Table 2’s result set is Red,Blue,
Purple,Brown. Running UNION on both result sets
gives you Red,White,Blue,Orange,Purple,Brown
(duplicate values removed); running UNION
ALL gives you Red,Red,White,Blue,Blue,Orange,
Purple,Brown (duplicate values not removed).
UNION ALL gives faster results than UNION
does. Therefore, use UNION ALL if you know that all
the records that will be
returned are unique.
Q: What is the difference
between INNER JOIN
and OUTER JOIN?
A: INNER JOIN
returns all rows from
both tables where there’s
a match on a common
field. (Typically, this is
the primary key in the
first table and a foreign
key in the second table.) OUTER JOIN returns
all the records from the
first table and, from the second table, only those records
in which there’s a match on the common key field.
OUTER JOIN is further divided into three types of
joins: LEFT OUTER JOIN, RIGHT OUTER JOIN,
and FULL OUTER JOIN. LEFT OUTER JOIN and
RIGHT OUTER JOIN are also known as LEFT JOIN
and RIGHT JOIN. FULL OUTER JOIN combines
LEFT OUTER JOIN and RIGHT OUTER JOIN.
LEFT OUTER JOIN returns all rows from the first table
whether or not there’s a match in the second table. For
those rows in the first table that have a match in the second
table, the join is completed normally. For those rows in
the first table that don’t have a match in the second table,
the values in the result set are padded out with NULL.
RIGHT OUTER JOIN works similarly, except all rows
from the second table are returned whether or not there’s
a match in the first table, and all missing values are padded
out with NULL. FULL OUTER JOIN returns all rows
from both tables, padding out missing values with NULL
where no match occurs.
Q: What is the difference between DISTINCT and
GROUP BY?
A: DISTINCT and GROUP BY clauses typically generate
the same execution plan. However, if a subquery is used,
the plans will differ. Use a GROUP BY clause to apply
aggregate operators to each column listed in the GROUP
BY clause. Use a DISTINCT clause in the SELECT statement
to remove any duplicate values returned in the result
set. Listings 1, Listing 2, and Listing 3 provide examples of DISTINCT
and GROUP BY clauses. (see .Zip File)
Q: What are the differences among NVARCHAR, VARCHAR,
and CHAR with respect to storage?
A: NVARCHAR requires 2 bytes to store a single character
(such as “A” or “a”). VARCHAR requires 1 byte to store
a single character. The NVARCHAR data type enables
multilingual characters because it represents all characters
in Unicode. The VARCHAR data type doesn’t support
multilingual characters. If you’re working only in English,
and you’re sure you’ll never translate any part of your
application into another language, then you might not need
to use NVARCHAR.
CHAR is a fixed-length data type, and the disk storage
size is equal to the maximum size of the column as defined when you created the table. VARCHAR is a variablelength
data type, and the storage size equals the length
of the data entered plus 2 bytes in the variable character
column offset array.
If you’re concerned about storage, use VARCHAR
rather than CHAR. For a thorough analysis, see Kalen
Delaney, Inside Microsoft SQL Server 2005: The Storage
Engine (Microsoft Press, 2006).
Q: What is the difference between timestamp and
GETDATE()?
A: You can’t compare timestamp to a GETDATE() value,
because timestamp is a data type and GETDATE() is a
function. A timestamp is a unique binary number generated
by SQL Server that doesn’t have anything to do with
actual time. GETDATE() is a nondeterministic function
that generates the current system date and time.
The timestamp data type is used mainly for marking
different versions of a row in a table. When any data in
a row are changed, the timestamp column is populated
with a new unique binary number. This newly generated
number compared with the previously stored timestamp
for the same row indicates that some data in the row has
been changed. If any actions (such as inserting rows in
other tables) are to be taken when particular rows are
updated, timestamp should be used. Because the timestamp
changes when rows are updated, its previously
stored value (in the other table) can be compared with the
new values, rows that are changed can be identified, and
actions can be taken accordingly.
GETDATE() returns the current system date and
time, which can be stored in a column that is defined
as a datetime data type. GETDATE() can be stored as
a smalldatetime data type (4 bytes) or datetime data
type (8 bytes).
End of Article