We will take a look at the following new data types, each of
which is available in all editions of SQLServer 2008:
•
Date and Time: Four new date and time data types have been
added, making working with
time much easier than it ever has in the
past. They include: DATE, TIME, DATETIME2, and
DATETIMEOFFSET.
• Spatial: Two new spatial data types have been
added--GEOMETRY and GEOGRAPHY--
which you can use to natively store
and manipulate location-based information, such as Global
Positioning
System (GPS) data.
• HIERARCHYID:
The HIERARCHYID data type is used to enable database applications to
model
hierarchical tree structures, such as the organization chart of a
business.
• FILESTREAM:
FILESTREAM is not a data type as such, but is a variation of the
VARBINARY(MAX)
data type that allows unstructured data to be stored in the file system
instead
of inside the SQL Server database. Because this option requires a lot
of involvement
from both the DBA administration and development side,
I will spend more time on this topic
than the rest.
Date and Time
In SQL
Server 2005 and earlier, SQL Server only offered two date and time data
types: DATETIME
and SMALLDATETIME. While they were useful in many
cases, they had a lot of limitations,
including:
- Both the date value and the time
value are part of both of these data types, and you can’t choose to
store one or the other. This can cause several problems:
- It often causes a lot of wasted
storage because you store data you don’t need or want.
- It adds unwanted complexity to
many queries because the data types often have to be converted to a
different form to be useful.
- It often reduces performance because WHERE clauses
with these data and time data types often have to include functions to
convert them to a more useful form, preventing these queries from using
indexes.
- They
are not time-zone aware, which requires extra coding for time-aware
applications.
- Precision is only .333 seconds, which is not granular enough
for some applications.
- The range of supported dates is not adequate for some
applications, and the range does not match the range of the .NET CLR
DATETIME data type, which requires additional conversion code.
To
overcome these problems, SQL Server 2008 introduces four new date and
time data types, described in the following sections. All of
these new date and time data types work with SQL Server 2008 date and time
functions, which have been enhanced in order to properly understand the
new In
addition, some new date and time functions have been added to take
advantage of the capabilities of these new data types. The
new functions include SYSDATETIME, TODATETIMEOFFSET, SYSUTCDATETIME, and
DATE.
As you can imagine, the DATE data type only
stores a date in the format of YYYY-MM-DD. It has
a range of 0001-01-01
through 9999-12-32, which should be adequate for most business and
scientific
applications. The accuracy is 1 day, and it only takes 3 bytes to store
the date.
--Sample DATE output
DECLARE @datevariable
as DATE
SET @datevariable =
getdate()
PRINT @datevariable
Result: 2008-08-15
TIME
TIME is stored in the
format: hh:mm:ss.nnnnnnn, with a range of 00:00:00.0000000 through
23:59:59:9999999 and is
accurate to 100 nanoseconds. Storage depends on the precision and scale
selected,
and runs from 3 to 5 bytes.
--Sample TIME output
DECLARE @timevariable
as TIME
SET @timevariable =
getdate()
PRINT @timevariable
Result:
14:26:52.3100000
DATETIME2
DATETIME2 is very
similar to the older DATETIME data type, but has a greater range and
precision. The format
is YYYY-MM-DD hh:mm:ss:nnnnnnnm with a range of 0001-01-01
00:00:00.0000000
through 9999-12-31 23:59:59.9999999, with an accuracy of 100
nanoseconds. depends on the precision and scale
selected, and runs from 6 to 8 bytes.
--Sample DATETIME2
output with a precision of 7
DECLARE @datetime2variable datetime2(7)
SET @datetime2variable =
Getdate()
PRINT
@datetime2variable
Result: 2008-08-15 14:27:51.5300000
DATETIMEOFFSET
DATETIMEOFFSET is
similar to DATETIME2, but includes additional information to track the
time zone. The format
is YYYY-MM-DD hh:mm:ss[.nnnnnnn] [+|-]hh:mm with a range of 0001-
01-01 00:00:00.0000000
through 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999.
Universal Time (UTC), with an accuracy of 100 nanoseconds. Storage
depends on the and scale selected, and runs from 8 to 10
bytes.
zone aware means a time zone identifier is stored as a part of
DATETIMEOFFSET column. time zone identification is represented by
a [-|+] hh:mm designation. A valid time zone falls in range of -14:00 to
+14:00, and this value is added or subtracted from UTC to obtain the
local
--Sample DATETIMEOFFSET
output with a precision of 0
--Specify a date, time, and time zone
DECLARE
@datetimeoffsetvariable DATETIMEOFFSET(0)
SET @datetimeoffsetvariable = '2008-10-03
09:00:00 -10:00'
--Specify a different date, time and time
zone
DECLARE
@datetimeoffsetvariable1 DATETIMEOFFSET(0)
SET @datetimeoffsetvariable1 = '2008-10-04
18:00:00 +0:00'
--Find the difference
in hours between the above dates, times,
--and timezones
SELECT
DATEDIFF(hh,@datetimeoffsetvariable,@datetimeoffsetvariable1)
Result: 23
Spatial
While spatial data has been stored in many
SQL Server databases for many years (using conventional data types), SQL Server
2008 introduces two specific spatial data types that can make it easier
for developers
to integrate spatial data in their SQL Server-based applications. In
addition, by storing spatial data in relational tables, it
becomes much easier to combine spatial data with other kinds of business data. For
example, by combining spatial data (such as longitude and latitude) with
the physical
address of a business, applications can be created to map business
locations on a map.
The two new spatial data types in SQL 2008
are:
• GEOMETRY: Used to store planar
(flat-earth) data. It is generally used to store XY coordinates that
represent points, lines, and polygons in a two-dimensional space. For
example storing XY coordinates
in the GEOMETRY data type can be used to map the exterior of a building.
• GEOGRAPHY:
Used to store ellipsoidal (round-earth) data. It is used to store
latitude and longitude coordinates
that represent points, lines, and polygons on the earth’s surface. For example, GPS data that
represents the lay of the land is one example of data that can be stored
in the GEOGRAPHY data
type.
GEOMETRY and GEOGRAPHY
data types are implemented as .NET CLR data types. This means that they can support
various properties and methods specific to the data. For example, a
method can
be used to calculate the distance between two GEOMETRY XY coordinates,
or the distance between two GEOGRAPHY latitude and
longitude coordinates. Another example is a method to see if two spatial objects
intersect or not. Methods defined by the Open Geospatial Consortium standard, and
Microsoft extensions to that standard, can be used. To take full
advantage of these methods, you will have to be an expert in
spatial data.Another feature of spatial data types is
that they support special spatial indexes. Unlike conventional indexes, spatial
indexes consist of a grid-based hierarchy in which each level of the
index subdivides the grid sector that is defined in the
level above. But like conventional indexes, the SQL Server query optimizer can use
spatial indexes to speed up the performance of queries that return
spatial data.Spatial data is an area unfamiliar to many
DBAs. If this is a topic you want to learn more about, you will need a good math
background, otherwise you will get lost very quickly.
HIERARCHYID
While hierarchical tree
structures are commonly used in many applications, SQL Server has, up
to not
made it easy to represent and store them in relational tables. In SQL
Server 2008, the HIERARCHYID data type has been added
to help resolve this problem. It is designed to store that represent the
position of nodes in a hierarchal tree structure. For example, the
HIERARCHYID data type makes it easier to express the following types of
relationships
without requiring multiple parent/child tables and complex joins:
- Organizational
structures
- A set of tasks that
make up a larger projects (like a GANTT chart)
- File
systems (folders and their sub-folders)
- A
classification of language terms
- A bill of materials to
assemble or build a product
- A graphical
representation of links between web pages
Unlike
standard data types, the HIERARCHYID data type is a CLR
user-defined type, and it exposes many methods that allow you to manipulate
the date stored within it. For example, there are methods to get the current
hierarchy level, get the previous level, get the next level, and many
more. In fact, the HIERARCHYID data type is only used to
store hierarchical data; it does not automatically represent a
hierarchical structure. It is the responsibility of the application to
create and assign HIERARCHYID values in a way that represents
the desired relationship. Think of a HIERARCHYID data type as a place to store
positional nodes of a tree structure, not as a way to create the tree
structure.
FILESTREAM
SQL Server is great for
storing relational data in a highly structured format, but it has never
been particularly
good at storing unstructured data, such as videos, graphic files, Word
documents, Excel spreadsheets, and so on. In the past, when
developers wanted to use SQL Server to manage such unstructured data, they
essentially had two choices:
- Store it in
VARBINARY(MAX) columns inside the database
- Store
the data outside of the database as part of the file system, and include
pointers inside a column that pointed to the file’s
location. This allowed an application that needed access to the file to find it
by looking up the file’s location from inside a SQL Server table.Neither
of these options was perfect. Storing unstructured data in
VARBINARY(MAX) columns offers less than ideal performance, has a 2
GB size limit, and can dramatically increase the size of a database. Likewise,
storing unstructured data in the file system requires the DBA to
overcome several difficulties.
For
example:
- Files
have a unique naming system that allows hundreds, if not thousands of
files to be keep track of and requires very careful
management of the folders to store the data.
- Security
is a problem and often requires using NTFS permissions to keep people
from accessing the files inappropriately.
- The DBA
has to perform separate backups of the database and the files
- Problems
can occur when outside files are modified or moved and the database is
not updated to reflect this.
To help resolve these
problems, SQL Server 2008 has introduced what is called FILESTREAM storage, essentially a
hybrid approach that combines the best features of the previous two
options.
Benefits of FILESTREAM
FILESTREAM storage
is implemented in SQL Server 2008 by storing VARBINARY(MAX) binary large objects (BLOBs)
outside of the database and in the NTFS file system. While this sounds
very similar
to the older method of storing unstructured data in the file system and
pointing to it from a column, it is much more sophisticated.
Instead of a simple link from a column to an outside file, the SQL Server Database
Engine has been integrated with the NTFS file system for optimum performance and ease
of administration. For example, FILESTREAM data uses the Windows
OS system
cache for caching data instead of the SQL Server buffer pool. This
allows SQL Server to do what it does best: manage structured data,
and allows the Windows OS to do what is does best: manage large files. In
addition, SQL Server handles all of the links between database columns
and the
files, so we don’t have to. In addition, FILESTREAM storage
offers these additional benefits:
- Transact-SQL can be
used to SELECT, INSERT, UPDATE, DELETE FILESTREAM data.
- By
default, FILESTREAM data is backed up and restored as part of the
database file. If you want, there is an option available so you can
backup a database without the FILESTREAM data.
- The
size of the stored data is only limited by the available space of the
file system. Standard VARBINARY(MAX) data is limited to 2 GB.
Limitations
of FILESTREAM
As you might expect, using FILESTREAM
storage is not right for every situation. For example, it is best used under the
following conditions:
- When the BLOB file
sizes average 1MB or higher.
- When fast read access
is important to your application.
- When applications are
being built that use a middle layer for application logic.
- When
encryption is not required, as it is not supported for FILESTREAM data.
If your
application doesn’t meet the above conditions, then using the standard
VARBINARY(MAX) data type might be your best option. If you are used to
storing binary data inside your database, or outside your database (but
using pointers
inside the database that point to the binary files), then you will find
using FILESTREAM storage to be substantially different. You
will want to thoroughly test your options before implementing one option
or the other, in any new applications you build.
How to Implement
FILESTREAM Storage Enabling SQL Server to use FILESTREAM data
is a multiple-step process, which includes:
- Enabling
the SQL Server instance to use FILESTREAM data
- Enabling
a SQL Server database to use FILESTREAM data
- Creating
FILESTREAM-enabled columns in a table, by specifying the
"VARBINARY(MAX) FILESTREAM" data type.