Thursday, 8 September 2011

CROSS APPLY and OUTER APPLY

ProblemSQL Server 2005 introduced the APPLY operator, which is very much like a join clause and which allows joining between two table expressions i.e. joining a left/outer table expression with a right/inner table expression. The difference between join and APPLY operator becomes evident when you have a table-valued expression on the right side and you want this table-valued expression to be evaluated for each row from the left table expression. In this tip I am going to demonstrate what APPLY operator is, how it differs from regular JOINs and what are few of its applications.

SolutionThe APPLY operator allows you to join two table expressions; the right table expression is processed every time for each row from the left table expression. As you might have guessed, the left table expression is evaluated first and then right table expression is evaluated against each row of the left table expression for final result-set. The final result-set contains all the selected columns from the left table expression followed by all the columns of right table expression.

The APPLY operator comes in two variants, the CROSS APPLY and the OUTER APPLY. The CROSS APPLY operator returns only those rows from left table expression (in its final output) if it matches with right table expression. In other words, the right table expression returns rows for left table expression match only. Whereas the OUTER APPLY operator returns all the rows from left table expression irrespective of its match with the right table expression. For those rows for which there are no corresponding matches in right table expression, it contains NULL values in columns of right table expression. So you might now conclude, the CROSS APPLY is semantically equivalent to INNER JOIN (or to be more precise its like a CROSS JOIN with a correlated sub-query) with a implicit join condition of 1=1 whereas OUTER APPLY is semantically equivalent to LEFT OUTER JOIN.

You might be wondering if the same can be achieved with regular JOIN clause then why and when to use APPLY operator? Though the same can be achieved with normal JOIN, the need of APPLY arises if you have table-valued expression on right part and also in some cases use of APPLY operator boost the performance of your query. Let me explain you with help of some examples.

http://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/

Filtered Index

Filtered Index Design Guidelines

A filtered index is an optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes.

Filtered indexes can provide the following advantages over full-table indexes:

Improved query performance and plan quality
A well-designed filtered index improves query performance and execution plan quality because it is smaller than a full-table nonclustered index and has filtered statistics. The filtered statistics are more accurate than full-table statistics because they cover only the rows in the filtered index.
Reduced index maintenance costs
An index is maintained only when data manipulation language (DML) statements affect the data in the index. A filtered index reduces index maintenance costs compared with a full-table nonclustered index because it is smaller and is only maintained when the data in the index is affected. It is possible to have a large number of filtered indexes, especially when they contain data that is affected infrequently. Similarly, if a filtered index contains only the frequently affected data, the smaller size of the index reduces the cost of updating the statistics.
Reduced index storage costs
Creating a filtered index can reduce disk storage for nonclustered indexes when a full-table index is not necessary. You can replace a full-table nonclustered index with multiple filtered indexes without significantly increasing the storage requirements.

Wednesday, 7 September 2011

NEWSEQUENTIALID()

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.

If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.

NEWSEQUENTIALID() can only be used with DEFAULT constraints on table columns of type uniqueidentifier. For example:

CREATE TABLE myTable (ColumnA uniqueidentifier DEFAULT NEWSEQUENTIALID())

Surrogate keys

A surrogate key is a row identifier that has no connection to the data attributes in the row but simply makes the whole row unique. And that property is also the downside of it. Because it has no connection to the data attributes we can have two rows with the exact same data in all columns except the key column. This is usually handled at the application side and is an acceptable downside.

Partitioned Table and Index Strategies Using SQL Server 2008

Partitioned Table and Index Strategies Using SQL Server 2008

Partitioning makes large tables or indexes more manageable, because partitioning enables you to manage and access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. By using partitioning, an operation such as loading data from an OLTP to an OLAP system takes only seconds, instead of the minutes and hours the operation takes in earlier versions of SQL Server. Maintenance operations that are performed on subsets of data are also performed more efficiently because these operations target only the data that is required, instead of the whole table.

The data of partitioned tables and indexes is divided into units that can be spread across more than one filegroup in a database. The data is partitioned horizontally, so that groups of rows are mapped into individual partitions. The table or index is treated as a single logical entity when queries or updates are performed on the data. All partitions of a single index or table must reside in the same database.

Partitioned tables and indexes support all the properties and features associated with designing and querying standard tables and indexes, including constraints, defaults, identity and timestamp values, and triggers. Therefore, if you want to implement a partitioned view that is local to one server, you might want to implement a partitioned table instead.

Deciding whether to implement partitioning depends primarily on how large your table is or how large it will become, how it is being used, and how well it is performing against user queries and maintenance operations.

Generally, a large table might be appropriate for partitioning if both of the following are true:

The table contains, or is expected to contain, lots of data that are used in different ways.
Queries or updates against the table are not performing as intended, or maintenance costs exceed predefined maintenance periods.

http://msdn.microsoft.com/en-us/library/ms190787.aspx

SET TRANSACTION ISOLATION LEVEL

SET TRANSACTION ISOLATION LEVEL
    { READ UNCOMMITTED
    | READ COMMITTED
    | REPEATABLE READ
    | SNAPSHOT
    | SERIALIZABLE
    }
[ ; ]

READ UNCOMMITTED

Specifies that statements can read rows that have been modified by other transactions but not yet committed.

READ COMMITTED

Specifies that statements cannot read data that has been modified but not committed by other transactions. This prevents dirty reads. Data can be changed by other transactions between individual statements within the current transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.

REPEATABLE READ

Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes.

SNAPSHOT

Specifies that data read by any statement in a transaction will be the transactionally consistent version of the data that existed at the start of the transaction. The transaction can only recognize data modifications that were committed before the start of the transaction. Data modifications made by other transactions after the start of the current transaction are not visible to statements executing in the current transaction. The effect is as if the statements in a transaction get a snapshot of the committed data as it existed at the start of the transaction.

SERIALIZABLE

Specifies the following:

Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

http://msdn.microsoft.com/en-us/library/ms173763.aspx

Only one of the isolation level options can be set at a time, and it remains set for that connection until it is explicitly changed. All read operations performed within the transaction operate under the rules for the specified isolation level unless a table hint in the FROM clause of a statement specifies different locking or versioning behavior for a table.

Common Table Expressions

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

A CTE can be used to:

Create a recursive query. For more information, see Recursive Queries Using Common Table Expressions.
Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.
Reference the resulting table multiple times in the same statement.

Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.

CTEs can be defined in user-defined routines, such as functions, stored procedures, triggers, or views.

http://msdn.microsoft.com/en-us/library/ms190766.aspx

Example:

USE AdventureWorks2008R2;
GO
-- Define the CTE expression name and column list.
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
AS
-- Define the CTE query.
(
    SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
    FROM Sales.SalesOrderHeader
    WHERE SalesPersonID IS NOT NULL
)
-- Define the outer query referencing the CTE name.
SELECT SalesPersonID, COUNT(SalesOrderID) AS TotalSales, SalesYear
FROM Sales_CTE
GROUP BY SalesYear, SalesPersonID
ORDER BY SalesPersonID, SalesYear;
GO

FILESTREAM Storage in SQL Server 2008

Much data is unstructured, such as text documents, images, and videos. This unstructured data is often stored outside the database, separate from its structured data. This separation can cause data management complexities. Or, if the data is associated with structured storage, the file streaming capabilities and performance can be limited.

FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.

FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

http://technet.microsoft.com/en-us/library/bb933993.aspx

FOR XML Clause

Basic Syntax of the FOR XML Clause

The FOR XML mode can be RAW, AUTO, EXPLICIT, or PATH. It determines the shape of the resulting XML.

use to get output in XML format.

The FOR XML clause appears at the end of your SELECT statement. It provides four modes of XML generation which are RAW, AUTO, PATH and EXPLICIT. Let’s see what these options are.

FOR XML RAW:

It is useful for ad hoc FOR XML querying when the structure of the resultant Xml is not known from beginning. While using this you have to be very careful because the result of this can change dramatically when the underlying data table structure is changed.

FOR XML AUTO:

This is also useful for ad hoc FOR XML querying. This is mainly useful when you need to map the XML result back to the original columns in the source table. Its result can also change dramatically in above case.

FOR XML PATH:

This is designed for explicitly defining XML result structure. This is more appropriate option than above two while you are working in production environment because you always know the resultant XML structure.

FOR XML EXPLICIT:

This is original method for explicitly defining XML result structure. The operation of FOR XML EXPLICIT is more complex and less intuitive than FOR XML PATH. Better to use PATH when you know the structure of your resultant XML.

REMEMBER:

When you select data from SQL Server using FOR XML it will just create an XML fragment and to make it well formed XML you need to add ROOT elements to this fragment. You can add ROOT using FOR XML as well.

When the TYPE option is specified resultant data will be xml data type instance when you want to nest FOR XML option which inserts an inline XML schema to beginning of your XML result.

Understanding PATH Mode

Here I will show you couple of SQL Queries with the use of PATH mode. I am using Adventure works database and SQL server 2008 Enterprise Evolution version these two things can be found at following locations.

http://www.a2zdotnet.com/View.aspx?Id=69

Example:

select * from sim_details(nolock) where MSISDN like '447586653622'

FOR XML PATH,

ELEMENTS XSINIL

result will be in XML format

Types of data compression in SQL Server 2008

Row Compression and Page Compression:

Row Compression:

SQL Server 2008 addresses this issue by introducing a new record format to be used for compressed data that minimizes the overhead of storing the variable length values. It uses only 4 bits of overhead to store the length of the variable length column that is less than or equal to 8 bytes. Note that the previous record format is fully supported in SQL Server 2008 and is the default record format.

page Compression:

Now, if you insert multiple rows into this table, it is possible that many rows will have the same value (i.e. the default) for status column. SQL Server can take advantage of this by storing ‘full time’ value once on the page and then referring to this value from all other occurrences of this value on the same page. As you can imagine, the space savings using this strategy will depend on the amount redundant data on the page.

This strategy is exposed as PAGE compression externally through DDL. A customer can choose to enable ROW or PAGE compression on a table or an index or even on an individual partition(s). PAGE compression includes ROW compression.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2007/11/12/types-of-data-compression-in-sql-server-2008.aspx

SET XACT_ABORT

Specifies whether SQL Server automatically rolls back the current transaction when a Transact-SQL statement raises a run-time error.

When SET XACT_ABORT is ON, if a Transact-SQL statement raises a run-time error, the entire transaction is terminated and rolled back.

When SET XACT_ABORT is OFF, in some cases only the Transact-SQL statement that raised the error is rolled back and the transaction continues processing. Depending upon the severity of the error, the entire transaction may be rolled back even when SET XACT_ABORT is OFF. OFF is the default setting.

Compile errors, such as syntax errors, are not affected by SET XACT_ABORT.

XACT_ABORT must be set ON for data modification statements in an implicit or explicit transaction against most OLE DB providers, including SQL Server.

The only case where this option is not required is if the provider supports nested transactions.

Example:

SET XACT_ABORT OFF;
GO
BEGIN TRANSACTION;
INSERT INTO t2 VALUES (1);
INSERT INTO t2 VALUES (2); -- Foreign key error.
INSERT INTO t2 VALUES (3);
COMMIT TRANSACTION;
GO
SET XACT_ABORT ON;
GO
BEGIN TRANSACTION;
INSERT INTO t2 VALUES (4);
INSERT INTO t2 VALUES (5); -- Foreign key error.
INSERT INTO t2 VALUES (6);
COMMIT TRANSACTION;
GO

FORCESEEK Table Hint

ForceSeek force the query to query only from Index seek.

The FORCESEEK table hint forces the query optimizer to use only an index seek operation as the access path to the data in the table or view referenced in the query. You can use this table hint to override the default plan chosen by the query optimizer to avoid performance issues caused by an inefficient query plan.

For example, if a plan contains table or index scan operators, and the corresponding tables cause a high number of reads during the execution of the query, as observed in the STATISTICS IO output, forcing an index seek operation may yield better query performance. This is especially true when inaccurate cardinality or cost estimations cause the optimizer to favor scan operations at plan compilation time.

FORCESEEK applies to both clustered and nonclustered index seek operations. It can be specified for any table or view in the FROM clause of a SELECT statement and in the FROM <table_source> clause of an UPDATE or DELETE statement.

Example:

SELECT *
FROM Sales.SalesOrderHeader AS h
INNER JOIN Sales.SalesOrderDetail AS d WITH (FORCESEEK)
    ON h.SalesOrderID = d.SalesOrderID 
WHERE h.TotalDue > 100
AND (d.OrderQty > 5 OR d.LineTotal < 1000.00);
GO

http://msdn.microsoft.com/en-us/library/bb510478.aspx