Wise man or Wise guy? You Decide

Just another WordPress.com weblog

Archive for the ‘Development’ Category

On vs. Where

Posted by sqlwiseguy on September 30, 2009

Does it matter if you put your criteria in the ON clause or the WHERE clause?  Well, as with most things SQL the answer is, “It depends”.  If you are dealing with INNER JOIN’s then it really doesn’t matter because the query optimizer is smart enough to come up with the same execution plan for both.  For example these 2 queries evaluate to the same execution plan:

SELECT
SOD.SalesOrderDetailID
,
SOH.SalesOrderID
FROM
Sales.SalesOrderHeader AS SOH
JOIN
Sales.SalesOrderDetail AS SOD
ON
SOH.SalesOrderID = SOD.SalesOrderID
WHERE
SOH.OrderDate >= '7/1/2004'
AND
SOH.OrderDate <
'8/1/2004'

SELECT
SOD.SalesOrderDetailID
,
SOH.SalesOrderID
FROM
Sales.SalesOrderHeader AS SOH
,
Sales.SalesOrderDetail AS SOD
WHERE
SOH.SalesOrderID = SOD.SalesOrderID
AND
SOH.OrderDate >= '7/1/2004'
AND
SOH.OrderDate < '8/1/2004'

Execution Plan

The old SQL 6.5 OUTER JOIN syntax (*= and =*) has been discontinued beginning with SQL Server 2005, so you have to do the JOIN for OUTER JOIN’s in the ON as demonstrated in this code:

SELECT

   SOD.SalesOrderDetailID,

   SOH.SalesOrderID

FROM

   Sales.SalesOrderHeader AS SOH LEFT JOIN

   Sales.SalesOrderDetail AS SOD

        ON SOH.SalesOrderID = SOD.SalesOrderID

WHERE

   SOH.OrderDate >=‘7/1/2004’ AND

   SOH.OrderDate <‘8/1/2004’

Now let’s create a sandbox to play in.

If OBJECT_ID('sales.orders', 'U') Is Not Null
Begin
Drop Table
sales.orders;
End;

If OBJECT_ID('sales.order_items', 'U') Is Not Null
Begin
Drop Table
sales.order_items;
End;

If SCHEMA_ID('sales') Is Not Null
Begin
Drop Schema
sales;
End;

Go

Create Schema
sales;

Go
/*
Tables to hold sample data
*/
Create Table sales.orders
(
order_id INT IDENTITY(1,1)PRIMARY KEY,
customer_id INT
);

Create Table sales.order_items
(
order_detail_id INT IDENTITY(1, 1)PRIMARY KEY,
order_id INT,
product_id INT,
quantity INT
)

/*
Load Sample data
*/
INSERT INTO sales.orders (customer_id)
SELECT TOP 5
AO.[object_id]
FROM
sys.all_objects AS AO;

INSERT INTO sales.order_items
(
order_id,
product_id,
quantity
)
SELECT
1,
1,
7
Union ALL
Select
2,
1,
4
Union ALL
Select
3,
2,
6
Union ALL
Select
4,
2,
11
Union ALL
Select
5,
3,
1;

Now we want to return all the customers who have placed an order, but we only want to return the items where the quantity is greater than 5.  Here is method 1:
Select
O.customer_id,
OI.order_id,
OI.product_id,
OI.quantity
From
sales.orders AS O LEFT OUTER JOIN
sales.order_items AS OI ON
O.order_id = OI.order_id
Where
OI.quantity > 5;

This returns:

customer_id order_id    product_id  quantity
----------- ----------- ----------- -----------
3 1 1 7
7 3 2 6
8 4 2 11

Hmmm, we know we have orders from five customers, but this only returns the three rows.  Let’s look at the execution plan:

image

What’s that nest loops (inner join) operator?  Well, by putting the criteria for the RIGHT (second) table in the WHERE clause we essentially converted our LEFT OUTER JOIN to an INNER JOIN.  The correct way to get the data we want would be this way:

SELECT

    O.customer_id,

    OI.order_id,

    OI.product_id,

    OI.quantity

FROM

    sales.orders AS O LEFT OUTER JOIN

    sales.order_items AS OI ON

        O.order_id = OI.order_id AND

        OI.quantity > 5;

This returns what we would expect to see:

customer_id order_id    product_id  quantity
----------- ----------- ----------- -----------
3 1 1 7
5 NULL NULL NULL
7 3 2 6
8 4 2 11
17 NULL NULL NULL

And here is the execution plan:

image

Where you can see the Nested Loops (left outer join) operator.

So yes, it does matter where you put your criteria when dealing with OUTER JOINS.

Posted in Code Example, Development, Programming, SQL Server | Leave a Comment »

Venturing into Powershell

Posted by sqlwiseguy on September 23, 2009

As you probably know if you read my blog regularly, I recently put together a presentation called, Dive into the Default Trace, and as part of that presentation I wanted to provide examples of different ways to archive the Default Trace files.  Here are the option I considered:

  1. fn_trace_gettable to load the data into a table.
  2. SSIS to archive the files.
  3. PowerShell to archive the files.
  4. xp_cmdshell and the copy command.

I immediately eliminated #4 because I don’t have xp_cmdshell enabled and because Chad Miller (@cmille19) had said in his Powershell talk that anything that requires xp_cmdshell is a good candidate for Powershell.  So I sent Chad an email asking him where to start and he was kind enough to help out.  I got a “working” script together and sent it off to Chad for review and he kindly told me my errors (there were several) and included a modified script that cleaned up my code.  I did make a couple of minor modifications after receiving the script from Chad.  The idea and starting point were mine, even if much of the finished product is Chad’s.  So here’s the code (please comment if you have any improvements):

# Written with much Assistance from Chad Miller 
# (http://chadwickmiller.spaces.live.com/Blog)
param ($serverName
)

#######################
function
Get-SqlData
{
param([string]$serverName=$(throw 'serverName is required.'
),
[
string]$databaseName=$(throw 'databaseName is required.'
),
[
string]$query=$(throw 'query is required.'
))

Write-Verbose
"Get-SqlData serverName:$serverName
databaseName:$databaseName query:$query"

$connString =
"Server=$serverName;Database=$databaseName;
Integrated Security=SSPI;"
$da = New-Object "System.Data.SqlClient.SqlDataAdapter" ($query,$connString
)
$dt = New-Object
"System.Data.DataTable"
[void]$da.fill($dt
)
$dt

}
#Get-SqlData

#######################
function
Get-TraceInfo
{
param($serverName
)

$qry =
@"
SELECT
RIGHT(T.path, CharIndex('\',
Reverse(T.path))-1) as current_file,
Substring(T.path, 1, Len(T.Path) -
CharIndex('\', Reverse(T.path))+ 1) as trace_path
FROM
sys.traces T
WHERE
T.is_default = 1
"@

Get-SqlData $serverName 'master'
$qry

}
#Get-TraceInfo

#######################
# MAIN
#######################

if ($serverName -eq $null
)
{
$serverName =
$env:COMPUTERNAME
}

$result = Get-TraceInfo
$serverName
$copy_path
= $result.trace_path +
'Archive'

if ((Test-Path -Path $copy_path) -eq $false
)
{
New-Item -Path $result.trace_path -Name 'Archive' -ItemType
directory
}

Get-ChildItem -Path $result.trace_path -Exclude $result
.current_file |
where {$_.name -like "*.trc"} | Move-Item -destination
$copy_path

Once I knew what cmdlet’s I needed the PowerShell was probably the second easiest one to implement.  I know SSIS fairly well and it got a bit complex because of having to exclude the file currently in use.  Using fn_trace_gettable is simple because it allows me to use T-SQL which I have many years experience in.

Using PowerShell is really a matter of knowing what cmdlet’s you need, once you have that, the built in help (get-help) is very good, so you can just about do anything.  PowerShell is very powerful and flexible.  It can actually do so much that I think you need to be careful.  For example, the New-Item cmdlet can create just about any item depending on the ItemType you use.  ‘

So while I definitely recommend that any DBA learn some PowerShell I also recommend being very careful and Test, Test, Test!

Posted in Code Example, Development, PowerShell, Professional Development, Programming, Scripts, SQL Server | 3 Comments »

Types Matter!

Posted by sqlwiseguy on August 24, 2009

Louis Davidson (@DrSql) had an excellent series on his blog on the Pillars of Database Design and the one of the key elements of his first pillar A Cohesive Design is to for data types to make sense.  I couldn’t agree more.  I have seen more problems caused by using a character type to hold dates or numbers.  The developers, in each case I’m sure, decided that they would verify the type in the application, and I would bet they did.  What they didn’t do, was consider the case where a user get access via MS Access or some other tool and now can enter/modify data outside the application.  This is how you end up with Nobember 17 or February 30 as a date, 1O (yes that’s an letter o) or A17 as a number.  Now when it comes time to query those values or use them in a WHERE or JOIN, what are you going to do?  You will get conversion errors if you try to convert them to the proper data type.  Here’s a recent thread on SQLServerCentral where you can see many of the problems using a “generic” data type causes.
Look if the column is supposed to store dates use the appropriate DATE data type, most RDBMS products have a few, and if it is supposed to store numeric data use the proper NUMERIC data type.  So if you are storing a person’s age you don’t need BIGINT or DECIMAL, SMALLINT will work fine.  This is one of the things a database is designed to help with, verification that data is of the correct type, don’t keep the database engine from doing what it is designed to do!
Thanks for your suppport!

Posted in Best Practice, Development, Rants, SQL Server, SqlServerCentral | Leave a Comment »

Learning some javascript

Posted by sqlwiseguy on July 24, 2009

As part of my work I need to be a “Jack of all trades” meaning I not only need to be able to do SQL Server DBA and development tasks, but I also need to be capable of doing .NET and VFP development.  Yes, I did say VFP (Visual FoxPro for those who are too young to know what it is). 

The major project in which I am involved is using SQL Server and ASP.NET (C#) MVC.  Fortunately I had downloaded Scott Guthrie’s NerdDinner tutorial when it first came out and have just finished going through it.  In the tutorial I came across some code like this:

if (item.Description !== undefined) {
description = item.Description;
}

Well, I had never seen the !== operator before and assumed that it was a typo and was supposed to be:

if (item.Description != undefined) {
description = item.Description;
}

since I know that == the equality comparison operator and != is the inequality comparison operator.  Well, then I found the same operator later in the code, so I thought, “Hmmm, maybe this isn’t a typo, but a real operator?”, so off to www.google-vs-bing.com to see what I could find.  Well, I didn’t find anything helpful on page 1 of either search engine’s results.  Next a quick question on Twitter.  Ahhh, several replies that, “Yes !== and === are operators and they compare type AND value”.  Having always programmed in strongly typed languages this is new and interesting to me.  So basically in javascript 1 == “1” is true, but 1 === “1” is false.

I still would rather be in SQL, but at least now I am starting to understand more about javascript.

Posted in Development, Professional Development, Programming, SqlServerCentral | 1 Comment »

An Explanation of Why to Use Stored Procedures

Posted by sqlwiseguy on May 9, 2009

I wish I was a smart as the guys I interact with on sites like SQLServerCentral and TwitterPaul Nielsen, @PaulNielsen on Twitter, has an excellent blog post today on Why Use Stored Procedures that I think is a must read for everyone involved with SQL Server.  I am a huge advocate of stored procedures and Paul does much better job explaining the why’s than I ever could. 

So thanks Paul, now I have someplace to point the nay-sayers.

Posted in Blogs, Development, Lessons, Programming, SQL Server, Standards | Leave a Comment »

Deploying Custom SSIS Components

Posted by sqlwiseguy on April 9, 2009

This post is really for me.  I have written 1 custom component for SSIS 2005 and wrote an article on it for SQLServerCentral that was recently re-published.  In the discussion someone asked for specific steps for installing and using it.  Well, since it has been about a year, I couldn’t remember all the specifics so I’m blogging the steps so I have place to go the next time I need it.  So on with the steps:

  1. Register the component in the GAC (Global Assembly Cache).  There are 2 ways to this:
    • Drag and drop the component into C:\Windows\Assembly
    • Use the GAC utility that that is part of the Visual Studio SDK and the Windows SDK.  Here is the syntax:
        gacutil.exe  /if Path\AssemblyName
    • Depending on which one you installed it will be located in one (or both) of these locations if you used the default install:
      • C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin
      • C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin
  2. On your development machine put the component in one of the folders under C:\Program Files\Microsoft SQL Server\90\DTS (default install location):
    • \Connections
    • \ForEachEnumerators
    • \LogProviders
    • \PipelineComponents
    • \Tasks

The component I created is a Pipeline Component so it belongs in the PiplelineComponents folder.  Once you have put it here you can add it to your toolbox by right-clicking on the toolbox, selecting “Choose Items”, select the SSIS Data Flow Items tab (in this case), and select the Error Details component.

This is really a summarization of this blog post by Matthew Roche that you can read for more details.

Posted in Development, Personal, Published Articles, SQL Server, SqlServerCentral, SSIS, Tools | Leave a Comment »

Error Handling in SSIS re-featured on SQLServerCentral

Posted by sqlwiseguy on April 3, 2009

I am honored that my first ever published article, Error Handling in SSIS, was re-featured on SQLServerCentral today.  I don’t know how Steve selects articles to feature a second time, but I assume it was because they were popular.  Here are a couple of earlier blog posts about this article/subject:

Error Handling in SSIS Published

SSIS Error Logging Custom Component

Posted in Development, Professional Development, Programming, SQL Server, SqlServerCentral, SSIS | Leave a Comment »

What’s ownership chaining?

Posted by sqlwiseguy on March 25, 2009

Edit: 04-27-2009 K. Brian Kelley corrrectly pointed out in a comment that objects can still be owned by users in 2005 and later.  See his blog post to see how.

Edit:  While the code section is not as nice as I would like it is better.  Thanks for sticking with me.

I recently answered a question on SSC about how you granting execute rights on a stored procedure allows you to access tables without having any permissions granted on the tables.  The way that SQL Server implements this it through ownership chaining.  Here is the BOL (SQL Server 2005) definition of Ownership Chains:

When multiple database objects access each other sequentially, the sequence is known as a chain. Although such chains do not independently exist, when SQL Server 2005 traverses the links in a chain, SQL Server evaluates permissions on the constituent objects differently than it would if it were accessing the objects separately. These differences have important implications for managing security.
Ownership chaining enables managing access to multiple objects, such as multiple tables, by setting permissions on one object, such as a view. Ownership chaining also offers a slight performance advantage in scenarios that allow for skipping permission checks.

How Permissions Are Checked in a Chain

When an object is accessed through a chain, SQL Server first compares the owner of the object to the owner of the calling object. This is the previous link in the chain. If both objects have the same owner, permissions on the referenced object are not evaluated.

Prior to SQL Server 2005, objects were owned by users, and, in most cases, designers and developers used the dbo user to own common objects.  In my experience it was very rare for objects to be owned by a “real” user. Thus if the customers table and orders table are both owned by dbo, then you can create a customer_orders_get stored procedure owned by dbo, grant user Joe execute permissions on the stored procedure, and Joe would be able to successfully use the stored procedure to query the customers and orders tables without having select permissions on the tables.  This is pretty easy to understand, as long all the objects are dbo.object_name I can access them using ownership chaining.
SQL Server 2005 introduced schemas which removed the use of users as owners.  Now objects belong to schemas, not users.  From BOL:

Beginning in SQL Server 2005, each object belongs to a database schema. A database schema is a distinct namespace that is separate from a database user. You can think of a schema as a container of objects. Schemas can be created and altered in a database, and users can be granted access to a schema. A schema can be owned by any user, and schema ownership is transferable.

The question now is, how does ownership chaining work when using schemas?  As the last line of the BOL quote states, a schema is owned as well, and ownership chaining in SQL Server 2005 and greater is based on schema ownership.  So if schema A is owned by dbo then you would be able to use ownership chaining to access objects in dbo from schema A and vice versa.  I think the best way to explain is with an example:

USE [master]
GO

-- drop the DB if it exists
IF EXISTS ( SELECT
name
FROM
sys.databases
WHERE
name = N'oc_test' )
DROP DATABASE [oc_test]
GO

USE [master]
GO

-- create the test database
CREATE DATABASE [oc_test] ON PRIMARY
(
NAME = N'oc_test',
FILENAME = N'C:\Data\oc_test.mdf' ,
SIZE = 194560KB,
MAXSIZE = 1024000KB,
FILEGROWTH = 10240KB
)
LOG ON
(
NAME = N'oc_test_log',
FILENAME = N'C:\Logs\oc_test_log.ldf',
SIZE = 204800KB,
MAXSIZE = 204800KB,
FILEGROWTH = 10240KB
)
GO

-- switch to the new databse
USE oc_test ;

GO

-- drop the user if it exists
IF EXISTS ( SELECT
1
FROM
sys.database_principals AS DP
WHERE
DP.[name] = 'test_user' )
BEGIN
DROP USER test_user ;
END

-- create our test user
CREATE USER test_user WITHOUT LOGIN
WITH DEFAULT_SCHEMA = dbo ;
GO

-- drop proc and schema
IF OBJECT_ID('ownership_chaining.table_one_ins')
IS NOT NULL
BEGIN
DROP PROCEDURE ownership_chaining.table_one_ins ;
END

GO

IF SCHEMA_ID('ownership_chaining') IS NOT NULL
BEGIN
DROP SCHEMA ownership_chaining ;
END
GO

-- create our test schema, notice it is "owned" by dbo
CREATE SCHEMA ownership_chaining AUTHORIZATION dbo ;

GO

-- drop our test table if it exists
IF OBJECT_ID('dbo.table_one') IS NOT NULL
BEGIN
DROP TABLE dbo.table_one ;
END

/*
create a test table NOTICE it is in the dbo
schema which is owned by dbo
*/
CREATE TABLE dbo.table_one
(
column_one INT IDENTITY(1, 1),
column_two VARCHAR(10) NOT NULL
) ;

GO

/*
Create the test procedure in the
ownership_chaining schema which
is owned by dbo
*/
CREATE PROCEDURE ownership_chaining.table_one_ins
@column_two VARCHAR(10)
AS
INSERT INTO
dbo.table_one (column_two)
VALUES
(@column_two) ;

RETURN ;

GO

/*
give out test user execut permissions on the
procedure
*/
GRANT EXECUTE ON ownership_chaining.table_one_ins
TO test_user ;

GO

-- change user context
EXECUTE AS USER = 'test_user' ;

-- execute the procedure
EXEC ownership_chaining.table_one_ins
@column_two = 'Test 1' ;

-- go back to my login
Revert ;

-- verify that it worked which it should
SELECT
*
FROM
dbo.table_one AS _TO ;

/*
Now change the owner of the ownership_chaining
schema so that it is no longer owned by dbo
thus breaking the chain
*/
ALTER AUTHORIZATION ON SCHEMA::ownership_chaining
TO db_owner ;

-- change user context
EXECUTE AS USER = 'test_user' ;

/*
using try - catch so I get the revert as
I expect failure
*/
BEGIN TRY
EXEC ownership_chaining.table_one_ins
@column_two = 'Test 1' ;
END TRY
BEGIN CATCH
-- show the error message
SELECT
ERROR_MESSAGE() AS error ;

-- return to my user
revert ;
END CATCH

-- verify data
SELECT
*
FROM
dbo.table_one AS _TO ;
I hope this helps clear up what I found to be a somewhat confusing topic.

Posted in Development, Programming, SQL Server, SqlServerCentral | 6 Comments »

Measuring Developer Performance

Posted by sqlwiseguy on December 10, 2008

I just read this post on Joel Spolsky’s blog (Joel on Software) where he says, about using timesheets to measure performance:

We have a theory, here, that this is a bad idea. Using timesheets as a performance metric can lead to only one thing: bad data in timesheets.

You really need to read the whole post to understand where he is coming from, but I agree with his point. As soon as someone knows how you are measuring their performance, many people will find a way to “game” the system. What Joel doesn’t do, likely because this was not the real point of his post, is mention how he measures developer performance. So the question is:

How do you measure performance for developers?

I wish I had a good answer for this. I have always thought performance should be measured by what you accomplish, not how long it takes to accomplish it. If the task is to create three reports for the accounting department by the end of the week and one developer gets it done by Tuesday and surfs the net for the rest of the week and another developer gets it done on Friday, which one performed better? Both met the expectation, working under the assumption that the reports work the same, so how do you evaluate performance? Granted the one who finished sooner could be assigned more work so is more productive, but both met the requirement.

I had a short contract position writing SSRS reports for a company. The first day I showed up, they had a certain number of reports spec’d out for to work on and expected it to take me several days to get them done. Well, by the middle of the second day I had completed them all and the Business Analyst I was working with didn’t have anything else ready for me and I was expected to be there from 8 to 4:30. I couldn’t leave, but I wasn’t being productive, yet my employer was happy with my perfomancem and my contract, which was originally for four weeks, was extended and I ended up there for six months.

The point of the story is that I had days where I was paid for 8 hours and worked 3-4 because I was waiting on someone else. So was my performance good, they thought so. Also my timesheet showed 8 hours worked, which really isn’t accurate.

This also relates to the editorial at SQLServerCentral. Be sure to read the discussion as well.

Posted in Development, Performance, Professional Development | 4 Comments »