As .NET developers, we often find ourselves waist deep in SQL Server queries and stored procedures. A good example is an ASP.NET application I recently developed and delivered to the client for testing. The application utilized an existing SQL Server database with most of the T-SQL contained in pre-existing stored procedures (aka, I didn’t write them). I delivered the application and the client complained about performance problems. The bottlenecks originated in the database tier, so I ended up spending countless hours organizing the T-SQL to boost performance.
This week, I share some tips that you may use in your projects. I focus on SQL Server, but many of the items are applicable to all database platforms.
Hard to avoid
Even though most organizations have database administrators and developers, it is hard to avoid SQL Server development. It is just one piece of the development puzzle. I’ve had many database developers tell me that I know what my application needs so I should write the query. While it is hard to argue with that logic, it is difficult to be proficient with T-SQL, while keeping up with the many facets of .NET development. However, the basic T-SQL language remains consistent. With that in mind, let’s cover a few guidelines for developing efficient database code.
Like any coding, there is more than one way to build your T-SQL queries. Here are a few guidelines for boosting performance.
You should always use a WHERE clause to narrow the number of rows returned. If you do not use a WHERE clause, SQL Server performs a full table scan and returns all rows. (There will be situations where all rows are necessary, but use a WHERE clause for all other instances.)
You should always use the WHERE clause in favor of a HAVING clause. When you use GROUP BY with the HAVING clause, the GROUP BY clause divides the rows into sets of grouped rows and aggregates their values, and then the HAVING clause eliminates undesired aggregated groups. In many cases, you can write your select statement so it will contain only WHERE and GROUP BY clauses without a HAVING clause.
While the asterisk (*) makes it easy to return all column values returned by a query, you should only return the necessary rows. The fewer number of columns in the result set produces less data thus less network traffic. The performance boost depends upon the possible number of columns, but it is good practice to always designate columns.
SQL Server cursors make it easy to loop through a result set, but the functionality comes at the price of performance. Cursors may be good for nightly server tasks, but avoid them in your application code/procedures. It is better to use a select statement to return the necessary values and process them on the client side.
The T-SQL COUNT function makes it easy to return the number of items returned by a query, but the function can be used with a specific column to improve performance. The problem is that SELECT COUNT(*) performs a full table scan to return the count.
You can designate a column in the function call, and you can also use the sysindexes table if only a total row count is necessary (without a WHERE clause). There is the ROWS column in the sysindexes table. It contains the row count for each table in the database. The following query returns the number of rows in the specified table:
SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2
Many developers have fallen in love with the DISTINCT option in a query. It allows you to return only unique row values—no duplicates. The problem is that it degrades performance, so use it only when absolutely necessary.
Returning a number of rows
You may only need a subset of a query. The TOP operator allows you to designate that only a certain number or percentage of items are returned in the query—regardless of the total values returned by the query. The following query returns the top ten items of the query:
SELECT TOP 10 CustomerID FROM Northwind.dbo.Orders WHERE Freight < 50.0
Or, you can return a percentage of the overall result set:
SELECT TOP 5 percent CustomerID FROM Northwind.dbo.Orders WHERE Freight < 50.0
What code to avoid
This may seem like a no-brainer, but you should avoid any and all code that does nothing. I’ve encountered countless stored procedures that include code that adds no functionality. It may be left over from another version of the procedure, but you should remove it if it isn’t used; or, you can comment it out to avoid performance degradation.
Use the database server
You should take advantage of the database platform and use stored procedures as opposed to including T-SQL in your client code. Stored procedure code is optimized by the database platform, so it should be utilized at all times to ensure code is running efficiently. In addition, views can be utilized to replace large queries and boost performance.
Indexes are your friend
Utilizing table indexes can greatly improve query performance if the indexes are used and applied correctly. Proper index creation is beyond this article, but there are plenty of resources available to provide more information. In addition, SQL Server does include the SQL Profiler tool to aid in locating performance bottlenecks.
More work to do
Your .NET code has been tested and everything is functioning as planned, but there may still be work to do on the database side. You can attack slow performing queries various ways, but I hope the guidelines in this article provide you with a sound starting point. In addition, you can utilize the SQL Server toolset to identify and address query issues. Please share your tips for optimizing queries with the community in the article discussion.