Find out why you should be using columnstore indexing to boost the performance of your data warehouse.
We all want our databases to work as efficiently as possible. In today's environment, being able to run queries quickly and accurately can be the difference between the best-performing businesses and the laggards. There are many small tweaks you can make to improve performance, but a few larger steps can turn your data warehouse from a hatchback into a hypercar.
If you're running SQL Server databases, one change in particular can have a huge impact on performance: adopting columnstore indexes. They have been available since SQL Server 2012, and a series of new features introduced in SQL Server 2016 extended these capabilities to larger data warehouses.
But what is columnstore indexing, and how can you deploy it to get the speed boost you need?
What is columnstore indexing?
In a traditional rowstore table, data is stored in rows: each record is a row of its own, and many rows are packed together on each data page. For example, a simple employee table might contain entries like this:
Entry 1: Alice, Smith, Finance, Accountant
Entry 2: Bob, Jones, Sales, Sales Director
Entry 3: Charles, Williams, HR, Recruiter
And so on for the entire table, with each individual's details stored in their own row. In a columnstore index, the values of each column for every entry are stored together, separately from the other columns, so the same data as above would look like this:
First name: Alice, Bob, Charles
Surname: Smith, Jones, Williams
Department: Finance, Sales, HR
Job title: Accountant, Sales Director, Recruiter
The key difference is that each column is stored separately. This means that if you run a query that only needs employees' surnames or job titles, for example, you only have to read the relevant column rather than every field of every row in the table.
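To make the layout difference concrete, here is a minimal Python sketch that pivots the employee example above from rows into columns and counts how many values each approach touches to answer a surname-only query. It is a toy model of the idea, not how SQL Server lays out pages or segments; the names `rows`, `COLUMNS`, `to_columns` and the two query functions are hypothetical.

```python
# Hypothetical employee records, mirroring the example above (row-oriented layout).
rows = [
    ("Alice", "Smith", "Finance", "Accountant"),
    ("Bob", "Jones", "Sales", "Sales Director"),
    ("Charles", "Williams", "HR", "Recruiter"),
]
COLUMNS = ("first_name", "surname", "department", "job_title")


def to_columns(records):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(COLUMNS)}


def surnames_from_rows(records):
    """Rowstore-style scan: every field of every row is touched."""
    values_read = sum(len(rec) for rec in records)
    return [rec[1] for rec in records], values_read


def surnames_from_columns(columns):
    """Columnstore-style scan: only the surname column is touched."""
    col = columns["surname"]
    return list(col), len(col)


cols = to_columns(rows)
print(surnames_from_rows(rows))     # (['Smith', 'Jones', 'Williams'], 12)
print(surnames_from_columns(cols))  # (['Smith', 'Jones', 'Williams'], 3)
```

Both functions return the same answer; the difference is that the columnar version reads 3 values instead of 12, and that gap widens as tables get wider.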
Why columnstore indexing matters
Instead of reading every field of each entry, a query only needs to fetch the specific columns it references. This means much less data is read from disk, reducing the number of input/output (I/O) operations needed.
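A back-of-the-envelope calculation shows why this matters. The sketch below assumes a hypothetical seven-column fact table with illustrative average column widths, and compares the bytes a full rowstore scan would read with the bytes needed when a query only references two columns. It deliberately ignores compression, page overhead and caching, so treat the ratio as an illustration rather than a prediction.

```python
# Hypothetical fact table: column name -> average stored width in bytes (illustrative only).
COLUMN_WIDTHS = {
    "order_id": 8,
    "order_date": 3,
    "customer_id": 4,
    "product_id": 4,
    "quantity": 4,
    "unit_price": 8,
    "comments": 200,
}


def bytes_read_rowstore(row_count, widths):
    # A rowstore scan reads whole rows, so every column's bytes are read.
    return row_count * sum(widths.values())


def bytes_read_columnstore(row_count, widths, needed):
    # A columnstore scan reads only the columns the query references.
    return row_count * sum(widths[c] for c in needed)


n = 10_000_000
row_bytes = bytes_read_rowstore(n, COLUMN_WIDTHS)
col_bytes = bytes_read_columnstore(n, COLUMN_WIDTHS, ["order_date", "quantity"])
print(row_bytes, col_bytes, row_bytes / col_bytes)  # 2310000000 70000000 33.0
```

A query such as "total quantity per day" only needs `order_date` and `quantity`, so under these assumptions it reads about 33 times less data than a full row scan.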
Because each column is made up of similar values, it compresses well. Compressed columns take up less space in memory, so the server can keep more data in memory and carry out more queries and data operations there, which boosts overall query performance.
The upshot is that you can see significant performance gains over rowstore storage. Depending on the workload, columnstore indexes can achieve up to ten times the query performance of traditional row-oriented storage, and up to seven times better data compression compared with the uncompressed data size.
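Why do similar values compress well? Two common columnar techniques are dictionary encoding (replace each distinct value with a small integer code) and run-length encoding (store a value once along with how many times it repeats). The sketch below applies both to a hypothetical department column. It is a simplified illustration of the general idea, not SQL Server's actual on-disk format.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each string with a small integer code; return (dictionary, codes)."""
    dictionary = {}
    codes = []
    for v in values:
        # setdefault assigns the next free code the first time a value is seen.
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    return [(code, len(list(group))) for code, group in groupby(codes)]


# Hypothetical department column for 12 rows, sorted so equal values are adjacent.
department = ["Finance"] * 5 + ["HR"] * 3 + ["Sales"] * 4
dictionary, codes = dictionary_encode(department)
runs = run_length_encode(codes)
print(dictionary)  # ['Finance', 'HR', 'Sales']
print(runs)        # [(0, 5), (1, 3), (2, 4)]
```

Twelve strings shrink to three dictionary entries plus three (code, count) pairs. Run-length encoding pays off most when equal values sit next to each other, which is far more likely within a single column than across the mixed fields of a row.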
When is a columnstore solution the best option?
This method of storing data won't be appropriate for every dataset, so it pays to know when to use it. Generally speaking, columnstore indexes deliver the biggest performance gains for analytic queries that scan large amounts of highly structured data, especially on large tables.
Columnstore indexes are well suited to data warehousing and analytics workloads, especially on fact tables, because these queries tend to require full table scans rather than index seeks.
On the other hand, rowstore indexes still have their place, particularly for queries that seek a specific value or look up a small range of values.
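The seek-versus-scan distinction can be sketched with a Python analogy. Below, a hypothetical list sorted by `order_ids` stands in for a rowstore B-tree key: finding one order is a binary search that touches roughly log2(n) keys. An aggregate such as a grand total, by contrast, has to read every value in the `amounts` column, which is the access pattern columnar storage is built for. This illustrates the access patterns only; it is not how SQL Server executes either query.

```python
import bisect

# Hypothetical sales data, already sorted by order_id (like a clustered B-tree key).
order_ids = list(range(1, 100_001))
amounts = [(i % 50) + 1 for i in order_ids]


def seek_amount(order_id):
    """Seek: binary search touches about log2(n) keys to find one row."""
    pos = bisect.bisect_left(order_ids, order_id)
    if pos < len(order_ids) and order_ids[pos] == order_id:
        return amounts[pos]
    return None


def total_amount():
    """Scan: an aggregate must read every value in the amount column."""
    return sum(amounts)


print(seek_amount(42_000))  # 1
print(total_amount())       # 2550000
```

For the point lookup, reading around 17 keys out of 100,000 is already cheap, so columnar storage has little to add. For the total, every value must be read, and reading one compact column instead of whole rows is where a columnstore index pays off.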
The difference between clustered and non-clustered
Columnstore indexes can be clustered or non-clustered. Both store data in the same columnar format, but a non-clustered columnstore index is a secondary index created on a rowstore table, whereas a clustered columnstore index is the primary storage for the entire table.
A non-clustered columnstore index lets users run analytic queries against the columnstore index while transactional operations continue on the underlying rowstore table. (In SQL Server 2012, adding one made the table read-only; from SQL Server 2016 onward, non-clustered columnstore indexes are updateable.) A clustered columnstore index is also updateable, allowing users to perform insert, update and delete operations on it. Because it is the physical storage for the entire table, a table can have only one clustered columnstore index, although from SQL Server 2016 onward you can also create non-clustered rowstore indexes on top of it.
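In SQL Server, the two forms are created with different Transact-SQL statements: a clustered columnstore index takes no column list because it covers the whole table, while a non-clustered one names the columns to include. The small Python helper below simply builds those statement strings so the difference is visible side by side. The function name, table names and index names are hypothetical; check the official `CREATE COLUMNSTORE INDEX` documentation for the full set of options before running anything against a real database.

```python
def columnstore_ddl(index_name, table, clustered, columns=None):
    """Return a T-SQL CREATE ... COLUMNSTORE INDEX statement as a string.

    Names are hypothetical; validate or quote real identifiers before use,
    since they are interpolated directly into the SQL text.
    """
    if clustered:
        # A clustered columnstore index is the table's storage, so it covers all columns.
        if columns:
            raise ValueError("a clustered columnstore index covers the whole table; omit columns")
        return f"CREATE CLUSTERED COLUMNSTORE INDEX {index_name} ON {table};"
    # A non-clustered columnstore index is a secondary copy of the listed columns.
    if not columns:
        raise ValueError("a non-clustered columnstore index needs a column list")
    column_list = ", ".join(columns)
    return f"CREATE NONCLUSTERED COLUMNSTORE INDEX {index_name} ON {table} ({column_list});"


print(columnstore_ddl("cci_FactSales", "dbo.FactSales", clustered=True))
# CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;
print(columnstore_ddl("ncci_Orders", "dbo.Orders", clustered=False,
                      columns=["OrderDate", "ProductID", "Quantity"]))
# CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders ON dbo.Orders (OrderDate, ProductID, Quantity);
```

You would then execute the resulting statement through whatever database driver or query tool you normally use.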
Understanding when to use columnstore indexing can provide your business with the speeds and efficiency it needs to transform how it interacts with data warehouses, delivering essential information to users at speeds that would not be possible with traditional solutions. If you aren't using them yet, now's the time to change that.
Insights for Professionals provide free access to the latest thought leadership from global brands. We deliver subscriber value by creating and gathering specialist content for senior professionals.