Which Sources Should Your Data Warehouse Talk To?

Which Sources Should Your Data Warehouse Talk To?

So you've built a data warehouse - now where are you going to get the information you need to populate it and derive insight?

How firms store and process data is often what now separates the best-performing firms from the laggards. Companies that know how to use their data have better insight into their operations, their customers and the wider market, and so are better placed to make the right decisions.

To make the most of this, you can't afford to waste time hunting down data from across the business whenever you need it - it must be available, in a single repository, at all times. For many firms, the solution to this will be to set up a data warehouse.

Once you have the tools for this, the next step will naturally be to load data into it. But how should you go about this? The sources you choose to feed into your warehouse will play a significant role in the type of analytics you will be able to access and what insight you can expect to achieve.

Unlike a data lake, you can't just dump every bit of data you collect into the repository and expect it to be useful. In order to work effectively, a data warehouse needs to rely on structured, pre-processed data that meets key quality metrics. It must therefore be accurate, correctly formatted and relevant to the task at hand.

As a result, there are a few key categories of data sources that your data warehouse should be communicating with. Here are a few that no business' strategy should be without.

Your operational systems

The most obvious source of information for your data warehouse will be in your firm's key databases and business applications. Enterprise resource planning systems, customer relationship management applications and human resources tools are all among your most useful assets for analytics.

However, it may not be as simple as plugging them directly into a data warehouse and running analytics processes. Traditional transactional databases such as Microsoft Access, DB2 and MySQL, are not typically designed for business intelligence or big data processing, as this data is not optimized for advanced reporting or analysis.

This is where Extract, Transform and Load (ETL) tools come in. These will be essential elements for loading all data sources into your warehouse, but they will be especially crucial in turning your operational data into a more useful form.

IoT sensors

One increasingly common source of data that every business will have to factor into their data warehouse planning will be the Internet of Things (IoT). With billions of sensors set to come into use in the coming years, they present lucrative opportunities to businesses - if firms can effectively take advantage of them.

According to IBM, IoT devices will soon be the world's biggest source of active analytics data. But how should you be looking to integrate this into a data warehouse? Much of this data will arrive in raw, unprocessed forms that may make it unsuitable for these systems. Therefore, it's important to identify what data from these devices will be important, and how it can be extracted and transformed into a more structured format before being added to a data warehouse.

For example, one source of IoT data with a clear real-world use is telematics boxes fitted to cars, which insurance companies can use to reward good drivers with lower premiums. To achieve this, the key task is to convert this raw data into clear scoring values that can be fed into an analytical model that is linked with personal details stored in the data warehouse.

External data

Information from outside the business is another key category that the data warehouse needs to access, and this can come in a wide range of forms. For example, financial services firms may need to link to external credit rating reports to assess suitability for certain products, while other firms will benefit from up-to-date market information.

There are a couple of ways businesses can obtain this information. The first is from publicly-available sources, such as social media, which can be very useful in helping to understand your customers. However, as this is typically unstructured data, you need to have a good ETL solution in place to manage this.

Government statistics can be another precious external source of data, and you can usually be assured of both the accuracy of this information and that it will be available in a format well-suited to analytics. The US government, for example, offers a wealth of resources, ranging from agricultural and manufacturing data to financial details.

Regardless of what sources your data warehouse talks to, you'll need to take care to clean the data gathered and ensure it is of high quality before you can run effective analytics that give the accurate insight you need to be successful.

Insights for Professionals provide free access to the latest thought leadership from global brands. We deliver subscriber value by creating and gathering specialist content for senior professionals. To view more IT content, click here.

Insights for Professionals