With big data now an essential part of any business' activities, the actual process of getting data from its initial sources into a format suitable for use in analytics is becoming a top priority.
Many firms now find themselves collecting more data than ever before, thanks to improved abilities to track and record what customers are doing, while innovations such as the Internet of Things are set to provide another huge source of unstructured data in the coming years.
But this data won't be much use if it is just sitting in an Excel spreadsheet somewhere, so firms need a fast, effective way of importing this into their analytics solutions. And for many, this is where ETL technology comes in.
Standing for Extract, Transform and Load, these solutions take raw data from whatever initial form it arrives in, cleans and transforms it into a more standardized format that is easier for data scientists to work with, and adds it to a data warehouse where it can be processed.
This has become an everyday activity for many firms, but it's not the only way of getting information from the source and into a data warehouse. Increasingly, there's another similar, but not identical, alternative to consider in the form of ELT solutions.
ELT and ETL - what's the difference?
ELT has been around a while, but as data volumes and processing power grow, it's becoming more relevant for many organizations. The two initialisms stand for the same processes, but the difference between the two is more than just a swapping of letters.
While the two activities meet the same need - to take data and prepare it for analysis by cleaning, formatting and enriching it - ELT offers a different approach to how the process is carried out that could make it more suitable for certain scenarios, primarily when and how the transformation stage takes place.
With normal ELT, the process involves extracting the data from the source, transforming it, then loading it to the data warehouse. ELT, on the other hand, reverses the order of the last two activities.
This means data is extracted from the source and immediately moved to the data warehouse before it is transformed. And while this may seem like a minor distinction, it's actually a significant change.
A more flexible approach
This is because ETL is fundamentally a hands-on, batch process. It requires data engineers to interact with the data at each step of the process and is subject to the capacity limitations of data warehouses. It also often requires a lot of waiting around, as access to the information itself is not available until the entire process has been completed and the full selection of data has been successfully loaded.
As the volume of data continues to grow, this means data scientists could spend a lot of their time sitting around, as it could take hours or even days to complete the process on very large data sets. And in an environment that prizes real-time results, these delays could be the difference between timely insight into data and missing out.
ELT, by contrast, promises a much more flexible, streamlined approach that should enable IT teams to begin working on the data much more quickly. Instead of moving data to an interim staging area before being transformed, then loaded into the data warehouse, ELT uses standard tools such as FTP to transfer the data directly to the data warehouse infrastructure.
This makes it much more flexible than ETL. When using ELT solutions, users are able to run new transformations, test and enhance queries directly on the raw data as it is required. This removes much of the time and complexity traditionally associated with ETL alternatives.
When to use ETL for your data management
It may therefore sound like ELT, as a more modern and efficient option for managing data, should be the obvious choice for businesses looking to improve the pace of their big data analytics processes. However, it's not quite that simple, as both ELT and ETL have their own pros and cons that mean they may be more suitable for some operations than others.
So, when should organizations stick to more familiar ETL solutions? One application where this technology is usually the better choice may be if you're working on especially sensitive or regulated data. In these scenarios., if you're loading raw data into a large data warehouse, it may be easy to lose control of exactly what details are being uploaded and where to, which may put your business at risk of falling foul of compliance requirements.
In such cases, it may be sensible to sacrifice the higher ingestion speed of ELT in order to ensure you know exactly what's going into your database and avoid the need for more complex security solutions.
If you know your data is poor quality and will need significant cleansing operations or custom integrations to get it into a fit state for analytics, ETL is often a better option, as when using ELT, you will need skilled professionals to write custom code in order to undertake complex data transformation actions within the warehouse.
When will ELT be a better alternative?
By contrast, if speed matters to your business, than ELT is often the best choice. Because the data loading and transformation operations take place in parallel, ELT doesn't have to wait for the data to be transformed and then loaded. Therefore, if you need results fast - for example, if you are working with data sets that will quickly become outdated or irrelevant - ELT's ability to deliver raw information considerably faster will be invaluable.
This speed and flexibility is also a key advantage if you're working with extremely large data sets, and those that contain a lot of unstructured information. The less structured your data is, the longer the transformation stage will take, and when you scale this up to today's volumes, the knock-on impact of this can be huge. If you're working with petabytes rather than terabytes of source data, ELT may well be the only practical option.
Similarly, if you’re planning to use high-end data processing engines like Hadoop, or cloud data warehouses, ELT can take advantage of the native processing power for higher scalability.
ELT is a more complex solution than ETL, requiring users to possess an in-depth knowledge of tools and expert design of the main large repository. But as it is easier to scale and faster at dealing with very large volumes of data, it's likely to be the future for many businesses.