Data science is an exciting and innovative field, but it has a serious problem. According to a Gartner analyst, close to 85% of big data projects fail. This might sound like a death knell for the sector, but the issue isn’t with the technology; it’s that existing company infrastructures and systems simply aren’t equipped to handle the demands of data.
Part of the issue is that skilled professionals are in short supply. Demand for data scientists has boomed by an astonishing 256% in the US since December 2013, while over in the EU it’s estimated there will be around 769,000 unfilled data science positions by 2020. Given the amount of work involved in the sector, it’s clear that something needs to change.
The answer could be surprisingly simple: automate. Data science isn’t the only field experimenting with automation, but it’s perhaps the sector that could see the biggest impact from embracing it. The technology has the potential to revolutionize big data as it currently exists, and there’s plenty of evidence showing how it could be a game-changer.
Speeding up data science
While data science projects can be illuminating, interesting and exciting, there’s one thing they definitely aren’t: fast. It’s thought that the average project takes around two to three months to complete, and the longer a project takes, the more it costs. This in turn makes projects more likely to be considered failures if they don’t provide a clear and immediate return on investment.
So, how could we speed this up? One of the big issues with data science is the amount of time it takes to collect, clean and organize the information needed for the project. One survey estimated that this accounts for around 80% of data scientists’ work, and that around 75% of them consider it the least enjoyable part of the job.
Automating this could significantly speed up projects. Collecting and sorting data is something a computer is much better at than a person, yet it’s something data scientists spend most of their time doing.
Of course, not everything about this can be entrusted to a machine. Nevertheless, it’s estimated that 40% of data science tasks could be automated by 2020, drastically cutting down on the workload of the professionals involved and enabling projects to be completed much more quickly and more cost-effectively.
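To make the idea of automated cleaning a little more concrete, here is a minimal sketch in Python. It is only an illustration: the sample data, the column names and the `clean_rows` helper are all made up for this example, and a real project would need rules tailored to its own data.

```python
import csv
import io

# Hypothetical raw export with stray whitespace, inconsistent
# capitalization, a missing age, a bad age and a duplicate row.
RAW_CSV = """name,age,city
 Alice ,34,London
bob,,paris
 Alice ,34,London
Carol,forty,Berlin
"""


def clean_rows(raw_text):
    """Trim whitespace, normalize case, coerce ages and drop duplicates."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        # Anything that is not a plain whole number is treated as missing.
        age = int(age_text) if age_text.isdigit() else None
        key = (name, age, city)
        if key in seen:
            continue  # skip exact duplicates after normalization
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Each rule here is trivial, but applying such rules by hand across millions of records is exactly the kind of repetitive work that eats into a project’s schedule.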
More fields than expected can be automated
It’s probably clear that the most basic, manual tasks in data science can be automated with few issues. However, could automation really change the field that drastically? The more complex elements are always going to have to be undertaken by humans, right? Well, not exactly. In fact, more aspects of data science can be automated than you might think.
For example, Vice President of AI at IBM Research, Alexander Gray, pointed out that the element of data science most suited to automation is actually the modeling stage at the very end of the pipeline. This is often the main focus of data science training, so it’s understandable if it seems like an important task that should only be entrusted to humans. However, it’s actually best suited to artificial intelligence.
If you want proof, look no further than the most recent KaggleDays SF Hackathon, an 8.5-hour data science challenge. Google entered its modeling algorithm, AutoML, against the various Kaggle Masters and GrandMasters competing and ended up in second place. The result suggests that automated modeling can compete with some of the best human efforts.
Other areas of data science can also be automated. Microsoft has debuted software that can automate machine learning, for example, while various elements of coding can also be entrusted to AIs. Software exists to do everything from spotting errors in code to writing programs from scratch, so automation can be applied to more than you might think.
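For a flavor of what automated modeling involves, the toy sketch below fits several candidate models and keeps whichever scores best on held-out data. It is not how Google’s AutoML works internally; the candidate models, the data and the `auto_select` helper are all assumptions chosen to keep the example small.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the average training target."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Ordinary least-squares straight line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """Predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def auto_select(candidates, train, valid):
    """Fit every candidate on train and return the best by validation error."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *valid)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data following y = 2x + 1, split into training and validation sets.
train = ([0, 2, 4, 6, 8], [1, 5, 9, 13, 17])
valid = ([1, 3, 5, 7], [3, 7, 11, 15])

candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
best_name, scores = auto_select(candidates, train, valid)
print(best_name, scores)
```

Real automated modeling tools search far larger spaces of models and settings, but the core loop of fitting, scoring and selecting without human intervention is the same in spirit.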
Data scientists won’t die out (but might have to adapt)
Given all the different areas of data science that can be automated, you might be wondering about the future of the career. If everything can be automated, where will talented data scientists fit in? There’s no need to change jobs yet, as there are plenty of areas of the career that still require a human touch.
As commentators have pointed out, some of the most important aspects of data science are thought leadership and communication. Data scientists must kick off projects with client communication, then document and present the results. They will also need to guide the projects from start to finish.
Then there’s the fact that the field is still growing rapidly. Jeremy Achin, CEO of DataRobot, said:
“We need trillions of AI systems, and even if you automate 80% of the work, 20% of trillions is still a massive number. If everyone on the planet became data scientists there still wouldn’t be enough.”
So while there’s no shortage of work for talented data scientists to undertake, the more manual aspects of the role might be made quicker and easier by automation. This might well change the focus of the role, but doesn’t look likely to make it obsolete any time soon.