Data Classification for GDPR and Why You Should Do It


David King, Technical Director at Secon Cyber Security

Thursday, August 31, 2017

Data Classification is often overlooked but it could be the answer to your General Data Protection Regulation requirements.


If you store, process or transfer data about EU citizens (whether customers or your very own staff) then you should be aware of the forthcoming General Data Protection Regulation (GDPR). If you don’t know what it is and what it means for your business then you need to get up to speed, and quickly. The new Regulation comes into effect on 25th May 2018 and brings some serious implications and some extremely large fines.

In fact, managing data has, for many companies, always been a headache and with GDPR it probably just got worse.  If you don’t know where that EU citizen data is, then you have a real problem.  However, you shouldn’t just focus on Personal Information (PI) or Personally Identifiable Information (PII) but should consider it as part of a wider initiative.  

Data classification has been around for a very long time.   It first appeared in the UK in the late 19th Century and formed part of the Official Secrets Act 1889 entitled “An Act to prevent the Disclosure of Official Documents and Information”.  Yet, despite it having been around for over 125 years, it’s still mainly only government bodies and large financial institutions that seem to do it well.  There are, no doubt, many other organizations out there that do it correctly but there are many more companies that do it very badly or just don’t do it at all.  

If you’re a business that needs to comply with GDPR and you don’t yet have an effective data classification policy in place, why should you introduce one and where do you start?

Statistics indicate that anything up to 70% of unstructured data on a network could be considered ‘ROT-ten’ (Redundant, Obsolete or Trivial). By storing only what you need, for only as long as you need it, you will reduce your storage costs. Keeping data no longer than necessary is also a key consideration under GDPR. Removing what you don’t need can also lead to better indexing, faster access and quicker recovery times. Reducing what you store can also reduce your risk. After all, if you don’t have it, it can’t be stolen!

You should also classify data so that people understand what’s important. Visible document labelling can make people think twice about the way they handle data. This makes them more aware and generally increases their security consciousness, thereby improving the overall security of your business. It also tends to lead to improved document handling.

Once you’ve decided that data classification makes sense, you need to translate this into action, with GDPR at the center. One suggested approach (based on the Deming cycle) might look like this:

Discovery (Plan)

First you need to know where your data is.  In the old days it was relatively easy as it all used to be on-site, on local file servers or similar.  Today there are a myriad of public and private clouds, social media and sharing sites, and hybrid solutions integrating them all.  A study by Skyhigh Networks in Feb 2015 stated that “The average public sector organization uses 721 cloud services”.  That was over 2 years ago and there are even more cloud-based services available today.

File auditing tools and cloud-access security brokers (CASB) will help identify where files are and where they’re being sent.   You also need to understand not just who’s using it but who has access and who is responsible for it.  These are often very different things.  Just because an HR assistant has access to the data doesn’t necessarily mean they are the data owner.  Knowing who the data owner is and why you have that data is important.  Only the data owner can really say how important that data is.
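
As a rough illustration of the discovery step, the sketch below walks a single directory tree and records where each file is, how big it is, when it last changed and which account owns it on disk. It is a minimal sketch, not a replacement for a file auditing tool or CASB. The names `FileRecord` and `build_inventory` are invented for this example, and the filesystem owner it reports is not necessarily the business data owner discussed above.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    modified: datetime
    owner_uid: int  # filesystem owner only; the real data owner must be confirmed by people


def build_inventory(root: str) -> list[FileRecord]:
    """Walk `root` and record basic facts about every file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                info = os.stat(full_path)
            except OSError:
                continue  # unreadable or vanished file: skip it rather than abort
            records.append(
                FileRecord(
                    path=full_path,
                    size_bytes=info.st_size,
                    modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                    owner_uid=info.st_uid,
                )
            )
    return records
```

Calling `build_inventory` on a file share gives you a list you can sort by age or size before asking the data owners what each area is for.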

Deletion (Do)

Once you know where your data is, do you know ‘what’ it is? What does it contain and do you still need it? As discussed, you should get rid of anything that is no longer required (that ROT-ten data). You should also keep a record of what has been deleted and why, because accountability is another key principle within GDPR. Again, you will need to rely on the data owner to tell you whether the data needs to be kept or not.
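
One way to picture this step is the sketch below, which flags exact duplicates (redundant) and files untouched for a long time (obsolete), then builds a log entry for anything that is actually deleted. The function names, the tuple layout and the three-year age threshold are all assumptions made for this example. The threshold in particular is a placeholder, not a GDPR retention rule, and the output is only a list of candidates for the data owner to confirm.

```python
import hashlib
from datetime import datetime, timedelta


def content_digest(data: bytes) -> str:
    """SHA-256 of a file's bytes, used here only to spot exact duplicates."""
    return hashlib.sha256(data).hexdigest()


def flag_rot(files, now, max_age_days=365 * 3):
    """Return (path, reason) pairs for deletion candidates.

    `files` is an iterable of (path, digest, last_modified) tuples.
    """
    cutoff = now - timedelta(days=max_age_days)
    seen = {}  # digest -> first path seen with that content
    candidates = []
    for path, digest, modified in files:
        if digest in seen:
            candidates.append((path, f"redundant: duplicate of {seen[digest]}"))
            continue
        seen[digest] = path
        if modified < cutoff:
            candidates.append((path, "obsolete: not modified since cutoff"))
    return candidates


def deletion_record(path: str, reason: str, approved_by: str, when: datetime) -> dict:
    """One audit-log entry: what was deleted, why, who approved it and when."""
    return {
        "path": path,
        "reason": reason,
        "approved_by": approved_by,
        "deleted_at": when.isoformat(),
    }
```

Keeping the flagging separate from the deletion itself means nothing is removed until the data owner has signed it off, and the sign-off is captured in the record.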

Classification (Do)

With the data reduced to only what you need, and the data owners identified, you need to decide on a data classification scheme. This can be situation-specific or more generic. The UK Government’s protective marking scheme used to have 6 levels ranging from Unclassified and Protect through to Secret and Top Secret. For reasons of clarity, they recently reduced this to just 3 levels: OFFICIAL, SECRET and TOP SECRET. You need to choose something that is clear, simple and relevant to your business.

Whilst there are many data classification vendors out there, you could simply use headers and footers, watermarks and visible labelling, combined with a good user awareness and education process. However, tools help to automate this and can frequently help with the data discovery piece as well. They can often be configured to use your data classification scheme and to prompt users when creating or saving documents.
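
To make the idea of a scheme plus visible labelling concrete, here is a minimal sketch. The three labels are a hypothetical business scheme chosen for the example, and the email pattern is a deliberately crude stand-in for the kind of content check a commercial tool might perform. Real detection of personal data needs far more than one regular expression.

```python
import re
from enum import Enum


class Classification(Enum):
    # Hypothetical three-level scheme; pick names that suit your own business.
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Crude illustration only: treats anything shaped like an email address as personal data.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def suggest_classification(text: str) -> Classification:
    """Suggest a label to the user; the data owner makes the final decision."""
    if EMAIL_PATTERN.search(text):
        return Classification.CONFIDENTIAL
    return Classification.INTERNAL


def add_visible_label(text: str, level: Classification) -> str:
    """Wrap the text in a header and footer line carrying the label."""
    banner = f"[{level.name}]"
    return f"{banner}\n{text}\n{banner}"
```

The suggestion is only a prompt. Defaulting to INTERNAL rather than PUBLIC is a design choice that errs on the side of caution when nothing is detected.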

Monitoring (Check)

With less data to be monitored, and the subset of important data identified and suitably labelled (classified), the appropriate controls can be implemented. This usually takes the form of Data Loss Prevention (DLP). Putting controls around just this data is much more cost effective than trying to protect all the data you own.

Data breaches, under GDPR, must be notified to the relevant Supervisory Authority or SA (the ICO in the UK) within 72 hours of the organization becoming aware of them. Having a good monitoring solution makes this possible. The SA will want to know what data has been lost, how it was lost and what is being done about it. Depending on the type of breach and the data stolen, you may also have to notify the affected individuals.
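
The 72-hour window is simple arithmetic, but it is easy to lose track of during an incident. The sketch below works out the deadline and the hours left from whichever timestamp you treat as the start of the clock. The function names are invented for this example.

```python
from datetime import datetime, timedelta

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(clock_start: datetime) -> datetime:
    """Latest time by which the Supervisory Authority should be notified."""
    return clock_start + NOTIFICATION_WINDOW


def hours_remaining(clock_start: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    remaining = notification_deadline(clock_start) - now
    return remaining.total_seconds() / 3600
```

For a clock that starts at 09:00 on a Friday, the deadline falls at 09:00 on the following Monday, which is why weekend monitoring cover matters.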

Fines for breaches can be up to 4% of the company’s annual global turnover or €20m (whichever is the greater!). In the case of the Tesco Bank breach this would have been over £1.5bn. It’s expected that anything and everything done to identify and protect (PII) data can help reduce this. Some popular techniques include encryption and pseudonymization which, if done properly, can avoid you having to notify the affected individuals (NOTE: you will still have to notify the ICO or SA).
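
The “whichever is the greater” rule can be written in a couple of lines. This is a sketch of the upper limit only, using the figures quoted above. The function name is made up for the example, and the actual fine in any given case is for the regulator to decide.

```python
FLAT_CEILING_EUR = 20_000_000


def max_fine_eur(annual_global_turnover_eur: float) -> float:
    """Upper limit of a fine: the greater of 4% of turnover or EUR 20m."""
    four_percent = annual_global_turnover_eur * 4 / 100
    return max(four_percent, FLAT_CEILING_EUR)
```

For a company turning over €1bn the limit is €40m, whereas for one turning over €100m the 4% figure is only €4m, so the €20m ceiling applies instead.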

Review (Act)

Both documents and data age. What was Top Secret a year ago may now be public knowledge or Official (think Mergers and Acquisitions) so why continue to apply overly restrictive controls? Constant review and adjustment are required to ensure that the most appropriate controls are in place. You will also need to ensure the monitoring systems are still working effectively. Again, document everything.
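
A review cycle can be as simple as a date check per label. In the sketch below the review intervals are hypothetical values picked for illustration, not figures from any standard, and the function name is likewise invented. Set the intervals to whatever your own policy requires.

```python
from datetime import date, timedelta

# Hypothetical review intervals in days; adjust to your own policy.
REVIEW_INTERVAL_DAYS = {"TOP SECRET": 90, "SECRET": 180, "OFFICIAL": 365}


def is_review_due(label: str, last_reviewed: date, today: date) -> bool:
    """True when the time since the last review meets or exceeds the interval."""
    interval = timedelta(days=REVIEW_INTERVAL_DAYS[label])
    return today - last_reviewed >= interval
```

Running a check like this against your classified documents gives the data owners a regular list of items to downgrade, keep or delete.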


Data classification helps a company understand what data it has and what is important.  It can be used to identify data no longer required and help improve operational efficiency and lower costs.  It should help by also identifying data about EU Citizens so that appropriate organizational and technical controls can be implemented, a key element of any GDPR project.  It won’t make you GDPR compliant but it will go a long way to help.

David King

David King is the Technical Director at Secon Cyber Security, one of the UK's leading cyber security service providers. David is leading the way in innovating and developing Secon Cyber Security's Advanced Managed Security Services, including their Managed Detection and Response (MDR) and Managed Vulnerability Assessment services. David also has experience in pre-acquisition audit and global IT operations including security, infrastructure, applications and service delivery.  He completed a Master’s Degree in Cyber Security (2016) and specialised in Data Classification for his thesis.  He is also EU GDPR Practitioner Certified.

