To answer this question, I think we first need to look at what Big Data is. There is no single definition, but I think this is a pretty good one: “the term big data is used to describe datasets with volumes so huge that they are beyond the ability of typical Database Management systems to capture, store and analyse”.
Gartner analyst Doug Laney devised the 3Vs model for Big Data. Gartner’s definition is “Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”
We look at what big data can do in terms of advances in science, medicine and predictive analytics, and we are amazed at the cleverness of it all. The ability to predict our intelligence based on whether we liked a curly fries image on Facebook is both amazing and disturbing at the same time (see Jennifer Golbeck’s TED Talk, The Curly Fry Conundrum).
I am fascinated by the opportunities that Big Data analytics can bring, but I am more than a little concerned about what can and will happen in the future if our data is used for less than advantageous ends. Take, for example, a documentary I watched recently: BBC Horizon, The Age of Big Data.
The programme looked at how Big Data was used for crime prediction in Los Angeles. The analysis was so powerful that it was possible to predict where and when, and possibly by whom, the next crime would be committed. Does that mean I could be pre-imprisoned just in case? OK, this is an extreme example, but the movie “Minority Report” comes to mind.
So, who is watching Big Brother?
Thankfully, in Europe we have strong Data Protection regulations, which are due to get stronger with the introduction of the GDPR (General Data Protection Regulation), adopted in April 2016. See my recent blog on the Data Protection Road Map.
An extensive document published by the ICO in the UK, Big Data and Data Protection, sets out to discuss and address the implications and compatibility of Big Data and Data Protection. If one looks at the core principles of data protection in the context of big data and big data analytics, there are some key concerns to be addressed. The ICO has captured a summary of practical aspects to consider when using personal data for Big Data analytics:
2 Important Points to Remember
- Big Data is characterised by volume, variety, velocity of “all” data.
- Data Protection is interested because it involves the processing of personal data.
So does this alleviate concerns?
Potentially, yes. There are many methods and tools available to organisations that not only protect our personal data but also remove the element that identifies the individual. Anonymisation is one approach.
Applied correctly, anonymisation means data is no longer personal data. Anonymisation seeks to strip out any identifying information so that the individual can no longer be identified by the data alone or in combination with other data. Anonymisation is not just about sanitising the data; it is also a means of mitigating the risk of inadvertent disclosure or loss of personal data. Organisations will need to demonstrate that anonymisation was carried out robustly. From a business perspective, this should be balanced with adopting solutions that are proportionate to the risk.
The ICO has published an extensive Anonymisation Code of Practice, which it claims is the first of its kind from any European Data Protection authority. It provides excellent guidance and also suggests some anonymisation techniques, including data masking, pseudonymisation, aggregation, derived data items and banding. A further useful resource is UKAN, the UK Anonymisation Network.
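To make a few of these techniques more concrete, here is a minimal Python sketch of pseudonymisation, data masking, banding and aggregation. It is only an illustration under my own assumptions, not code taken from the ICO guidance: the function names, the record fields (name, postcode, age) and the secret key are all hypothetical, and the sample values in the usage note are made up.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key (an assumption for this sketch). In practice it
# would be generated securely and stored separately from the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Pseudonymisation: replace an identifier with a keyed hash (HMAC-SHA256).

    The digest is truncated to 16 hex characters purely for readability.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def mask_postcode(postcode: str) -> str:
    """Data masking: keep only the first part of a space-separated postcode."""
    parts = postcode.split()
    return parts[0] if parts else ""


def band_age(age: int, width: int = 10) -> str:
    """Banding: replace an exact age with a range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise_record(record: dict) -> dict:
    """Apply the three record-level techniques to one (hypothetical) record."""
    return {
        "person_ref": pseudonymise(record["name"]),
        "area": mask_postcode(record["postcode"]),
        "age_band": band_age(record["age"]),
    }


def aggregate_by_band(records: list) -> Counter:
    """Aggregation: report only how many people fall into each age band."""
    return Counter(r["age_band"] for r in records)
```

For example, anonymise_record({"name": "Jane Example", "postcode": "AB1 2CD", "age": 34}) would return a record with a hashed reference, the area "AB1" and the age band "30-39". Note that whoever holds the key can still link pseudonymised records back to a person, so a sketch like this would not by itself show that data has been anonymised to the standard described above.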
Is Big Data Compatible with Data Protection or not?
Ultimately, I believe it’s not actually about compatibility. Big Data and Data Protection are not mutually exclusive. They must and do co-exist. The challenge for organisations, now and even more so in the future, is building trust with individuals and operating ethically.
Data Protection principles should not be seen as a barrier to Big Data progress. Applying core principles such as fairness, transparency and consent as a framework for trust and ethics will encourage innovative ways of informing and engaging with the public in the future.
References & Bibliography
Gartner Framework http://www.gartner.com/it-glossary/big-data/. (accessed 16 April 2016)
IBM – http://insidebigdata.com/2013/01/04/video-how-to-successfully-manage-the-four-vs-of-big-data/. (accessed 16 April 2016)
TED Talks, Jennifer Golbeck, October 2013 https://www.ted.com/talks/jennifer_golbeck_the_curly_fry_conundrum_why_social_media_likes_say_more_than_you_might_think. (accessed 16 April 2016)
BBC Horizon – The Age of Big Data, November 2014 – http://www.dailymotion.com/video/x1z39o0_e11-the-age-of-big-data_tv (accessed 16 April 2016)
Alison Murphy, GDPR Data Protection Road Map – https://diganalytics.wordpress.com/2016/03/23/gdpr-data-protection/ (accessed 16 April 2016)
Information Commissioner’s Office – UK https://ico.org.uk/ (accessed 16 April 2016)
ICO, Big Data and Data Protection – https://ico.org.uk/media/for-organisations/documents/1541/big-data-and-data-protection.pdf (accessed 16 April 2016)
ICO, Anonymisation – https://ico.org.uk/media/for-organisations/documents/1061/anonymisation-code.pdf (accessed 16 April 2016)
UK Anonymisation Network – http://ukanon.net/about-us/ukan-activities/ (accessed 16 April 2016)
Management Information Systems: Managing the Digital Firm, Laudon & Laudon, 13th Ed, Global Ed, Pearson.
Big Data, Big Analytics – Minelli M., Chambers M., Dhiraj A., 2013 Wiley CIO Series.
Business Intelligence and Data Analytics, From Big Data to Big Impact – Chen et al., MIS Quarterly, Dec 2012.