Hash function

Hash function as a personal data pseudonymisation technique

The Spanish Data Protection Authority (AEPD) has published new guidance on hash functions as a personal data pseudonymisation technique.

The GDPR refers to the pseudonymisation of personal data as one of the appropriate technical and organisational measures that data controllers may take to ensure a level of security appropriate to the risk. However, it does not specify how data can be pseudonymised. In this context, a hash function may be a suitable technique and, lucky us, the AEPD has prepared some guidance to clarify how it works. Do you want to learn more about hash functions as a personal data pseudonymisation technique? Keep reading!

What does ‘pseudonymisation’ mean?

According to the GDPR, ‘pseudonymisation’ means “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person”.

What is a hash function?

Have you ever tried to write a tweet of more than 280 characters and been stopped by Twitter’s character limit? Hash functions have come to save you!

A digest or hash function is a process which transforms any arbitrary dataset into a fixed-length series of characters, regardless of the size of the input data. For example, the full text of Romeo and Juliet may become a series of just one hundred numbers after being run through a hash function. You may be wondering: how? Many hash functions divide the input message into blocks and process them one after another, combining each block with the result of the previous ones until a single final value is obtained.
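To make the fixed-length property concrete, here is a minimal sketch in Python. It assumes SHA-256 from the standard hashlib module as the hash function (the guidance does not prescribe a specific algorithm), and the helper name sha256_hex is ours, for illustration only.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# A very short input and a much longer one both give a 64-character digest.
short_digest = sha256_hex("Romeo")
long_digest = sha256_hex("Two households, both alike in dignity, " * 1000)
print(len(short_digest), len(long_digest))

# Feeding the message piece by piece gives the same result as hashing it
# in one go, because the function consumes its input sequentially.
incremental = hashlib.sha256()
for chunk in (b"Romeo ", b"and ", b"Juliet"):
    incremental.update(chunk)
one_shot = hashlib.sha256(b"Romeo and Juliet").hexdigest()
same_result = incremental.hexdigest() == one_shot
```

Whatever the size of the input, the output length stays the same, which is why a hash value can work as a compact identifier.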

Hash and reidentification

How likely is it that the output of a hash can be reverted to the initial input? Let’s imagine a processing activity which intends to associate a hash value with each National Insurance Number in the UK. The main element that would allow or hinder reidentification is the “order” in the message space.

The message space is represented by all possible datasets which may be created and from which a hash may be generated (in our case, UK National Insurance Numbers). The stricter this “order” is (for example, if only National Insurance Numbers from women who are 30-45 years old were admitted), the smaller the set of numbers (the processing message space) will be. This helps ensure the effectiveness of the hash as a unique identifier (no collisions), but it also increases the likelihood of identifying the original message from the hash.

The degree of disorder in a dataset is called entropy. The smaller the message space and the lower the entropy, the lower the risk of collision in hash processing, but the more likely re-identification becomes. The reverse also holds: the higher the entropy, the higher the possibility of a collision, but the lower the risk of reidentification. This is why measuring the amount of information is one of the key factors to consider whenever a message is protected via hash functions or any other pseudonymisation or encryption technique.
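The effect of a small message space can be sketched in a few lines of Python. The example below is an illustration under our own assumptions: SHA-256 as the hash, and a hypothetical, heavily "ordered" identifier format in which the prefix "QQ" and the suffix "C" are known, so only six digits vary. It is not the real structure of the National Insurance Number space.

```python
import hashlib
import math


def sha256_hex(message: str) -> str:
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def candidate_ids(prefix: str = "QQ", suffix: str = "C"):
    """Enumerate the whole (hypothetical) message space: one million values."""
    for number in range(1_000_000):
        yield f"{prefix}{number:06d}{suffix}"


def reidentify(target_hash: str):
    """Hash every candidate until one matches the target hash."""
    for candidate in candidate_ids():
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


pseudonym = sha256_hex("QQ123456C")   # what the controller would store
recovered = reidentify(pseudonym)     # what an attacker could recover

# Entropy of a uniformly distributed space of one million values, in bits.
entropy_bits = math.log2(1_000_000)
```

With roughly 20 bits of entropy, the original identifier is recovered by simply trying every possible value, which is the re-identification risk described above.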

How does this apply to day-to-day business? It basically means that the more variables “order” the message space (e.g. individuals’ age, gender, socioeconomic status, nationality, etc.), the higher the risk of re-identification (i.e. the higher the risk of singling out an individual).

The risk of re-identification is even higher when additional information is linked to the hash.

Strategies to hinder re-identification

One strategy to hinder re-identification of the hash value is to use an encryption algorithm with a key that is stored confidentially by the data controller or by the other party taking part in the processing, so that the message is encrypted before the hash is computed.

The effectiveness of the encryption will depend on the environment (distributed environments may increase this risk), the vulnerability to attacks and the volume of encrypted information (the more information, the easier it will be to carry out a cryptanalysis), among others.
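As a rough illustration of how a secret key changes the picture, the sketch below uses HMAC-SHA256 from the Python standard library. Please note that this is a keyed hash, a related but different construction from the "encrypt, then hash" approach described above, chosen here only because it needs no third-party library. The function name keyed_pseudonym and the sample identifier are hypothetical.

```python
import hashlib
import hmac
import secrets


def keyed_pseudonym(identifier: str, key: bytes) -> str:
    """Derive a pseudonym that cannot be recomputed without the secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: the key is generated once and stored separately from the data.
secret_key = secrets.token_bytes(32)

first = keyed_pseudonym("QQ123456C", secret_key)
second = keyed_pseudonym("QQ123456C", secret_key)

# The same identifier under a different key gives an unrelated pseudonym.
other_key_result = keyed_pseudonym("QQ123456C", secrets.token_bytes(32))

# An unkeyed hash of the same identifier does not match either.
plain_hash = hashlib.sha256(b"QQ123456C").hexdigest()
```

Without the key, enumerating the message space no longer reveals the original identifier, so the protection depends on how well the key is kept confidential.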

As an alternative to encryption, random fields may be added to the original message, so the format of the original message is expanded to an “extended message”, which increases its entropy.
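The "extended message" idea can be sketched as follows. This is a minimal example under our own assumptions: SHA-256 as the hash, a 16-byte random field generated with the standard secrets module, and hypothetical function names.

```python
import hashlib
import secrets


def extended_hash(message: str, random_field: bytes) -> str:
    """Hash the extended message: the random field followed by the message."""
    extended_message = random_field + message.encode("utf-8")
    return hashlib.sha256(extended_message).hexdigest()


def pseudonymise(message: str):
    """Return the random field and the hash of the extended message."""
    random_field = secrets.token_bytes(16)  # adds up to 128 bits of entropy
    return random_field, extended_hash(message, random_field)


field_a, hash_a = pseudonymise("QQ123456C")
field_b, hash_b = pseudonymise("QQ123456C")

# Whoever holds the random field can recompute the same hash later on.
recomputed = extended_hash("QQ123456C", field_a)
```

The same original message now produces different hash values, and the random field becomes additional information that has to be kept separately and protected.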

However, the computation of the hash itself (e.g. the selection of a specific algorithm and its implementation), aspects related to the message space (e.g. entropy), linked information, physical security, human factors, etc. all involve weaknesses and introduce different risk elements, which make the hash function a pseudonymisation technique rather than an anonymisation one.

According to the AEPD, using hash techniques to pseudonymise or anonymise personal data must be justified by a re-identification risk analysis associated with the specific hash technique used in the processing. “In order to consider the hash technique an anonymisation technique, this risk analysis must also assess:

The organisational measures that guarantee the removal of any information that allows for reidentification.
The reasonable guarantee of the system robustness beyond the expected useful life of personal data.”

Do you require assistance with pseudonymisation and anonymisation techniques? Aphaia provides both GDPR adaptation consultancy services, including data protection impact assessments, and Data Protection Officer outsourcing.

GDPR in a Data-Driven Economy

“Insights on GDPR in a Data-Driven Economy” forum

Aphaia attended the “Insights on GDPR in a Data-Driven Economy” forum jointly organised by Denae and Queen Mary University of London.

The “Insights on GDPR in a Data-Driven Economy” forum took place last Tuesday 29th in Madrid. It was a half-day event where some of the most relevant professionals in the industry talked about the roles of the Supervisory Authorities under the GDPR, the data protection implications of Brexit and the upcoming ePrivacy Regulation.

One of the main points stressed by the speakers was the fact that the GDPR is not all about fines. In the words of Dr. Ian Walden, Professor of Information and Communications Law and Director of the Centre for Commercial Law Studies at Queen Mary University of London, “the whole process of ensuring compliance with the appropriate rules in order to protect data subjects’ rights is not only based on fines”. In a similar way, Rafael García Gozalo, coordinator of the International Area of the Spanish Supervisory Authority (AEPD), stated that “in the AEPD strategy plan 2015-2019, enforcement does not appear as a main target. It does not mean that the AEPD is not going to enforce, it only means that the way the AEPD is conceiving their supervisory role is not primarily aimed at enforcement”.

The forum comprised three interesting panel discussions:

“Organisations and Supervisory Authorities: GDPR enforcement Challenges”;
“Responding to Brexit in the context of a Data-Driven Economy: are we all ready?” and
“Data protection in action”.

The first panel discussed where enforcement actions originate. It is remarkable that 50% of data protection fines come from data subjects’ complaints. This means that static compliance is not all that matters: how controllers respond to the concerns of their customers, users or employees, and to any accidental data breach, makes a difference too. Mitigation measures should be placed at the top of compliance procedures.

The second panel focused on Brexit. As data transfers are one of the main concerns for businesses in the event of a hard Brexit (as detailed by Dr. Bostjan Makarovic in our blog), the speakers pointed out the importance of the Standard Contractual Clauses in this regard, and also the need to update them in line with the GDPR, as they were approved under the former Directive 95/46/EC.

The last panel comprised talks that covered issues such as compliance as a service, the roles of the data controller and the data processor, and the principal security architect (Secureworks).

We are very grateful to Ian Walden for inviting us to this interesting forum and we also want to thank Cristina Morales, Mabel Klimt, Estrella Gutiérrez, Rafael García Gozalo, Silvia Ruiz, Raúl Rubio, Paula Ortiz, Ulrich Wuermeling, Christopher Millard, Laura Aliaga and Alfredo Reino for their very valuable contributions.

Will your business be affected by Brexit? Aphaia’s data protection impact assessments and Data Protection Officer outsourcing will assist you with ensuring compliance.



CCPA vs GDPR

In this blog we take a look at the similarities and differences between the CCPA and the GDPR.

It has been a year and a half since the GDPR started to apply. Did you think you were done adapting all your data processes to the Regulation? Don’t miss this post! You might still have a lot of work to do with the new California Consumer Privacy Act (CCPA).

The CCPA was enacted in 2018 and will be effective from January 1, 2020. It is the first comprehensive law in the US to provide consumers with privacy rights. Businesses collecting, selling or disclosing California residents’ personal information might be subject to the CCPA requirements.

At this stage you may be wondering if the CCPA is the ‘Californian GDPR’. Don’t panic! We have prepared this blog to let you answer that question yourself. Aphaia has gone through the CCPA and the GDPR thoroughly in order to identify the most relevant similarities and differences between them and we have put together our findings in the lines below.

Who is obliged to comply with the CCPA?

While the GDPR applies to “controllers” regardless of their nature or their activity, the CCPA requirements only apply to entities (“businesses”) that:

are for-profit;
collect consumers’ personal information, or on behalf of which such information is collected;
determine the purposes and means of the processing of consumers’ personal information;
do business in California; and
meet any of the following thresholds:
have annual gross revenue in excess of $25 million;
alone or in combination, annually buy, receive for the business’s commercial purposes, sell or share for commercial purposes the personal information of 50,000 or more consumers, households, or devices; or
derive 50% or more of their annual revenues from selling consumers’ personal information.

The CCPA also applies to any entity that controls or is controlled by the business.
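The criteria above can be read as a simple checklist: all four general conditions must hold, plus at least one of the three thresholds. A minimal sketch in Python could look like this. The class, field and function names are our own assumptions for illustration, and the check is a simplification that does not cover entities controlling or controlled by a business.

```python
from dataclasses import dataclass


@dataclass
class BusinessProfile:
    for_profit: bool
    collects_personal_information: bool
    determines_purposes_and_means: bool
    does_business_in_california: bool
    annual_gross_revenue_usd: float
    consumers_households_devices_per_year: int
    revenue_share_from_selling_pi: float  # between 0.0 and 1.0


def meets_any_threshold(b: BusinessProfile) -> bool:
    """At least one of the three thresholds listed above must be met."""
    return (
        b.annual_gross_revenue_usd > 25_000_000
        or b.consumers_households_devices_per_year >= 50_000
        or b.revenue_share_from_selling_pi >= 0.5
    )


def is_ccpa_business(b: BusinessProfile) -> bool:
    """All four general conditions plus at least one threshold."""
    return (
        b.for_profit
        and b.collects_personal_information
        and b.determines_purposes_and_means
        and b.does_business_in_california
        and meets_any_threshold(b)
    )


# Hypothetical company: modest revenue, but data from 60,000 consumers a year.
example = BusinessProfile(True, True, True, True, 2_000_000, 60_000, 0.1)
print(is_ccpa_business(example))
```

In this hypothetical case the business would be covered because of the volume of personal information it handles, even though its revenue is well below $25 million.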

Are there territorial limits?

The CCPA applies to organisations that do business in California and, similar to the GDPR, even though it is not explicitly mentioned, it also seems to apply to those established outside California if they collect, sell or disclose California consumers’ personal information while conducting business in California.

Who has rights under the CCPA?

The GDPR covers the privacy rights of ‘data subjects’, each defined as “an identified or identifiable natural person”, whereas the CCPA protects ‘consumers’, understood as natural persons who are California residents.

Which processes involving data fall under the CCPA?

Whilst the GDPR refers to the ‘processing’ of personal data, the CCPA specifically covers ‘collecting’ and ‘selling’ personal data.

It is important to note that ‘collecting’ covers “buying, renting, gathering, obtaining, receiving, or accessing any personal information pertaining to a consumer by any means” and ‘selling’ comprises “renting, disclosing, releasing, disseminating, making available, transferring, or otherwise communicating personal information for monetary or other valuable consideration”. It should be stressed that ‘selling’ does not necessarily involve a payment being made in exchange for personal information.

What rights does the CCPA provide the consumers with?

Similar to the GDPR, the CCPA provides consumers with new rights, including a right to transparency about data collection, a right to be forgotten, and a right to opt out of having their data sold, which becomes opt in for minors. That said, Californian consumers have the following rights:

The right to know whether personal information is being collected about them.
The right to request the specific categories of information a business collects upon verifiable request.
The right to know what personal information is being collected about them, the categories of sources from which the information is collected, the business purposes for collecting or selling the information and the categories of third parties with which the information is shared.
The right to say “no” to the sale of personal information.
The right to delete their personal information.
The right to equal service and price, even if they exercise their privacy rights.

It is clear that the CCPA will have large implications for businesses in California (and all around the world!) as it is the strictest privacy law ever enacted in the US. However, with appropriate help, organisations will be able to manage the requirements and implement them step by step as happened with the GDPR almost two years ago.

Do you require assistance with CCPA compliance? Aphaia provides both GDPR and CCPA adaptation services, including data protection impact assessments and Data Protection Officer outsourcing.

South Summit 2019

Aphaia at South Summit 2019!

Aphaia attended the 2019 South Summit in Madrid, Spain.

South Summit 2019 took place last week in La Nave, Madrid (Spain). South Summit is an innovation and investing event that has been held annually since 2013.

South Summit provides startups, investors and corporations seeking to improve their global competitiveness with the chance to connect and create interesting business opportunities. Technology is the common thread among them all.

South Summit brings together the most disruptive ideas that will be market game changers in the coming years in several industries such as fintech, insurtech, e-commerce, marketing and healthcare.

Some of the amazing startups we had the chance to meet were Hoop Carpool, a sustainable mobility startup that promotes car sharing for daily trips to the office or to the university; IDOVEN, which uses AI to detect heart problems to prevent cardiac disease, heart attacks and sudden death; and GATACA, which provides digital identity solutions to deliver fast and secure access to digital services worldwide.

One of the highlights of South Summit is its Startup Competition, where 100 finalists out of more than 3000 participants show their projects to an expert jury. The South Summit 2019 global winners are the following:

Startup Competition Winner: Streamloots – Audience monetization for eSports.

Most disruptive startup: Bdeo – Providing visual intelligence to the Insurtech Industry.

Most scalable product: Influencity – Influencer Data-Driven technology.

Best team: Jubel – AI for travel expert advisor services.

Glovo and Badi were finalist startups in previous editions of South Summit.

Aphaia helps startups to comply with the relevant data protection, telecom, AI and IoT legislation both before launching their products and services and during the whole product life-cycle.

If you need advice to comply with the GDPR and other relevant legislation in your startup, Aphaia offers both AI ethics and Data Protection Impact Assessments, among other services. Contact us and let us know how we can help you.