Understand everything about anonymization
Articles published by Data Comply One (formerly Mission RGPD) accompany our "1 minute to understand everything" episodes. These are short videos in which our lawyers, who are experts in personal data protection, give simple definitions of key concepts of the General Data Protection Regulation (GDPR) along with concrete examples. Follow us on LinkedIn so you don't miss any news!
Today we are addressing the notion of anonymization. Make yourself comfortable, we'll explain everything to you in 5 minutes!
What is anonymization?
Anonymization is one of the technical measures that data controllers can implement. According to the CNIL, it is a process that uses a set of techniques to make it impossible in practice, and irreversibly so, to identify a person by any means whatsoever. Once anonymized, the data is no longer subject to the GDPR, since it is no longer personal data.
Anonymization allows data to be retained for a defined processing operation without it being attributable to the data subject. It is useful for keeping data beyond its initial retention period, but also for reusing it, publishing it, and so on. However, vigilance is required: you must make sure that the anonymization process is truly effective. When anonymization is carried out correctly, it allows the use of data that was originally personal data while limiting the risk of intruding on people's privacy. The person who then accesses the data only sees an anonymized view of the dataset.
There are different anonymization methods, grouped into two families: randomization and generalization. Each family encompasses several techniques. To randomize a dataset, the main techniques are noise addition, permutation, and differential privacy. For generalization, the main techniques are aggregation and k-anonymity, l-diversity, and t-closeness. Each method serves a different purpose and preserves a different level of accuracy in the data. To determine which method to deploy, the purpose of the processing must first be identified. The purpose for which the data is anonymized can be, for example, the production of statistics. The method presented below is just a simplified example of anonymization by generalization (k-anonymity technique).
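To give an idea of what the randomization family looks like in practice, here is a minimal sketch of noise addition and permutation. The function names, the salary values, and the noise scale are hypothetical and chosen only for illustration. On its own, neither transformation should be assumed to meet the standard of anonymization described above.

```python
import random


def add_noise(values, scale, rng):
    """Return a copy of values with uniform noise in [-scale, +scale] added to each one."""
    return [v + rng.uniform(-scale, scale) for v in values]


def permute(values, rng):
    """Return a shuffled copy of values, so a value no longer sits on its original row."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


# Fixed seed only so that the sketch is reproducible.
rng = random.Random(0)

# Hypothetical salary column (invented values).
salaries = [32000, 41000, 38500, 52000]

noisy_salaries = add_noise(salaries, 1000, rng)
shuffled_salaries = permute(salaries, rng)
```

Noise addition changes each value slightly while keeping the overall distribution roughly intact. Permutation keeps the exact values but moves them between rows.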
How does generalization work?
Using a generalization technique, the data controller changes the scale of the attributes in the dataset, or their order of magnitude. For example, if the age of the data subjects is among the categories of data to be anonymized, generalizing it means replacing each person's precise age with an age group. The aim of this process is that individual records can no longer be singled out and the person can no longer be re-identified.
To illustrate, let's take the example of a dataset used by an organization's HR department. In this example, the department uses the data to compile statistics on people's salaries, based on their age and seniority in the company. The raw data is as follows:
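The age example can be sketched in a few lines of Python. The function name `age_band` and the default width of 10 years are assumptions made for this illustration, not a prescribed setting.

```python
def age_band(age, width=10):
    """Map an exact age to an age group such as '30-39'."""
    if age < 0 or width <= 0:
        raise ValueError("age must be >= 0 and width must be > 0")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


print(age_band(34))           # 30-39
print(age_band(40))           # 40-49
print(age_band(29, width=5))  # 25-29
```

The wider the band, the more people share the same value, and the less precise the resulting statistics.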

The first anonymization step is to delete the names of the data subjects, since this data is not necessary for the intended processing. As a reminder, Article 5 of the GDPR sets out an obligation to process only the data necessary for the defined purpose (data minimization principle). The registration number can also be used to identify people, provided you have a file that matches the registration number to the person's identity (such as the table above, or a pay slip). Since the registration number is not needed to produce the intended statistics, it should be removed as well.
Attributes such as age and seniority make it easy to identify a person. Because this data is needed to calculate the statistics the HR department wants, it cannot simply be deleted and must be anonymized instead. Finally, the sensitive data remains to be dealt with: salary. This data must also be anonymized. Here, the point of anonymizing the dataset is to break the identifying link between the name of the data subject and their salary.
To anonymize this data using a generalization method, the scale of the data must be changed by establishing age groups, salary bands, and an indication of whether the person's seniority date falls before or after a chosen date. This makes the data reusable but less accurate. In our example, the data after anonymization will be as follows:
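As an illustration, the steps described above (dropping the name and registration number, then generalizing age, seniority, and salary) could be sketched as follows. The records, field names, band widths, and cutoff date are all hypothetical and invented for this sketch.

```python
from datetime import date

# Hypothetical raw HR records (all names and values are invented).
RECORDS = [
    {"name": "Alice Martin", "reg_no": "E001", "age": 34,
     "hired": date(2012, 5, 1), "salary": 38500},
    {"name": "Bruno Leroy", "reg_no": "E002", "age": 37,
     "hired": date(2010, 9, 15), "salary": 41000},
    {"name": "Chloe Petit", "reg_no": "E003", "age": 45,
     "hired": date(2018, 3, 1), "salary": 52000},
]

# Assumed threshold date for the seniority attribute.
SENIORITY_CUTOFF = date(2015, 1, 1)


def band(value, width):
    """Replace an exact integer value with the range of the given width that contains it."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def generalize(record):
    """Build a new record without name or registration number, with generalized attributes."""
    hired_before = record["hired"] < SENIORITY_CUTOFF
    return {
        "age_group": band(record["age"], 10),
        "seniority": "before cutoff" if hired_before else "cutoff or later",
        "salary_band": band(record["salary"], 10000),
    }


anonymized = [generalize(r) for r in RECORDS]
for row in anonymized:
    print(row)
```

Building a new record from scratch, rather than deleting fields from the original, means that a column added to the raw data later is not carried over by accident.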

What are the pitfalls to avoid in an anonymization process?
The G29 (Article 29 Working Party) opinion on anonymization techniques (2014) identifies three conditions for an anonymization process to be reliable. It must be impossible:
- To single out: it should not be possible to isolate an individual in the dataset. In the example above, individuals are grouped into broad categories;
- To link: separate pieces of data relating to the same person or the same group of people must not be able to be linked together;
- To infer: it should not be possible to deduce information about a person with a high degree of probability.
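Two of these conditions can be checked roughly on a generalized dataset. The sketch below groups rows that share the same generalized attributes, then reports the size of the smallest group (the k of k-anonymity, related to singling out) and the smallest number of distinct sensitive values within a group (a simple form of l-diversity, related to inference). The rows, column names, and function names are hypothetical. Passing these checks is an indication only and does not prove that a dataset is anonymous.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values for all quasi-identifier columns."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return classes


def smallest_class_size(rows, quasi_identifiers):
    """Size of the smallest group: a value of 1 means someone can be singled out."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in classes.values())


def min_distinct_sensitive(rows, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values in any group: 1 means the value can be inferred."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in classes.values())


# Hypothetical generalized rows (invented values).
ROWS = [
    {"age_group": "30-39", "seniority": "before cutoff", "salary_band": "30000-39999"},
    {"age_group": "30-39", "seniority": "before cutoff", "salary_band": "40000-49999"},
    {"age_group": "40-49", "seniority": "cutoff or later", "salary_band": "50000-59999"},
    {"age_group": "40-49", "seniority": "cutoff or later", "salary_band": "50000-59999"},
]
QUASI_IDENTIFIERS = ["age_group", "seniority"]

k = smallest_class_size(ROWS, QUASI_IDENTIFIERS)
l = min_distinct_sensitive(ROWS, QUASI_IDENTIFIERS, "salary_band")
print(k, l)  # 2 1
```

Here every group contains two people, so nobody is isolated, but both people in the 40-49 group share the same salary band. Anyone who knows that a colleague belongs to that group can deduce the band, which is the inference pitfall.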
Data Comply One (formerly Mission RGPD) and anonymization
With Data Comply One (formerly Mission RGPD), you can record in a few clicks, in each processing sheet, the measures implemented to ensure data security, including anonymization! Attach the document that details your process. In the event of an inspection, you can easily find the necessary documents and the links between them!