The Implications of the GDPR for Research Involving Genetic Data

by Laura Drechsler, Brussels Privacy Hub, LSTS, VUB


On 5 October 2017, the Brussels Privacy Hub organized a lunchtime debate on “The Implications of the GDPR for Research Involving Genetic Data”.

The aim of the debate was to present both the research necessities and the legal requirements for genetic data processing, considering the respective rules of the General Data Protection Regulation (GDPR). This unique multi-disciplinary approach was delivered by Liam Quinn (PhD researcher in computational genetics at University College London) and Professor Paul Quinn (Brussels Privacy Hub, LSTS, VUB) in front of a varied audience, with participants ranging from civil society representatives and academics to representatives of the EDPS and the private sector.

The debate started with a short introduction to the field of computational genetics. It was explained that DNA, as the blueprint for the development of all the cells in a living organism, holds immense research potential. DNA samples can now be obtained from a wide array of sources, including saliva, blood and urine. It is now even possible to recover human DNA from cave sediments, allowing new conclusions about genetic ancestry. DNA collection is to be distinguished from DNA extraction, for which several chemical methods can be applied. The bigger the DNA sample, the better the result, although there are methods to amplify an otherwise weak source. After extraction, digitisation is what made the field of computational genetics possible. In computational genetics, the rungs of the DNA ladder are converted into binary code, on the assumption that the human genome knows only two base combinations. This allows computational geneticists to code their data, which greatly streamlines the analysis of genetic data and improves its fidelity.
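The binary coding mentioned in the introduction can be illustrated with a minimal sketch. It rests on a simplifying assumption that was not spelled out in the debate: each position of a sample is compared with a reference sequence and stored as 0 (same base) or 1 (different base). The function name and the two short sequences are hypothetical and chosen only for illustration.

```python
def encode_binary(reference: str, sample: str) -> list[int]:
    """Return 0 where the sample matches the reference base, 1 where it differs.

    Illustrative assumption: both sequences are already aligned and have
    the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    return [0 if ref == obs else 1 for ref, obs in zip(reference, sample)]


# Made-up sequences, not real genetic data.
reference = "ACGTTAGC"
sample = "ACGATAGT"

encoded = encode_binary(reference, sample)
print(encoded)  # [0, 0, 0, 1, 0, 0, 0, 1]
```

Once samples are stored in this form, comparing many genomes becomes a matter of comparing rows of 0s and 1s, which is what makes large-scale analysis practical.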

The legal aspects come into this process when one considers that any publication in scientific journals nowadays requires that the data sets on which the findings are based are made available, so that third parties can test their replicability. As a result, huge arrays of genetic data are made available online. An example of such a data collection is the 100,000 Genomes Project of Genomics England, in which data come from participants who consented to take part in studies up to 15 years ago, when the possibilities of genetic research had completely different parameters. In this way, samples from several hundred thousand participants are collected and analysed.

This presents a treasure trove for researchers, especially those focusing on rare diseases, since such statistical power could not be obtained previously. While all the data are anonymised and only available remotely via a secure login (no downloading), even the study organisers themselves admit that it cannot be excluded that somebody is identified by a rogue researcher. Moreover, the data protection requirements applied to such databases have an impact on research. As computational genetics is a fast-paced area, stringent data security provisions might hamper efficiency. Projects are also often multi-disciplinary, so that algorithms are not necessarily developed by the scientists using them (and are not always completely understood by them). Such collaboration is great for science, but can be tricky for data protection. Additionally, the open access provided for many data collections may pose challenges. Another problem stems from the fact that genetic data are most useful when combined with data from other fields, e.g. a hospital file, geographic data, cultural data or linguistic data. Each additional data set leads to more scientific insights but also increases the possibility of identifying the data source.
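The effect of combining data sets on identifiability can be made concrete with a small sketch. It is not a method presented in the debate; it simply counts how many records share the same combination of attributes, in the spirit of a k-anonymity check. The records, attribute names and the helper `smallest_group` are all invented for illustration.

```python
from collections import Counter


def smallest_group(records: list[dict], attributes: list[str]) -> int:
    """Size of the smallest group of records sharing the same values
    for the given attributes. A result of 1 means at least one record
    is unique and therefore potentially identifiable."""
    keys = [tuple(record[name] for name in attributes) for record in records]
    return min(Counter(keys).values())


# Entirely made-up records: a binary genetic marker plus two linked data sets.
records = [
    {"variant": 1, "region": "north", "language": "A"},
    {"variant": 1, "region": "north", "language": "B"},
    {"variant": 1, "region": "south", "language": "A"},
    {"variant": 1, "region": "south", "language": "A"},
    {"variant": 0, "region": "north", "language": "A"},
    {"variant": 0, "region": "north", "language": "A"},
    {"variant": 0, "region": "south", "language": "B"},
    {"variant": 0, "region": "south", "language": "B"},
]

print(smallest_group(records, ["variant"]))                        # 4
print(smallest_group(records, ["variant", "region"]))              # 2
print(smallest_group(records, ["variant", "region", "language"]))  # 1
```

In this toy example every participant hides in a group of four when only the genetic marker is known, but adding geographic and linguistic data shrinks the smallest group to a single person.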

After this scientific introduction, the implications of this type of genetic research for data protection under the GDPR were explained. Although legal literature in the past considered anonymisation of genetic data sufficient to take it outside the definition of personal data, that assumption cannot be upheld when researchers have the whole genome available, potentially in combination with other data sets. The increased identifiability has several causes, including heightened computing power, the sheer amount of available data and the common (good) research practice of making the data underlying the research available so that it can be double-checked by third parties. Even the organisers of the 100,000 Genomes Project admit that data subjects remain identifiable by their genetic data despite its anonymisation; hence genetic data will almost always be personal data. Researchers working with genetic data should be made more aware of this fact, since it means that the GDPR applies. However, the main pillars of the GDPR, such as the correct legal basis for processing, data processing requirements and data subject rights, could pose difficulties for researchers.

In addition, the GDPR clearly defines genetic data as always being sensitive data. This is a change from the previous regime under Directive 95/46/EC, where this had to be established in each individual case. The blanket assumption of the GDPR that genetic data are sensitive will therefore force a change in many current research practices. While this will certainly place an extra burden on researchers, it is also much clearer than the previous situation and therefore facilitates research planning. In general, the GDPR provides two legal bases for processing genetic data: explicit consent, regulated in Article 9 para 2 (a) GDPR, and an exception for processing for scientific purposes, regulated in Article 9 para 2 (j) GDPR.


Explicit consent is difficult to obtain for genetic research. Such consent would need to be provided by an informed data subject. However, as the research is so complex and researchers often do not know in advance what they will find, it is nearly impossible to provide appropriate information for informed consent. This issue is heightened by the multi-disciplinary methods applied, which mean that hardly any single scientist can explain the whole study. The specific information legally required for consent can also be tricky for researchers to provide. It includes the identity of all data controllers and processors, which can be hard to give in a multinational collaboration, as new people might be added later. The purpose of the processing is also not always easily defined. As hinted at above, the unknown unknowns are often the very aim of genetic research. The GDPR even takes that into account and acknowledges these difficulties in Recital 33. Information about the research parameters poses a further challenge, as it can be impossible to provide information about the data storage period or other research uses from the start.

A famous example of these issues can be found in a case against Arizona State University. Researchers in this case obtained the consent of Native American participants to analyse their data for a diabetes study. During the study, the researchers also found indicators for schizophrenia in the samples and included those results in the publication. The participants found such results stigmatising and sued, as schizophrenia was never mentioned when consent was originally obtained. They won their case, and trust between the Native American community and university research in Arizona has not yet been restored. Even if such research studies use rather detailed consent forms, as demonstrated by a consent form for a project involving Aboriginal people in Australia, language can be a barrier to understanding, especially for vulnerable participants. In addition, measures proposed in such forms, such as preventing open access, can actually be a barrier to science.

Consent is, however, only one issue posed by the principles of the GDPR. The purpose limitation principle is the most obvious problem, as researchers usually cannot define their purpose from the outset. The principle of data minimisation runs counter to genetic science, as researchers actually need to analyse as many whole genomes as possible to reach statistically sound results. They need the haystack to find the needle. Similar problems are created by the principle of storage limitation, which states that data should be destroyed as soon as possible. However, genetic markers are often only found by comparing samples over a long period of time. For frontotemporal dementia, a rare neurodegenerative disease, the relevant marker could only be found by comparing data from one family in which it was especially prevalent, reaching back to the 1870s.

Arguably, though, more problems arise if consent is used as the legal basis for processing. As regards data subject rights, more information needs to be provided if consent is the legal basis. The right to data portability only applies where a person provided the data themselves, so mainly when they consented to the processing, and does not need to apply to genetic analysis if the other legal basis is used. Additionally, the GDPR allows Member States to limit data subject rights, e.g. the rights to access and to rectify data, if the legal basis is the exception for scientific processing.

Finally, the presenters observed that the exception for scientific research in Article 9 para 2 (j) GDPR is more suitable for genetic research than explicit consent. This preference can be explained not just by the practical difficulties involved in obtaining informed consent, but also by the lighter data subject rights obligations explained previously. There are many cases where it is neither practical nor possible to use informed consent for genetic data, due to the specific nature of this type of research. Basing genetic data processing on the exception for scientific purposes also allows for the reuse of data and imposes less strict requirements for storage. However, using this exception for genetic data processing does not provide a blank cheque. As Article 89 GDPR specifies, there are certain conditions to be followed, including a necessity and a proportionality test, and the need for a specific legal basis created by the Member States. Placing genetic data processing within these rules will be more reasonable in most cases, but of course there are still instances where explicit consent should be required.

Ethics bodies dealing with such questions have so far stated a clear preference for basing genetic data processing on explicit consent. This aligns with their general support for using consent whenever possible, since it has many ethical advantages. They also usually prefer consent for the processing of anonymised data. It is the opinion of the presenters that this push in favour of consent could be considered an abuse of that legal basis and is not appropriate in certain situations.

Another issue, independent of the legal basis question, that requires further examination is the potential need for a data protection impact assessment under Article 35 GDPR. Such an assessment is required according to the GDPR if there is large-scale processing of sensitive data. Arguably, for computational genetics a single genome can be big data and hence represent a case of large-scale processing of sensitive data. This rings especially true when one considers the aforementioned tendency of researchers to combine many different data sets. In a data protection impact assessment, the potential risks to the rights and freedoms of the data subjects would need to be considered. This could prove to be problematic, as the number of potential issues would be enormous and touch upon many areas. For research centres, it might mean having to hire external expertise, which could in turn strain budgets. A potential solution at the institutional level would be to develop expertise for this kind of impact assessment within the different research groups. Individuals with such expertise could assess the risks involved. This analysis would need to go beyond the purely legal domain and also consider social issues such as stigmatisation or discrimination.

In conclusion, the presenters stated that genetic data are more often than not personal data. If they are personal data, they are also automatically sensitive data. Explicit consent is pushed as a legal basis for such genetic data processing, although it is often not suitable. The presenters suggested that more use should be made of the exception for processing related to scientific purposes, which can provide a more satisfactory solution to many of the issues connected with processing genetic data.

The presentation was followed by a lively debate with the participants about possible implications for cross-border research projects, scientific data transfers between countries, the extent of protection the GDPR requires for sensitive data, the possibility of differing applications of the scientific purpose exception, the feasibility of European ethical standards for genetic data processing, and the role of data protection officers (DPOs) in data protection impact assessments.
