Data Privacy and Confidentiality Agreements
As AI technology, particularly generative AI, becomes more prevalent in research practices, understanding and addressing its ethical implications is crucial. This three-part series will explore the landscape of ethical research, initially setting a foundation by examining general standards. Subsequently, we'll delve into issues uniquely pertinent to the use of generative AI. The first instalment discusses privacy and confidentiality; the second assesses ethical data analysis with generative AI; and the third explores the integration of generative AI in report writing.
General standards
The ethical criteria for research are extensive, yet many are fundamental to good human interactions. These include principles such as non-harm, honesty, respect, and special consideration for vulnerable groups. For our discussion on generative AI, I've highlighted the criteria that are particularly critical when this technology is involved.
Legal compliance: Research activities must always be carried out in accordance with the applicable laws and regulatory frameworks in order to ensure legal certainty.
Informed consent: Participants must be fully informed about the objective, methodology, and potential impact of the research. Their consent must be given voluntarily on this basis.
Maintaining confidentiality: All personal data and information collected during research must be treated with the utmost discretion and protected to ensure the privacy of those involved.
Data protection: It is the responsibility of researchers to respect and protect the privacy of participants through appropriate data security measures.
Social responsibility: Researchers have a responsibility to society and should aim to make a positive contribution to the good of society through their work.
Principle of non-harm: At every stage of the research, care must be taken to ensure that there is no harm to the participants, the environment or society.
Respect for human dignity: Upholding the rights, interests and inherent dignity of all research participants is a fundamental obligation.
Honesty and integrity: The presentation of research results must be sincere, transparent and free from any form of manipulation in order to strengthen credibility and trust in scientific research.
Objectivity: No research can be truly objective; however, researchers must reflect on their personal prejudices and biases at all stages of the research process.
Careful and accurate documentation: It is essential that all research data is accurately recorded and conscientiously documented to ensure traceability and reproducibility of the research.
Respect for intellectual property: The recognition and protection of the intellectual property of others is mandatory; plagiarism is to be avoided at all costs.
Consideration of minors and vulnerable groups: Research projects involving minors or those who are unable to represent themselves require specific safeguards and ethical considerations.
Environmental ethics: The impact of research on the environment must be considered and minimized in order to promote environmental sustainability.
Animal ethics: Research involving animals must take into account their welfare and prevent unnecessary suffering.
Responsible use of research funds: The allocation and use of research funds should be transparent, efficient, and targeted in order to ensure the integrity of research and public trust in it.
Fair evaluation: Evaluation procedures, such as peer reviews, must be conducted fairly and as objectively as possible to ensure the credibility of scientific publications and funding decisions.
Source: "Praxisleitfaden für Integrität und Ethik in der Wissenschaft". Federal Ministry of Education, Science and Research, Austria.
Addressing these ethical issues individually can be challenging because the solutions often overlap and address multiple concerns simultaneously. Starting with legal compliance, the simplest step is to verify that the tools you use adhere to the privacy regulations relevant to your location. Key regulations include the GDPR in Europe, CCPA in California, and HIPAA for health data in the U.S., which I've discussed in more detail in my previous blog post, 'Beyond the Comfort Zone: The Dynamics of Technological Acceptance.' Importantly, these data privacy laws generally no longer apply once your data is fully anonymized, because anonymized data is no longer personal data. Let's delve into what complete anonymization entails.
Anonymization
Anonymization transforms personal data so that individuals can no longer be identified or can only be identified with an unreasonable amount of effort in terms of time, cost, and manpower. This process involves stripping away or altering all identifying characteristics, rendering re-identification of the individual virtually impossible. Once data is anonymized, it is no longer classified as personal data, thereby lifting many of the data protection constraints associated with its use. Consequently, you can securely utilize this anonymized data with AI tools in the cloud.
Pseudonymisation
Pseudonymisation is a data processing technique where personal identifiers are replaced with pseudonyms, or unique identifiers, making it difficult to directly identify individuals without additional information. This extra information must be stored separately and securely to prevent unauthorized access. While pseudonymised data reduces the risk of direct identification, it is still considered personal data under data protection laws because re-identification is possible. Consequently, if you possess a dataset with real names replaced by pseudonyms, you still need the consent of the research participants to use this data with AI tools in the cloud.
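To make the idea concrete, here is a minimal sketch of pseudonymising a transcript in Python. It is an illustration only, not the method used by any particular tool: the function name pseudonymise, the pseudonym format (P01, P02, ...) and the sample transcript are all assumptions, and it presumes you already know which names occur in the text. Replacing names does not anonymize the data, since indirect identifiers (job titles, places, events) remain.

```python
import re


def pseudonymise(transcript, names):
    """Replace each listed name with a pseudonym such as 'P01'.

    Returns (pseudonymised_text, key_table). The key table maps each
    pseudonym back to the real name and must be stored separately
    from the pseudonymised text. (Hypothetical helper for illustration.)
    """
    if not names:
        return transcript, {}

    # Assign pseudonyms in the order the names were given.
    pseudonym_for = {}
    key_table = {}
    for i, name in enumerate(names, start=1):
        pseudonym = f"P{i:02d}"
        pseudonym_for[name.lower()] = pseudonym
        key_table[pseudonym] = name

    # Longest names first, so "Anna Maria" is matched before "Anna".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b",
        flags=re.IGNORECASE,
    )

    # Single pass over the text: every match is swapped for its pseudonym.
    text = pattern.sub(lambda m: pseudonym_for[m.group(0).lower()], transcript)
    return text, key_table


# Made-up example data.
transcript = "Anna said that Dr. Huber, her manager, disagreed. Anna left."
text, key_table = pseudonymise(transcript, ["Anna", "Dr. Huber"])
print(text)
print(key_table)
```

The key table is the "additional information" mentioned above: as long as it exists, the data can be re-identified, which is why the pseudonymised transcript still counts as personal data.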
Today, preparing a transcript is far less burdensome thanks to AI tools. However, using these tools can present challenges, particularly because the raw data may contain identifiable information about research participants. While obtaining consent to process this data with an AI tool puts you on solid legal ground, it doesn't necessarily equate to ethical responsibility. There’s a risk that participants might not fully understand the confidentiality agreements or informed consent documents they sign, or they may not be able to assess the potential risks involved in the event of a data breach.
Some commentators are now highlighting concerns about data breaches in relation to generative AI. However, it's worth noting that if the potential consequences of a data breach are truly severe, then, irrespective of generative AI, such sensitive data shouldn't be stored on any internet-connected computer. This issue wasn't widely discussed before the emergence of generative AI, and it's doubtful that individual researchers would have typically invested in a completely offline computer solely for managing sensitive data.
On a positive note, generative AI has enabled the development of tools that can transcribe and anonymize data offline, significantly improving the protection of research participants. I have discussed these tools in previous writings, but for your convenience, I’m including the links here again:
aTrain (Windows only)
To utilize these tools, your computer needs a dedicated graphics card, as they are GPU-intensive. Currently, gaming computers are best suited for this task due to their robust hardware. Moreover, there are already computers available with chips specifically optimized for AI tasks, and it's likely that we will see more of such specialized hardware in the future. In a few years, it's probable that most computers will be equipped to handle AI applications efficiently.
For the time being, we must work within the limitations of current technology. For example, expect that an offline AI tool may take approximately three hours to transcribe a one-hour interview. Fortunately, this task can easily be performed overnight or run in the background while you attend to other duties.
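For planning purposes, the rule of thumb above can be turned into a quick estimate. The sketch below assumes the roughly three-to-one ratio mentioned in the text; the function name and the interview lengths are made up for illustration, and actual speed depends heavily on your hardware and the tool you use.

```python
def estimate_transcription_hours(interview_minutes, slowdown_factor=3.0):
    """Estimate offline transcription time in hours.

    interview_minutes: list of recording lengths in minutes.
    slowdown_factor: processing time per unit of audio; 3.0 reflects the
    rough 'three hours per one-hour interview' figure and is an assumption.
    """
    total_audio_hours = sum(interview_minutes) / 60
    return total_audio_hours * slowdown_factor


# Three hypothetical interviews of 60, 45 and 90 minutes.
interviews = [60, 45, 90]
hours = estimate_transcription_hours(interviews)
print(f"Estimated processing time: {hours:.2f} hours")
```

An estimate like this helps decide whether a batch fits into a single overnight run or needs to be split across several nights.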
As this discussion focuses on ethics, it's essential to allocate time to review the transcripts. While AI-generated transcripts won’t be flawless, thoroughly examining them is time well spent. This review not only allows you to familiarize yourself with the data, setting the stage for data analysis, but also upholds the ethical standards of integrity and meticulous documentation.
To dive deeper into AI and confidentiality, it's important to clarify the distinctions and connections between informed consent and confidentiality agreements. Here's a more detailed exploration of each concept, followed by a comparison that highlights their different roles, especially in the context of AI:
Informed consent is a fundamental process in both research and clinical settings, designed to protect and inform participants or patients. It involves providing a clear and comprehensive explanation of the procedures, risks, benefits, and rights associated with a particular study, treatment, or experiment. The aim is to ensure that participants are fully aware of what their involvement entails, allowing them to make an educated and voluntary decision about whether to participate.
A confidentiality agreement, often used in research, business, and employment contexts, is a legal contract designed to protect sensitive information from being disclosed to unauthorized parties. In the context of research involving human subjects, such agreements might be used to safeguard participants' private data, ensuring that it is not shared beyond the confines of the research team without explicit permission.
While both informed consent and confidentiality agreements involve ethical considerations and privacy, their applications differ significantly:
In the context of AI, informed consent primarily revolves around ensuring that individuals understand how their data will be used, what the AI is capable of, and the potential risks of data breaches or misuse. This is crucial, especially when AI systems analyse personal data, as participants need to be aware of how their information is handled.
Confidentiality agreements in AI-related research projects assure participants that their personal data will be kept confidential and used only within the agreed terms.
Both confidentiality agreements and informed consent involve the sharing of critical information, but they serve different purposes. Confidentiality agreements focus on the non-disclosure of shared information, ensuring that it remains private. In contrast, informed consent ensures a transparent and comprehensive understanding of what participation involves. Therefore, for a truly ethical approach, a combination of the two is essential.
Many of us have likely experienced signing an informed consent form before a medical procedure. Often, the decision to sign comes without a real choice or a full understanding of the extensive details contained within the document. Similarly, in research settings, legal consent is frequently obtained just before an interview or focus group begins, typically recorded on tape. This method can make participants feel pressured to consent, as they have already committed their time. This type of consent collection can feel more like a unilateral requirement: 'We need your data; you just need to agree.'
Obtaining participants' consent provides legal protection; however, it's imperative for any research project to rigorously adhere to the principle of non-harm. This requires ensuring that participants' data is safeguarded in such a manner that it cannot lead to any harm. Harm to research participants can manifest in various ways. Key risks include:
Reputational Damage: The publication or misuse of information that reveals participant identities can harm their professional, social, or family life.
Legal Risk: Uncovering activities through research that may have legal implications for participants presents a significant risk. Even without malicious intent, the dissemination of such information could inadvertently expose participants to legal challenges.
In contrast, a confidentiality agreement represents a mutual contract. Research participants provide their information with the assurance that the receiving party will handle it responsibly. This not only protects the participants but also fosters a relationship of trust and mutual respect, enhancing the ethical integrity of the research process.
The components outlined below serve as the foundation for crafting a comprehensive agreement for research participants. I've highlighted the elements that are particularly crucial when integrating AI tools into the research process.
Introduction of the parties involved: Identification of the party that owns the data and the party that uses the data, including the use of AI tools.
Definition of confidential information: Clarification of the data that is considered confidential. This includes raw data, interview transcripts, audio recordings, AI-generated reports, and any notes or analyses produced during the course of the study.
Purpose of the confidentiality agreement: Stipulate that the use of the information exclusively serves the intended research purpose and that it will not be passed on to unauthorized third parties.
Obligations of the receiving party: The information should be used for research purposes only. This section also sets out the conditions under which data may be disclosed to third parties, team members or subcontractors, including through the use of AI tools.
Protection of research participants: Describe in detail how data is securely stored, carefully processed, and protected during research. This includes specific measures such as offline transcription, encryption, access controls, and the use of anonymization or pseudonymization. If you need some background information on how data is processed when using an AI tool in the cloud, the article ‘Beyond the Comfort Zone: The Dynamics of Technological Acceptance’ offers an introduction.
Use of AI tools: The use of AI tools for data analysis must be explained in detail, including the specific software or LLMs used, to ensure transparency about how the data is processed.
Legal disclosure requirements: The conditions under which confidential information must be disclosed as a result of legal requirements, regulations or court orders must be clearly defined, including the procedure for such cases.
Duration of the confidentiality obligation: Determination of the validity period of the agreement, for example for the duration of the research project and a defined period beyond.
Exclusions of Confidential Information: Definition and delimitation of information that is not considered confidential, such as publicly available data or information obtained independently of the Agreement.
Return or destruction of confidential information: It is necessary to determine how to deal with confidential information at the end of the agreement or project, whether by returning it to the owner or by destroying it safely.
Consequences of breach of contract: The consequences of a breach of the agreement, including possible legal action or penalties, must be made clear.
Using the links below, you can download a template (.docx), available in English and German, for an agreement that merges informed consent with your commitments to secure participant data, transforming it into a more reciprocal contract.
AI-Focused Confidentiality and Consent Agreement Template
Vorlage: Vertraulichkeitserklärung für Forschungsteilnehmende bei KI Nutzung
Please be aware that these templates are not guaranteed to be comprehensive and should not substitute for legal advice. They are designed with GDPR compliance in mind. Use them as a starting point and customize them to fit your specific needs.
Artificial Intelligence (AI) Tools: Guidance on the Use of AI in Human Subjects Research (The University of Tennessee, Knoxville).
Webinar: Navigating the Ethics of AI in Qualitative Research
Webinar: Ethik im Fokus: KI-Nutzung in der qualitativen Forschung verantwortungsvoll gestalten