When discussing the application of AI in qualitative research, the conversation invariably gravitates towards two pivotal concerns: bias and ethics. For some researchers, these concerns are deal-breakers, effectively ruling out the use of AI. For others, they introduce a level of discomfort and uncertainty, leading to reluctance in adopting AI tools. However, there's a growing segment that recognizes the inevitability of AI's role in research. This group seeks guidance on navigating these challenges. Meanwhile, researchers who have spent considerable time experimenting with AI have come to understand that while bias and ethics are significant issues, they don't necessarily preclude the use of AI. Instead, they view these challenges as critical topics for discussion and action, emphasizing the need for thoughtful integration of AI into their research methodologies.
In this article, my focus is on the issue of bias, as articulated, for example, by Christou (2023a):
“It is important to highlight the shortcomings and criticisms of AI, such as the potential bias towards data or perspectives. This is especially true if AI in the form of a deep learning model is trained using a dataset that itself contains biased information, which could then lead to the generation of biased responses.” (p. 1971)
And further: “They are trained on huge data swaths of text and that text may itself be biased toward groups of people or language. Due to this, the model may produce information that is unfairly biased or discriminatory, thus contributing to the maintenance of harmful stereotypes and inequalities.” (p. 1974)
Christou suggests various remedies for dealing with such bias. The first is for researchers to be transparent: acknowledge the potential bias and state what they have done to mitigate it. Next, researchers must become well-versed in their data in order to comprehend it fully. Only then can they evaluate the outcome of any AI-based analysis. The results need to be thoroughly reviewed before any theoretical or conceptual discussion can take place. This can be done by validating the accuracy and reliability of the information, identifying and analysing connections and relationships in a process of cross-referencing.
These recommendations, representing best practices, were established well before the advent of AI to ensure the quality and trustworthiness of research findings (Creswell, 2014; Maxwell, 2013). Therefore, we can simply continue to apply and adapt them in the context of AI usage.
Another suggestion from Christou for addressing the challenges presented by AI involves researchers taking measures "to ensure diverse and unbiased training data" (p. 1977), a viewpoint shared by other experts in the field. However, this is an impractical task and, as I will argue, an unnecessary one. Generative AI apps are built on top of so-called foundational models. OpenAI's GPT-3 was trained on about 45 terabytes of text data. Meta's Llama 2 model underwent training on an extensive dataset comprising 2 trillion tokens. The definition of a 'token' varies across languages; in English, a single word typically corresponds to roughly 1.3 tokens on average. The enormity of these training datasets makes it nearly impossible for researchers, or anyone for that matter, to guarantee the absence of bias in all the data.
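To get a feel for what these numbers mean, the common word-to-token rule of thumb can be sketched in a few lines. This is only an illustration of the approximation mentioned above, not the exact tokenizer behaviour of any particular model:

```python
# Rough illustration of the English word-to-token rule of thumb.
# The ratio (~0.75 words per token, i.e. ~1.3 tokens per word) is an
# approximation, not any specific model's tokenizer.

WORDS_PER_TOKEN = 0.75  # common rule of thumb for English text

def estimate_tokens(text: str) -> int:
    """Estimate the number of tokens in a piece of English text."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)

sample = "Generative AI apps are built on top of foundational models."
print(estimate_tokens(sample))  # a 10-word sentence -> roughly 13 tokens
```

By this estimate, Llama 2's 2 trillion training tokens correspond to roughly 1.5 trillion English words, which underlines why no researcher could audit such a corpus for bias.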
A commonly cited concern regarding the use of generative AI tools is the lack of transparency from companies about their training data. However, even if a comprehensive list of training data were provided, it's impractical to expect researchers to meticulously review it all, given the massive volume.
Moreover, it's already acknowledged that these data sets contain biases; the critical question is how we address and mitigate these biases. Additionally, it's inaccurate to claim complete ignorance about the contents of the pre-training data, as some general information is typically known.
We know that foundational models are pre-trained on the following data types:
Books: A wide range of books, covering fiction and non-fiction across various genres and subjects, provides the model with narrative structures, formal language, and a depth of knowledge on specific topics.
Websites: Content from a variety of websites, including news articles, blogs, and forums, gives the model exposure to current events, informal language, opinions, and everyday knowledge.
Scientific Papers: Incorporating scientific literature allows the model to learn technical and specialized vocabulary, as well as complex concepts and reasoning found in academic writing.
Wikipedia: As a comprehensive encyclopaedia, Wikipedia offers structured information on a vast array of topics, helping the model understand facts and general knowledge about the world.
Instructional and Help Documentation: Manuals and help guides provide the model with how-to knowledge and procedural information.
Social Media Posts: Social media data provides insights into colloquial language, current slang, and social discourse.
Other Online Texts: This includes a range of texts available online, such as news articles, blog posts, and other web content.
The biases in AI training data present themselves in various forms, notably in cultural and linguistic representation. The majority of data sourced from the internet predominantly mirrors Western culture, and English is the most prevalent language. Thus, certain groups of people or viewpoints are underrepresented in the training data, and the AI might not perform as well for these groups or might not generate content that reflects their perspectives accurately. This is referred to as representation and content bias.
Further, the AI can reflect cultural biases present in the training data, which might lead to responses that are more aligned with dominant cultural norms and less aware of, or sensitive to, others. Regarding language, the AI performs better in languages or dialects that are more prevalent in the training data (i.e. English), potentially leading to less accurate or nuanced responses in less-represented languages.
Companies developing foundational models are aware of the biases inherent in their training data and do not turn a blind eye to them. Rather, they have taken measures to mitigate these biases through techniques like careful dataset curation, algorithmic adjustments, and ongoing testing and evaluation. See, for example:
https://openai.com/blog/how-should-ai-systems-behave
https://ai.meta.com/blog/measure-fairness-and-mitigate-ai-bias/
Although still in their infancy, foundational models are evolving to become more inclusive of diverse views, perspectives, cultures, and languages over time. However, I don’t think they will ever be completely free from biases.
In this section, I would like to reframe the discussion by posing a few questions: Why do we actually demand unbiased training data when our own scientific writings are not free of bias? Can objective writing even exist?
Our perceptions and interpretations are invariably shaped by our upbringing, cultural background, and life experiences. Our advantage as qualitative researchers is that we are acutely aware of this. Let's turn to Hammersley (2013) for an insightful reminder of what conducting qualitative research entails:
“There is acceptance, perhaps even celebration, of the fact that data, and inferences from them, are always shaped by the social and personal characteristics of the researcher. It is recognized that it is impossible to eliminate the effect of these, and indeed that they may facilitate insight as well as leading to error” (p. 13).
This reflection is not limited to qualitative research alone. It equally applies to quantitative research, where the researcher's identity equally shapes every stage of the research process. From formulating research questions to selecting target populations, from the wording of survey questions to the interpretation of data and results, the researcher's perspective plays a pivotal role. While statistical results might seem objective, the interpretation of these results is always filtered through the researcher's subjective lens.
Denzin and Lincoln (1994) describe the (qualitative) researcher as a bricoleur.
“The bricoleur understands that research is an interactive process, shaped by his or her personal history, biography, gender, social class, race, and ethnicity, and those of the people in the setting.
The bricoleur knows that science is power, for all research findings have political implications. There is no value-free science. The bricoleur also knows that researchers tell stories about the worlds they have studied. Thus, the narratives, or stories, scientists tell are accounts couched and framed within specific storytelling traditions, often defined as paradigms (e.g. positivism, post-positivism, constructivism)” (p. 3).
In highlighting that the focus on biases has been a persistent issue in qualitative research, I deliberately refer to classic literature; texts which may have faded from the collective memory of modern scholars due to their absence in digital formats. Clarke, as early as 1975, acknowledged that a researcher's personality influences both the choice of research topic and the methodological approach.
Details about the personal and intellectual journeys leading researchers to abandon certain inquiries or embrace new topics often remain hidden. Influences such as family dynamics and spousal support, or the lack thereof, can have a significant impact on a research project. Furthermore, the prestige of a researcher's institutional affiliation may play a pivotal role in either facilitating or impeding opportunities like access to research venues or funding. Reinharz (1992) illuminated the fact that a substantial portion of qualitative observational research has historically been conducted predominantly by white, privileged males.
Postcolonial research emerged prominently in the latter half of the 20th century not only as a response to the colonial legacy but also as a critical examination of the Western-dominated academic system that shaped research practices and knowledge production (Jack and Westwood, 2006; Watermeyer and Neille, 2022; Noda, 2020).
Western scholars and institutions have historically had a disproportionate influence on the direction of academic research and the creation of knowledge. This dominance meant that research often reflected Western priorities, interests, and values, leading to a skewed understanding of non-Western societies and cultures. Local knowledge was constructed based on Western methodological standards. Postcolonial research questions these so-called 'universal truths' and 'objective' knowledge claims, which are largely based on a Eurocentric worldview.
This is also reflected in scholarly publications. Dahdouh-Guebas et al. (2003) found in their study that in a random sample of publications focusing on least-developed countries, 70% did not feature a local researcher as a co-author. Often in such research, local colleagues are relegated to providing logistical support, with their expertise overlooked and their contributions unrecognized in the final scholarly work.
Noda's (2020) analysis of submissions and acceptances by Western and Non-Western scholars in international relations journals reveals striking disparities. For instance, in the International Studies Quarterly between October 2018 and September 2019, of the 704 manuscripts received, only 130 (18%) were from non-Western authors. Out of these 704 submissions, 65 were accepted for publication, translating to a 9% overall acceptance rate. Notably, 61 of the accepted manuscripts were by Western scholars, indicating an 11% acceptance rate for Western authors compared to just 3% for non-Western scholars.
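The disparity in Noda's figures can be verified with a few lines of arithmetic. This is simply a re-computation of the counts quoted above, rounding to whole percentages as the text does:

```python
# Figures from Noda (2020) for International Studies Quarterly,
# October 2018 - September 2019, as quoted in the text.
total = 704              # manuscripts received
non_western = 130        # submissions from non-Western authors
accepted = 65            # total acceptances
accepted_western = 61    # acceptances by Western scholars

western = total - non_western                        # 574 Western submissions
accepted_non_western = accepted - accepted_western   # 4 non-Western acceptances

print(round(100 * non_western / total))                  # 18 (% of submissions)
print(round(100 * accepted / total))                     # 9  (% overall acceptance)
print(round(100 * accepted_western / western))           # 11 (% Western acceptance)
print(round(100 * accepted_non_western / non_western))   # 3  (% non-Western acceptance)
```

The nearly fourfold gap between the Western and non-Western acceptance rates (11% vs. 3%) falls directly out of the raw counts.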
Examining the International Political Sociology journal from January to November 2019, out of 183 submissions, 41 (22%) originated from non-Western institutions. However, of the 36 authors ultimately published, 35 were affiliated with Western institutions, and only one was a non-Western scholar (Lisle et al., 2019).
This pattern is not an isolated occurrence. You’ll find more such examples if you dig into the literature. If we now return to the sources for the training of foundational models, we see that one of the sources is scientific works, an area to which predominantly Western scientists contribute. Thus, unintentionally, they have contributed to the very bias that some of them are now pillorying.
However, there's a positive aspect: understanding the origin of these biases enables us to address them more effectively. Consequently, the so-called 'black box' of AI bias might be less obscure than often claimed.
I recently saw the show ‘Where Was I’ by the comedian Trevor Noah (2023), recorded in Detroit, USA. While watching, my thought was: he would make a very good qualitative researcher, given his sharp observational skills and his way of seeing things and relating them to each other. Among many other things, he presented a list of the top five things white people love.
No. 5: Museums (“there is not even a place where white people have settled and not built a museum…”)
No. 4: Swimming (just think of the Olympics)
No. 3: Being flabbergasted
No. 2: Being white (“It is a pretty sweet gig. I don’t blame you.”)
No. 1: Sweet Caroline by Neil Diamond (“There is nothing that brings more joy to the soul of a white person than the sounds of that Neil Diamond song.”)
To understand the No. 1, I have to tell you a little more about the show. At the beginning, Trevor talked about the anthems of various nations and also sang them. Nobody joined in, nor did he ask them to. Then he began to sing, in a somewhat strained voice:
“Hands, touchin’ hands. Reachin’ out, touchin’ me. Touching you. Pump, pump, pum.” (the words are difficult to understand). Then:
“Sweet Caroline” – this is the point where the audience joins in (pump, pump, pump) without him having encouraged them. Earlier he had said: “It works every time. I have been on every continent, in many countries; when that song plays, you see white people’s eyes light up.”
If you are wondering how this relates to the topic we are discussing here, AI and bias: it is a wonderful demonstration of how biases are often rooted in cultural stereotypes and shared experiences.
It's unlikely that foundational models will ever achieve complete unbiasedness. The inherent bias even in supposedly objective sources like scientific papers is a testament to this. Therefore, the goal should be to achieve a more balanced representation of data, encompassing a wider array of perspectives, cultures, and languages, rather than setting requirements that are unattainable. As mentioned above, the companies training foundational models are also aware of it and are working towards mitigating AI bias.
Understanding these considerations, how can we effectively address AI bias? I propose two strategies:
Many people are hesitant to adopt generative AI due to concerns that it will automatically perform analyses without providing insight into how the results were achieved, which is exactly where issues like bias become relevant. Personally, I am cautious of any tool that claims to offer instant analysis at the click of a button. This effectively creates a 'black box' scenario, where users remain unaware of the underlying processes.
Adopting this novel technology solely for the purpose of automating tasks, including coding or thematic analysis, is not the most effective approach in qualitative research. It's more beneficial to first explore and comprehend the technology's potential capabilities, and then align these with specific tasks that can help us realize our objectives.
It's important to recognize that generative AI, rooted in Large Language Models, excels in processing text, including letters, syllables, and words. However, it is not equipped as a trained analyst for qualitative data. While it's true that generative AI can detect patterns, these are primarily linguistic patterns, not the type of patterns sought in qualitative data analysis.
As researchers our goal is to uncover patterns that offer deeper insights into our subject matter. We look for patterns influenced by context, which shape participants' responses or behaviours; patterns that relate to participant characteristics like age, gender, background, or other demographics; patterns that reflect changes stemming from specific events or experiences, just to give a few examples.
Large Language Models (LLMs) can recognize relationships between sentences and paragraphs, which is essential for grasping the context and maintaining coherence in conversations. This ability plays a key role in discerning the overall theme of a text. However, this form of pattern recognition mainly operates at the linguistic level and does not engage with linking meanings that necessitate a more profound interpretation of the text.
Let's explore the strengths of Large Language Models (LLMs). Their capacity to retain extensive amounts of text far exceeds that of a human researcher. Moreover, they can efficiently locate and retrieve specific information from these texts when prompted. By interacting with an LLM through an application, we can leverage this capability in several ways:
I have the ability to pose questions to the LLM regarding my data. Beyond that, I can also supply the LLM with contextual information about my research project, enabling it to better understand and contextualize my data.
The effectiveness of the LLM's responses heavily depends on the quality of my prompts. While I can request the LLM to conduct a thematic analysis – as it knows from its training data what thematic analysis is – this approach still faces the limitations previously discussed.
By asking specific, targeted questions about my data, my approach with the LLM becomes more structured and methodical. This allows me to critically evaluate each response, assessing for potential biases. A well-designed tool should also enable me to verify the original sources upon which the LLM's responses are based. If biases are detected, they might also reflect those present in the responses of my study's participants.
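One way to make such targeted questioning concrete is to assemble each prompt from its parts, project context, a data excerpt, and one specific question, before handing it to whatever LLM application you use. The function and field names below are illustrative assumptions of mine, not any particular tool's API; the point is the structure, including the request to quote sources so responses can be verified:

```python
def build_prompt(project_context: str, data_excerpt: str, question: str) -> str:
    """Combine research context, a data excerpt, and one targeted
    question into a single structured prompt for an LLM."""
    return (
        f"Research context:\n{project_context}\n\n"
        f"Interview excerpt:\n{data_excerpt}\n\n"
        f"Question (answer only from the excerpt, and quote the "
        f"passages your answer is based on):\n{question}"
    )

# Hypothetical example data for illustration only.
prompt = build_prompt(
    project_context="A study of nurses' experiences during night shifts.",
    data_excerpt="I often felt invisible; management never asked how we coped.",
    question="How does this participant describe their relationship with management?",
)
print(prompt)
```

Asking one question at a time, with the source excerpt embedded in the prompt, keeps each response small enough to check against the original data and to inspect for bias.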
In the next step, we need to reflect on our own biases when interpreting the AI's responses, as these are preliminary insights rather than conclusive results. The final outcomes are shaped by our interpretation and analysis of these insights.
If we approach AI-assisted qualitative data analysis in this way, we minimize the impact of biases inherent in an LLM's training data. By using the AI not as a tool to conduct the analysis for us, but rather as a means to extract pertinent information, we can more effectively analyse the data ourselves, ensuring a more controlled and reflective approach to understanding our findings.
This brings me to the second point.
In learning about qualitative research, you quickly encounter the following key lessons:
The subjectivity of both the researcher and the subjects is an integral part of the research process. The researcher's reflections on their actions and observations form a critical component of the data and significantly influence its interpretation (Flick, 1998).
Schwandt (1997) distinguishes two forms of reflexivity. The first involves critical self-reflection on one’s own biases, theoretical predispositions, and preferences. The second emphasizes the researcher's role within the setting and context of the study, and the social phenomena being investigated. This reflexivity is crucial in ensuring that research doesn't devolve into a self-serving ideology, a concern highlighted by Thomas (1993). Additionally, as emphasized by feminist researcher Scheper-Hughes (1992), it is vital in preventing the perpetuation of androcentric views and biases, particularly those related to race and class, in their research.
As you can see, reflecting on and addressing biases has long been a fundamental skill for qualitative researchers. Being reflexive involves continually conversing with oneself. Instead of just presenting findings as facts, the reflexive researcher actively forms interpretations and then examines the origins of these interpretations (Van Maanen, 1988). With the AI in the middle, assuming you use an interactive approach as outlined above, you simply do the same: you can either critically reflect on the output received from the AI, or converse with the AI directly.
You might be stunned by the reflective capacity of an LLM when you collaborate with it. If you notice bias in a response, it's beneficial to bring it to the LLM's attention. Sometimes, it may also be necessary to rephrase your prompt, as a biased response could stem from the way the prompt was formulated.
The LLM is well-equipped to revisit and delve deeper into the data as needed. Often, a second response can be more insightful than the first. Additionally, you can instruct the LLM to analyse the data from a specific perspective. If you suspect any racial, political, or gender biases, inform the LLM and request a revised response. This approach helps in obtaining more balanced and nuanced insights.
Thus, if we guide the AI in its analysis through careful prompting and interaction, rather than allowing it to operate independently, any potential bias in the AI's responses becomes significantly less problematic. Give it a try, and you might be pleasantly surprised. Experimenting with different perspectives when analysing data can be an engaging and enlightening process, ultimately enriching your final interpretation.
You may now have a clearer understanding of the image I created using DALL·E to illustrate the content of this article:
Rather than solely focusing on the potential biases of the other (such as the AI), it is equally important to reflect on our own biases as mirrored in our self-reflection.
The conclusion presented below was crafted in collaboration with ChatGPT, which had been previously consulted to refine the phrasing of my sentences. Hence, it was well-versed with the context of this article. After completing the editing process for the article, I uploaded the image shown above and requested ChatGPT to compose an appropriate conclusion:
“The interplay between human reflection and artificial intelligence, as captured in the accompanying visual, offers a profound metaphor for our exploration of bias in research. As we stand at the precipice of technological advancement, akin to the figure gazing into the fusion of nature and circuitry, we are called to look beyond the surface.
Just as AI reflects back the data it's given, often magnifying the biases within, it also holds a mirror to our own preconceptions. As researchers our challenge is to recognize and dissect these biases, understanding that in the realm of knowledge, our self-awareness is as crucial as the intelligence we aspire to create. It is not enough to scrutinize the AI for its potential to reflect societal biases; we must also turn inward, questioning the foundations of our own perspectives. This dual examination paves the way for more equitable and insightful research outcomes, fostering a symbiosis between human and artificial cognition that can elevate our quest for understanding.”
References
Creswell, J. W. (2014). Research Design: Qualitative, Quantitative and Mixed Methods Approaches (4th ed.). Thousand Oaks, CA: Sage.
Christou, P. (2023a). How to Use Artificial Intelligence (AI) as a Resource, Methodological and Analysis Tool in Qualitative Research? The Qualitative Report, 28(7), 1968-1980. https://doi.org/10.46743/2160-3715/2023.6406
Christou, P. (2023b). The Use of Artificial Intelligence (AI) in Qualitative Research for Theory Development. The Qualitative Report, 28(9), 2739-2755. https://doi.org/10.46743/2160-3715/2023.6536
Dahdouh-Guebas, F., Ahimbisibwe, J., Van Moll, R. and Koedam, N. (2003). Neo-colonial science by the most industrialised upon the least developed countries in peer-reviewed publishing. Scientometrics, 56(3), 329–343.
Denzin, N. and Lincoln, Y.S. (1994). Introduction: Entering the field of qualitative research. In: N. Denzin and Y.S. Lincoln (eds). Handbook of Qualitative Research. Thousand Oaks, CA: Sage.
Flick, U. (1998). An Introduction to Qualitative Research. London: Sage.
Jack, G. and Westwood, R. (2006). Postcolonialism and the politics of qualitative research in international business. Management International Review, 46, 481–501. https://doi.org/10.1007/s11575-006-0102-x
Hammersley, M. (2013). What is Qualitative Research? (‘What is?’ Research Methods series). London: Bloomsbury.
Knoblauch, H. (2021). Reflexive Methodology and The Empirical Theory of Science. Historical Social Research, 46(2), 59–79. https://doi.org/10.12759/hsr.46.2021.2.59-79
Lisle, D., Doty, R., Hall, A. and Squire, V. (eds.) (2019). International Political Sociology annual editorial report 2019. https://www.isanet.org/Portals/0/Documents/IPSJ/2019_IPS%20Report.pdf?ver=2019-11-06-103241-263
Maxwell, J.A. (2013). Qualitative Research Design: An Interactive Approach (3rd ed.). Thousand Oaks, CA: Sage.
Noah, T. (2023). “Where Was I.” Netflix, www.netflix.com.
Noda, O. (2020). Epistemic hegemony: the Western straitjacket and post-colonial scars in academic publishing. Rev. Bras. Polít. Int. 63 (1). https://doi.org/10.1590/0034-7329202000107
Schwandt, T. A. (1997). Qualitative Inquiry. A Dictionary of Terms. Thousand Oaks, CA: Sage.
Thomas, J. (1993). Doing Critical Ethnography. Thousand Oaks, CA: Sage.
Reinharz, S. (1992). Feminist Methods in Social Research. New York: Oxford University Press.
Scheper-Hughes, N. (1992). Death Without Weeping: The Violence of Everyday Life in Brazil. Berkeley: University of California Press.
Van Maanen, J. (1988). Tales of the Field: On Writing Ethnography. Chicago: Chicago University Press.
Watermeyer, J. and Neille, J. (2022). The application of qualitative approaches in a post-colonial context in speech-language pathology: A call for transformation. International Journal of Speech-Language Pathology, 24(5), 494–503. https://doi.org/10.1080/17549507.2022.2047783