At a recent presentation at the University of St. Gallen’s School of Economics and Political Science, I started with a simple exercise.
The presentation was titled “AI and Qualitative Research: Navigating a Polarized Debate.”
The broader topic was the role of AI in qualitative research. But before discussing methods, tools, risks, and opportunities, I began with AI literacy.
Not because qualitative researchers need to become technical experts. They do not. But because part of the current polarization around AI in research comes from a lack of understanding of what large language models actually do, what they are good at, and where they can go wrong.
Some researchers reject AI outright because they associate it with automation, superficial analysis, or unreliable shortcuts. Others embrace it too quickly, assuming that if the output sounds reasonable, the system must have understood the task.
Both positions are problematic.
The question is not whether AI is “good” or “bad” for qualitative research. The better question is: under what conditions can AI contribute to rigorous, transparent, and context-sensitive analysis?
To explore this, I gave the students a small prompt experiment.
Here are two versions of the prompt:
Prompt 1
“I want to have my car washed at the gas station. It is not far to the gas station. Do you think I should take the car or walk?”
Prompt 2
“I want to have my car washed at the gas station. It is not far to the gas station. Do you think I should drive the short distance or walk?”
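If you want to try this outside a chat interface, a minimal sketch is shown below. It assumes the OpenAI Python client and an API key; the model name is only an example, and any chat-capable model will do.

```python
# A minimal sketch for reproducing the exercise with the OpenAI Python client.
# Requires an API key in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

prompts = {
    "Prompt 1": (
        "I want to have my car washed at the gas station. "
        "It is not far to the gas station. "
        "Do you think I should take the car or walk?"
    ),
    "Prompt 2": (
        "I want to have my car washed at the gas station. "
        "It is not far to the gas station. "
        "Do you think I should drive the short distance or walk?"
    ),
}

for label, prompt in prompts.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name; any chat-capable model will do
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```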
At first glance, this looks almost trivial.
The correct answer is obvious: you need to take the car.
The goal is not to get yourself to the gas station. The goal is to get the car washed. So the car needs to get to the gas station.
Yet many chatbots get this wrong.
They focus on cues such as “not far” or “short distance” and produce a familiar kind of advice:
Walking is healthier. Walking is better for the environment. Walking saves fuel. If it is not far, walking may be the better choice.
This is not an absurd answer in itself. It is reasonable advice for a different question. But it is wrong for this one. The model has subtly shifted the task. Instead of answering:
“How do I get my car washed?”
it answers:
“How should I travel to a nearby gas station?”
That shift matters.
This is what makes the exercise useful.
The problem is not that the chatbot produces nonsense. The problem is that it produces something plausible, fluent, and helpful-sounding — while missing the decisive condition of the task.
The decisive condition is the goal: the car needs to be washed.
Once that goal is held in focus, the answer follows. The car must be brought to the car wash. Walking only makes sense if someone else is bringing the car, or if the person is no longer trying to wash the car.
The error is not a failure of vocabulary. It is not a lack of information about cars, gas stations, or walking. It is a failure to maintain the purpose of the action.
This is a small example, but it illustrates something much larger.
Large language models do not simply “reason” in the way humans often assume they do. They generate responses based on patterns in language, task framing, and contextual cues. When a phrase such as “short distance” appears, it may activate a familiar script:
short distance → walk instead of drive.
That script is usually sensible. But here it conflicts with the actual goal. The system gives a good answer to the wrong problem.
In qualitative research, this kind of wrong turn is much more consequential.
A chatbot may not misunderstand a car wash. It may misunderstand a research question. It may misread the role of a participant. It may treat an interviewer’s prompt as participant data. It may interpret irony literally. It may flatten differences between cases. It may identify surface-level themes while missing the analytic purpose of the study.
And it may do all of this in fluent, confident language.
This is why AI-assisted qualitative analysis cannot be reduced to uploading transcripts and asking: “What are the themes?”
That question may produce an answer. It may even produce a useful starting point. But if the model does not know what the study is about, who the participants are, why the data was collected, what the research question is, and what kind of analysis is intended, it has to infer the frame.
And when the AI infers the frame, it may take an analytically wrong turn.
The danger is not only hallucination in the narrow sense of inventing facts. The danger is also misframing.
A model can stay close to the text and still misunderstand the analytic task.
One of the most important methodological lessons from this exercise is that context is not optional.
In qualitative research, context is part of meaning.
A statement in an interview does not mean the same thing regardless of who says it, in what situation, in response to which question, under which institutional conditions, and in relation to which research interest.
The same sentence may have different analytical relevance depending on whether it comes from a participant, an interviewer, a manager, a patient, a teacher, a student, a policy-maker, or a service user.
It also matters what the study is trying to understand.
A customer complaint can be analyzed as feedback about product quality, as evidence of unmet expectations, as a sign of organizational failure, as part of a service journey, or as an expression of trust and disappointment. The text alone does not determine the analytical frame.
This is why qualitative researchers spend so much time on research design, sampling, fieldwork, memoing, case knowledge, and interpretation. The data never “speaks for itself.” It is always interpreted in relation to a question, a context, and a purpose.
AI does not remove this requirement.
It makes it more visible.
Many AI features in qualitative software promise code suggestions, automatic themes, summaries, or pattern detection based on the uploaded data.
This can be useful. But it also raises a methodological problem.
If the system only sees the raw data, it can only work from the raw data. It may detect repeated words, emotionally salient statements, common topics, or semantically similar passages. But it does not necessarily understand the study.
It may not know:
what the research question is, why the interviews were conducted, what the sampling strategy was, which speaker roles matter, which contextual variables are analytically relevant, what theoretical lens is being used, what the researcher already knows from the field, what kind of output is needed, or what counts as a meaningful finding.
This is not a minor limitation. It affects the quality of the analysis. Without context, the AI has to guess what matters.
And when it guesses what matters, it may produce results that are linguistically plausible but methodologically weak.
This is especially important when researchers use general-purpose chatbots for analysis. Many people upload transcripts and ask for themes without providing the study background, research objectives, participant information, interview guide, sampling logic, or intended analytical use.
The chatbot will still respond. That does not mean it has understood the analysis. It means it has generated a response from the information available — and from patterns learned elsewhere.
One reason this is difficult to teach is that AI outputs often sound convincing.
A weak human analysis often looks weak. It may be vague, poorly structured, repetitive, or obviously unsupported.
A weak AI-generated analysis can look polished.
It may have headings. It may have balanced language. It may mention nuance. It may include caveats. It may appear reflective.
This creates a new methodological risk: the appearance of quality can become detached from the quality of the analytical reasoning.
The car wash example shows this in a simple way. The wrong answer can still be well written. It can still include sensible considerations. It can still sound helpful.
But it has missed the point.
In qualitative analysis, this is a serious issue. The output may sound like analysis while operating at the level of topical summary. It may identify “themes” that are actually broad topics. It may give a coherent interpretation that is not sufficiently grounded in the data. It may overgeneralize from a few vivid passages. It may ignore case differences. It may miss contradictions. It may answer a generic version of the research question rather than the specific one the study asks.
Fluency is not evidence of understanding.
A common response to these problems is: “Then we just need better prompts.”
Better prompting helps. There is no question about that.
If we tell the AI to reason from the goal of the action, it is more likely to solve the car wash problem correctly. If we give it the study background, the research question, the participant roles, and the analytical aim, it is more likely to produce useful outputs.
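As a rough sketch of what passing that frame along with the data can look like, here is one way to do it with the OpenAI Python client; the study details below are invented purely for illustration.

```python
# A sketch of a context-rich request, in contrast to uploading transcripts and
# asking only "What are the themes?". The study details are invented for illustration.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

study_frame = (
    "You are assisting with a qualitative interview study.\n"
    "Research question: How do frontline nurses experience the introduction of "
    "digital documentation in their daily work?\n"
    "Participants: 12 nurses from two hospitals. Interviewer turns are marked 'INT:' "
    "and are questions, not data to be coded.\n"
    "Analytical aim: inductive thematic analysis focused on workload and professional "
    "identity, not a general summary of the transcripts.\n"
    "Task: suggest candidate themes and, for each theme, quote the supporting passages "
    "with speaker labels so the evidence can be checked."
)

transcript_text = "..."  # load or paste the (anonymized) transcript excerpts here

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system", "content": study_frame},
        {"role": "user", "content": transcript_text},
    ],
)
print(response.choices[0].message.content)
```

Even with such a frame, the output is a starting point to be checked against the data, not a finished analysis.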
But prompting alone does not solve the methodological issue.
Qualitative analysis is not a task that can be fully specified in advance through a single perfect instruction. It is iterative. It develops through engagement with the data. The researcher asks, checks, compares, refines, challenges, and reinterprets.
The issue is therefore not simply prompt engineering. It is analytical process design.
The question becomes: how do we design AI-supported workflows that keep the researcher involved in the interpretation, while making good use of what AI can do well?
This is where I think the debate around AI and qualitative research needs to move.
The central question should not be: Can AI analyze qualitative data?
That question is too broad and too easily misunderstood.
A better question is:
How can AI support researchers in conducting better, more transparent, and more efficient analysis without removing the researcher from the interpretive process?
For me, as you know from all of my other writing, the answer lies in collaborative intelligence.
AI can help retrieve relevant passages. It can summarize large volumes of material. It can compare cases. It can suggest patterns. It can help formulate alternative interpretations. It can support abductive reasoning by helping researchers explore plausible explanations. It can help make the analytical process more systematic.
But the researcher must remain responsible for the framing, interpretation, validation, and final claims. The role of the researcher changes, but it does not disappear.
Instead of manually coding every segment, the researcher works in dialogue with the data and the AI system. The researcher asks better questions, checks the evidence, follows up on contradictions, brings in contextual knowledge, and decides what interpretation is warranted.
This is very different from one-click analysis.
This is also why we have redesigned the QInsights project setup process.
When users create a new project, they are now asked to provide project context from the beginning. They can write or paste the context manually, or upload a setup document.

This may sound like a small interface decision. Methodologically, it is not small. It changes the starting point of the analysis.
Instead of treating the uploaded documents as isolated text, QInsights encourages users to tell the system what the project is about. This can include the research objective, audience, scope, constraints, key questions, and specific instructions the AI should follow.
In other words, the system is not only given data. It is given orientation.
That orientation matters.
It helps the AI distinguish between what is central and what is peripheral. It helps align summaries, theme development, grid analysis, and conversational follow-up with the purpose of the study. It reduces the risk that the system answers a generic version of the task rather than the actual research question.
It also encourages researchers to become more explicit about their own analytical frame.
That is good methodological practice.
Beyond project context, QInsights also allows users to add data context.
This means users can add relevant metadata to documents or speakers: for example, participant role, organization type, country, interview wave, stakeholder group, case type, age group, sector, or any other variable that matters for the analysis.
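To make this concrete, here is a hypothetical illustration of what data context amounts to; the field names and values are invented and do not reflect QInsights’ internal format.

```python
# A hypothetical illustration of data context: metadata attached to documents and
# speakers so the analysis can distinguish roles, groups, and cases.
# Field names and values are invented; this is not QInsights' internal format.
documents = [
    {
        "file": "interview_01.docx",
        "case": "Hospital A",
        "interview_wave": 1,
        "speakers": {
            "INT": {"role": "interviewer"},
            "P01": {"role": "nurse", "stakeholder_group": "frontline"},
        },
    },
    {
        "file": "interview_07.docx",
        "case": "Hospital B",
        "interview_wave": 2,
        "speakers": {
            "INT": {"role": "interviewer"},
            "P07": {"role": "ward manager", "stakeholder_group": "management"},
        },
    },
]

# With metadata in place, questions can target subsets instead of the whole dataset,
# for example: "How do frontline participants in wave 1 talk about workload?"
frontline_wave1 = [
    doc for doc in documents
    if doc["interview_wave"] == 1
    and any(s.get("stakeholder_group") == "frontline" for s in doc["speakers"].values())
]
print([doc["file"] for doc in frontline_wave1])
```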

This is important because qualitative data is rarely just a pile of text.
It is structured by cases, participants, roles, situations, and conditions.
A theme may appear across the whole dataset, but it may mean different things for different groups. A concern raised by managers may not carry the same meaning as the same concern raised by frontline workers. A statement by an interviewer should not be interpreted in the same way as a statement by a participant. A comment from a policy stakeholder may need to be read differently from a comment from a service user.
Data context makes these distinctions visible to the analysis.
It allows researchers to work with subsets, compare groups, ask more precise questions, and avoid flattening the dataset into generic themes.
This is one of the main differences between using AI for generic text processing and using AI for qualitative analysis. Qualitative analysis requires attention to context, case, role, and meaning.
The software should support that.
The car wash exercise may look simple, but it captures a central issue in AI-assisted analysis.
If the AI misunderstands the goal, it may answer the wrong question. If it lacks context, it may infer the wrong frame. If it works only from surface cues, it may produce plausible but shallow results. If the researcher accepts fluent output too quickly, analytical quality is at risk.
This does not mean that AI should not be used in qualitative research. It means it has to be used differently. The researcher needs to remain part of the process.
Not as someone who merely approves an AI-generated answer at the end, but as someone who actively shapes the analysis from the beginning: by defining the project context, specifying the research question, adding relevant metadata, asking follow-up questions, checking evidence, and deciding what the findings mean.
AI can support qualitative analysis. It cannot replace methodological judgment.