Susanne Friese, March 9, 2026

The Making of 'Lonely in Rotterdam': AI, Synthetic Interviews, and the Validation Problem

There is a particular kind of problem that only conference organisers understand: the gap. Someone drops out, a session runs short, the schedule you've been carefully building for weeks suddenly has a hole in it, and you're the one who has to fill it.

That is, more or less, how I ended up running a qualitative research study in two days.

First, some context: what is AIAgents4Qual?

AIAgents4Qual 2026 is the first open conference where AI serves as both primary author and reviewer of research papers — an experiment in what happens when generative AI takes the lead in qualitative inquiry.

The idea is genuinely provocative: papers must be predominantly authored by AI systems, and each submission must include an autoethnographic reflection — a critical account of what happened when you handed agency over, where you resisted, intervened, or let go.

The conference takes place on March 13, 2026, as a one-day virtual event. If you read this before April 13, 2026, you can still join.

We run from 8:30 to 23:30 CET — designed to cover multiple time zones around the globe. The conference platform stays open until April 13, 2026. Same agenda, same sessions. Pick what interests you and watch on demand, at your own pace.

Why “Lonely in Rotterdam”

A dear colleague of mine — Steve Powell — submitted a paper to the conference called Lonely in London. It is a fascinating piece: he gave an AI system (Cursor, a code editor with an embedded agent) a set of real interview transcripts about loneliness among young adults in deprived London boroughs, handed it some methods papers, and told it to develop and apply its own thematic analysis methodology.

The AI planned the workflow, carried it out, updated its own instruction files as it went, and produced the final paper — essentially without Steve in the room.

When it looked like there would be a gap in the conference programme, I thought: what if we did a session together? A kind of methodological conversation between two cities, two pipelines, two approaches to the same question.

I live near Rotterdam. Hence: Lonely in Rotterdam.

That was the origin. A gap in the programme, a colleague's paper, and the thought that it would be fun — and probably interesting — to put the two side by side.

What I actually did

I modelled my study on the one by Fardghassemi and Joffe (2021, 2022) that Steve's paper was based on. As I had no data, I had to produce data, following the description provided by Fardghassemi and Joffe:

Part 1 of the data collection procedure involved a free-association “grid” task: participants were given a sheet with four empty boxes and asked to express what they associated with “the experience of loneliness” using images and/or words, one idea per box.

They then elaborated each box in turn in an interview beginning with: “Can we talk about what you have put in box 1, please?”, with minimal prompts such as “can you tell me more about that?” to reduce content injection.

Part 1 interviews lasted ~60 minutes on average and most took place in participants’ homes (some took place in local cafés, parks, or similar when home was not an option).

Part 2 followed immediately after Part 1. Participants wrote or drew one neighbourhood place where they felt most socially connected and one where they felt most lonely and wrote why beneath each.

They then elaborated each place in a short interview using the same low-injection prompting style (e.g., “How does that make you feel in this space?”). Part 2 interviews lasted ~20–30 minutes.

So, I asked Claude and Gemini to produce 10 synthetic interviews each, following these instructions.
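
For readers who want to see how this generation step could be scripted rather than done interactively in a chat interface, here is a minimal sketch using the Anthropic Python SDK. Everything in it is an illustrative assumption, not my actual procedure: the model name, the prompt wording, and the file handling are all placeholders.

```python
# A minimal sketch (not the actual procedure used in this study) of scripting
# synthetic interview generation with the Anthropic Python SDK. Model name,
# prompt wording, and file handling are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROTOCOL = """You are simulating a semi-structured interview with one fictional
young adult living in a deprived neighbourhood of Rotterdam.

Part 1: The participant first fills a grid of four boxes with images and/or
words they associate with 'the experience of loneliness', one idea per box
(describe any images in words). The interviewer opens with 'Can we talk about
what you have put in box 1, please?' and uses only minimal prompts such as
'Can you tell me more about that?' to avoid injecting content.

Part 2: The participant names one neighbourhood place where they feel most
socially connected and one where they feel most lonely, writes why beneath
each, and elaborates both in a short interview using the same prompting style.

Produce a single realistic transcript covering both parts. Give the
participant a distinct age, background, and voice."""

for i in range(10):  # ten synthetic respondents per model
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model choice
        max_tokens=4000,
        messages=[{"role": "user", "content": PROTOCOL}],
    )
    with open(f"synthetic_interview_{i + 1:02d}.txt", "w") as f:
        f.write(response.content[0].text)
```

The same loop could be pointed at Gemini via Google's SDK; the key design choice is that the protocol description, not the researcher, carries all the content, so each synthetic respondent is shaped only by the original study's procedure.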

To give you an idea of what kind of images the free-association task produced, here is the visualization from one of the synthetic respondents:

[Image: free-association grid produced by one of the synthetic respondents]

The literature review was conducted by GPT 5.2. For the analysis I used QInsights and followed my CA^AI approach (Friese, 2026, what else 😊), and NotebookLM handled conceptual visualisation.

I want to point out that the way I used QInsights in this study departs from its intended design philosophy. If you are familiar with my writing and perspective, you know that QInsights is built around the conviction that qualitative analysis is a fundamentally human interpretive act: the platform is designed to support and augment the researcher's thinking, not to replace it. In normal use, the human researcher remains in the loop at every stage — formulating questions, evaluating outputs, redirecting the analysis, and bringing interpretive judgment to bear on what the AI surfaces.

It is technically possible in QInsights to delegate the full analysis to Q, the platform's AI assistant, without substantive human intervention. For this study, that is precisely what I did — deliberately, and as part of the experiment.

What delegation produced

The fully delegated analysis — a thematic pass run by QInsights without researcher intervention, with guided analysis questions accepted without modification — produced output that was coherent, well-structured, and in several respects analytically sharp.

What struck me first was how much it produced. Concepts such as drukke eenzaamheid ('busy loneliness') and bestaansmoeheid ('existential weariness'), the distinction between being unread and being misread, and the causal chain linking in-betweenness, conditional recognition, and shame were not mine. I did not plant them in the data or write them into the analytic prompts. They emerged from an AI-to-AI process between the synthetic interviews and the analytic system.

Here is the visualization of the findings created by NotebookLM:

The Architecture of Invisible Distance — conceptual diagram of the proximity-recognition-shame mechanism, generated by NotebookLM from the QInsights synthesis.

I found that impressive, but also unsettling. The output arrived with the calm confidence of something already finished. 

It looked like analysis. It sounded like analysis. In places, it even felt better articulated than many first human drafts would be. 

That is exactly why it needs caution. Plausibility can be dangerously persuasive. I could feel how easy it would be to accept these outputs because they were so fluent, so organised, so ready to travel into a paper.

But I also knew, while reading them, that something important was missing. I had not done the kind of close, resistant reading that qualitative work usually demands. 

I had not sat with the interviews line by line. I had not tested the summaries against the grain of the material. I had not asked, slowly and repeatedly, whether the interpretation really held.

I moved through the material quickly because that was the point of the experiment. Yet the speed of the process also made the limitation very clear. What I had in front of me was not validated interpretation. It was first-pass analytic output generated under conditions of delegation.

That distinction matters to me. I do not see that as a failure of the system. I see it as a precise description of what this pipeline can do and what it cannot do. It can generate plausible interpretations. It cannot, on its own, establish whether those interpretations deserve trust. 

For me, that is the ethical centre of the experiment. The results in this paper are not findings about actual young adults in Rotterdam. They are products of synthetic data and automated analysis. Treating them as if they were straightforward empirical results would be misleading.

The feeling of wanting to dig deeper

What surprised me most was that the stronger the output became, the more I wanted to go back and spend real time with the data. 

That was not because I thought the analysis was weak. Quite the opposite. The analysis was good enough to make the missing depth visible. 

Some of the images in the material stayed with me: dead phones, bridges, the sense of moving like a ghost through one’s own flat, the orange peel, the grandmother’s house. Those are not just colourful details. They are invitations. They suggest lives, atmospheres, textures of feeling that a fast thematic pass can only touch lightly.

I found myself wanting to slow down, to read case by case, to follow a metaphor further, to ask why one image surfaced rather than another, to understand what a particular formulation was doing in the context of one life rather than across a corpus. 

In other words, the automated analysis did not make me feel that the work was done. It made me feel the pull of deeper interpretive work more strongly. That, to me, was one of the most interesting effects of the whole exercise.

Orchestration, not delegation

I also became increasingly aware that “delegation” is not quite the right word for what I did. I delegated substantial parts of the work, yes, but I was not absent.

I designed the study, shaped the prompts, decided what kind of data to generate, chose which outputs to pursue, rejected formulations that did not feel right, redirected when something seemed thin or off, and shaped the final argument of the paper. 

The systems produced the wording, the structures, the syntheses, and much of the text. But I remained responsible for the framing, the judgement, and the decisions about what counted as acceptable.

That leaves me with an uneasy but productive question about authorship. 

I did not write this paper in the conventional sense. It would feel dishonest to claim that I did. Yet I also did more than simply receive a machine-generated result.

My role shifted from writer to orchestrator, editor, judge, and designer of the analytic situation. That is real intellectual labour, even if it is not the same kind of labour that academic writing has usually required. 

I do not think the vocabulary for this has settled yet, and I am not sure it should settle too quickly.

Two days

The fact that the entire process took about two days is, in one sense, astonishing. In another sense, it is precisely why the experiment needs to be handled carefully. 

Two days is enough to show that AI systems can generate a synthetic corpus, analyse it, and help produce a paper that is coherent and conceptually interesting. 

Two days is not enough to produce the kind of grounded, validated, policy-relevant qualitative knowledge that research about real communities should aim for. That kind of work requires time, resistance, checking, and accountability to people whose lives are actually at stake.

So, the question I am left with is not whether this pipeline “works.” In one sense, it clearly does. 

The more important question is what kind of knowledge it produces. My answer is that it produces possibility rather than confirmation. It opens a research agenda rather than closing one. 

It shows that AI-led qualitative inquiry can produce outputs that are analytically suggestive and methodologically provocative. But it cannot tell us whether any of this is true of Rotterdam, because there were no real respondents, no lived experiences, and no participants who could push back against the interpretation.

That is why I see the next step as comparative. The same protocol should be used with real interviews, and the synthetic and empirical corpora should then be placed side by side. Where do they converge? Where do they diverge? What patterns appear in both, and what belongs only to the models? That, to me, is where the real methodological value now lies.

What this paper opens, it cannot close. I do not see that as a weakness. I see it as the most honest conclusion I can draw.

The validation question remains the most pressing

One thing the pipeline presented here cannot tell us is whether any of this is accurate.

The study produces a plausible account of loneliness among young adults in Rotterdam's deprived neighbourhoods — named mechanisms, a causal chain, neighbourhood variation.

But there are no real respondents to check it against, no domain expert to contest the interpretation, and no policy claim that would survive scrutiny without such grounding.

The appropriate next step would be a comparative study: generate the synthetic corpus, conduct real interviews using the same protocol, and examine systematically where the two converge and where they diverge.

Convergence would suggest that AI training data encodes something real about the lived experience of loneliness in this context.

Divergence would tell us what the models got wrong, and why — which is equally valuable knowledge. That study would be methodologically significant regardless of its outcome.
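
To make "examining convergence" a little more concrete, here is a small sketch of one way the comparison could be supported computationally: embed theme summaries from both corpora and score their pairwise similarity. The theme lists below are hypothetical placeholders, the sentence-transformers model is just one plausible choice, and such scores could only supplement, never replace, interpretive side-by-side reading.

```python
# A hedged sketch of one way to quantify convergence between themes derived
# from a synthetic corpus and themes derived from real interviews. The theme
# lists are hypothetical placeholders, not findings from any actual study.
from sentence_transformers import SentenceTransformer, util

synthetic_themes = [
    "busy loneliness: feeling alone while constantly surrounded by activity",
    "conditional recognition: being seen only when meeting others' expectations",
    "in-betweenness: belonging fully to neither origin community nor city life",
]
empirical_themes = [  # placeholders standing in for a future real-interview study
    "crowded isolation: loneliness despite dense urban surroundings",
    "shame about admitting loneliness to family and friends",
    "digital connection that fails to translate into felt closeness",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
emb_syn = model.encode(synthetic_themes, convert_to_tensor=True)
emb_emp = model.encode(empirical_themes, convert_to_tensor=True)

# Cosine similarity matrix: rows = synthetic themes, columns = empirical themes
sim = util.cos_sim(emb_syn, emb_emp)

for i, s_theme in enumerate(synthetic_themes):
    j = int(sim[i].argmax())  # closest empirical theme for each synthetic theme
    print(f"{s_theme[:50]}...  <->  {empirical_themes[j][:50]}... "
          f"(cosine similarity {float(sim[i][j]):.2f})")
```

High-similarity pairs would flag candidate convergences worth close qualitative inspection; themes with no strong match on either side would flag exactly the divergences the comparative study is meant to surface.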

References

Fardghassemi, S., & Joffe, H. (2021). Young adults’ experience of loneliness in London’s most deprived areas. Frontiers in Psychology. https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2021.660791/full

Fardghassemi, S., & Joffe, H. (2022). The causes of loneliness: The perspective of young adults in London’s most deprived areas. PLOS ONE. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0264638

Friese, S. (2026). From Coding to Conversation: A New Methodological Framework for AI-Assisted Qualitative Analysis. Qualitative Inquiry, 0(0). https://doi.org/10.1177/10778004251412

Powell, S. (2026). Lonely in London: A qualitative analysis written entirely by an AI which iteratively edited its own instructions. Presentation at AIAgents4Qual, 13 March 2026.
