AI-powered versus human-powered qualitative data analysis

To find out how AI can benefit the analysis of qualitative data, I did an experiment based on a text I coded using software that supports qualitative data analysis. This is a bit more advanced than using paper and pencil to code data.

Then I also let AI do the work.

The sample data

The texts I am using for this experiment are comments on a parenting blog and an article published in the New York Times Magazine. The topic is the relationship between having children and happiness. Quantitative research has found a negative correlation between having children and various measures related to happiness and life satisfaction. If you are interested in an entertaining summary of these studies, look for the TED Talk by Dan Gilbert: Happiness: What Your Mother Didn't Tell You (2018 WORLD.MINDS Annual Symposium).

Results of manual coding

When reading through the comments, the topics I came up with when coding the data manually were as follows:

Figure 1: Code system developed based on human-powered coding

Positive and negative effects of parenting, reasons for having children and socio-demographics have further subcodes.

Just a note in case you are wondering why I coded for socio-demographics and did not use data attributes: the simple answer is that there were none. The data are reader comments and not social media data, which usually come with attributes attached. To be able to use socio-demographic information in the analysis, I needed to read the comments and code them based on what people said about themselves.

Results of AI coding

When using the same software and applying its AI-powered concept search, the results were as follows:

Figure 2: AI-generated concepts

Based on the suggested concepts, I auto-coded the data and organized them in a code system. Codes for which I did not find a meaningful top-level code were added to the folder “Other”.

The next step was to figure out what was coded by each of these codes. To do this, you need to read the data segments coded by each code.

I did so for the suggested codes AGE and FRIENDS. This way, I learned I could put them under Parenting and renamed them “role of age” and “what friends say”.

Figure 3: Code system developed based on the codes suggested by AI

I continued with the codes TIME and WORLD. As it turned out, these codes were useless. The segments coded there were about lots of different things:

Time:

First-time mother
Around that time
Being a parent is hard at times
I want time with each grandchild
Time spend caring
Time goes by

World:

Traveling the world
Show her the world
My perfect world was gone
Wouldn’t trade it for anything in the world
Selfishness leads to a worse world for everyone

Going through all segments coded with these codes, a few things would cluster – but not necessarily under the label ‘time’ or ‘world’. For example:

My perfect world was gone might end up being coded with 'Negative aspects of becoming a parent'

Wouldn’t trade it for anything in the world might end up being coded with 'Perception of parenthood: worthwhile trade-off'

Selfishness leads to a worse world for everyone might end up being coded with 'Reasons for having children'

This means building new categories from scratch. To do this based on the AI pre-coded data is much more difficult as you don’t know where other similar data segments are – maybe under the code ‘child’, or ‘mom’ or ‘marriage’? You have no idea. So basically, you end up reading everything anyway. This defeats the purpose. The point of using AI, at least this is the claim that is made, is to save time.
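To make the problem concrete, here is a minimal sketch of word-based auto-coding. It assumes that concept matching behaves roughly like a whole-word keyword search; that is a simplification for illustration, not a description of how the software actually works. The function name `keyword_autocode` is hypothetical, and the segments are the ones listed above.

```python
import re

# Segments quoted above from the TIME and WORLD codes.
segments = [
    "First-time mother",
    "Around that time",
    "Being a parent is hard at times",
    "I want time with each grandchild",
    "Time spend caring",
    "Time goes by",
    "Traveling the world",
    "Show her the world",
    "My perfect world was gone",
    "Wouldn't trade it for anything in the world",
    "Selfishness leads to a worse world for everyone",
]


def keyword_autocode(texts, keyword):
    """Return every text that contains the keyword (or its plural) as a whole word.

    Assumption: this stands in for a word-based concept search.
    """
    pattern = re.compile(rf"\b{re.escape(keyword)}s?\b", re.IGNORECASE)
    return [t for t in texts if pattern.search(t)]


time_hits = keyword_autocode(segments, "time")
world_hits = keyword_autocode(segments, "world")

for hit in world_hits:
    print("WORLD:", hit)
```

Every segment lands under its keyword, yet the three WORLD segments discussed above belong to three different categories. The match is on the word, not on what the commenter is saying.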

Let’s take a look at what a coded text looks like in both cases:

Manually coded text

Figure 4: Manually coded text

AI-coded text

Figure 5: Auto-coded text based on the concepts suggested by AI

One thing you will probably notice quickly is that the manually coded text is more nuanced: only the relevant areas are coded. The AI always codes the entire paragraph with everything that it finds.

When organizing the AI-generated codes, I just went by the label. Therefore, I sorted HAPPINESS under ‘positives’. When looking inside the code, it turns out that classifying based on the suggested concept label is not possible. The respondent uses the word happiness, but not in a positive way. He questions the notion of happiness as it relates to children, and says having a daughter probably made him less happy.

Thus, the initial coding done automatically is just a starting point. To come up with a code-system that fits the data, it is necessary to go through all auto-coded segments. For most segments, the original auto-code is not a good fit and needs to be adjusted.

The AI tool suggested lots of codes around roles like mother, father, spouse, etc. But is this helpful? What is this segment about? What are they talking about? About themselves, their own parents? About not being prepared for parenthood, about their work, career, nights they cannot sleep, their worries, the financial impact, their joys, other sources of happiness?

In order to find out, I would have to read all of what is coded under each of these 'concepts' and as a result I would need to do lots of re-coding. As I was promised that the new AI-tools will save me time, I don't want to do this. Instead I try out another option.

The software allows me to see which concepts relate to each other:

Figure 6: Relating AI generated concepts to each other

So I learn that husband relates to child, boy, girl, teenager, daughter, mom, parenthood, kid, happiness, joy, and so on. This doesn't give me more insights. Above we already learned that I cannot assume that the data coded under happiness have a positive connotation. This also might apply to joy.

There is no way around it: if I want to find out anything meaningful, I have to read the data, correct the coding and build new topics and categories that describe the data more accurately. With such a small data set, this is of course possible. It will, however, take as long as coding the data from scratch. This defeats the purpose.

I know of other tools that learn based on the adjustments you make to a code and apply them to other coded segments. But as we have seen from the above example with TIME and WORLD, it is not a matter of all other segments being moved to another category. Some of the codings are irrelevant and need to be removed. Others will end up in very different categories.

The promise is ‘that the software does the heavy lifting -- that it allows you to generate deep insights automatically by leveraging the latest AI and machine learning algorithms for faster results.’

Judge for yourself. You can guess my answer.

Asking an alternative AI-powered tool

For comparison, I used another AI tool to generate codes. These were the topics and themes it came up with:

Figure 7: Topics and themes generated by another AI tool

If you compare figure 7 to figure 3, the generated topics are similar.

The themes appear to consider more context. Still, we get apples and fish in one basket. The data behind LESS HAPPY included criticism of the study designs, statements that this applies to other people but not to me, suggestions that it would be more important to talk about joy and not ‘happiness’, and others.

When coding the data myself, I picked up on other topics that neither AI tool recognized -- like sources of happiness, the definition of happiness, reasons for having and not having kids, or people’s attitudes about the relationship between happiness and children.

When reading through the data, a human realizes this is someone’s definition of happiness, or someone talks about sources of happiness. Most of the time, this does not entail the explicit use of the word 'definition' or 'source'. And if not, the AI-powered tools cannot make the connection.

One can only hope that the proposed coding is more accurate with larger data sets, as it would be impossible to go through all of it manually to correct it.

Can ChatGPT help?

I added some of the original data sources to ChatGPT and asked it to categorize the data. I don’t have a pro version, so I am limited in how much data I can add.

This is what ChatGPT came up with:

Sure, I can categorize these posts for you. Based on their content, I would categorize these posts under the topic of parenting and, more specifically, on the subject of how having children affects happiness and overall life satisfaction. Some of the posts express a positive perspective on parenthood, while others are more ambivalent or even negative. Some discuss the challenges and sacrifices associated with parenting, while others focus on the joys and rewards.

Additionally, some posts reference external sources, such as research studies and books, to support their arguments.

This is not bad. If you want a high-level summary of the data, ChatGPT does a good job. You do, however, miss out on the detail. To learn, for instance, how children affect happiness or what the sacrifices are, you would still need to read the data.

Another option is to ask ChatGPT to give you a categorization of each paragraph:

Personal experience and reflection on the challenges and benefits of becoming a mother
Personal experience on the difficulty and joy of raising a child
Discussion of a research article on the effect of planned vs unplanned parenthood on marital satisfaction
Personal reflection on overcoming fear and finding fulfilment in parenthood
Question about the scope of research on happiness and parenting
Opinion on the emotional burden of expecting children to bring happiness
Personal reflection on the richness and challenges of parenthood and the deep love for a partner
Speculation on the biological drive to reproduce and the gratification of fulfilling that drive
Personal experience on the happiness of raising two boys

It is not the same as coding the text, as you don’t have a link to the original source data. You only get the summaries. But, if you don’t have much time to read a text, this gives you a good idea of what the text is all about.

Using ChatGPT to summarize coded data

If you use a coding approach to qualitative data analysis, a common step is to summarize the data based on a code or a code query. Let’s see what we get if we use ChatGPT to summarize both the AI-coded data and the manually coded data.

As an example, I use the concept ‘parenting’ as it was coded by both the AI-tool and the human coder. To get a more nuanced summary, I related the code to sentiment. This also allows us to see how good the coding for the sentiment was:

Summary of: Parenting (negative sentiment)

The first paragraph expresses support for research suggesting that having children does not necessarily make people happy and criticizes the idea that not having children is selfish. The second paragraph argues that parenting is fulfilling, despite the challenges it presents. The third paragraph questions whether research on parenting and happiness considers the experiences of never-married people. The fourth paragraph discusses the positive and negative aspects of parenting and how it can help individuals mature. The fifth paragraph laments the increasing anxiety and regimentation surrounding parenting and asks whether children are better off today. The sixth paragraph suggests that people who hate parenting may be ignorant or lazy, and a lack of focus on individual happiness in past generations may have contributed to greater fulfilment in parenting.

Summary of: Parenting (neutral sentiment)

The data contains three different perspectives on the relationship between parenting and happiness. The first perspective suggests that the studies indicating parents are less happy than childless people may not necessarily imply causation. Instead, happy people may be more likely to avoid having children, while unhappy people may feel the need to have them. The second perspective is that the definition of happiness is vague, and it is more about self-realization and feeling whole, even though it is not about continuous joy. The third perspective describes personal experiences of being a parent, which is often challenging but ultimately rewarding, and parents can find personal growth and fulfilment in the process. Overall, while parenting may not be a job for everyone, the rewards are difficult to put into words and can only be appreciated once experienced.

Summary of: Parenting (positive sentiment)

The author discusses the paradox that while parenting is statistically associated with less moment-to-moment feelings of happiness, it is also associated with greater overall meaning and purpose in life. The author argues that loving one's children and loving the act of parenting is not the same thing. The author's personal experience with adoption is mentioned, and they reflect on how parenting can help one become a wiser, more understanding, and more compassionate person. The comments section shows a range of opinions on parenting and the article, with some readers focusing on the overall meaning and purpose of parenting, while others highlight the challenges and potential regret associated with parenting.

The comments on the parenting blog discuss the challenges and rewards of parenting. The article being commented on notes that while studies show that parenting can lead to less moment-to-moment happiness, it can also provide greater overall meaning and purpose. Some commenters reflect on their decision to have or not have children, while others discuss the difficulty of measuring the benefits of parenting. One commenter suggests that working towards a productive and purposeful life can lead to happiness, while another notes that the experience of parenting can be both challenging and rewarding. Overall, the comments reflect a variety of perspectives on the joys and difficulties of parenting.

When evaluating the overall results on sentiment, the fit is not bad. The first summary has a much more negative tone to it than the last. So thumbs up for that.

For comparison, below is a summary of the manually coded data.

Negative effects of parenting

Summary: The provided data includes quotations from different individuals who share their thoughts and experiences about having children. The overall theme is about the impact of having children on people's happiness and quality of life.

Some people express that having children has reduced their overall happiness, increased their stress and created more conflict with their partners, particularly regarding the division of labour. Others mention that they love their children but acknowledge the sacrifices they have made and the significant change in their lifestyle. They also recognize that having children adds more responsibilities and reduces the freedom they had before.

Additionally, some quotes suggest that societal and cultural expectations contribute to the perception that having children is the only way to achieve happiness or fulfilment. Others suggest that more significant support and accommodation from employers and partners could make a significant difference in improving the work-life balance of parents.

Overall, the data suggests that having children can lead to mixed feelings and a significant change in the quality of life, depending on the individuals' circumstances and support systems.

Positive effects of parenting

Summary: The quotations all express a range of perspectives on the experience of parenthood. Although some admit that it is not always easy and that there are moments of frustration, most consider being a parent to be a fulfilling and meaningful experience that adds richness to life. Many also note the impact it has had on their personal growth and relationships with others. Some indicate that it has made them more responsible and less selfish, and others say it has made them part of a community. They suggest that the joy and fulfilment that come from being a parent outweigh the difficulties and that the experience is worth it, even though it may not always result in happiness.

As can be expected, there is more focus on the experience itself and more detail on what precisely the positive and negative effects of parenting are. The AI-coded data on parenting encompass all kinds of topics that loosely fit under the umbrella term 'parenting'.

Let's try one more experiment to figure out what the AI tools have coded under HUSBAND. Both tools suggested this as a relevant concept. ChatGPT summarizes the coded segments as follows:

The given text contains a variety of viewpoints on the topic of having children and how it can impact happiness. One person explains that they were hesitant to have children as it could ruin their plans for success, but they eventually did, and it was the best thing they ever did. Another person discusses how they have chosen not to have children and how they believe it is not necessary for happiness. They also highlight that having children can waste resources and that adopting is a more responsible choice. Others describe their own experiences with having children and how it has brought them happiness despite the challenges. Some individuals discuss the hardships of parenthood, while others mention the emotional attachment and hurt that come with close human relationships in general. The text presents a range of opinions on the topic of having children and how it relates to happiness.

This is not at all about husbands. You would probably get a very similar summary if ChatGPT summarized all texts, or the extracted segments of other codes. The likely reason is that each paragraph is coded with many codes, so the segments retrieved for different codes largely repeat each other.
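The effect of paragraph-level coding with many codes can be sketched in a few lines. The paragraph numbers and code assignments below are invented for illustration and are not taken from the data set; the helper names `paragraphs_for` and `jaccard` are hypothetical.

```python
# Invented example: paragraph id -> set of auto-codes applied to the whole paragraph.
coded_paragraphs = {
    1: {"HUSBAND", "CHILD", "HAPPINESS", "MOM"},
    2: {"HUSBAND", "CHILD", "HAPPINESS"},
    3: {"CHILD", "HAPPINESS", "MOM"},
    4: {"HUSBAND", "CHILD", "MOM", "HAPPINESS"},
}


def paragraphs_for(code):
    """Return the ids of all paragraphs that carry the given code."""
    return {pid for pid, codes in coded_paragraphs.items() if code in codes}


def jaccard(a, b):
    """Share of paragraphs two retrievals have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


husband = paragraphs_for("HUSBAND")
child = paragraphs_for("CHILD")
overlap = jaccard(husband, child)
print(f"HUSBAND and CHILD share {overlap:.0%} of their paragraphs")
```

When the overlap between two codes is this high, a summary of one code is necessarily close to a summary of the other, whatever the label says.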

Intermediate conclusion

I am not so impressed with the current capabilities of AI to code data.

I am impressed with the summaries that ChatGPT delivers. This will certainly speed up analysis after you have coded the data.

The categorization offered by ChatGPT is not a replacement for coding. What you get is a high-level summary of the data. If you want a quick overview of what a text or an interview is all about, this might be good enough. It all depends on the purpose.

What neither AI nor ChatGPT can do

I already mentioned that AI does not pick up on all relevant topics. If you manually go through all of the auto-coded data, adjust and adapt, you will probably find those topics and add them as codes. But, as mentioned, this defeats the purpose.

Finding relevant socio-demographics, if not available otherwise, is also a task for a human. When someone wrote that they are a mom or a father, I coded this as having a child. Some people also wrote about how many children they have. Searching for words like mom, dad, father and mother and auto-coding the hits would not speed up the process, as someone might be talking about their own mother rather than about being a mother.
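A small sketch shows why a word search is not enough here. The four comments are invented example sentences, not quotes from the data, and `autocode_has_child` is a hypothetical helper standing in for a search-and-auto-code step.

```python
import re

# Invented example sentences, not taken from the data set.
comments = [
    "I am a mom of two boys.",                              # the writer is a parent
    "As a father, I would do it all again.",                # the writer is a parent
    "My mother never seemed happy when I was growing up.",  # about the writer's parent
    "My dad always said children were his greatest joy.",   # about the writer's parent
]

PARENT_WORDS = re.compile(r"\b(mom|dad|mother|father)\b", re.IGNORECASE)


def autocode_has_child(texts):
    """Naively code every comment that mentions a parent word as 'has a child'."""
    return [t for t in texts if PARENT_WORDS.search(t)]


hits = autocode_has_child(comments)
print(f"{len(hits)} of {len(comments)} comments were auto-coded as 'has a child'")
```

All four comments are caught, but only the first two are by people who say they are parents. Telling the two cases apart requires reading the sentence.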

I also coded for attitude. To do so, it is necessary to evaluate the entire statement in context. Does someone believe children make them happy or unhappy, or do they think there is no relationship?

When it comes to relating different aspects of the data to each other, this also goes beyond what AI can currently do. The way I coded the data manually allowed me to look at how attitude relates to the experience of parenthood:

Figure 8: Relationship between attitude and experience of parenthood

The data show that the more positive the attitude, the more positive the experience of parenthood. Is this a self-fulfilling prophecy? To learn more, a look at the various comments will probably give some insights.
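For readers who want to see the mechanics, relating two codes to each other boils down to a cross-tabulation of code co-occurrences. The sketch below uses placeholder codings invented for illustration; they are not the figures from this study, and the category labels and the helper `row_share` are assumptions.

```python
from collections import Counter

# Placeholder codings, one (attitude, experience) pair per comment.
# These values are invented and are NOT the results reported above.
codings = [
    ("children make happy", "positive"),
    ("children make happy", "positive"),
    ("children make happy", "negative"),
    ("no relationship", "positive"),
    ("no relationship", "negative"),
    ("children make unhappy", "negative"),
]

# Count how often each attitude co-occurs with each experience.
crosstab = Counter(codings)


def row_share(attitude, experience):
    """Share of comments with this attitude that report this experience."""
    row_total = sum(n for (a, _), n in crosstab.items() if a == attitude)
    return crosstab[(attitude, experience)] / row_total if row_total else 0.0


share = row_share("children make happy", "positive")
print(f"{share:.0%} of 'children make happy' comments report a positive experience")
```

The counting is trivial. The table is only meaningful because a human first decided, comment by comment, which attitude and which experience each statement expresses.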

The coding also allows for examining further linkages in the data. The figure below shows that parents with more than one child are more optimistic about parenting than those with only one child. They also have a stronger belief that children make them happy.

So do their children indeed make them happier? Another explanation could be that having more than one child leads to a change in attitude. As people, we don’t want to contradict ourselves. In psychology, this is described as self-consistency bias.

Figure 9: Relationship between the number of children and experience of parenthood

If you are in marketing, you might think there is no need to go so deep into the data. Let me challenge you: Because you are in marketing, couldn’t you think of just the right product, or a positioning of a product, that confirms parents in their choice and helps them to live happily ever after with their self-consistency bias?

Conclusion

ChatGPT can be put to good use in summarizing data. A current limitation is the amount of data you can add. In the future, there might be applications that will allow for larger amounts of text to be summarized, e.g. an entire document or interview. Currently, it helps summarize subsets of data, e.g. segments extracted based on coding.

When it comes to coding, AI-coded data give you a superficial idea of what topics there are in the data, many of which are irrelevant. The AI tools are still based on the words used in the text and not on meaning. You need to be aware that this is what you get.

If you have a regular-sized qualitative data set, e.g. a few focus groups or 10-30 interviews, the AI-powered tools don’t help you to save time. The suggested concepts do not fit the data well, and lots of time needs to be spent on adjusting and cleaning up.

When you have a large text-based data set, you have no choice but to use AI to classify the data. The question I would like to raise is: is it necessary to collect so much qualitative data?

At the recent #360Qual Marketing Research conference, I learned about a few insane qualitative data sets that were collected. For example, at Philip Morris they collected 10,000 interviews. Researchers at Ikea do thousands of home visits and end up with an enormous amount of data they have no idea how to analyse.

What is the purpose of collecting qualitative data at a quantitative scale? Is this question ever raised? Do decision makers think they get better results?

The truth is they won't: you can analyse such data only at a very superficial level, even with current AI technology.

And sometimes you only need a little bit of data and lots of right brain to find the answers to a problem.

This is what I will be writing about in the next issue of this blog.