Can We Trust AI as a Causal Mapping Assistant?

Insights from the Latest Causal Map Team Validation Study

Qualitative data analysis stands at a turning point. Whilst evaluators have spent decades painstakingly extracting causal claims from interviews, reports and stakeholder consultations, artificial intelligence now promises to fundamentally transform this time-consuming process. Yet the central question remains: can we really trust this new digital assistant?

The Causal Map Team led by Steve Powell, Gabriele Caldas Cabral and Hannah Mishan has tackled precisely this question. Their validation study “AI-assisted causal mapping: a validation study,” recently published in the International Journal of Social Research Methodology, offers encouraging answers, along with important nuances.^1^3

The Challenge: Systematically Identifying Causal Claims

Anyone who has worked through hundreds of pages of interview transcripts searching for causal relationships knows the problem: the work is not only time-intensive but also prone to inconsistencies. Different coders identify different causal connections, miss subtle cues or apply coding rules unevenly.^4

Traditional qualitative data analysis (QDA) focuses on thematic connections, often established retrospectively. Causal mapping takes a different approach: it systematically identifies every single causal claim in the text—for example, “staff turnover led to implementation delays”—and transparently documents the source and context.^5

The Approach: AI as a Transparent Low-Level Assistant

The study's key principle is its philosophical positioning: the researchers deliberately do not set out to build a “black box” that automatically constructs a complete causal model. Instead, they deploy large language models (LLMs) as transparent assistants for clearly defined, low-level tasks.^1

This distinction is fundamental for evaluation practice. Whilst black-box models undermine trust and auditability, the Causal Map Team’s approach enables complete traceability: every identified causal connection is linked to the original quotation and source.^6^4

The researchers do not ask whether AI can model an entire system, but rather: Is the ability of current AI models to identify causal claims in texts of sufficient quality to be useful?^1

The Methodology: A Two-Stage Process with Human Validation

The AI-assisted workflow comprises two core functions:^4

1. Extraction: The AI identifies causal statements in the text and extracts cause-effect pairs with source attribution. Examples include direct causality (“training improved data quality”), indirect effects (“delayed payments led to partner withdrawal”) and complex relationships (“market downturn and drought together drove migration”).^4

2. Structured Summary: The AI organises findings into clear formats, groups similar causal connections and counts frequencies to show the strength of evidence.^8

Crucially: the human evaluator remains the analyst. The AI delivers structured data outputs, but interpretive and evaluative work—judging whether a causal relationship actually holds—remains with the subject matter expert.^9
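To make the extraction output concrete, the sketch below shows one way such a record could be represented in Python. The field names and example values are purely illustrative assumptions, not the schema used in the study or in the Causal Map app.

```python
from dataclasses import dataclass

@dataclass
class CausalClaim:
    """One extracted causal claim, kept traceable to its source.
    Field names are illustrative, not the actual Causal Map schema."""
    cause: str       # e.g. "staff turnover"
    effect: str      # e.g. "implementation delays"
    quote: str       # verbatim passage supporting the claim
    source_id: str   # interview or document identifier
    claim_type: str  # "explicit", "implicit" or "conditional"

# One record the extraction step might produce:
example = CausalClaim(
    cause="staff turnover",
    effect="implementation delays",
    quote="Because so many staff left, the rollout slipped by months.",
    source_id="interview_07",
    claim_type="explicit",
)
```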

The Results: 85–95% Accuracy on Explicit Claims

The validation results are remarkable. Studies by the Bath SDR Team show that AI-assisted causal mapping systems achieve 85–95% accuracy on explicit causal claims when combined with human validation.^4

The AI demonstrates particular strengths in four areas:^4

  • Exhaustive scanning: Systematic coverage of the full text reduces the risk of overlooked explicit causal statements
  • Consistent coding: Coding rules are applied uniformly across all data
  • Rapid processing: Large volumes of text can be analysed quickly
  • Complete transparency: Every identified connection is accompanied by traceable source links

There is, however, a critical caveat: the AI performs particularly well on explicit causal formulations. Implicit or context-dependent causal claims remain more challenging, although performance improves with more precise prompts.^10

The Validity Question: No Automatic Causal Inference

The study draws a crucial methodological distinction: causal mapping does not automatically validate causal relationships. It identifies and organises evidence for causal connections, which evaluators must then appraise themselves.^11

As the team emphasises: “Causal mapping distinguishes carefully between evidence for a causal link and the causal link itself”. The approach focuses on assessing the strength of evidence for each causal relationship or pathway—a task that is relatively straightforward to automate.^9

This strengthens several validity dimensions:^4

  • Construct validity: Clear operational definitions of causal relationships are enforced
  • Reliability: Agreement between team coders increases
  • Transparency: An auditable trail from data to conclusions emerges
  • Credibility: Systematic, exhaustive analyses become demonstrably robust

The Recommended Workflow: Five Steps for Rigorous Analysis

For MEL practitioners, the team proposes a structured five-step process:^4

Step 1: Prepare data
Clean and anonymise interview transcripts. Remove identifying information. Segment texts into sections of 3–5 paragraphs to improve AI accuracy and protect confidentiality.
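As a rough illustration of the segmentation idea (generic Python, not the tool's own preprocessing), a transcript can be split on blank lines into chunks of a few paragraphs:

```python
def segment_transcript(text: str, paras_per_segment: int = 4) -> list[str]:
    """Split a cleaned, anonymised transcript into segments of a few
    paragraphs each (4 here, within the 3-5 range suggested above)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        "\n\n".join(paragraphs[i:i + paras_per_segment])
        for i in range(0, len(paragraphs), paras_per_segment)
    ]
```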

Step 2: Extract claims with AI
Use a clear prompt template: “Identify all causal claims. For each, provide: cause, effect, source attribution and type (explicit/implicit/conditional).”
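A minimal sketch of how such a prompt could be sent to a model, using the OpenAI Python client purely as an example; the model name and the plain-text response handling are assumptions, not part of the study's workflow:

```python
from openai import OpenAI  # any LLM client would work equally well

client = OpenAI()

PROMPT = (
    "Identify all causal claims in the text below. For each, provide: "
    "cause, effect, source attribution and type (explicit/implicit/conditional).\n\n"
    "TEXT:\n{segment}"
)

def extract_claims(segment: str) -> str:
    """Send one transcript segment to the model and return its raw answer,
    which still needs parsing and human validation (Step 4)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the study does not prescribe a model
        messages=[{"role": "user", "content": PROMPT.format(segment=segment)}],
    )
    return response.choices[0].message.content
```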

Step 3: Summarise evidence
Ask the AI to group similar causal connections and count frequencies. This highlights well-supported pathways and evidence gaps.
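At its simplest, this step counts how often the same cause-effect pair recurs across sources. A toy sketch, assuming the claims have already been normalised to shared labels (which in practice the AI or the analyst does):

```python
from collections import Counter

# Parsed (cause, effect) pairs from the extraction step (illustrative data)
claims = [
    ("staff turnover", "implementation delays"),
    ("staff turnover", "implementation delays"),
    ("training", "improved data quality"),
]

link_counts = Counter(claims)
for (cause, effect), n in link_counts.most_common():
    print(f"{cause} -> {effect}: cited {n} time(s)")
```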

Step 4: Human validation
Evaluators review every extracted claim. Check accuracy against source text. Assess contextual relevance. This quality assurance step cannot be skipped.^4

Step 5: Build the causal map
Import validated connections into mapping software and develop and refine the visual causal map with stakeholder input.
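For practitioners who want a quick first visual before (or instead of) dedicated mapping software, a validated edge list can also be drawn with networkx. This is a generic sketch reusing the `link_counts` tally from the previous step, not the Causal Map app's import format:

```python
import networkx as nx
import matplotlib.pyplot as plt

# Build a directed graph whose edge weights are citation counts
graph = nx.DiGraph()
for (cause, effect), n in link_counts.items():
    graph.add_edge(cause, effect, weight=n)

pos = nx.spring_layout(graph, seed=42)
nx.draw_networkx(graph, pos, node_color="lightsteelblue", font_size=8)
nx.draw_networkx_edge_labels(
    graph, pos,
    edge_labels={(u, v): d["weight"] for u, v, d in graph.edges(data=True)},
)
plt.axis("off")
plt.show()
```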

Risks and Ethical Considerations: The Limits of Automation

The study does not shy away from risks. Three central challenges are identified:^12^1

1. Trust without critical scrutiny
Without rigorous human validation, LLM-assisted analysis risks “sloppy” research that appears rigorous but is not trustworthy. Researchers might uncritically accept AI outputs instead of engaging deeply with their data.^10

2. Bias and hallucinations
AI models can amplify biases present in their training data. They can also fabricate plausible but unverifiable interpretations. Automated qualitative analysis tools additionally carry the assumptions and biases embedded in their algorithmic logic.^7^12

3. Limited interpretability
Whilst the causal mapping procedure itself is transparent, the internal mechanisms of LLMs remain a black box. This requires additional reflexivity and documented justifications.^12

The researchers therefore emphasise that automated coding is noisy, error-prone and may carry deep biases. The solution lies in combining AI efficiency with human expertise and epistemological reflexivity.^11

Implications for Evaluation Practice: Human–AI Partnership

What does this mean for evaluators? The study delivers three core messages:

AI does not replace evaluators—it empowers them. The technology takes on time-intensive extraction tasks, creating space for deeper interpretation and strategic analysis.^14^4

Rigour requires methodological discipline. Successful implementation needs careful methodology, human oversight and ethical safeguards. Prompts, validation steps and limitations must be documented in methods sections.^10^4

Transparency is not optional. In an era of algorithmic decision-making, evaluation systems must offer traceable audit trails demonstrating how data leads to conclusions.^15^4

Outlook: The Future of Causal Analysis

The Causal Map Team’s validation study marks an important milestone. It demonstrates that AI-assisted causal mapping, when used responsibly, can strengthen analytical rigour and transparency. The technology enables exhaustive analysis of large qualitative datasets whilst maintaining consistent standards.^4

At the same time, the study underscores that these tools are only as trustworthy as the people who deploy them. The future of evaluation lies not in replacing human expertise with algorithms, but in a productive human–AI partnership that combines the strengths of both: the pattern recognition and scalability of AI with the contextual understanding, ethical judgement and interpretive depth of human evaluators.^10

The central question is thus answered: Yes, we can trust AI as a causal mapping assistant—provided we deploy it for what it is: a powerful tool that supports human analysis but never replaces it.


About the Study: Powell, S., Caldas Cabral, G., & Mishan, H. (2025). AI-assisted causal mapping: a validation study. International Journal of Social Research Methodology. DOI: 10.1080/13645579.2025.2591157

Full text available at: https://www.tandfonline.com/eprint/5HC4P3MMEM3GMXXNN2VM/full?target=10.1080/13645579.2025.2591157
Preprint on ResearchGate: https://www.researchgate.net/publication/398379200_AI-assisted_causal_mapping_a_validation_study

[^11]: https://www.betterevaluation.org/sites/default/files/2024-05/Causal Pathways introductory session Causal mapping.pdf

