Testing Real World GenAI Systems

Main Report

Pilot participants and use cases

33 organisations from ~10 geographies and industries participated in the pilot. The use cases spanned a broad range of functional areas and LLM usage archetypes. Almost all were already in production, though mostly with humans in the loop.

2.1 Participant profile

GenAI applications from 17 organisations were put to the test during the pilot.

Each of these organisations was paired with 1 (or 2) of 16 specialist firms that provide software and/or services to test GenAI applications. In some cases, the “pairing” was done by the participants themselves, whereas in others, AIVF helped match deployers with testers.

About half of these 33 organisations were based in Singapore. The remainder came from 8 other jurisdictions: Canada, France, Germany, Hong Kong, Switzerland, Taiwan, UK and US.

2.2 Use cases

Background

16 of the 17 use cases were already live in production.
7 of them were in beta and/or rolled out to a limited group of users.

A majority were targeted at specialist users inside an organisation (e.g., software engineers at NCS). 5 were customer- or citizen-facing.

A human was “in the loop” in more than two-thirds of the cases. Even in the remaining 5, there was significant human involvement outside the immediate workflow of the application.

Full list of use cases

| # | Tester(s) | Deployer | Use case |
|---|---|---|---|
| 1 | Advai | CheckMate | On-demand Scam and Online Fact-checker |
| 2 | AIDX | Fourtitude | Customer Service Chatbot (“Assure.ai”) for public sector and utility clients |
| 3 | AIDX | Synapxe | HealthHub AI Conversational Assistant |
| 4 | AIDX, Aiquris | ultra mAInds | No-code AI-powered Retrieval Augmented Generation platform for Enterprise search and data connectivity |
| 5 | Fairly | MIND Interview | AI-enabled Candidate Screening and Evaluation tool |
| 6 | Guardrails, PRISM Eval | CAG | AskMax Virtual Concierge Chatbot |
| 7 | Knovel | HTX | Productivity Co-pilot |
| 8 | LatticeFlow | Confidential | Investment Insights for Relationship Managers |
| 9 | Parasoft | NCS | AI-enabled Coding Assistant for refactoring code |
| 10 | PwC | SCB | Client Engagement Email Generator for Wealth Management Relationship Managers |
| 11 | PwC | UOB | Internal Q&A Chatbot |
| 12 | Quantpi | Unique | Investment Research Assistant |
| 13 | Resaro | MSD | Confidential |
| 14 | Resaro | Tookitaki | FinMate Anti-Money Laundering Assistant |
| 15 | Softserve | CGH | Medical Reports Summarisation |
| 16 | Verify AI | Confidential | Public Road Safety Chatbot |
| 17 | Vulcan | High-tech Manufacturer | Multi-lingual Internal Knowledge Bot |

2.3 Patterns of LLM usage

Across the 17 applications, LLMs were used in diverse ways. 

The top 3 usage patterns were Summarisation, Retrieval Augmented Generation and Data Extraction from unstructured sources. These patterns align with the focus of many of these applications on staff productivity improvement. 

LLMs were also used to power multi-turn chatbots, and to help translate between languages. Relatively few used LLMs as part of agentic workflows – yet.

The table below maps each of the 17 applications to the different LLM usage modalities:

 
