1. Introduction
The AI Verify Foundation (AIVF) is a non-profit subsidiary of Singapore’s Infocomm Media Development Authority (IMDA). Its mission is to support the creation of a trusted AI ecosystem through access to reliable AI testing capabilities.
Together with its parent IMDA, the AIVF launched the Global AI Assurance Pilot in February 2025 to help codify emerging norms and best practices around technical testing of Generative AI (“GenAI”) applications. Existing, real-world GenAI applications were put to the test, pairing the organisations that had deployed them with specialist AI testing firms.
1.1 Rationale
The pilot was motivated by three core beliefs:
- GenAI can have a massive, positive impact on our society and economy – if it is adopted at scale in public and private sector organisations.
- Such “real-world” adoption requires GenAI applications to operate at a much higher level of quality and reliability than the general-purpose models that underpin them.
- The extensive work underway on AI model safety and capability is necessary, but not sufficient, to meet that higher bar.
Large Language Models (LLMs) and their multi-modal equivalents are being adopted extensively as personal productivity tools. However, to have real transformational impact, GenAI must be embedded in the public and private sector organisations that drive critical parts of the economy, such as health, finance, utilities and government services.
Using GenAI in such real-world situations, at scale, raises the quality and reliability bar significantly. Two factors account for this difference: Context and Complexity.
- Unlike a general-purpose LLM chatbot or personal productivity tool, a GenAI-enabled application must operate within the specific context of a use case, organisation, industry and/or socio-cultural expectations. For example, a GenAI application in a healthcare setting may have a far lower tolerance for “hallucination” than one used as an internal employee helpdesk.
- Real-life GenAI applications are also likely to have more layers of complexity. They may use LLMs in conjunction with existing data sources, processes and systems, creating additional potential points of failure beyond the LLM.
Most academic and technology industry efforts around AI testing have tended to focus on model safety and alignment. A shift is required – from the safety of foundation models to the reliability of the end-to-end systems or applications in which they are embedded.
The pilot was an attempt to start enabling that shift – not through new academic research or technical development, but through real-world experience.
1.2 Objectives
The pilot was launched with three target outcomes:
Testing norms for GenAI applications
- Inputs into future standards for technical testing of GenAI applications.
Foundations for a viable assurance market
- Greater awareness of the ways in which external assurance can build trust in GenAI applications and enable adoption at scale.
- A foundation for potential accreditation programmes in the future.
AI testing tool roadmaps
- Inputs into the product roadmaps for open-source and proprietary AI testing software.
- Specific focus areas for AIVF’s Moonshot platform.
1.3 Ground rules
The pilot had the following ground rules:
- The application must involve the use of at least one LLM or multi-modal model
- The application must be live or intended to go live (not a proof of concept)
- The exercise must focus on technical testing (not process compliance)
- Testing should be conducted on the GenAI application (not just the underlying foundation model)
- Testing must be conducted by an external party – i.e., an organisation different from the one that has built and/or deployed the application
IMDA and AIVF sought no access to the actual results of the technical tests. The focus was on understanding the deployer’s risk assessment, the design and implementation of technical tests against those risks, and the lessons learnt from the exercise.