Oct 1, 2024
AI in Consulting Practice #4: Measuring = learning
Dennie van den Biggelaar, Onesurance, in Ken je vak!, VVP 4-2024
In this fourth part of the series AI in Consulting Practice, we focus on a crucial aspect: how do you know, and measure, whether your AI system is actually doing what it is supposed to do? In the first part (VVP 1, 2024), AI strategist Dennie van den Biggelaar showed how to get started with Machine Learning (a specific branch of AI), in the second part (VVP 2) how to operationalize AI in your business processes, and in the third part (VVP 3) the focus was on integrating AI software into existing IT landscapes.
Measuring the effectiveness of an AI application starts with defining clear business KPIs. These KPIs are essential because they provide direction on which aspects of your business you want to improve and how to make those improvements measurable. For an insurance company, for example, these goals might include increasing revenue, improving retention, raising policy density or raising STP acceptance. Establishing these KPIs provides a framework for both developing and evaluating the AI application.
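To make this concrete, the sketch below shows one way to record such KPIs as measurable quantities. The KPI names, baselines and targets are illustrative assumptions, not figures from the article:

```python
# Minimal sketch: business KPIs as measurable quantities (illustrative values).
kpis = {
    # name: (baseline, target) -- hypothetical numbers for illustration
    "retention_rate": (0.88, 0.92),
    "policy_density": (1.6, 1.9),    # policies per customer
    "stp_acceptance": (0.70, 0.80),  # share of straight-through-processed applications
}

def progress(name: str, current: float) -> float:
    """Fraction of the gap between baseline and target that has been closed."""
    baseline, target = kpis[name]
    return (current - baseline) / (target - baseline)

print(f"retention progress: {progress('retention_rate', 0.90):.0%}")  # -> 50%
```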
Man and machine
In practice, AI applications often work together with human experts. Therefore, it is important to measure the performance of both AI and humans separately and together. This provides insight into the effectiveness of the collaboration and helps you determine where improvements can be made.
Example: active customer management. Suppose you have an AI algorithm that identifies customers with a high probability of churn. If the inside sales team or the advisor does not follow up adequately on these signals, the intended reduction in churn may not materialize. By measuring performance per employee, you can discover whether certain employees achieve better results than others. These insights can then be shared to strengthen the team as a whole.
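A minimal sketch of such a per-employee measurement, assuming a hypothetical follow-up log with one row per churn signal flagged by the AI:

```python
import pandas as pd

# Hypothetical follow-up log: one row per churn signal flagged by the AI.
signals = pd.DataFrame({
    "employee":    ["anna", "anna", "ben", "ben", "ben", "carla"],
    "followed_up": [True, True, True, False, True, False],
    "retained":    [True, False, True, False, True, False],
})

# Per-employee follow-up rate and retention rate on flagged customers.
per_employee = signals.groupby("employee").agg(
    follow_up_rate=("followed_up", "mean"),
    retention_rate=("retained", "mean"),
    signals=("retained", "size"),
)
print(per_employee)
```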
Technical performance
To assess the technical performance of a predictive algorithm, several indicators are used: accuracy (indicates how often the algorithm makes the correct prediction), precision (the reliability of its positive predictions), sensitivity or recall (how well the model detects all relevant outcomes), Area Under Curve or AUC (the model's prediction quality across various thresholds) and Log Loss or Logarithmic Loss (how close the predicted probabilities are to the actual outcomes).
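For illustration, all of these indicators can be computed with standard tooling; the sketch below uses scikit-learn on hypothetical churn predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score, log_loss)

# Hypothetical outcomes: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Predicted churn probabilities from the model.
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold at 0.5

print("accuracy:   ", accuracy_score(y_true, y_pred))   # share of correct predictions
print("precision:  ", precision_score(y_true, y_pred))  # reliability of positive predictions
print("sensitivity:", recall_score(y_true, y_pred))     # share of actual churners detected
print("AUC:        ", roc_auc_score(y_true, y_prob))    # quality across all thresholds
print("log loss:   ", log_loss(y_true, y_prob))         # closeness of probabilities to outcomes
```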
In addition to these indicators, speed, efficiency and scalability are important. Speed, or latency, determines how quickly the AI application responds to a request. Efficiency is measured by the application's memory consumption, and scalability by the number of predictions delivered within a given time (throughput). Together, these factors determine whether an algorithm can be deployed at scale.
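A minimal sketch of how latency and throughput could be measured, assuming a hypothetical predict() function standing in for the real model call:

```python
import time

def predict(customer: dict) -> float:
    # Stand-in for the real model call (hypothetical).
    return 0.42

customers = [{"id": i} for i in range(10_000)]

start = time.perf_counter()
for c in customers:
    predict(c)
elapsed = time.perf_counter() - start

print(f"latency:    {elapsed / len(customers) * 1000:.3f} ms per prediction")
print(f"throughput: {len(customers) / elapsed:,.0f} predictions per second")
```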
Robust and ethical
An AI application must not only perform well technically, but also be robust and ethical. Robustness includes the model's ability to keep performing well when the input data or environment changes (model drift and shift). In addition, changes in the distribution of the data on which the model was trained must be detected in time (data drift and shift). Ethical considerations, such as avoiding discrimination based on gender, ethnicity or age, are also crucial to ensure that the AI operates fairly and responsibly.
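One common way to make data drift measurable is to compare the live distribution of a feature with its distribution at training time, for example with a Kolmogorov-Smirnov test. A minimal sketch; the feature, sample sizes and alert threshold are illustrative assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_age = rng.normal(45, 12, size=5_000)  # age distribution at training time
live_age = rng.normal(49, 12, size=1_000)   # age distribution in production (shifted)

stat, p_value = ks_2samp(train_age, live_age)
if p_value < 0.01:  # illustrative alert threshold
    print(f"data drift detected on 'age' (KS={stat:.3f}, p={p_value:.2g})")
```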
'Measuring effectiveness of an AI application is a complex but necessary process'
Uptime and reliability
As with any cloud-based application, the uptime of an AI application is critical, especially in production environments. A common standard in a Service Level Agreement (SLA) is an uptime of 99.9 percent. This means that out of every 1,000 interactions with the application, no more than one should go wrong. To ensure this reliability, a backup application is often deployed to take over in the event of an outage.
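Read as time-based availability, 99.9 percent works out as follows; the SLA level is from the article, the conversion to hours is standard arithmetic:

```python
uptime = 0.999
hours_per_year = 365 * 24  # 8,760

downtime_hours = (1 - uptime) * hours_per_year
print(f"allowed downtime: {downtime_hours:.1f} hours per year "
      f"(~{downtime_hours * 60 / 12:.0f} minutes per month)")
# -> about 8.8 hours per year, or roughly 44 minutes per month
```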
From prototype to production
Setting up an AI application is a step-by-step process. In the prototype phase, the main focus is on testing the predictive power of the algorithm and minimizing any discrimination. If the AI application passes these tests, the next step is to assess whether the application actually improves the desired business KPIs. The scalability of the model is also considered at this stage.
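One simple, illustrative way to check for discrimination in the prototype phase is to compare how often the model flags customers across a sensitive attribute (a so-called demographic parity check). The data below are hypothetical and the check is a sketch, not a legal standard:

```python
import pandas as pd

# Hypothetical prototype output: predicted churn flag plus a sensitive attribute.
df = pd.DataFrame({
    "flagged": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "gender":  ["f", "f", "f", "f", "f", "m", "m", "m", "m", "m"],
})

rates = df.groupby("gender")["flagged"].mean()
parity_ratio = rates.min() / rates.max()
print(rates)
print(f"demographic parity ratio: {parity_ratio:.2f}")  # values far below 1 warrant review
```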
Once the AI is in production, the focus shifts to ensuring uptime and monitoring its robustness over time. By systematically measuring and evaluating, you can continuously improve and ensure that your AI application is doing what it needs to do, now and in the future.
Measuring impact
One of the most effective methods of measuring whether an AI application is producing the desired results is through A/B testing. This involves randomly dividing the target audience into two groups: one group (Group A) uses the new AI application, while the other group (Group B) uses the traditional method or an earlier version of the system without AI. By comparing the performance of the two groups, you can determine how effective the AI is in improving the business KPIs.
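Evaluating such a test typically comes down to comparing a rate between the two groups and checking that the difference is unlikely to be chance alone. A minimal sketch with a chi-square test; the group sizes and retention counts are illustrative:

```python
from scipy.stats import chi2_contingency

# Hypothetical A/B result: rows = group, columns = [retained, churned].
table = [
    [930, 70],   # Group A: with the AI application
    [900, 100],  # Group B: traditional method
]

chi2, p_value, _, _ = chi2_contingency(table)
rate_a, rate_b = 930 / 1000, 900 / 1000
print(f"retention A: {rate_a:.1%}, retention B: {rate_b:.1%}, p = {p_value:.3f}")
# A low p-value suggests the difference is unlikely to be chance alone.
```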
The success of an AI application depends heavily on how the insights from A/B testing are integrated into business operations. For example, if an A/B test shows that a particular AI tool leads to higher policy density, this may prompt a wider rollout of the tool within the organization.
Effective
Measuring the effectiveness of an AI application is a complex but necessary process. It starts with defining clear business KPIs and evaluating both technical performance and human-machine collaboration. Robustness, ethical considerations and uptime are just as important as the algorithm's predictive power. Moreover, A/B testing lets you reliably determine whether the AI application actually contributes to achieving your business goals. In the end, the application must not only perform well technically, but also contribute effectively to improving your business results.
The original article was published in VVP; read the article online here.


