November 28, 2025 | 6 min. read
Part 2: Calculating Instead of Guessing – What Reliable AI Systems Look Like

In the first part of this series, we explained why the groundwork for the high failure rate of AI projects is often laid in the decision-making phase: companies ask the wrong questions. They ask “What can we use AI for?” instead of “What problem do we want to solve with AI?”.

But even if the right questions are asked in the right order, a serious problem remains – and that is the subject of this second part:

How do you ensure that AI systems are reliable?

This is THE critical question, especially for financial companies, but also for all other companies juggling risks and compliance requirements.

The Reliability Problem: Why Generative AI Alone Cannot Be the Answer

In April 2025, BCG published an insightful briefing on artificial intelligence, in particular on “AI Agents and the Model Context Protocol”. Its central finding is worrying: the reliability of fully autonomous AI agents is rated as “low to medium”.

And even more importantly, reliability decreases drastically with the scope of the task: after one hour of running time, it is down to 50%. At over four hours, only 10%.

This is not a theoretical gimmick. In concrete terms, it means that you cannot simply integrate generative AI into your core processes and hope that it will work, especially not in an end-to-end agent system with many sequential steps. Risk assessment in the financial sector, for example, is always both qualitative and quantitative.
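A simple way to build intuition for why reliability collapses with task scope (this is back-of-the-envelope arithmetic, not BCG's methodology): if every step in an agent chain succeeds independently with some probability, the chance that a long end-to-end run succeeds is that probability raised to the number of steps.

```python
# Illustration of compounding failure in sequential agent chains.
# The 99% per-step figure is an invented assumption for the sketch,
# not a number from the BCG briefing.

def chain_reliability(step_reliability: float, steps: int) -> float:
    """Probability that all sequential steps succeed independently."""
    return step_reliability ** steps

for steps in (10, 50, 200):
    print(f"{steps:>4} steps at 99% each -> {chain_reliability(0.99, steps):.1%}")
```

Even at 99% per step, a 200-step run succeeds less than 15% of the time, which is why long-running autonomous agents degrade so sharply.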

Why Generative AI Is Not Reliable Per Se

Generative AI cannot perform quantitative analyses. It is not reliable per se. That sounds provocative, but it’s a fact. Why?

Because the input in your company is something that AI has never seen before. Your data. Your organization. Your processes. Your expectations. None of this is included in the model’s training data.

And with generative AI systems, it works like this: they deliver what sounds plausible, not what is true. They are “made to please you”. They are trained to be as helpful and satisfactory as possible – even if the facts are different.

This is the phenomenon of “hallucinations” that everyone is now familiar with. A model invents facts. It cites literature references that it has never read. It assures you that something is so, even though the opposite is the case – with absolute conviction.

Now you may be asking yourself: if these hallucinations are well known, why aren’t they simply corrected?

Because they are in the nature of the technology. Generative AI models work according to the principle: “What is the most likely next sequence of tokens?” They do not optimize for truth, but for probability and plausibility.
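The core of the problem can be shown in a few lines. In this toy sketch (all probabilities invented), the model simply emits whatever continuation scores highest; there is no term in that selection that checks whether the continuation is true.

```python
# Toy illustration of next-token selection. A language model picks the
# most *probable* continuation, not the most *truthful* one.
# The candidate tokens and probabilities below are invented.
next_token_probs = {
    "Paris": 0.62,      # plausible and, in this case, true
    "Lyon": 0.21,
    "Atlantis": 0.17,   # plausible-sounding but false tokens still get probability mass
}

# Greedy decoding: take the highest-probability token, true or not.
best = max(next_token_probs, key=next_token_probs.get)
print(best)
```

When the false continuation happens to score highest, the model emits it with the same fluency, which is exactly what a hallucination looks like from the outside.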

The Solution: Context Engineering and Hybrid AI

How do you solve this? The BCG briefing shows that enormous methodical effort is required – not with more generative AI, but with a well thought-out combination of analytical and generative AI.

Here are the four pillars of reliable AI systems:

1. Calibrated Models and Human-in-the-Loop

Models have to learn when they should hand off to a human. Not every decision can be automated. A calibrated system recognizes: “In this situation, I don’t have enough confidence to decide on my own. I’m escalating to a human.”

2. Confidence Scores

Each answer from the system should carry a confidence score – a numerical rating of how confident the model is in its answer. It’s not perfect, but it gives you an indication.

3. Constraint Rules

Clear rules define what the agent may release on its own and when it must escalate. These are not merely technical rules – they are business rules that reflect your risk management.

4. Analytical AI as a Foundation

And here comes the crucial point: all of this only works if you have a foundation of analytical AI.

The Symbiosis: Analytical AI + Generative AI

At moresophy, we call this the symbiosis of analytical and generative AI.

Analytical AI works like hard-working ants:

  • It prepares data systematically, transparently and comprehensibly.
  • It measures your data.
  • It systematically compares historical and current cases.
  • It identifies anomalies and patterns.
  • It provides precise metrics that you can use to steer the overall process.

That is reproducible. It is transparent. It is comprehensible. And – this is crucial – it works with around 3% of the energy that generative AI consumes.

In the next step, generative AI comes into play: it interprets this data and makes signals understandable for humans. It helps with evaluation and decision-making. It is the tool for intelligent interpretation – not for data preparation.

This is the answer to the question: “Calculating instead of guessing.”

Example from Practice

Banking and Chart Analysis

In banking, an analyst wants to understand: Why does stock A react differently to political changes than stock B?

Naive AI approach: “Analyze the last 10 years of chart data for stocks A and B.” This will not work, because the AI cannot recognize any real causality from price charts alone.

The right approach:

  1. Analytical AI collects multi-source data: Chart data, market developments, technology trends, industry reports, brand reputation, macro factors such as climate and political developments.
  2. It measures and correlates: The system recognizes which of these factors correlate with share movements – and to what extent.
  3. Signals as metrics: The system provides signals such as “If regulation in sector X is tightened, share A will fall by an average of 2%, but share B will fall by 5%.”
  4. Generative AI explains the correlations. The system interprets: “Stock B reacts more strongly because company B has higher compliance costs. That is the main driver here.”
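Step 2 and 3 of the approach above boil down to plain measurement. This sketch uses invented toy numbers (not real market data): given the returns of both stocks on past regulation-tightening days, the analytical layer produces the signal as a metric before any generative model touches it.

```python
# Toy data: daily returns (%) of stocks A and B on past days when
# regulation in sector X was tightened. All values are invented.
returns_a = [-1.8, -2.2, -1.9, -2.1]
returns_b = [-4.6, -5.4, -4.9, -5.1]

def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

# The measured signal that the generative layer later explains:
signal = (f"If regulation in sector X is tightened, stock A falls by "
          f"{abs(mean(returns_a)):.1f}% on average, stock B by "
          f"{abs(mean(returns_b)):.1f}%.")
print(signal)
```

Because the numbers come from a reproducible calculation over historical cases, the generative model only has to interpret them; it never has to invent them.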

This is reliable, transparent and explainable.

The Core Principle: The Value Lies in the Focus

Here’s the key takeaway from this piece:

The value does not lie in the number of AI agents you implement. The value lies in focusing on the right processes and executing them reliably.

This means not quickly building ten AI agents and hoping that one will work. Instead: really understand one or two critical processes, prepare them with analytical AI, steer them with precise metrics, and then – only then – use generative AI intelligently.

That is the difference between the 95% failures and the 5% successes.

 

In the next part of this series, we will show you why regulation – especially BaFin requirements – is not your enemy, but your competitive advantage.

 

Portrait of Prof. Dr. Heiko Beier

CEO of MORESOPHY

Heiko Beier is a professor of media communication and an entrepreneur specializing in data analytics and artificial intelligence. As an expert in cognitive business transformation, he supports companies in various industries in the design and implementation of digital business models based on smart data technologies.
