Automated decision-making systems based on machine learning often contain a so-called “bias”. This refers to evaluations made on the basis of insufficient knowledge, resulting in a one-sided assessment of certain facts. This occurs not only with ML algorithms: humans, too, often incorporate prejudices into their decisions, usually unconsciously. The key ethical questions in machine learning are therefore: Can machine learning algorithms contain bias? And if so, when and how much bias is acceptable?
What Is Bias?
Before we turn our attention to bias in machine learning, it is helpful to realize that we humans are not free from biases and distorted thinking either. One of the best-known cognitive biases is confirmation bias: people, especially under the influence of strong emotions, tend to process information in a way that confirms their existing views.
This can happen in different ways: first, in the selection of information, by trusting sources and people who confirm one’s own view and ignoring disconfirming sources; second, in the interpretation of information. With data in particular, it often becomes clear that the same curve can have several explanations. When the B.1.1.7 mutation appeared in the UK, for example, some scientists attributed the rising positive rate of coronavirus infections to the mutant itself, while other researchers pointed to the increased, mixed-age gatherings over the Christmas period, which alone could explain such an increase. In this way, different interpretations of reality can arise from the same data.
Finally, the recall of information can also be distorted, primarily by remembering evidence that confirms a particular thesis. Recall is especially susceptible to strong emotions, so that intense fear or anger can lead to selective memory.
What Can Be the Consequences of Bias in ML Systems?
Bias in decision-making systems can result in both groups of people and individuals being treated unfairly, with far-reaching consequences depending on where the system is used, for example for their creditworthiness or job prospects. Even entire regions can be affected when ML algorithms decide in which districts police presence should be increased. This can even turn into a self-fulfilling prophecy, as an increased police presence often leads to a higher number of criminal acts being detected.
What Are the Causes of Bias in Machine Learning?
Machine learning systems identify patterns in training data and use them to make predictions about future data. The learning process runs autonomously; apart from possibly defining certain parameters, the programmer does not intervene. It is a mathematical optimization in which the prediction quality is improved according to statistical metrics. This means that machine learning algorithms build their own model from the available data, so the data no longer plays a passive role but an active one. This can be problematic because a correlation is not necessarily an indication of causality; it can also be produced by an indirect relationship. Probably the most famous example of this is the correlation between the number of stork pairs and the number of births in Europe. In reality, of course, the storks are not the cause: they simply nest more often in rural areas, where people tend to have more children.
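The effect is easy to reproduce with synthetic data. The following Python sketch (a toy illustration with invented numbers, not real demographic data) simulates a hidden “ruralness” factor that drives both the number of stork pairs and the birth rate; neither variable causes the other, yet they correlate strongly:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hidden confounder: how rural a district is (0 = urban, 1 = rural).
ruralness = rng.uniform(0, 1, n)

# Both variables depend on ruralness, not on each other.
stork_pairs = 50 * ruralness + rng.normal(0, 5, n)    # storks nest in rural areas
birth_rate = 12 * ruralness + rng.normal(0, 1.5, n)   # rural families have more children

r = np.corrcoef(stork_pairs, birth_rate)[0, 1]
print(f"Correlation between storks and births: r = {r:.2f}")  # high, yet purely spurious
```

A model trained on such data would happily use stork pairs as a predictor of births, which is exactly the kind of apparent correlation described above.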
The above example shows that the training data itself can lead to such a bias if apparent correlations are used for prediction. The selection of the training data can also introduce bias: data sets can be incomplete, unrepresentative, or far too small to generate reliable predictions. Especially in the early days of machine learning at universities, it was not uncommon for students to be recruited to generate training data. Because students are not a cross-section of the population, this led to considerable sampling bias (selection bias), for example in face recognition. The socio-cultural context also plays a role: a data set that is representative of Germany is not necessarily suitable for making predictions in India or China.
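One simple, admittedly coarse check for such sampling bias is to compare the demographic distribution of the training data with that of the target population. The sketch below uses scipy’s chi-square goodness-of-fit test on entirely hypothetical age-group numbers; the point is the pattern, not the figures:

```python
from scipy.stats import chisquare

# Hypothetical age distribution of the target population (shares per age group).
population_shares = [0.20, 0.25, 0.30, 0.25]   # <25, 25-40, 40-60, 60+

# Age counts in a student-recruited training set of 1,000 people.
sample_counts = [720, 230, 40, 10]

# Expected counts if the sample mirrored the population.
expected = [share * sum(sample_counts) for share in population_shares]

stat, p = chisquare(f_obs=sample_counts, f_exp=expected)
if p < 0.05:
    print(f"Sample deviates significantly from the population (p = {p:.1e})")
```

Such a test cannot prove that a data set is representative, but it can flag obvious distortions before any model is trained.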
How Can Bias Be Prevented?
Bias usually arises unintentionally. Management or the development department often does not reflect on the unintended consequences of its technology. Frequently, an insufficient understanding of the mechanisms of ML systems plays a role, awareness of bias in data is lacking from the outset, or a human bias is at work. It is therefore advisable to devote time and resources to the quality of the training data and to reflect on and document the processes by which training data is obtained. Such transparency offers the opportunity to improve the training data continuously.
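What such documentation looks like in practice is a design choice. One lightweight option, sketched below with invented field names in the spirit of a “datasheet” for data sets, is to store a machine-readable provenance record next to each training set and version it together with the data:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class Datasheet:
    """Minimal provenance record stored alongside a training data set."""
    name: str
    source: str                 # where and how the data was obtained
    collection_period: str
    intended_use: str
    known_gaps: list = field(default_factory=list)  # documented blind spots

sheet = Datasheet(
    name="customer-emails-v3",
    source="internal CRM export, German-speaking customers only",
    collection_period="2020-01 to 2021-06",
    intended_use="intent classification for German support requests",
    known_gaps=["no non-German text", "business customers underrepresented"],
)

print(json.dumps(asdict(sheet), indent=2, ensure_ascii=False))
```

Recording known gaps explicitly makes it much easier to revisit and improve the data later, which is precisely the transparency argued for above.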
For Natural Language Processing, this means: instead of crawling as much data as possible from the web for text models, it is important to select data that is as representative as possible. Knowledge of the origin and significance of a bias is important in order to minimize distortions and to contain their negative consequences.
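As a rough illustration of what “selecting rather than merely crawling” can mean, the following sketch filters a crawled corpus by length, language, and exact duplicates; the langdetect package and the thresholds are assumptions for the example, not a fixed recipe:

```python
from langdetect import detect  # pip install langdetect; an assumed tooling choice

def select_corpus(crawled_texts, target_lang="de", min_chars=200):
    """Keep only sufficiently long, target-language, non-duplicate documents."""
    seen = set()
    selected = []
    for text in crawled_texts:
        text = text.strip()
        if len(text) < min_chars:            # drop boilerplate snippets
            continue
        try:
            if detect(text) != target_lang:  # drop other languages
                continue
        except Exception:                    # detector fails on unusable input
            continue
        if text in seen:                     # drop exact duplicates
            continue
        seen.add(text)
        selected.append(text)
    return selected
```

Filters like these address data quality, not representativeness itself; which demographic or topical balance the corpus should have still has to be decided and checked deliberately.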
Project Manager

Andreas studied Technology & Media Communication and is primarily responsible for internal and external communication and documentation within the company. This gives him an excellent overview of MORESOPHY’s various technologies, applications, and customers.