In general, a distinction is made between four different types of models in machine learning: Supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning. This article is intended to give beginners or non-technicians a clearer understanding of supervised learning.
Learning with the Gold Standard
In supervised learning, an algorithm learns from pairs of inputs and outputs. The actual value or the actual label is available for each data instance: An example of a data instance would be a word, for example, if a model for word type recognition (POS tagging) is to be trained. The corresponding correct label is then the word type defined by humans. This set of actual values is also called the annotations or the gold standard.

The Learning Process of an ML Model Clearly Explained
The learning process of an ML model can be compared to that of a child who receives corrections from adults. At the beginning, without having seen many instances and corrections, a small child learning to speak often still has oversimplified ideas: After his parents have shown him a pigeon three times and named it, he will most likely also initially refer to a “seagull” as a pigeon, after all it is also a bird.
Only after a few corrections will the child learn that not all birds are pigeons. Rather, one must look at the color of the plumage and the shape of the beak, among other things, to correctly name the bird species. The characteristic of having wings and a beak is too general. The more slightly different seagulls the child sees, the more accurate they will be in recognizing seagulls. If, on the other hand, they have never seen a raven, they will not be able to name it correctly. This example clearly shows that the training data of a machine learning model should be representative and complete for the subsequent prediction task.
Does the Gold Standard Always Have to Come from Human Annotators?
The gold standard values do not always have to be generated by human annotation of data, especially in the area of regression (prediction of continuous values such as in the prediction of air pollution values), the values of the gold standard can be determined by measurement sensors, for example. Combined methods are also available, especially as gallable data is not always available and the labeling of data is time-consuming: In semi-supervised learning, a larger part of the annotations is generated by machine, while a smaller part is annotated by humans.
At MORESOPHY, we use machine learning models trained in this way for segment classification within our semantic knowledge graph, for example.
Project manager
Andreas studied Technology & Media Communication and is primarily responsible for internal and external communication and documentation within the company. This gives him an optimal overview of the various technologies, applications and customers of MORESOPHY.
More articles from Responsible AI


|
|
