Machine Learning with Customer-Specific Data
AI is only as good as the data available to it. Every use case needs a sufficient amount of data to achieve very good classification performance. Often the issue is not just a specialized domain, where small amounts of data can be compensated for by domain adaptation or transfer learning. Rather, the customer needs a completely custom definition of classes. They have something quite specific in mind, and we should help them with it. Machine learning with customer-specific data is the key to success here.
When Machine Learning Needs More Than Standard Data Sets
Computer science students specializing in machine learning often realize at the start of their careers that the standard data sets they used at university, which let them focus on model development or on improving the learning process, rarely cover the special interests of their customers in practice. Very few customers really need big data, i.e. as much general knowledge of the world as possible, extracted from data volumes of more than a terabyte. Instead, most customers need smart data.
Many users have special interests and very specific information that they want to extract from internal and external information sources. This information should be structured in a semantically meaningful way so that it is clearer for employees and they are not flooded and distracted by irrelevant information. They can then concentrate on the really important activities instead of spending most of their time searching for the information they need. In such special cases, customer-specific data is essential for data quality.
Particularly in the area of document classification, many customers are not helped by standard data and predefined standard classes alone. A machine can build up a semantically good understanding of general knowledge: for example, the AI learns to recognize whether a document belongs to the business sector or is about sport. But our customer is only interested in business documents to begin with and wants to sort them into more precise categories. This is why the standard data sets are only helpful for pre-selecting the training data. They can be used to create a filter that excludes articles from other categories. In order to do justice to the user, however, we have to design our own class schema and annotate documents accordingly.
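The pre-selection step described above can be pictured as a coarse topic filter that runs before any annotation work. The following is only a minimal sketch under stated assumptions: the keyword lists, `general_topic` and `preselect_for_annotation` are hypothetical stand-ins for a general-topic model trained on a standard data set, not the implementation used at MORESOPHY.

```python
from collections import Counter

# Hypothetical keyword lists standing in for a general-topic model trained on
# a standard data set. A real system would use a trained classifier instead.
TOPIC_KEYWORDS = {
    "business": {"revenue", "acquisition", "ceo", "shares", "market", "profit"},
    "sport": {"match", "goal", "league", "coach", "season", "tournament"},
}


def general_topic(text: str) -> str:
    """Return the coarse topic whose keywords occur most often, else 'other'."""
    counts = Counter(text.lower().split())
    scores = {
        topic: sum(counts[word] for word in words)
        for topic, words in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def preselect_for_annotation(documents, wanted_topic="business"):
    """Keep only documents of the wanted coarse topic as annotation candidates."""
    return [doc for doc in documents if general_topic(doc) == wanted_topic]


documents = [
    "The CEO announced record revenue and a planned acquisition",
    "The coach praised the team after the final match of the season",
    "Shares rose after the company reported higher profit",
]

# Only the two business articles remain; they would then be annotated with
# the customer's own, finer-grained class schema.
candidates = preselect_for_annotation(documents)
for doc in candidates:
    print(doc)
```

The filter only narrows the pool of candidate documents. The customer-specific classes still have to be defined and annotated by hand.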
Practical Example: Automated Press Review
One example from our professional practice is a customized press review: a MORESOPHY customer wants current press releases to be automatically classified into categories such as “personal details”, “financial” or “expansion”. The aim of our user is to track the business developments of their own customers. The result is, so to speak, an automated press review that filters out the interesting articles and sorts them into the appropriate categories. It is crucial that only relevant business-specific information on selected corporate customers is included in the application.
At the start of the project, no appropriately labeled training data was available. To achieve a good classification result, it was first necessary to collect and annotate the relevant text data.
The client clearly communicated what was important to them in terms of content. This is a good prerequisite for delivering customized data and results. Of course, this does not rule out automatic processes, such as clustering documents, that identify other potential topics the customer has not yet thought of.
Nevertheless, a certain amount of cooperation with the customer is necessary to select the classes for their own classification scheme. After all, they only want certain documents sorted into the press review, not every document that mentions one of their customers in some way.
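To illustrate how clustering can surface candidate topics for discussion with the customer, here is a small sketch. It uses a single-pass greedy grouping on Jaccard word overlap, chosen only because it fits in a few lines of standard-library Python. The source does not say which clustering method was used, and the stopword list, threshold and sample headlines are invented for the example.

```python
# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "for", "its", "new", "as"}


def tokens(text):
    """Lowercased word set of a text, without stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(texts, threshold=0.2):
    """Assign each text to the most similar existing cluster, or open a new one."""
    clusters = []  # each cluster: {"vocab": set of words, "members": [indices]}
    for index, text in enumerate(texts):
        words = tokens(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(words, cluster["vocab"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"vocab": set(words), "members": [index]})
        else:
            best["vocab"] |= words
            best["members"].append(index)
    return [cluster["members"] for cluster in clusters]


# Invented sample headlines for illustration only.
press_items = [
    "Acme opens new plant in Poland",
    "Acme plant in Poland starts production",
    "Beta Corp appoints new finance chief",
    "Beta Corp finance chief resigns",
]

groups = greedy_cluster(press_items)
print(groups)
```

Each resulting group is only a suggestion. Whether it becomes a class in the customer's scheme is still decided together with the customer.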
Feedback Loop for Customized Classification
How can customer participation also have a positive influence on the process? One way is a feedback loop in which users rate the classifications of an AI that has already been trained but is not yet fully developed, using rating buttons. The machine can thus be adapted directly to the user’s needs, because an article that fits best in one category for company A may fit better in another for company B. In this way, users generate their own labeled data, which allows the machine to make the decisions they want. Clicking a button after reading an article in the press review is not particularly time-consuming and can be easily integrated into the normal workflow. This is exactly what we have successfully implemented with this solution.
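A rough sketch of how rating-button clicks could be turned into labeled training data follows. `FeedbackStore`, its fields and the `corrected` parameter are hypothetical names for illustration. The article does not describe the actual data model, and whether users can supply a corrected category is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FeedbackStore:
    """Collects user ratings of predicted categories (hypothetical data model)."""

    # (text, label) pairs that users confirmed or corrected: new training data
    labeled: List[Tuple[str, str]] = field(default_factory=list)
    # (text, label) pairs that users marked as wrong: useful as negative signal
    rejected: List[Tuple[str, str]] = field(default_factory=list)

    def rate(
        self,
        text: str,
        predicted: str,
        approved: bool,
        corrected: Optional[str] = None,
    ) -> None:
        """Record one click on a rating button."""
        if approved:
            self.labeled.append((text, predicted))
            return
        self.rejected.append((text, predicted))
        # Assumption: the interface may let the user pick the right category.
        if corrected is not None:
            self.labeled.append((text, corrected))


store = FeedbackStore()
store.rate("Acme names new CFO", predicted="personal details", approved=True)
store.rate(
    "Acme opens plant in Poland",
    predicted="financial",
    approved=False,
    corrected="expansion",
)
store.rate("Acme sponsors local marathon", predicted="expansion", approved=False)

print(len(store.labeled), "labeled examples,", len(store.rejected), "rejections")
```

Because each company keeps its own store, the same article can end up with different labels for different customers, which is the point of the feedback loop.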
Machine Learning Does Not Happen in a Vacuum
Our view of the world is highly individual, and this also applies to companies and even to their various departments. Customized data provides companies with tailored solutions. The main advantage of AI lies in being able to provide these solutions more quickly, as experience from other use cases helps to select the right classification model. After all, identifying and writing rules manually is very time-consuming, especially if a different classification system is required for each customer. Machine learning algorithms, by contrast, often use automatically generated features such as word embeddings for classification.
Purely rule-based approaches, on the other hand, require much more intensive technical collaboration with the customer than machine learning systems. Nevertheless, this collaboration does not become completely obsolete with machine learning, as domain adaptation or transfer learning will not help with a completely new classification scheme. Machine learning with customer-specific data is the solution to this problem.
Another example of machine learning with customer-specific data at MORESOPHY is content classification by market segment: the IAB Taxonomy.
Project manager
Andreas studied Technology & Media Communication and is primarily responsible for internal and external communication and documentation within the company. This gives him an optimal overview of the various technologies, applications and customers of MORESOPHY.