What is Data Annotation? — Definition by Techslang

May 27, 2020

Data annotation is simply the process of labeling information so that machines can use it. It is especially useful for supervised machine learning (ML), where the system relies on labeled datasets to process, understand, and learn from input patterns to arrive at desired outputs.

In ML, data annotation occurs before the information gets fed to a system. The process can be likened to using flashcards to teach children. A flashcard with the picture of an apple and the word “apple” would tell the children how an apple looks and how the word is spelled. In that example, the word “apple” is the label.
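The flashcard analogy can be made concrete with a minimal sketch. The feature names and values below are invented for illustration; the point is only that every input is paired with a label a human assigned before any training happens.

```python
from collections import Counter

# Each labeled example pairs an input (here, made-up feature values)
# with the label a human annotator assigned to it, like the word on a flashcard.
labeled_examples = [
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "yellow", "shape": "long"}, "banana"),
    ({"color": "green", "shape": "round"}, "apple"),
]

def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)

counts = label_counts(labeled_examples)
for label, count in counts.items():
    print(label, count)
```

Checking the label counts like this is a common first look at an annotated dataset, since a supervised system can only learn the labels it has actually been shown.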

Types of Data Annotation in ML

Data can be annotated in various ways for a machine’s use, including:

1. Semantic Annotation

This method involves attaching labels such as “thing,” “person,” or “name” to the concepts mentioned in a text. Semantic annotation is used to train chatbots and improve the relevance of search engine results.
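One way to picture a semantic annotation is as a labeled span of characters inside a sentence. The sketch below is only an illustration: the `SpanAnnotation` class, the `annotate` helper, and the example sentence are hypothetical, not part of any particular annotation tool.

```python
from dataclasses import dataclass

@dataclass
class SpanAnnotation:
    start: int  # index of the first character of the span
    end: int    # index one past the last character of the span
    label: str  # concept category, e.g. "person" or "thing"

def annotate(text, phrase, label):
    """Label the first occurrence of `phrase` in `text` (hypothetical helper)."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return SpanAnnotation(start, start + len(phrase), label)

sentence = "Ada Lovelace wrote about the Analytical Engine."
annotations = [
    annotate(sentence, "Ada Lovelace", "person"),
    annotate(sentence, "Analytical Engine", "thing"),
]

for a in annotations:
    print(sentence[a.start:a.end], "->", a.label)
```

Storing character offsets rather than copies of the text keeps each label tied to an exact position, which matters when the same phrase appears more than once.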

2. Image and Video Annotation

Labeling images and videos allows machines to understand pictures and video content. Often, developers use bounding boxes to tell computers what to focus on so they can identify specific objects. Image and video annotation is commonly applied to autonomous vehicles and e-commerce product listings.
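A bounding box annotation is usually little more than a label and four coordinates. The following is a minimal sketch assuming pixel coordinates with the origin at the top-left corner of the image; the `BoundingBox` class and the `fits_in_image` check are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """One annotated object: a label plus the pixel coordinates of its box."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        """Box area in square pixels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

def fits_in_image(box, width, height):
    """Return True if the box is non-empty and lies fully inside the image."""
    return (0 <= box.x_min < box.x_max <= width
            and 0 <= box.y_min < box.y_max <= height)

car = BoundingBox("car", 10, 20, 110, 70)
print(car.area())
print(fits_in_image(car, 640, 480))
```

A sanity check like `fits_in_image` is one simple way to catch annotation mistakes, such as a box drawn partly outside the frame, before the data reaches a model.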

3. Text Classification or Categorization

This method refers to the process of assigning generic tags to unstructured text. The tags come from a set of predefined categories. Text classification or categorization helps users easily search for information and navigate within a website or an application.
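The idea of picking tags from a predefined set can be sketched with simple keyword matching. This is a deliberately simplified stand-in for a human annotator or a trained classifier, and the categories and keywords below are invented for illustration.

```python
import re

# Predefined categories, each with a few illustrative keywords (assumed values).
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "score"},
    "finance": {"stock", "market", "bank"},
    "technology": {"software", "chip", "app"},
}

def tag_text(text):
    """Return the sorted list of categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords  # at least one keyword is present
    )

print(tag_text("The team won the match."))
print(tag_text("A new app tracks the stock market"))
```

Because the tags come from a fixed set, a text that matches none of the categories simply receives no tag, which makes untagged content easy to spot and route to a human annotator.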

Data Annotation Use Cases

Data annotation is useful in:

1. Improving the Quality of Search Engine Results for Multiple User Types

Search engines need to provide users with comprehensive information. To do that, their algorithms must process high volumes of labeled data to return the right answer. Take, for example, Microsoft’s Bing. Since it caters to multiple markets, the vendor needs to make sure that the results the search engine provides match each user’s culture, line of business, and so on.

2. Refining Local Search Evaluation

While search engines cater to a global audience, vendors also have to make sure that they give users localized results. Data annotators can help with that by labeling information, images, and other content according to geolocation.
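As a rough sketch of how geolocation labels could be used, the snippet below filters annotated results by a region label. The result titles, the `region` field, and the `localized` function are all assumptions made for this example, not a description of how any real search engine works.

```python
# Each result carries a region label added by an annotator (illustrative data).
annotated_results = [
    {"title": "Pizza places near Times Square", "region": "US"},
    {"title": "Best pizzerias in Naples", "region": "IT"},
    {"title": "Pizza delivery in Brooklyn", "region": "US"},
]

def localized(results, user_region):
    """Keep only the titles whose region label matches the user's region."""
    return [r["title"] for r in results if r["region"] == user_region]

print(localized(annotated_results, "IT"))
print(localized(annotated_results, "US"))
```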

3. Enhancing Social Media Content Relevance

Like search engines, social media platforms also need to provide customized content recommendations to users. Data annotation can help developers classify and categorize content for relevance. An example would be categorizing which content users are likely to consume or appreciate based on their viewing habits, and which content they would find relevant based on where they live or work.

Data annotation is time-consuming and tedious. Thankfully, artificial intelligence (AI) systems are now available to help automate the process.

Originally published at https://www.techslang.com on May 27, 2020.