Deep NLP Shifts from Modeling to Annotation

May 18, 2021

Achieve human-level NLP with deep learning, enabled by big data and advances in computing and architectures.

Machine-based Natural Language Processing (NLP) has found its way into our daily lives through Siri, Google Translate, recommendation engines, legal document analysis, and even creative writing. This has been made possible by high-performing NLP systems whose designs make minimal assumptions about language and learn almost everything they need to ‘know’ from annotated training data. As a result, precise, high-quality, and efficient annotation has become particularly important to modern NLP.

Deep NLP

The evolution of NLP systems over the past decades shows how training data became so important, and how NLP came to use an approach very different from the way humans understand language. Let’s look at a bit of this history.

Early Optimism: Rules and Dictionaries


In the 1950s digital computers were relatively new, and their ability to run complex programs and use databases was seen as heralding a new age of ‘thinking machines’. In 1954 an IBM 701 computer (2 KFLOPS, $500,000) was programmed with language rules and word definitions that enabled it to translate Russian sentences into English. As the New York Herald Tribune gushed, ‘Once the Russian words were fed to the machine no human mind intervened!’

Although the range of sentences in the 1954 demo was very limited, it seemed to prove the fundamental feasibility of machine translation. After all, isn’t language just a matter of rules of grammar and dictionaries of definitions? Buoyed by the success of the Russian-to-English demo, experts were predicting that the problem of automatic machine translation would be solved within 3 to 5 years.

The Quest for Language Understanding

The 3-to-5-year prediction proved way too optimistic. Decades of work on NLP systems based on language syntax and word meanings produced only limited success. It became clear that human language is not just about rules and definitions, but also about practical knowledge of the world. When humans use language, they don’t just recognize symbols and put them in the correct order; the symbols evoke objects, actions, ideas, and relationships, all of which are important to human language understanding.

Idioms and metaphors require a lot of context to interpret, and they were troublesome for early NLP systems. Legend has it that the English phrase ‘The spirit is willing, but the flesh is weak’ was sent through English-to-Russian-to-English NLP systems, with the result ‘The vodka is good, but the meat is rotten’!

The limitations of rules and definitions led to NLP being seen primarily as a problem of language understanding – interpreting the meaning of an input phrase in terms of a language-independent world model, then creating a description of the world model in the target language. To address this challenge, workers in NLP began to focus on encoding human knowledge in a way that would be useful to language processing and other AI tasks.

One example is the CYC project, which since 1984 has spent over 1000 person-years creating a database of 25 million items constituting human common sense. While much has been learned from projects such as this, they have not led to any breakthroughs in NLP.

Deep Learning and Big Data

Encoding human knowledge for machine use continues to be an important area of research, and symbolic and statistical knowledge and language models are key to many automated systems today. However, NLP systems based on this approach have not been able to match human performance.

The breakthrough needed to achieve human-level NLP was deep learning, enabled by the availability of big data and advances in computing and architectures. Over the last 10 years these systems have led to dramatic improvements. For example, in 2016 Google announced the GNMT system, a deep neural network that reduced translation errors by 60% compared to previous methods, achieving performance comparable to human translators.

Deep NLP systems such as GNMT do not achieve their exceptional performance by explicitly representing world knowledge, linguistic rules, or definitions. They do not recognize language meaning in the way humans do. They are designed to learn only word occurrence patterns, such as which words are likely to precede or follow a particular word, or which words are likely to be contained in a translation of a particular word. It was a surprising discovery that word occurrence patterns are enough for human-level NLP. It is possible only because deep NLP systems are capable of learning extremely complex patterns, through the use of millions or even billions of parameters. However, the need to optimize so many parameters, which constitute everything the ML system ‘knows’ about language, puts a heavy burden on training data and its annotation.
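To make ‘word occurrence patterns’ concrete, here is a minimal sketch that counts which words follow which in a tiny corpus. It is not how GNMT or any deep system works internally; a deep network learns far richer patterns with millions of parameters. The function names and the toy corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def follower_counts(sentences):
    """Count, for each word, which words immediately follow it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with its successor in the same sentence.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = [
    "the spirit is willing",
    "the flesh is weak",
    "the spirit is strong",
]
counts = follower_counts(corpus)
print(most_likely_next(counts, "the"))
```

Everything this sketch ‘knows’ comes from the three sentences it was given, which is the same dependence on training data described above, at a vastly smaller scale.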


Let’s take a look inside one of these systems.

The example to the left is a component of a deep NLP system called the transformer. It is used in a 375-million-parameter translation system that can simultaneously handle 103 different languages. This system was trained with over 25 billion examples.

The transformer is made up of subunits that convert words and their positions in a phrase to numbers (embedding and positional encoding), learn how earlier and later parts of a phrase affect a word (multi-head attention), and manage the calculations that bring together all the pieces of evidence to make a final translation decision (add and normalize, feedforward, linear, softmax).
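The following rough sketch shows three of these subunits in plain Python: positional encoding, softmax, and a single attention head. It makes several simplifying assumptions: the positional encoding is the sinusoidal scheme from the original transformer paper, the embeddings are made-up four-dimensional vectors, and the attention step uses the inputs directly as queries, keys, and values, whereas a real transformer applies learned projection matrices and runs several heads in parallel.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(position, dim):
    """Sinusoidal encoding of a word's position in the phrase."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def self_attention(vectors):
    """Single-head attention with no learned projections (a simplification)."""
    dim = len(vectors[0])
    output = []
    for query in vectors:
        # Scaled dot product of this word against every word in the phrase.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all the word vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, vectors))
            for j in range(dim)
        ]
        output.append(mixed)
    return output


# Hypothetical embeddings; a trained system would learn these values.
embeddings = {
    "the": [0.1, 0.3, 0.0, 0.2],
    "flesh": [0.7, 0.1, 0.4, 0.0],
    "is": [0.2, 0.2, 0.1, 0.5],
}
phrase = ["the", "flesh", "is"]
inputs = [
    [e + p for e, p in zip(embeddings[word], positional_encoding(pos, 4))]
    for pos, word in enumerate(phrase)
]
attended = self_attention(inputs)
```

Each row of `attended` blends information from every word in the phrase, which is how attention lets earlier and later words influence the representation of a given word.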

As a side note, the evolution of the transformer from earlier deep NLP architectures illustrates a shift toward even greater reliance on training data. Google’s 2016 GNMT system used an architecture called Long Short-Term Memory (LSTM). LSTM is a recurrent neural network (RNN) that is explicitly structured to process language as a sequence. The newer transformer architecture does not explicitly assume a sequence, and instead processes entire phrases simultaneously. This means transformer learning is less constrained, with the result that it can perform better than LSTM, but it needs to get more from its training data to enable its ‘extra’ learning.

Key Takeaways: Annotation Drives NLP Performance

Deep NLP systems achieve breakthrough performance by leaving the meaning of words to humans, and sticking to what machines are good at: finding patterns in data. Deep NLP only works because human annotation allows meaning to be reduced to patterns. This gives expert data annotation a particularly vital role in modern NLP.

Annotation for NLP must be carefully structured to consistently and accurately meet the specific requirements of the ML system. While machine translation applications can take advantage of large databases of previously translated text, annotation for other NLP applications can require more nuanced or specialized interpretations.

Applications such as sentiment and intent analysis, named entity recognition, and entity classification draw heavily on the general knowledge of human annotators, and it is particularly important that the annotation process be driven by clear requirements, training in the specifics of the application, and processes that ensure consistent quality and efficiency.
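One widely used way to check annotation consistency, offered here as an illustration rather than as a method this article prescribes, is to have two annotators label the same items and compute Cohen’s kappa, which measures agreement beyond what chance would produce. The labels below are invented sentiment annotations, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected if each annotator labeled at random
    # according to their own label frequencies.
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(freq_a) | set(freq_b)
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Invented sentiment labels from two hypothetical annotators.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A kappa near 1 indicates that annotators interpret the requirements the same way; a low value suggests the guidelines or the annotator training need another look before the data is used for training.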

If you wish to learn more about how to create an effective data annotation process that gives your NLP system the patterns it needs to learn the language of your application, please contact us to talk to an expert.