Encoding transforms information from one representation to another. When we write, we encode our words or thoughts as scribbles on paper. Typing on a computer encodes words to keystrokes to bits to pixels. Sometimes encoding simply translates information from one medium to another, like Morse Code. Other times encoding creates a key that unlocks complex processes, like the DNA code that drives life itself.
An ML system’s overall function can be called encoding. For example, an ML translator can be said to encode English sentences into Chinese, or we can say an ML object recognizer encodes pixels into vectors of object probabilities.
We can also refer to the internal layers of an ML system as producing a series of intermediate encodings. For example, a particular internal layer in a convolutional neural network might encode ‘roundness’.
In this article we discuss intermediate encodings and how they impact the human effort required for ML system training and operation.
Encoding as Human-Machine Collaboration
Of course, ML systems are built by humans, and in that sense, they are one hundred percent the products of human knowledge. However, the encodings learned by ML systems can also be thought of as a combination of human knowledge and machine ‘knowledge’.
Just as our biology provides the basis for human learning, human-provided ML system designs provide frameworks that enable Machine Learning. Through human engineering, these designs bring ML systems to the point where everything they need to ‘know’ about the world can be reflected in their parameters.
Analogous to the role of our parents and teachers, training data annotation drives the learning process toward competent action. Annotation is the crucial link between the ML system and its operational world, and for systems that learn from labeled examples, accurate and complete annotation is essential to performing well.
So, learning encodings can be considered a collaborative process between humans and machines: humans provide system design and annotated training data, and ML systems create ‘knowledge’ in the form of optimized parameter sets.
The relative contribution of humans and machines to learning encodings depends on the ML architecture. Some architectures require a relatively limited contribution from humans to build, train, and operate. Other architectures require more. Let’s discuss three examples.
Dense Architectures
One way to design an ML system is to use layers of fully connected units. The dense arrays of parameters used by these systems represent a ‘blank slate’ that can be trained to implement nearly any required layer-to-layer encodings. This makes them widely useful, because the architecture is not tailored to a particular application.
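As a rough illustration of layer-to-layer encodings, the sketch below passes an input through two fully connected layers and keeps each layer's output. The weights are hand-picked, hypothetical values rather than trained parameters, and the ReLU activation is an assumption made for the example.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: every output unit sees every input."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU activation (assumed here)
    return outputs


def forward(inputs, layers):
    """Return the intermediate encoding produced by each layer, in order."""
    encodings = []
    current = inputs
    for weights, biases in layers:
        current = dense_layer(current, weights, biases)
        encodings.append(current)
    return encodings


# Hypothetical, untrained parameters: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0]], [0.0]),
]
encodings = forward([2.0, 1.0], layers)
print(encodings)
```

Training would adjust the numbers in `layers`; the structure of the computation stays the same whatever the application.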
The human contributions to learning encodings with this architecture are relatively limited. The architecture itself can be simply specified in terms of numbers of units and layers. While sufficient quantities of accurately labeled training data are essential, the annotation process tends to involve only simple labeling, since the network can accept the input data in essentially any form.
The limited human knowledge contribution for this architecture puts a greater burden on Machine Learning. For example, if the input data is raw pixels, the ML system will have to learn for itself the layer encodings that reflect important features such as edges, lines, and shapes. This leads to a requirement for large amounts of training data, and that requirement grows rapidly as the complexity of the ML task increases. In some applications, this architecture may require too many parameters and too much training data to be trained within practically available time and computing resources.
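To make the parameter growth concrete, here is a small sketch that counts the weights and biases of a dense network from its layer sizes. The image sizes and the 512-unit hidden layer are hypothetical choices for illustration only.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input-to-output connection
        total += n_out         # one bias per output unit
    return total


# Hypothetical raw-pixel inputs at two resolutions, each feeding
# one hidden layer of 512 units and 10 output classes.
small = dense_parameter_count([28 * 28, 512, 10])
large = dense_parameter_count([224 * 224 * 3, 512, 10])
print(small, large)
```

With these assumed sizes, moving from a small grayscale image to a larger color image raises the count from roughly four hundred thousand parameters to roughly seventy-seven million, almost all of them in the first layer.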
Structured Architectures
Dense architectures make essentially no assumptions about the operational domain. This makes these systems broadly applicable, but it limits the complexity of the tasks they can handle.
More structured ML architectures are tailored to particular applications. Building knowledge of the application into the design gives an ML system a head start, reducing what it needs to learn from training data. This allows these systems to scale to more complex tasks and perform better.
Sequence models used for Natural Language Processing (NLP) are an example of a more structured ML architecture. These systems are designed to encode how patterns of words in an input relate to patterns of words in the output. For example, the input might be a sentence written in French and the output a translation of that sentence into English:
In this example the ML system learns to encode the French sentence to produce an internal encoding. It then learns to decode that internal encoding into the appropriate English sentence. In order to learn effective encoding and decoding, the ML system is specifically structured to take into account word patterns in long sequences, and to map patterns in one language to patterns in another language. The diagram shows that in this architecture, input word sequences are continuously translated to output sequences, using previously translated words to help with the translation.
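The control flow described above can be sketched as follows. This is only a toy: `encode` and `decode_step` are hypothetical stand-ins for learned components, and the three-word lexicon is hand-written rather than learned, so the example shows the encode-then-decode loop and not how a real translator chooses words.

```python
END = "<end>"

# Hypothetical hand-written lexicon standing in for learned parameters.
TOY_LEXICON = {"le": "the", "chat": "cat", "dort": "sleeps"}


def encode(source_words):
    """Stand-in encoder: the 'internal encoding' is just the word tuple."""
    return tuple(source_words)


def decode_step(encoding, previous_words):
    """Stand-in decoder step: choose the next output word from the
    encoding and the words already produced."""
    position = len(previous_words)
    if position >= len(encoding):
        return END
    return TOY_LEXICON.get(encoding[position], "<unk>")


def translate(source_words, max_len=20):
    encoding = encode(source_words)
    output = []
    while len(output) < max_len:
        # Each step sees the words translated so far.
        word = decode_step(encoding, output)
        if word == END:
            break
        output.append(word)
    return output


print(translate(["le", "chat", "dort"]))
```

In a trained sequence model both stand-in functions would be neural networks, and the decoder would use the earlier output words to resolve word order and agreement rather than simply tracking position.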
Human contributions are relatively high for structured architectures. Complex design is required. Natural Language Processing systems are typically designed with hierarchies of subsystems, and they can have billions of parameters.
Training data for Natural Language Processing systems requires high levels of human-provided information, and often millions of examples. In applications such as sentiment analysis, efficient annotation is essential. Millions of human-translated examples are required to train translators such as the example above, and they are only feasible because large amounts of translated data are available in digital form.
Autoencoders
Autoencoders are a special type of ML system that might use a dense architecture, but typically use structured architectures. Human contribution is reduced in either case, however, since autoencoders do not require labeled training data.
These systems don’t need labeled training data because they have a distinctive purpose. Rather than learning to recognize or categorize their inputs, autoencoders are generative: they are trained to have (1) an encoding stage, to transform their input into a compressed but useful internal representation, and (2) a decoding stage, to transform the internal representation into an accurate likeness of the input. Since training only measures how closely the output matches the input, the training data doesn’t need to be labeled.
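A minimal sketch of this idea, assuming a tiny linear autoencoder that compresses two numbers into one and back again. The data, starting weights, learning rate, and training loop are all hypothetical choices for the example. The point to notice is that the loss compares each reconstruction with its own input, so no labels appear anywhere.

```python
def encode(x, enc):
    """Encoding stage: compress a 2-number input into 1 number."""
    return enc[0] * x[0] + enc[1] * x[1]


def decode(z, dec):
    """Decoding stage: expand the 1-number code back to 2 numbers."""
    return [dec[0] * z, dec[1] * z]


def reconstruction_loss(data, enc, dec):
    """Mean squared difference between each input and its reconstruction."""
    total = 0.0
    for x in data:
        r = decode(encode(x, enc), dec)
        total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
    return total / len(data)


def train(data, epochs=200, lr=0.02):
    enc, dec = [0.1, 0.1], [0.1, 0.1]  # assumed starting weights
    for _ in range(epochs):
        for x in data:
            z = encode(x, enc)
            r = decode(z, dec)
            err = [r[0] - x[0], r[1] - x[1]]  # the target is the input itself
            grad_dec = [2 * err[0] * z, 2 * err[1] * z]
            grad_z = 2 * (err[0] * dec[0] + err[1] * dec[1])
            grad_enc = [grad_z * x[0], grad_z * x[1]]
            dec = [dec[i] - lr * grad_dec[i] for i in range(2)]
            enc = [enc[i] - lr * grad_enc[i] for i in range(2)]
    return enc, dec


# Unlabeled data: points on a line, so one number can describe each point.
data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_loss = reconstruction_loss(data, [0.1, 0.1], [0.1, 0.1])
enc, dec = train(data)
final_loss = reconstruction_loss(data, enc, dec)
print(initial_loss, final_loss)
```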
What makes these systems useful is that they can generate realistic altered versions of their training examples. This is done by systematically modifying the internal representation of an example, then passing this through the learned decoding stage. This is the technology behind ‘deep fakes’.
Here is an example of pictures of four faces altered by an autoencoder (a Conditional Generative Adversarial Network, or CGAN) to change facial characteristics including hair color, sex, facial hair, eyeglasses, baldness, and age. The autoencoder had been trained to encode many faces, with and without the various characteristics. This made it possible to calculate the differences in internal encoding between faces with and without each characteristic.
These differences were then applied to the internal encodings of the original (prior) faces and sent through the decoder stage to produce versions of the faces with changed characteristics. The results are shown for two values of a parameter ldist. When ldist = 0.1, the altered faces are constrained to be more similar to the originals.
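The arithmetic on internal encodings can be sketched as below. This assumes that the difference is taken between the mean encodings of the two groups, which the description above does not specify. The two-number encodings are made-up values, and `strength` is a hypothetical scaling factor, not the ldist parameter.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def attribute_direction(with_attribute, without_attribute):
    """Difference between the mean encodings of the two groups (assumed)."""
    mean_with = mean_vector(with_attribute)
    mean_without = mean_vector(without_attribute)
    return [a - b for a, b in zip(mean_with, mean_without)]


def apply_attribute(encoding, direction, strength=1.0):
    """Shift an encoding along the attribute direction."""
    return [e + strength * d for e, d in zip(encoding, direction)]


# Made-up encodings of faces with and without eyeglasses.
with_glasses = [[1.0, 2.0], [3.0, 2.0]]
without_glasses = [[1.0, 0.0], [3.0, 0.0]]

direction = attribute_direction(with_glasses, without_glasses)
altered = apply_attribute([5.0, 0.5], direction)
print(direction, altered)
```

The altered encoding would then be passed through the learned decoding stage to produce the changed face; that stage is omitted here.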
Summary
Machine Learning can be described as humans and machines collaborating to learn encodings. Humans provide a framework for learning through ML system design. Annotation provides the critical link between the ML system and its operational environment. Machine ‘knowledge’ is created by optimizing parameter sets.
Different ML architectures lead to different combinations of human knowledge and machine ‘knowledge’. The following table summarizes these differences:
Takeaways
- Dense architectures can be used in almost any application, as long as the ML task is not too complex. These architectures are easy to design, and annotation is relatively simple, but it must be efficient to accommodate large training data sets.
- Structured architectures require complex designs, and annotation must contribute large amounts of human knowledge. However, these architectures are scalable and can perform very well on complex ML tasks.
- Autoencoders create realistic alterations of reality. While complex human design is often required, training data does not need to be labeled.