Uncovering How Deep Learning Works

This article first appeared in DNA India on 28th September, 2017.

Neural networks and deep learning have been behind the flowering of artificial intelligence (AI) over the last few years. Neural networks (NN), which are loosely based on the architecture of our brain with multiple neurons working together, have been around since the 1960s. However, given the large computing demands of NN, it took the emergence of high-performance computing and faster chips before users could build practical applications on NN-based machine learning, or ‘deep learning’ as it is now called.

NN is essentially a machine learning approach. Any machine learning approach requires training data that is used to train the algorithm to recognise patterns. Most readers have used the camera on their mobile phones and have seen how it can identify faces within a scene and put a square box around them. This is based on machine learning, where training data is provided to the algorithm, allowing it to learn what constitutes a “face” and what could be, say, just a cloud in the sky resembling a face. Since a face could be framed in front of a house, a garden or another face, the algorithm needs to be given training data diverse enough to recognise faces across all these settings.
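To make the idea of learning from training data concrete, here is a minimal sketch in Python. It is not a neural network: it uses a much simpler nearest-average classifier as a stand-in, and the two features (a symmetry score and an eye-spot score) and all the numbers are invented purely for illustration.

```python
# Toy "training data": each example is (features, label).
# The two features are hypothetical scores between 0 and 1:
# (symmetry score, eye-spot score). All values are made up.
TRAINING_DATA = [
    ((0.9, 0.8), "face"),
    ((0.8, 0.9), "face"),
    ((0.7, 0.7), "face"),
    ((0.2, 0.1), "no-face"),
    ((0.3, 0.2), "no-face"),
    ((0.1, 0.3), "no-face"),
]


def train(examples):
    """Learn one average feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


model = train(TRAINING_DATA)
print(predict(model, (0.85, 0.75)))  # an unseen example that resembles the faces
print(predict(model, (0.15, 0.25)))  # an unseen example that resembles the clouds
```

The point of the sketch is only that the program is never told what a face is. It is shown labelled examples and works out the rest, which is why the diversity of those examples matters so much.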

NN belongs to the class of machine learning algorithms known as black boxes. As the term implies, the inner workings of NN have been poorly understood, and hence why NN is able to learn patterns and output results with a high degree of accuracy has been an area of research for scientists for a long time. This is in contrast to “open” algorithms like decision trees, where the sequence of steps leading to a decision is well delineated and easier for humans to comprehend.
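A small sketch shows what “open” means in practice. The decision tree below is written by hand rather than learned from data, and its two inputs (a symmetry score and an eye-spot score) and the 0.5 thresholds are assumptions made up for illustration. What matters is that every decision comes with the exact steps that produced it.

```python
def classify_with_explanation(symmetry, eye_spots):
    """A hand-written two-level decision tree that records every step it takes."""
    steps = []
    if symmetry >= 0.5:
        steps.append("symmetry >= 0.5")
        if eye_spots >= 0.5:
            steps.append("eye_spots >= 0.5")
            return "face", steps
        steps.append("eye_spots < 0.5")
        return "no-face", steps
    steps.append("symmetry < 0.5")
    return "no-face", steps


label, steps = classify_with_explanation(0.9, 0.8)
print(label, "because", " and ".join(steps))
```

A neural network offers no such trail. Its answer emerges from thousands or millions of numeric weights, none of which reads as a rule a human could follow.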

Recent research, however, has thrown up something interesting about how NN learns patterns. Scientists at the Hebrew University of Jerusalem in Israel have found that, in addition to identifying specific patterns within the data, NN also has the ability to throw away irrelevant information. For example, the algorithms are able to identify that a car or a tree next to a face is not the most important information in facial recognition and are able to “blur” this information. This is what lets them generalise their results to cases that lie outside the set they were trained on. The researchers call it the information “bottleneck” theory, since the bottleneck allows only the most pertinent information to be retained.

In effect, what scientists are now saying is that black-box approaches learn by increasing the signal-to-noise ratio: by throwing away much of the noise, they allow the true pattern to emerge. They believe that this is what makes deep learning and NN powerful at classification tasks, which require sorting objects into buckets (think “face”, “no-face”). However, what comes as a relief to folks involved in cyber-security is that they don’t expect deep learning to be able to break cryptographic codes, because decryption algorithms demand high fidelity for signals, and throwing away any signal along with the noise could render it impossible to decrypt encrypted data.
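The contrast between the two kinds of task can be sketched in a few lines of Python. Everything here is a toy built on assumptions: the “classifier” is just a brightness threshold, the cipher is a simple XOR scheme that is not secure and stands in for real cryptography, and the pixel values, key and message are invented. The lossy step rounds every value down, which is a crude way of throwing away fine detail.

```python
def discard_detail(values, step=16):
    """Lossy step: round each byte-sized value down to a multiple of `step`."""
    return [(v // step) * step for v in values]


def is_bright(pixels, threshold=128):
    """Toy classifier: call an image 'bright' if its average pixel is high."""
    return sum(pixels) / len(pixels) >= threshold


def xor_cipher(data, key):
    """Toy XOR cipher (NOT secure); applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Classification: the label is the same before and after discarding detail.
pixels = [200, 213, 190, 225, 181, 207]
print(is_bright(pixels), is_bright(discard_detail(pixels)))

# Decryption: the same lossy step on the ciphertext ruins the message.
key = b"k3y"
message = b"meet at noon"
ciphertext = xor_cipher(message, key)
print(xor_cipher(ciphertext, key))       # intact ciphertext decrypts cleanly
damaged = bytes(discard_detail(ciphertext))
print(xor_cipher(damaged, key))          # damaged ciphertext does not
```

The bucket an image falls into is decided by its overall pattern, so it tolerates lost detail. A decrypted message depends on every bit, so it does not.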

Somehow, I feel that I can sleep much better knowing that.