
Deep Learning: New Kid on the Supervised Machine Learning Block

Frank and Martin discuss the excitement around the power of deep learning.

June 26, 2017 4 min read

In the second installment of this blog, we introduced machine learning as a subfield of artificial intelligence (AI) that is concerned with methods and algorithms that allow machines to improve themselves and to learn from the past. Machine learning is often concerned with making so-called “supervised predictions,” or learning from a training set of historical data where objects or outcomes are known and labelled. Once trained, our machine or “intelligent agent” is able to differentiate between, say, a cat and a mat.

The currently much-hyped “deep learning” is shorthand for the application of many-layered artificial neural networks (ANNs) to machine learning. An ANN is a computational model inspired by the way the human brain works. Think of each neuron in the network as a simple calculator connected to several inputs. The neuron takes these inputs and applies a different “weight” to each input before summing them to produce an output.
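To make the “simple calculator” picture concrete, here is a minimal Python sketch of a single neuron as described above. The function name and the example inputs and weights are ours, chosen purely for illustration.

```python
def neuron_output(inputs, weights):
    """Multiply each input by its own weight, then sum the results."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical example: three inputs, each with a different weight.
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]

# 1.0 * 0.5 + 2.0 * -1.0 + 3.0 * 2.0 = 0.5 - 2.0 + 6.0
print(neuron_output(inputs, weights))  # 4.5
```

Learning, in this picture, amounts to finding weights that produce useful outputs.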

If you’ve followed this so far, you might be wondering what all the fuss is about. What we have just described (take a series of inputs, multiply them by a series of coefficients and then perform a summation) sounds a lot like boring, old linear regression. In fact, the perceptron algorithm, one of the very first ANNs constructed, was invented way back in 1957 at Cornell to support image processing and classification (class = “cat” or class = “mat”?). It was also much-hyped, until it was proven that single-layer perceptrons could not be trained to recognize many classes of patterns.
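The limitation is easy to reproduce. The sketch below trains a single perceptron with the classic perceptron learning rule on two tiny problems: logical AND, which a single weighted sum can separate, and XOR, the textbook example of a pattern it cannot. The XOR example, the bias term, the step threshold and all names and parameter values are our own assumptions for illustration; the article itself does not go into this detail.

```python
def predict(w, b, x1, x2):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge the weights after every wrong answer."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            err = target - predict(w, b, x1, x2)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples the trained perceptron classifies correctly."""
    hits = sum(predict(w, b, x1, x2) == target for (x1, x2), target in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(AND, *train_perceptron(AND)))  # 1.0
print("XOR accuracy:", accuracy(XOR, *train_perceptron(XOR)))  # always below 1.0
```

No amount of extra training helps with XOR, because no single straight line separates its two classes.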

Research into ANNs largely stagnated until the mid-’80s, when multilayered neural networks were constructed. In a multilayered ANN, the neurons are organized in layers. The output from the neurons in each layer passes through an activation function, a fancy term for an often nonlinear function that, in its classic form, squashes the output to a number between 0 and 1, before becoming an input to a neuron in the next layer, and so on. With the addition of “back propagation” (think feedback loops that adjust the weights according to the prediction error), these new multilayer ANNs were used as one of several approaches to supervised machine learning through the early ’90s. But they didn’t scale to solve larger problems, so they couldn’t break into the mainstream at that time.
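As a rough illustration of what the extra layer buys, here is a sketch of a two-layer network whose outputs pass through a sigmoid activation, which squashes any number into the range 0 to 1. The weights are hand-picked by us so that the network computes XOR, a pattern that a single perceptron cannot represent; in practice back propagation would be used to learn such weights from data, which this sketch does not attempt. All names and values here are hypothetical.

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: every neuron takes a weighted sum of all inputs plus a bias,
    then passes the result through the activation function."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hand-picked (not learned) weights. The first hidden neuron behaves like OR,
# the second like NAND, and the output neuron like AND of the two.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def forward(x1, x2):
    """Feed the inputs through the hidden layer, then the output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(forward(a, b)))  # rounds to 0, 1, 1, 0 respectively
```

The nonlinear activation matters: without it, stacking layers would collapse into one big weighted sum, and the network would be no more expressive than a single neuron.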

The breakthrough came in 2006 when Geoff Hinton, a University of Toronto computer science professor, and his Ph.D. student Ruslan Salakhutdinov published two papers that demonstrated how very large neural networks could be trained much faster than before. These new ANNs featured many more layers of computation, and thus the term “deep learning” was born. When researchers started to apply these techniques to huge sets of speech and image data, and used powerful graphics processing units (GPUs) originally built for video gaming to run the ANN computations, these systems began beating “traditional” machine learning algorithms and could be applied to problems that other machine learning algorithms had not been able to solve.

Milestones in the development of neural networks (Andrew L. Beam)

But why is deep learning so powerful, especially in complex areas like speech and image recognition?

The magic of many-layered ANNs when compared with their “shallow” forebears is that they are able to learn from (relatively) raw and very abstract data, like images of handwriting. Modern ANNs can feature hundreds of layers and are able to learn the weights that should be applied to different inputs at different layers of the network, so that they are effectively able to choose for themselves the “features” that matter in the data and in the intermediate representations of that data in the “hidden” layers.

By contrast, the early ANNs were usually trained on handmade features, with feature extraction representing a separate and time-consuming part of the analysis that required significant expertise and intuition. If that sounds (a) familiar and (b) like a big deal, that’s because it is; when using “traditional” machine learning techniques, data scientists typically spend up to 80 percent of their time cleaning the data, transforming them into an appropriate representation and selecting the features that matter. Only the remaining 20 percent of their time is spent delivering the real value: building, testing and evaluating models.

So, should we all now run around and look for nails for our shiny, new supervised machine learning hammer? We think the answer is yes — but also, no.

There is no question that deep learning is the hot new kid on the machine learning block. Deep learning methods are a brilliant solution for a whole class of problems. But as we have pointed out in earlier installments of this series, we should always start any new analytic endeavour by attempting to thoroughly understand the business problem we are trying to solve. Every analytic method and technique is associated with different strengths, weaknesses and trade-offs that render it more or less appropriate, depending on the use case and the constraints.

Deep learning’s strengths are predictive accuracy and the ability to short-circuit the arduous business of data preparation and feature engineering. But, like all predictive modeling techniques, it has an Achilles’ heel of its own. Deep learning models are extremely difficult to interpret, so the prediction or classification that the model makes has to be taken on trust. Deep learning has a tendency to “overfit” (i.e., to memorise the training data rather than to produce a model that generalizes well), especially where the training data set is relatively small. And whilst the term “deep learning” sounds like it refers to a single technique, in reality it refers to a family of methods, and choosing the right method and the right network topology is critical to creating a good model for a particular use case and a particular domain.
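Overfitting is easier to see than to describe. The toy sketch below is not a neural network: it deliberately uses the simplest possible “memoriser” (a model that repeats the label of the nearest training example) to show the symptom, which is perfect accuracy on the training data and worse accuracy on new data. The data set, the two noisy labels in it and all names are invented by us for illustration.

```python
# Training data: label is 1 when x >= 5, except for two noisy labels
# (x = 2 and x = 7) that a model should ideally not learn.
TRAIN = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]

# New, noise-free data the models have never seen.
TEST = [(x + 0.4, 1 if x + 0.4 >= 5 else 0) for x in range(10)]


def memoriser(x):
    """Overfitting model: repeat the label of the closest training example."""
    _, nearest_label = min(TRAIN, key=lambda point: abs(point[0] - x))
    return nearest_label


def simple_rule(x):
    """Simpler model: one threshold that ignores the noisy labels."""
    return 1 if x >= 5 else 0


def accuracy(model, data):
    """Fraction of examples for which the model predicts the right label."""
    return sum(model(x) == label for x, label in data) / len(data)


print("memoriser  :", accuracy(memoriser, TRAIN), accuracy(memoriser, TEST))      # 1.0 0.8
print("simple rule:", accuracy(simple_rule, TRAIN), accuracy(simple_rule, TEST))  # 0.8 1.0
```

The memoriser looks better on the data it was trained on and worse on everything else, which is why models should always be evaluated on data held back from training.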

All of these issues are the subject of active current research — and so the trade-offs associated with deep learning may change. For now, at least, there is no one modeling technique to rule them all — and the principle of Occam’s razor should always be applied to analytics and machine learning. And that is a theme that we will return to later in this series.

For more on this topic, check out this blog about the business impact of machine learning.


Martin has over 27 years of experience in the IT industry and has twice been listed in dataIQ’s “Data 100” as one of the most influential people in data-driven business. Before joining Teradata, Martin held data leadership roles at a major UK retailer and a large conglomerate. Since joining Teradata, Martin has worked globally with over 250 organisations to help them realise increased business value from their data. He has helped organisations develop data and analytic strategies aligned with business objectives; designed and delivered complex technology benchmarks; pioneered the deployment of “big data” technologies; and led the development of Teradata’s AI/ML strategy. Originally a physicist, Martin has a postgraduate certificate in computing and continues to study statistics.


Dr. Frank Säuberlich leads the Data Science & Data Innovation unit of Teradata Germany. Part of his responsibilities is to make the latest market and technology developments available to Teradata customers. Currently, his main focus is on topics such as predictive analytics, machine learning and artificial intelligence.
Following his studies of business mathematics, Frank Säuberlich worked as a research assistant at the Institute for Decision Theory and Corporate Research at the University of Karlsruhe (TH), where he was already dealing with data mining questions.

His professional career has included positions as senior technical consultant at SAS Germany and as regional manager for customer analytics at Urban Science International. Frank has been with Teradata since 2012. He began as an expert in advanced analytics and data science in the International Data Science team. Later on, he became Director Data Science (International).

