Machine Learning Goes Back to the Future

AI gave rise to the study of machine learning – which led in turn to data mining.

May 8, 2017 5 min read

In the first installment of this blog series, we described how the quest for artificial intelligence (AI) gave us the discipline of machine learning – the study of how to enable an intelligent agent to learn from data to improve its performance. But what has any of that got to do with commercial analytics?

Learning and predicting from data pre-dates the study of AI – and dusting off centuries-old, tried and tested mathematical techniques like linear regression and Bayesian statistics turned out to be (much) easier than solving some of the “hard” problems in AI.

Not only that, but as more and more business processes and systems were computerized during the 70s and 80s – and as commercial databases began to proliferate as a result – applying these methods to the data in those databases also turned out to have very valuable commercial applications, such as forecasting demand for perishable products in grocery retail or identifying potentially fraudulent transactions in retail finance.

“Knowledge discovery in databases” started as an off-shoot of machine learning, with the first Knowledge Discovery and Data Mining workshop taking place at an AI conference in 1989 and helping to coin the term “data mining” in the process – a term that we will come back to a little later in this blog. And so, AI gave rise to the study of machine learning – which led in turn to data mining.

Supervised and unsupervised methods

Machine learning is often concerned with making so-called “supervised predictions”, i.e. with learning from a training set of historical data in which objects or outcomes are known and labelled, so that the intelligent agent can differentiate between, say, a cat and a mat. Or so that it can learn to identify the signals in petabytes of sensor data that characterize the imminent failure of a train, a jet engine or a paper mill. The objective, in both cases, is to produce a model that can predict a target variable – whether an object is a cat or a mat, or whether a train will fail or not within the next 36 hours – from input data – images harvested from the Internet, or the readings from the temperature, pressure and vibration sensors on the train.
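To make the idea of a supervised prediction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sensor readings are invented, the labels “ok” and “fail” are hypothetical, and the nearest-centroid method is simply one of the simplest classifiers available, not a recommendation for real predictive maintenance.

```python
from statistics import mean

# Hypothetical labelled training set: (temperature, vibration) readings,
# each tagged with whether the train failed within the following 36 hours.
training = [
    ((70.0, 0.2), "ok"),
    ((72.0, 0.3), "ok"),
    ((68.0, 0.1), "ok"),
    ((95.0, 0.9), "fail"),
    ((98.0, 1.1), "fail"),
    ((92.0, 0.8), "fail"),
]


def fit_centroids(rows):
    """Average the feature vectors of each label (the 'learning' step)."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new reading."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


centroids = fit_centroids(training)
print(predict(centroids, (94.0, 1.0)))  # a hot, shaky reading
print(predict(centroids, (69.0, 0.2)))  # a cool, steady reading
```

The labels are what make this “supervised”: without the “ok” and “fail” tags there would be nothing to average per outcome. In practice the features would also need scaling, since here temperature dominates the distance calculation.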

By contrast, data mining is often also concerned with the discovery of previously unknown patterns or structures in data. Retailers, for example, have long been interested in finding groups of customers who behave in similar ways and in “clustering” shopping missions, to understand consumer behavior and how stores are shopped. These are examples of the applications of “unsupervised methods”; we are still feeding the clustering algorithms historical data, but the data aren’t labelled – because we don’t know exactly which outcomes we are looking for. When one of us undertook his first customer behavioural segmentation project using an unsupervised approach, for example, we were not expecting to find a large group of consumers shopping our stores between 5pm and 9pm whose baskets almost exclusively contained breath mints, flowers and chocolates – nor another, buying almost exclusively frozen products, apparently for immediate consumption. But there they were!
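The unsupervised case can be sketched in the same spirit. The basket data below are invented, the two features (share of gift items, share of frozen items) are hypothetical, and k-means is just one common clustering algorithm; we are not claiming it is the method used in the project described above.

```python
from statistics import mean

# Hypothetical, unlabelled baskets: (share of gift items, share of frozen items).
baskets = [
    (0.90, 0.00), (0.85, 0.05), (0.95, 0.00),
    (0.00, 0.90), (0.05, 0.95), (0.00, 0.85),
    (0.30, 0.30), (0.25, 0.35), (0.35, 0.25),
]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain k-means; seeds are picked farthest-first so runs are repeatable."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach every basket to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(mean(column) for column in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = k_means(baskets, k=3)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, len(cluster))
```

Note that nothing in the input says “gift shopper” or “frozen food shopper”. The algorithm only returns groups; deciding what each group means, and whether it matters, is still down to the analyst.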

Four things to remember

We don’t want you to get too hung-up on history or terminology, but we do want you to understand four things.

Firstly, you simply can’t “machine learn everything” – not least because supervised methods pre-suppose that you have a relevant, labelled training data-set to learn from and because the results of an unsupervised analysis may be hard to interpret, or even irrelevant. But also because in many cases there are, in any case, better routes to the goal. By and large the big web properties don’t try to “machine learn” how big or which colour to make the “buy it now” button – they mostly run multiple, concurrent A/B tests instead. It’s quicker and it’s easier. And the output is not a prediction that may – or may not – prove to be accurate, but a measurement of whether treatment A is more effective than treatment B for a particular customer segment right now, and one that can be easily compared with other similar measurements.
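For readers who have not seen one, the measurement at the heart of a simple A/B test can be sketched as follows. The visitor and conversion counts are invented, and the two-proportion z-test is one standard way to compare two conversion rates; real experimentation platforms typically do rather more (sequential testing, multiple-comparison corrections and so on).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: 10,000 visitors saw each version of the button.
lift, z, p_value = two_proportion_z_test(
    conv_a=200, n_a=10_000, conv_b=260, n_b=10_000
)
print(f"lift={lift:.4f}  z={z:.2f}  p={p_value:.4f}")
```

The result is a measured difference with a stated level of confidence, not a forecast, which is exactly why it is easy to compare with other tests of the same kind.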

Secondly, that “data mining” was once the cool new term – popularised, in part, by vendors and marketing departments who thought that “knowledge discovery in databases” wasn’t catchy enough – and who wanted to try and distinguish the application of these methods to commercial data captured in databases from dry, dusty and apparently far-off academic concerns about machine learning and AI. Fast-forward three decades – and now vendor marketing departments are in many cases attempting to differentiate their offers from existing data mining technologies by applying the label “machine learning” to them, apparently without realising that the term pre-dates the term “data mining”. In a very real sense, the marketing hype has come full circle.

Thirdly, that data mining started as an off-shoot of machine learning – itself a product of the pursuit of AI – and that the fields remain closely linked and continue to share multiple techniques, algorithms, and researchers. So closely linked, in fact, that the two expressions are often used interchangeably – and in many situations are practically synonymous. When a mobile telecommunications company builds a model to predict which customers are likely to churn based on historical data that describes customers who have already recently cancelled their service, we can – and probably should – call that “machine learning” (because we are using a computer to build a model from labelled historical data), even if we use a mathematical method, like linear regression, that pre-dates Turing, digital computers and the Dartmouth Conference. In practice, you will find plenty of practitioners describing the same activity as “data mining”, “data science” or just plain old “analytics”. You say tom-ay-to, and I say tom-ah-to.
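As a rough illustration of how little machinery the churn example needs, here is ordinary least-squares linear regression fitted to a labelled history. The data are invented, and the single feature (support calls in the last quarter) is a hypothetical choice; a real churn model would use many more inputs.

```python
from statistics import mean

# Hypothetical labelled history: (support calls last quarter, churned?),
# where 1 means the customer cancelled and 0 means they stayed.
history = [
    (0, 0), (1, 0), (1, 0), (2, 0), (3, 1),
    (4, 0), (5, 1), (6, 1), (7, 1), (8, 1),
]


def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


xs, ys = zip(*history)
slope, intercept = fit_line(xs, ys)


def churn_score(calls):
    """Higher scores suggest a customer who looks more like past churners."""
    return slope * calls + intercept


print(round(churn_score(1), 3), round(churn_score(7), 3))
```

One caveat: a straight line fitted to a 0/1 label can produce scores below 0 or above 1, so the output is best read as a ranking rather than a probability. Logistic regression is a common alternative when a probability is needed. Whichever name you give the exercise, the arithmetic is the same.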

Lastly, whilst you should absolutely embrace some of the newer machine learning techniques and technologies – as we’ll see later in this series of blogs, the deep learning family of methods in particular has already become the de facto solution for a whole range of high-value business problems – you would be unwise to throw out the more established methods and techniques in the process. Because as we’ll also see later, in many cases we may prefer a simple solution that is sufficiently accurate to a more complex one.

Martin has over 27 years of experience in the IT industry and has twice been listed in dataIQ’s “Data 100” as one of the most influential people in data-driven business. Before joining Teradata, Martin held data leadership roles at a major UK Retailer and a large conglomerate. Since joining Teradata, Martin has worked globally with over 250 organisations to help them realise increased business value from their data. He has helped organisations develop data and analytic strategies aligned with business objectives; designed and delivered complex technology benchmarks; pioneered the deployment of “big data” technologies; and led the development of Teradata’s AI/ML strategy. Originally a physicist, Martin has a postgraduate certificate in computing and continues to study statistics.

Dr. Frank Säuberlich leads the Data Science & Data Innovation unit of Teradata Germany. His responsibilities include making the latest market and technology developments available to Teradata customers. Currently, his main focus is on topics such as predictive analytics, machine learning and artificial intelligence.
Following his studies of business mathematics, Frank Säuberlich worked as a research assistant at the Institute for Decision Theory and Corporate Research at the University of Karlsruhe (TH), where he was already dealing with data mining questions.

His professional career included the positions of a senior technical consultant at SAS Germany and of a regional manager customer analytics at Urban Science International. Frank has been with Teradata since 2012. He began as an expert in advanced analytics and data science in the International Data Science team. Later on, he became Director Data Science (International).
