Time-series forecasting is a complex analytical challenge for large enterprises. A large retailer typically has thousands to millions of products to forecast. An energy company could have millions of household smart meters for which it must forecast energy consumption. A manufacturing company could have billions of IoT sensors whose readings it needs to forecast in order to predict anomalies in the near future.
There are various approaches to time-series forecasting, such as Python/R, external forecasting services, and in-database forecasting. Among these approaches, the right way to do time-series forecasting at this scale is in-database. Why, you ask? Let me give you the top three reasons why in-database forecasting is the best approach when it comes to forecasting at hyper-scale.
Time-series forecasting at hyper-scale is only possible with in-database algorithms
Data scientists are generally used to solving time-series forecasting problems with Python/R or an external forecasting service. Such an approach can work for a limited number of forecasts, but it soon becomes a constraint when you need to manage thousands, millions, or potentially billions of forecasts. The reason is that Python/R may not be suitable for very large-scale forecasting, as each forecast model typically requires its own independent Python/R process or thread. This can represent a huge computational overhead when you need to make millions of forecasts.
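To make the one-model-per-series pattern concrete, here is a minimal Python sketch. The function names and the sample data are hypothetical, and the "model" is deliberately trivial (a historical mean) as a stand-in for a real fit. The point is only that every series triggers its own independent fit, so the work grows linearly with the number of series.

```python
from statistics import mean


def fit_mean_model(history):
    """Stand-in for a real model fit: the 'model' is just the historical mean."""
    return mean(history)


def forecast_all(series_by_id):
    """Fit one independent model per series and return one forecast per series."""
    # One fit per key: a million series means a million separate fits.
    return {sid: fit_mean_model(hist) for sid, hist in series_by_id.items()}


# Hypothetical sales history for two products.
sales = {"sku_1": [10, 12, 11], "sku_2": [3, 4, 5]}
forecasts = forecast_all(sales)
print(forecasts)
```

With a real model in place of the mean, each iteration of that loop carries its own fitting cost, which is where the overhead comes from at millions of series.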
In addition, using an external forecasting service can mean moving a large amount of data, often twice. The first transfer sends the data to the forecasting service, and the second brings the forecast results back so they can be integrated with the rest of the data. Imagine sending millions or billions of records back and forth. It can represent a huge overhead.
The optimal way to do time-series forecasting at hyper-scale is to do it in-database. Teradata ClearScape Analytics in-database forecast functions are built on the world-class Teradata Vantage MPP (massively parallel processing) platform. The platform has features such as smart scaling, advanced indexing, workload management, and an adaptive cost optimizer, which enable running in-database functions at hyper-scale. In addition, as the functions run within the database, there is zero data movement to external platforms. Thus, in-database forecast algorithms help you make forecasts at hyper-scale while minimizing cost and overhead.
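A rough back-of-the-envelope calculation shows how quickly the two transfers add up. Every number below is an assumption chosen for illustration (one million series, two years of daily history, a 30-day horizon, 32 bytes per row), not a measurement.

```python
def round_trip_bytes(n_series, history_points, horizon, bytes_per_row=32):
    """Estimate the bytes moved to and from an external forecasting service."""
    # Transfer 1: the full history of every series goes out to the service.
    outbound = n_series * history_points * bytes_per_row
    # Transfer 2: the forecast rows come back to be joined with other data.
    inbound = n_series * horizon * bytes_per_row
    return outbound, inbound


# Assumed workload: 1M series, 730 daily points each, 30-day forecast.
out_b, in_b = round_trip_bytes(1_000_000, 730, 30)
print(f"outbound: {out_b / 1e9:.2f} GB, inbound: {in_b / 1e9:.2f} GB")
```

Under these assumptions, roughly 23 GB leaves the database and about 1 GB comes back for every forecasting run.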
Time-series forecasting with SQL can be easier than forecasting with Python/R
Time-series forecasting needs to be hyper-scalable, but in many cases, the algorithmic complexity is not very high. Take, for example, a retail business scenario. Even if there are thousands of products to forecast, most of them have a similar sales pattern. Red flowers sell more on Valentine’s Day and ice cream sells more in summer. These are familiar examples, but the point holds more broadly: for most products, the sales pattern is predictable. Similarly, the household energy consumption that energy companies forecast is predictable for most households.
Because the algorithmic complexity is low, one can easily make hyper-scale forecasts directly with SQL. As Teradata ClearScape Analytics functions are in-database functions, you can use them directly from SQL. Additionally, business analysts can produce the forecasts themselves, as they are already comfortable with SQL.
It is also important to note that time-series forecasting is different from the usual machine learning tasks. Forecasts may need to be produced very frequently, and at different levels of granularity such as hourly, daily, weekly, and monthly. So, making forecasting accessible to business analysts through SQL makes it possible to produce forecasts more frequently and as the business requires.
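As an illustration of how simple a workable model can be for such predictable patterns, here is a sketch of a seasonal naive forecast, which repeats the most recent full season. This is my own example of a low-complexity technique, not a description of any particular in-database function, and the sales figures are made up.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast each future step as the value one full season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Step i of the forecast reuses position i of the last observed season,
    # wrapping around when the horizon is longer than one season.
    return [last_season[i % season_length] for i in range(horizon)]


# Hypothetical monthly ice cream sales (Jan..Dec) for two years.
year_1 = [20, 22, 30, 45, 70, 95, 120, 115, 80, 50, 30, 25]
year_2 = [22, 24, 33, 48, 75, 99, 126, 118, 84, 52, 31, 27]
next_quarter = seasonal_naive(year_1 + year_2, season_length=12, horizon=3)
print(next_quarter)  # repeats Jan..Mar of the second year
```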
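The different granular levels mentioned above are usually just different roll-ups of the same raw readings. The sketch below, with hypothetical smart-meter readings and a helper name of my own choosing, aggregates timestamped values to daily and monthly totals. In a database, the same roll-up would be a GROUP BY on the truncated timestamp.

```python
from collections import defaultdict
from datetime import datetime


def rollup(readings, fmt):
    """Sum (timestamp, value) pairs into buckets keyed by a strftime pattern."""
    totals = defaultdict(float)
    for ts, value in readings:
        totals[ts.strftime(fmt)] += value
    return dict(totals)


# Hypothetical hourly meter readings in kWh.
readings = [
    (datetime(2024, 1, 1, 0), 1.0),
    (datetime(2024, 1, 1, 1), 2.0),
    (datetime(2024, 1, 2, 0), 4.0),
    (datetime(2024, 2, 1, 0), 8.0),
]
daily = rollup(readings, "%Y-%m-%d")
monthly = rollup(readings, "%Y-%m")
print(daily)
print(monthly)
```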
Our internal benchmarks have shown that time-series forecasting with in-database functions can be at least 11 times more performant than using Python/R time-series algorithms.
Accelerate the operationalization of forecasts, leading to business value
For a retailer, the forecast can represent a multi-million-dollar purchase budget as well as a commitment to fulfill customer demand. So, the forecast needs to be integrated with the rest of the data, both to analyze it further and to make it operational. For example, you will have to combine it with stock-level information to get insight into your purchase budget. Operationalizing forecasts also requires analyzing substitutes as well as lost sales for unavailable products.
With in-database forecasting, integration with operational data is seamless, as all data is accessible via the same platform. With Teradata Vantage, you need just a single SQL call that makes the forecast and integrates it with operational data. It can also access different systems to get the operational data. For a business user, however, all of this is transparent, which accelerates the operationalization of forecasts.
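To show what integrating a forecast with stock levels in one SQL statement can look like, here is a small sketch that uses SQLite from Python purely as a stand-in database. It does not use Teradata syntax or any ClearScape Analytics function. The table names, column names, and numbers are all hypothetical, and the forecast table is pre-filled rather than produced by an in-database forecast function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE forecast (product_id TEXT PRIMARY KEY, forecast_units INTEGER);
    CREATE TABLE stock (product_id TEXT PRIMARY KEY, on_hand_units INTEGER);
    INSERT INTO forecast VALUES ('roses', 500), ('ice_cream', 200);
    INSERT INTO stock VALUES ('roses', 120), ('ice_cream', 350);
    """
)

# One statement joins forecast demand to stock on hand and derives the
# quantity to purchase, floored at zero when stock already covers demand.
rows = conn.execute(
    """
    SELECT f.product_id,
           MAX(f.forecast_units - s.on_hand_units, 0) AS units_to_buy
    FROM forecast AS f
    JOIN stock AS s ON s.product_id = f.product_id
    ORDER BY f.product_id
    """
).fetchall()
print(rows)
conn.close()
```

Because the forecast and the stock data live in the same database, no export or import step is needed between forecasting and the purchase calculation.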
In summary, an in-database time-series forecast is suitable for hyper-scale forecasting needs and will help you achieve business value through seamless operationalization.
On a closing note, let me also state that, at the time of writing this blog, none of Teradata’s closest competitors has native in-database time-series forecasting functions.