Tutorial for the 25TH ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Time series forecasting is a key ingredient in the automation and optimization of business processes: in retail, deciding which products to order and where to store them depends on the forecasts of future demand in different regions; in cloud computing, the estimated future usage of services and infrastructure components guides capacity planning; and workforce scheduling in warehouses and factories requires forecasts of the future workload. Recent years have witnessed a paradigm shift in forecasting techniques and applications, from computer-assisted model- and assumption-based to data-driven and fully automated. This shift can be attributed to the availability of large, rich, and diverse time series data sources, and it results in a set of challenges that need to be addressed, such as the following: How can we build statistical models to efficiently and effectively learn to forecast from large and diverse data sources? How can we leverage the statistical power of “similar” time series to improve forecasts in the case of limited observations? What are the implications for building forecasting systems that can handle large data volumes?
The objective of this tutorial is to provide a concise and intuitive overview of the most important methods and tools available for solving large-scale forecasting problems. We review the state of the art in related fields: (1) classical modeling of time series and (2) modern methods, including tensor analysis and deep learning for forecasting. Furthermore, we discuss the practical aspects of building a large-scale forecasting system, including data integration, feature generation, backtest framework, error tracking and analysis, etc. Our focus is on providing an intuitive overview of the methods and practical issues, which we will illustrate via case studies and interactive materials with Jupyter notebooks.
Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, the SIGKDD Innovations Award (2010), twenty “best paper” awards (including two test of time awards), and four teaching awards. Five of his advisees have attracted KDD or SCS dissertation awards. He is an ACM Fellow and has served as a member of the executive committee of SIGKDD; he has published over 300 refereed articles, 17 book chapters, and two monographs. He holds eight patents and has given over 40 tutorials and over 20 invited distinguished lectures. His research interests include data mining for graphs and streams, fractals, database performance, and indexing for multimedia and bioinformatics data.
Valentin Flunkert is a Senior Machine Learning Scientist in Amazon’s AWS AI Labs, where he has developed new deep learning based forecasting methods and applied them to solve a range of business problems. Prior to joining Amazon, he received his PhD in Theoretical Physics from TU Berlin and worked as a software engineer at SAP. His research in theoretical physics focuses on time delay in complex nonlinear systems, its applications in complex networks, and its role in control theory.
Jan Gasthaus is a Senior Machine Learning Scientist in the Amazon AI Labs, working mainly on time series forecasting and large-scale probabilistic machine learning. He is passionate about developing novel machine learning solutions for addressing challenging business problems with scalable machine learning systems, all the way from scientific ideation to productization. Prior to joining Amazon, Jan obtained a BS in Cognitive Science from the University of Osnabrueck, an MS in Intelligent Systems from UCL, and pursued a PhD at the Gatsby Unit, UCL, focusing on Nonparametric Bayesian methods for sequence data.
Tim Januschowski is a Machine Learning Science Manager in Amazon AI Labs. He has worked on forecasting since starting his professional career. At Amazon, he has produced end-to-end solutions for a wide variety of forecasting problems, from demand forecasting to server capacity forecasting. Tim’s personal interests in forecasting span applications, system, algorithm and modeling aspects and the downstream mathematical programming problems. He studied Mathematics at TU Berlin, IMPA, Rio de Janeiro, and Zuse-Institute Berlin and holds a PhD from University College Cork.
Yuyang (Bernie) Wang is a Senior Machine Learning Scientist in Amazon AI Labs, working mainly on large-scale probabilistic machine learning and its applications in forecasting. He received his PhD in Computer Science from Tufts University, MA, US, and he holds an MS from the Department of Computer Science at Tsinghua University, Beijing, China. His research interests span statistical machine learning, numerical linear algebra, and random matrix theory. In forecasting, Yuyang has worked on all aspects ranging from practical applications to theoretical foundations.
Some recent tutorials by Christos and Co. on big time series mining:
Several of the notebooks come from the time series chapter we are writing for Deep Learning – The Straight Dope, an interactive book on deep learning by our colleagues at Amazon: Zachary C. Lipton (@zackchase), Mu Li (@mli), Alex Smola (@smolix), Sheng Zha (@szha), Aston Zhang (@astonzhang), and others.