Time series forecasting is an important area of machine learning that is often neglected. It is important because so many prediction problems involve a time component. It is neglected because this same time component makes time series problems more difficult to handle.
Time Series vs. Normal Machine Learning Datasets
A normal machine learning dataset is a collection of observations. For example:
observation #1
observation #2
observation #3
Time does play a role in normal machine learning datasets. Predictions are made for new data when the actual outcome may not be known until some future date. The future is being predicted, but all prior observations are treated equally. At most, there may be some minor temporal handling to counter concept drift, such as using only the last year of observations rather than all available data.
A time series dataset is different. Time series adds an explicit order dependence between observations: a time dimension. This additional dimension is both a constraint and a structure that provides a source of additional information.
A time series is a sequence of observations taken sequentially in time.
Time #1, observation
Time #2, observation
Time #3, observation
Time Series Nomenclature
It is essential to quickly establish the standard terms used when describing time series data. The current time is defined as t, and the observation at the current time is defined as obs(t).
We are often interested in the observations made at prior times, called lag times or lags.
Times in the past are negative relative to the current time. For example, the previous time is t-1 and the time before that is t-2. The observations at these times are obs(t-1) and obs(t-2) respectively.
- t-n: A prior or lag time (e.g. t-1 for the previous time).
- t: A current time and point of reference.
- t+n: A future or forecast time (e.g. t+1 for the next time).
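The notation above maps directly onto indexing. The following is a minimal sketch in plain Python. It assumes the series is a list with one observation per time step, so a list index stands in for t. `obs` and `lag_frame` are hypothetical helper names and do not come from any library.

```python
from typing import Optional, Sequence


def obs(series: Sequence[float], t: int, offset: int = 0) -> Optional[float]:
    """Return obs(t + offset), or None if that time falls outside the series.

    offset < 0 is a lag (t-n), offset == 0 is the current time t,
    and offset > 0 is a future or forecast time (t+n).
    """
    i = t + offset
    if 0 <= i < len(series):
        return series[i]
    return None


def lag_frame(series: Sequence[float], lags: int = 2, leads: int = 1) -> list:
    """Build one row per time t with columns obs(t-lags) .. obs(t) .. obs(t+leads)."""
    rows = []
    for t in range(len(series)):
        row = {}
        for offset in range(-lags, leads + 1):
            if offset < 0:
                name = f"obs(t{offset})"   # e.g. "obs(t-1)"
            elif offset == 0:
                name = "obs(t)"
            else:
                name = f"obs(t+{offset})"  # e.g. "obs(t+1)"
            row[name] = obs(series, t, offset)
        rows.append(row)
    return rows


series = [10.0, 12.0, 13.0, 15.0, 14.0]  # illustrative values only
for row in lag_frame(series):
    print(row)
```

For t = 2 the row is {'obs(t-2)': 10.0, 'obs(t-1)': 12.0, 'obs(t)': 13.0, 'obs(t+1)': 15.0}. Near the edges of the series, lags or leads that fall outside the data come back as None.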
Time Series Analysis vs. Time Series Forecasting
We have different goals depending on whether we are interested in understanding a dataset or making predictions. Understanding a dataset, called time series analysis, can help to make better predictions, but it is not required. It can also demand a large investment of time and technical expertise that is not directly aligned with the desired outcome: forecasting the future.
Time Series Analysis
When using classical statistics, the primary concern is the analysis of time series. Time series analysis involves developing models that best capture or describe an observed time series in order to understand its underlying causes. This field of study seeks the why behind a time series dataset. This often involves making assumptions about the form of the data and decomposing the time series into constituent components. The quality of a descriptive model is determined by how well it describes all available data and by the interpretation it provides to better inform the problem domain.
The primary objective of time series analysis is to develop mathematical models that provide plausible descriptions from sample data.
Time Series Forecasting
Making predictions about the future is called extrapolation in the classical statistical handling of time series data. More modern fields focus on the topic and refer to it as time series forecasting.
Forecasting involves taking models fit on historical data and using them to predict future observations. Descriptive models can borrow from the future (e.g. to smooth or remove noise) because they only seek to best describe the data. An important distinction in forecasting is that the future is completely unavailable and must only be estimated from what has already happened.
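Two smoothers show the difference. This sketch assumes plain Python lists and uses a window of 3 only for illustration. The centered moving average reads observations on both sides of t, which a descriptive model may do. The trailing moving average reads only obs(t) and earlier, which is all a forecast is allowed to use. Both function names are hypothetical.

```python
def trailing_mean(series, window):
    """Smooth obs(t) using obs(t-window+1) .. obs(t): never looks ahead."""
    return [
        sum(series[t - window + 1 : t + 1]) / window if t >= window - 1 else None
        for t in range(len(series))
    ]


def centered_mean(series, window):
    """Smooth obs(t) using obs(t-k) .. obs(t+k): borrows from the future."""
    if window % 2 == 0:
        raise ValueError("use an odd window so it can be centered on t")
    k = window // 2
    return [
        sum(series[t - k : t + k + 1]) / window if k <= t < len(series) - k else None
        for t in range(len(series))
    ]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # illustrative values only
changed = series[:3] + [40.0] + series[4:]    # only obs(3) differs

t = 2
print(trailing_mean(series, 3)[t], trailing_mean(changed, 3)[t])
print(centered_mean(series, 3)[t], centered_mean(changed, 3)[t])
```

Changing a single future value (obs(3) when t = 2) moves the centered estimate at t = 2 from 3.0 to 15.0. The trailing estimate stays at 2.0. That shift is exactly the leak of future information that a forecasting model must avoid.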
The skill of a time series forecasting model is determined by its performance at predicting the future. This often comes at the expense of being able to explain why a specific prediction was made, of providing confidence intervals, and even of better understanding the underlying causes behind the problem.
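Skill is measured on the future, so a held-out test set should be the most recent observations rather than a random sample. The following is a minimal sketch under several assumptions, none of which come from the text above: a small made-up series, two naive stand-in models (persistence, which repeats the last value, and the historical mean), and mean absolute error as the skill measure.

```python
def persistence_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Predict the average of all past observations for every future step."""
    return [sum(history) / len(history)] * horizon


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative, made-up observations in time order.
series = [20, 22, 21, 24, 26, 25, 27, 30, 29, 31]

# Chronological split: forecasts use only the past, and are scored on the
# last 3 observations. Shuffling would let the models peek at the future.
split = len(series) - 3
train, test = series[:split], series[split:]

for name, model in [("persistence", persistence_forecast), ("mean", mean_forecast)]:
    forecast = model(train, horizon=len(test))
    print(name, mean_absolute_error(test, forecast))
```

On this toy series, persistence scores an MAE of 3.0 and the historical mean about 6.43. Only performance on the held-out future decides which model is more skillful.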
Components of Time Series
Time series analysis provides a body of techniques to better understand a dataset. Perhaps the most useful of these is the decomposition of a time series into 4 constituent parts:
- Level. The baseline value for the series if it were a straight line.
- Trend. The optional and often linear increasing or decreasing behavior of the series over time.
- Seasonality. The optional repeating patterns or cycles of behavior over time.
- Noise. The optional variability in the observations that cannot be explained by the model.
All time series have a level, most have noise, and the trend and seasonality are optional.
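One common way to combine these parts is additively: y(t) = level + trend + seasonality + noise. A multiplicative form is also used. The following sketch performs a classical additive decomposition in plain Python. It assumes an odd seasonal period, so that a single centered moving average can estimate level plus trend. The synthetic series and the helper names `make_series` and `decompose_additive` are hypothetical, chosen so the result can be checked.

```python
import random


def make_series(n, level, slope, season, noise_sd, seed=0):
    """Synthesize y(t) = level + slope*t + season[t % period] + noise(t)."""
    rng = random.Random(seed)
    period = len(season)
    return [
        level + slope * t + season[t % period] + rng.gauss(0.0, noise_sd)
        for t in range(n)
    ]


def decompose_additive(y, period):
    """Classical additive decomposition; this sketch requires an odd period."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    k = period // 2
    n = len(y)

    # Level + trend: centered moving average spanning exactly one season,
    # so a seasonal pattern that sums to zero cancels out.
    trend = [
        sum(y[t - k : t + k + 1]) / period if k <= t < n - k else None
        for t in range(n)
    ]

    # Seasonality: mean detrended value at each position in the cycle,
    # re-centered so the pattern sums to zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(y[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    shift = sum(means) / period
    pattern = [m - shift for m in means]
    seasonal = [pattern[t % period] for t in range(n)]

    # Noise: whatever level, trend and seasonality leave unexplained.
    residual = [
        None if trend[t] is None else y[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


weekly = [3.0, 1.0, 0.0, -1.0, -2.0, -2.0, 1.0]  # hypothetical pattern, sums to zero
y = make_series(70, level=100.0, slope=0.5, season=weekly, noise_sd=0.0)
trend, seasonal, residual = decompose_additive(y, period=7)
print(round(trend[10], 6), [round(s, 6) for s in seasonal[:7]])
```

With noise_sd=0.0, the recovered trend matches 100 + 0.5*t and the seasonal pattern matches `weekly`, up to floating-point error. With noise added, the residual absorbs it. The centered moving average leaves the first and last `period // 2` points without a trend estimate.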
Concerns of Forecasting Time Series
When forecasting, it is important to understand your goal. Use the Socratic method and ask lots of questions to help zoom in on the specifics of your predictive modeling problem. For example:
- How much data do you have available and are you able to gather it all together? As with all machine learning models, more data is often more helpful, offering greater opportunity for exploratory data analysis, model testing and tuning, and model fidelity.
- What is the time horizon of predictions that is required? Short, medium or long term? Shorter time horizons are often easier to predict with higher confidence.
- Can forecasts be updated frequently over time or must they be made once and remain static? Updating forecasts as new information becomes available often results in more accurate predictions.
- At what temporal frequency are forecasts required? Often forecasts can be made at a lower or higher frequency, allowing you to harness down-sampling and up-sampling of data, which in turn can offer benefits while modeling.
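The questions about horizon and updating can be made concrete with the simplest possible model. This sketch assumes a made-up, steadily rising series and uses a persistence forecast (repeat the last known value) as a stand-in. The static forecast is made once at the origin. The updated forecast is remade at each step after the new observation arrives. The function names are hypothetical.

```python
def static_forecast(history, horizon):
    """Forecast made once at the origin and never revised (persistence)."""
    return [history[-1]] * horizon


def updated_forecasts(history, future):
    """Walk forward: forecast the next step, observe it, then re-forecast."""
    seen = list(history)
    preds = []
    for actual in future:
        preds.append(seen[-1])  # persistence forecast from everything seen so far
        seen.append(actual)     # the new observation becomes available
    return preds


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up, steadily rising observations.
history = [10, 11, 12, 13, 14]
future = [15, 16, 17, 18]

static = static_forecast(history, len(future))
updated = updated_forecasts(history, future)
print(static, mean_absolute_error(future, static))
print(updated, mean_absolute_error(future, updated))
```

Static errors grow with the horizon (1, 2, 3, 4, for an MAE of 2.5). The updated forecasts stay one step behind (MAE of 1.0). This also illustrates why shorter horizons tend to be easier to predict.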
Time series data often requires cleaning, scaling, and even transformation. For example:
- Frequency. Perhaps data is provided at a frequency that is too high to model or is unevenly spaced through time requiring resampling for use in some models.
- Outliers. Perhaps there are corrupt or extreme outlier values that need to be identified and handled.
- Missing. Perhaps there are gaps or missing data that need to be interpolated or imputed.
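The following is a minimal sketch of those three steps in plain Python. It assumes evenly spaced readings, with None marking a missing value. The techniques are illustrative choices rather than the only options: linear interpolation for gaps, the median-absolute-deviation outlier rule (the 0.6745 scaling and 3.5 cut-off are one common convention), and block averaging for down-sampling. The helper names are hypothetical.

```python
from statistics import median


def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known neighbors.

    Leading or trailing gaps stay None: there is nothing to interpolate between.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


def flag_outliers(values, threshold=3.5):
    """Flag points far from the median, scored with the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    # 0.6745 rescales MAD so the score is comparable to a z-score for normal data.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def downsample_mean(values, factor):
    """Lower the frequency by averaging non-overlapping blocks of `factor` points.

    An incomplete final block is dropped.
    """
    return [
        sum(values[i : i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]


hourly = [10.0, 11.0, None, 13.0, 14.0, 95.0, 16.0, 17.0]  # made-up readings
filled = interpolate_missing(hourly)           # the gap at index 2 becomes 12.0
flags = flag_outliers(filled)                  # only the 95.0 reading is flagged
# Treat flagged points as missing and interpolate them too.
cleaned = interpolate_missing([None if f else v for v, f in zip(filled, flags)])
print(cleaned, downsample_mean(cleaned, 4))
```

Handling outliers before down-sampling matters here. Left in place, the 95.0 reading would pull the second block's average from 15.5 up to 35.5.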
Often time series problems are real-time, continually providing new opportunities for prediction. This adds an honesty to time series forecasting that quickly flushes out bad assumptions, errors in modeling, and all the other ways that we may be able to fool ourselves.
Examples of Time Series Forecasting
- Forecasting the yield of a commodity crop, such as corn or wheat, in tons by state each year.
- Forecasting whether an EEG trace in seconds indicates a patient is having a seizure or not.
- Forecasting the closing price of a stock each day.
- Forecasting the birth rate at all hospitals in a city each year.
- Forecasting product sales in units sold each day for a store.
- Forecasting the number of passengers through a train station each day.
- Forecasting unemployment for a state each quarter.
- Forecasting utilization demand on a server each hour.
- Forecasting the size of the rabbit population in a state each breeding season.
- Forecasting the average price of gasoline in a city each day.