Deep learning examples¶
This notebook contains examples with neural network models.
[1]:
import torch
import random
import pandas as pd
import numpy as np
from etna.datasets.tsdataset import TSDataset
from etna.pipeline import Pipeline
from etna.transforms import DateFlagsTransform
from etna.transforms import LagTransform
from etna.transforms import LinearTrendTransform
from etna.metrics import SMAPE, MAPE, MAE
from etna.analysis import plot_backtest
from etna.models import SeasonalMovingAverageModel
import warnings
def set_seed(seed: int = 42):
    """Set random seed for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


warnings.filterwarnings("ignore")
1. Creating TSDataset¶
We are going to use a toy dataset. Let’s load it and take a look.
[2]:
original_df = pd.read_csv("data/example_dataset.csv")
original_df.head()
[2]:
 | timestamp | segment | target |
---|---|---|---|
0 | 2019-01-01 | segment_a | 170 |
1 | 2019-01-02 | segment_a | 243 |
2 | 2019-01-03 | segment_a | 267 |
3 | 2019-01-04 | segment_a | 287 |
4 | 2019-01-05 | segment_a | 279 |
Our library works with a special data structure, TSDataset. Let’s create it as was done in the “Get started” notebook.
[3]:
df = TSDataset.to_dataset(original_df)
ts = TSDataset(df, freq="D")
ts.head(5)
[3]:
segment | segment_a | segment_b | segment_c | segment_d |
---|---|---|---|---|
feature | target | target | target | target |
timestamp | ||||
2019-01-01 | 170 | 102 | 92 | 238 |
2019-01-02 | 243 | 123 | 107 | 358 |
2019-01-03 | 267 | 130 | 103 | 366 |
2019-01-04 | 287 | 138 | 103 | 385 |
2019-01-05 | 279 | 137 | 104 | 384 |
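The two tables above show the same data in long format (one row per timestamp and segment) and in wide format (one row per timestamp, one column per segment). The following is a minimal plain-Python sketch of that reshaping idea; the helper `to_wide` is hypothetical and is not how TSDataset.to_dataset is implemented.

```python
from collections import defaultdict

# Toy rows in long format: (timestamp, segment, target).
# Values are copied from the segment_a and segment_b columns shown above.
long_rows = [
    ("2019-01-01", "segment_a", 170),
    ("2019-01-01", "segment_b", 102),
    ("2019-01-02", "segment_a", 243),
    ("2019-01-02", "segment_b", 123),
]


def to_wide(rows):
    """Pivot long rows into {timestamp: {segment: target}} (hypothetical helper)."""
    wide = defaultdict(dict)
    for timestamp, segment, target in rows:
        wide[timestamp][segment] = target
    return dict(wide)


wide = to_wide(long_rows)
for timestamp in sorted(wide):
    print(timestamp, wide[timestamp])
```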
2. Architecture¶
Our library uses PyTorch Forecasting to work with time series neural networks. There are two ways to use pytorch-forecasting models: the default one, and a custom one that passes extra features through the PytorchForecastingDatasetBuilder class.
Let’s take a closer look at this class.
[4]:
from etna.models.nn.utils import PytorchForecastingDatasetBuilder
[5]:
?PytorchForecastingDatasetBuilder
Init signature:
PytorchForecastingDatasetBuilder(
max_encoder_length: int = 30,
min_encoder_length: Optional[int] = None,
min_prediction_idx: Optional[int] = None,
min_prediction_length: Optional[int] = None,
max_prediction_length: int = 1,
static_categoricals: Optional[List[str]] = None,
static_reals: Optional[List[str]] = None,
time_varying_known_categoricals: Optional[List[str]] = None,
time_varying_known_reals: Optional[List[str]] = None,
time_varying_unknown_categoricals: Optional[List[str]] = None,
time_varying_unknown_reals: Optional[List[str]] = None,
variable_groups: Optional[Dict[str, List[int]]] = None,
constant_fill_strategy: Optional[Dict[str, Union[str, float, int, bool]]] = None,
allow_missing_timesteps: bool = True,
lags: Optional[Dict[str, List[int]]] = None,
add_relative_time_idx: bool = True,
add_target_scales: bool = True,
add_encoder_length: Union[bool, str] = True,
target_normalizer: Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer, str, List[Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer]], Tuple[Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer]]] = 'auto',
categorical_encoders: Optional[Dict[str, pytorch_forecasting.data.encoders.NaNLabelEncoder]] = None,
scalers: Optional[Dict[str, Union[sklearn.preprocessing._data.StandardScaler, sklearn.preprocessing._data.RobustScaler, pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.EncoderNormalizer]]] = None,
)
Docstring: Builder for PytorchForecasting dataset.
Init docstring:
Init dataset builder.
Parameters here is used for initialization of :py:class:`pytorch_forecasting.data.timeseries.TimeSeriesDataSet` object.
File: ~/Projects/etna/etna/models/nn/utils.py
Type: type
Subclasses:
The signature looks intimidating, but don’t panic: we will look at only the most important parameters.
* time_varying_known_reals: known real values that change over time (real regressors); currently the “time_idx” variable must be added to this list;
* time_varying_unknown_reals: our real-valued target, set it to ["target"];
* max_prediction_length: our forecasting horizon;
* max_encoder_length: length of the past context to use;
* static_categoricals: static categorical values; for example, with multiple segments these can be segment characteristics, including the identifier “segment”;
* time_varying_known_categoricals: known categorical values that change over time (categorical regressors);
* target_normalizer: class for normalizing targets across different segments.
Our library currently supports these models:
* DeepAR,
* TFT.
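To make the roles of max_encoder_length and max_prediction_length more concrete, here is a minimal sketch of how a series can be cut into (encoder, decoder) pairs. The function `make_windows` and the stride of one step are assumptions for illustration only; this is not the code used by PytorchForecastingDatasetBuilder.

```python
def make_windows(series, max_encoder_length, max_prediction_length):
    """Cut a series into (encoder, decoder) pairs with a stride of one step."""
    window = max_encoder_length + max_prediction_length
    pairs = []
    for start in range(len(series) - window + 1):
        # Past context the model reads.
        encoder = series[start : start + max_encoder_length]
        # Future values the model has to predict.
        decoder = series[start + max_encoder_length : start + window]
        pairs.append((encoder, decoder))
    return pairs


series = list(range(20))  # toy series, hypothetical values
pairs = make_windows(series, max_encoder_length=7, max_prediction_length=7)
print(len(pairs))
print(pairs[0])
```

With 20 points and a total window of 14 steps, this yields 7 training samples; a longer encoder gives the model more context but leaves fewer samples.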
3. Testing models¶
In this section we will test these models on our example dataset.
3.1 DeepAR¶
Before training let’s fix seeds for reproducibility.
[4]:
set_seed()
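As a small illustration of why seeds are fixed, the sketch below uses only the standard library generator: reseeding with the same value reproduces the same draws. The helper `draw` is hypothetical; set_seed above applies the same idea to NumPy and PyTorch as well.

```python
import random


def draw(seed, n=3):
    """Reseed the standard library generator and draw n numbers."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


# Same seed: identical sequences. Different seed: different sequences.
print(draw(42) == draw(42))
print(draw(42) == draw(43))
```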
Default way¶
[7]:
from etna.models.nn import DeepARModel
HORIZON = 7
model_deepar = DeepARModel(
encoder_length=HORIZON,
decoder_length=HORIZON,
trainer_params=dict(max_epochs=150, gpus=0, gradient_clip_val=0.1),
lr=0.01,
train_batch_size=64,
)
metrics = [SMAPE(), MAPE(), MAE()]
pipeline_deepar = Pipeline(model=model_deepar, horizon=HORIZON)
[8]:
metrics_deepar, forecast_deepar, fold_info_deepar = pipeline_deepar.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------------------------------
0 | loss | NormalDistributionLoss | 0
1 | logging_metrics | ModuleList | 0
2 | embeddings | MultiEmbedding | 0
3 | rnn | LSTM | 1.6 K
4 | distribution_projector | Linear | 22
------------------------------------------------------------------
1.6 K Trainable params
0 Non-trainable params
1.6 K Total params
0.006 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=150` reached.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.1min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------------------------------
0 | loss | NormalDistributionLoss | 0
1 | logging_metrics | ModuleList | 0
2 | embeddings | MultiEmbedding | 0
3 | rnn | LSTM | 1.6 K
4 | distribution_projector | Linear | 22
------------------------------------------------------------------
1.6 K Trainable params
0 Non-trainable params
1.6 K Total params
0.006 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=150` reached.
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 2.3min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------------------------------
0 | loss | NormalDistributionLoss | 0
1 | logging_metrics | ModuleList | 0
2 | embeddings | MultiEmbedding | 0
3 | rnn | LSTM | 1.6 K
4 | distribution_projector | Linear | 22
------------------------------------------------------------------
1.6 K Trainable params
0 Non-trainable params
1.6 K Total params
0.006 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=150` reached.
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 3.6min remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 3.6min finished
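The backtest above trains and evaluates the pipeline on n_folds=3 folds. Below is a minimal sketch of how such folds can be laid out, assuming an expanding training window and non-overlapping test windows of length horizon at the end of the series. Both the scheme and the function `expanding_folds` are assumptions for illustration, not taken from this notebook.

```python
def expanding_folds(n_points, horizon, n_folds):
    """Return one dict per fold with end-exclusive index ranges."""
    folds = []
    for k in range(n_folds):
        # The last fold ends at the end of the series; earlier folds end sooner.
        test_end = n_points - (n_folds - 1 - k) * horizon
        test_start = test_end - horizon
        folds.append({"train": (0, test_start), "test": (test_start, test_end)})
    return folds


folds = expanding_folds(n_points=100, horizon=7, n_folds=3)
for fold_number, fold in enumerate(folds):
    print(fold_number, fold)
```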
[9]:
metrics_deepar
[9]:
 | segment | SMAPE | MAPE | MAE | fold_number |
---|---|---|---|---|---|
1 | segment_a | 8.158810 | 7.816301 | 42.378252 | 0 |
1 | segment_a | 3.769166 | 3.832835 | 18.967320 | 1 |
1 | segment_a | 6.454904 | 6.449939 | 33.427503 | 2 |
0 | segment_b | 8.017286 | 7.636825 | 20.131232 | 0 |
0 | segment_b | 4.796795 | 4.846337 | 11.345973 | 1 |
0 | segment_b | 4.915929 | 5.121204 | 11.100294 | 2 |
2 | segment_c | 5.799369 | 5.693384 | 9.809566 | 0 |
2 | segment_c | 5.888097 | 5.771138 | 10.324450 | 1 |
2 | segment_c | 6.347281 | 6.120396 | 11.567204 | 2 |
3 | segment_d | 8.475800 | 8.198147 | 72.796840 | 0 |
3 | segment_d | 3.543986 | 3.644503 | 27.933236 | 1 |
3 | segment_d | 6.499065 | 6.505668 | 52.035304 | 2 |
To summarize, we take the mean value of the SMAPE metric, because it is scale-independent.
[10]:
score = metrics_deepar["SMAPE"].mean()
print(f"Average SMAPE for DeepAR: {score:.3f}")
Average SMAPE for DeepAR: 6.056
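To illustrate why SMAPE is convenient for averaging over segments of different magnitude, here is a small sketch using a common SMAPE definition in percent. The formula and the toy numbers are assumptions for illustration, and the exact implementation in the library may differ (for example, in how zero denominators are handled).

```python
def smape(y_true, y_pred):
    """Symmetric MAPE in percent: 100 * mean(2 * |y - y_hat| / (|y| + |y_hat|))."""
    terms = [2 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)]
    return 100 * sum(terms) / len(terms)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 120.0, 130.0]  # hypothetical values
y_pred = [110.0, 115.0, 125.0]
y_true_big = [10 * v for v in y_true]  # same series on a 10x larger scale
y_pred_big = [10 * v for v in y_pred]

# SMAPE is unchanged by rescaling, while MAE grows with the scale.
print(smape(y_true, y_pred), smape(y_true_big, y_pred_big))
print(mae(y_true, y_pred), mae(y_true_big, y_pred_big))
```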
Dataset Builder: creating dataset for DeepAR with extra features.¶
[11]:
from pytorch_forecasting.data import GroupNormalizer
set_seed()
HORIZON = 7
transform_date = DateFlagsTransform(day_number_in_week=True, day_number_in_month=False, out_column="dateflag")
num_lags = 10
transform_lag = LagTransform(
in_column="target",
lags=[HORIZON + i for i in range(num_lags)],
out_column="target_lag",
)
lag_columns = [f"target_lag_{HORIZON+i}" for i in range(num_lags)]
dataset_builder_deepar = PytorchForecastingDatasetBuilder(
max_encoder_length=HORIZON,
max_prediction_length=HORIZON,
time_varying_known_reals=["time_idx"] + lag_columns,
time_varying_unknown_reals=["target"],
time_varying_known_categoricals=["dateflag_day_number_in_week"],
target_normalizer=GroupNormalizer(groups=["segment"]),
)
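The lags in the cell above start at HORIZON. A minimal sketch of the reasoning: a lag feature is usable during forecasting only if the value it points to is already observed at every step of the horizon, which holds when the lag is at least as large as the step. The helper `lag_is_available` is hypothetical and only illustrates this check.

```python
HORIZON = 7
num_lags = 10
lags = [HORIZON + i for i in range(num_lags)]
lag_columns = [f"target_lag_{lag}" for lag in lags]


def lag_is_available(lag, step):
    """True if the value `lag` steps before forecast step `step` (1-based) is observed."""
    return lag >= step


# A lag is usable only if it is available at every step of the horizon.
usable = {
    lag: all(lag_is_available(lag, step) for step in range(1, HORIZON + 1))
    for lag in (1, 6, 7, 16)
}
print(lag_columns[0], lag_columns[-1])
print(usable)
```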
Now we are going to run the backtest.
[12]:
from etna.models.nn import DeepARModel
model_deepar = DeepARModel(
dataset_builder=dataset_builder_deepar,
trainer_params=dict(max_epochs=150, gpus=0, gradient_clip_val=0.1),
lr=0.01,
train_batch_size=64,
)
metrics = [SMAPE(), MAPE(), MAE()]
pipeline_deepar = Pipeline(
model=model_deepar,
horizon=HORIZON,
transforms=[transform_lag, transform_date],
)
[13]:
metrics_deepar, forecast_deepar, fold_info_deepar = pipeline_deepar.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------------------------------
0 | loss | NormalDistributionLoss | 0
1 | logging_metrics | ModuleList | 0
2 | embeddings | MultiEmbedding | 35
3 | rnn | LSTM | 2.2 K
4 | distribution_projector | Linear | 22
------------------------------------------------------------------
2.3 K Trainable params
0 Non-trainable params
2.3 K Total params
0.009 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=150` reached.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 1.1min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------------------------------
0 | loss | NormalDistributionLoss | 0
1 | logging_metrics | ModuleList | 0
2 | embeddings | MultiEmbedding | 35
3 | rnn | LSTM | 2.2 K
4 | distribution_projector | Linear | 22
------------------------------------------------------------------
2.3 K Trainable params
0 Non-trainable params
2.3 K Total params
0.009 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=150` reached.
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 2.7min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------------------------------
0 | loss | NormalDistributionLoss | 0
1 | logging_metrics | ModuleList | 0
2 | embeddings | MultiEmbedding | 35
3 | rnn | LSTM | 2.2 K
4 | distribution_projector | Linear | 22
------------------------------------------------------------------
2.3 K Trainable params
0 Non-trainable params
2.3 K Total params
0.009 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=150` reached.
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.3min remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 4.3min finished
Let’s compare results across different segments.
[14]:
metrics_deepar
[14]:
 | segment | SMAPE | MAPE | MAE | fold_number |
---|---|---|---|---|---|
1 | segment_a | 6.049829 | 5.810786 | 31.258065 | 0 |
1 | segment_a | 5.406119 | 5.255942 | 27.799116 | 1 |
1 | segment_a | 5.204423 | 5.032808 | 27.497580 | 2 |
0 | segment_b | 7.012970 | 6.749309 | 17.479985 | 0 |
0 | segment_b | 5.061756 | 5.103116 | 12.263271 | 1 |
0 | segment_b | 2.545311 | 2.558906 | 6.099712 | 2 |
2 | segment_c | 4.462100 | 4.357769 | 7.482326 | 0 |
2 | segment_c | 5.752517 | 5.566167 | 10.335778 | 1 |
2 | segment_c | 5.564647 | 5.389165 | 10.039671 | 2 |
3 | segment_d | 9.123145 | 8.612458 | 75.362462 | 0 |
3 | segment_d | 4.894100 | 4.984388 | 39.399170 | 1 |
3 | segment_d | 4.301217 | 4.193129 | 36.610482 | 2 |
To summarize, we take the mean value of the SMAPE metric, because it is scale-independent.
[15]:
score = metrics_deepar["SMAPE"].mean()
print(f"Average SMAPE for DeepAR: {score:.3f}")
Average SMAPE for DeepAR: 5.448
Visualize results.
[16]:
plot_backtest(forecast_deepar, ts, history_len=20)
3.2 TFT¶
Let’s move to the next model.
[17]:
set_seed()
Default way¶
[18]:
from etna.models.nn import TFTModel
model_tft = TFTModel(
encoder_length=HORIZON,
decoder_length=HORIZON,
trainer_params=dict(max_epochs=200, gpus=0, gradient_clip_val=0.1),
lr=0.01,
train_batch_size=64,
)
pipeline_tft = Pipeline(
model=model_tft,
horizon=HORIZON,
)
[19]:
metrics_tft, forecast_tft, fold_info_tft = pipeline_tft.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
----------------------------------------------------------------------------------------
0 | loss | QuantileLoss | 0
1 | logging_metrics | ModuleList | 0
2 | input_embeddings | MultiEmbedding | 0
3 | prescalers | ModuleDict | 96
4 | static_variable_selection | VariableSelectionNetwork | 1.7 K
5 | encoder_variable_selection | VariableSelectionNetwork | 1.8 K
6 | decoder_variable_selection | VariableSelectionNetwork | 1.2 K
7 | static_context_variable_selection | GatedResidualNetwork | 1.1 K
8 | static_context_initial_hidden_lstm | GatedResidualNetwork | 1.1 K
9 | static_context_initial_cell_lstm | GatedResidualNetwork | 1.1 K
10 | static_context_enrichment | GatedResidualNetwork | 1.1 K
11 | lstm_encoder | LSTM | 2.2 K
12 | lstm_decoder | LSTM | 2.2 K
13 | post_lstm_gate_encoder | GatedLinearUnit | 544
14 | post_lstm_add_norm_encoder | AddNorm | 32
15 | static_enrichment | GatedResidualNetwork | 1.4 K
16 | multihead_attn | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm | GateAddNorm | 576
18 | pos_wise_ff | GatedResidualNetwork | 1.1 K
19 | pre_output_gate_norm | GateAddNorm | 576
20 | output_layer | Linear | 119
----------------------------------------------------------------------------------------
18.4 K Trainable params
0 Non-trainable params
18.4 K Total params
0.074 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=200` reached.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 2.3min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
----------------------------------------------------------------------------------------
0 | loss | QuantileLoss | 0
1 | logging_metrics | ModuleList | 0
2 | input_embeddings | MultiEmbedding | 0
3 | prescalers | ModuleDict | 96
4 | static_variable_selection | VariableSelectionNetwork | 1.7 K
5 | encoder_variable_selection | VariableSelectionNetwork | 1.8 K
6 | decoder_variable_selection | VariableSelectionNetwork | 1.2 K
7 | static_context_variable_selection | GatedResidualNetwork | 1.1 K
8 | static_context_initial_hidden_lstm | GatedResidualNetwork | 1.1 K
9 | static_context_initial_cell_lstm | GatedResidualNetwork | 1.1 K
10 | static_context_enrichment | GatedResidualNetwork | 1.1 K
11 | lstm_encoder | LSTM | 2.2 K
12 | lstm_decoder | LSTM | 2.2 K
13 | post_lstm_gate_encoder | GatedLinearUnit | 544
14 | post_lstm_add_norm_encoder | AddNorm | 32
15 | static_enrichment | GatedResidualNetwork | 1.4 K
16 | multihead_attn | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm | GateAddNorm | 576
18 | pos_wise_ff | GatedResidualNetwork | 1.1 K
19 | pre_output_gate_norm | GateAddNorm | 576
20 | output_layer | Linear | 119
----------------------------------------------------------------------------------------
18.4 K Trainable params
0 Non-trainable params
18.4 K Total params
0.074 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=200` reached.
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 4.7min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[ ]:
metrics_tft
 | segment | SMAPE | MAPE | MAE | fold_number |
---|---|---|---|---|---|
1 | segment_a | 40.105186 | 33.025277 | 179.474618 | 0 |
1 | segment_a | 35.062597 | 29.437708 | 159.125462 | 1 |
1 | segment_a | 9.914503 | 9.435532 | 51.782240 | 2 |
2 | segment_b | 33.376703 | 41.339931 | 98.667385 | 0 |
2 | segment_b | 39.817141 | 50.942255 | 120.017343 | 1 |
2 | segment_b | 3.448081 | 3.374089 | 8.339597 | 2 |
0 | segment_c | 68.130415 | 104.336163 | 177.381147 | 0 |
0 | segment_c | 68.657526 | 105.778584 | 186.160152 | 1 |
0 | segment_c | 10.907864 | 10.253288 | 19.409007 | 2 |
3 | segment_d | 82.687114 | 58.152047 | 507.902906 | 0 |
3 | segment_d | 74.428508 | 53.941775 | 443.268306 | 1 |
3 | segment_d | 19.083581 | 17.603466 | 152.972674 | 2 |
[ ]:
score = metrics_tft["SMAPE"].mean()
print(f"Average SMAPE for TFT: {score:.3f}")
Average SMAPE for TFT: 40.468
Dataset Builder¶
[ ]:
set_seed()
transform_date = DateFlagsTransform(day_number_in_week=True, day_number_in_month=False, out_column="dateflag")
num_lags = 10
transform_lag = LagTransform(
in_column="target",
lags=[HORIZON + i for i in range(num_lags)],
out_column="target_lag",
)
lag_columns = [f"target_lag_{HORIZON+i}" for i in range(num_lags)]
dataset_builder_tft = PytorchForecastingDatasetBuilder(
max_encoder_length=HORIZON,
max_prediction_length=HORIZON,
time_varying_known_reals=["time_idx"],
time_varying_unknown_reals=["target"],
time_varying_known_categoricals=["dateflag_day_number_in_week"],
static_categoricals=["segment"],
target_normalizer=GroupNormalizer(groups=["segment"]),
)
[ ]:
model_tft = TFTModel(
dataset_builder=dataset_builder_tft,
trainer_params=dict(max_epochs=200, gpus=0, gradient_clip_val=0.1),
lr=0.01,
train_batch_size=64,
)
pipeline_tft = Pipeline(
model=model_tft,
horizon=HORIZON,
transforms=[transform_lag, transform_date],
)
[ ]:
metrics_tft, forecast_tft, fold_info_tft = pipeline_tft.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
----------------------------------------------------------------------------------------
0 | loss | QuantileLoss | 0
1 | logging_metrics | ModuleList | 0
2 | input_embeddings | MultiEmbedding | 47
3 | prescalers | ModuleDict | 96
4 | static_variable_selection | VariableSelectionNetwork | 1.8 K
5 | encoder_variable_selection | VariableSelectionNetwork | 1.9 K
6 | decoder_variable_selection | VariableSelectionNetwork | 1.3 K
7 | static_context_variable_selection | GatedResidualNetwork | 1.1 K
8 | static_context_initial_hidden_lstm | GatedResidualNetwork | 1.1 K
9 | static_context_initial_cell_lstm | GatedResidualNetwork | 1.1 K
10 | static_context_enrichment | GatedResidualNetwork | 1.1 K
11 | lstm_encoder | LSTM | 2.2 K
12 | lstm_decoder | LSTM | 2.2 K
13 | post_lstm_gate_encoder | GatedLinearUnit | 544
14 | post_lstm_add_norm_encoder | AddNorm | 32
15 | static_enrichment | GatedResidualNetwork | 1.4 K
16 | multihead_attn | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm | GateAddNorm | 576
18 | pos_wise_ff | GatedResidualNetwork | 1.1 K
19 | pre_output_gate_norm | GateAddNorm | 576
20 | output_layer | Linear | 119
----------------------------------------------------------------------------------------
18.9 K Trainable params
0 Non-trainable params
18.9 K Total params
0.075 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=200` reached.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 2.7min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
----------------------------------------------------------------------------------------
0 | loss | QuantileLoss | 0
1 | logging_metrics | ModuleList | 0
2 | input_embeddings | MultiEmbedding | 47
3 | prescalers | ModuleDict | 96
4 | static_variable_selection | VariableSelectionNetwork | 1.8 K
5 | encoder_variable_selection | VariableSelectionNetwork | 1.9 K
6 | decoder_variable_selection | VariableSelectionNetwork | 1.3 K
7 | static_context_variable_selection | GatedResidualNetwork | 1.1 K
8 | static_context_initial_hidden_lstm | GatedResidualNetwork | 1.1 K
9 | static_context_initial_cell_lstm | GatedResidualNetwork | 1.1 K
10 | static_context_enrichment | GatedResidualNetwork | 1.1 K
11 | lstm_encoder | LSTM | 2.2 K
12 | lstm_decoder | LSTM | 2.2 K
13 | post_lstm_gate_encoder | GatedLinearUnit | 544
14 | post_lstm_add_norm_encoder | AddNorm | 32
15 | static_enrichment | GatedResidualNetwork | 1.4 K
16 | multihead_attn | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm | GateAddNorm | 576
18 | pos_wise_ff | GatedResidualNetwork | 1.1 K
19 | pre_output_gate_norm | GateAddNorm | 576
20 | output_layer | Linear | 119
----------------------------------------------------------------------------------------
18.9 K Trainable params
0 Non-trainable params
18.9 K Total params
0.075 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=200` reached.
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 5.5min remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
----------------------------------------------------------------------------------------
0 | loss | QuantileLoss | 0
1 | logging_metrics | ModuleList | 0
2 | input_embeddings | MultiEmbedding | 47
3 | prescalers | ModuleDict | 96
4 | static_variable_selection | VariableSelectionNetwork | 1.8 K
5 | encoder_variable_selection | VariableSelectionNetwork | 1.9 K
6 | decoder_variable_selection | VariableSelectionNetwork | 1.3 K
7 | static_context_variable_selection | GatedResidualNetwork | 1.1 K
8 | static_context_initial_hidden_lstm | GatedResidualNetwork | 1.1 K
9 | static_context_initial_cell_lstm | GatedResidualNetwork | 1.1 K
10 | static_context_enrichment | GatedResidualNetwork | 1.1 K
11 | lstm_encoder | LSTM | 2.2 K
12 | lstm_decoder | LSTM | 2.2 K
13 | post_lstm_gate_encoder | GatedLinearUnit | 544
14 | post_lstm_add_norm_encoder | AddNorm | 32
15 | static_enrichment | GatedResidualNetwork | 1.4 K
16 | multihead_attn | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm | GateAddNorm | 576
18 | pos_wise_ff | GatedResidualNetwork | 1.1 K
19 | pre_output_gate_norm | GateAddNorm | 576
20 | output_layer | Linear | 119
----------------------------------------------------------------------------------------
18.9 K Trainable params
0 Non-trainable params
18.9 K Total params
0.075 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=200` reached.
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 8.2min remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 8.2min finished
[ ]:
metrics_tft
 | segment | SMAPE | MAPE | MAE | fold_number |
---|---|---|---|---|---|
3 | segment_a | 4.909472 | 4.770604 | 26.157928 | 0 |
3 | segment_a | 10.161843 | 9.648282 | 51.786006 | 1 |
3 | segment_a | 7.775577 | 7.438396 | 41.532362 | 2 |
2 | segment_b | 6.654099 | 6.395358 | 16.623533 | 0 |
2 | segment_b | 7.753074 | 7.229456 | 19.099444 | 1 |
2 | segment_b | 3.309338 | 3.356554 | 8.014324 | 2 |
1 | segment_c | 5.110395 | 4.988732 | 8.812023 | 0 |
1 | segment_c | 5.470811 | 5.283700 | 9.889158 | 1 |
1 | segment_c | 3.135104 | 3.050495 | 5.756559 | 2 |
0 | segment_d | 7.421877 | 7.121675 | 64.413321 | 0 |
0 | segment_d | 6.992388 | 6.570917 | 54.510236 | 1 |
0 | segment_d | 2.502250 | 2.448497 | 19.110604 | 2 |
[ ]:
score = metrics_tft["SMAPE"].mean()
print(f"Average SMAPE for TFT: {score:.3f}")
Average SMAPE for TFT: 5.933
[ ]:
plot_backtest(forecast_tft, ts, history_len=20)
3.3 Simple model¶
For comparison, let’s train a much simpler model.
[ ]:
model_sma = SeasonalMovingAverageModel(window=5, seasonality=7)
linear_trend_transform = LinearTrendTransform(in_column="target")
pipeline_sma = Pipeline(model=model_sma, horizon=HORIZON, transforms=[linear_trend_transform])
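For intuition, here is a minimal plain-Python sketch of a seasonal moving average: each forecast step is the mean of the last `window` values observed at the same seasonal position. The function below is an illustrative assumption about the technique, not the implementation of SeasonalMovingAverageModel, and it omits the trend handling done by LinearTrendTransform.

```python
def seasonal_moving_average(history, window, seasonality, horizon):
    """Forecast `horizon` steps; needs at least window * seasonality past points."""
    values = list(history)
    for _ in range(horizon):
        t = len(values)
        # Values at the same seasonal position in the previous `window` periods.
        same_season = [values[t - i * seasonality] for i in range(1, window + 1)]
        values.append(sum(same_season) / window)
    return values[len(history):]


weekly_pattern = [10, 20, 30, 40, 50, 60, 70]  # hypothetical values
history = weekly_pattern * 5  # five identical weeks
forecast = seasonal_moving_average(history, window=5, seasonality=7, horizon=7)
print(forecast)
```

On a perfectly periodic series such as this one, the forecast simply repeats the weekly pattern.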
[ ]:
metrics_sma, forecast_sma, fold_info_sma = pipeline_sma.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.2s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.2s finished
[ ]:
metrics_sma
 | segment | SMAPE | MAPE | MAE | fold_number |
---|---|---|---|---|---|
3 | segment_a | 6.343943 | 6.124296 | 33.196532 | 0 |
3 | segment_a | 5.346946 | 5.192455 | 27.938101 | 1 |
3 | segment_a | 7.510347 | 7.189999 | 40.028565 | 2 |
2 | segment_b | 7.178822 | 6.920176 | 17.818102 | 0 |
2 | segment_b | 5.672504 | 5.554555 | 13.719200 | 1 |
2 | segment_b | 3.327846 | 3.359712 | 7.680919 | 2 |
1 | segment_c | 6.430429 | 6.200580 | 10.877718 | 0 |
1 | segment_c | 5.947090 | 5.727531 | 10.701336 | 1 |
1 | segment_c | 6.186545 | 5.943679 | 11.359563 | 2 |
0 | segment_d | 4.707899 | 4.644170 | 39.918646 | 0 |
0 | segment_d | 5.403426 | 5.600978 | 43.047332 | 1 |
0 | segment_d | 2.505279 | 2.543719 | 19.347565 | 2 |
[ ]:
score = metrics_sma["SMAPE"].mean()
print(f"Average SMAPE for Seasonal MA: {score:.3f}")
Average SMAPE for Seasonal MA: 5.547
[ ]:
plot_backtest(forecast_sma, ts, history_len=20)
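As we can see, the best neural network (DeepAR with extra features, average SMAPE 5.448) is only slightly better than this simple model (5.547) in this particular case, while TFT with extra features (5.933) and both default configurations score worse.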
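The comparison can be checked by ranking the average SMAPE values printed in the cells above. The snippet only restates those reported numbers; the dictionary labels are chosen here for readability.

```python
# Average SMAPE values as printed by the cells above (lower is better).
scores = {
    "DeepAR (default)": 6.056,
    "DeepAR (dataset builder)": 5.448,
    "TFT (default)": 40.468,
    "TFT (dataset builder)": 5.933,
    "Seasonal MA": 5.547,
}

ranking = sorted(scores.items(), key=lambda item: item[1])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```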
4. Etna native deep models¶
There is no need for PytorchForecastingTransform now.
RNNModel¶
We’ll use an RNN model based on the LSTM cell.
[ ]:
from etna.models.nn import RNNModel
from etna.transforms import StandardScalerTransform
[ ]:
model_rnn = RNNModel(
decoder_length=HORIZON,
encoder_length=2 * HORIZON,
input_size=11,
trainer_params=dict(max_epochs=5),
lr=1e-3,
)
pipeline_rnn = Pipeline(
model=model_rnn,
horizon=HORIZON,
transforms=[StandardScalerTransform(in_column="target"), transform_lag],
)
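The value input_size=11 in the cell above matches the number of columns fed to the network, under the assumption that input_size counts the target plus every extra feature column. A small sketch of that count, reusing the lag settings defined earlier in the notebook:

```python
HORIZON = 7
num_lags = 10

# Assumption: the network input is the target plus one column per lag feature.
feature_columns = ["target"] + [f"target_lag_{HORIZON + i}" for i in range(num_lags)]
input_size = len(feature_columns)
print(input_size)
```

If transforms that add columns are added to or removed from the pipeline, input_size would presumably need to change accordingly.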
[ ]:
metrics_rnn, forecast_rnn, fold_info_rnn = pipeline_rnn.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
---------------------------------------
0 | loss | MSELoss | 0
1 | rnn | LSTM | 4.0 K
2 | projection | Linear | 17
---------------------------------------
4.0 K Trainable params
0 Non-trainable params
4.0 K Total params
0.016 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 3.3s remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
---------------------------------------
0 | loss | MSELoss | 0
1 | rnn | LSTM | 4.0 K
2 | projection | Linear | 17
---------------------------------------
4.0 K Trainable params
0 Non-trainable params
4.0 K Total params
0.016 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 6.5s remaining: 0.0s
GPU available: True (mps), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
---------------------------------------
0 | loss | MSELoss | 0
1 | rnn | LSTM | 4.0 K
2 | projection | Linear | 17
---------------------------------------
4.0 K Trainable params
0 Non-trainable params
4.0 K Total params
0.016 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 9.9s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 9.9s finished
[ ]:
score = metrics_rnn["SMAPE"].mean()
print(f"Average SMAPE for LSTM: {score:.3f}")
Average SMAPE for LSTM: 6.402
[ ]:
plot_backtest(forecast_rnn, ts, history_len=20)
Deep State Model¶
Deep State Model works well with multiple similar time series: it infers shared patterns from them.
We have to determine the type of seasonality in the data (based on data granularity); the SeasonalitySSM class is responsible for this. In this example we have daily data, so we use day-of-week (7 seasons) and day-of-month (31 seasons) models. We also set the trend component using the LevelTrendSSM class. In addition, the model uses time-based features such as day-of-week and day-of-month, and a time-independent feature representing the segment of the time series.
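The seasonal components need a mapping from each timestamp to a season index in the range 0 to num_seasons - 1. The sketch below shows such mappings for day-of-week and day-of-month using the standard library date type; the function names are hypothetical and stand in for whatever callable is passed as the timestamp transform.

```python
from datetime import date, timedelta


def weekly_index(day):
    """Season index for 7 weekly seasons: 0 = Monday, ..., 6 = Sunday."""
    return day.weekday()


def monthly_index(day):
    """Season index for 31 monthly seasons: day of month shifted to start at 0."""
    return day.day - 1


start = date(2019, 1, 1)  # first timestamp of the example dataset
for offset in range(3):
    day = start + timedelta(days=offset)
    print(day, weekly_index(day), monthly_index(day))
```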
[16]:
from etna.models.nn import DeepStateModel
from etna.models.nn.deepstate import CompositeSSM, SeasonalitySSM, LevelTrendSSM
from etna.transforms import StandardScalerTransform, DateFlagsTransform, SegmentEncoderTransform
[17]:
HORIZON = 7
metrics = [SMAPE(), MAPE(), MAE()]
[18]:
transforms = [
SegmentEncoderTransform(),
StandardScalerTransform(in_column="target"),
DateFlagsTransform(
day_number_in_week=True,
day_number_in_month=True,
week_number_in_month=False,
week_number_in_year=False,
month_number_in_year=False,
year_number=False,
is_weekend=False,
out_column="df",
),
]
[19]:
monthly_smm = SeasonalitySSM(num_seasons=31, timestamp_transform=lambda x: x.day - 1)
weekly_smm = SeasonalitySSM(num_seasons=7, timestamp_transform=lambda x: x.weekday())
[20]:
model_dsm = DeepStateModel(
ssm=CompositeSSM(seasonal_ssms=[weekly_smm, monthly_smm], nonseasonal_ssm=LevelTrendSSM()),
decoder_length=HORIZON,
encoder_length=2 * HORIZON,
input_size=3,
trainer_params=dict(max_epochs=5),
lr=1e-3,
)
pipeline_dsm = Pipeline(
model=model_dsm,
horizon=HORIZON,
transforms=transforms,
)
[21]:
metrics_dsm, forecast_dsm, fold_info_dsm = pipeline_dsm.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------
0 | RNN | LSTM | 7.2 K
1 | projectors | ModuleDict | 5.0 K
------------------------------------------
12.2 K Trainable params
0 Non-trainable params
12.2 K Total params
0.049 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 12.3s remaining: 0.0s
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------
0 | RNN | LSTM | 7.2 K
1 | projectors | ModuleDict | 5.0 K
------------------------------------------
12.2 K Trainable params
0 Non-trainable params
12.2 K Total params
0.049 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 22.7s remaining: 0.0s
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
------------------------------------------
0 | RNN | LSTM | 7.2 K
1 | projectors | ModuleDict | 5.0 K
------------------------------------------
12.2 K Trainable params
0 Non-trainable params
12.2 K Total params
0.049 Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=5` reached.
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 33.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 33.1s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.2s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.3s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 1 out of 1 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 2 out of 2 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=1)]: Done 3 out of 3 | elapsed: 0.1s finished
[22]:
score = metrics_dsm["SMAPE"].mean()
print(f"Average SMAPE for DeepStateModel: {score:.3f}")
Average SMAPE for DeepStateModel: 6.857
[23]:
plot_backtest(forecast_dsm, ts, history_len=20)