Why sktime is a Game-Changer for Time-Series Modeling
Time-series data—whether it’s stock prices, sensor readings, or website traffic—requires specialized tools that understand temporal dependencies. sktime fills that gap by offering a unified interface for classical and modern algorithms, seamless pipelines, and native support for forecasting, classification, and regression tasks.
In this guide, we’ll walk through the entire workflow: data preparation, model selection, hyper‑parameter tuning, and evaluation. By the end, you’ll have a production‑ready pipeline you can adapt to any domain.
Setting Up Your Environment
Before diving into code, make sure you have the following packages installed:
- Python 3.9 or newer
- sktime (latest stable release)
- pandas, numpy, scikit-learn
- pmdarima (needed by sktime's AutoARIMA)
- matplotlib or seaborn for visualisation
Install everything with a single command:
pip install sktime pandas numpy scikit-learn pmdarima matplotlib seaborn
Once installed, import the core modules:
import pandas as pd
import numpy as np
from sktime.datasets import load_airline
from sktime.split import temporal_train_test_split  # older sktime releases: sktime.forecasting.model_selection
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.arima import AutoARIMA
from sktime.performance_metrics.forecasting import mean_absolute_error
Loading and Preparing Data
sktime ships with several benchmark datasets. For illustration, we’ll use the classic Airline dataset, which records monthly passenger numbers from 1949 to 1960.
# Load data
y = load_airline()
# Visualise the series
y.plot(figsize=(10, 4), title='Monthly Airline Passengers')
The series is already a pd.Series with a PeriodIndex, which is ideal for time‑aware operations. If your data comes from a CSV, convert the datetime column to a period index:
df = pd.read_csv('sales.csv', parse_dates=['date'])
df['date'] = df['date'].dt.to_period('M')
series = df.set_index('date')['sales']
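You can sanity-check this conversion without a real file. The sketch below replaces the hypothetical sales.csv with an in-memory CSV of made-up values. It also adds an extra step that is not part of the original workflow: reindexing onto a complete monthly range, so missing months show up as NaN before you fit a forecaster.

```python
import io

import pandas as pd

# In-memory stand-in for the hypothetical 'sales.csv' (made-up numbers; March is missing)
csv_text = """date,sales
2023-01-31,120
2023-02-28,135
2023-04-30,150
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df["date"] = df["date"].dt.to_period("M")  # month-end timestamps -> monthly periods
series = df.set_index("date")["sales"].sort_index()

# Reindex onto a gap-free monthly range so missing months become explicit NaNs
full_range = pd.period_range(series.index.min(), series.index.max(), freq="M")
series = series.reindex(full_range)

print(series)
```

The NaN for 2023-03 is your cue to impute or investigate before modelling.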
Choosing the Right Model
sktime provides three main families of models:
- Naïve baselines (e.g., last value, seasonal naive)
- Statistical models such as ARIMA and exponential smoothing
- Machine-learning regressors adapted to forecasting through reduction (e.g., scikit-learn's RandomForestRegressor, XGBoost)
Start with a baseline to set a performance floor:
# Split data
y_train, y_test = temporal_train_test_split(y, test_size=12)
# Naïve forecaster (last observed value)
naive = NaiveForecaster(strategy='last')
naive.fit(y_train)
fh = np.arange(1, len(y_test) + 1) # forecasting horizon
y_pred = naive.predict(fh)
mae = mean_absolute_error(y_test, y_pred)
print(f'Naïve MAE: {mae:.2f}')
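To see what these baselines compute, here is a hand-rolled NumPy sketch (not sktime's code) of the "last value" and seasonal naive rules, scored with MAE on a toy series. In sktime the two correspond to NaiveForecaster(strategy='last') and NaiveForecaster(strategy='last', sp=12) for monthly data. The toy series uses a season length of 4 to keep the numbers short.

```python
import numpy as np


def naive_last(y_train, horizon):
    """Repeat the last observed value for every step of the horizon."""
    return np.repeat(y_train[-1], horizon)


def seasonal_naive(y_train, horizon, sp):
    """Repeat the last full season (sp observations) cyclically."""
    last_season = y_train[-sp:]
    return np.array([last_season[h % sp] for h in range(horizon)])


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Toy series with a repeating 4-step pattern and a slight upward drift
y = np.array([10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42], dtype=float)
y_train, y_test = y[:8], y[8:]  # temporal split: the last 4 points are held out

print(mae(y_test, naive_last(y_train, 4)))            # 41 repeated -> large errors
print(mae(y_test, seasonal_naive(y_train, 4, sp=4)))  # last season repeated -> small errors
```

On seasonal data like the airline series, the seasonal naive rule is usually the harder baseline to beat.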
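With a baseline in place, you can experiment with more sophisticated models. AutoARIMA, sktime's interface to pmdarima's auto_arima, searches for the ARIMA order that minimizes an information criterion (AIC by default). The airline data has a yearly cycle, so pass sp=12 to enable seasonal terms:
arima = AutoARIMA(sp=12, stepwise=True, suppress_warnings=True)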
arima.fit(y_train)
y_pred_arima = arima.predict(fh)
mae_arima = mean_absolute_error(y_test, y_pred_arima)
print(f'AutoARIMA MAE: {mae_arima:.2f}')
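The information-criterion idea can be shown in miniature. This is not pmdarima's algorithm. The sketch fits only pure autoregressive AR(p) models by ordinary least squares on simulated AR(2) data and keeps the order with the lowest Gaussian AIC. AutoARIMA also searches differencing, moving-average, and seasonal terms.

```python
import numpy as np


def ar_design(y, p, max_p):
    """Lagged design matrix for AR(p); every order uses the same sample starting at max_p."""
    n = len(y)
    cols = [np.ones(n - max_p)]  # intercept
    for lag in range(1, p + 1):
        cols.append(y[max_p - lag : n - lag])
    return np.column_stack(cols), y[max_p:]


def ar_aic(y, p, max_p):
    """Gaussian AIC (up to an additive constant) of an OLS-fitted AR(p)."""
    X, target = ar_design(y, p, max_p)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    n_obs, k = X.shape
    return n_obs * np.log(rss / n_obs) + 2 * k  # fit term + complexity penalty


# Simulate an AR(2) process: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + noise
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

max_p = 6
aics = {p: ar_aic(y, p, max_p) for p in range(1, max_p + 1)}
best_p = min(aics, key=aics.get)
print(best_p, round(aics[best_p], 2))
```

Fitting every order on the same sample (starting at max_p) keeps the AIC values comparable across candidates.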
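For machine-learning approaches, use make_reduction. It turns a tabular scikit-learn regressor into a forecaster by training it on sliding windows of lagged values (window_length=12 uses the previous 12 months as features):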
from sktime.forecasting.compose import make_reduction
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=200, random_state=42)
rf_pipe = make_reduction(regressor, strategy='recursive', window_length=12)
rf_pipe.fit(y_train)
y_pred_rf = rf_pipe.predict(fh)
mae_rf = mean_absolute_error(y_test, y_pred_rf)
print(f'Random Forest MAE: {mae_rf:.2f}')
Compare MAE values to decide which model meets your accuracy requirements.
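To show what the recursive strategy in make_reduction does, here is a hand-rolled NumPy sketch. It swaps in a plain least-squares linear model for RandomForestRegressor so it runs without scikit-learn, and sktime's internals may differ in detail. The three steps carry over: build lag windows, fit a regressor, then predict one step at a time and feed each prediction back into the window.

```python
import numpy as np


def make_windows(y, window_length):
    """Turn a 1-D series into (lag-window, next-value) training pairs."""
    X = np.array([y[i : i + window_length] for i in range(len(y) - window_length)])
    target = y[window_length:]
    return X, target


def fit_linear(X, target):
    """Least-squares linear regressor with intercept (stand-in for RandomForestRegressor)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return coef


def recursive_forecast(y_train, coef, window_length, horizon):
    """Predict one step ahead, append the prediction, slide the window, repeat."""
    history = list(y_train[-window_length:])
    preds = []
    for _ in range(horizon):
        window = np.array(history[-window_length:])
        next_value = coef[0] + window @ coef[1:]
        preds.append(next_value)
        history.append(next_value)  # the prediction becomes an input for the next step
    return np.array(preds)


t = np.arange(30, dtype=float)
y = 2.0 * t + 1.0  # noiseless linear trend, so the forecast is easy to verify
window_length = 3
X, target = make_windows(y, window_length)
coef = fit_linear(X, target)
print(recursive_forecast(y, coef, window_length, horizon=5))
```

Because predictions are fed back, errors can compound over long horizons. This is a known trade-off of the recursive strategy.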
Hyper‑Parameter Tuning with sktime’s GridSearch
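Fine-tuning can close the gap between a good model and a great one. sktime's ForecastingGridSearchCV follows the interface of scikit-learn's GridSearchCV, but it scores candidates on temporal splits instead of shuffled folds. It therefore needs a time-series splitter for cv and an sktime metric for scoring.
from sktime.forecasting.model_selection import ForecastingGridSearchCV
from sktime.split import ExpandingWindowSplitter
from sktime.performance_metrics.forecasting import MeanAbsoluteError
from sklearn.ensemble import GradientBoostingRegressor
# make_reduction exposes the wrapped regressor as the parameter "estimator"
param_grid = {
'estimator__n_estimators': [100, 200],
'estimator__learning_rate': [0.01, 0.1],
'window_length': [6, 12]
}
gbr = GradientBoostingRegressor(random_state=42)
pipe = make_reduction(gbr, strategy='recursive')
# Expanding windows: start with 72 months, move the cutoff 12 months per fold, validate on the next 12
cv = ExpandingWindowSplitter(initial_window=72, step_length=12, fh=np.arange(1, 13))
search = ForecastingGridSearchCV(
forecaster=pipe,
cv=cv,
param_grid=param_grid,
scoring=MeanAbsoluteError()
)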
search.fit(y_train)
print('Best params:', search.best_params_)
y_pred_best = search.predict(fh)
print('Tuned MAE:', mean_absolute_error(y_test, y_pred_best))
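The search scores every parameter combination on each expanding-window fold. Each fold trains only on data before its cutoff and validates on the months that follow, so future observations never leak into training. With the default refit=True, the best combination is then refit on all of y_train.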
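The fold layout and the cost of a grid search can be sketched in plain Python and NumPy. The helper expanding_window_splits is hypothetical and mimics, rather than reproduces, sktime's ExpandingWindowSplitter. The numbers assume 132 training months, a 72-month initial window, 12-month steps, and a 12-month validation horizon.

```python
import itertools

import numpy as np


def expanding_window_splits(n_obs, initial_window, step_length, horizon):
    """Yield (train_idx, val_idx) pairs; training always ends before validation starts."""
    end = initial_window
    while end + horizon <= n_obs:
        yield np.arange(0, end), np.arange(end, end + horizon)
        end += step_length


splits = list(expanding_window_splits(n_obs=132, initial_window=72, step_length=12, horizon=12))
for train_idx, val_idx in splits:
    print(f"train 0..{train_idx[-1]:>3}  validate {val_idx[0]}..{val_idx[-1]}")

# Every combination is fitted once per fold (key names follow make_reduction's "estimator")
param_grid = {
    "estimator__n_estimators": [100, 200],
    "estimator__learning_rate": [0.01, 0.1],
    "window_length": [6, 12],
}
n_configs = len(list(itertools.product(*param_grid.values())))
print(f"{n_configs} configurations x {len(splits)} folds = {n_configs * len(splits)} fits")
```

The fit count grows multiplicatively with every grid dimension. Keep grids small, or switch to a randomized search when the number of combinations explodes.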
Evaluating and Visualising Results
Beyond MAE, consider percentage-based metrics such as MAPE or symmetric MAPE (sMAPE) for business-oriented interpretation. sktime's forecasting metrics all take the true and predicted series (y_true, y_pred).
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
print('SMAPE:', mean_absolute_percentage_error(y_test, y_pred_best, symmetric=True))
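For reference, here is a small NumPy sketch of how the three metrics are computed on two made-up points. sMAPE has several variants; this one divides each absolute error by the mean of |y| and |ŷ|. Check sktime's documentation for the exact definition it implements. These values are fractions, so multiply by 100 for percentages.

```python
import numpy as np

y_true = np.array([100.0, 200.0])
y_pred = np.array([110.0, 180.0])

abs_err = np.abs(y_true - y_pred)
mae = abs_err.mean()                                                  # mean |y - y_hat|
mape = np.mean(abs_err / np.abs(y_true))                              # relative to the actuals
smape = np.mean(2 * abs_err / (np.abs(y_true) + np.abs(y_pred)))      # relative to mean of |y| and |y_hat|

print(mae, mape, round(smape, 4))
```

MAPE is undefined when an actual value is zero. The symmetric form softens that problem but does not remove it.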
Visual comparison helps stakeholders quickly grasp model performance:
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 5))
y_test.plot(label='Actual')
y_pred_best.plot(label='Forecast')
plt.title('Actual vs Forecast')
plt.legend()
plt.show()
Use the plot together with the MAE values: if tuning paid off, the Gradient Boosting forecast should follow the seasonal peaks more closely than the naive baseline's flat forecast.
Deploying a sktime Forecasting Pipeline
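When you're ready for production, serialize the fitted forecaster with joblib or pickle. The reduction forecaster stores its own lag-window logic, so you only need to load the object and call predict with a forecasting horizon. Integer horizons count from the forecaster's cutoff (its last training observation), so refit the best configuration on the full series before saving if you want forecasts beyond the test period. When new observations arrive, model.update(new_y) moves the cutoff forward.
import joblib
# Refit the best configuration on the full history so the cutoff is the last observed month
final_model = search.best_forecaster_.clone().fit(y)
joblib.dump(final_model, 'sktime_forecast.pkl')
# In a production script
model = joblib.load('sktime_forecast.pkl')
future_fh = np.arange(1, 13) # next 12 periods after the last observation
future_pred = model.predict(future_fh)
print(future_pred)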
Wrap this logic in a Flask or FastAPI endpoint for real‑time forecasting, or schedule a batch job with Airflow for daily updates.
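Two properties make this deployment pattern work: the pickled object carries its fitted state, and integer horizons are relative to the cutoff. The toy class below demonstrates both. SeasonalNaiveModel is hypothetical and only stands in for a fitted sktime forecaster. sktime's real update() can also refit parameters, which this sketch does not do.

```python
import pickle

import numpy as np


class SeasonalNaiveModel:
    """Hypothetical toy forecaster that keeps its fitted state on the object."""

    def __init__(self, sp):
        self.sp = sp

    def fit(self, y):
        self.history_ = np.asarray(y, dtype=float)
        return self

    def update(self, y_new):
        # Append new observations; the cutoff moves, so fh=1 now means the next unseen step
        self.history_ = np.concatenate([self.history_, np.asarray(y_new, dtype=float)])
        return self

    def predict(self, fh):
        # Each relative step h reuses the value from the same position in the last season
        last_season = self.history_[-self.sp:]
        return np.array([last_season[(h - 1) % self.sp] for h in fh])


model = SeasonalNaiveModel(sp=4).fit([10, 20, 30, 40, 11, 21, 31, 41])
blob = pickle.dumps(model)      # in production: write to disk or object storage
restored = pickle.loads(blob)   # independent copy with the same fitted state

print(restored.predict([1, 2, 3, 4]))  # identical to model.predict
restored.update([12])                  # one new observation arrives
print(restored.predict([1, 2, 3, 4]))  # horizon now starts after the new point
```

In a scheduled batch job, loading the model, calling update with the latest data, and predicting avoids a full retrain on every run.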
Key Takeaways
- sktime unifies classical and machine‑learning time‑series models under a single API.
- Start with a naive baseline to establish a performance floor.
- Use AutoARIMA for quick statistical models and make_reduction for ML regressors.
- Leverage ForecastingGridSearchCV for robust hyper-parameter tuning without leakage.
- Serialize the full pipeline for seamless deployment.
With these steps, you can confidently build, tune, and ship time‑series models that scale across industries.
Next Steps & Call to Action
Ready to level up your forecasting projects? Download the full Jupyter notebook from our GitHub repository, experiment with your own datasets, and share your results in the comments. If you need a custom solution or consulting, get in touch—our data science team is happy to help you harness the power of sktime.