Time Series Forecasting using SVR
Now, let's build a time series forecasting model with Support Vector Regression. For this, we will use the Electric_Production dataset.
Step 1: Importing the necessary libraries
In this step, we will import the libraries required for our analysis: Pandas for data manipulation, NumPy for numerical operations, Matplotlib for visualization, SVR for Support Vector Regression, MinMaxScaler for feature scaling, and mean_squared_error for model evaluation.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
Step 2: Loading the dataset
In this part, we will load the Electric_Production dataset into a DataFrame, convert the date column to datetime format, and set it as the index of the DataFrame for time-series analysis.
# Load the dataset
df = pd.read_csv('Electric_Production.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df.set_index('DATE', inplace=True)
# Visualize the data
plt.figure(figsize=(15, 8))
plt.plot(df.index, df['IPG2211A2N'], label='Electric Production')
plt.xlabel('Date')
plt.ylabel('Electric Production')
plt.title('Electric Production Data')
plt.legend()
plt.show()
Output:
Step 3: Data Preprocessing
Create training and testing datasets
In this step, the dataset is split into a training set and a testing set. train_start_dt and test_start_dt hold the starting dates of the two sets: 1 Jan 1985 for the training set and 1 Jan 2005 for the testing set. Note that label-based slicing with .loc includes both endpoints, so the observation for 1 Jan 2005 appears both at the end of the training set and at the start of the testing set.
# Create training and testing datasets
train_start_dt = '1985-01-01'
test_start_dt = '2005-01-01'
train = df.loc[train_start_dt:test_start_dt]
test = df.loc[test_start_dt:]
train.head()
Output:
            IPG2211A2N
DATE
1985-01-01     72.5052
1985-02-01     70.6720
1985-03-01     62.4502
1985-04-01     57.4714
1985-05-01     55.3151
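The inclusive behavior of .loc label slices is easy to check on a small example. The following is a minimal sketch on a synthetic monthly index; the toy DataFrame, its column name value, and the strict-split variant are assumptions for illustration and are not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the real data (illustrative only)
idx = pd.date_range("2004-10-01", periods=6, freq="MS")
toy = pd.DataFrame({"value": np.arange(6.0)}, index=idx)

split = "2005-01-01"
toy_train = toy.loc[:split]   # .loc label slices include the end label
toy_test = toy.loc[split:]    # ...and the start label

# Rows that are present in both subsets
overlap = toy_train.index.intersection(toy_test.index)
print(len(toy_train), len(toy_test), list(overlap.strftime("%Y-%m-%d")))

# One possible way to avoid the overlap: keep only rows strictly before the split date
toy_train_strict = toy.loc[toy.index < pd.Timestamp(split)]
print(len(toy_train_strict))
```

A single shared row has little effect on a series of this length, but a strict split keeps the test period fully separate from the training period.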
Data Scaling
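The data is scaled using MinMaxScaler, which maps values to the range 0 to 1 by default. The scaler is fitted on the training set only and then applied to the testing set, so the scaling parameters are learned without looking at the test period.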
# Scale the data
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)
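To see what fitting on the training set alone implies, here is a small sketch with made-up numbers (the toy_ arrays are hypothetical, not values from the dataset). It shows that test values above the training maximum are mapped above 1, and that inverse_transform recovers the original units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-column data (illustrative values, not from the dataset)
toy_train = np.array([[50.0], [60.0], [100.0]])
toy_test = np.array([[75.0], [110.0]])

toy_scaler = MinMaxScaler()
toy_train_scaled = toy_scaler.fit_transform(toy_train)  # learns min=50, max=100
toy_test_scaled = toy_scaler.transform(toy_test)        # reuses the training min/max

print(toy_train_scaled.ravel())  # training values fall in [0, 1]
print(toy_test_scaled.ravel())   # 110 exceeds the training max, so it maps above 1

# inverse_transform maps scaled values back to the original units
restored = toy_scaler.inverse_transform(toy_test_scaled)
print(restored.ravel())
```

Scaled test values outside the 0 to 1 range are expected when the series trends upward over time; they are not a sign that the scaler was applied incorrectly.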
Prepare the data for training
Now, the training and testing datasets are transformed into sequences of data points with a fixed number of time steps, so that they can be used as input to the SVR model. TIME_STEPS is the number of consecutive past observations in each input sequence; the target is the observation that immediately follows the sequence.
The create_dataset function builds these input sequences and targets for both datasets. This sliding-window framing turns forecasting into an ordinary supervised regression problem, which is what allows a model such as SVR (or a recurrent neural network) to learn from past values.
# Prepare the data for training
def create_dataset(X, y, time_steps=1):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        v = X[i:i + time_steps]
        Xs.append(v)
        ys.append(y[i + time_steps])
    return np.array(Xs), np.array(ys)
TIME_STEPS = 5
X_train, y_train = create_dataset(train_scaled, train_scaled, TIME_STEPS)
X_test, y_test = create_dataset(test_scaled, test_scaled, TIME_STEPS)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
Output:
(236, 5, 1) (236, 1)
(152, 5, 1) (152, 1)
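The windowing is easier to follow on a tiny series. The sketch below uses a hypothetical helper named make_windows (a stand-in that mirrors the logic described above) on the integers 0 to 7, and also shows the flattening step that turns each window into one row of lag features for SVR.

```python
import numpy as np

def make_windows(series, time_steps):
    """Return (inputs, targets): each input holds `time_steps` consecutive
    values and its target is the value that immediately follows."""
    inputs, targets = [], []
    for i in range(len(series) - time_steps):
        inputs.append(series[i:i + time_steps])
        targets.append(series[i + time_steps])
    return np.array(inputs), np.array(targets)

toy_series = np.arange(8).reshape(-1, 1)  # column vector, like the scaled data
X_toy, y_toy = make_windows(toy_series, time_steps=3)

print(X_toy.shape, y_toy.shape)    # number of windows = len(series) - time_steps
print(X_toy[0].ravel(), y_toy[0])  # first window and the value that follows it

# SVR expects 2-D input, so each window is flattened into one row of lag features
X_flat = X_toy.reshape(X_toy.shape[0], -1)
print(X_flat.shape)
```

The same arithmetic explains the shapes printed above: a set with n rows yields n - TIME_STEPS windows.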
Step 4: Define the SVR model
An SVR model is initialized with specified hyperparameters such as kernel type, gamma, C, and epsilon.
- kernel='rbf' specifies the kernel function to be used. The radial basis function (RBF) kernel is commonly used in SVR.
- gamma=0.5 is the RBF kernel coefficient. It controls how far the influence of a single training example reaches: low values mean a far reach (a smoother fit), and high values mean a close reach (a more localized fit).
- C=10 is the regularization parameter. It trades off the penalty on training errors that fall outside the epsilon-tube against the flatness of the fitted function; larger values of C fit the training data more closely.
- epsilon=0.05 is the epsilon parameter in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function.
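The role of epsilon can be illustrated with the epsilon-insensitive loss, which is zero for residuals inside the tube and grows linearly outside it. The following is a minimal NumPy sketch of that loss; the function name epsilon_insensitive_loss and the sample values are assumptions for illustration and are not scikit-learn API.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.05):
    """Per-sample loss: zero inside the epsilon-tube, linear outside it."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([0.50, 0.50, 0.50])
y_pred = np.array([0.52, 0.60, 0.70])  # residuals of 0.02, 0.10 and 0.20

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.05)
print(losses)  # only the part of each residual beyond epsilon is penalized
```

In the SVR objective, C multiplies the sum of these per-sample losses, so a larger C makes errors outside the tube more costly relative to keeping the function flat.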
The SVR model is trained on the training dataset using the fit method. The SVR model learns from the training data to understand the patterns and relationships between input features and target variables. This process allows the model to make predictions based on new, unseen data. So, fitting the model is like teaching it how to make predictions by showing it examples from the training dataset.
# SVR model
model = SVR(kernel='rbf', gamma=0.5, C=10, epsilon=0.05)
# Fit the model: each window is flattened to one row, and ravel() gives the 1-D target that SVR expects
model.fit(X_train.reshape(X_train.shape[0], -1), y_train.ravel())
Step 5: Make predictions
The trained model is used to make predictions on both the training and testing datasets. Then, the predicted values are inverse scaled to obtain the original scale of the electric production data. The inverse_transform() function is used to inverse transform the scaled predictions and target variables back to their original scale.
# Make predictions
train_pred = model.predict(X_train.reshape(X_train.shape[0], -1))
test_pred = model.predict(X_test.reshape(X_test.shape[0], -1))
# Inverse scaling
train_pred_inv = scaler.inverse_transform(train_pred.reshape(-1, 1))
y_train_inv = scaler.inverse_transform(y_train.reshape(-1, 1))
test_pred_inv = scaler.inverse_transform(test_pred.reshape(-1, 1))
y_test_inv = scaler.inverse_transform(y_test.reshape(-1, 1))
Step 6: Model Evaluation
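The Mean Squared Error (MSE) is calculated with the mean_squared_error() function to evaluate the performance of the model on both the training and testing datasets. Because the predictions and targets have been inverse scaled, the error is expressed in the squared units of the original data.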
# Evaluate the model
mse_train = mean_squared_error(y_train_inv, train_pred_inv)
mse_test = mean_squared_error(y_test_inv, test_pred_inv)
print("Mean Squared Error on Training Data:", mse_train)
print("Mean Squared Error on Testing Data:", mse_test)
Output:
Mean Squared Error on Training Data: 7.289992919226583
Mean Squared Error on Testing Data: 21.345321073477777
In simple terms, the Mean Squared Error (MSE) tells us how much our predictions differ from the actual values. It’s like calculating the average of the squared differences between what we predicted and what actually happened.
In the output above:
- The MSE on the training data is 7.29, which means that, on average, the squared difference between our predictions and the actual values in the training data is 7.29.
- The MSE on the testing data is 21.35, which means that, on average, the squared difference between our predictions and the actual values in the testing data is 21.35.
Lower MSE values indicate that our model’s predictions are closer to the actual values, which is what we aim for. So, ideally, we want to minimize the MSE as much as possible.
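Since MSE is in squared units, taking its square root (the root mean squared error, RMSE) gives a figure in the same units as the electric production values, which can be easier to interpret. RMSE is an addition here rather than a metric used in the original walkthrough; the sketch simply applies a square root to the two MSE values reported above.

```python
import math

# MSE values reported in the output above
mse_train = 7.289992919226583
mse_test = 21.345321073477777

# RMSE is the square root of MSE, expressed in the original units of the data
rmse_train = math.sqrt(mse_train)
rmse_test = math.sqrt(mse_test)

print(f"RMSE (train): {rmse_train:.2f}")
print(f"RMSE (test):  {rmse_test:.2f}")
```

This puts the typical error at roughly 2.7 on the training data and 4.6 on the testing data, against series values that start in the 55 to 73 range.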
Step 7: Visualizing the prediction for train and test dataset
The original and predicted values for the training dataset are plotted to visualize how well the model fits the training data. This is the plot of the original and predicted electric production values for the training dataset over time. The x-axis represents the date, and the y-axis represents the electric production. The green line represents the original electric production values, while the red line represents the predicted values.
# Plotting the results for train dataset
plt.figure(figsize=(15, 8))
plt.plot(df.index[TIME_STEPS:TIME_STEPS+len(train_pred_inv)], y_train_inv, label='Original', color='darkgreen')
plt.plot(df.index[TIME_STEPS:TIME_STEPS+len(train_pred_inv)], train_pred_inv, label='Predicted', color='red')
plt.xlabel('Date')
plt.ylabel('Electric Production')
plt.title('Electric Production Forecasting with SVR - Training Dataset')
plt.legend()
plt.show()
Output:
Similarly, the original and predicted values for the testing dataset are plotted to assess how well the model performs on data it was not trained on.
# Plotting the results for test dataset
plt.figure(figsize=(15, 8))
# The first test target is the value TIME_STEPS rows into the test set
plt.plot(test.index[TIME_STEPS:], y_test_inv, label='Original', color='darkgreen')
plt.plot(test.index[TIME_STEPS:], test_pred_inv, label='Predicted', color='red')
plt.xlabel('Date')
plt.ylabel('Electric Production')
plt.title('Electric Production Forecasting with SVR - Test Dataset')
plt.legend()
plt.show()
Output:
Step 8: Plotting Forecast Graph
To plot the forecast graph, we follow the steps below:
- Set the number of future time steps that we want to forecast.
- Run a forecasting loop: predict the next value from the last sequence using the SVR model, append the predicted value to future_forecast, then update the last sequence by shifting it one position and replacing its last element with the predicted value.
- Perform inverse scaling.
- Generate the future timestamps, then plot the graph.
# Number of future time steps to forecast
future_steps = 10 # Adjust this as needed
# Last `TIME_STEPS` observed values from the test set to start forecasting
# (X_test[-1] would stop one step short, because its target is the final observation)
last_sequence = test_scaled[-TIME_STEPS:].copy()
# Forecast future values
future_forecast = []
for _ in range(future_steps):
    # Predict next value based on the last sequence
    next_pred = model.predict(last_sequence.reshape(1, -1))
    future_forecast.append(next_pred[0])
    # Update the last sequence by removing the first element and adding the predicted value
    last_sequence = np.roll(last_sequence, -1)
    last_sequence[-1] = next_pred
# Inverse scaling for future forecast
future_forecast_inv = scaler.inverse_transform(np.array(future_forecast).reshape(-1, 1))
# Generate future timestamps
last_date = df.index[-1]
# 'MS' (month start) matches the first-of-month dates in the dataset index
future_dates = pd.date_range(start=last_date, periods=future_steps+1, freq='MS')[1:]
# Plotting the future forecast
plt.figure(figsize=(15, 8))
plt.plot(df.index, df['IPG2211A2N'], label='Historical Data', color='darkgreen')
plt.plot(future_dates, future_forecast_inv, label='Future Forecast', color='blue')
plt.xlabel('Date')
plt.ylabel('Electric Production')
plt.title('Electric Production Future Forecast with SVR')
plt.legend()
plt.show()
Output:
The output is a plot showing the historical electric production data overlaid with the forecasted future values. The x-axis represents the date, and the y-axis represents the electric production. The historical data is displayed as a solid line in dark green, while the future forecast is represented by points connected with a line in blue.
Time Series Forecasting with Support Vector Regression
Time series forecasting is a critical aspect of data analysis, with applications ranging from financial markets to weather prediction. In recent years, Support Vector Regression (SVR) has emerged as a powerful tool for time series forecasting due to its ability to handle nonlinear relationships and high-dimensional data. In this project, we'll delve into time series forecasting using SVR, focusing specifically on forecasting electric production for the next 10 months.