Step-by-step implementation of Multivariate Forecast
Importing required modules
First, we import the required Python modules: Pandas, NumPy, Matplotlib, TensorFlow and scikit-learn.
Python3
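import datetime
import sklearn
# Import the metrics submodule explicitly; it is used in the evaluation step
import sklearn.metrics
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import KernelPCA
import numpy as np
import pandas as pd
import math
import tensorflow as tf
import matplotlib.pyplot as plt

# Fix the TensorFlow random seed for reproducibility
tf.random.set_seed(99)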
Dataset loading
Next, we load a time-series dataset of stock prices and technical indicators from a CSV file hosted on GitHub.
Python3
# Dataset loading
url = 'https://raw.githubusercontent.com/SusmitSekharBhakta/Stock-market-price-prediction/main/final_data_adj.csv'
dataFrame = pd.read_csv(url)
print(dataFrame.head())
Output:
Date Open High Low Close \
0 2017-08-28 9907.150391 9925.750000 9882.000000 9912.799805
1 2017-08-29 9886.400391 9887.349609 9783.750000 9796.049805
2 2017-08-30 9859.500000 9909.450195 9850.799805 9884.400391
3 2017-08-31 9905.700195 9925.099609 9856.950195 9917.900391
4 2017-09-01 9937.650391 9983.450195 9909.849609 9974.400391
Adj Close Volume RSI MACD MACDsig MACDhist \
0 9912.799805 159600.0 55.406997 28.647258 28.317577 0.515867
1 9796.049805 173300.0 55.406997 28.647258 28.317577 0.515867
2 9884.400391 157800.0 55.406997 28.647258 28.317577 0.515867
3 9917.900391 327700.0 55.406997 28.647258 28.317577 0.515867
4 9974.400391 157800.0 55.406997 28.647258 28.317577 0.515867
SMA CCI Aroon Up Aroon Down Sadj
0 12759.905212 24.363507 0.0 0.0 NaN
1 12759.905212 24.363507 0.0 0.0 162.055635
2 12759.905212 24.363507 0.0 0.0 -22.453545
3 12759.905212 24.363507 0.0 0.0 -9.197608
4 12759.905212 24.363507 0.0 0.0 5259.919641
Data preprocessing
- The ‘Date’ column is dropped, and any missing values in the DataFrame are imputed using the mean of each column (the default strategy of SimpleImputer).
- The data is then normalized using Min-Max scaling so that all features fall within the range of 0 to 1. The ‘Open’ and ‘Close’ columns are additionally scaled with a separate target Min-Max scaler, which makes it possible to map predictions for these two columns back to the original price range.
- The resulting scaled DataFrame, named df_scaled, is then cast to a float data type, ensuring numerical consistency for subsequent operations.
- Because we are performing a multivariate forecast, we predict two variables of the dataset, ‘Open’ and ‘Close’. These two are among the most important quantities in stock prediction.
Python3
# Impute missing values with the column mean (SimpleImputer's default strategy)
imputer = SimpleImputer(missing_values=np.nan)
dataFrame.drop(columns=['Date'], inplace=True)
dataFrame = pd.DataFrame(imputer.fit_transform(dataFrame),
                         columns=dataFrame.columns)
dataFrame = dataFrame.reset_index(drop=True)

# Applying feature scaling
scaler = MinMaxScaler(feature_range=(0, 1))
df_scaled = scaler.fit_transform(dataFrame.to_numpy())
df_scaled = pd.DataFrame(df_scaled, columns=list(dataFrame.columns))

# Separate scaler for the two target columns
target_scaler = MinMaxScaler(feature_range=(0, 1))
df_scaled[['Open', 'Close']] = target_scaler.fit_transform(
    dataFrame[['Open', 'Close']].to_numpy())

df_scaled = df_scaled.astype(float)
print(df_scaled.head())
Output:
Open High Low Close Adj Close Volume RSI \
0 0.199868 0.178737 0.216833 0.211888 0.211888 0.088128 0.573174
1 0.197958 0.175103 0.207848 0.201145 0.201145 0.095693 0.573174
2 0.195483 0.177194 0.213980 0.209275 0.209275 0.087134 0.573174
3 0.199734 0.178675 0.214542 0.212358 0.212358 0.180950 0.573174
4 0.202674 0.184197 0.219380 0.217557 0.217557 0.087134 0.573174
MACD MACDsig MACDhist SMA CCI Aroon Up Aroon Down \
0 0.753954 0.738291 0.504494 0.441765 0.541449 0.0 0.0
1 0.753954 0.738291 0.504494 0.441765 0.541449 0.0 0.0
2 0.753954 0.738291 0.504494 0.441765 0.541449 0.0 0.0
3 0.753954 0.738291 0.504494 0.441765 0.541449 0.0 0.0
4 0.753954 0.738291 0.504494 0.441765 0.541449 0.0 0.0
Sadj
0 0.172391
1 0.167173
2 0.166825
3 0.166850
4 0.176766
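To make the scaling step concrete, the following sketch reproduces Min-Max scaling by hand with NumPy and shows how a scaled value can be mapped back to price units. The sample prices and the helper names (`min_max_fit`, `min_max_transform`, `min_max_inverse`) are illustrative assumptions and are not part of the original code; in the tutorial itself this role is played by `MinMaxScaler`.

```python
import numpy as np

def min_max_fit(values):
    """Return the per-column minimum and range (hypothetical helper)."""
    col_min = values.min(axis=0)
    col_range = values.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def min_max_transform(values, col_min, col_range):
    """Scale each column to the range [0, 1]."""
    return (values - col_min) / col_range

def min_max_inverse(scaled, col_min, col_range):
    """Map scaled values back to the original units."""
    return scaled * col_range + col_min

# Made-up 'Open' and 'Close' prices, for illustration only
prices = np.array([[100.0, 102.0],
                   [110.0, 108.0],
                   [120.0, 118.0]])

col_min, col_range = min_max_fit(prices)
scaled = min_max_transform(prices, col_min, col_range)
restored = min_max_inverse(scaled, col_min, col_range)

print(scaled)     # every column now spans 0 to 1
print(restored)   # identical to the original prices
```

With the objects defined in this tutorial, the counterpart of `min_max_inverse` would be `target_scaler.inverse_transform(...)`, applied to a two-column array of scaled ‘Open’ and ‘Close’ values.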
Dataset transformation
- A function named singleStepSampler is defined to prepare the dataset for single-step time-series forecasting. It takes two arguments: a dataframe df and a window size.
- Within this function, the xRes and yRes lists are initialized to store the input features and target values, respectively.
- Nested loops iterate over the dataframe rows, the positions within each window and the columns to create sequences of input features (xRes) and corresponding target values (yRes) based on the specified window size.
- The input features are constructed as a sequence of windowed data points, where each data point is a list containing the values from every column of the dataframe.
- The target values (the 'Open' and 'Close' columns) of the row immediately following each window are appended to yRes.
- Finally, the function returns xRes and yRes as NumPy arrays.
Python3
# Single step dataset preparation
def singleStepSampler(df, window):
    xRes = []
    yRes = []
    for i in range(0, len(df) - window):
        res = []
        for j in range(0, window):
            r = []
            for col in df.columns:
                r.append(df[col][i + j])
            res.append(r)
        xRes.append(res)
        yRes.append(df[['Open', 'Close']].iloc[i + window].values)
    return np.array(xRes), np.array(yRes)

(xVal, yVal) = singleStepSampler(df_scaled, 20)
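The element-by-element loops above are easy to follow but can be slow on long series. As a sketch, assuming the data has first been converted to a 2-D NumPy array (for example with `df_scaled.to_numpy()`), the same windows can be built by slicing whole blocks of rows. The function name `window_sampler` and the `target_idx` parameter are hypothetical and do not appear in the original code.

```python
import numpy as np

def window_sampler(values, window, target_idx):
    """Build (samples, window, features) inputs and next-step targets.

    values     : 2-D array of shape (n_rows, n_features)
    window     : number of past rows in each input sequence
    target_idx : column positions to predict (hypothetical parameter)
    """
    n_samples = len(values) - window
    # Each input is the block of `window` consecutive rows starting at row i
    x = np.stack([values[i:i + window] for i in range(n_samples)])
    # Each target is taken from the row immediately after its window
    y = values[window:, target_idx]
    return x, y

# Toy data: 30 rows, 4 features; columns 0 and 3 stand in for 'Open' and 'Close'
toy = np.arange(30 * 4, dtype=float).reshape(30, 4)
x_toy, y_toy = window_sampler(toy, window=20, target_idx=[0, 3])

print(x_toy.shape, y_toy.shape)  # (10, 20, 4) (10, 2)
```

For the tutorial's data, `target_idx` could be obtained with `[df_scaled.columns.get_loc(c) for c in ['Open', 'Close']]`.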
Data Splitting
- A constant SPLIT with a value of 0.85 is defined, specifying the proportion of data to be used for training.
- The singleStepSampler function is applied to the scaled dataframe df_scaled with a window size of 20, resulting in feature vectors xVal and target vectors yVal.
- These feature and target vectors are split into training and testing sets according to the SPLIT ratio, with the training set containing the first 85% of the samples and the testing set containing the remaining 15%.
Python3
# Dataset splitting
SPLIT = 0.85

X_train = xVal[:int(SPLIT * len(xVal))]
y_train = yVal[:int(SPLIT * len(yVal))]
X_test = xVal[int(SPLIT * len(xVal)):]
y_test = yVal[int(SPLIT * len(yVal)):]
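Because the samples are ordered in time, the split is a single cut at one index with no shuffling, so every test sample lies after the training samples. A minimal sketch of that idea as a reusable helper is given below; the name `chronological_split` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def chronological_split(x, y, train_fraction):
    """Split time-ordered samples at a single cut point, without shuffling."""
    cut = int(train_fraction * len(x))
    return x[:cut], x[cut:], y[:cut], y[cut:]

# Toy arrays standing in for xVal and yVal: 100 samples, window 20, 15 features
x_demo = np.zeros((100, 20, 15))
y_demo = np.arange(200, dtype=float).reshape(100, 2)

x_tr, x_te, y_tr, y_te = chronological_split(x_demo, y_demo, 0.85)

print(x_tr.shape, x_te.shape)
# The last training target precedes the first test target in time
print(y_tr[-1], y_te[0])
```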
Defining the model
- In this step, a multivariate Gated Recurrent Unit (GRU) neural network model is defined using TensorFlow’s Keras API. The call multivariate_gru = tf.keras.Sequential() initializes a sequential model, which is a linear stack of layers.
- It consists of a GRU layer with 200 units, taking input sequences whose shape is given by the window length and the number of features (columns) in the training data (X_train).
- A dropout layer with a rate of 0.5 is added to reduce overfitting, and the output layer is a dense layer with 2 units, representing the predicted values for the two target variables (‘Open’ and ‘Close’).
- The activation function for this output layer is set to linear. The model is compiled using mean squared error as the loss function, with MAE and MSE as metrics for further evaluation.
- The Adam optimizer is used for training.
- The summary() method prints a summary of the model architecture, including the layer configurations and the number of parameters.
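To give a rough idea of what a GRU layer computes at each time step, here is a NumPy sketch of a single GRU cell applied to one input window. It is modeled on the gate layout Keras uses by default (`reset_after=True`, with separate input and recurrent biases), but the weights are random and all function and variable names are illustrative assumptions, so treat it as a conceptual aid and not as a reimplementation of `tf.keras.layers.GRU`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, U, b_in, b_rec):
    """One GRU time step.

    x_t    : (features,) input at the current time step
    h_prev : (units,) hidden state from the previous step
    W      : (features, 3 * units) input weights for [update, reset, candidate]
    U      : (units, 3 * units) recurrent weights in the same order
    b_in, b_rec : (3 * units,) input and recurrent biases
    """
    units = h_prev.shape[0]
    xi = x_t @ W + b_in       # input contribution to all three blocks
    hr = h_prev @ U + b_rec   # recurrent contribution to all three blocks

    z = sigmoid(xi[:units] + hr[:units])                      # update gate
    r = sigmoid(xi[units:2 * units] + hr[units:2 * units])    # reset gate
    candidate = np.tanh(xi[2 * units:] + r * hr[2 * units:])  # candidate state

    # The update gate blends the previous state with the candidate state
    return z * h_prev + (1.0 - z) * candidate

rng = np.random.default_rng(0)
features, units, window = 15, 4, 20   # small `units` for the demo only

W = rng.normal(scale=0.1, size=(features, 3 * units))
U = rng.normal(scale=0.1, size=(units, 3 * units))
b_in = np.zeros(3 * units)
b_rec = np.zeros(3 * units)

sequence = rng.normal(size=(window, features))  # one made-up input window
h = np.zeros(units)
for x_t in sequence:
    h = gru_step(x_t, h, W, U, b_in, b_rec)

# The state after the last time step is what the following layers receive
print(h.shape)
```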
Python3
multivariate_gru = tf.keras.Sequential()

# GRU layer; the input shape is (window length, number of features)
multivariate_gru.add(tf.keras.layers.GRU(
    200, input_shape=(X_train.shape[1], X_train.shape[2])))
multivariate_gru.add(tf.keras.layers.Dropout(0.5))

# Output layer for the two target variables
multivariate_gru.add(tf.keras.layers.Dense(2, activation='linear'))

# Compile the model
multivariate_gru.compile(loss='MeanSquaredError',
                         metrics=['MAE', 'MSE'],
                         optimizer=tf.keras.optimizers.Adam())
multivariate_gru.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru (GRU) (None, 200) 130200
dropout (Dropout) (None, 200) 0
dense (Dense) (None, 2) 402
=================================================================
Total params: 130602 (510.16 KB)
Trainable params: 130602 (510.16 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
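The parameter counts in the summary can be checked by hand. The sketch below assumes the default Keras GRU layout with separate input and recurrent bias vectors (`reset_after=True`) and the 15 feature columns that remain after dropping ‘Date’; the helper names are hypothetical.

```python
def gru_param_count(features, units):
    """Parameters of a GRU layer with separate input and recurrent biases."""
    input_weights = features * units
    recurrent_weights = units * units
    biases = 2 * units
    # Three blocks: update gate, reset gate and candidate state
    return 3 * (input_weights + recurrent_weights + biases)

def dense_param_count(inputs, outputs):
    """Weights plus one bias per output unit."""
    return inputs * outputs + outputs

gru_params = gru_param_count(features=15, units=200)
dense_params = dense_param_count(inputs=200, outputs=2)

print(gru_params, dense_params, gru_params + dense_params)
# 130200 402 130602
```

These values match the GRU, Dense and total parameter counts reported by summary(); the dropout layer has no trainable parameters.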
Model training
Now we will train our model for 20 epochs.
Python3
history = multivariate_gru.fit(X_train, y_train, epochs=20)
Output:
Epoch 1/20
48/48 [==============================] - 4s 5ms/step - loss: 0.0337 - MAE: 0.1330 - MSE: 0.0337
Epoch 2/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0086 - MAE: 0.0721 - MSE: 0.0086
Epoch 3/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0057 - MAE: 0.0580 - MSE: 0.0057
Epoch 4/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0045 - MAE: 0.0518 - MSE: 0.0045
Epoch 5/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0041 - MAE: 0.0489 - MSE: 0.0041
Epoch 6/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0038 - MAE: 0.0467 - MSE: 0.0038
Epoch 7/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0034 - MAE: 0.0444 - MSE: 0.0034
Epoch 8/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0032 - MAE: 0.0427 - MSE: 0.0032
Epoch 9/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0028 - MAE: 0.0398 - MSE: 0.0028
Epoch 10/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0027 - MAE: 0.0387 - MSE: 0.0027
Epoch 11/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0025 - MAE: 0.0377 - MSE: 0.0025
Epoch 12/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0024 - MAE: 0.0370 - MSE: 0.0024
Epoch 13/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0025 - MAE: 0.0372 - MSE: 0.0025
Epoch 14/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0026 - MAE: 0.0385 - MSE: 0.0026
Epoch 15/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0020 - MAE: 0.0334 - MSE: 0.0020
Epoch 16/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0021 - MAE: 0.0341 - MSE: 0.0021
Epoch 17/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0021 - MAE: 0.0339 - MSE: 0.0021
Epoch 18/20
48/48 [==============================] - 0s 6ms/step - loss: 0.0019 - MAE: 0.0323 - MSE: 0.0019
Epoch 19/20
48/48 [==============================] - 0s 6ms/step - loss: 0.0017 - MAE: 0.0305 - MSE: 0.0017
Epoch 20/20
48/48 [==============================] - 0s 6ms/step - loss: 0.0017 - MAE: 0.0309 - MSE: 0.0017
Forecasting of values
With training complete, we now forecast both target variables, i.e. ‘Open’ and ‘Close’, on the test set.
Python3
# Forecast Plot
predicted_values = multivariate_gru.predict(X_test)

d = {
    'Predicted_Open': predicted_values[:, 0],
    'Predicted_Close': predicted_values[:, 1],
    'Actual_Open': y_test[:, 0],
    'Actual_Close': y_test[:, 1],
}
d = pd.DataFrame(d)

fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(d[['Actual_Open', 'Predicted_Open']],
         label=['Actual_Open', 'Predicted_Open'])
plt.plot(d[['Actual_Close', 'Predicted_Close']],
         label=['Actual_Close', 'Predicted_Close'])
plt.xlabel('Timestamps')
plt.ylabel('Values')
ax.legend()
plt.show()
Output:
The plot shows that the model's forecasts for both target variables track the actual values with relatively little deviation.
Model evaluation
Now we will evaluate the model’s performance in terms of MSE, MAE and R2-score for each target variable.
Python3
# Model Evaluation
def evaluate(column):
    # Find the matching actual column, e.g. 'Predicted_Open' -> 'Actual_Open'
    actual = d[f'Actual_{column.split("_")[1]}'].to_numpy()
    predicted = d[column].to_numpy()
    return {
        'MSE': sklearn.metrics.mean_squared_error(actual, predicted),
        'MAE': sklearn.metrics.mean_absolute_error(actual, predicted),
        'R2': sklearn.metrics.r2_score(actual, predicted),
    }

result = dict()
for item in ['Predicted_Open', 'Predicted_Close']:
    result[item] = evaluate(item)

result
Output:
{'Predicted_Open': {'MSE': 0.0009372416648234283,
'MAE': 0.028610593217509906,
'R2': 0.7725483601493502},
'Predicted_Close': {'MSE': 0.0007147844364524972,
'MAE': 0.023082124163354124,
'R2': 0.824012103548659}}
So, for both target variables the errors are small and the R2-scores (about 0.77 for ‘Open’ and 0.82 for ‘Close’) are reasonably high. This indicates that the model performs well, although it could likely be improved further with hyperparameter tuning and more advanced loss-reduction techniques.
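For reference, the three metrics can also be written out directly. The NumPy sketch below uses made-up numbers and a hypothetical helper named `regression_metrics`; it is meant only to show the formulas behind the scikit-learn functions used above for a single target column.

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Return MSE, MAE and R2 for one target column."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted

    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    # R2 compares the model's squared error with that of always
    # predicting the mean of the actual values
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {'MSE': mse, 'MAE': mae, 'R2': r2}

# Made-up scaled values, for illustration only
actual = [0.20, 0.30, 0.40, 0.50]
predicted = [0.25, 0.28, 0.41, 0.47]

print(regression_metrics(actual, predicted))
```

An R2 of 1 means perfect predictions, while an R2 of 0 means the model does no better than predicting the mean of the actual values.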
Multivariate Time Series Forecasting with GRUs
Multivariate forecasting is a game-changer in business analysis, offering a perspective that goes beyond the limits of single-variable predictions. In this article, we explore multivariate forecasting: its core ideas, its applications, and the influence it has on steering decision-making towards the future.