Step-by-step implementation of Multivariate Forecast

Importing required modules

First, we import the required Python modules: Pandas, NumPy, Matplotlib, TensorFlow and scikit-learn.

Python3

import datetime
import sklearn.metrics  # explicit submodule import; sklearn.metrics is used in the evaluation step
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import KernelPCA
import numpy as np
import pandas as pd
import math
import tensorflow as tf
import matplotlib.pyplot as plt
tf.random.set_seed(99)

Dataset loading

Next, we load a time-series dataset of daily stock prices and technical indicators from a CSV file.

Python3

# Dataset loading
url = 'https://raw.githubusercontent.com/SusmitSekharBhakta/Stock-market-price-prediction/main/final_data_adj.csv'
 
dataFrame = pd.read_csv(url)  
print(dataFrame.head())

Output:

         Date         Open         High          Low        Close  \
0  2017-08-28  9907.150391  9925.750000  9882.000000  9912.799805   
1  2017-08-29  9886.400391  9887.349609  9783.750000  9796.049805   
2  2017-08-30  9859.500000  9909.450195  9850.799805  9884.400391   
3  2017-08-31  9905.700195  9925.099609  9856.950195  9917.900391   
4  2017-09-01  9937.650391  9983.450195  9909.849609  9974.400391   

     Adj Close    Volume        RSI       MACD    MACDsig  MACDhist  \
0  9912.799805  159600.0  55.406997  28.647258  28.317577  0.515867   
1  9796.049805  173300.0  55.406997  28.647258  28.317577  0.515867   
2  9884.400391  157800.0  55.406997  28.647258  28.317577  0.515867   
3  9917.900391  327700.0  55.406997  28.647258  28.317577  0.515867   
4  9974.400391  157800.0  55.406997  28.647258  28.317577  0.515867   

            SMA        CCI  Aroon Up  Aroon Down         Sadj  
0  12759.905212  24.363507       0.0         0.0          NaN  
1  12759.905212  24.363507       0.0         0.0   162.055635  
2  12759.905212  24.363507       0.0         0.0   -22.453545  
3  12759.905212  24.363507       0.0         0.0    -9.197608  
4  12759.905212  24.363507       0.0         0.0  5259.919641

Data preprocessing

The ‘Date’ column is dropped, and missing values are imputed with the mean of each column (the default strategy of SimpleImputer).
All features are then normalized to the range 0 to 1 using Min-Max scaling. The ‘Open’ and ‘Close’ columns are additionally scaled with a separate scaler, target_scaler, which makes it possible to map values for these two columns back to the original price scale.
The resulting scaled DataFrame, df_scaled, is cast to float for numerical consistency in subsequent operations.
Because this is a multivariate forecast, we predict two variables of the dataset, ‘Open’ and ‘Close’, which are among the most important quantities in stock prediction.

Python3

imputer = SimpleImputer(missing_values=np.nan)
dataFrame.drop(columns=['Date'], inplace=True)
dataFrame = pd.DataFrame(imputer.fit_transform(
    dataFrame), columns=dataFrame.columns)
dataFrame = dataFrame.reset_index(drop=True)
# Applying feature scaling
scaler = MinMaxScaler(feature_range=(0, 1))
df_scaled = scaler.fit_transform(dataFrame.to_numpy())
df_scaled = pd.DataFrame(df_scaled, columns=list(dataFrame.columns))
 
target_scaler = MinMaxScaler(feature_range=(0, 1))
 
df_scaled[['Open', 'Close']] = target_scaler.fit_transform(
    dataFrame[['Open', 'Close']].to_numpy())
 
df_scaled = df_scaled.astype(float)
print(df_scaled.head())

Output:

       Open      High       Low     Close  Adj Close    Volume       RSI  \
0  0.199868  0.178737  0.216833  0.211888   0.211888  0.088128  0.573174   
1  0.197958  0.175103  0.207848  0.201145   0.201145  0.095693  0.573174   
2  0.195483  0.177194  0.213980  0.209275   0.209275  0.087134  0.573174   
3  0.199734  0.178675  0.214542  0.212358   0.212358  0.180950  0.573174   
4  0.202674  0.184197  0.219380  0.217557   0.217557  0.087134  0.573174   

       MACD   MACDsig  MACDhist       SMA       CCI  Aroon Up  Aroon Down  \
0  0.753954  0.738291  0.504494  0.441765  0.541449       0.0         0.0   
1  0.753954  0.738291  0.504494  0.441765  0.541449       0.0         0.0   
2  0.753954  0.738291  0.504494  0.441765  0.541449       0.0         0.0   
3  0.753954  0.738291  0.504494  0.441765  0.541449       0.0         0.0   
4  0.753954  0.738291  0.504494  0.441765  0.541449       0.0         0.0   

       Sadj  
0  0.172391  
1  0.167173  
2  0.166825  
3  0.166850  
4  0.176766
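To make the scaling step concrete, here is a minimal NumPy sketch of what Min-Max scaling computes and how it is inverted. The helper names min_max_fit, min_max_transform and min_max_inverse are hypothetical, and the prices are made-up sample values. In the pipeline above, the same round trip would be done with the fitted target_scaler through its transform and inverse_transform methods.

```python
import numpy as np

def min_max_fit(x):
    """Return the per-column minimum and range (hypothetical helper)."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def min_max_transform(x, col_min, col_range):
    # x' = (x - min) / (max - min), applied column by column
    return (x - col_min) / col_range

def min_max_inverse(x_scaled, col_min, col_range):
    # x = x' * (max - min) + min
    return x_scaled * col_range + col_min

# Made-up 'Open' and 'Close' values, for illustration only
prices = np.array([[100.0, 110.0],
                   [120.0, 130.0],
                   [140.0, 150.0]])

col_min, col_range = min_max_fit(prices)
scaled = min_max_transform(prices, col_min, col_range)
restored = min_max_inverse(scaled, col_min, col_range)
print(scaled)
```

Because the model is trained on scaled values, applying the inverse transform to its predictions is one way to express them in price units again.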

Dataset transformation

A function named singleStepSampler is defined to prepare the dataset for single-step time-series forecasting. It takes two arguments: a dataframe df and a window size.
Within this function, the lists xRes and yRes are initialized to store the input features and target values, respectively.
The outer loop slides a window over the dataframe rows. For each window position, two inner loops collect the values of every column for each row inside the window.
Each input sample is therefore a sequence of window data points, where each data point is a list containing the values of all columns at one time step.
The target for each window is the pair of 'Open' and 'Close' values from the row immediately after the window, which is appended to yRes.
Finally, the function returns xRes and yRes as NumPy arrays.

Python3

# Single step dataset preparation
def singleStepSampler(df, window):
    xRes = []
    yRes = []
    for i in range(0, len(df) - window):
        res = []
        for j in range(0, window):
            r = []
            for col in df.columns:
                r.append(df[col][i + j])
            res.append(r)
        xRes.append(res)
        yRes.append(df[['Open', 'Close']].iloc[i + window].values)
    return np.array(xRes), np.array(yRes)
 
(xVal, yVal) = singleStepSampler(df_scaled, 20)
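The triple loop above is easy to follow but can be slow on long series. As an alternative sketch, the same windows can be built with NumPy's sliding_window_view (available in NumPy 1.20 and later). The function name single_step_sampler_fast, its targets parameter and the toy dataframe are assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def single_step_sampler_fast(df, window, targets=('Open', 'Close')):
    values = df.to_numpy(dtype=float)
    # Shape: (n_rows - window + 1, n_features, window)
    windows = sliding_window_view(values, window_shape=window, axis=0)
    # Drop the last window (no following row to predict) and reorder
    # the axes to (samples, window, n_features)
    x = np.ascontiguousarray(windows[:-1].transpose(0, 2, 1))
    # Target: the target columns of the row right after each window
    y = df[list(targets)].to_numpy(dtype=float)[window:]
    return x, y

# Toy data for illustration only
toy = pd.DataFrame({
    'Open': np.arange(10, dtype=float),
    'Close': np.arange(10, dtype=float) + 0.5,
    'Volume': np.arange(10, dtype=float) * 100,
})
x_toy, y_toy = single_step_sampler_fast(toy, window=3)
print(x_toy.shape, y_toy.shape)  # (7, 3, 3) (7, 2)
```

With 10 rows and a window of 3 there are 7 samples, matching the len(df) - window iterations of the loop version.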

Data Splitting

A constant SPLIT with a value of 0.85 is defined, specifying the proportion of data used for training.
The feature windows xVal and target vectors yVal, produced earlier by applying singleStepSampler to df_scaled with a window size of 20, are split in chronological order according to this ratio.
The first 85% of the samples form the training set and the remaining 15% form the testing set.

Python3

# Dataset splitting
SPLIT = 0.85
 
X_train = xVal[:int(SPLIT * len(xVal))]
y_train = yVal[:int(SPLIT * len(yVal))]
X_test = xVal[int(SPLIT * len(xVal)):]
y_test = yVal[int(SPLIT * len(yVal)):]
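The slicing above keeps the samples in time order, so the test set consists of the most recent windows. A small sketch of the same split as a reusable function follows. The name chronological_split is hypothetical, and the arrays are dummy data whose shapes (window of 20, 15 features, 2 targets) mirror this tutorial.

```python
import numpy as np

def chronological_split(x, y, train_fraction=0.85):
    # Everything before the cut index is training data, the rest is test data
    cut = int(train_fraction * len(x))
    return x[:cut], y[:cut], x[cut:], y[cut:]

# Dummy arrays for illustration only
x_dummy = np.zeros((100, 20, 15))
y_dummy = np.zeros((100, 2))

X_tr, y_tr, X_te, y_te = chronological_split(x_dummy, y_dummy)
print(X_tr.shape, X_te.shape)  # (85, 20, 15) (15, 20, 15)
```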

Defining the model

In this step, a multivariate Gated Recurrent Unit (GRU) neural network is defined using TensorFlow’s Keras API. The model is initialized with tf.keras.Sequential(), which builds a linear stack of layers.
It consists of a GRU layer with 200 units that takes input sequences whose shape is given by the window length and the number of features (columns) in the training data (X_train).
A dropout layer with a rate of 0.5 is added to reduce overfitting, and the output layer is a dense layer with 2 units, one for each target variable (‘Open’ and ‘Close’).
The activation function of the output layer is linear. The model is compiled with mean squared error as the loss function, and MAE and MSE as metrics for further evaluation.
The Adam optimizer is used for training.
The summary() method prints the model architecture, including the layer configurations and the number of parameters.

Python3

multivariate_gru = tf.keras.Sequential()
multivariate_gru.add(
    tf.keras.layers.GRU(200, input_shape=(X_train.shape[1], X_train.shape[2])))
multivariate_gru.add(
    tf.keras.layers.Dropout(0.5))
 
# Output layer for the two target variables ('Open' and 'Close')
multivariate_gru.add(
    tf.keras.layers.Dense(2, activation='linear'))
 
# Compile the model
multivariate_gru.compile(loss='MeanSquaredError',
                         metrics=['MAE', 'MSE'],
                         optimizer=tf.keras.optimizers.Adam())
multivariate_gru.summary()

Output:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 gru (GRU)                   (None, 200)               130200    
                                                                 
 dropout (Dropout)           (None, 200)               0         
                                                                 
 dense (Dense)               (None, 2)                 402       
                                                                 
=================================================================
Total params: 130602 (510.16 KB)
Trainable params: 130602 (510.16 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
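The parameter counts in the summary can be checked by hand. The sketch below assumes the TensorFlow 2 default GRU configuration (reset_after=True, which uses two bias vectors per gate) and 15 input features, the number of columns left after dropping ‘Date’. The helper names are hypothetical.

```python
def gru_param_count(units, n_features):
    # Three weight sets (update gate, reset gate, candidate state), each with
    # an input kernel, a recurrent kernel and two bias vectors (reset_after=True)
    return 3 * (units * n_features + units * units + 2 * units)

def dense_param_count(n_inputs, n_outputs):
    # One weight per input-output pair plus one bias per output
    return n_inputs * n_outputs + n_outputs

gru_params = gru_param_count(units=200, n_features=15)
dense_params = dense_param_count(n_inputs=200, n_outputs=2)
print(gru_params, dense_params, gru_params + dense_params)
# 130200 402 130602
```

These values agree with the 130200, 402 and 130602 parameters reported by summary().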

Model training

Now we train the model for 20 epochs.

Python3

history = multivariate_gru.fit(X_train, y_train, epochs=20)

Output:

Epoch 1/20
48/48 [==============================] - 4s 5ms/step - loss: 0.0337 - MAE: 0.1330 - MSE: 0.0337
Epoch 2/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0086 - MAE: 0.0721 - MSE: 0.0086
Epoch 3/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0057 - MAE: 0.0580 - MSE: 0.0057
Epoch 4/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0045 - MAE: 0.0518 - MSE: 0.0045
Epoch 5/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0041 - MAE: 0.0489 - MSE: 0.0041
Epoch 6/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0038 - MAE: 0.0467 - MSE: 0.0038
Epoch 7/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0034 - MAE: 0.0444 - MSE: 0.0034
Epoch 8/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0032 - MAE: 0.0427 - MSE: 0.0032
Epoch 9/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0028 - MAE: 0.0398 - MSE: 0.0028
Epoch 10/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0027 - MAE: 0.0387 - MSE: 0.0027
Epoch 11/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0025 - MAE: 0.0377 - MSE: 0.0025
Epoch 12/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0024 - MAE: 0.0370 - MSE: 0.0024
Epoch 13/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0025 - MAE: 0.0372 - MSE: 0.0025
Epoch 14/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0026 - MAE: 0.0385 - MSE: 0.0026
Epoch 15/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0020 - MAE: 0.0334 - MSE: 0.0020
Epoch 16/20
48/48 [==============================] - 0s 5ms/step - loss: 0.0021 - MAE: 0.0341 - MSE: 0.0021
Epoch 17/20
48/48 [==============================] - 0s 4ms/step - loss: 0.0021 - MAE: 0.0339 - MSE: 0.0021
Epoch 18/20
48/48 [==============================] - 0s 6ms/step - loss: 0.0019 - MAE: 0.0323 - MSE: 0.0019
Epoch 19/20
48/48 [==============================] - 0s 6ms/step - loss: 0.0017 - MAE: 0.0305 - MSE: 0.0017
Epoch 20/20
48/48 [==============================] - 0s 6ms/step - loss: 0.0017 - MAE: 0.0309 - MSE: 0.0017

Forecasting of values

With training complete, we forecast both target variables, ‘Open’ and ‘Close’, on the test set and plot the predictions against the actual values.

Python3

# Forecast Plot
predicted_values = multivariate_gru.predict(X_test)
 
d = {
    'Predicted_Open': predicted_values[:, 0],
    'Predicted_Close': predicted_values[:, 1],
    'Actual_Open': y_test[:, 0],
    'Actual_Close': y_test[:, 1],
}
 
d = pd.DataFrame(d)
 
fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(d[['Actual_Open', 'Predicted_Open']], label=['Actual_Open', 'Predicted_Open'])
plt.plot(d[['Actual_Close', 'Predicted_Close']], label=['Actual_Close', 'Predicted_Close'])
plt.xlabel('Timestamps')
plt.ylabel('Values')
ax.legend()
plt.show()

Output:

Figure: Multivariate forecasting (actual vs. predicted ‘Open’ and ‘Close’ values on the test set)

The plot above shows that the forecasts track the actual values of both target variables closely, with little deviation.

Model evaluation

Now we evaluate the model’s performance in terms of MSE, MAE and R2 score for each target variable. Note that these metrics are computed on the scaled values, not on the original prices.

Python3

# Model Evaluation
def evaluate(column):
    # column is a prediction column such as 'Predicted_Open';
    # the matching ground truth is stored in 'Actual_Open'
    actual = d[f'Actual_{column.split("_")[1]}'].to_numpy()
    predicted = d[column].to_numpy()
    return {
        'MSE': sklearn.metrics.mean_squared_error(actual, predicted),
        'MAE': sklearn.metrics.mean_absolute_error(actual, predicted),
        'R2': sklearn.metrics.r2_score(actual, predicted)
    }

result = dict()

for item in ['Predicted_Open', 'Predicted_Close']:
    result[item] = evaluate(item)

result

Output:

{'Predicted_Open': {'MSE': 0.0009372416648234283,
  'MAE': 0.028610593217509906,
  'R2': 0.7725483601493502},
 'Predicted_Close': {'MSE': 0.0007147844364524972,
  'MAE': 0.023082124163354124,
  'R2': 0.824012103548659}}

For both target variables the errors are small (MSE below 0.001 on the scaled data) and the R2 scores are reasonably high (about 0.77 for ‘Open’ and 0.82 for ‘Close’). This indicates that the model performs well, although hyper-parameter tuning and more advanced loss-reduction techniques could improve it further.
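For reference, the three metrics can be written out in a few lines of NumPy. This is a sketch of the standard definitions on made-up numbers, with hypothetical function names. It is not the scikit-learn implementation.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared errors
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Mean of absolute errors
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values for illustration only
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
# 0.125 0.25 0.9
```

An R2 of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the actual values.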
