Application of Box-Jenkins Methodology
In this section we apply the Box-Jenkins method to Apple (AAPL) stock price data downloaded with yfinance. The step-by-step code, with explanations, follows.
Importing Libraries:
The code imports the necessary libraries: yfinance for downloading stock price data, pandas for data manipulation, matplotlib.pyplot for plotting, statsmodels for time series analysis and ARIMA modeling, and warnings to suppress warnings during execution.
Python3
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
import warnings

warnings.filterwarnings('ignore')
Function Definitions:
Next we define two helper functions: one checks stationarity with the Augmented Dickey-Fuller (ADF) test, and the other plots the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF).
Python3
# Function to check stationarity using Augmented Dickey-Fuller test
def check_stationarity(ts):
    result = adfuller(ts)
    print(f'ADF Statistic: {result[0]}')
    print(f'p-value: {result[1]}')
    print(f'Critical Values: {result[4]}')

# Function to plot ACF and PACF
def plot_acf_pacf(ts):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    plot_acf(ts, ax=ax1, lags=20)
    plot_pacf(ts, ax=ax2, lags=20)
    plt.show()
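To make the ADF output easier to interpret, the decision rule behind the test can be sketched in a few lines. The helper name `rejects_unit_root` is hypothetical (it is not part of statsmodels), and the sample numbers are the ADF statistic and critical values printed for the AAPL log returns later in this section.

```python
def rejects_unit_root(adf_statistic, critical_values, level="5%"):
    """Return True when the ADF statistic is below the critical value.

    The ADF null hypothesis is that the series has a unit root
    (is non-stationary). A statistic more negative than the critical
    value rejects that null at the chosen significance level.
    """
    return adf_statistic < critical_values[level]


# Values copied from the ADF output for the AAPL log returns
adf_stat = -13.869148958528394
crit = {
    "1%": -3.4336173133865064,
    "5%": -2.86298332472282,
    "10%": -2.5675383641200633,
}

for level in crit:
    print(level, rejects_unit_root(adf_stat, crit, level))
```

Equivalently, a p-value below the chosen significance level (for example 0.05) leads to the same conclusion.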
Data Loading and Preprocessing:
Daily closing prices for Apple Inc. (AAPL) are downloaded using yfinance for the period from the start of 2015 up to, but not including, the start of 2023. Log returns are then calculated from the prices, since returns have a more stable mean and variance than raw prices and are therefore more suitable for ARIMA modeling.
Python3
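import numpy as np

# Load daily closing prices
stock_symbol = "AAPL"
start_date = "2015-01-01"
end_date = "2023-01-01"
# .squeeze() turns a one-column DataFrame (returned by some yfinance versions) into a Series
stock_data = yf.download(stock_symbol, start=start_date, end=end_date)['Close'].squeeze()

# Log returns: log(1 + simple return) = log(P_t / P_{t-1})
# NumPy is used directly because the pd.np alias is not available in recent pandas
log_returns = np.log(1 + stock_data.pct_change().dropna())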
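The relationship between simple returns and log returns can be checked by hand. The following is a minimal sketch using only the standard library; the prices are hypothetical values chosen for illustration, and `log_returns_from_prices` is an illustrative helper, not a pandas or yfinance function.

```python
import math


def log_returns_from_prices(prices):
    """Log return for each consecutive pair of prices: log(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices, for illustration only
prices = [100.0, 102.0, 99.0, 105.0]

simple = [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]
logs = log_returns_from_prices(prices)

# log(1 + simple return) equals the log return for every period
for s, l in zip(simple, logs):
    assert math.isclose(math.log(1 + s), l)

# Log returns are additive: their sum is the log of the total price ratio
print(sum(logs), math.log(prices[-1] / prices[0]))
```

This additivity is one reason log returns are convenient: a multi-period return is simply the sum of the single-period log returns.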
Stationarity Check and Differencing:
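The stationarity of the log returns is checked with the ADF test, and the check is repeated after first differencing. ACF and PACF plots of the differenced series are then drawn to help determine the ARIMA orders. In the output below, the p-value for the undifferenced log returns is already far below 0.05, so the returns are stationary without further differencing; the differenced series is shown for comparison.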
Python3
# Check stationarity of the log returns
check_stationarity(log_returns)

# First difference of the log returns
log_returns_diff = log_returns.diff().dropna()

# Check stationarity after differencing
check_stationarity(log_returns_diff)

# Plot ACF and PACF after differencing
plot_acf_pacf(log_returns_diff)
Output:
ADF Statistic: -13.869148958528394
p-value: 6.51329302121344e-26
Critical Values: {'1%': -3.4336173133865064, '5%': -2.86298332472282, '10%': -2.5675383641200633}
ADF Statistic: -14.058039719328459
p-value: 3.091971442666415e-26
Critical Values: {'1%': -3.433648628001351, '5%': -2.8629971502062155, '10%': -2.5675457254979093}
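To see what the ACF plot measures, and why differencing a series that is already stationary is best avoided, here is a minimal sketch with the standard library only. The helper names `difference` and `sample_acf` are illustrative, and the input is simulated white noise rather than the AAPL data. For white noise, the first difference has a theoretical lag-1 autocorrelation of -0.5, which is the typical signature of over-differencing.

```python
import math
import random


def difference(series):
    """First difference: x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]  # simulated white noise

acf_noise = sample_acf(noise, 3)
acf_diff = sample_acf(difference(noise), 3)

# Approximate 95% band for the autocorrelations of white noise
band = 1.96 / math.sqrt(len(noise))

print("white noise ACF:", [round(r, 3) for r in acf_noise])
print("differenced ACF:", [round(r, 3) for r in acf_diff])
print("95% band: +/-", round(band, 3))
```

A strongly negative spike at lag 1 in the ACF of a differenced series is therefore a hint that the extra differencing step was not needed.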
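Model Order Selection with AIC and BIC:
The code fits an ARIMA model for every combination of p and q in 0 to 2 and d in 0 to 1, and keeps a candidate only when it lowers both the AIC and the BIC relative to the best model found so far. Note that information criteria are generally not comparable across different values of d, because differencing changes the data the likelihood is computed on, so in practice d is usually fixed first using the stationarity tests.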
Python3
# Find optimal values for p, d, q based on AIC and BIC
best_aic = float('inf')
best_bic = float('inf')
best_order = None

for p in range(3):          # AR order candidates: 0, 1, 2
    for d in range(2):      # differencing order candidates: 0, 1
        for q in range(3):  # MA order candidates: 0, 1, 2
            arima_model = ARIMA(log_returns, order=(p, d, q))
            arima_results = arima_model.fit()

            # Calculate AIC and BIC
            current_aic = arima_results.aic
            current_bic = arima_results.bic

            # Update best values only when both criteria improve
            if current_aic < best_aic and current_bic < best_bic:
                best_aic = current_aic
                best_bic = current_bic
                best_order = (p, d, q)

print(f'Best AIC: {best_aic}, Best BIC: {best_bic}, Best Order: {best_order}')
Output:
Best AIC: -10277.232291010881, Best BIC: -10260.410146733962, Best Order: (0, 0, 1)
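The two criteria trade goodness of fit against model size: AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The sketch below computes both by hand. The log-likelihoods, parameter counts, and sample size are hypothetical numbers chosen for illustration, not results from the AAPL fit.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: (order, log-likelihood, number of estimated parameters)
candidates = [
    ((0, 0, 1), 5140.0, 3),
    ((1, 0, 1), 5140.8, 4),
    ((2, 0, 2), 5145.5, 6),
]
n_obs = 2000  # hypothetical sample size

scores = {order: (aic(ll, k), bic(ll, k, n_obs)) for order, ll, k in candidates}

best_by_aic = min(scores, key=lambda order: scores[order][0])
best_by_bic = min(scores, key=lambda order: scores[order][1])

print("Best by AIC:", best_by_aic)
print("Best by BIC:", best_by_bic)
```

Because ln(n) exceeds 2 for any realistic sample size, BIC penalizes extra parameters more heavily than AIC, so the two criteria can prefer different orders, as they do in this made-up example. When they disagree, a rule that requires both to improve at once will not necessarily return the minimum of either one.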
ARIMA Model Fitting and Diagnostics:
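The ARIMA model is fitted using the order selected in the AIC and BIC search. Diagnostics are then performed on the residuals: the ADF test checks that they are stationary, and the Ljung-Box test checks for remaining autocorrelation. Large Ljung-Box p-values (above 0.05) indicate no evidence of autocorrelation left in the residuals, which is what a well-specified model should produce.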
Python3
# Fit ARIMA model with the best order
arima_model = ARIMA(log_returns, order=best_order)
arima_results = arima_model.fit()

# Diagnostics: residuals should be stationary and uncorrelated
residuals = arima_results.resid
check_stationarity(residuals)

# Ljung-Box test for autocorrelation in residuals.
# In recent statsmodels versions acorr_ljungbox returns a DataFrame with
# columns 'lb_stat' and 'lb_pvalue', one row per lag from 1 to 20.
lb_results = acorr_ljungbox(residuals, lags=20)
print(f'Ljung-Box test statistics:\n{lb_results["lb_stat"]}')
print(f'Ljung-Box p-values:\n{lb_results["lb_pvalue"]}')
Output:
ADF Statistic: -13.478138873971695
p-value: 3.2812344010002946e-25
Critical Values: {'1%': -3.4336189466940414, '5%': -2.8629840458358933, '10%': -2.5675387480760885}
The two Ljung-Box print calls then list the test statistic and the p-value for each lag from 1 to 20.
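The Ljung-Box statistic itself is simple to compute: Q = n(n+2) Σ r_k² / (n - k), summed over lags k = 1..h, where r_k is the lag-k sample autocorrelation of the residuals. The sketch below is a standard-library illustration on simulated data, not a replacement for `acorr_ljungbox`. The helper name `ljung_box_q` is hypothetical, and the hard-coded 5% critical value of the chi-square distribution with 10 degrees of freedom (18.307) ignores the adjustment for fitted ARMA parameters.

```python
import random


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]  # simulated white noise

# Simulated AR(1) series with coefficient 0.8: strongly autocorrelated
ar1 = [noise[0]]
for e in noise[1:]:
    ar1.append(0.8 * ar1[-1] + e)

CHI2_95_DF10 = 18.307  # 5% critical value, chi-square with 10 degrees of freedom

q_noise = ljung_box_q(noise, 10)
q_ar1 = ljung_box_q(ar1, 10)

print("Q (white noise):", round(q_noise, 2))
print("Q (AR(1) series):", round(q_ar1, 2), "exceeds critical value:", q_ar1 > CHI2_95_DF10)
```

A Q value above the critical value (equivalently, a small p-value) means the residuals still contain autocorrelation, which suggests revisiting the model order.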
Plotting Results:
Finally, the observed log returns and the fitted values from the ARIMA model are plotted to visualize the model’s performance.
Python3
# Plot the observed log returns against the fitted values
plt.figure(figsize=(12, 6))
plt.plot(log_returns, label='Observed')
plt.plot(arima_results.fittedvalues, color='red', label='Fitted', alpha=0.7)
plt.legend()
plt.title(f'ARIMA{best_order} Model for {stock_symbol} Stock Returns')
plt.show()
Output: a line plot of the observed series with the fitted values overlaid in red.
The code above walks through the Box-Jenkins methodology end to end: stationarity checks, differencing, order selection, model fitting, residual diagnostics, and visualization of the fit for stock returns. If the diagnostics show remaining structure in the residuals, the model orders and parameters should be adjusted and the steps repeated.
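Since the model is fitted to log returns, any forecasts it produces are also log returns. As a minimal sketch of how such forecasts could be mapped back to a price path, the helper below compounds them from the last observed price. The function name, the last close, and the forecast values are all hypothetical; with statsmodels the forecasts would typically come from a call such as `arima_results.forecast(steps=3)`.

```python
import math


def prices_from_log_returns(last_price, log_return_forecasts):
    """Compound forecast log returns into a price path.

    P_{T+h} = P_T * exp(r_{T+1} + ... + r_{T+h})
    """
    prices = []
    cumulative = 0.0
    for r in log_return_forecasts:
        cumulative += r
        prices.append(last_price * math.exp(cumulative))
    return prices


last_close = 130.0                             # hypothetical last observed price
forecast_log_returns = [0.001, -0.002, 0.0005]  # hypothetical forecasts

path = prices_from_log_returns(last_close, forecast_log_returns)
print([round(p, 2) for p in path])
```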
Box-Jenkins Methodology for ARIMA Models
Time series data records observations at successive points in time. Analyzing such data is important for recognizing patterns, making predictions, and extracting useful insights. The Box-Jenkins method is a systematic approach to building ARIMA models for forecasting time series data over a specific period of time.
In this article we explore the Box-Jenkins method for ARIMA modelling and how it helps us analyze and forecast time series data.
Table of Contents
- ARIMA Modelling
- Box-Jenkins Method
- Application of Box-Jenkins Methodology
Let us first review what an ARIMA model is, so that we have a sound understanding of the process.