Commit 06cb376

committed
finished article on ARIMA
1 parent 732b7e4 commit 06cb376

33 files changed

+869
-810
lines changed

content/posts/finance/stock_prediction/ARIMA/arima_example.ipynb

+249-372
Large diffs are not rendered by default.

content/posts/finance/stock_prediction/ARIMA/index.md

+186-17
@@ -12,8 +12,6 @@ hero: images/test_forecast.png
1212
tags: ["Finance", "Statistics", "Forecasting"]
1313
categories: ["Finance"]
1414
---
15-
16-
1715
## 1. Introduction
1816

1917
Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
@@ -46,6 +44,7 @@ ARIMA models combine three components:
4644
3. **MA (Moving Average)**: The model uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
4745

4846
The ARIMA model is typically denoted as ARIMA(p,d,q), where:
47+
4948
- p is the order of the AR term
5049
- d is the degree of differencing
5150
- q is the order of the MA term
@@ -56,9 +55,11 @@ The ARIMA model can be written as:
5655

5756
$$
5857
Y_t = c + \varphi_1 Y_{t-1} + \varphi_2 Y_{t-2} + ... + \varphi_p Y_{t-p} + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + ... + \theta_q \epsilon_{t-q} + \epsilon_t
58+
5959
$$
6060

6161
Where:
62+
6263
- $Y_t$ is the differenced series (it may have been differenced more than once)
6364
- **c** is a constant
6465
- $\varphi_i$ are the parameters of the autoregressive part
@@ -118,8 +119,9 @@ if p_val > 0.05:
118119

119120
print(f"\nd = {d}")
120121
```
122+
121123
> *Output:*
122-
>
124+
>
123125
> d = 1
124126
125127
![png](images/time_series.png)
@@ -135,19 +137,21 @@ Choosing the right ARIMA model involves selecting appropriate values for p, d, a
135137
3. **Diagnostic checking**: Analyzing residuals to ensure they resemble white noise.
136138

137139
### Finding ARIMA Parameters (p, d, q)
140+
138141
Determining the optimal ARIMA parameters involves a combination of statistical tests, visual inspection, and iterative processes. Here's a systematic approach to finding p, d, and q:
139142

140-
* Determine d (Differencing Order):
143+
* Determine d (Differencing Order):
141144
- Use the Augmented Dickey-Fuller test to check for stationarity.
142145
- If the series is not stationary, difference it and test again until stationarity is achieved.
143-
* Determine p (AR Order) and q (MA Order):
146+
* Determine p (AR Order) and q (MA Order):
144147
- After differencing, use ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots.
145148
- The lag where the ACF cuts off indicates the q value.
146149
- The lag where the PACF cuts off indicates the p value.
147150
* Fine-tune with Information Criteria:
148151
- Use AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to compare different models.
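
To make the differencing step concrete, here is a minimal sketch in plain Python (no pandas), assuming the series is a simple list of numbers. The helper name `difference` is hypothetical; it only mirrors what `df.Close.diff()` does once the leading missing value is dropped.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (hypothetical helper)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with consecutive changes y[t] - y[t-1]
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# A quadratic trend needs two rounds of differencing to become constant
quadratic = [t ** 2 for t in range(6)]   # 0, 1, 4, 9, 16, 25
once = difference(quadratic, d=1)        # 1, 3, 5, 7, 9 (still trending)
twice = difference(quadratic, d=2)       # 2, 2, 2, 2 (constant)
```

Each round of differencing shortens the series by one observation, which is why the article's code calls `.dropna()` after `.diff()`.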
149152

150153
### Finding d parameter from plots
154+
151155
Since stationarity was already checked in the previous section, this part mainly serves a graphical and explanatory purpose. Moreover, the autocorrelation plots can suggest better values of d than the ADF test alone identifies.
152156

153157
```python
@@ -174,19 +178,21 @@ plot_acf(df.Close.diff().diff().dropna(), ax=axes[2, 1], lags=len(df)/7-3, color
174178
plt.tight_layout()
175179
plt.show()
176180
```
181+
177182
![png](images/find_d.png)
178183

179-
Indeed, from the plot, *d=2* is probably a better solution since we have few coefficient that goes above the confidence threshold.
184+
Indeed, from the plot, *d=2* is probably a better choice, since only a few coefficients rise above the confidence threshold.
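
The confidence threshold mentioned here is commonly approximated as ±1.96/√n for a white-noise series of length n. The sketch below computes the sample autocorrelation in plain Python and counts how many lags leave that band. `sample_acf` and `count_significant_lags` are hypothetical helpers, not part of statsmodels, and `plot_acf` may compute its band differently (for example with Bartlett's formula).

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    acf_values = []
    for lag in range(max_lag + 1):
        covariance = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf_values.append(covariance / variance)
    return acf_values


def count_significant_lags(series, max_lag):
    """Count lags >= 1 whose autocorrelation leaves the approximate 95% band."""
    threshold = 1.96 / math.sqrt(len(series))
    acf_values = sample_acf(series, max_lag)
    return sum(1 for r in acf_values[1:] if abs(r) > threshold)


# An alternating series is strongly (negatively) autocorrelated at lag 1
alternating = [1.0, -1.0] * 50
acf_alt = sample_acf(alternating, max_lag=3)
n_significant = count_significant_lags(alternating, max_lag=3)
```

The fewer lags that fall outside the band after differencing, the closer the differenced series is to white noise.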
180185

181186
### Finding p parameter from plots
187+
182188
As suggested previously, the partial autocorrelation (PACF) plot is used to find the **p** parameter.
183189

184190
```python
185191
plt.rcParams.update({'figure.figsize':(15,5), 'figure.dpi':80})
186192
fig, axes = plt.subplots(1, 2, sharex=False)
187193
axes[0].plot(df.index, df.Close.diff()); axes[0].set_title('1st Differencing')
188194
axes[1].set(ylim=(0,5))
189-
plot_pacf(df.Close.diff().dropna(), ax=axes[1], lags=20, color='k', auto_ylims=True, zero=False)
195+
plot_pacf(df.Close.diff().dropna(), ax=axes[1], lags=200, color='k', auto_ylims=True, zero=False)
190196

191197
plt.tight_layout()
192198
plt.show()
@@ -198,7 +204,22 @@ A possible choice of **p** can 8 or 18, where the coefficient crosses the confid
198204

199205
### Finding q parameter from plots
200206

207+
```python
208+
plt.rcParams.update({'figure.figsize':(15,5), 'figure.dpi':80})
209+
fig, axes = plt.subplots(1, 2, sharex=False)
210+
axes[0].plot(df.Close.diff()); axes[0].set_title('1st Differencing')
211+
axes[1].set(ylim=(0,1.2))
212+
plot_acf(df.Close.diff().dropna(), ax=axes[1], lags=200, color='k', auto_ylims=True, zero=False)
213+
plt.tight_layout()
214+
plt.show()
215+
```
216+
217+
![png](images/find_q.png)
218+
219+
The ACF looks very similar to the PACF at smaller lags. Hence, a value of 8 can also be used for *q* in this case.
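
One reason the two plots can resemble each other at small lags is that the PACF is derived from the ACF, and at lag 1 the two coincide by definition. The sketch below uses the Durbin-Levinson recursion to turn autocorrelations into partial autocorrelations. `pacf_from_acf` is a hypothetical helper written for illustration, and statsmodels' `plot_pacf` may use a different estimation method by default.

```python
def pacf_from_acf(acf_values):
    """Partial autocorrelations for lags 1..K from autocorrelations r[0..K].

    Durbin-Levinson recursion; acf_values[0] is expected to be 1.0.
    """
    max_lag = len(acf_values) - 1
    pacf_values = []
    prev_phi = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        numerator = acf_values[k] - sum(
            prev_phi[j] * acf_values[k - 1 - j] for j in range(k - 1)
        )
        denominator = 1.0 - sum(
            prev_phi[j] * acf_values[j + 1] for j in range(k - 1)
        )
        phi_kk = numerator / denominator
        # Update the lower-order coefficients, then append the new one
        new_phi = [
            prev_phi[j] - phi_kk * prev_phi[k - 2 - j] for j in range(k - 1)
        ]
        new_phi.append(phi_kk)
        prev_phi = new_phi
        pacf_values.append(phi_kk)
    return pacf_values


# Theoretical ACF of an AR(1) process with coefficient 0.6: r[k] = 0.6 ** k
ar1_acf = [0.6 ** k for k in range(5)]
ar1_pacf = pacf_from_acf(ar1_acf)  # cuts off after lag 1
```

For this AR(1) example the ACF decays gradually while the PACF drops to zero after lag 1, which is the "cut-off" pattern used to read p from the plot.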
220+
201221
### Grid Search
222+
202223
Here's a Python function to perform a grid search:
203224

204225
```python
@@ -222,9 +243,157 @@ def grid_search_arima(ts, p_range, d_range, q_range):
222243
best_order = grid_search_arima(ts_diff, range(3), range(2), range(3))
223244
```
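
When candidate orders are compared with an information criterion, as the parameter-selection steps earlier suggest, the score trades goodness of fit against the number of estimated parameters. Below is a minimal sketch of the standard formulas, using made-up log-likelihood values purely for illustration. The helper names are hypothetical; a fitted statsmodels model exposes the corresponding values as `results.aic` and `results.bic`.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: the penalty grows with log(n_obs)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Illustrative (made-up) fits: the larger model fits only slightly better
small_model = {"log_likelihood": -520.0, "n_params": 3}
large_model = {"log_likelihood": -518.5, "n_params": 7}

aic_small = aic(**small_model)
aic_large = aic(**large_model)
preferred = "small" if aic_small < aic_large else "large"
```

In this toy comparison the small gain in likelihood does not pay for four extra parameters, so the smaller model is preferred.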
224245

246+
## 6. ARIMA model fitting
247+
248+
### Predict ARIMA model on all data
249+
250+
```python
251+
model = ARIMA(df.Close, order=(8,2,8)) # p,d,q
252+
results = model.fit()
253+
print(results.summary())
254+
255+
# Actual vs Fitted
256+
plt.plot(results.predict()[-100:], '-*', label='prediction')
257+
plt.plot(df.Close[-100:], '-*', label='actual')
258+
plt.legend()
259+
plt.title("Prediction vs. Actual on All Data ")
260+
plt.tight_layout()
261+
plt.show()
262+
```
225263

264+
![png](images/train_pred.png)
226265

227-
## 6. Limitations and Considerations
266+
### Train/Test split
267+
268+
```python
269+
from statsmodels.tsa.stattools import acf
270+
271+
# Create Training and Test
272+
train = df.Close[:int(len(df)*0.8)]
273+
test = df.Close[int(len(df)*0.8):]
274+
275+
# model = ARIMA(train, order=(3,2,1))
276+
model = ARIMA(train, order=(8, 2, 8))
277+
fitted = model.fit()
278+
279+
# Forecast
280+
fc = fitted.get_forecast(steps=len(test))
281+
conf = fc.conf_int(alpha=0.05)  # 95% confidence interval
282+
283+
# Make as pandas series
284+
fc_series = pd.Series(fitted.forecast(steps=len(test)).values, index=test.index)
285+
lower_series = pd.Series(conf.iloc[:, 0].values, index=test.index)
286+
upper_series = pd.Series(conf.iloc[:, 1].values, index=test.index)
287+
288+
# Plot
289+
plt.figure(figsize=(12,5), dpi=100)
290+
plt.plot(train[-200:], label='training')
291+
plt.plot(test, label='actual')
292+
plt.plot(fc_series, label='forecast')
293+
plt.fill_between(lower_series.index, lower_series, upper_series,
294+
color='k', alpha=.15)
295+
plt.title('Forecast vs Actuals')
296+
plt.legend(loc='upper left', fontsize=10)
297+
plt.tight_layout()
298+
plt.show()
299+
```
300+
301+
![png](images/test_forecast.png)
302+
```python
# Accuracy metrics
def forecast_accuracy(forecast, actual):
    mape = np.mean(np.abs(forecast - actual) / np.abs(actual))  # MAPE
    me = np.mean(forecast - actual)                             # ME
    mae = np.mean(np.abs(forecast - actual))                    # MAE
    mpe = np.mean((forecast - actual) / actual)                 # MPE
    rmse = np.mean((forecast - actual) ** 2) ** .5              # RMSE
    corr = np.corrcoef(forecast, actual)[0, 1]                  # corr
    mins = np.amin(np.hstack([forecast[:, None],
                              actual[:, None]]), axis=1)
    maxs = np.amax(np.hstack([forecast[:, None],
                              actual[:, None]]), axis=1)
    minmax = 1 - np.mean(mins / maxs)                           # minmax
    # Lag-1 autocorrelation of the forecast errors; uses the `actual`
    # argument rather than the global `test` series
    acf1 = acf(forecast - actual)[1]                            # ACF1
    return {'mape': mape, 'me': me, 'mae': mae,
            'mpe': mpe, 'rmse': rmse, 'acf1': acf1,
            'corr': corr, 'minmax': minmax}

forecast_accuracy(fc_series.values, test.values)
```
324+
325+
> Output:
326+
>
327+
> {'mape': 0.07829701788549515,
328+
>
329+
> 'me': -12.898037657120996,
330+
>
331+
> 'mae': 14.483068468837455,
332+
>
333+
> 'mpe': -0.068860507560246,
334+
>
335+
> 'rmse': 16.906382957008496,
336+
>
337+
> 'acf1': 0.9702976318229376,
338+
>
339+
> 'corr': 0.4484875181364141,
340+
>
341+
> 'minmax': 0.07810488835602647}
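
To read these numbers, it can help to see the main metrics computed by hand on a tiny example. The sketch below uses made-up values, not the article's data, and plain Python instead of NumPy. Note that MAPE is expressed as a fraction, so the `mape` of about 0.078 above corresponds to roughly 7.8%.

```python
import math

# Made-up values for illustration only
actual = [100.0, 110.0, 120.0]
forecast = [90.0, 110.0, 126.0]

errors = [f - a for f, a in zip(forecast, actual)]  # -10, 0, 6

me = sum(errors) / len(errors)                      # mean error (bias)
mae = sum(abs(e) for e in errors) / len(errors)     # mean absolute error
mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / len(errors)
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
```

A negative ME, as in the output above, means the forecast sits below the actual values on average, while RMSE weights large misses more heavily than MAE.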
342+
343+
### Grid Search
344+
```python
def grid_search_arima(train, test, p_range, d_range, q_range):
    best_aic = float('inf')
    best_mape = float('inf')
    best_order = None
    for p in p_range:
        for d in d_range:
            for q in q_range:
                try:
                    model = ARIMA(train.values, order=(p, d, q))
                    results = model.fit()
                    # Point forecast over the test horizon
                    fc_series = pd.Series(results.forecast(steps=len(test)), index=test.index)
                    test_metrics = forecast_accuracy(fc_series.values, test.values)
                    # if results.aic < best_aic:
                    #     best_aic = results.aic
                    #     best_order = (p,d,q)
                    print(p, d, q, test_metrics['mape'])
                    # Keep the order with the lowest MAPE on the test set
                    if test_metrics['mape'] < best_mape:
                        best_mape = test_metrics['mape']
                        best_order = (p, d, q)
                        print("temp best:", best_order, test_metrics['mape'])
                except Exception as e:
                    print(e)
                    continue
    return best_order

# Grid search for best p and q (assuming d is known)
best_order = grid_search_arima(train, test, range(1,9), [d, d+1], range(1,9))
print(f"Best ARIMA order based on grid search: {best_order}")
```
375+
376+
> Suggested d value: 1
377+
>
378+
> temp best: (1, 1, 1) 0.14570196898952395
379+
>
380+
> temp best: (1, 1, 5) 0.14514639508226412
381+
>
382+
> temp best: (1, 1, 6) 0.14499024417142595
383+
>
384+
> temp best: (1, 1, 7) 0.1439625731680348
385+
>
386+
> temp best: (1, 2, 1) 0.07729490750827837
387+
>
388+
> temp best: (1, 2, 2) 0.0764917667521908
389+
>
390+
> temp best: (3, 2, 4) 0.07647187068962996
391+
>
392+
> Best ARIMA order based on grid search: (3, 2, 4)
393+
394+
In g
395+
396+
## 7. Limitations and Considerations
228397

229398
While ARIMA models can be powerful for time series prediction, they have limitations:
230399

@@ -234,15 +403,6 @@ While ARIMA models can be powerful for time series prediction, they have limitat
234403
4. **Assumption of constant variance**: This may not hold for volatile stock prices.
235404
5. **No consideration of external factors**: ARIMA models only use past values of the time series, ignoring other potentially relevant variables.
236405

237-
## 7. Advanced Topics and Extensions
238-
239-
Several extensions to basic ARIMA models address some of these limitations:
240-
241-
1. **SARIMA**: Incorporates seasonality
242-
2. **ARIMAX**: Includes exogenous variables
243-
3. **GARCH**: Models time-varying volatility
244-
4. **Vector ARIMA**: Handles multiple related time series simultaneously
245-
246406
## 8. Conclusion
247407

248408
Time series analysis and ARIMA models provide valuable tools for understanding and predicting stock price movements. While they have limitations, particularly in the complex and often non-linear world of financial markets, they serve as a strong foundation for more advanced modeling techniques.
@@ -257,3 +417,12 @@ When applying these models to real-world financial data, it's crucial to:
257417

258418
As with all financial modeling, remember that past performance does not guarantee future results. Time series models should be one tool in a broader analytical toolkit, complemented by fundamental analysis, market sentiment assessment, and a deep understanding of the specific stock and its market context.
259419

420+
### Next Steps
421+
422+
In upcoming articles, we will explore time-series decomposition, seasonality, and exogenous variables.
423+
Indeed, several extensions to basic ARIMA models address some of these limitations:
424+
425+
1. **SARIMA**: Incorporates seasonality.
426+
2. **ARIMAX**: Includes exogenous variables.
427+
3. **GARCH**: Models time-varying volatility.
428+
4. **Vector ARIMA**: Handles multiple related time series simultaneously.

public/categories/finance/index.html

+4-4
@@ -297,14 +297,14 @@
297297
<div class="card">
298298
<div class="card-head">
299299
<a href="/posts/finance/stock_prediction/arima/" class="post-card-link">
300-
<img class="card-img-top" src='/images/default-hero.jpg' alt="Hero Image">
300+
<img class="card-img-top" src='/posts/finance/stock_prediction/arima/images/test_forecast.png' alt="Hero Image">
301301
</a>
302302
</div>
303303
<div class="card-body">
304304
<a href="/posts/finance/stock_prediction/arima/" class="post-card-link">
305305
<h5 class="card-title">Time Series Analysis and ARIMA Models for Stock Price Prediction</h5>
306-
<p class="card-text post-summary">Time Series Analysis and ARIMA Models for Stock Price Prediction 1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
307-
This article will explore the application of time series analysis and ARIMA models to stock price prediction.</p>
306+
<p class="card-text post-summary">1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
307+
This article will explore the application of time series analysis and ARIMA models to stock price prediction. We&rsquo;ll cover the theoretical foundations, practical implementation in Python, and critical considerations for using these models in real-world financial scenarios.</p>
308308
</a>
309309

310310
<div class="tags">
@@ -327,7 +327,7 @@ <h5 class="card-title">Time Series Analysis and ARIMA Models for Stock Price Pre
327327
<div class="card-footer">
328328
<span class="float-start">
329329
Friday, June 28, 2024
330-
| 5 minutes </span>
330+
| 9 minutes </span>
331331
<a
332332
href="/posts/finance/stock_prediction/arima/"
333333
class="float-end btn btn-outline-info btn-sm">Read</a>

public/categories/finance/index.xml

+2-2
@@ -12,8 +12,8 @@
1212
<pubDate>Fri, 28 Jun 2024 00:00:00 +0100</pubDate>
1313

1414
<guid>http://localhost:1313/posts/finance/stock_prediction/arima/</guid>
15-
<description>Time Series Analysis and ARIMA Models for Stock Price Prediction 1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
16-
This article will explore the application of time series analysis and ARIMA models to stock price prediction.</description>
15+
<description>1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
16+
This article will explore the application of time series analysis and ARIMA models to stock price prediction. We&amp;rsquo;ll cover the theoretical foundations, practical implementation in Python, and critical considerations for using these models in real-world financial scenarios.</description>
1717
</item>
1818

1919
<item>

public/index.json

+1-1
Large diffs are not rendered by default.

public/posts/finance/index.html

+4-4
@@ -433,14 +433,14 @@
433433
<div class="card">
434434
<div class="card-head">
435435
<a href="/posts/finance/stock_prediction/arima/" class="post-card-link">
436-
<img class="card-img-top" src='/images/default-hero.jpg' alt="Hero Image">
436+
<img class="card-img-top" src='/posts/finance/stock_prediction/arima/images/test_forecast.png' alt="Hero Image">
437437
</a>
438438
</div>
439439
<div class="card-body">
440440
<a href="/posts/finance/stock_prediction/arima/" class="post-card-link">
441441
<h5 class="card-title">Time Series Analysis and ARIMA Models for Stock Price Prediction</h5>
442-
<p class="card-text post-summary">Time Series Analysis and ARIMA Models for Stock Price Prediction 1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
443-
This article will explore the application of time series analysis and ARIMA models to stock price prediction.</p>
442+
<p class="card-text post-summary">1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
443+
This article will explore the application of time series analysis and ARIMA models to stock price prediction. We&rsquo;ll cover the theoretical foundations, practical implementation in Python, and critical considerations for using these models in real-world financial scenarios.</p>
444444
</a>
445445

446446
<div class="tags">
@@ -463,7 +463,7 @@ <h5 class="card-title">Time Series Analysis and ARIMA Models for Stock Price Pre
463463
<div class="card-footer">
464464
<span class="float-start">
465465
Friday, June 28, 2024
466-
| 5 minutes </span>
466+
| 9 minutes </span>
467467
<a
468468
href="/posts/finance/stock_prediction/arima/"
469469
class="float-end btn btn-outline-info btn-sm">Read</a>

public/posts/finance/index.xml

+2-2
@@ -12,8 +12,8 @@
1212
<pubDate>Fri, 28 Jun 2024 00:00:00 +0100</pubDate>
1313

1414
<guid>http://localhost:1313/posts/finance/stock_prediction/arima/</guid>
15-
<description>Time Series Analysis and ARIMA Models for Stock Price Prediction 1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
16-
This article will explore the application of time series analysis and ARIMA models to stock price prediction.</description>
15+
<description>1. Introduction Time series analysis is a fundamental technique in quantitative finance, particularly for understanding and predicting stock price movements. Among the various time series models, ARIMA (Autoregressive Integrated Moving Average) models have gained popularity due to their flexibility and effectiveness in capturing complex patterns in financial data.
16+
This article will explore the application of time series analysis and ARIMA models to stock price prediction. We&amp;rsquo;ll cover the theoretical foundations, practical implementation in Python, and critical considerations for using these models in real-world financial scenarios.</description>
1717
</item>
1818

1919
<item>

public/posts/finance/stock_prediction/arima/arima_example.ipynb

+249-372
Large diffs are not rendered by default.
