Commit a14f92b

first article finished
1 parent 481d17b commit a14f92b

File tree

18 files changed

+250
-54
lines changed


content/posts/finance/stock_prediction/GRU/index.md

+104-9
Original file line numberDiff line numberDiff line change
@@ -12,10 +12,9 @@ hero: images/stock-market-prediction-using-data-mining-techniques.jpg
1212
tags: ["Finance", "Deep Learning", "Forecasting"]
1313
categories: ["Finance"]
1414
---
15-
1615
## Introduction
1716

18-
In this article, we will explore time series data extracted from the **stock market**, focusing on prominent technology companies such as Apple, Amazon, Google, and Microsoft. Our objective is to equip data analysts and scientists with the essential skills to effectively manipulate and interpret stock market data.
17+
In this article, we will explore time series data extracted from the **stock market**, focusing on prominent technology companies such as Apple, Amazon, Google, and Microsoft. Our objective is to equip data analysts and scientists with the essential skills to effectively manipulate and interpret stock market data.
1918

2019
To achieve this, we will use the *yfinance* library to fetch stock information and visualization tools such as Seaborn and Matplotlib to illustrate various facets of the data. Specifically, we will explore methods to analyze stock risk based on historical performance, and implement predictive modeling using **GRU/LSTM** models.
2120

@@ -27,16 +26,20 @@ Throughout this tutorial, we aim to address the following key questions:
2726
4. What is the **correlation** between different stocks?
2827
5. How can we forecast future stock behavior, exemplified by predicting the closing price of Apple Inc. using LSTM or GRU?
2928

30-
***
29+
---
3130

3231
## Getting Data
32+
3333
The initial step involves **acquiring and loading** the data into memory. Our source of stock data is the **Yahoo Finance** website, which offers a wealth of financial market data and investment tools. To access this data, we'll use the **yfinance** library, which provides an efficient and Pythonic way to download market data from Yahoo. For further insights into yfinance, refer to the article titled [Reliably download historical market data from Yahoo! Finance with Python](https://aroussi.com/post/python-yahoo-finance).
3434

3535
### Install Dependencies
36+
3637
```bash
3738
pip install -qU yfinance seaborn
3839
```
40+
3941
### Configuration Code
42+
4043
```python
4144
import pandas as pd
4245
import numpy as np
@@ -62,13 +65,16 @@ data = yf.download("MSFT", start, end)
6265
```
6366

6467
## Statistical Analysis on the price
68+
6569
### Summary
70+
6671
```python
6772
# Summary Stats
6873
data.describe()
6974
```
7075

7176
### Closing Price
77+
7278
The closing price is the last price at which the stock is traded during the regular trading day. A stock’s closing price is the standard benchmark used by investors to track its performance over time.
7379

7480
```python
@@ -80,8 +86,11 @@ plt.title('Stock Price History')
8086
plt.legend()
8187
plt.show()
8288
```
89+
8390
### Volume of Sales
91+
8492
Volume is the amount of an asset or security that _changes hands over some period of time_, often over the course of a day. For instance, the stock trading volume would refer to the number of shares of security traded between its daily open and close. Trading volume, and changes to volume over the course of time, are important inputs for technical traders.
93+
8594
```python
8695
plt.figure(figsize=(14, 5))
8796
plt.plot(data['Volume'], label='Volume')
@@ -92,8 +101,8 @@ plt.show()
92101
```
93102

94103
### Moving Average
95-
The moving average (MA) is a simple **technical analysis** tool that smooths out price data by creating a constantly updated average price. The average is taken over a specific period of time, like 10 days, 20 minutes, 30 weeks, or any time period the trader chooses.
96104

105+
The moving average (MA) is a simple **technical analysis** tool that smooths out price data by creating a constantly updated average price. The average is taken over a specific period of time, like 10 days, 20 minutes, 30 weeks, or any time period the trader chooses.
97106

98107
```python
99108
ma_day = [10, 20, 50]
@@ -112,7 +121,9 @@ plt.show()
112121
```
113122

114123
## Statistical Analysis on the returns
124+
115125
Now that we've done some baseline analysis, let's dive a little deeper and analyze the risk of the stock. To do so, we need to take a closer look at the daily changes of the stock, and not just its absolute value. Let's use pandas to compute the daily returns for the **Microsoft** stock.
126+
116127
```python
117128
# Compute daily return in percentage
118129
data['Daily Return'] = data['Adj Close'].pct_change()
@@ -131,7 +142,9 @@ plt.title('MSFT Daily Return')
131142
plt.show()
132143

133144
```
145+
134146
## Data Preparation
147+
135148
```python
136149
# Create a new dataframe with only the 'Adj Close' column
137150
X = data.filter(['Adj Close'])
@@ -147,7 +160,9 @@ scaled_data = scaler.fit_transform(X)
147160

148161
scaled_data
149162
```
163+
150164
Split training data into small chunks to ingest into LSTM and GRU
165+
151166
```python
152167
# Create the training data set
153168
# Create the scaled training data set
@@ -162,7 +177,7 @@ for i in range(seq_length, len(train_data)):
162177
if i<= seq_length+1:
163178
print(x_train)
164179
print(y_train, end="\n\n")
165-
180+
166181
# Convert the x_train and y_train to numpy arrays
167182
x_train, y_train = np.array(x_train), np.array(y_train)
168183

@@ -171,7 +186,9 @@ x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
171186
```
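The chunking step above can be illustrated on toy data. This is only a minimal sketch: the helper name `make_windows` and the toy series are assumptions introduced here, not part of the article's code, and the window length of 3 is chosen just to keep the output small.

```python
import numpy as np

def make_windows(series, seq_length):
    """Hypothetical helper: split a 1-D series into model inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    x, y = [], []
    for i in range(seq_length, len(series)):
        x.append(series[i - seq_length:i])  # the seq_length values before step i
        y.append(series[i])                 # the value the model should predict
    # Recurrent layers expect inputs shaped (samples, timesteps, features)
    x = np.array(x).reshape(-1, seq_length, 1)
    return x, np.array(y)

# Toy series standing in for the scaled prices
toy = np.arange(10) / 10.0
x_demo, y_demo = make_windows(toy, seq_length=3)
print(x_demo.shape, y_demo.shape)
```

Each sample holds `seq_length` consecutive values, and its target is the value that immediately follows them, so a series of length `n` yields `n - seq_length` samples.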
172187

173188
## GRU
189+
174190
A Gated Recurrent Unit (GRU) model is used in this part.
191+
175192
```python
176193
from tensorflow.keras.models import Sequential
177194
from tensorflow.keras.layers import GRU, Dense, Dropout
@@ -184,11 +201,13 @@ lstm_model.add(Dropout(0.2))
184201
lstm_model.add(Dense(units=1))
185202

186203
lstm_model.compile(optimizer='adam', loss='mean_squared_error')
187-
lstm_model.fit(x_train, y_train, epochs=10, batch_size=4)
204+
lstm_model.fit(x_train, y_train, epochs=10, batch_size=8)
188205
```
189206

190207
## LSTM
208+
191209
A Long Short-Term Memory (LSTM) model is used in this part.
210+
192211
```python
193212
from tensorflow.keras.layers import LSTM
194213

@@ -200,13 +219,89 @@ lstm_model.add(Dropout(0.2))
200219
lstm_model.add(Dense(units=1))
201220

202221
lstm_model.compile(optimizer='adam', loss='mean_squared_error')
203-
lstm_model.fit(x_train, y_train, epochs=10, batch_size=4)
222+
lstm_model.fit(x_train, y_train, epochs=10, batch_size=8)
204223
```
205224

206-
207225
## Testing Metrics
208-
* mean squared error
226+
227+
* root mean squared error (RMSE)
228+
229+
```python
230+
231+
# Create the testing data set
232+
# Create a new array containing scaled values from index 1543 to 2002
233+
test_data = scaled_data[training_data_len - 60: , :]
234+
# Create the data sets x_test and y_test
235+
x_test = []
236+
y_test = dataset[training_data_len:, :]
237+
for i in range(60, len(test_data)):
238+
x_test.append(test_data[i-60:i, 0])
239+
240+
# Convert the data to a numpy array
241+
x_test = np.array(x_test)
242+
243+
# Reshape the data
244+
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))
245+
246+
# Get the models' predicted price values
247+
predictions_gru = gru_model.predict(x_test)
248+
predictions_gru = scaler.inverse_transform(predictions_gru)
249+
predictions_lstm = lstm_model.predict(x_test)
250+
predictions_lstm = scaler.inverse_transform(predictions_lstm)
251+
252+
# Get the root mean squared error (RMSE)
253+
rmse_lstm = np.sqrt(np.mean(((predictions_lstm - y_test) ** 2)))
254+
rmse_gru = np.sqrt(np.mean(((predictions_gru - y_test) ** 2)))
255+
print(f"LSTM RMSE: {rmse_lstm:.4f}, GRU RMSE: {rmse_gru:.4f}")
256+
```
257+
258+
> "LSTM RMSE: 4.2341, GRU RMSE: 3.3575"
209259
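As a small worked example of the metric, the sketch below computes RMSE on three toy values. The helper name `rmse` and the numbers are assumptions made for illustration. Flattening both arrays first is a defensive choice: subtracting an array of shape `(n,)` from one of shape `(n, 1)` would otherwise broadcast to an `(n, n)` matrix and silently distort the result.

```python
import numpy as np

def rmse(predictions, targets):
    """Hypothetical helper: root mean squared error between two equally long arrays."""
    predictions = np.asarray(predictions, dtype=float).reshape(-1)
    targets = np.asarray(targets, dtype=float).reshape(-1)
    if predictions.shape != targets.shape:
        raise ValueError("predictions and targets must have the same length")
    return float(np.sqrt(np.mean((predictions - targets) ** 2)))

# Errors are 1, 3 and -2, so the mean squared error is (1 + 9 + 4) / 3
demo_rmse = rmse([[101.0], [103.0], [98.0]], [100.0, 100.0, 100.0])
print(f"{demo_rmse:.4f}")
```

Because RMSE is expressed in the same unit as the target, a value computed on inverse-transformed predictions reads directly as an average price error in dollars.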
210260
### Test Plot
211261

262+
{{< img src="/posts/finance/stock_prediction/GRU/images/test_results.png" align="center" title="Results">}}
263+
The GRU-based model shows slightly better results, both graphically and in terms of RMSE. However, this does not tell us anything about the actual profitability of these models.
264+
212265
## Possible trading performance
266+
267+
The strategy implementation is:
268+
269+
* BUY: if prediction > actual_price
270+
* SELL: if prediction < actual_price
271+
272+
A position is closed at the next candle's _close_. However, the LSTM and GRU predictions show some offset relative to the actual price, which prevents this strategy from being applied properly.
273+
274+
Hence, the **returns** of the predictions are adopted.
275+
276+
```python
277+
# Assume a trading capital of $10,000
278+
trading_capital = 10000
279+
pred_gru_df = pd.DataFrame(predictions_gru, columns=['Price'])
280+
pred_test_df = pd.DataFrame(y_test, columns=['Price'])
281+
pred_gru_df['returns'] = pred_gru_df.pct_change(-1)
282+
pred_test_df['returns'] = pred_test_df.pct_change(-1)
283+
284+
# Compute Wins
285+
wins = ((pred_gru_df.dropna().returns<0) & (pred_test_df.dropna().returns<0)) | ((pred_gru_df.dropna().returns>0) & (pred_test_df.dropna().returns>0))
286+
print(wins.value_counts())
287+
288+
returns_df = pd.concat([pred_gru_df.returns, pred_test_df.returns], axis=1).dropna()
289+
total_pos_return = pred_test_df.dropna().returns[wins].abs().sum()
290+
total_neg_return = pred_test_df.dropna().returns[np.logical_not(wins)].abs().sum()
291+
292+
# compute final capital and compare with BUY&HOLD strategy
293+
final_capital = trading_capital*(1+total_pos_return-total_neg_return)
294+
benchmark_return = (valid.Close.iloc[-1] - valid.Close.iloc[0])/valid.Close.iloc[0]
295+
bench_capital = trading_capital*(1+benchmark_return)
296+
print(final_capital, bench_capital)
297+
```
298+
299+
> returns
300+
> True 81
301+
> False 72
302+
> Name: count, dtype: int64
303+
> 10535.325897548326 9617.616876598737
304+
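The win count can be reproduced on toy prices to make the idea concrete. This is a hedged sketch, not the article's exact computation: the helper name `directional_hits` and the sample prices are assumptions, and it uses the default `pct_change()` (change relative to the previous row), whereas the code above calls `pct_change(-1)`.

```python
import pandas as pd

def directional_hits(predicted_prices, actual_prices):
    """Hypothetical helper: flag the days where predicted and actual returns share a sign."""
    pred_ret = pd.Series(predicted_prices, dtype=float).pct_change()
    real_ret = pd.Series(actual_prices, dtype=float).pct_change()
    returns = pd.concat([pred_ret, real_ret], axis=1, keys=["pred", "real"]).dropna()
    # The product is positive only when both returns are non-zero and have the same sign
    return (returns["pred"] * returns["real"]) > 0

# Toy prices: the prediction gets the direction right on the first and third day
hits = directional_hits([100, 102, 101, 103], [100, 101, 102, 104])
print(hits.value_counts())
print(f"hit rate: {hits.mean():.2f}")
```

Building a single frame and dropping missing rows once keeps the two return series aligned on the same index before they are compared.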
305+
## Conclusion
306+
As shown in the previous section, these two simple Deep Learning models exhibit interesting positive results in terms of both regression and trading metrics.
307+
The latter is particularly important: a return of about **5%** is obtained while the stock price decreased by approximately 4%. This also led to very high Sharpe and Calmar ratios.
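
The article does not show how the two ratios mentioned above were computed. The sketch below uses common textbook conventions as assumptions: 252 trading days per year, a zero risk-free rate by default, and a Calmar ratio defined as annualized return divided by maximum drawdown. The function names and the toy returns are illustrative only.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year

def sharpe_ratio(daily_returns, risk_free_rate=0.0):
    """Annualized Sharpe ratio from daily returns (risk_free_rate is annual)."""
    r = np.asarray(daily_returns, dtype=float)
    excess = r - risk_free_rate / TRADING_DAYS
    return float(np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1))

def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    r = np.asarray(daily_returns, dtype=float)
    # Start the equity curve at 1.0 so an initial loss counts as a drawdown
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))

def calmar_ratio(daily_returns):
    """Annualized return divided by maximum drawdown (undefined if there is no drawdown)."""
    r = np.asarray(daily_returns, dtype=float)
    annual_return = np.prod(1.0 + r) ** (TRADING_DAYS / len(r)) - 1.0
    return float(annual_return / max_drawdown(r))

# Toy daily strategy returns, not taken from the article's results
demo_returns = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02]
print(sharpe_ratio(demo_returns), calmar_ratio(demo_returns))
```

Both ratios become unstable on short samples, so values computed over a test window of a few months are better read as rough indications than as precise estimates.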

public/categories/finance/index.html

+5-5
Original file line numberDiff line numberDiff line change
@@ -296,14 +296,14 @@
296296
<div class="card">
297297
<div class="card-head">
298298
<a href="/posts/finance/stock_prediction/gru/" class="post-card-link">
299-
<img class="card-img-top" src='/images/default-hero.jpg' alt="Hero Image">
299+
<img class="card-img-top" src='/posts/finance/stock_prediction/gru/images/stock-market-prediction-using-data-mining-techniques.jpg' alt="Hero Image">
300300
</a>
301301
</div>
302302
<div class="card-body">
303303
<a href="/posts/finance/stock_prediction/gru/" class="post-card-link">
304-
<h5 class="card-title">Microsoft Stock Prediction using LSTM or GRU</h5>
305-
<p class="card-text post-summary">Pick a stock commodity &rsquo; &hellip; '
306-
Statistical Analysis on the price Statistical Analysis on the returns GRU Model Init Training Testing Metrics mean squared error LSTM Comparison Possible trading performance </p>
304+
<h5 class="card-title">MSFT Stock Prediction using LSTM or GRU</h5>
305+
<p class="card-text post-summary">Introduction In this article, we will explore time series data extracted from the stock market, focusing on prominent technology companies such as Apple, Amazon, Google, and Microsoft. Our objective is to equip data analysts and scientists with the essential skills to effectively manipulate and interpret stock market data.
306+
To achieve this, we will utilize the yfinance library to fetch stock information and leverage visualization tools such as Seaborn and Matplotlib to illustrate various facets of the data.</p>
307307
</a>
308308

309309
<div class="tags">
@@ -326,7 +326,7 @@ <h5 class="card-title">Microsoft Stock Prediction using LSTM or GRU</h5>
326326
<div class="card-footer">
327327
<span class="float-start">
328328
Sunday, June 16, 2024
329-
| 1 minute </span>
329+
| 6 minutes </span>
330330
<a
331331
href="/posts/finance/stock_prediction/gru/"
332332
class="float-end btn btn-outline-info btn-sm">Read</a>

public/categories/finance/index.xml

+3-3
Original file line numberDiff line numberDiff line change
@@ -7,13 +7,13 @@
77
<generator>Hugo -- gohugo.io</generator>
88
<language>en</language>
99
<lastBuildDate>Sun, 16 Jun 2024 00:00:00 +0100</lastBuildDate><atom:link href="http://localhost:1313/categories/finance/index.xml" rel="self" type="application/rss+xml" /><item>
10-
<title>Microsoft Stock Prediction using LSTM or GRU</title>
10+
<title>MSFT Stock Prediction using LSTM or GRU</title>
1111
<link>http://localhost:1313/posts/finance/stock_prediction/gru/</link>
1212
<pubDate>Sun, 16 Jun 2024 00:00:00 +0100</pubDate>
1313

1414
<guid>http://localhost:1313/posts/finance/stock_prediction/gru/</guid>
15-
<description>Pick a stock commodity &amp;rsquo; &amp;hellip; &#39;
16-
Statistical Analysis on the price Statistical Analysis on the returns GRU Model Init Training Testing Metrics mean squared error LSTM Comparison Possible trading performance </description>
15+
<description>Introduction In this article, we will explore time series data extracted from the stock market, focusing on prominent technology companies such as Apple, Amazon, Google, and Microsoft. Our objective is to equip data analysts and scientists with the essential skills to effectively manipulate and interpret stock market data.
16+
To achieve this, we will utilize the yfinance library to fetch stock information and leverage visualization tools such as Seaborn and Matplotlib to illustrate various facets of the data.</description>
1717
</item>
1818

1919
