Commit 74146ff

Create C. Data Allocation Challenge.md
1 parent 8a9f05b commit 74146ff

## Case Study #4: Data Bank - Data Allocation Challenge

To test a few different hypotheses, the Data Bank team wants to run an experiment in which different groups of customers are allocated data using 3 different options:

- **Option 1**: data is allocated based on the amount of money at the end of the previous month
- **Option 2**: data is allocated based on the average amount of money kept in the account in the previous 30 days
- **Option 3**: data is updated in real time

For this multi-part challenge question, you have been requested to generate the following data elements to help the Data Bank team estimate how much data will need to be provisioned for each option:

- running customer balance column that includes the impact of each transaction
```sql
-- Step 1: net amount per customer per day (deposits add, every other type subtracts)
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
-- Step 2: cumulative balance per customer, accumulated in date order
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          net_transaction_amt,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte)
SELECT *
FROM running_customer_balance_cte
ORDER BY customer_id,
         txn_date;
```
- customer balance at the end of each month
```sql
-- Step 1: net amount per customer per day (deposits add, every other type subtracts)
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
-- Step 2: cumulative balance per customer, accumulated in date order
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte),
-- Step 3: balance after the customer's last transaction date in each month
     month_end_balance_cte AS
  (SELECT customer_id,
          txn_month,
          LAST_VALUE(running_customer_balance) OVER (PARTITION BY customer_id, txn_month
                                                     ORDER BY txn_date
                                                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS month_end_balance
   FROM running_customer_balance_cte)
-- DISTINCT collapses the repeated window value to one row per customer and month
SELECT DISTINCT customer_id,
                txn_month,
                month_end_balance
FROM month_end_balance_cte
ORDER BY customer_id,
         txn_month;
```
- minimum, average and maximum values of the running balance for each customer
```sql
-- Step 1: net amount per customer per day (deposits add, every other type subtracts)
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
-- Step 2: cumulative balance per customer, accumulated in date order
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte)
-- Step 3: summary statistics of the running balance per customer
SELECT customer_id,
       MIN(running_customer_balance) AS min_running_customer_balance,
       MAX(running_customer_balance) AS max_running_customer_balance,
       ROUND(AVG(running_customer_balance), 2) AS avg_running_customer_balance
FROM running_customer_balance_cte
GROUP BY customer_id
ORDER BY customer_id;
```
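
The three data elements above can be sanity-checked on a handful of rows. The following is a minimal sketch, not part of the original solution: it runs queries of the same shape against an in-memory SQLite database from Python. The toy rows and the helper names (`load_toy_db`, `running_balances`, `month_end_balances`, `balance_stats`) are made up for illustration, SQLite's `strftime('%m', ...)` stands in for MySQL's `MONTH()`, and the window is ordered by `txn_date` so that the running balance is deterministic.

```python
import sqlite3

# Hypothetical toy rows: (customer_id, txn_date, txn_type, txn_amount).
TOY_ROWS = [
    (1, "2020-01-02", "deposit", 300),
    (1, "2020-01-20", "purchase", 100),
    (1, "2020-02-05", "withdrawal", 500),
    (1, "2020-02-05", "deposit", 50),  # same day as the withdrawal: nets to -450
    (2, "2020-01-10", "deposit", 200),
    (2, "2020-02-15", "deposit", 100),
]

# Shared CTEs. SQLite has no MONTH(), so strftime('%m', ...) is used instead.
CTES = """
WITH transaction_amt_cte AS (
    SELECT customer_id,
           txn_date,
           CAST(strftime('%m', txn_date) AS INTEGER) AS txn_month,
           SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount
                    ELSE -txn_amount END) AS net_transaction_amt
    FROM customer_transactions
    GROUP BY customer_id, txn_date
),
running_customer_balance_cte AS (
    SELECT customer_id,
           txn_date,
           txn_month,
           SUM(net_transaction_amt) OVER (
               PARTITION BY customer_id
               ORDER BY txn_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_customer_balance
    FROM transaction_amt_cte
)
"""


def load_toy_db():
    """Create an in-memory database holding the toy transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_transactions ("
        "customer_id INTEGER, txn_date TEXT, txn_type TEXT, txn_amount INTEGER)"
    )
    conn.executemany("INSERT INTO customer_transactions VALUES (?, ?, ?, ?)", TOY_ROWS)
    return conn


def running_balances(conn):
    """One row per customer and transaction date, with the cumulative balance."""
    sql = CTES + """
    SELECT customer_id, txn_date, running_customer_balance
    FROM running_customer_balance_cte
    ORDER BY customer_id, txn_date
    """
    return conn.execute(sql).fetchall()


def month_end_balances(conn):
    """Balance after each customer's last transaction date in a month."""
    sql = CTES + """
    SELECT DISTINCT customer_id,
           txn_month,
           LAST_VALUE(running_customer_balance) OVER (
               PARTITION BY customer_id, txn_month
               ORDER BY txn_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS month_end_balance
    FROM running_customer_balance_cte
    ORDER BY customer_id, txn_month
    """
    return conn.execute(sql).fetchall()


def balance_stats(conn):
    """Minimum, maximum and rounded average running balance per customer."""
    sql = CTES + """
    SELECT customer_id,
           MIN(running_customer_balance),
           MAX(running_customer_balance),
           ROUND(AVG(running_customer_balance), 2)
    FROM running_customer_balance_cte
    GROUP BY customer_id
    ORDER BY customer_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = load_toy_db()
    print(running_balances(conn))
    print(month_end_balances(conn))
    print(balance_stats(conn))
```

On these toy rows, customer 1 moves through balances of 300, 200 and -250, so the month-end balances are 200 for January and -250 for February. A customer with no transactions in a month produces no row for that month, which is worth keeping in mind when reading the monthly totals further down.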

Using all of the data available, how much data would have been required for each option on a monthly basis?

### **Option 1**: Data is allocated based on the amount of money at the end of the previous month
How much data would have been required on a monthly basis?

```sql
-- Step 1: net amount per customer per day (deposits add, every other type subtracts)
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
-- Step 2: cumulative balance per customer, accumulated in date order
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte),
-- Step 3: balance after the customer's last transaction date in each month
     month_end_balance_cte AS
  (SELECT customer_id,
          txn_month,
          LAST_VALUE(running_customer_balance) OVER (PARTITION BY customer_id, txn_month
                                                     ORDER BY txn_date
                                                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS month_end_balance
   FROM running_customer_balance_cte),
-- Step 4: one row per customer and month
     customer_month_end_balance_cte AS
  (SELECT DISTINCT customer_id,
                   txn_month,
                   month_end_balance
   FROM month_end_balance_cte)
-- Step 5: total month-end balance across customers
SELECT txn_month,
       SUM(month_end_balance) AS data_required_per_month
FROM customer_month_end_balance_cte
GROUP BY txn_month
ORDER BY txn_month;
```

#### Result set:
![image](https://user-images.githubusercontent.com/77529445/166265817-f2bd74cf-0759-43d2-8b32-aabaa40453aa.png)

**Observed**: Data required per month is negative. This is caused by the negative account balances that some customers hold at the end of the month.

**Assumption**: Some customers do not maintain a positive account balance at the end of the month. I'm assuming that no data is allocated when the amount of money at the end of the previous month is negative. We can use **SUM(IF(month_end_balance > 0, month_end_balance, 0))** in the select clause to compute the total data requirement per month.

#### Result set:
![image](https://user-images.githubusercontent.com/77529445/166266334-1a6ea8e8-7495-4832-90b0-3801017ab991.png)
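
The effect of the non-negative assumption can be seen on a few made-up month-end balances. This is a minimal sketch using Python's built-in `sqlite3`; the table contents and the function name `option_1_totals` are hypothetical, and `CASE WHEN ... END` is used as the portable equivalent of MySQL's `IF(...)`.

```python
import sqlite3

# Hypothetical month-end balances: (customer_id, txn_month, month_end_balance).
MONTH_END_ROWS = [
    (1, 1, 200),
    (1, 2, -250),  # overdrawn at the end of February
    (2, 1, 200),
    (2, 2, 300),
]


def option_1_totals(rows):
    """Return (txn_month, raw_total, clamped_total) for each month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_month_end_balance ("
        "customer_id INTEGER, txn_month INTEGER, month_end_balance INTEGER)"
    )
    conn.executemany("INSERT INTO customer_month_end_balance VALUES (?, ?, ?)", rows)
    sql = """
    SELECT txn_month,
           SUM(month_end_balance) AS raw_total,
           -- Negative balances contribute 0 instead of reducing the total
           SUM(CASE WHEN month_end_balance > 0
                    THEN month_end_balance ELSE 0 END) AS data_required_per_month
    FROM customer_month_end_balance
    GROUP BY txn_month
    ORDER BY txn_month
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for txn_month, raw_total, clamped_total in option_1_totals(MONTH_END_ROWS):
        print(txn_month, raw_total, clamped_total)
```

In the toy data, the overdrawn customer pulls the raw February total down to 50, while the clamped total counts only the 300 held by the other customer.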

***

### **Option 2**: Data is allocated based on the average amount of money kept in the account in the previous 30 days
How much data would have been required on a monthly basis?

```sql
-- Step 1: net amount per customer per day (deposits add, every other type subtracts)
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
-- Step 2: cumulative balance per customer, accumulated in date order
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte),
-- Step 3: each customer's average running balance over all of their transaction dates,
-- repeated once for every month in which the customer transacted
     avg_running_customer_balance AS
  (SELECT DISTINCT customer_id,
                   txn_month,
                   AVG(running_customer_balance) OVER (PARTITION BY customer_id) AS avg_running_customer_balance
   FROM running_customer_balance_cte)
-- Step 4: total of the customer averages per month
SELECT txn_month,
       ROUND(SUM(avg_running_customer_balance)) AS data_required_per_month
FROM avg_running_customer_balance
GROUP BY txn_month
ORDER BY txn_month;
```

#### Result set:
![image](https://user-images.githubusercontent.com/77529445/166285983-4bd22c19-f272-4338-a845-56ef1137b81a.png)
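
The Option 2 query averages each customer's running balance with `PARTITION BY customer_id`, that is, over the customer's whole history rather than a 30-day window. The sketch below, which is an illustration and not the author's method, contrasts that customer-level average with one alternative reading, a per-month average, on made-up running balances. It uses Python's built-in `sqlite3`; the rows and the names `run_on_toy_table`, `CUSTOMER_LEVEL_SQL` and `PER_MONTH_SQL` are hypothetical.

```python
import sqlite3

# Hypothetical running balances: (customer_id, txn_month, running_customer_balance).
RUNNING_ROWS = [
    (1, 1, 300),
    (1, 1, 200),
    (1, 2, -250),
    (2, 1, 200),
    (2, 2, 300),
]

# Customer-level average, counted once for every month the customer was active.
CUSTOMER_LEVEL_SQL = """
WITH avg_balance_cte AS (
    SELECT DISTINCT customer_id,
           txn_month,
           AVG(running_customer_balance) OVER (PARTITION BY customer_id) AS avg_balance
    FROM running_customer_balance_cte
)
SELECT txn_month, ROUND(SUM(avg_balance)) AS data_required_per_month
FROM avg_balance_cte
GROUP BY txn_month
ORDER BY txn_month
"""

# Alternative reading (an assumption): average within each customer-month.
PER_MONTH_SQL = """
WITH avg_balance_cte AS (
    SELECT customer_id,
           txn_month,
           AVG(running_customer_balance) AS avg_balance
    FROM running_customer_balance_cte
    GROUP BY customer_id, txn_month
)
SELECT txn_month, ROUND(SUM(avg_balance)) AS data_required_per_month
FROM avg_balance_cte
GROUP BY txn_month
ORDER BY txn_month
"""


def run_on_toy_table(sql, rows=RUNNING_ROWS):
    """Load the toy running balances into SQLite and run one query on them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE running_customer_balance_cte ("
        "customer_id INTEGER, txn_month INTEGER, running_customer_balance INTEGER)"
    )
    conn.executemany("INSERT INTO running_customer_balance_cte VALUES (?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print("customer-level:", run_on_toy_table(CUSTOMER_LEVEL_SQL))
    print("per-month:     ", run_on_toy_table(PER_MONTH_SQL))
```

With the customer-level average, both toy months receive the same total because each customer contributes one fixed number; the per-month variant lets the totals follow the balances month by month. Neither is a true 30-day rolling average, which would also need to weight each balance by the number of days it was held.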

### **Option 3**: Data is updated in real time
How much data would have been required on a monthly basis?

```sql
-- Step 1: net amount per customer per day (deposits add, every other type subtracts)
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
-- Step 2: cumulative balance per customer, accumulated in date order
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          net_transaction_amt,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte)
-- Step 3: every balance update in a month contributes its running balance
SELECT txn_month,
       SUM(running_customer_balance) AS data_required_per_month
FROM running_customer_balance_cte
GROUP BY txn_month
ORDER BY txn_month;
```

#### Result set:
![image](https://user-images.githubusercontent.com/77529445/167304936-5586815b-fd25-4245-8658-c5ab8b3c54f2.png)
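
Under the Option 3 query, every balance update contributes its running balance to the monthly total, so customers who transact more often carry more weight. A minimal sketch on made-up rows, using Python's built-in `sqlite3`, makes that visible by reporting the number of updates next to the total. The rows and the function name `option_3_totals` are hypothetical.

```python
import sqlite3

# Hypothetical running balances: (customer_id, txn_month, running_customer_balance).
RUNNING_ROWS = [
    (1, 1, 300),
    (1, 1, 200),
    (1, 2, -250),
    (2, 1, 200),
    (2, 2, 300),
]


def option_3_totals(rows):
    """Return (txn_month, balance_updates, data_required_per_month) per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE running_customer_balance_cte ("
        "customer_id INTEGER, txn_month INTEGER, running_customer_balance INTEGER)"
    )
    conn.executemany("INSERT INTO running_customer_balance_cte VALUES (?, ?, ?)", rows)
    sql = """
    SELECT txn_month,
           COUNT(*) AS balance_updates,
           SUM(running_customer_balance) AS data_required_per_month
    FROM running_customer_balance_cte
    GROUP BY txn_month
    ORDER BY txn_month
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for txn_month, updates, total in option_3_totals(RUNNING_ROWS):
        print(txn_month, updates, total)
```

In the toy data, customer 1 has two balance updates in January and therefore contributes 300 + 200 to that month, while customer 2 contributes a single 200.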
***
