## Case Study #4: Data Bank - Data Allocation Challenge

To test out a few different hypotheses, the Data Bank team wants to run an experiment where different groups of customers would be allocated data using 3 different options:

- **Option 1**: data is allocated based on the amount of money at the end of the previous month
- **Option 2**: data is allocated based on the average amount of money kept in the account in the previous 30 days
- **Option 3**: data is updated in real time


For this multi-part challenge question, you have been requested to generate the following data elements to help the Data Bank team estimate how much data will need to be provisioned for each option:
- running customer balance column that includes the impact of each transaction
```sql
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          -- MONTH() ignores the year, so this assumes all dates fall in one calendar year
          MONTH(txn_date) AS txn_month,
          -- deposits increase the balance; every other transaction type reduces it
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   -- one row per customer per day: same-day transactions are netted together
   GROUP BY customer_id,
            txn_date),
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          net_transaction_amt,
          -- order by date (not month) so the cumulative sum is deterministic
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte)
SELECT *
FROM running_customer_balance_cte
ORDER BY customer_id,
         txn_date;
```
- customer balance at the end of each month
```sql
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte),
     month_end_balance_cte AS
  (SELECT DISTINCT customer_id,
                   txn_month,
                   -- the frame must span the whole month, otherwise LAST_VALUE
                   -- only sees rows up to the current one
                   LAST_VALUE(running_customer_balance) OVER (PARTITION BY customer_id, txn_month
                                                              ORDER BY txn_date
                                                              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS month_end_balance
   FROM running_customer_balance_cte)
-- months in which a customer has no transactions do not appear in the output
SELECT customer_id,
       txn_month,
       month_end_balance
FROM month_end_balance_cte
ORDER BY customer_id,
         txn_month;
```
- minimum, average and maximum values of the running balance for each customer
```sql
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte)
-- aggregate over every running-balance row of each customer
SELECT customer_id,
       MIN(running_customer_balance) AS min_running_balance,
       MAX(running_customer_balance) AS max_running_balance,
       ROUND(AVG(running_customer_balance), 2) AS avg_running_balance
FROM running_customer_balance_cte
GROUP BY customer_id
ORDER BY customer_id;
```
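
To make the running-balance logic above easier to check by hand, here is a minimal sketch on three made-up rows. The `sample_transactions` CTE and its values are hypothetical stand-ins for `customer_transactions`, and the `purchase` and `withdrawal` type names are assumptions; the logic only distinguishes `deposit` from everything else.

```sql
-- Hypothetical sample rows (not real Data Bank data)
WITH sample_transactions AS
  (SELECT 1 AS customer_id, DATE '2020-01-02' AS txn_date, 'deposit' AS txn_type, 300 AS txn_amount
   UNION ALL SELECT 1, DATE '2020-01-15', 'purchase', 100
   UNION ALL SELECT 1, DATE '2020-02-03', 'withdrawal', 250),
     signed_transactions AS
  (SELECT customer_id,
          txn_date,
          -- deposits are positive, everything else is negative
          CASE
              WHEN txn_type = 'deposit' THEN txn_amount
              ELSE -txn_amount
          END AS net_transaction_amt
   FROM sample_transactions)
SELECT customer_id,
       txn_date,
       net_transaction_amt,
       SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                      ORDER BY txn_date
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
FROM signed_transactions
ORDER BY customer_id,
         txn_date;
-- running_customer_balance per row: 300, 200, -50
```

On these rows the month-end balances would be 200 for month 1 and -50 for month 2, and the minimum, average and maximum running balance would be -50, 150 and 300.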


Using all of the data available, how much data would have been required for each option on a monthly basis?

### **Option 1**: Data is allocated based on the amount of money at the end of the previous month
How much data would have been required on a monthly basis?

```sql
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte),
     month_end_balance_cte AS
  (SELECT customer_id,
          txn_month,
          -- last running balance within each customer's month
          LAST_VALUE(running_customer_balance) OVER (PARTITION BY customer_id, txn_month
                                                     ORDER BY txn_date
                                                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS month_end_balance
   FROM running_customer_balance_cte),
     customer_month_end_balance_cte AS
  -- collapse to one row per customer per month
  (SELECT DISTINCT customer_id,
                   txn_month,
                   month_end_balance
   FROM month_end_balance_cte)
SELECT txn_month,
       SUM(month_end_balance) AS data_required_per_month
FROM customer_month_end_balance_cte
GROUP BY txn_month
ORDER BY txn_month;
```

#### Result set:


**Observed**: Data required per month is negative. This is because some customers hold a negative account balance at the end of the month.

**Assumption**: Some customers do not maintain a positive account balance at the end of the month. I'm assuming that no data is allocated when the amount of money at the end of the previous month is negative. We can use **SUM(IF(month_end_balance > 0, month_end_balance, 0))** in the select clause to compute the total data requirement per month.
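
As a rough illustration of that assumption, the sketch below compares the raw sum with the clamped sum. The inline rows are hypothetical; in practice `customer_month_end_balance_cte` would be the CTE of the same name from the Option 1 query above.

```sql
-- Hypothetical month-end balances (not real Data Bank data)
WITH customer_month_end_balance_cte AS
  (SELECT 1 AS customer_id, 1 AS txn_month, 500 AS month_end_balance
   UNION ALL SELECT 2, 1, -200
   UNION ALL SELECT 1, 2, 150
   UNION ALL SELECT 2, 2, -50)
SELECT txn_month,
       -- negative balances pull the raw total down
       SUM(month_end_balance) AS raw_total,
       -- negative balances are treated as zero data allocated
       SUM(IF(month_end_balance > 0, month_end_balance, 0)) AS data_required_per_month
FROM customer_month_end_balance_cte
GROUP BY txn_month
ORDER BY txn_month;
-- month 1: raw_total 300, data_required_per_month 500
-- month 2: raw_total 100, data_required_per_month 150
```

`IF()` is MySQL-specific; `GREATEST(month_end_balance, 0)` or a `CASE` expression would be a more portable way to write the same clamp.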

#### Result set:


***

### **Option 2**: Data is allocated based on the average amount of money kept in the account in the previous 30 days
How much data would have been required on a monthly basis?

```sql
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte),
     avg_running_customer_balance AS
  -- the calendar month is used here as an approximation of the 30-day window
  (SELECT customer_id,
          txn_month,
          AVG(running_customer_balance) AS avg_running_customer_balance
   FROM running_customer_balance_cte
   GROUP BY customer_id,
            txn_month)
SELECT txn_month,
       ROUND(SUM(avg_running_customer_balance)) AS data_required_per_month
FROM avg_running_customer_balance
GROUP BY txn_month
ORDER BY txn_month;
```
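
If a trailing 30-day window is wanted rather than a per-month average, one possible sketch is a `RANGE` frame with an `INTERVAL` offset, which MySQL 8 supports when the window is ordered by a date column. The `daily_balance` rows below are hypothetical, and the average only covers days that have a transaction; a day-weighted average would need one row per calendar day.

```sql
-- Hypothetical running balances (not real Data Bank data)
WITH daily_balance AS
  (SELECT 1 AS customer_id, DATE '2020-01-01' AS txn_date, 100 AS running_customer_balance
   UNION ALL SELECT 1, DATE '2020-01-20', 300
   UNION ALL SELECT 1, DATE '2020-02-25', 200)
SELECT customer_id,
       txn_date,
       running_customer_balance,
       -- average of the balances recorded in the 30 days ending on txn_date
       AVG(running_customer_balance) OVER (PARTITION BY customer_id
                                           ORDER BY txn_date
                                           RANGE BETWEEN INTERVAL 29 DAY PRECEDING AND CURRENT ROW) AS avg_balance_30d
FROM daily_balance
ORDER BY customer_id,
         txn_date;
-- avg_balance_30d per row: 100, 200, 200
```

The third row averages only itself because the earlier rows fall outside its 30-day window.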

#### Result set:


***

### **Option 3**: Data is updated real-time
How much data would have been required on a monthly basis?

```sql
WITH transaction_amt_cte AS
  (SELECT customer_id,
          txn_date,
          MONTH(txn_date) AS txn_month,
          SUM(CASE
                  WHEN txn_type = 'deposit' THEN txn_amount
                  ELSE -txn_amount
              END) AS net_transaction_amt
   FROM customer_transactions
   GROUP BY customer_id,
            txn_date),
     running_customer_balance_cte AS
  (SELECT customer_id,
          txn_date,
          txn_month,
          net_transaction_amt,
          SUM(net_transaction_amt) OVER (PARTITION BY customer_id
                                         ORDER BY txn_date
                                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_customer_balance
   FROM transaction_amt_cte)
-- every balance update within the month counts towards that month's requirement
SELECT txn_month,
       SUM(running_customer_balance) AS data_required_per_month
FROM running_customer_balance_cte
GROUP BY txn_month
ORDER BY txn_month;
```

#### Result set:


***