Commit 0098db9

author
pubudu
committed
Cleanup
1 parent b8fcb7b commit 0098db9

4 files changed: +139 -137 lines changed

content/2.NumPy_Data_Types.md

-40
@@ -360,46 +360,6 @@ Understanding the distinctions between Python's general-purpose types and NumPy'
360360
* Apply an alternative analytical approach using Z-ratio methodology to complement standard differential expression tools like DESeq
361361
* Ranks immune-related genes based on their relative expression differences between the patient groups
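A minimal sketch of the Z-ratio idea named in the bullets above, following the steps listed in the workflow diagram that this commit moves out of this file. Everything here is an assumption made for illustration: the count matrix is synthetic, `counts`, `groups`, and `group_zscores` are hypothetical names, the `+ 1` pseudocount is an added safeguard, and the direction of the difference (istrong minus iweak) is a choice, not something the course material states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the count matrix: 6 genes (rows) x 8 samples (columns).
counts = rng.integers(1, 1000, size=(6, 8)).astype(float)
# Hypothetical group label for each sample column.
groups = np.array(["iweak"] * 4 + ["istrong"] * 4)

# Log2 transformation; the +1 pseudocount is an assumption that avoids log2(0).
log_counts = np.log2(counts + 1)


def group_zscores(log_matrix):
    """Z-score each gene's mean log2 value against all genes in the group."""
    gene_means = log_matrix.mean(axis=1)
    return (gene_means - gene_means.mean()) / gene_means.std()


z_weak = group_zscores(log_counts[:, groups == "iweak"])
z_strong = group_zscores(log_counts[:, groups == "istrong"])

# Z-score difference per gene, scaled by the spread of all differences.
z_diff = z_strong - z_weak
z_ratio = z_diff / z_diff.std()

# Gene indices ordered from highest to lowest Z-ratio.
ranking = np.argsort(z_ratio)[::-1]
print("Z-ratios:", np.round(z_ratio, 2))
print("Ranking (gene indices):", ranking)
```

Because the Z-ratio divides by the standard deviation of the differences, the resulting values have unit standard deviation by construction, which is a quick sanity check on real data.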
362362

363-
***Workflow:***
364-
365-
```{mermaid}
366-
flowchart TD
367-
A[Load Sample Group Info] --> B{Filter by Group}
368-
B -->|iweak| C[Identify iweak samples]
369-
B -->|istrong| D[Identify istrong samples]
370-
371-
E[Load Count Matrix] --> F[Match columns with samples]
372-
373-
F --> G[Convert to numeric]
374-
G --> H[Log2 transformation]
375-
376-
C --> F
377-
D --> F
378-
379-
H --> I1[Calculate iweak mean & std]
380-
H --> I2[Calculate istrong mean & std]
381-
382-
I1 --> J1[Compute Z-scores for iweak]
383-
I2 --> J2[Compute Z-scores for istrong]
384-
385-
J1 --> K[Calculate Z-score difference]
386-
J2 --> K
387-
388-
K --> L[Calculate standard deviation]
389-
390-
L --> M[Compute Z-ratio]
391-
392-
M --> N[Rank genes by Z-ratio]
393-
394-
classDef dataNode fill:#f9f9f9,stroke:#aaa,stroke-width:2px;
395-
classDef processNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
396-
classDef resultNode fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
397-
398-
class A,E dataNode;
399-
class B,C,D,F,G,H,I1,I2,J1,J2,K,L,M processNode;
400-
class N resultNode;
401-
```
402-
403363
***Dataset description:***
404364

405365
* `test_data/Sample_group_info.csv`

content/3.Indexing_and_Slicing.md

+49-74
@@ -185,19 +185,28 @@ dna_seq = np.array([0, 1, 2, 3, 0, 0, 1, 2, 3, 3, 2, 1, 0, 0, 2, 3]) # "ACGTAAC
185185

186186
```python
187187
# 1. First 5 nucleotides
188-
print("First 5 nucleotides:", dna_seq[:5]) # array([0, 1, 2, 3, 0]) = "ACGTA"
188+
print("First 5 nucleotides:", dna_seq[:5])
189189

190190
# 2. Last 4 nucleotides
191-
print("Last 4 nucleotides:", dna_seq[-4:]) # array([0, 0, 2, 3]) = "AAGT"
191+
print("Last 4 nucleotides:", dna_seq[-4:])
192192

193193
# 3. Every third nucleotide
194-
print("Every third nucleotide:", dna_seq[::3]) # array([0, 0, 3, 0, 3]) = "AATA"
194+
print("Every third nucleotide:", dna_seq[::3])
195195

196196
# 4. Subsequence from position 6 to 10
197-
print("Subsequence pos 6-10:", dna_seq[6:11]) # array([1, 2, 3, 3, 2]) = "CGTTG"
197+
print("Subsequence pos 6-10:", dna_seq[6:11])
198198
# Note: Upper bound is exclusive in slicing, so we use 11 to include position 10
199199
```
200200

201+
Output
202+
203+
```none
204+
First 5 nucleotides: [0 1 2 3 0]
205+
Last 4 nucleotides: [0 0 2 3]
206+
Every third nucleotide: [0 3 1 3 0 3]
207+
Subsequence pos 6-10: [1 2 3 3 2]
208+
```
209+
201210
:::
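The inline comments removed in this hunk spelled each slice out as letters. A small sketch of how the integer codes could be decoded to check the printed output, assuming the encoding A=0, C=1, G=2, T=3 implied by the `"ACGTAAC…` comment on `dna_seq`; `alphabet` and `decode` are hypothetical helper names, not part of the course material.

```python
import numpy as np

dna_seq = np.array([0, 1, 2, 3, 0, 0, 1, 2, 3, 3, 2, 1, 0, 0, 2, 3])

# Assumed encoding: A=0, C=1, G=2, T=3.
alphabet = np.array(list("ACGT"))


def decode(codes):
    """Map an array of integer codes back to a nucleotide string."""
    return "".join(alphabet[codes])


print(decode(dna_seq[:5]))    # ACGTA
print(decode(dna_seq[-4:]))   # AAGT
print(decode(dna_seq[::3]))   # ATCTAT
print(decode(dna_seq[6:11]))  # CGTTG
```

Indexing `alphabet` with the slice is itself an instance of integer-array indexing: each code selects the letter at that position.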
202211

203212
:::{exercise}
@@ -220,31 +229,43 @@ gene_expr = np.array([
220229
**Tasks:**
221230

222231
1. Extract the expression values for Gene 3
223-
2. Extract the expression values for all genes under condition 4 (fifth column)
224-
3. Extract a sub-matrix containing Genes 2-4 under conditions 2-3
225-
4. Find the expression value for Gene 5 under condition 2
232+
2. Extract the expression values for all genes in the fifth column
233+
3. Extract a sub-matrix containing Genes 2-4 under columns 2-3
234+
4. Find the expression value for Gene 5 under column 2
226235

227236
:::
228237

229238
:::{solution}
230239

231240
```python
232241
# 1. Expression values for Gene 3
233-
print("Gene 3 expression:", gene_expr[2]) # array([8.4, 7.5, 9.2, 8.1, 10.5])
242+
print("Gene 3 expression:", gene_expr[2])
234243
# Alternative: gene_expr[2, :]
235244

236-
# 2. Expression values for all genes under condition 4
237-
print("Condition 4 expression:", gene_expr[:, 4]) # array([25.3, 19.7, 10.5, 36.2, 18.2])
245+
# 2. Expression values for all genes under column 5
246+
print("Condition 4 expression:", gene_expr[:, 4])
238247

239-
# 3. Sub-matrix of Genes 2-4 under conditions 2-3
240-
print("Sub-matrix (Genes 2-4, Conditions 2-3):")
248+
# 3. Sub-matrix of Genes 2-4 under columns 2-3
249+
print("Sub-matrix (Genes 2-4, columns 2-3):")
241250
print(gene_expr[1:4, 1:3])
242251
# array([[38.1, 29.6],
243252
# [7.5, 9.2],
244253
# [29.8, 27.5]])
245254

246-
# 4. Expression value for Gene 5 under condition 2
247-
print("Gene 5, Condition 2:", gene_expr[4, 1]) # 19.8
255+
# 4. Expression value for Gene 5 under column 2
256+
print("Gene 5, columns 2:", gene_expr[4, 1])
257+
```
258+
259+
Output
260+
261+
```none
262+
Gene 3 expression: [ 8.4 7.5 9.2 8.1 10.5]
263+
Condition 4 expression: [25.3 19.7 10.5 36.2 18.2]
264+
Sub-matrix (Genes 2-4, columns 2-3):
265+
[[38.1 29.6]
266+
[ 7.5 9.2]
267+
[29.8 27.5]]
268+
Gene 5, columns 2: 19.8
248269
```
249270

250271
:::
@@ -253,7 +274,7 @@ print("Gene 5, Condition 2:", gene_expr[4, 1]) # 19.8
253274

254275
## Exercise 3: Multi-sequence Alignment Analysis (2-3 minutes)
255276

256-
Consider a simplified alignment scoring matrix where each row represents a protein sequence and each column represents a position in the alignment:
277+
Consider a simplified alignment scoring matrix where each row represents a sequence, each column represents a position in the alignment, and each entry records a match (1) or mismatch (0):
257278

258279
```python
259280
import numpy as np
@@ -280,79 +301,33 @@ alignment_scores = np.array([
280301
# 1. Positions where all sequences match
281302
all_match = np.all(alignment_scores == 1, axis=0)
282303
print("Positions where all sequences match:", np.where(all_match)[0])
283-
# array([3]) - only position 3 has all matches
284304

285305
# 2. Scores for positions 3-7 for all sequences
286306
print("Positions 3-7 scores:")
287307
print(alignment_scores[:, 3:8])
288-
# array([[1, 0, 1, 0, 0],
289-
# [1, 0, 0, 1, 1],
290-
# [1, 1, 0, 0, 1],
291-
# [1, 1, 1, 0, 0]])
292308

293309
# 3. Matching pattern for Sequence 3
294310
seq3_matches = alignment_scores[2] == 1
295311
print("Sequence 3 match positions:", np.where(seq3_matches)[0])
296-
# array([1, 2, 3, 4, 7])
297312

298313
# 4. Sub-alignment of first two sequences for last five positions
299314
print("Sub-alignment (Seq 1-2, last 5 positions):")
300315
print(alignment_scores[0:2, 5:])
301-
# array([[1, 0, 0, 1, 1],
302-
# [0, 1, 1, 0, 1]])
303316
```
304317

305-
:::
306-
307-
:::{exercise}
308-
309-
## Exercise 4: Combining Indexing and Boolean Operations (2-3 minutes)
310-
311-
Using the gene expression matrix from Exercise 2:
312-
313-
```python
314-
import numpy as np
315-
gene_expr = np.array([
316-
[15.2, 21.5, 18.9, 11.8, 25.3], # Gene 1
317-
[42.3, 38.1, 29.6, 33.2, 19.7], # Gene 2
318-
[8.4, 7.5, 9.2, 8.1, 10.5], # Gene 3
319-
[31.6, 29.8, 27.5, 34.9, 36.2], # Gene 4
320-
[17.3, 19.8, 22.5, 21.3, 18.2] # Gene 5
321-
])
322-
```
323-
324-
**Tasks:**
325-
326-
1. Find all expression values greater than 30
327-
2. Identify which genes have at least one expression value greater than 30
328-
3. Create a boolean mask showing positions where expression is between 15 and 25
329-
4. Extract all expression values from condition 2 that are less than 20
330-
331-
:::
332-
333-
:::{solution}
334-
335-
```python
336-
# 1. Expression values greater than 30
337-
high_expr = gene_expr > 30
338-
print("Values > 30:", gene_expr[high_expr])
339-
# array([42.3, 38.1, 31.6, 33.2, 34.9, 36.2])
340-
341-
# 2. Genes with at least one expression value > 30
342-
genes_with_high_expr = np.any(gene_expr > 30, axis=1)
343-
print("Genes with expression > 30:", np.where(genes_with_high_expr)[0])
344-
# array([1, 3]) - Gene 2 and Gene 4 (indices 1 and 3)
345-
346-
# 3. Boolean mask for expression between 15 and 25
347-
mid_range_expr = (gene_expr >= 15) & (gene_expr <= 25)
348-
print("Expression between 15-25:")
349-
print(mid_range_expr)
350-
# Boolean matrix where True indicates values between 15-25
351-
352-
# 4. Expression values from condition 2 that are less than 20
353-
condition2_low = gene_expr[:, 1] < 20
354-
print("Condition 2 values < 20:", gene_expr[:, 1][condition2_low])
355-
# array([7.5, 19.8])
318+
Output
319+
320+
```none
321+
Positions where all sequences match: [3]
322+
Positions 3-7 scores:
323+
[[1 0 1 0 0]
324+
[1 0 0 1 1]
325+
[1 1 0 0 1]
326+
[1 1 1 0 0]]
327+
Sequence 3 match positions: [1 2 3 4 7]
328+
Sub-alignment (Seq 1-2, last 5 positions):
329+
[[1 0 0 1 1]
330+
[0 1 1 0 1]]
356331
```
357332

358333
:::

content/4.Advance_indexing_filtering.md

+61-23
@@ -223,6 +223,11 @@ For large datasets, these techniques are drastically faster than traditional ite
223223

224224
Create a NumPy array of 20 random integers between 0 and 100. Then:
225225

226+
```python
227+
np.random.seed(42) # for reproducibility
228+
numbers = np.random.randint(0, 101, 20)
229+
```
230+
226231
* Create a boolean mask to identify all numbers divisible by 7
227232
* Use the mask to extract these numbers
228233
* Count how many numbers are divisible by 7
@@ -252,21 +257,15 @@ count = np.sum(mask) # True values are treated as 1, False as 0
252257
print(f"Count of numbers divisible by 7: {count}")
253258
```
254259

255-
:::
256-
257-
:::{exercise}
260+
Output
258261

259-
**Exercise 2 - Combined Conditions:**
260-
261-
Generate a NumPy array of 30 random integers between -50 and 50. Then:
262-
263-
Create a mask to find all numbers that are both positive and even
264-
Create another mask to find all numbers that are either negative or divisible by 5
265-
Apply both masks to the array and display the results
266-
267-
:::
268-
269-
:::{solution}
262+
```none
263+
Original array: [51 92 14 71 60 20 82 86 74 74 87 99 23 2 21 52 1 87 29 37]
264+
Boolean mask: [False False True False False False False False False False False False
265+
False False True False False False False False]
266+
Numbers divisible by 7: [14 21]
267+
Count of numbers divisible by 7: 2
268+
```
270269

271270
:::
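As a possible complement to the solution above, the same mask can also report where the matches sit. This sketch hard-codes the array from the printed output so it does not depend on the random number generator; `positions` is a hypothetical variable name.

```python
import numpy as np

# Array copied from the printed output above (seed 42).
numbers = np.array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74,
                    87, 99, 23, 2, 21, 52, 1, 87, 29, 37])

mask = numbers % 7 == 0

# Indices of the True entries in the mask.
positions = np.flatnonzero(mask)
# Equivalent to np.sum(mask) for a boolean array.
count = np.count_nonzero(mask)

print("Positions:", positions)       # Positions: [ 2 14]
print("Values:", numbers[positions]) # Values: [14 21]
print("Count:", count)               # Count: 2
```

`np.flatnonzero(mask)` returns the same indices as `np.where(mask)[0]` for a one-dimensional array.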
272271

@@ -275,11 +274,12 @@ Apply both masks to the array and display the results
275274

276275
Create a 4x4 matrix of random integers between 1 and 20. Then:
277276

277+
```python
278+
np.random.seed(42)
279+
matrix = np.random.randint(1, 21, (4, 4))
280+
```
281+
278282
* Use np.where() to replace all odd numbers with -1 while keeping even numbers unchanged
279-
* Use np.where() again to create a new matrix where:
280-
* Numbers less than 10 remain the same
281-
* Numbers between 10 and 15 are replaced with 100
282-
* Numbers greater than 15 are replaced with 200
283283
:::
284284

285285
:::{solution}
@@ -296,12 +296,22 @@ odd_replaced = np.where(matrix % 2 == 0, matrix, -1)
296296
print("\nMatrix with odd numbers replaced by -1:")
297297
print(odd_replaced)
298298

299-
# Replace based on value ranges
300-
transformed = np.where(matrix < 10, matrix,
301-
np.where((matrix >= 10) & (matrix <= 15), 100, 200))
302-
print("\nMatrix with conditional replacements:")
303-
print(transformed)
299+
```
300+
301+
Output
304302

303+
```none
304+
Original matrix:
305+
[[ 7 20 15 11]
306+
[ 8 7 19 11]
307+
[11 4 8 3]
308+
[ 2 12 6 2]]
309+
310+
Matrix with odd numbers replaced by -1:
311+
[[-1 20 -1 -1]
312+
[ 8 -1 -1 -1]
313+
[-1 4 8 -1]
314+
[ 2 12 6 2]]
305315
```
306316

307317
:::
@@ -336,6 +346,13 @@ print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")
336346

337347
```
338348

349+
Output
350+
351+
```none
352+
DNA sequence: GCAGGCAAGTGGGGCACCCGTATCCTTTCCAACTTACAAGGGTCCCCGTT
353+
A: 10, T: 11, G: 13, C: 16
354+
```
355+
339356
:::
340357

341358
## Key Takeaways
@@ -361,6 +378,27 @@ print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")
361378
1. Filter samples by group (`iweak`/`istrong`)
362379
2. Match count matrix columns with sample IDs
363380

381+
***Workflow:***
382+
383+
```{mermaid}
384+
flowchart TD
385+
A[Load Sample Group Info] --> B{Filter by Group}
386+
B -->|iweak| C[Identify iweak samples]
387+
B -->|istrong| D[Identify istrong samples]
388+
389+
E[Load Count Matrix] --> F[Match columns with samples]
390+
391+
392+
classDef dataNode fill:#f9f9f9,stroke:#aaa,stroke-width:2px;
393+
classDef processNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
394+
classDef resultNode fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
395+
396+
class A,E dataNode;
397+
class B,C,D,F processNode;
398+
class N resultNode;
399+
```
400+
401+
364402
:::{exercise} Hands-on
365403

366404
```python
