Cleanup

pubudu · pubudu · commit 0098db94d497 · 2025-04-23T18:04:05.000+02:00
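
The Output blocks in this commit are pasted text, so they can silently disagree with the code, as the old `dna_seq[::3]` comment did. Below is a minimal sketch for re-checking the deterministic outputs. It assumes three inputs:

* the `dna_seq` array from the Exercise 1 hunk header,
* the `gene_expr` matrix listed in the removed Exercise 4 block,
* the DNA string shown in the new nucleotide-count Output.

The seeded `np.random` outputs are not covered, and the `checks` dictionary is only an illustrative helper.

```python
import numpy as np

# Array from the Exercise 1 hunk header in 3.Indexing_and_Slicing.md
dna_seq = np.array([0, 1, 2, 3, 0, 0, 1, 2, 3, 3, 2, 1, 0, 0, 2, 3])

# Matrix as listed in the Exercise 4 block removed from the same file
gene_expr = np.array([
    [15.2, 21.5, 18.9, 11.8, 25.3],  # Gene 1
    [42.3, 38.1, 29.6, 33.2, 19.7],  # Gene 2
    [8.4,  7.5,  9.2,  8.1,  10.5],  # Gene 3
    [31.6, 29.8, 27.5, 34.9, 36.2],  # Gene 4
    [17.3, 19.8, 22.5, 21.3, 18.2],  # Gene 5
])

# Sequence printed in the new Output block of 4.Advance_indexing_filtering.md
dna = "GCAGGCAAGTGGGGCACCCGTATCCTTTCCAACTTACAAGGGTCCCCGTT"

# Hypothetical helper: label -> (computed value, value shown in the Output block)
checks = {
    "dna_seq[:5]": (dna_seq[:5], [0, 1, 2, 3, 0]),
    "dna_seq[-4:]": (dna_seq[-4:], [0, 0, 2, 3]),
    "dna_seq[::3]": (dna_seq[::3], [0, 3, 1, 3, 0, 3]),
    "dna_seq[6:11]": (dna_seq[6:11], [1, 2, 3, 3, 2]),
    "gene_expr[2]": (gene_expr[2], [8.4, 7.5, 9.2, 8.1, 10.5]),
    "gene_expr[:, 4]": (gene_expr[:, 4], [25.3, 19.7, 10.5, 36.2, 18.2]),
    "gene_expr[1:4, 1:3]": (gene_expr[1:4, 1:3],
                            [[38.1, 29.6], [7.5, 9.2], [29.8, 27.5]]),
    "gene_expr[4, 1]": (gene_expr[4, 1], 19.8),
    "A/T/G/C counts": ([dna.count(b) for b in "ATGC"], [10, 11, 13, 16]),
}

mismatches = []
for label, (computed, shown) in checks.items():
    # Compare shapes first, then values element-wise (works for ints and floats)
    ok = np.shape(computed) == np.shape(shown) and np.allclose(computed, shown)
    print(f"{'OK ' if ok else 'BAD'} {label}")
    if not ok:
        mismatches.append(label)
```

Running it should print `OK` on every line. The previous inline comment for `dna_seq[::3]` (`[0, 0, 3, 0, 3]`) would have been flagged, because the slice actually has six elements.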
diff --git a/content/2.NumPy_Data_Types.md b/content/2.NumPy_Data_Types.md
@@ -360,46 +360,6 @@ Understanding the distinctions between Python's general-purpose types and NumPy'
 * Apply an alternative analytical approach using Z-ratio methodology to complement standard differential expression tools like DESeq
 * Ranks immune-related genes based on their relative expression differences between the patient groups
 
-***Workflow:***
-
-```{mermaid}
-flowchart TD
-    A[Load Sample Group Info] --> B{Filter by Group}
-    B -->|iweak| C[Identify iweak samples]
-    B -->|istrong| D[Identify istrong samples]
-    
-    E[Load Count Matrix] --> F[Match columns with samples]
-    
-    F --> G[Convert to numeric]
-    G --> H[Log2 transformation]
-    
-    C --> F
-    D --> F
-    
-    H --> I1[Calculate iweak mean & std]
-    H --> I2[Calculate istrong mean & std]
-    
-    I1 --> J1[Compute Z-scores for iweak]
-    I2 --> J2[Compute Z-scores for istrong]
-    
-    J1 --> K[Calculate Z-score difference]
-    J2 --> K
-    
-    K --> L[Calculate standard deviation]
-    
-    L --> M[Compute Z-ratio]
-    
-    M --> N[Rank genes by Z-ratio]
-    
-    classDef dataNode fill:#f9f9f9,stroke:#aaa,stroke-width:2px;
-    classDef processNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
-    classDef resultNode fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
-    
-    class A,E dataNode;
-    class B,C,D,F,G,H,I1,I2,J1,J2,K,L,M processNode;
-    class N resultNode;
-```
-
 ***Dataset description:***
 
 * `test_data/Sample_group_info.csv`
diff --git a/content/3.Indexing_and_Slicing.md b/content/3.Indexing_and_Slicing.md
@@ -185,19 +185,28 @@ dna_seq = np.array([0, 1, 2, 3, 0, 0, 1, 2, 3, 3, 2, 1, 0, 0, 2, 3])  # "ACGTAAC
 
 ```python
 # 1. First 5 nucleotides
-print("First 5 nucleotides:", dna_seq[:5])  # array([0, 1, 2, 3, 0]) = "ACGTA"
+print("First 5 nucleotides:", dna_seq[:5])
 
 # 2. Last 4 nucleotides
-print("Last 4 nucleotides:", dna_seq[-4:])  # array([0, 0, 2, 3]) = "AAGT"
+print("Last 4 nucleotides:", dna_seq[-4:])
 
 # 3. Every third nucleotide
-print("Every third nucleotide:", dna_seq[::3])  # array([0, 0, 3, 0, 3]) = "AATA"
+print("Every third nucleotide:", dna_seq[::3])
 
 # 4. Subsequence from position 6 to 10
-print("Subsequence pos 6-10:", dna_seq[6:11])  # array([1, 2, 3, 3, 2]) = "CGTTG"
+print("Subsequence pos 6-10:", dna_seq[6:11])
 # Note: Upper bound is exclusive in slicing, so we use 11 to include position 10
 ```
 
+Output
+
+```none
+First 5 nucleotides: [0 1 2 3 0]
+Last 4 nucleotides: [0 0 2 3]
+Every third nucleotide: [0 3 1 3 0 3]
+Subsequence pos 6-10: [1 2 3 3 2]
+```
+
 :::
 
 :::{exercise}
@@ -220,31 +229,43 @@ gene_expr = np.array([
 **Tasks:**
 
 1. Extract the expression values for Gene 3
-2. Extract the expression values for all genes under condition 4 (fifth column)
-3. Extract a sub-matrix containing Genes 2-4 under conditions 2-3
-4. Find the expression value for Gene 5 under condition 2
+2. Extract the expression values for all genes in the fifth column
+3. Extract a sub-matrix containing Genes 2-4 in columns 2-3
+4. Find the expression value for Gene 5 in column 2
 
 :::
 
 :::{solution}
 
 ```python
 # 1. Expression values for Gene 3
-print("Gene 3 expression:", gene_expr[2])  # array([8.4, 7.5, 9.2, 8.1, 10.5])
+print("Gene 3 expression:", gene_expr[2])
 # Alternative: gene_expr[2, :]
 
-# 2. Expression values for all genes under condition 4
-print("Condition 4 expression:", gene_expr[:, 4])  # array([25.3, 19.7, 10.5, 36.2, 18.2])
+# 2. Expression values for all genes in column 5
+print("Column 5 expression:", gene_expr[:, 4])  # index 4 = fifth column
 
-# 3. Sub-matrix of Genes 2-4 under conditions 2-3
-print("Sub-matrix (Genes 2-4, Conditions 2-3):")
+# 3. Sub-matrix of Genes 2-4 in columns 2-3
+print("Sub-matrix (Genes 2-4, columns 2-3):")
 print(gene_expr[1:4, 1:3])
 # array([[38.1, 29.6],
 #        [7.5,  9.2],
 #        [29.8, 27.5]])
 
-# 4. Expression value for Gene 5 under condition 2
-print("Gene 5, Condition 2:", gene_expr[4, 1])  # 19.8
+# 4. Expression value for Gene 5 in column 2
+print("Gene 5, column 2:", gene_expr[4, 1])
+```
+
+Output
+
+```none
+Gene 3 expression: [ 8.4  7.5  9.2  8.1 10.5]
+Column 5 expression: [25.3 19.7 10.5 36.2 18.2]
+Sub-matrix (Genes 2-4, columns 2-3):
+[[38.1 29.6]
+ [ 7.5  9.2]
+ [29.8 27.5]]
+Gene 5, column 2: 19.8
 ```
 
 :::
@@ -253,7 +274,7 @@ print("Gene 5, Condition 2:", gene_expr[4, 1])  # 19.8
 
 ## Exercise 3: Multi-sequence Alignment Analysis (2-3 minutes)
 
-Consider a simplified alignment scoring matrix where each row represents a protein sequence and each column represents a position in the alignment:
+Consider a simplified alignment scoring matrix where each row represents a sequence, each column represents a position in the alignment, and each entry is 1 (match) or 0 (mismatch):
 
 ```python
 import numpy as np
@@ -280,79 +301,33 @@ alignment_scores = np.array([
 # 1. Positions where all sequences match
 all_match = np.all(alignment_scores == 1, axis=0)
 print("Positions where all sequences match:", np.where(all_match)[0]) 
-# array([3]) - only position 3 has all matches
 
 # 2. Scores for positions 3-7 for all sequences
 print("Positions 3-7 scores:")
 print(alignment_scores[:, 3:8])
-# array([[1, 0, 1, 0, 0],
-#        [1, 0, 0, 1, 1],
-#        [1, 1, 0, 0, 1],
-#        [1, 1, 1, 0, 0]])
 
 # 3. Matching pattern for Sequence 3
 seq3_matches = alignment_scores[2] == 1
 print("Sequence 3 match positions:", np.where(seq3_matches)[0])
-# array([1, 2, 3, 4, 7])
 
 # 4. Sub-alignment of first two sequences for last five positions
 print("Sub-alignment (Seq 1-2, last 5 positions):")
 print(alignment_scores[0:2, 5:])
-# array([[1, 0, 0, 1, 1],
-#        [0, 1, 1, 0, 1]])
 ```
 
-:::
-
-:::{exercise}
-
-## Exercise 4: Combining Indexing and Boolean Operations (2-3 minutes)
-
-Using the gene expression matrix from Exercise 2:
-
-```python
-import numpy as np
-gene_expr = np.array([
-    [15.2, 21.5, 18.9, 11.8, 25.3],  # Gene 1
-    [42.3, 38.1, 29.6, 33.2, 19.7],  # Gene 2
-    [8.4,  7.5,  9.2,  8.1,  10.5],  # Gene 3
-    [31.6, 29.8, 27.5, 34.9, 36.2],  # Gene 4
-    [17.3, 19.8, 22.5, 21.3, 18.2]   # Gene 5
-])
-```
-
-**Tasks:**
-
-1. Find all expression values greater than 30
-2. Identify which genes have at least one expression value greater than 30
-3. Create a boolean mask showing positions where expression is between 15 and 25
-4. Extract all expression values from condition 2 that are less than 20
-
-:::
-
-:::{solution}
-
-```python
-# 1. Expression values greater than 30
-high_expr = gene_expr > 30
-print("Values > 30:", gene_expr[high_expr])
-# array([42.3, 38.1, 31.6, 33.2, 34.9, 36.2])
-
-# 2. Genes with at least one expression value > 30
-genes_with_high_expr = np.any(gene_expr > 30, axis=1)
-print("Genes with expression > 30:", np.where(genes_with_high_expr)[0])
-# array([1, 3]) - Gene 2 and Gene 4 (indices 1 and 3)
-
-# 3. Boolean mask for expression between 15 and 25
-mid_range_expr = (gene_expr >= 15) & (gene_expr <= 25)
-print("Expression between 15-25:")
-print(mid_range_expr)
-# Boolean matrix where True indicates values between 15-25
-
-# 4. Expression values from condition 2 that are less than 20
-condition2_low = gene_expr[:, 1] < 20
-print("Condition 2 values < 20:", gene_expr[:, 1][condition2_low])
-# array([7.5, 19.8])
+Output
+
+```none
+Positions where all sequences match: [3]
+Positions 3-7 scores:
+[[1 0 1 0 0]
+ [1 0 0 1 1]
+ [1 1 0 0 1]
+ [1 1 1 0 0]]
+Sequence 3 match positions: [1 2 3 4 7]
+Sub-alignment (Seq 1-2, last 5 positions):
+[[1 0 0 1 1]
+ [0 1 1 0 1]]
 ```
 
 :::
diff --git a/content/4.Advance_indexing_filtering.md b/content/4.Advance_indexing_filtering.md
@@ -223,6 +223,11 @@ For large datasets, these techniques are drastically faster than traditional ite
 
 Create a NumPy array of 20 random integers between 0 and 100. Then:
 
+```python
+np.random.seed(42)  # for reproducibility
+numbers = np.random.randint(0, 101, 20)
+```
+
 * Create a boolean mask to identify all numbers divisible by 7
 * Use the mask to extract these numbers
 * Count how many numbers are divisible by 7
@@ -252,21 +257,15 @@ count = np.sum(mask)  # True values are treated as 1, False as 0
 print(f"Count of numbers divisible by 7: {count}")
 ```
 
-:::
-
-:::{exercise}
+Output
 
-**Exercise 2 - Combined Conditions:**
-
-Generate a NumPy array of 30 random integers between -50 and 50. Then:
-
-Create a mask to find all numbers that are both positive and even
-Create another mask to find all numbers that are either negative or divisible by 5
-Apply both masks to the array and display the results
-
-:::
-
-:::{solution}
+```none
+Original array: [51 92 14 71 60 20 82 86 74 74 87 99 23  2 21 52  1 87 29 37]
+Boolean mask: [False False  True False False False False False False False False False
+ False False  True False False False False False]
+Numbers divisible by 7: [14 21]
+Count of numbers divisible by 7: 2
+```
 
 :::
 
@@ -275,11 +274,12 @@ Apply both masks to the array and display the results
 
 Create a 4x4 matrix of random integers between 1 and 20. Then:
 
+```python
+np.random.seed(42)
+matrix = np.random.randint(1, 21, (4, 4))
+```
+
 * Use np.where() to replace all odd numbers with -1 while keeping even numbers unchanged
-* Use np.where() again to create a new matrix where:
-  * Numbers less than 10 remain the same
-  * Numbers between 10 and 15 are replaced with 100
-  * Numbers greater than 15 are replaced with 200
 :::
 
 :::{solution}
@@ -296,12 +296,22 @@ odd_replaced = np.where(matrix % 2 == 0, matrix, -1)
 print("\nMatrix with odd numbers replaced by -1:")
 print(odd_replaced)
 
-# Replace based on value ranges
-transformed = np.where(matrix < 10, matrix, 
-                       np.where((matrix >= 10) & (matrix <= 15), 100, 200))
-print("\nMatrix with conditional replacements:")
-print(transformed)
+```
+
+Output
 
+```none
+Original matrix:
+[[ 7 20 15 11]
+ [ 8  7 19 11]
+ [11  4  8  3]
+ [ 2 12  6  2]]
+
+Matrix with odd numbers replaced by -1:
+[[-1 20 -1 -1]
+ [ 8 -1 -1 -1]
+ [-1  4  8 -1]
+ [ 2 12  6  2]]
 ```
 
 :::
@@ -336,6 +346,13 @@ print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")
 
 ```
 
+Output
+
+```none
+DNA sequence: GCAGGCAAGTGGGGCACCCGTATCCTTTCCAACTTACAAGGGTCCCCGTT
+A: 10, T: 11, G: 13, C: 16
+```
+
 :::
 
 ## Key Takeaways
@@ -361,6 +378,27 @@ print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")
    1. Filter samples by group (`iweak`/`istrong`)
    2. Match count matrix columns with sample IDs
 
+***Workflow:***
+
+```{mermaid}
+flowchart TD
+    A[Load Sample Group Info] --> B{Filter by Group}
+    B -->|iweak| C[Identify iweak samples]
+    B -->|istrong| D[Identify istrong samples]
+    
+    E[Load Count Matrix] --> F[Match columns with samples]
+    
+    
+    classDef dataNode fill:#f9f9f9,stroke:#aaa,stroke-width:2px;
+    classDef processNode fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
+    classDef resultNode fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
+    
+    class A,E dataNode;
+    class B,C,D processNode;
+    class F resultNode;
+```
+
+
 :::{exercise} Hands-on
 
 ```python
diff --git a/content/5.Essential_array_operations.md b/content/5.Essential_array_operations.md