Commit 9324ca5

John Fonner authored and committed
updates, test data, and the modules section
1 parent 960c22d commit 9324ca5

26 files changed

+5205
-280
lines changed

shell/03-pipefilter.md

Lines changed: 14 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,15 @@
11

22
#### Objectives
3+
4+
---
5+
36
* Redirect a command's output to a file.
47
* Process a file instead of keyboard input using redirection.
58
* Construct command pipelines with two or more stages.
69
* Explain what usually happens if a program or pipeline isn't given any input to process.
710
* Explain Unix's "small pieces, loosely joined" philosophy.
811

12+
---
913

1014
Now that we know a few basic commands, we can finally look at the shell's most powerful feature:
1115
the ease with which it lets us combine existing programs in new ways.
@@ -22,7 +26,7 @@ Let's go into that directory with `cd` and run the command `wc *.pdb`.
2226
`wc` is the "word count" command: it counts the number of lines, words, and characters in files.
2327
The `*` in `*.pdb` matches zero or more characters, so the shell turns `*.pdb` into a complete list of `.pdb` files:
2428

25-
```unix
29+
```
2630
$ cd molecules
2731
$ wc *.pdb
2832
@@ -61,7 +65,7 @@ $ wc *.pdb
6165
6266
If we run `wc -l` instead of just `wc`, the output shows only the number of lines per file:
6367

64-
```unix
68+
```
6569
$ wc -l *.pdb
6670
6771
20 cubane.pdb
@@ -78,7 +82,7 @@ We can also use `-w` to get only the number of words, or `-c` to get only the nu
7882
Which of these files is shortest? It's an easy question to answer when there are only six files,
7983
but what if there were 6000? Our first step toward a solution is to run the command:
8084

81-
```unix
85+
```
8286
$ wc -l *.pdb > lengths
8387
```
8488

@@ -87,7 +91,7 @@ The shell will create the file if it doesn't exist, or overwrite the contents of
8791
(This is why there is no screen output: everything that `wc` would have printed has
8892
gone into the file `lengths` instead.) `ls lengths` confirms that the file exists:
8993

90-
```unix
94+
```
9195
$ ls lengths
9296
```
9397

@@ -102,7 +106,7 @@ $ cat lengths
102106
Now let's use the `sort` command to sort its contents. This does *not* change the file;
103107
instead, it sends the sorted result to the screen:
104108

105-
```unix
109+
```
106110
$ sort lengths
107111
```
108112

@@ -111,7 +115,7 @@ by putting `> sorted-lengths` after the command, just as we used `> lengths` to
111115
output of `wc` into `lengths`. Once we've done that, we can run another command
112116
called `head` to get the first few lines in `sorted-lengths`:
113117

114-
```unix
118+
```
115119
$ sort lengths > sorted-lengths
116120
$ head -1 sorted-lengths
117121
```
@@ -126,7 +130,7 @@ even once you understand what `wc`, `sort`, and `head` do,
126130
all those intermediate files make it hard to follow what's going on.
127131
We can make it easier to understand by running `sort` and `head` together:
128132

129-
```unix
133+
```
130134
$ sort lengths | head -1
131135
```
132136

@@ -138,7 +142,7 @@ we don't have to know or care.
138142

139143
We can use another pipe to send the output of `wc` directly to `sort`, which then sends its output to `head`:
140144

141-
```unix
145+
```
142146
$ wc -l *.pdb | sort | head -1
143147
```
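
The pipeline above can be tried outside the `molecules` directory with a small self-contained sketch. The scratch directory and the three tiny `.pdb` files are made up for illustration, and `sort -n` (numeric sort) is used here instead of plain `sort` so that the ordering does not depend on how `wc` pads its counts:

```shell
# Hypothetical scratch directory and file names, used only for illustration.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/three.pdb"
printf 'a\n'       > "$workdir/one.pdb"
printf 'a\nb\n'    > "$workdir/two.pdb"

# Stage 1: count lines per file. Stage 2: sort numerically. Stage 3: keep the first line.
shortest=$(cd "$workdir" && wc -l *.pdb | sort -n | head -1)
echo "$shortest"

# Clean up the scratch files.
rm -r "$workdir"
```

No intermediate `lengths` or `sorted-lengths` file is created; each stage reads the previous stage's output directly.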
144148

@@ -213,14 +217,14 @@ Nelle has run her samples through the assay machines
213217
and created 1520 files in the `north-pacific-gyre/2012-07-03` directory described earlier.
214218
As a quick sanity check, she types:
215219

216-
```unix
220+
```
217221
$ cd north-pacific-gyre/2012-07-03
218222
$ wc -l *.txt
219223
```
220224

221225
The output is 1520 lines that look like this:
222226

223-
```unix
227+
```
224228
300 NENE01729A.txt
225229
300 NENE01729B.txt
226230
300 NENE01736A.txt

shell/05-modules.md

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
Using Software Modules
2+
----------------------
3+
4+
---
5+
6+
#### Objectives
7+
* Find software through the module system
8+
* Load and unload modules
9+
* Load different versions of the same package
10+
* Understand prerequisites
11+
* Save new default modules
12+
13+
---
14+
15+
TACC maintains hundreds of software packages on its systems. If all the software were loaded at the same time, it would not only bloat the environment, but you would also run into problems with dependencies and versioning requirements. Do you need Python 2.7 or Python 3? What MPI libraries or compiler does your code need? To avoid these problems, TACC uses a module system (Lmod) to load only the software that you want, while still maintaining access to all available software packages. If you have already ssh'ed into a TACC system, let's explore the way modules work.
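
As a rough mental model (a simplified sketch, not how Lmod is actually implemented), loading a module mostly amounts to adjusting environment variables such as `PATH`, and unloading it undoes those changes. The install prefix below is hypothetical:

```shell
# Hypothetical install location; real module files decide the actual paths.
fake_prefix=/opt/apps/example-tool/1.0/bin

original_path=$PATH

# "Loading": put the package's bin directory at the front of PATH.
PATH="$fake_prefix:$PATH"
case ":$PATH:" in
  *":$fake_prefix:"*) loaded=yes ;;
  *)                  loaded=no ;;
esac
echo "after load: on PATH? $loaded"

# "Unloading": restore the previous PATH.
PATH=$original_path
case ":$PATH:" in
  *":$fake_prefix:"*) unloaded=no ;;
  *)                  unloaded=yes ;;
esac
echo "after unload: removed from PATH? $unloaded"
```

Because only the packages you ask for are added to the environment, two versions of the same tool never compete for the same spot on `PATH`.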
16+
17+
### Finding modules
18+
19+
```
20+
$ module list
21+
Currently Loaded Modules:
22+
1) TACC 3) Linux 5) cluster-paths 7) mvapich2/1.6 9) tar/1.22
23+
2) TACC-paths 4) cluster 6) intel/11.1 8) gzip/1.3.12
24+
```
25+
26+
Every ```module``` command on TACC systems starts with the word "module", followed by at least one argument that says what you want to do. If you have used git from the command line, this may remind you of how ```git clone``` and similar commands behave.
27+
28+
Typing ```module list``` lets us see what modules are currently loaded. By default, a number of helpful modules are already loaded that provide core capabilities for interacting with the cluster. What other modules can we load?
29+
30+
```
31+
$ module avail
32+
```
33+
34+
Get ready for lots of modules. Notice that the modules in this list are arranged in a hierarchy. We'll come back to that, but for now, let's look at a better way to search. There are actually two good ways.
35+
36+
```
37+
$ module spider bedtools
38+
39+
$ module key bedtools
40+
```
41+
42+
If we know what module we are looking for, it is easiest just to search for it using the ```module spider``` or ```module key``` commands. Spider gives you more information about the module and its prerequisites. Keyword (or "key" for short) gives only a compact description, which is easier to scan when the list of matching modules is long. Try this:
43+
44+
```
45+
$ module spider genomics
46+
47+
$ module key genomics
48+
```
49+
50+
In addition to searching the names of software packages, the "keyword" command lives up to its name by searching the description and keyword text that goes along with each module. It's not Google, but it is one way to discover packages related to your field.
51+
52+
Enough searching. Let's load some packages.
53+
54+
### Managing Loaded Modules
55+
56+
```
57+
$ module spider python
58+
$ module load python
59+
$ module list
60+
```
61+
62+
When multiple versions of a package exist, one of them is registered as the default. If you don't specify the version, the default is automatically loaded. As TACC updates software, they also update the default modules to be the newer, but stable, versions. If you need a specific version, be sure to specify it. For example:
63+
64+
```
65+
$ module load python/2.7.1
66+
$ module list
67+
```
68+
69+
What about modules with prerequisites?
70+
71+
```
72+
$ module load bedtools
73+
```
74+
75+
Bedtools requires a different compiler to be loaded so that it has access to the right libraries. By default, the "intel" compiler is loaded, but we need to swap over to "gcc". The normal approach doesn't work here:
76+
77+
```
78+
$ module load gcc
79+
80+
Lmod Error: You can only have one compiler module loaded at time.
81+
You already have intel loaded.
82+
To correct the situation, please enter the following command:
83+
84+
module swap intel gcc/4.4.5
85+
86+
Please submit a consulting ticket if you require additional assistance.
87+
```
88+
89+
The module system is saving us from having two compilers loaded at the same time. In this case, we can either:
90+
91+
```
92+
$ module unload intel
93+
$ module load gcc
94+
```
95+
96+
or, to be a little more concise:
97+
98+
```
99+
$ module swap intel gcc
100+
```
101+
102+
Now that the prerequisites are met, you can ```module load bedtools```.
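
Put together, the compiler swap and the load might look like the sketch below. The `command -v` guard and the `status` variable are additions for illustration, so that the snippet does nothing harmful on a machine without a module system. It assumes the `intel` compiler module is currently loaded, as in the default environment shown earlier:

```shell
# Only attempt the swap where a module command is actually available.
if command -v module >/dev/null 2>&1; then
    module swap intel gcc    # replace the default compiler with gcc
    module load bedtools     # prerequisites are now satisfied
    module list              # confirm what is loaded
    status=loaded
else
    echo "module command not found; run this on a system that provides Lmod"
    status=skipped
fi
```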
103+
104+
If things ever get messed up, and you just want to get back to the global system default, you can use:
105+
106+
```
107+
$ module restore system
108+
```
109+
110+
### Customizing the Default
111+
112+
One module I use all the time is ```git```. You may be tempted to put module commands in your startup scripts, but that can cause problems if it is not done correctly. A safe way to change your defaults is to use the ```module save``` command. Let's try this for git. First, make sure you have the default set loaded, then let's load git.
113+
114+
```
115+
$ module restore
116+
$ module load git
117+
```
118+
119+
Now, to save it as the default, we do this:
120+
121+
```
122+
$ module save
123+
```
124+
125+
The next time you log in to the system, these modules will be loaded.
126+
127+
## Challenges
128+
129+
130+
* What happens if you type ```module``` without any other arguments?
131+
* Try using the "ml" shorthand command. Does it work like "module list" or "module load"?
132+
* Load the module ```trinityrnaseq``` and discover its dependencies
