Nelle Nemo, a marine biologist,
has just returned from a six-month survey of the
North Pacific Gyre,
where she has been sampling gelatinous marine life in the
Great Pacific Garbage Patch.
She has 1520 samples that she’s run through an assay machine to measure the relative abundance
of 300 proteins.
She needs to run these 1520 files through an imaginary program called goostats.sh
that she inherited.
On top of this huge task, she has to write up results by the end of the month so her paper
can appear in a special issue of Aquatic Goo Letters.
The bad news is that if she has to run goostats.sh
by hand using a GUI,
she’ll have to select and open a file 1520 times.
If goostats.sh
takes 30 seconds to run each file, the whole process will take more than 12 hours
of Nelle’s attention.
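(1520 files × 30 seconds per file comes to 45,600 seconds, or roughly 12.7 hours.)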
With the shell, Nelle can instead assign her computer this mundane task while she focuses
her attention on writing her paper.
The next few lessons will explore the ways Nelle can achieve this.
More specifically,
they explain how she can use a command shell to run the goostats.sh
program,
using loops to automate the repetitive steps of entering file names,
so that her computer can work while she writes her paper.
As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.
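As a preview, here is a minimal sketch of the kind of loop these lessons build towards. It assumes the sample files end in .txt and that goostats.sh takes an input file and an output file name as its two arguments; both details are introduced step by step later on.

$ for datafile in *.txt
> do
>     bash goostats.sh $datafile stats-$datafile
> done

Each pass through the loop runs goostats.sh on one file and writes its results to a matching stats- file, with no clicking involved.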
In order to achieve her task, Nelle needs to know how to:

- navigate to a file/directory
- create a file/directory
- check the length of a file
- chain commands together
- retrieve a set of files
- iterate over files
- run a shell script containing her pipeline
First we need to download and unzip the dataset we are going to work with in this session.
$ curl -O https://swcarpentry.github.io/shell-novice/data/shell-lesson-data.zip
$ ls
shell-lesson-data.zip
$ unzip shell-lesson-data.zip
Note: if the unzip command is not available on your system, you may need to install it first. On Debian or Ubuntu, for example:
$ sudo apt install unzip
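Once the archive has been extracted, the data should appear in a new directory next to the zip file. In recent copies of this dataset the directory is named shell-lesson-data, but it is worth confirming the name with ls before moving into it:

$ ls
shell-lesson-data  shell-lesson-data.zip
$ cd shell-lesson-data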