This idea of linking programs together is why Unix has been so successful. Instead of creating enormous programs that try to do many different things, Unix programmers focus on creating lots of simple tools that each do one job well, and that work well with each other. This programming model is called ‘pipes and filters’.
We’ve already seen pipes; a filter is a program like wc or sort that transforms a stream of input into a stream of output. Almost all of the standard Unix tools can work this way: unless told to do otherwise, they read from standard input, do something with what they’ve read, and write to standard output.
The key is that any program that reads lines of text from standard input and writes lines of text to standard output can be combined with every other program that behaves this way as well. You can and should write your programs this way so that you and other people can put those programs into pipes to multiply their power.
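For example, assuming your current directory contains several .txt files, three small filters can be chained into a single pipeline that reports which file has the fewest lines:

$ wc -l *.txt | sort -n | head -n 1

Here wc -l counts the lines in each file, sort -n orders those counts numerically, and head -n 1 keeps only the smallest. None of these programs knows anything about the others; the pipe is what combines them into a new tool.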
A file called animals.txt (in the shell-lesson-data/data folder) contains the following data:
2012-11-05,deer
2012-11-05,rabbit
2012-11-05,raccoon
2012-11-06,rabbit
2012-11-06,deer
2012-11-06,fox
2012-11-07,rabbit
2012-11-07,bear
What text passes through each of the pipes and the final redirect in the pipeline below?
$ cat animals.txt | head -n 5 | tail -n 3 | sort -r > final.txt
Hint: build the pipeline up one command at a time to test your understanding.
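For example, the hint amounts to running each stage on its own and comparing the results:

$ cat animals.txt | head -n 5
$ cat animals.txt | head -n 5 | tail -n 3

Each new command shows which lines survive that step, so the effect of the final sort -r and the redirect becomes easy to predict.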
For the file animals.txt from the previous exercise, consider the following command:
$ cut -d , -f 2 animals.txt
The cut command is used to remove or ‘cut out’ certain sections of each line in the file, and cut expects the lines to be separated into columns by a Tab character. A character used in this way is called a delimiter. In the example above we use the -d option to specify the comma as our delimiter character. We have also used the -f option to specify that we want to extract the second field (column). This gives the following output:
deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
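Changing the value passed to -f selects a different column; for instance, this variant would print the date field instead of the animal names:

$ cut -d , -f 1 animals.txt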
The uniq command filters out adjacent matching lines in a file. How could you extend this pipeline (using uniq and another command) to find out what animals the file contains (without any duplicates in their names)?
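Note that ‘adjacent’ is the key word: uniq does not remove duplicates that are separated by other lines. A quick hand-made stream, piped through uniq, shows this:

$ printf 'deer\ndeer\nrabbit\ndeer\n' | uniq
deer
rabbit
deer

The final deer survives because it is not next to the first pair, which is a strong hint about what the other command in the pipeline needs to do.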
The file animals.txt contains 8 lines of data formatted as follows:
2012-11-05,deer
2012-11-05,rabbit
2012-11-05,raccoon
2012-11-06,rabbit
...
The uniq command has a -c option which gives a count of the number of times a line occurs in its input. Assuming your current directory is shell-lesson-data/data/, what command would you use to produce a table that shows the total count of each type of animal in the file?
1. sort animals.txt | uniq -c
2. sort -t, -k2,2 animals.txt | uniq -c
3. cut -d, -f 2 animals.txt | uniq -c
4. cut -d, -f 2 animals.txt | sort | uniq -c
5. cut -d, -f 2 animals.txt | sort | uniq -c | wc -l
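If you are unsure what -c produces, a small hand-made stream illustrates it (the exact spacing of the counts may vary between systems):

$ printf 'deer\ndeer\nrabbit\n' | uniq -c
      2 deer
      1 rabbit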
Suppose you want to delete your processed data files, and only keep your raw files and processing script to save storage. The raw files end in .dat and the processed files end in .txt. Which of the following would remove all the processed data files, and only the processed data files?
1. rm ?.txt
2. rm *.txt
3. rm * .txt
4. rm *.*
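Whichever pattern you choose, a cautious habit is to preview what a wildcard matches before deleting anything, since rm cannot be undone. The shell expands the pattern the same way for every command, so listing first is a reliable test:

$ ls *.txt

If ls shows exactly the files you expect to lose, the same pattern is safe to hand to rm.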