A Toolkit of commonly used external commands
The following commands are very frequently used in shell scripts. Many of them are used
in the examples in these notes. This is just a brief recap -- see the man pages for details on usage.
The most useful are flagged with *.
Most of these commands will operate on one or more named files, or on a stream of
data from standard input if no files are named.
Listing, copying and moving files and directories
ls *
- List the contents of a directory, or list details of files and directories.
mkdir; rmdir *
- Make and Remove directories.
rm; cp; mv *
- Remove (delete), Copy and Move (rename) files.
touch *
- Update the last-modified timestamp on a file, to make it appear to have just been written.
If the file does not exist, a new zero-byte file is created, which is often useful to signify that
an event has occurred.
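A minimal sketch of these commands working together, assuming a POSIX sh and a scratch directory under $TMPDIR (or /tmp). The directory and file names (demo.$$, report.txt, job.done) are hypothetical.

```shell
# Scratch directory; $$ (this shell's PID) makes the name unique.
work=${TMPDIR:-/tmp}/demo.$$
mkdir "$work"

echo "first draft" > "$work/report.txt"
cp "$work/report.txt" "$work/report.bak"   # copy: two files now exist
mv "$work/report.bak" "$work/report.old"   # move (rename) the copy
rm "$work/report.old"                      # delete it again

# touch creates an empty marker file recording that a step has finished.
touch "$work/job.done"
[ -f "$work/job.done" ] && echo "job finished"

listing=$(ls "$work")                      # job.done and report.txt, one per line

# rmdir only removes empty directories, so delete the files first.
rm "$work/job.done" "$work/report.txt"
rmdir "$work"
```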
tee
- Make a duplicate copy of a data stream - used in pipelines to send one copy to a log file
and a second copy on to another program. (Think plumbing).
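A small sketch of the plumbing, assuming a hypothetical log file under /tmp: the same three lines end up both in the log and in the next program of the pipeline (here wc).

```shell
# tee writes one copy of its input to the named file and passes
# another copy on to standard output (the rest of the pipeline).
log=${TMPDIR:-/tmp}/tee_demo.$$.log

count=$(printf 'alpha\nbeta\ngamma\n' | tee "$log" | wc -l)

echo "lines passed on: $count"   # wc -l saw all 3 lines
cat "$log"                       # the log holds the same 3 lines
```

"tee -a" appends to the log instead of overwriting it, which suits a log shared by repeated runs.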
Displaying text, files or parts of files
echo *
- Echo the arguments to standard output -- used for messages from scripts.
Some versions of "sh", and all csh/ksh/bash shells, have "echo" built in.
Conflicts sometimes arise over the syntax for echoing a line with no trailing newline:
some versions use "\c" and some use the option "-n". To avoid these problems, ksh also provides the "print"
command for output.
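Another portable way around the "\c" versus "-n" problem is the "printf" command, which is specified by POSIX and available in sh, ksh and bash. A minimal sketch (the strings are only sample text):

```shell
# printf writes only the newlines present in its format string.
printf '%s has %d entries\n' "logdir" 3   # prints: logdir has 3 entries
printf 'Working...'                       # no newline: the next output joins this line
printf ' done\n'                          # completes the line: Working... done

# Counting bytes makes the difference visible: echo appends a newline.
with_nl=$(echo "abc" | wc -c)             # 4
without=$(printf 'abc' | wc -c)           # 3
```

Pass data as arguments to "%s" rather than embedding it in the format string, since any "%" in the data would otherwise be interpreted as a format directive.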
cat *
- Copy and concatenate files; display contents of a file
head, tail *
- Display the beginning of a file, or the end of it.
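A short sketch using a generated 10-line sample file (the file name is hypothetical). Combining head and tail picks out a single line from the middle of a file.

```shell
f=${TMPDIR:-/tmp}/lines.$$
printf 'line %d\n' 1 2 3 4 5 6 7 8 9 10 > "$f"   # format reused per argument

cat "$f" "$f" > "$f.twice"           # concatenate: 20 lines
first=$(head -n 3 "$f")              # line 1 .. line 3
last=$(tail -n 2 "$f")               # line 9 and line 10
fifth=$(head -n 5 "$f" | tail -n 1)  # just line 5

# tail -f "$logfile" keeps printing lines as they are appended (Ctrl-C to stop).
```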
cut
- Extract selected fields from each line of a file. Often awk is easier to use, even though it is
a more complex program.
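A sketch comparing cut and awk on sample records (the record contents are made up, in the style of /etc/passwd). It also shows why awk is often easier: cut treats every single delimiter as a field boundary.

```shell
rec='alice:x:1001:100:Alice Smith:/home/alice:/bin/sh'

user=$(echo "$rec" | cut -d: -f1)    # field 1: alice
home=$(echo "$rec" | cut -d: -f6)    # field 6: /home/alice
ids=$(echo "$rec" | cut -d: -f3,4)   # fields 3 and 4: 1001:100

# With runs of spaces, cut -d' ' sees empty fields between adjacent spaces,
# while awk splits on runs of whitespace by default.
line='root     tty1    Jan  5'
bad=$(echo "$line" | cut -d' ' -f2)           # empty
second=$(echo "$line" | awk '{ print $2 }')   # tty1
```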
Compression and archiving
compress; gzip, zip; tar *
- Various utilities to compress/uncompress individual files, combine multiple files into a single archive, or
do both.
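A sketch of archiving and compressing a directory, then restoring it, assuming tar and gzip are installed. The directory names are hypothetical. tar and gzip are joined by a pipe, which works with any tar that can read and write standard input via "-".

```shell
src=${TMPDIR:-/tmp}/arc.$$
mkdir -p "$src/data"
echo "hello" > "$src/data/a.txt"
echo "world" > "$src/data/b.txt"

# tar combines the files into one archive on stdout; gzip compresses it.
( cd "$src" && tar cf - data ) | gzip -c > "$src/data.tar.gz"

# Decompress to stdout and list the archived pathnames.
gzip -dc "$src/data.tar.gz" | tar tf -

# Extract into a separate directory.
mkdir "$src/restore"
gzip -dc "$src/data.tar.gz" | ( cd "$src/restore" && tar xf - )
```

GNU and BSD tar also accept a "z" flag ("tar czf", "tar xzf") that runs the gzip step for you.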
Sorting and searching for patterns
sort *
- Sort data alphabetically or numerically.
grep *
- Search a file for lines containing character patterns. The patterns can be simple fixed text, or very complex
regular expressions.
The name comes from the "g/re/p" command (globally search for a regular expression and print the matching lines)
in the Unix line editors ed and ex, which was used frequently enough to warrant getting its own program.
uniq *
- Remove adjacent duplicate lines (so the input is usually sorted first), or count repeated lines with -c.
wc *
- Count lines, words and characters in a file.
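A sketch of the classic "count and rank" pipeline, run on a made-up log whose format (client address, then status code) is only an assumption for the example.

```shell
log=${TMPDIR:-/tmp}/access.$$
cat > "$log" <<'EOF'
10.0.0.2 200
10.0.0.1 404
10.0.0.2 200
10.0.0.3 500
10.0.0.2 404
10.0.0.1 200
EOF

errors=$(grep -c ' [45][0-9][0-9]$' "$log")   # lines with a 4xx/5xx status: 3
total=$(wc -l < "$log")                       # 6 lines in all
clients=$(cut -d' ' -f1 "$log" | sort -u | wc -l)   # 3 distinct clients

# uniq only collapses adjacent duplicates, so sort first; -c prefixes each
# line with its count, and sort -rn puts the busiest client first.
top=$(cut -d' ' -f1 "$log" | sort | uniq -c | sort -rn | head -n 1)
```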
System information (users, processes, time)
date *
- Display the current date and time (flexible format). Useful for conditional execution based on
time, and for timestamping output.
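A sketch of both uses. The "+" argument is a format string of % codes; the backup file name and the 08:00-18:00 window are hypothetical.

```shell
stamp=$(date '+%Y-%m-%d %H:%M:%S')      # e.g. for prefixing log messages
backup="backup.$(date '+%Y%m%d').tar"   # date-stamped file name
echo "$stamp starting job"

# Conditional execution by time of day: %H is the hour, 00-23.
hour=$(date '+%H')
if [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]; then
    mode="daytime"
else
    mode="overnight"
fi
```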
ps *
- List running processes.
kill *
- Send a signal (interrupt) to a running process.
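A sketch of finding and stopping a process, using a throwaway "sleep" as the hypothetical long-running job. "kill -0" sends no signal at all; it only reports whether the process still exists.

```shell
sleep 300 &               # a long-running background job
pid=$!                    # $! holds the PID of the last background job

ps -p "$pid"              # show just that process (ps -ef lists them all)
kill "$pid"               # default signal is TERM, asking it to exit
wait "$pid" 2>/dev/null   # reap the terminated job

if kill -0 "$pid" 2>/dev/null; then
    state="running"
else
    state="gone"
fi
```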
id
- Print the user name, UID and group(s) of the current user (e.g. to detect privileged users before
attempting to run programs which may fail with permission errors).
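A sketch of the privilege check described above. "id -u" prints only the numeric UID, and root is always UID 0; the messages are illustrative.

```shell
uid=$(id -u)
if [ "$uid" -eq 0 ]; then
    echo "running as root: privileged commands will work"
else
    echo "running as uid $uid: some commands may fail with permission errors"
fi
```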
who
- Display who is logged on the system, and from where they logged in.
uname *
- Display information about the system, OS version, hardware architecture etc.
mail *
- Send mail, from a file or standard input, to named recipients. Since scripts are often used to automate
long-running background jobs, sending notification of completion by mail is a common trick.
logger
- Place a message in the central system logging facility (syslog). Scripts can tag messages
with the same facility and priority levels available to compiled programs.
Conditional tests
test; [ *
- The conditional test, used extensively in scripts, is also an external program which evaluates
the expression given as its arguments and returns an exit status of true (0) or false (1). The name "[" is a
link to the "test" program, so a line like:
if [ -w logfile ]
actually runs a program "[" with the three arguments "-w", "logfile" and "]", and returns a true/false status to the "if"
command.
In ksh and most newer versions of sh, "[" is replaced with a compatible internal command, but the
argument parsing is performed as if it were an external command.
Ksh also provides the internal "[[" operator, with simplified syntax.
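A sketch of common tests in POSIX sh (the file name is hypothetical). Because "[" is parsed like a command, every operator and operand must be a separate word, and variables should be quoted.

```shell
f=${TMPDIR:-/tmp}/test_demo.$$
touch "$f"

# The exit status of "[" (0 = true) is what "if" acts on.
if [ -w "$f" ]; then
    writable=yes
fi

[ -f "$f" ] && kind=file                            # file test
[ "$kind" = "file" ] && echo "string test passed"   # string comparison
[ 10 -gt 9 ] && echo "numeric test passed"          # numeric comparison

# Unquoted, an empty $empty would vanish, leaving "[ = x ]", which is a
# syntax error rather than simply false.
empty=""
[ "$empty" = "x" ] || echo "empty string compared safely"

# The result is also available afterwards in $?.
[ -d "$f" ]
status=$?    # 1: a plain file is not a directory
rm -f "$f"
```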
Stream Editing
awk *
- A pattern matching and data manipulation utility, which has its own scripting language. It also duplicates
much functionality from 'sed', 'grep', 'cut', 'wc', etc.
Complex scripts can be written entirely using awk, but it is
frequently used just to extract fields from lines of a file (similar to 'cut').
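A sketch of typical awk one-liners on made-up data (a name and a size in KB per line): field extraction, a pattern with an action, and a running total printed in an END block.

```shell
data='report 120
image 2048
notes 4
archive 990'

names=$(echo "$data" | awk '{ print $1 }')            # first field, like cut
big=$(echo "$data" | awk '$2 > 100 { print $1 }')     # only lines matching the pattern
total=$(echo "$data" | awk '{ sum += $2 } END { print sum }')   # 3162

# -F sets the field separator (like cut -d); $NF is the last field.
shell=$(echo 'alice:x:1001:100::/home/alice:/bin/ksh' | awk -F: '{ print $NF }')
```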
sed *
- Stream Editor. A flexible editor which operates by applying editing rules to every line in a data stream
in turn.
Since it makes a single pass through the file, keeping only a few lines in memory at once,
it can be used on arbitrarily large data sets. It is mostly used for global search and replace operations.
It can do most of what 'tr', 'grep' and 'cut' do, but is more complicated to use.
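A sketch of common sed edits on a few sample lines: global versus first-match substitution, grep-like printing, deletion by line number, and several commands in one pass.

```shell
text='colour of the colour chart
grey areas
no change here'

us=$(echo "$text" | sed 's/colour/color/g')     # every match on each line
first=$(echo "$text" | sed 's/colour/color/')   # first match on each line only
matches=$(echo "$text" | sed -n '/grey/p')      # -n plus /re/p behaves like grep
two=$(echo "$text" | sed '3d')                  # delete line 3
both=$(echo "$text" | sed -e 's/colour/color/g' -e 's/grey/gray/')
```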
tr
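- Transliterate: perform very simple single-character edits (translate, delete or squeeze characters) on a data stream.
Unlike most of these commands, tr reads only from standard input, not from named files.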
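A sketch of typical tr uses on sample strings. Input always arrives through a pipe (or a "<" redirection), since tr takes no file arguments.

```shell
upper=$(echo "hello world" | tr '[:lower:]' '[:upper:]')   # HELLO WORLD
squeezed=$(echo "a   b    c" | tr -s ' ')                  # a b c
digits=$(echo "tel: 555-0100" | tr -cd '0-9')              # 5550100
dirs=$(echo "/usr/bin:/bin" | tr ':' '\n')                 # one directory per line
unix=$(printf 'line\r\n' | tr -d '\r')                     # strip DOS carriage returns
```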
Finding and comparing files
find *
- Search the filesystem and find files matching certain criteria (name pattern, age, owner, size,
last modified etc.)
xargs *
- Read arguments (typically filenames) from standard input and run a named command with them.
Xargs is often used in combination
with "find" to apply some command to all the files matching certain criteria. Since "find" may result in a very
large list of pathnames, using the results directly may overflow command line buffers. Xargs avoids this problem,
and is much more efficient than running a command on every pathname individually.
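A sketch of find and xargs together on a small hypothetical directory tree: search the log files for a pattern, check file age, and delete matches.

```shell
top=${TMPDIR:-/tmp}/find_demo.$$
mkdir -p "$top/a" "$top/b"
echo "ERROR disk full" > "$top/a/one.log"
echo "all ok"          > "$top/b/two.log"
touch "$top/b/keep.txt"

# find prints matching pathnames; xargs packs as many as fit into each
# grep invocation, and grep -l names the files containing the pattern.
errs=$(find "$top" -type f -name '*.log' -print | xargs grep -l ERROR)

# Files not modified for more than 7 days (none: all were just created).
old=$(find "$top" -type f -mtime +7 -print)

# Plain xargs splits its input on whitespace, so names containing spaces
# need care; POSIX find's "-exec ... {} +" batches arguments the same way
# without that problem.
find "$top" -type f -name '*.log' -exec rm {} +
remaining=$(find "$top" -type f -print)
```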
diff *
- Compare two files and list the differences between them.
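A sketch on two hypothetical three-line files. Besides the listing, diff's exit status (0 identical, 1 different, 2 trouble such as a missing file) lets a script use it directly in an "if".

```shell
d=${TMPDIR:-/tmp}/diff_demo.$$
mkdir "$d"
printf 'alpha\nbeta\ngamma\n' > "$d/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$d/new.txt"

# "2c2" means line 2 changed; "<" lines come from the first file,
# ">" lines from the second.
diff "$d/old.txt" "$d/new.txt"

if diff "$d/old.txt" "$d/new.txt" >/dev/null; then
    result=same
else
    result=changed
fi
```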
basename pathname
- Returns the base filename portion of the named pathname, stripping off all the directories
dirname pathname
- Returns the directory portion of the named pathname, stripping off the filename
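A sketch with a sample pathname. basename also accepts an optional suffix to strip, and dirname returns "." for a name with no directory part.

```shell
path=/var/log/app/server.log.gz

name=$(basename "$path")       # server.log.gz
dir=$(dirname "$path")         # /var/log/app
stem=$(basename "$path" .gz)   # server.log
here=$(dirname server.log)     # .

# In a script, $(basename "$0") gives the script's own name for messages.
```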
Arithmetic and String Manipulation
expr *
- The "expr" command takes a numeric or text pattern expression as its arguments (each operator and operand
a separate word), evaluates it, and prints the result to stdout. The Bourne shell has no built-in arithmetic operators or string manipulation.
e.g.
expr 2 + 1
expr 2 '*' '(' 21 + 3 ')'
Used with text strings, "expr" can match regular expressions and extract subexpressions. Similar functionality
can be achieved with sed.
e.g.
expr SP99302L.Z00 : '[A-Z0-9]\{4\}\([0-9]\{3\}\)L\.*'
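A sketch of expr in a Bourne-style counting loop, plus the examples above with their results. The final line shows the $((...)) arithmetic that modern POSIX shells, including bash and ksh93, perform internally.

```shell
i=1
while [ "$i" -le 5 ]; do
    i=$(expr $i + 1)     # one expr process per step
done                     # i is now 6

area=$(expr 6 '*' 7)     # 42: "*" is quoted so the shell does not expand it

# ":" anchors the regex at the start of the string and prints the \( \)
# subexpression, or the length of the match if there is none.
num=$(expr SP99302L.Z00 : '[A-Z0-9]\{4\}\([0-9]\{3\}\)L\.*')   # 302
len=$(expr "hello" : '.*')                                     # 5

j=$((i * 2))             # 12, computed by the shell itself
```

Note that expr exits with status 1 when its result is 0 or empty, which can matter in an "if" or under "set -e".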
dc
- Desk Calculator - an RPN (reverse Polish notation) calculator, using arbitrary precision arithmetic and
user-specified bases. Useful for more complex arithmetic expressions than can be performed
internally or using expr.
bc
- Originally a preprocessor for dc, bc provides infix notation and a C-like syntax for
expressions and functions. Some modern implementations (such as GNU bc) are standalone programs.
Merging files
paste
- Merge lines from multiple files into tab-delimited columns.
join
- Perform a join (in the relational database sense) of lines in two sorted input files.
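A sketch contrasting the two on hypothetical files keyed by a numeric ID, both already sorted on that key. join matches lines by key, like an SQL inner join; paste simply glues line N of each file together.

```shell
d=${TMPDIR:-/tmp}/join_demo.$$
mkdir "$d"
printf '101 alice\n102 bob\n103 carol\n' > "$d/names"
printf '101 sales\n103 admin\n104 ops\n' > "$d/depts"

# Key first, then the remaining fields from each file; keys 102 and 104
# appear in only one file, so they are dropped.
joined=$(join "$d/names" "$d/depts")   # 101 alice sales / 103 carol admin

# Line 1 of each file, separated by a tab, regardless of content.
pasted=$(paste "$d/names" "$d/depts" | head -n 1)
```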