CSC128: Introduction to UNIX

Using awk


The awk Utility

The awk utility is similar to sed, in that it searches for and processes patterns in a file.

Usage:
awk -f program-file [file-list]
awk program [file-list]


If [file-list] is not specified, awk reads from stdin.

By default, awk writes to stdout.

If the -f option is used, then commands are read from program-file, a text file containing one or more awk commands.
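
The two usage forms and the stdin fallback can be sketched as follows. The file names sample.txt and second.awk are hypothetical, chosen only for this illustration, and the work is done in a scratch directory so nothing existing is overwritten.

```shell
# Work in a scratch directory (an assumption made for safety).
cd "$(mktemp -d)"

# Hypothetical sample input and a one-line awk program file.
printf 'one two\nthree four\n' > sample.txt
printf '{ print $2 }\n' > second.awk

# Form 1: the program is given on the command line.
inline=$(awk '{ print $2 }' sample.txt)

# Form 2: the same program is read from a file with -f.
from_file=$(awk -f second.awk sample.txt)

# No file-list: awk reads from stdin (redirected here from sample.txt).
from_stdin=$(awk -f second.awk < sample.txt)

printf '%s\n' "$inline" "$from_file" "$from_stdin"
```

All three invocations should print the second word of each line.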

Fields and Variables in awk
By default, awk considers each line a record, and each whitespace-separated string in that record a field.

Some variables are pre-defined within awk:
$1, $2, etc.
The contents of field 1, field 2, etc.  


$0
The contents of the entire record.  


NR
The ordinal record number of the current record.  


NF
The number of fields in the current record.  


FILENAME
The name of the file being input.  When input comes from stdin, some implementations set this to "-" and others leave it empty.


RS
The record separator (default is NEWLINE)


FS
The field separator (default is SPACE, which makes awk split fields on any run of spaces or tabs).

Note that some of these variables (such as $1 and $0) look like shell variables;  it is therefore a good idea to enclose your awk commands in single quotes ('), so that they are not expanded by the shell but are passed to awk as-is.
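
A minimal sketch of these variables and of the quoting issue follows. The file words.txt is hypothetical, and the BEGIN pattern used to set FS is the one described under "Commands in awk".

```shell
cd "$(mktemp -d)"
printf 'alpha beta gamma\ndelta epsilon\n' > words.txt

# NR is the record number, NF the field count, $NF the last field.
# Single quotes keep the shell from touching $1 and $NF.
summary=$(awk '{ print NR, NF, $1, $NF }' words.txt)
printf '%s\n' "$summary"

# Setting FS before any input is read changes how records are split.
second=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":" } { print $2 }')
printf '%s\n' "$second"

# With double quotes the shell expands $1 first.  The shell's own $1 is
# empty here (cleared by "set --"), so awk receives { print } and
# prints whole records instead of the first field.
set --
unquoted=$(awk "{ print $1 }" words.txt)
printf '%s\n' "$unquoted"
```

The last command is the failure mode that single quotes avoid: the program awk receives is no longer the one that was typed.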

Functions in awk
There are many built-in functions in awk that can be called from anywhere within a command.  Here are a few:
length(string)
Returns the number of characters in the string string.

int(number)
Returns the integer portion of number.

tolower(string)
Returns the string string with all the letters converted to lower case.

toupper(string)
Returns the string string with all the letters converted to upper case.
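
As a small illustration, the four functions above can be applied to the fields of a single made-up input line:

```shell
result=$(printf 'Hello World 3.75\n' | awk '{
    print length($1)      # number of characters in the first field
    print int($3)         # integer portion of the third field
    print tolower($1)     # first field in lower case
    print toupper($2)     # second field in upper case
}')
printf '%s\n' "$result"
```

This should print 5, 3, hello and WORLD, one per line.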

Commands in awk
Each awk command has the form:
pattern { action }
The pattern selects records from the input file (or stdin), and action specifies what to do with each selected record.   Variables (and functions) may be used anywhere in the command.  Some example patterns:
/regex/
This will perform action on all lines with a match to the Regular Expression regex.

$n ~ /regex/
This will perform action on all lines that have a match to the Regular Expression regex in the nth field (i.e. $1, $2, etc.)

$n !~ /regex/
This will perform action on all lines that have no match to the Regular Expression regex in the nth field (i.e. $1, $2, etc.)

$1 < 5000
This will perform action on all lines where the value of the first field is less than 5000.  Other operators include:
>   greater than
<=  less than or equal to
>=  greater than or equal to
==  equal to
!=  not equal to

BEGIN
This will perform action before the first line.

END
This will perform action after the last line.
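
Several of these patterns can be combined in one program, with one pattern { action } pair per line. The sketch below uses a hypothetical file pay.txt holding a name and a number on each line; the data is invented for illustration.

```shell
cd "$(mktemp -d)"
# Hypothetical data: a name and a number on each line.
printf 'ann 4200\nbob 5000\ncal 6100\ndee 300\n' > pay.txt

report=$(awk '
    BEGIN      { print "start" }          # runs before the first record
    $2 < 5000  { print $1 }               # numeric comparison on field 2
    $1 ~ /^c/  { print $1, "matches" }    # regex match against field 1
    END        { print NR, "records" }    # runs after the last record
' pay.txt)
printf '%s\n' "$report"
```

Every record is tested against every pattern in order, so a record can trigger more than one action. In END, NR still holds the number of the last record read.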

The action tells awk what to do with the records matched by the pattern.  Some actions:
print
Prints the entire record to stdout.

print expr-list
Prints the contents of expr-list to stdout.

print expr-list > file
Prints the contents of expr-list to file.

printf "control-string", arg1, arg2, ...
Prints formatted output in a manner similar to the C-language function printf().

system(command-line)
Executes the shell command command-line, and returns the exit status.

fflush()
Flush any buffers associated with the standard output.

fflush(file)
Flush any buffers associated with file.

fflush("")
Flush all open output files and pipes.
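
The actions above might be combined as in the following sketch. The files items.txt, names.out and report.out are hypothetical names used only here. Note that the file name after > is written as a quoted string inside the awk program.

```shell
cd "$(mktemp -d)"
printf 'widget 2.5\ngadget 10\n' > items.txt

awk '{
    printf "%-8s %6.2f\n", $1, $2          # formatted like C printf
    print $1 > "names.out"                 # redirect this output to a file
}
END {
    fflush("names.out")                    # push buffered lines out to the file
    status = system("test -s names.out")   # exit status 0 if file is non-empty
    print "status", status
}' items.txt > report.out

cat report.out
```

Flushing before system() is a precaution: the shell command reads names.out while awk still has it open, so any buffered output should be written first.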

Examples
In the following example, all records (lines) are printed to output. Note that if no pattern is specified, the action is performed on all lines.
user@machine:~ $ awk '{print}' myfile
There once was an old man of Esser,
Whose knowledge grew lesser and lesser,
It at last grew so small
He knew nothing at all,
And now he's a college professor.


user@machine:~ $


In the following example, the first and third fields (words) of each record (line) are printed to output.  Again, no pattern is specified, so the action is performed on all lines.  Also note that if there is no comma separating the variables in the print statement, their values are printed with no space between them.
user@machine:~ $ awk '{print $1,$3}' myfile
There was
Whose grew
It last
He nothing
And he's


user@machine:~ $ awk '{print $1 $3}' myfile
Therewas
Whosegrew
Itlast
Henothing
Andhe's


user@machine:~ $


In the following example, all lines containing a match to the Regular Expression "at" are printed to output.  Note that if no action is specified, the default is to print the entire record.
user@machine:~ $ awk '/at/ {print}' myfile
It at last grew so small
He knew nothing at all,

user@machine:~ $ awk '/at/' myfile
It at last grew so small
He knew nothing at all,

user@machine:~ $


In the following example, all lines with a match to the Regular Expression "o" in the second field are printed to output.
user@machine:~ $ awk '$2 ~ /o/ {print}' myfile
There once was an old man of Esser,
Whose knowledge grew lesser and lesser,
And now he's a college professor.


user@machine:~ $



In the following example, the length of each line is printed to output.
user@machine:~ $ awk '{print length}' myfile
35
39
24
23
33


user@machine:~ $



In the following example, all lines shorter than 25 characters are printed to output.
user@machine:~ $ awk 'length < 25 {print}' myfile
It at last grew so small
He knew nothing at all,


user@machine:~ $



In the following example, all lines whose first field is shorter than 5 characters are printed to output.
user@machine:~ $ awk 'length($1) < 5 {print}' myfile
It at last grew so small
He knew nothing at all,
And now he's a college professor.

user@machine:~ $
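
To try the examples in this section, myfile can be recreated as sketched below. Its contents are taken from the output of awk '{print}' shown earlier, assuming the file has no extra blank lines. The two commands that follow print record numbers rather than whole lines, which makes it easy to see which lines a pattern selects.

```shell
cd "$(mktemp -d)"
cat > myfile <<'EOF'
There once was an old man of Esser,
Whose knowledge grew lesser and lesser,
It at last grew so small
He knew nothing at all,
And now he's a college professor.
EOF

# Record numbers of the lines that contain "at".
at_lines=$(awk '/at/ { print NR }' myfile)
printf '%s\n' "$at_lines"

# Record number and length of each line shorter than 25 characters.
short=$(awk 'length($0) < 25 { print NR, length($0) }' myfile)
printf '%s\n' "$short"
```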


For more information on awk programming, see pp. 648-670 in the text; also consult the man page.


Toolbox

comm [options] file1 file2
A simple utility to visually compare two sorted files.  (This utility only works with sorted files;  results are unpredictable if the files are not sorted.)

Column 1 shows lines found only in file1.
Column 2 shows lines found only in file2.
Column 3 shows lines found in both files.
user@machine:~ $ cat list1
alpha
beta
gamma
delta
zeta
theta
iota


user@machine:~ $ cat list2
alpha
beta
gamma
DELTA
zeta
eta
iota


user@machine:~ $ comm list1 list2
                alpha
                beta
                gamma
        DELTA
delta
                zeta
        eta
        iota
theta
iota


user@machine:~ $
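
The lists above are not in sorted order, which is why iota is reported once in column 2 and again in column 1 instead of once in column 3. A minimal sketch of the same comparison with sorted input follows. The .sorted file names are hypothetical, and forcing the C locale with LC_ALL=C is an assumption made here so that sort and comm agree on the ordering.

```shell
cd "$(mktemp -d)"
printf '%s\n' alpha beta gamma delta zeta theta iota > list1
printf '%s\n' alpha beta gamma DELTA zeta eta iota  > list2

# Sort both files using the same collation order that comm will use.
LC_ALL=C sort list1 > list1.sorted
LC_ALL=C sort list2 > list2.sorted

# -12 suppresses columns 1 and 2, leaving only lines found in both files.
common=$(LC_ALL=C comm -12 list1.sorted list2.sorted)
printf '%s\n' "$common"
```

With sorted input, iota appears once among the common lines, alongside alpha, beta, gamma and zeta.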