CSC128 : Introduction to UNIX

Using grep and regular expressions


The grep Utility

grep searches, line by line, for a pattern, and displays (to stdout) lines that match that pattern.  If files are specified on the command line, they are searched;  otherwise, grep searches standard input (stdin).

The pattern is a regular expression.  (Regular expressions are discussed below.)

In the following example, the output of ls -l is piped through grep, returning only lines that contain the pattern "Apr".

user@machine:~ $ ls -l
-rwx------   1 shum     staff           6 Mar 12 19:42 best.sh
-rwxr-xr-x   1 shum     staff         409 Mar 14 15:56 converse.sh
-rw-r--r--   1 shum     staff          46 Apr  4 17:32 dantest.sh
-rw-r--r--   1 shum     staff         396 Mar 14 15:56 doctor.sh~
-rwxr-xr-x   1 shum     staff         144 Mar 14 15:39 haha.sh
-rw-------   1 shum     staff           6 Mar 12 19:42 harold
-rw-------   1 shum     staff           6 Mar 12 19:42 nohup.out
-r--r--r--   1 shum     staff       31416 Feb  2 10:51 poem
-rw-r--r--   1 shum     staff           6 Mar 12 19:42 project1.out
-rwxr-xr-x   1 shum     staff           6 Mar 12 19:42 project1.sh
-rw-r--r--   1 shum     staff           6 Mar 12 19:42 regchrome
-rw-r--r--   1 shum     staff           6 Mar 12 19:42 richard
drwxr-xr-x   2 shum     staff        4096 Jan 24 16:41 stuff
-rwxr-xr-x   1 shum     staff         235 Apr  2 16:46 tcsh_args.sh
-rwxr-xr-x   1 shum     staff          96 Apr  2 15:16 tcsh_while.sh
-rwx------   1 shum     staff       11418 Mar 12 19:58 test
-rw-r--r--   1 shum     staff         129 Mar 12 19:58 test.c
-rwxr-xr-x   1 shum     staff         146 Apr  4 16:12 test.sh
-rwxr-xr-x   1 shum     staff           6 Mar 12 19:42 test10.sh
-rw-r--r--   1 shum     staff          89 Apr  4 17:03 testif.sh
drwxr-xr-x   2 shum     staff        4096 Apr  2 19:28 testnow
-rw-r--r--   1 shum     staff           6 Mar 12 19:42 thomas
drwxr-xr-x   2 shum     staff        4096 Jan 29 20:09 tmp


user@machine:~ $ ls -l | grep "Apr"
-rw-r--r--   1 shum     staff          46 Apr  4 17:32 dantest.sh
-rwxr-xr-x   1 shum     staff         235 Apr  2 16:46 tcsh_args.sh
-rwxr-xr-x   1 shum     staff          96 Apr  2 15:16 tcsh_while.sh
-rwxr-xr-x   1 shum     staff         146 Apr  4 16:12 test.sh
-rw-r--r--   1 shum     staff          89 Apr  4 17:03 testif.sh
drwxr-xr-x   2 shum     staff        4096 Apr  2 19:28 testnow


user@machine:~ $



In this example, the file verse is searched for the pattern "the", and only lines matching that pattern are displayed on stdout.

user@machine:~ $ cat verse
XVIII.

Shall I compare thee to a summer's day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm'd;
And every fair from fair sometime declines,
By chance or nature's changing course untrimm'd;
But thy eternal summer shall not fade
Nor lose possession of that fair thou owest;
Nor shall Death brag thou wander'st in his shade,
When in eternal lines to time thou growest:
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee.


user@machine:~ $ grep "the" verse
Shall I compare thee to a summer's day?
Rough winds do shake the darling buds of May,
Sometime too hot the eye of heaven shines,
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee.


user@machine:~ $


Note that grep does not search for whole words by default.  It matched every line containing "the", including lines where "the" appears only inside a longer word such as "thee" or "breathe".
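
The two ways of giving grep its input (a file named on the command line, or standard input) can be compared directly. The following is a minimal sketch rather than part of the original example; the file name sample.txt and its contents are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'the cat' 'a dog' 'breathe in' > "$tmpdir/sample.txt"

# 1. A file is named on the command line, so grep reads that file.
from_file=$(grep "the" "$tmpdir/sample.txt")

# 2. No file is named, so grep reads standard input (fed here by a pipe).
from_stdin=$(cat "$tmpdir/sample.txt" | grep "the")

# Both forms select the same two lines: "the cat" and "breathe in".
echo "$from_file"

rm -r "$tmpdir"
```

Either form gives the same result; the pipe form is the one to use when the text comes from another command, such as ls -l.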

Some Options to grep
The syntax for grep is:
grep [options] "pattern" [file-list]

Options:

-E
Use extended regular expressions. (See Extended Regular Expressions, below.)

-v
Reverse the sense of the matching test.  Given this option, grep will display all lines that do NOT match the pattern.

-i
Ignore case. This will make grep treat the upper- and lower-case forms of each letter as the same letter.

-q
Be quiet.  This keeps grep from displaying anything at all;  it will only return an exit code. This exit code makes grep useful as a conditional expression in a control structure:

#!/bin/sh

if grep -q "happiness" riches.txt
then
  echo "There is happiness in riches.txt"
else
  echo "There is no happiness in riches.txt"
fi


 
-w
Search only for whole words, not substrings.

user@machine:~ $ grep "the" verse
Shall I compare thee to a summer's day?
Rough winds do shake the darling buds of May,
Sometime too hot the eye of heaven shines,
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee.


user@machine:~ $ grep -w "the" verse
Rough winds do shake the darling buds of May,
Sometime too hot the eye of heaven shines,

user@machine:~ $




-l
Displays the name of each file in [file-list] that has one or more lines matching the pattern.

-c
Displays only a count of the lines that match, rather than the lines themselves.
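
The options above can be tried side by side. This is a minimal sketch under stated assumptions: the file names sky.txt and other.txt and their contents are invented for illustration.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'The sun' 'the moon' 'a star' > "$tmpdir/sky.txt"
printf '%s\n' 'plain text' > "$tmpdir/other.txt"

# -v: lines that do NOT contain "the" (matching is case-sensitive,
#     so "The sun" counts as a non-matching line).
inverted=$(grep -v "the" "$tmpdir/sky.txt")

# -i: ignore case, so "The" and "the" both match.
ignore_case=$(grep -i "the" "$tmpdir/sky.txt")

# -c: print the number of matching lines instead of the lines.
count=$(grep -c "the" "$tmpdir/sky.txt")

# -l: print only the names of files that contain a matching line.
files=$(cd "$tmpdir" && grep -l "the" sky.txt other.txt)

echo "-v gives:"; echo "$inverted"
echo "-i gives:"; echo "$ignore_case"
echo "-c gives: $count"
echo "-l gives: $files"

rm -r "$tmpdir"
```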



Regular Expressions

Simple strings
All the above examples use simple strings.  A simple string is a series of characters with no special characters.  It matches only itself.

Match any character: .
The dot ( . ) matches any single character.  

For example, the regular expression "d.g" will match all lines containing dig, dug, dog, dqg, d9g, etc.

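The behavior of the dot can be checked with a short word list. This is only a sketch; the words are made up for illustration.

```shell
# One word per line; "dg" and "doug" are included as non-matches.
words=$(printf '%s\n' dig dug dog d9g dg doug)

# "d.g" needs exactly one character between d and g.
matches=$(printf '%s\n' "$words" | grep "d.g")

# Prints dig, dug, dog and d9g. "dg" has nothing between d and g,
# and "doug" has two characters there, so neither is selected.
echo "$matches"
```
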
Match a character set: [ ]
The brackets ( [ ] ) match any single character within them.  

For example, the regular expression "d[oiu]g" will only match lines containing dog, dig, or dug.

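To search for a character that is special to the shell, such as > , one option is to place it inside the square brackets and quote it:
history | grep [">"] > file
This finds every history entry in which you used the redirection symbol and saves those lines to file. The > must be placed inside quotes, or escaped with the special character \ , so that the shell passes it to grep instead of treating it as a redirection:
history | grep [\>]
Quoting the whole pattern, as in grep "[>]" , is the safest form, because it also keeps the shell from treating the brackets as a filename wildcard.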

To reverse this action, use a caret ( ^ ) as the first character within the brackets:  the regular expression "d[^oiu]g" will match dqg, d9g, etc.,  but not dog, dig, or dug.
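
A character set and its reversed form can be compared on the same input. The following is a minimal sketch; the word list and the sample command line "ls > out.txt" are invented for illustration.

```shell
words=$(printf '%s\n' dog dig dug dqg d9g dg)

# One of o, i, or u between d and g.
in_set=$(printf '%s\n' "$words" | grep "d[oiu]g")

# Any single character EXCEPT o, i, or u between d and g.
# "dg" still does not match: the brackets always stand for one character.
not_in_set=$(printf '%s\n' "$words" | grep "d[^oiu]g")

# Quoting the pattern keeps the shell from acting on > or the brackets.
redirect_lines=$(printf '%s\n' 'ls > out.txt' 'ls -l' | grep '[>]')

echo "in set:     $(echo $in_set)"
echo "not in set: $(echo $not_in_set)"
echo "uses >:     $redirect_lines"
```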

Zero or more occurrences: *
The asterisk ( * ) can follow a regular expression that represents a single character; it alters its behavior to match zero or more occurrences of that expression:
The regular expression "do*g" will match:
dog
dg
doog
doooooooog
etc.
The regular expression "d[oiu]*g" will match:
dig
dog
dug
doug
diuiouog
dg
etc.
The regular expression "d.*g" will match:
dig
dog
dug
dining
dg
do this for me, Greg
etc.
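
The three asterisk patterns can be compared by counting how many lines of one input each of them selects. This is a sketch only; the input lines are drawn from the lists above, plus "cat" as a line that none of the patterns should match.

```shell
lines=$(printf '%s\n' dg dog doog dig doug 'do this for me, Greg' cat)

# d, then zero or more o, then g: dg, dog, doog
count_o=$(printf '%s\n' "$lines" | grep -c "do*g")

# d, then zero or more of o, i, u, then g: adds dig and doug
count_set=$(printf '%s\n' "$lines" | grep -c "d[oiu]*g")

# d, then zero or more of ANY character, then g: adds the sentence
count_any=$(printf '%s\n' "$lines" | grep -c "d.*g")

echo "do*g:     $count_o"      # 3
echo "d[oiu]*g: $count_set"    # 5
echo "d.*g:     $count_any"    # 6
```

Each pattern is more permissive than the one before it, which is why the counts grow.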


Beginning and end of line: ^ $
The caret ( ^ ), when at the beginning of a regular expression, will only match lines that begin with the regular expression:

The regular expression "^The" will match these lines:
The cat is outside.
The dog is also outside.
These animals are all outside.
etc.
But it will not match these lines:
Where is the duck? The duck is in the pond.
"The mongoose is chasing the duck", she said.
etc.

The dollar sign ( $ ) is similar;  when it is at the end of a regular expression, it will only match lines that end with the regular expression:

The regular expression "html$" will match these lines:
-rw-r--r--    1 shum     unknown      4422 Feb 21 08:07 basic_commands.html
-rw-r--r--    1 shum     unknown     13105 Feb 21 08:07 communications.html
-rw-r--r--    1 shum     unknown      1068 Feb 21 08:07 exam1.html
-rw-r--r--    1 shum     unknown      7361 Feb 21 08:08 file_system.html
-rw-r--r--    1 shum     unknown     14990 Feb 21 08:08 gui_overview.html
-rw-r--r--    1 shum     unknown     22685 Feb 21 08:46 index.html

But it will not match these lines:
-rw-r--r--    1 shum     unknown      3703 Feb 21 08:08 html01.txt
-rw-r--r--    1 shum     unknown      2222 Feb 21 08:08 html02.txt
-rw-r--r--    1 shum     unknown      3793 Feb 21 08:08 index.html.gz
-rw-r--r--    1 shum     unknown      3366 Feb 21 08:08 html.tar.gz
-rwxr-xr-x    1 shum     unknown     67062 Mar  7 10:50 htmledit
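
Both anchors can be tried on a few lines of text. This is a minimal sketch with made-up input lines. Single quotes are used around the patterns so that the shell leaves the $ alone.

```shell
lines=$(printf '%s\n' 'The cat is outside.' 'Where is The duck?' \
                      'index.html' 'index.html.gz')

# ^ ties the pattern to the start of the line.
starts_with=$(printf '%s\n' "$lines" | grep '^The')

# $ ties the pattern to the end of the line.
ends_with=$(printf '%s\n' "$lines" | grep 'html$')

echo "$starts_with"   # The cat is outside.
echo "$ends_with"     # index.html
```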


Quoting special characters: \
Many times, you will need to search for a sequence of characters that includes one or more of the special characters that grep uses.  These characters must be escaped to be taken literally.

For example, to grep through a long listing for every name containing .it , the dot will need to be escaped with a backslash ( \ ) so that grep matches only a literal dot (or period) rather than any character.

user@machine:~ $  ls -l | grep ".it"
-rw-r--r--    1 shum     unknown      1294 Feb 21 08:08 one_bit
-rw-r--r--    1 shum     unknown      1557 Feb 21 08:08 ita887
-rw-r--r--    1 shum     unknown      1768 Feb 21 08:08 I.ate.it
-rw-r--r--    1 shum     unknown      1498 Feb 21 08:08 002.it.gz
-rw-r--r--    1 shum     unknown      3962 Feb 21 08:08 003.it.gz
-rw-r--r--    1 shum     unknown      2795 Feb 21 08:08 quit.txt

user@machine:~ $ ls -l | grep "\.it"
-rw-r--r--    1 shum     unknown      1768 Feb 21 08:08 I.ate.it
-rw-r--r--    1 shum     unknown      1498 Feb 21 08:08 002.it.gz
-rw-r--r--    1 shum     unknown      3962 Feb 21 08:08 003.it.gz


user@machine:~ $
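
Because the pattern "\.it" is not tied to the end of the line, it matches a literal .it anywhere on the line. The following minimal sketch, which assumes a handful of made-up file names, shows how the escaped dot can be combined with $ to select only names that actually end in .it .

```shell
names=$(printf '%s\n' one_bit I.ate.it 002.it.gz quit.txt)

# Unescaped dot: any character followed by "it" (all four names).
any_char=$(printf '%s\n' "$names" | grep -c '.it')

# Escaped dot: a literal ".it" anywhere in the name.
literal_dot=$(printf '%s\n' "$names" | grep -c '\.it')

# A dot inside brackets is also taken literally.
bracket_dot=$(printf '%s\n' "$names" | grep -c '[.]it')

# Escaped dot plus $: only names that END in ".it".
ends_in_it=$(printf '%s\n' "$names" | grep '\.it$')

echo "$any_char $literal_dot $bracket_dot"   # 4 2 2
echo "$ends_in_it"                           # I.ate.it
```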


Extended Regular Expressions

In addition to the asterisk, we can also use the plus ( + ) and question mark ( ? ) modifiers, if we are using extended regular expressions.  Either pass the -E option to grep, or use egrep (an older name for grep -E).

One or more occurrences: +

The plus ( + ) can follow a regular expression that represents a single character; it alters its behavior to match one or more occurrences of that expression.

Zero or one occurrences: ?

The question mark ( ? ) can follow a regular expression that represents a single character; it alters its behavior to match zero or one occurrences of that expression.
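
The plus and question mark can be compared with the asterisk on the same input. This is a minimal sketch using a made-up three-word list and the -E option.

```shell
words=$(printf '%s\n' dg dog doog)

# +  one or more o: dog, doog (not dg)
plus=$(printf '%s\n' "$words" | grep -E "do+g")

# ?  zero or one o: dg, dog (not doog)
question=$(printf '%s\n' "$words" | grep -E "do?g")

# *  zero or more o: all three words
star=$(printf '%s\n' "$words" | grep -E "do*g")

echo "do+g: $(echo $plus)"
echo "do?g: $(echo $question)"
echo "do*g: $(echo $star)"
```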

  • Read Appendix A in the text for more information on regular expressions and extended regular expressions.