Grep and regular expressions (regex) are powerful tools for searching and filtering text on Linux and Unix-like systems. Mastering them can greatly improve your productivity in the terminal.
In this guide, we'll cover the essentials of using grep and regex effectively.
What is Grep?
Grep stands for Global Regular Expression Print. It's a command-line utility for searching text files and streams for matching patterns.
Some key things about grep:
- It's available on all Unix-based systems including Linux and macOS.
- It supports powerful regular expressions for pattern matching.
- It can search across multiple files and recursively through directories.
- It has many useful options and flags for fine-tuning searches.
At its core, grep takes a regular expression pattern and searches for matching lines in the specified files/input. The matched lines are then printed to the standard output.
Here is the basic syntax for grep:
grep [options] pattern [files]
Let's try a simple example:
$ grep "hello" file.txt
This will print all lines in file.txt that contain the string "hello", including lines where it appears inside a longer word.
Grep is often used in pipelines for filtering text as well:
$ ps aux | grep "httpd"
This prints only the lines containing "httpd" from the output of ps aux. The grep process itself may show up in the results, since its own command line also contains "httpd".
As you can see, grep is great for extracting information from text outputs. It's a must-have tool for processing log files, source code, config files and more.
Next, let's look at regular expressions and how they work with grep.
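As a small sketch of pipeline filtering, the following feeds a few made-up lines (stand-ins for ps aux output, not real process data) to grep on standard input:

```shell
# The sample lines are hypothetical; any command's output could take their place.
filtered=$(printf '%s\n' \
  'root  101 /usr/sbin/httpd' \
  'alice 202 vim notes.txt' \
  'www   303 /usr/sbin/httpd -k start' | grep 'httpd')

# Only the two lines containing "httpd" pass through the filter.
printf '%s\n' "$filtered"
```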
Regular Expressions Primer
A regular expression (regex) is a sequence of characters that defines a search pattern. It provides a concise and flexible means for matching strings of text.
Here are some examples of simple regex patterns:
- hello: matches the literal string "hello"
- ^hello: matches "hello" at the start of a line
- hello$: matches "hello" at the end of a line
- [Hh]ello: matches "hello" or "Hello"
- [a-z]: matches any lowercase letter
- [0-9]: matches any digit
Regex meta characters like ^ $ . * + ? [] {} () \ | have special meanings and allow you to create more complex patterns.
By default, grep uses a flavor of regex called basic regular expressions (BRE). There is also an extended (ERE) mode enabled with the -E flag.
Let's look at some common regex features supported by grep:
Anchors
- ^: Start of line
- $: End of line
Quantifiers
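- *: Zero or more occurrences of the previous item
- +: One or more occurrences of the previous item
- ?: Zero or one occurrence of the previous item
- {n}: Exactly n occurrences of the previous item
- {n,}: At least n occurrences of the previous item
- {n,m}: Between n and m occurrences of the previous item
Only * works unescaped in basic mode. There, +, ? and {} are literal characters unless escaped (\+, \?, \{n\} in GNU grep); with -E they work as written above.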
Character Classes
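- [abc]: Matches a, b or c
- [^abc]: Matches any character except a, b or c
- [a-z]: Matches any lowercase letter
- [A-Z]: Matches any uppercase letter
- [0-9]: Matches any digit
- \w: Matches any alphanumeric character or underscore
- \W: Matches any character that is not alphanumeric or underscore
- \s: Matches whitespace (such as space or tab)
- \S: Matches non-whitespace
The backslash shorthands \w, \W, \s and \S are GNU extensions rather than part of POSIX, so they may not work in every grep implementation.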
Grouping
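- (...): Groups multiple patterns into a single unit
- |: Matches either of the patterns it separates
These are written as shown in extended mode (-E). In basic mode, grouping is \(...\), and GNU grep accepts \| for alternation.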
These are just some of the basic constructs. Regex can get much more advanced than this!
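The difference between basic and extended mode is easy to see with the + quantifier. This is a minimal sketch using made-up sample strings; it relies only on behavior that POSIX specifies for the two modes.

```shell
# Hypothetical sample input: four short strings, one per line.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'ab+c')

# Extended mode (-E): + means "one or more b", so "abc" and "abbc" match.
ere=$(printf '%s\n' "$sample" | grep -E '^ab+c$')

# Basic mode (default): an unescaped + is a literal plus sign, so only "ab+c" matches.
bre=$(printf '%s\n' "$sample" | grep '^ab+c$')

printf 'ERE matches:\n%s\n' "$ere"
printf 'BRE matches:\n%s\n' "$bre"
```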
Now let's see how to leverage the power of regex with grep.
Grep Usage Examples
Here are some examples to demonstrate common use cases of grep with regular expressions.
1. Basic matching
The most basic usage of grep is to find lines containing literal text:
$ grep "hello" file.txt
This will print all lines with the text "hello".
2. Case-insensitive search
Use the -i flag to make the search case-insensitive:
$ grep -i "hello" file.txt
Now it will match "hello", "Hello", "HELLO" etc.
3. Invert match
The -v flag inverts the match, printing non-matching lines:
$ grep -v "hello" file.txt
This will print all lines that do not contain "hello".
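To see -i and -v side by side, here is a short sketch against a throwaway file. The file contents are invented for illustration.

```shell
# Build a temporary sample file (hypothetical contents).
tmp=$(mktemp -d)
printf '%s\n' 'Hello world' 'hello again' 'goodbye' > "$tmp/file.txt"

# -i ignores case, so both "Hello" and "hello" lines are printed.
ci=$(grep -i 'hello' "$tmp/file.txt")

# -v prints lines that do NOT contain lowercase "hello".
inv=$(grep -v 'hello' "$tmp/file.txt")

# Combining both leaves only the line with no form of "hello".
both=$(grep -iv 'hello' "$tmp/file.txt")

printf '%s\n' "$ci" '---' "$inv" '---' "$both"
rm -rf "$tmp"
```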
4. Print file names only
The -l flag suppresses normal output and only prints the names of files containing matches:
$ grep -l "hello" *.txt
This is useful for checking which files contain the pattern.
5. Print line numbers
Add the -n flag to prefix each matching line with its line number:
$ grep -n "hello" file.txt
This provides context for where in the file the matches occur.
6. Count matches
The -c flag counts the number of matching lines per file:
$ grep -c "hello" *.txt
This prints just the counts versus the full matched lines.
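The three reporting flags above (-l, -n and -c) can be compared on the same input. This sketch uses two invented files, a.txt and b.txt, created in a temporary directory.

```shell
# Two hypothetical files: one with matches, one without.
tmp=$(mktemp -d)
printf '%s\n' 'hello' 'nothing here' 'hello again' > "$tmp/a.txt"
printf '%s\n' 'no match in this file' > "$tmp/b.txt"

# -l: names of files that contain at least one match.
names=$(cd "$tmp" && grep -l 'hello' a.txt b.txt)

# -n: matching lines prefixed with their line numbers.
nums=$(cd "$tmp" && grep -n 'hello' a.txt)

# -c: number of matching lines per file (files with zero matches are listed too).
counts=$(cd "$tmp" && grep -c 'hello' a.txt b.txt)

printf '%s\n' "$names" '---' "$nums" '---' "$counts"
rm -rf "$tmp"
```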
7. Multiple patterns
To match lines containing either of two patterns, use the -e flag:
$ grep -e "hello" -e "world" file.txt
This will print lines matching "hello" OR "world".
8. Colorized output
The --color flag prints matches in color for better visibility:
$ grep --color "error" log.txt
Matches will be highlighted in red or other colors depending on terminal settings.
9. Search recursively
Use -R to recursively search through directories:
$ grep -R "hello" ~/Documents/
This will find matches in all files under ~/Documents/ and subdirectories.
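A recursive search can be tried safely on a scratch directory tree. The directory layout and file names below are assumptions made up for this sketch; -l is added so that only file names are listed.

```shell
# Build a small hypothetical tree with one nested subdirectory.
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/sub"
printf 'hello top\n'    > "$tmp/docs/top.txt"
printf 'hello nested\n' > "$tmp/docs/sub/nested.txt"
printf 'nothing\n'      > "$tmp/docs/sub/other.txt"

# -R descends into subdirectories; -l lists the files that contain a match.
found=$(grep -R -l 'hello' "$tmp/docs" | sort)

printf '%s\n' "$found"
rm -rf "$tmp"
```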
10. Search by regex
Let's try some regex patterns with grep. To search for "hello" or "Hello":
$ grep "[hH]ello" file.txt
The [hH] matches either h or H. Remember to always quote the regex so the shell doesn't interpret special characters like [ and ]. Single quotes are the safest choice, because the shell leaves everything inside them untouched.
To find lines starting with "hello":
$ grep "^hello" file.txt
The ^ anchors the match to the start of the line.
To find lines ending with "hello":
$ grep "hello$" file.txt
The $ anchors the match to the end of the line.
These examples demonstrate the power of regexes for flexible matching with grep.
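The anchors and the bracket expression can be checked against a handful of sample lines. This is a sketch with invented input, using single quotes so the shell passes each pattern to grep unchanged.

```shell
# Hypothetical sample lines.
sample=$(printf '%s\n' 'hello there' 'say hello' 'Hello' 'oh hello you')

# ^ anchors to the start of the line.
starts=$(printf '%s\n' "$sample" | grep '^hello')

# $ anchors to the end of the line.
ends=$(printf '%s\n' "$sample" | grep 'hello$')

# Both anchors plus a bracket expression: the whole line must be hello or Hello.
exact=$(printf '%s\n' "$sample" | grep '^[hH]ello$')

# Unanchored, [hH]ello matches anywhere, so all four lines are counted.
anywhere=$(printf '%s\n' "$sample" | grep -c '[hH]ello')

printf '%s\n' "$starts" "$ends" "$exact" "$anywhere"
```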
Advanced Grep and Regex
So far we've covered basic usage of grep and regex. Now let's look at some more advanced techniques and features.
Extended regex with -E
The -E flag enables extended regular expressions, which support additional metacharacters like ?, +, {}, () etc.
For example, to find lines with "file" followed by 1 or more digits:
$ grep -E "file[0-9]+" file.txt
The + quantifier matches 1 or more of the previous item.
Extended regex lets you write such patterns without the backslash escapes that basic mode needs, and it adds operators such as +, ? and | that POSIX basic regex does not define at all.
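Here is a brief sketch of the + quantifier next to an interval quantifier in extended mode. The sample names are made up for illustration.

```shell
# Hypothetical sample input.
sample=$(printf '%s\n' 'file1' 'file12' 'file123' 'file1234' 'filename')

# "file" followed by one or more digits: every line except "filename".
plus=$(printf '%s\n' "$sample" | grep -E 'file[0-9]+')

# Anchored, with exactly two or three digits after "file".
range=$(printf '%s\n' "$sample" | grep -E '^file[0-9]{2,3}$')

printf '%s\n' "$plus" '---' "$range"
```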
Perl Compatible regex with -P
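GNU grep also supports Perl Compatible Regular Expressions (PCRE) when using the -P flag. Other implementations, such as the BSD grep shipped with macOS, typically do not provide this flag.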
PCRE provides additional constructs like lookaround assertions and recursive patterns.
For example, to get lines containing duplicate words:
$ grep -P "\b(\w+)\s+\1\b" file.txt
This matches a word boundary, captures a word, matches 1+ whitespace chars and then backreferences the captured word.
Perl regex is ideal for matching tricky patterns beyond standard regex capabilities.
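The duplicate-word pattern can be tried on a few invented lines. This sketch assumes a GNU grep built with PCRE support, since -P is not available everywhere.

```shell
# Hypothetical sample lines; two of them contain a repeated word.
sample=$(printf '%s\n' 'this is is a test' 'no repeats here' 'the the end')

# \1 must repeat exactly what (\w+) captured, separated by whitespace.
dups=$(printf '%s\n' "$sample" | grep -P '\b(\w+)\s+\1\b')

printf '%s\n' "$dups"
```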
Logical OR with alternation
To match lines containing either "error" or "failure", use the alternation operator |:
$ grep -E "error|failure" file.txt
This will match lines containing either of the two words.
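Alternation with -E and repeated -e options select the same lines, as this small sketch on made-up log lines suggests:

```shell
# Hypothetical log lines.
log=$(printf '%s\n' 'disk error on sda' 'login ok' 'backup failure' 'all good')

# One extended pattern with the | operator.
alt=$(printf '%s\n' "$log" | grep -E 'error|failure')

# Two separate patterns; a line is printed if it matches either one.
multi=$(printf '%s\n' "$log" | grep -e 'error' -e 'failure')

printf '%s\n' "$alt"
```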
Including and excluding files
Instead of enumerating files to search, you can use wildcards like *.txt.
To exclude specific files, use the --exclude flag:
$ grep "hello" *.txt --exclude "ignore.txt"
This will search all .txt files except ignore.txt.
You can also combine --include and --exclude for more granular control over which files are searched.
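Combining the two options might look like the following sketch, which searches a scratch directory recursively. The file names are invented, and the behavior shown is that of GNU grep.

```shell
# Three hypothetical files that all contain the pattern.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/keep.txt"
printf 'hello\n' > "$tmp/ignore.txt"
printf 'hello\n' > "$tmp/notes.md"

# Only *.txt files are considered, and ignore.txt is skipped among those.
only_txt=$(grep -r -l --include='*.txt' --exclude='ignore.txt' 'hello' "$tmp")

printf '%s\n' "$only_txt"
rm -rf "$tmp"
```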
Search across files
To search for a pattern across multiple files, pass all the files as arguments instead of wildcards:
$ grep "hello" file1.txt file2.txt
This will search file1.txt and file2.txt.
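You can also search files located by another command. With find, use -exec so that grep receives the file names as arguments:
$ find . -name "*.log" -exec grep "error" {} +
This will search the contents of the log files found by find for the pattern "error". Piping find straight into grep would only filter the list of file names, not the contents of the files.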
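To search the contents of files that find locates, grep needs the file names as arguments rather than on standard input. A minimal sketch, using an invented directory layout:

```shell
# Hypothetical tree: two .log files and one .txt file.
tmp=$(mktemp -d)
mkdir -p "$tmp/logs/app"
printf 'error: disk full\n'     > "$tmp/logs/app/app.log"
printf 'all fine\n'             > "$tmp/logs/sys.log"
printf 'error in a text file\n' > "$tmp/logs/readme.txt"

# find selects the *.log files; -exec ... {} + passes them to grep as arguments.
hits=$(find "$tmp/logs" -name '*.log' -exec grep -l 'error' {} +)

printf '%s\n' "$hits"
rm -rf "$tmp"
```

Only app.log is reported: sys.log has no match, and readme.txt is never handed to grep.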
Save matches to a file
To write all matching lines to a file, redirect stdout:
$ grep "hello" files/*.txt > matches.txt
Now matches.txt will contain the matched lines.
Invert match and print to file
To get all non-matching lines, combine -v and stdout redirection:
$ grep -v "hello" files/*.txt > exclude.txt
exclude.txt will now contain all lines without "hello".
Highlight matches
The --color flag highlights matches but still prints the whole line. When the output goes to a pipe or a file instead of a terminal, GNU grep with plain --color emits no color codes at all.
To save just the matched text rather than the full lines, use the -o flag:
$ grep -o "hello" file.txt > highlights.txt
This writes each match on its own line to highlights.txt.
Count occurrences
To count the number of matching lines per file, use the -c flag:
$ grep -c "error" *.log
It will print the counts for each log file.
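To get the total count across files, pipe to awk. With several files, each output line has the form file:count, so split on the colon and add up the last field:
$ grep -c "error" *.log | awk -F: '{sum += $NF} END {print sum}'
awk sums up the counts, printing the total at the end.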
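When several files are searched, grep -c prints file:count pairs, so a sum has to use the field after the colon. The sketch below uses two invented log files and assumes the file names themselves contain no colon.

```shell
# Two hypothetical log files with 2 and 1 matching lines.
tmp=$(mktemp -d)
printf '%s\n' 'error one' 'ok' 'error two' > "$tmp/a.log"
printf '%s\n' 'error three' 'ok'           > "$tmp/b.log"

# Per-file counts, printed as name:count.
per_file=$(cd "$tmp" && grep -c 'error' a.log b.log)

# Split on ":" and add up the last field of each line.
total=$(printf '%s\n' "$per_file" | awk -F: '{sum += $NF} END {print sum}')

printf '%s\n' "$per_file" "total: $total"
rm -rf "$tmp"
```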
Grep cheat sheet
Here's a quick cheat sheet of some commonly used grep flags and options:
-i : Case insensitive search
-v : Invert match
-c : Print count of matching lines
-l : Print matching file names only
-n : Prefix matches with line numbers
-E : Use extended regular expressions
-P : Use Perl compatible regular expressions
-w : Match whole words only
-x : Match whole lines only
-A n : Print n lines after a match
-B n : Print n lines before a match
-C n : Print n context lines before and after a match
--color : Highlight matches
-R : Recursively search directories
--exclude : Exclude files/paths from search
--include : Include only files/paths in search
This covers the most useful options. Refer to man grep for more details.
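A few of the flags in the cheat sheet that were not demonstrated earlier (-w, -x, -A and -B) can be sketched on invented input like this:

```shell
# Hypothetical sample lines.
sample=$(printf '%s\n' 'start' 'cat' 'concatenate' 'the cat sat' 'end')

# -w: "cat" must stand alone as a word, so "concatenate" is skipped.
word=$(printf '%s\n' "$sample" | grep -w 'cat')

# -x: the entire line must match the pattern.
line=$(printf '%s\n' "$sample" | grep -x 'cat')

# -A 1: print one line of context after each match.
after=$(printf '%s\n' "$sample" | grep -A 1 'the cat')

# -B 1: print one line of context before each match.
before=$(printf '%s\n' "$sample" | grep -B 1 'the cat')

printf '%s\n' "$word" '---' "$line" '---' "$after" '---' "$before"
```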
Lookahead and lookbehind assertions
When using Perl regex with -P, you can use lookaround assertions for advanced matches.
For example, to get lines where "error" precedes "code X123":
$ grep -P "error(?=.*code X123)" file.txt
The positive lookahead (?= ) matches "error" only if followed by "code X123" on the same line.
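Conversely, to match "error" that is not followed by "code X123" later on the line:
$ grep -P 'error(?!.*code X123)' file.txt
The negative lookahead (?! ) rejects any "error" that has "code X123" after it on the same line. The pattern is in single quotes because an interactive bash shell would otherwise attempt history expansion on the ! character.
Lookarounds don't consume characters; they only assert a condition. This makes them useful for complex scenarios like this.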
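Both lookahead forms can be compared on the same input. The log lines below are made up, and the sketch assumes a GNU grep with -P support; the patterns are single-quoted so the shell leaves the ! alone.

```shell
# Hypothetical log lines.
log=$(printf '%s\n' \
  'error: timeout, code X123' \
  'error: disk full, code X999' \
  'code X123 then error')

# Positive lookahead: "error" must have "code X123" somewhere after it.
with=$(printf '%s\n' "$log" | grep -P 'error(?=.*code X123)')

# Negative lookahead: "error" must NOT have "code X123" after it.
without=$(printf '%s\n' "$log" | grep -P 'error(?!.*code X123)')

printf '%s\n' "$with" '---' "$without"
```

The third line appears in the negative result because its "code X123" comes before "error", not after it.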
Save matched groups
grep cannot print an individual capture group, but with -P and -o you can control which part of the match is printed.
For example, to get just the number from "error code X123":
$ grep -Po 'error code X\K\d+' file.txt
The -o flag prints only the matched text instead of the full line. \K drops everything matched so far from the reported match, so only the digits matched by \d+ are printed.
This technique is useful for parsing structured log lines to extract specific fields.
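Because -o prints the whole match rather than a single group, one way to keep only the number is PCRE's \K, which resets the start of the reported match. A sketch on invented log lines, again assuming GNU grep with -P:

```shell
# Hypothetical log lines.
log=$(printf '%s\n' 'error code X123 in module a' 'warning code X7' 'error code X42')

# Without \K, -o prints the entire matched text.
full=$(printf '%s\n' "$log" | grep -Po 'error code X\d+')

# With \K, everything before it is required but left out of the output.
nums=$(printf '%s\n' "$log" | grep -Po 'error code X\K\d+')

printf '%s\n' "$full" '---' "$nums"
```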
Multi-line searching
Normally grep matches patterns line by line. To match patterns spanning multiple lines, use the -z flag:
$ grep -Pzo "error.*\n.*code X123" file.txt
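With -z, grep treats NUL bytes rather than newlines as line terminators, so a text file without NUL bytes is read as one long record and \n can appear inside a match. Each printed match is terminated by a NUL byte as well.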
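A multi-line match can be sketched as follows. The file contents are invented, GNU grep with -P is assumed, and tr converts the NUL terminators that -z produces back into newlines for display.

```shell
# Hypothetical file where the pattern spans lines 2 and 3.
tmp=$(mktemp -d)
printf '%s\n' 'line one' 'error: request failed' 'details: code X123' 'line four' \
  > "$tmp/file.txt"

# -z reads the file as one record; -o prints just the part that matched.
match=$(grep -Pzo 'error.*\n.*code X123' "$tmp/file.txt" | tr '\0' '\n')

printf '%s\n' "$match"
rm -rf "$tmp"
```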
Search gzipped files
The companion command zgrep can search inside gzipped files without a separate decompression step:
$ zgrep "error" *.log.gz
zgrep is a handy utility that calls grep on the decompressed stream on-the-fly.
This avoids having to explicitly decompress gzipped files before searching.
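As a quick sketch, the following compresses an invented log file and searches it in place. It assumes gzip and zgrep are both installed, which is common but not guaranteed.

```shell
# Create and compress a hypothetical log file.
tmp=$(mktemp -d)
printf '%s\n' 'error: disk full' 'ok' > "$tmp/app.log"
gzip "$tmp/app.log"    # leaves only app.log.gz behind

# zgrep decompresses on the fly and passes the text to grep.
zhits=$(zgrep 'error' "$tmp/app.log.gz")

printf '%s\n' "$zhits"
rm -rf "$tmp"
```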
Getting Help
Grep has a manual page accessible through man grep which covers all flags and options in detail.
Some handy resources:
- grep --help: Prints a quick reference of common flags
- Regex101: Online regex tester and debugger
- Regular Expressions Guide: Regex tips, tricks and examples
When in doubt, don't hesitate to search online. There is a treasure trove of information on leveraging grep and regex!
Conclusion
Grep and regular expressions are at the heart of text processing on Linux. I hope this guide provided a solid overview of getting the most out of these tools.
The key takeaways are:
- Use grep for searching across files and stdin
- Craft patterns with regex for flexible text matching
- Learn flags like -i, -v, -c for added functionality
- Use extended regex -E for advanced patterns
- Leverage Perl regex -P for lookaround assertions
- Extract matches and groups for parsing text
- Search gzipped files easily with zgrep
Practice your grep and regex skills. They will serve you well for processing logs, code, files and command output in the Linux shell.
Let me know if you have any other grep/regex tips or tricks!