Red Hat Certified System Administrator (RHCSA)

Understand and Use Essential Tools

Analyze text using basic regular expressions

In this article, we will explore how to analyze text using basic regular expressions in Linux. In previous sections, we used simple search patterns to locate specific text such as "CentOS." However, what do you do when the search conditions become more complex?

Imagine you have hundreds of files with scattered documentation, and you need to extract all the IP addresses mentioned. An IP address typically looks like 203.102.3.5. Simply searching for numbers separated by dots might also capture unwanted patterns like "1.2," which does not represent a complete IP address.
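To make the problem concrete, here is a minimal sketch that runs on made-up sample text. It assumes GNU grep and uses two options not covered yet: -E (Extended Regular Expressions, introduced later in this article) and -o, which prints only the matched part of each line.

```shell
# Made-up sample text standing in for scattered documentation.
sample='Gateway is 203.102.3.5 on this host.
Requires version 1.2 or later.'

# Naive pattern: digits, a dot, digits.
# It splits the IP into "203.102" and "3.5" and also picks up "1.2".
printf '%s\n' "$sample" | grep -oE '[0-9]+\.[0-9]+'

# Stricter pattern: three groups of 1-3 digits each followed by a dot,
# then a final group of 1-3 digits. Only "203.102.3.5" matches.
printf '%s\n' "$sample" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
```

The stricter pattern only checks the shape of an address. It would also accept 999.999.999.999, so treat it as a filter, not a validator.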


Consider this mathematical scenario: given that x is an integer with x > 3 and x < 8, the only possible values for x are 4, 5, 6, and 7.


Regular expressions function in a similar manner. You define a set of conditions, and only text meeting all those conditions will match your pattern. We'll start with simple examples and gradually build towards more advanced expressions.

Regex Operators Overview

Regular expressions are constructed using various operators. Some of the key operators include:

  • The caret (^) to mark the start of a line.
  • The dollar sign ($) to mark the end of a line.
  • The period (.) to match any single character.
  • The asterisk (*) to match zero or more occurrences of the preceding element.
  • The plus sign (+) to match one or more occurrences of the preceding element.
  • Braces ({}), the question mark (?), the vertical pipe (|), parentheses (()), and bracket expressions ([] and [^]).

The image displays a list of regex operators, including symbols like `^`, `$`, `.`, `*`, `+`, `{}`, `?`, `|`, `[]`, `()`, and `[^]`.

Let’s look more closely at ^, $, ., *, and +.

Matching Lines That Begin or End with a Pattern

Consider a file names.txt containing the following names:

$ cat names.txt
adam
adnan
basam
samad
samuel
sheela
ravi
mausami

Running a simple grep command that searches for "sam" will return any line that contains the sequence:

$ grep 'sam' names.txt
basam
samad
samuel
mausami

If you want to match only the names that start with "sam," use the caret (^) operator to denote the beginning of the line. Conversely, to match a pattern that must appear at the end of a line, use the dollar sign ($) operator.
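A minimal sketch of the caret in action, assuming GNU grep. It first recreates names.txt from the listing above in the current directory, then anchors "sam" to the start of each line.

```shell
# Recreate the sample file (writes names.txt in the current directory).
printf '%s\n' adam adnan basam samad samuel sheela ravi mausami > names.txt

# ^ anchors the pattern to the beginning of the line.
grep '^sam' names.txt
# samad
# samuel
```

Compare this with the unanchored search, which also returned basam and mausami because "sam" appears inside those names.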

For example, to match names ending with "sam," append the $ operator at the end of the search term:

$ grep 'sam$' names.txt
basam

This command returns only the lines where "sam" appears at the very end. Note the placement: the caret goes at the beginning of the pattern, and the dollar sign goes at the end.

Using the Period (.) Operator

The period (.) in a regular expression matches any single character. For example, the pattern c.t matches "cat," "cut," "c1t," and even "c#t." It does not match "ct," because exactly one character must appear between "c" and "t." Similarly, c..t requires exactly two characters between "c" and "t." Keep in mind that grep reports a line if the pattern matches anywhere in it, so c.t also matches the "cat" inside "concatenate."
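A minimal sketch of the period operator, assuming GNU grep and using a made-up word list piped in with printf:

```shell
# c.t: "c", then exactly one character of any kind, then "t".
printf '%s\n' cat cut c1t 'c#t' ct coat | grep 'c.t'
# cat
# cut
# c1t
# c#t

# c..t: exactly two characters between "c" and "t".
printf '%s\n' cat cut c1t 'c#t' ct coat | grep 'c..t'
# coat
```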

If you need to match whole words rather than parts of words, consider using the -w option with grep:

$ grep -wr 'c.t' /etc/

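The -w option keeps only matches that form whole words, so a longer word such as "concatenate" is not reported merely because part of it fits the pattern. The -r option searches every file under /etc/ recursively.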
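A minimal sketch comparing a plain search with a whole-word search, assuming GNU grep; the three sample lines are made up:

```shell
sample='cat
concatenate
the cut rope'

# Without -w, "c.t" also matches the "cat" buried inside "concatenate".
printf '%s\n' "$sample" | grep 'c.t'

# With -w, the match must be bounded by non-word characters or the line edges.
printf '%s\n' "$sample" | grep -w 'c.t'
# cat
# the cut rope
```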

Escaping Special Characters

Some characters, such as the period (.), have special meanings in regular expressions. To match a literal period, such as the one at the end of a sentence, escape it with a backslash:

$ grep '\.' /etc/login.defs
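A minimal sketch showing why the backslash matters, assuming GNU grep and two made-up lines of input. The -c option prints the number of matching lines instead of the lines themselves.

```shell
sample='This line ends with a period.
This one does not'

# Unescaped "." matches any character, so every non-empty line matches.
printf '%s\n' "$sample" | grep -c '.'
# 2

# Escaped "\." matches only a literal period.
printf '%s\n' "$sample" | grep '\.'
# This line ends with a period.
```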

Note

When dealing with regex, always escape characters that have special meanings if you intend to match them literally.

Understanding the Asterisk (*) and Plus (+) Operators

The asterisk (*) lets the preceding element appear zero or more times. For example, the regex let* can match "le", "let", "lett," and so on. If you need to ensure that the character appears at least once, use the plus (+) operator.
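A minimal sketch of the asterisk, assuming GNU grep. It adds the -x option, which requires the pattern to match the entire line, so each result is exactly one of the forms described above.

```shell
# let* = "l", "e", then zero or more "t" characters.
# -x keeps only lines that match the pattern from start to end.
printf '%s\n' l le let lett letter | grep -x 'let*'
# le
# let
# lett
```

Without -x, grep 'let*' would also report "letter", because the pattern matches its first four characters.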

Consider this: if you want to match sequences in which the character "0" appears one or more times, the pattern 0* is not enough. Because the asterisk allows zero occurrences, 0* also matches strings with no zeros at all, so grep '0*' matches every line. To require at least one "0," use the plus operator.
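A minimal sketch of the problem, assuming GNU grep and made-up input. The second command uses 00* (one "0" followed by zero or more "0"s), a basic-regex idiom that is not in the original text; it is shown only as an alternative that needs no plus operator.

```shell
sample='abc
100
xyz'

# 0* can match an empty string, so every line counts as a match.
printf '%s\n' "$sample" | grep -c '0*'
# 3

# 00* requires at least one "0".
printf '%s\n' "$sample" | grep -c '00*'
# 1
```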

However, when working with basic regular expressions in grep, the plus sign is treated as a literal character rather than an operator:

$ grep -r '0+' /etc/

The command above searches for the literal string "0+" instead of treating + as a quantifier. In GNU grep's basic regex, special characters such as ?, +, {}, |, and () lose their special meaning unless they are escaped. To use the plus operator in basic regex, escape it with a backslash. The output varies from system to system; an abbreviated sample looks like this:

$ grep -r '0\+' /etc/
/etc/pnm2ppa.conf:# The setting is correct when alignments "@" are correct.
...
/etc/subuid:-aaron:10000:65536
/etc/subuid:-charles:231072:65536
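Because /etc/ differs between machines, here is a minimal sketch on fixed, made-up input that shows the difference between '0+' and '0\+' in GNU grep's basic regex:

```shell
sample='0+1
100
abc'

# Basic regex: "+" is literal, so only the line containing "0+" matches.
printf '%s\n' "$sample" | grep '0+'
# 0+1

# GNU basic regex: "\+" means one or more of the preceding element.
printf '%s\n' "$sample" | grep '0\+'
# 0+1
# 100
```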

Another recommended approach is to use Extended Regular Expressions (ERE), which do not require escaping these characters. You can enable ERE by using the -E flag with grep or by using the egrep command:

$ grep -E -r '0+' /etc/

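Or, equivalently, with the older egrep command (recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer long-term choice):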

$ egrep -r '0+' /etc/
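A minimal sketch of the same input under Extended Regular Expressions, assuming GNU grep. In ERE the roles of the escaped and unescaped forms are swapped compared with basic regex.

```shell
sample='0+1
100
abc'

# ERE: a bare "+" is a quantifier (same matches as grep '0\+').
printf '%s\n' "$sample" | grep -E '0+'
# 0+1
# 100

# ERE: "\+" now means a literal plus sign.
printf '%s\n' "$sample" | grep -E '0\+'
# 0+1
```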

Tip

Using Extended Regular Expressions helps simplify pattern writing and reduces common mistakes caused by missing escape characters.

Conclusion

Understanding the placement and function of regex operators is key to effective text searching. The caret (^) marks the start of a line, and the dollar sign ($) marks its end. The period (.) stands for any single character, while the asterisk (*) and plus (+) operators control the number of occurrences of the preceding element. Escaping special characters appropriately or opting for Extended Regular Expressions (via -E or egrep) can greatly simplify your search patterns.

In our next article, we will explore further regex concepts and provide additional practical examples using Extended Regular Expressions.

Happy regexing!
