Red Hat Certified System Administrator(RHCSA)
Understand and Use Essential Tools
Compare and manipulate file content
Linux is built around text. Whether you're working in an SSH session, configuring applications, or modifying system settings, you'll encounter numerous text files. In this article, we explore common commands used to view, edit, transform, and compare file content, providing practical examples to improve your workflow.
Viewing Files
For small files, the cat
command allows you to display the entire file content quickly. For example, if you have a file named "users.txt", you can run:
$ cat /home/users.txt
user1
user2
user3
user4
user5
user6
If you wish to display the file contents in reverse order (i.e., with the last line first), use the tac
command, which works similarly to cat
but in reverse.
For large files such as system logs, you might be interested only in the most recent entries. The tail
command displays the last 10 lines by default. You can specify a different number of lines with the -n
option:
$ tail -n 20 /var/log/dnf.log
2021-11-02T18:11:47-0500 DEBUG baseos: using metadata from Thu 28 Oct
2021-11-02T18:11:48-0500 DEBUG reviving: 'extras' can be revived - repomd matches.
2021-11-02T18:11:48-0500 DEBUG extras: using metadata from Thu 28 Oct
2021-11-02T18:11:48-0500 DEBUG User-Agent: constructed: 'libdnf (CentOS Stream 8; generic; Linux.x86_64)'
2021-11-02T18:11:48-0500 INFO Metadata cache created.
2021-11-02T18:11:48-0500 DEBUG Cleaning up.
2021-11-02T19:22:58-0500 INFO --- logging initialized ---
2021-11-02T19:22:58-0500 DDEBUG timer: config: 1 ms
2021-11-02T19:22:58-0500 DDEBUG DNF version: 4.7.0
2021-11-02T19:22:58-0500 DDEBUG Command: dnf makecache --timer
2021-11-02T19:22:58-0500 DDEBUG Installroot: /
2021-11-02T19:22:58-0500 DDEBUG Releaserver: 8
2021-11-02T19:22:58-0500 DDEBUG cachedir: /var/cache/dnf
2021-11-02T19:22:58-0500 DDEBUG Base command: makecache
2021-11-02T19:22:58-0500 DDEBUG Extra commands: ['makecache', '--timer']
2021-11-02T19:22:58-0500 DEBUG Making cache files for all metadata files.
2021-11-02T19:22:58-0500 INFO Metadata cache refreshed recently.
2021-11-02T19:22:58-0500 DDEBUG Cleaning up.
To display the beginning of a file, the head
command shows the first 10 lines by default. Similar to tail
, you can modify the output by using the -n
option:
$ head -n 20 /var/log/dnf.log
2021-10-19T00:53:06-0500 INFO --- logging initialized ---
2021-10-19T00:53:06-0500 DDEBUG timer: config: 3 ms
2021-10-19T00:53:06-0500 DDEBUG DNF version: 4.7.0
2021-10-19T00:53:06-0500 DDEBUG Command: dnf makecache --timer
2021-10-19T00:53:06-0500 DDEBUG Installroot: /
2021-10-19T00:53:06-0500 DDEBUG Releasver: 8
2021-10-19T00:53:06-0500 DDEBUG cachedir: /var/cache/dnf
2021-10-19T00:53:06-0500 DDEBUG Base command: makecache
2021-10-19T00:53:06-0500 DDEBUG Extra commands: ['makecache', '--timer']
2021-10-19T00:53:06-0500 DDEBUG Making cache files for all metadata files.
2021-10-19T00:53:06-0500 WARNING Failed determining last makecache time.
2021-10-19T00:53:06-0500 DDEBUG appstream: has expired and will be refreshed.
2021-10-19T00:53:06-0500 DDEBUG baseos: has expired and will be refreshed.
2021-10-19T00:53:06-0500 DDEBUG extras: has expired and will be refreshed.
2021-10-19T00:53:25-0500 DDEBUG appstream: using metadata from Thu 07 Oct
2021-10-19T00:07:51 AM CDT.
2021-10-19T00:53:25-0500 DDEBUG repo: downloading from remote: appstream
2021-10-19T00:54:07-0500 DDEBUG repo: downloading from remote: baseos
2021-10-19T00:54:07-0500 DDEBUG baseos: using metadata from Thu 07 Oct
2021-10-19T00:07:02 AM CDT.
2021-10-19T00:54:07-0500 DDEBUG repo: downloading from remote: extras
Tip
For files with warnings or error messages, consider piping the output to a pager like less
to navigate the content easily.
Automating Text Manipulation with sed
Manually editing large files can be tedious, especially when you need to update many occurrences of the same text. The sed
(stream editor) utility automates text transformations.
Imagine you have a file named "userinfo.txt" that includes addresses, phone numbers, and citizenship details. If the country “Canada” is misspelled as “canda” throughout the file, you can preview the correction with:
$ sed 's/canda/canada/g' userinfo.txt
In this command:
- The
s
stands for substitute. s/canda/canada/g
tellssed
to replace every occurrence of "canda" with "canada" on each line (theg
flag indicates a global replacement).
After confirming the output, apply the changes in-place using the -i
option:
$ sed -i 's/canda/canada/g' userinfo.txt
Warning
Always preview changes before applying them with the -i
option to avoid accidental modifications.
Extracting Specific File Content
Sometimes you need only a part of a file. For instance, to extract the first column (e.g., names) from "userinfo.txt" when fields are space-delimited, use the cut
command:
$ cut -d ' ' -f 1 userinfo.txt
ravi
mark
john
ravi
mary
Here:
- The
-d
option defines the delimiter (a space in this case). - The
-f
option selects the field (column 1).
If your file uses commas as delimiters and you want to extract the third field (perhaps representing countries), you can direct the output to a new file:
$ cut -d ',' -f 3 userinfo.txt > countries.txt
Removing Duplicate Lines
If the output file (e.g., "countries.txt") contains duplicate entries, the uniq
command removes adjacent duplicate lines. However, since it only works with consecutive duplicates, sort the file first:
$ sort countries.txt
canada
canada
canada
usa
usa
Then, remove duplicates with:
$ sort countries.txt | uniq
canada
usa
Comparing Files
When configuration files change during an upgrade, comparing the old and new versions is crucial. The diff
command isolates differences between files, making it easier to identify what has changed.
For a simple comparison:
$ diff file1 file2
Example output:
1c1
< only exists in file 1
---
> only exists in file 2
4c4
< only exists in file 1
---
> only exists in file 2
This indicates that line 1 in file1 differs from line 1 in file2 (denoted by 1c1
), and similarly for line 4. Here, <
shows content from the first file and >
shows content from the second file.
For additional context, use the context option (-c
):
$ diff -c file1 file2
*** file1 2021-10-28 20:39:43.083264406 -0500
--- file2 2021-10-28 20:40:02.900262846 -0500
***************
*** 1,4 ****
! only exists in file 1
identical line 2
identical line 3
! only exists in file 1
--- 1,4 ----
! only exists in file 2
identical line 2
identical line 3
! only exists in file 2
For those who prefer a side-by-side comparison, use the -y
option or the sdiff
command:
$ diff -y file1 file2
only exists in file 1 | only exists in file 2
identical line 2 identical line 2
identical line 3 identical line 3
only exists in file 1 | only exists in file 2
$ sdiff file1 file2
only exists in file 1 | only exists in file 2
identical line 2 identical line 2
identical line 3 identical line 3
only exists in file 1 | only exists in file 2
In these side-by-side outputs, a pipe (|
) separates the differing content.
Extracting Names Using Cut
If you need to extract just the names (the first column) from "userinfo.txt", the command is:
$ cut -d ' ' -f 1 userinfo.txt
Removing Duplicates and Sorting
After extracting data, such as a list of countries, you might notice duplicates. Combining the sort
and uniq
commands will both sort the entries and remove duplicates, ensuring an ordered list.
Conclusion
In this article, we've covered essential Linux utilities for managing text files, including:
- Viewing files with
cat
,tac
,tail
, andhead
- Modifying content with
sed
- Extracting columns with
cut
- Removing duplicates and sorting using
sort
anduniq
- Comparing files with
diff
andsdiff
These tools are indispensable for system administration and everyday Linux operations. With practice, you'll be better equipped to efficiently manage and analyze text files.
Happy scripting!
Watch Video
Watch video content