Compare and Manipulate File Content
In this article, we explore essential Linux command-line tools for comparing and manipulating file content. Because Linux relies heavily on text—whether for SSH sessions, configuration files, or log files—mastering these commands will improve your efficiency in managing and troubleshooting your system.
Viewing File Content
To inspect a small file quickly, use the cat command with the filename. For example:
$ cat /home/users.txt
user1
user2
user3
user4
user5
user6
If you prefer to see the contents in reverse order (from bottom to top), the tac command is available:
$ tac /home/users.txt
user6
user5
user4
user3
user2
user1
For longer files, such as logs, it is often more practical to view only a portion of the file. The tail command displays the last 10 lines by default, which is useful for checking the most recent log entries. Conversely, the head command shows the first 10 lines of a file.
For instance, running head on a log file prints its first 10 lines (blank lines count toward the total, so fewer lines of text may be visible):
$ head /var/log/apt/term.log
Log started: 2024-03-11 04:41:37
Selecting previously unselected package libyaml2:amd64.
(Reading database ... 118768 files and directories currently installed.)
Preparing to unpack .../libyaml2_2.1.0-3ubuntu0.22.04.1_amd64.deb ...
Unpacking libyaml2:amd64 (2.1.0-3ubuntu0.22.04.1) ...
Selecting previously unselected package libvirt0:amd64.
Preparing to unpack .../libvirt0_8.0.0-1ubuntu7.8_amd64.deb ...
Unpacking libvirt0:amd64 (8.0.0-1ubuntu7.8) ...
Selecting previously unselected package libvirt-clients.
You can also control the number of lines displayed by using the -n option with both tail and head.
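As a minimal sketch of the -n option, the snippet below builds a throwaway 20-line file and reads both ends of it. The file contents and the variable names (sample, first_three, last_two) are made up for illustration and stand in for a real log.

```shell
# Build a throwaway 20-line sample file (hypothetical stand-in for a log).
sample=$(mktemp)
seq 1 20 | sed 's/^/entry /' > "$sample"

# -n 3 limits head to the first three lines.
first_three=$(head -n 3 "$sample")

# -n 2 limits tail to the last two lines.
last_two=$(tail -n 2 "$sample")

printf '%s\n' "$first_three"
printf '%s\n' "$last_two"

# Clean up the scratch file.
rm -f "$sample"
```

Capturing the output in variables is only for demonstration; at the prompt you would normally run head -n 3 or tail -n 2 directly against the file.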
Automating Text Replacement with SED
Editing multiple instances manually in large files can be error-prone and time-consuming. The Stream Editor (SED) automates search and replace tasks efficiently. For example, if a file listing user details has the country "Canada" misspelled as "canda", you can preview the correction with:
$ sed 's/canda/canada/g' userinfo.txt
Let's break down the command:
- s/canda/canada/g: The substitute command, where canda is replaced with canada.
- Single quotes ensure Bash does not interpret special characters.
- The g flag at the end of the expression replaces all occurrences in each line rather than only the first one.
Once you're satisfied with the preview, apply the change in-place:
$ sed -i 's/canda/canada/g' userinfo.txt
Note
Always back up your files before performing in-place edits with sed -i.
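One way to get that backup automatically, sketched here on the assumption that GNU sed is in use, is to attach a suffix directly to -i so that sed keeps a copy of the original. The scratch directory and the sample rows are hypothetical.

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
printf 'alice canda\nbob usa\ncarol canda\n' > "$workdir/userinfo.txt"

# -i.bak edits userinfo.txt in place and keeps the original as userinfo.txt.bak.
sed -i.bak 's/canda/canada/g' "$workdir/userinfo.txt"

edited=$(cat "$workdir/userinfo.txt")
backup=$(cat "$workdir/userinfo.txt.bak")

printf 'edited:\n%s\n' "$edited"
printf 'backup:\n%s\n' "$backup"

# Clean up the scratch directory.
rm -rf "$workdir"
```

If the edit turns out to be wrong, copying the .bak file back over the edited file restores the original content.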
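It is important to quote the expression so that Bash does not interpret special characters such as the asterisk. Single quotes pass the expression to sed exactly as written. Double quotes also work for this expression, but Bash still expands variables and command substitutions inside them: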
$ sed "s/canda/canada/g" userinfo.txt
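The difference between the two quoting styles shows up once the expression contains a dollar sign. The sketch below uses hypothetical variables (old, new, line) to show when double quotes are the right choice and what single quotes do with the same text.

```shell
old='canda'
new='canada'
line='alice canda'

# Double quotes: Bash expands $old and $new before sed runs,
# so sed receives s/canda/canada/g.
expanded=$(printf '%s\n' "$line" | sed "s/$old/$new/g")

# Single quotes: sed receives the literal text $old, which does not
# occur in the input, so the line passes through unchanged.
literal=$(printf '%s\n' "$line" | sed 's/$old/$new/g')

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

A reasonable habit is to use single quotes by default and switch to double quotes only when the expression needs a shell variable.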
Extracting Data with Cut
The cut command is ideal for extracting specific columns from a file. For example, to extract the first column (which often contains names) from a space-separated file, use:
$ cut -d ' ' -f 1 userinfo.txt
Here, -d ' ' sets the delimiter to a space, while -f 1 specifies that the first field should be extracted.
If the file is comma-separated, simply adjust the delimiter. For instance, to extract the third field (which could represent country names) and save the output to countries.txt, run:
$ cut -d ',' -f 3 userinfo.txt > countries.txt
In this command, the redirection operator (>) saves the extracted output to a new file, overwriting countries.txt if it already exists.
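To make the field numbering concrete, here is a small sketch against a made-up comma-separated file. The file name userinfo.csv and its three columns (name, age, country) are assumptions for illustration only.

```shell
# Scratch directory with a hypothetical three-column file: name,age,country.
workdir=$(mktemp -d)
printf 'alice,30,usa\nbob,25,canada\ncarol,41,usa\n' > "$workdir/userinfo.csv"

# Field 3 only: the country column.
countries=$(cut -d ',' -f 3 "$workdir/userinfo.csv")

# A comma-separated list selects several fields; cut keeps the delimiter between them.
names_and_countries=$(cut -d ',' -f 1,3 "$workdir/userinfo.csv")

printf '%s\n' "$countries"
printf '%s\n' "$names_and_countries"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Note that cut splits on every delimiter character it sees, so it is a poor fit for CSV files whose quoted fields contain commas.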
Removing Duplicate Entries
After extracting data, such as a list of countries, you might encounter duplicate entries. The uniq command removes duplicate lines only when they are adjacent, so a file whose duplicates are scattered comes back unchanged. For example:
$ uniq countries.txt
usa
canada
usa
canada
To remove duplicates effectively, sort the file first so that identical lines are adjacent, then pipe the output to uniq:
$ sort countries.txt | uniq
canada
usa
Piping (|) is a powerful technique that allows you to pass the output from one command directly into another for further processing.
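The sort and uniq pipeline has two common variations, sketched below on the same four-line list used above: sort -u folds both steps into one command, and uniq -c prefixes each line with the number of times it occurred. The scratch file and variable names are illustrative.

```shell
# Recreate the unsorted list with scattered duplicates.
workdir=$(mktemp -d)
printf 'usa\ncanada\nusa\ncanada\n' > "$workdir/countries.txt"

# Sort first so duplicates become adjacent, then drop them.
deduped=$(sort "$workdir/countries.txt" | uniq)

# sort -u produces the same result in a single step.
deduped_short=$(sort -u "$workdir/countries.txt")

# uniq -c prefixes each distinct line with its occurrence count.
counts=$(sort "$workdir/countries.txt" | uniq -c)

printf '%s\n' "$deduped"
printf '%s\n' "$counts"

# Clean up the scratch directory.
rm -rf "$workdir"
```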
Comparing Files with Diff
When system upgrades or configuration changes modify files, comparing the old and new versions is crucial. The diff command highlights these differences. Consider the following example:
$ diff file1 file2
1c1
< only exists in file 1
---
> only exists in file 2
4c4
< only exists in file 1
---
> only exists in file 2
In this output, the notation 1c1 indicates that line 1 of file1 must be changed to match line 1 of file2 (c stands for change; diff uses a for added lines and d for deleted lines). The < symbol shows content from file1, while > represents content from file2.
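In scripts, the exit status of diff is often more useful than its text output: it is 0 when the files are identical, 1 when they differ, and 2 when something went wrong. The sketch below recreates two small files matching the example above in a scratch directory; the variable names are illustrative.

```shell
# Recreate the two example files in a scratch directory.
workdir=$(mktemp -d)
printf 'only exists in file 1\nidentical line 2\nidentical line 3\nonly exists in file 1\n' > "$workdir/file1"
printf 'only exists in file 2\nidentical line 2\nidentical line 3\nonly exists in file 2\n' > "$workdir/file2"

# Capture both the report and the exit status.
# The "|| status=$?" form keeps the script running even under "set -e".
status=0
diff_output=$(diff "$workdir/file1" "$workdir/file2") || status=$?

if [ "$status" -eq 0 ]; then
    echo "files are identical"
elif [ "$status" -eq 1 ]; then
    echo "files differ:"
    printf '%s\n' "$diff_output"
else
    echo "diff reported an error" >&2
fi

# Clean up the scratch directory.
rm -rf "$workdir"
```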
For more context, use the -c option, which prints the surrounding lines and marks each changed line with an exclamation mark:
$ diff -c file1 file2
*** file1 2021-10-28 20:39:43.083264406 -0500
--- file2 2021-10-28 20:40:02.900262846 -0500
***************
*** 1,4 ****
! only exists in file 1
  identical line 2
  identical line 3
! only exists in file 1
--- 1,4 ----
! only exists in file 2
  identical line 2
  identical line 3
! only exists in file 2
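For a side-by-side visual comparison, use the -y option. A | between the columns marks lines that differ, while identical lines appear with an empty gutter:
$ diff -y file1 file2
only exists in file 1    | only exists in file 2
identical line 2           identical line 2
identical line 3           identical line 3
only exists in file 1    | only exists in file 2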
Alternatively, you can use sdiff, which produces the same side-by-side layout:
$ sdiff file1 file2
only exists in file 1    | only exists in file 2
identical line 2           identical line 2
identical line 3           identical line 3
only exists in file 1    | only exists in file 2
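On long files, the side-by-side view is easier to read when the unchanged lines are hidden. The sketch below assumes GNU diffutils, where --suppress-common-lines drops identical lines and -W sets the total output width. The files are recreated in a scratch directory and the variable names are illustrative.

```shell
# Recreate the two example files in a scratch directory.
workdir=$(mktemp -d)
printf 'only exists in file 1\nidentical line 2\nidentical line 3\nonly exists in file 1\n' > "$workdir/file1"
printf 'only exists in file 2\nidentical line 2\nidentical line 3\nonly exists in file 2\n' > "$workdir/file2"

# Show only the differing lines, in a 60-column side-by-side layout.
# diff exits with 1 when the files differ, so capture the status explicitly.
status=0
changed=$(diff -y --suppress-common-lines -W 60 "$workdir/file1" "$workdir/file2") || status=$?

# Count the lines that carry the | change marker.
changed_count=$(printf '%s\n' "$changed" | grep -c '|')

printf '%s\n' "$changed"
echo "changed lines: $changed_count"

# Clean up the scratch directory.
rm -rf "$workdir"
```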
Tip
The context (-c) and side-by-side (-y) options of diff, along with the separate sdiff command, can help you pinpoint changes more easily during system upgrades or when troubleshooting configuration issues.
Summary
This guide introduced a variety of Linux commands (cat, tac, head, tail, sed, cut, sort, uniq, diff, and sdiff) that are invaluable for viewing, editing, and comparing file content. Mastery of these tools not only streamlines your workflow but also enhances your ability to manage and debug files in any Linux environment.
For more detailed explanations and advanced use cases, consider exploring additional Linux command-line resources.