Intermediate ⏱ 25 min Lesson 9 of 13

๐Ÿ” Text Processing

Powerful text processing with grep, sed, awk, and other tools.

Text Processing Power Tools

One of the greatest strengths of Linux is its rich collection of text-processing utilities. Each tool does one job, and pipes let you chain them so that a single command line can search, filter, transform, and summarize text.
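As a rough illustration of chaining tools with pipes, the sketch below counts how often each log level appears in a small sample. The log contents and the `level_counts` helper are made up for this example; they are not part of the lesson's files.

```shell
# Hypothetical sample data written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
INFO service started
ERROR disk full
WARN retrying
ERROR disk full
INFO cache warmed
ERROR timeout
EOF

# Illustrative helper: extract the first field, group identical values,
# count each group, then list the most frequent level first.
level_counts() {
  awk '{print $1}' "$1" | sort | uniq -c | sort -rn
}

level_counts "$log"
```

With this sample, ERROR should come out on top with a count of 3, followed by INFO and WARN. Each stage only reads the previous stage's output, so you can build the pipeline one command at a time and inspect the intermediate results.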

grep: Search for Patterns

grep (Global Regular Expression Print) searches for patterns in files or input and prints matching lines.

bash
# Basic search
grep "error" logfile.txt

# Case-insensitive
grep -i "error" logfile.txt

# Recursive search in directories
grep -r "TODO" ./src/

# Invert match (lines that do NOT match)
grep -v "debug" logfile.txt

# Count matches
grep -c "error" logfile.txt

# Show line numbers
grep -n "error" logfile.txt

# Extended regex: match any of several alternatives
grep -E "error|warning|fatal" logfile.txt

💡 Tip: Use grep -E for extended regular expressions. It supports +, ?, |, and () without needing to escape them. The older egrep command is equivalent but deprecated; recent GNU grep releases print a warning when it is used.
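The flags above can be combined in a single call. Here is a minimal sketch, assuming a made-up five-line log; the file contents and the `problem_count` variable are illustrative only.

```shell
# Hypothetical sample log written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
INFO service started
Error: disk full
debug: cache miss
WARNING: retrying
error: timeout
EOF

# -i ignores case, -n prefixes line numbers, -E enables alternation with |.
grep -inE "error|warning" "$log"

# -c prints the number of matching lines instead of the lines themselves.
problem_count=$(grep -ciE "error|warning" "$log")
echo "problem lines: $problem_count"
```

The first command should print lines 2, 4, and 5 with their line numbers, and the second should report three matching lines. Note that -c counts matching lines, not individual matches, so a line containing both words is counted once.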

sed: Stream Editor

sed performs text transformations on an input stream. The most common use is find-and-replace.

bash
# Substitute first occurrence on each line
sed 's/old/new/' file.txt

# Substitute ALL occurrences on each line (global)
sed 's/old/new/g' file.txt

# Edit the file in place (GNU sed syntax; BSD/macOS sed requires -i '')
sed -i 's/old/new/g' file.txt

# Delete lines matching a pattern
sed '/pattern/d' file.txt

# Delete line 5
sed '5d' file.txt

# Insert text before line 3
sed '3i\New line of text' file.txt

# Print only lines 10-20
sed -n '10,20p' file.txt
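Several sed commands can be chained in one invocation with repeated -e options. The sketch below assumes a small made-up settings file; the file contents and the `cleaned` variable are illustrative only.

```shell
# Hypothetical sample settings file written to a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# sample settings
host=localhost
port=8080
# debug=true
mode=dev
EOF

# First delete comment lines, then rewrite one value.
# Without -i, sed writes to standard output and leaves the file untouched.
cleaned=$(sed -e '/^#/d' -e 's/^mode=dev$/mode=prod/' "$conf")
printf '%s\n' "$cleaned"
```

Running the commands without -i first is a cheap way to preview the result. When you do edit in place, `sed -i.bak` keeps a backup copy of the original with a .bak suffix.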

awk: Pattern Processing Language

awk is a full-fledged text-processing language. It excels at working with columnar data.

bash
# Print specific columns (fields are split on runs of spaces or tabs by default)
awk '{print $1, $3}' data.txt

# Custom delimiter
awk -F: '{print $1, $7}' /etc/passwd

# Pattern matching
awk '/error/ {print $0}' logfile.txt

# BEGIN and END blocks
awk 'BEGIN {print "Name\tScore"} {print $1"\t"$2} END {print "Done"}' scores.txt

# Sum a column
awk '{sum += $2} END {print "Total:", sum}' sales.txt

# Conditional processing
awk '$3 > 90 {print $1, "passed with", $3}' grades.txt
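Patterns, running totals, and an END block can live in the same awk program. The following sketch assumes a made-up three-row grades file with the score in the third column; the names, scores, and the `report` variable are illustrative only.

```shell
# Hypothetical sample data: name, subject, score.
grades=$(mktemp)
cat > "$grades" <<'EOF'
alice math 95
bob math 82
carol math 91
EOF

# Rule 1 runs only for rows scoring above 90; rule 2 runs for every row.
# END runs once after the last line; NR holds the number of rows read.
report=$(awk '
  $3 > 90 { passed++ }
  { total += $3 }
  END { printf "passed=%d average=%.1f\n", passed, total / NR }
' "$grades")
echo "$report"
# prints: passed=2 average=89.3
```

The sketch assumes the input has at least one row; on an empty file NR is 0 and the division would fail, so a real script would guard that case.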

Other Essential Text Tools

bash
# cut: extract columns by delimiter or character position
cut -d: -f1 /etc/passwd         # first field, colon-delimited
cut -c1-10 file.txt             # first 10 characters of each line

# sort: sort lines
sort file.txt                   # alphabetical
sort -n numbers.txt             # numeric sort
sort -r file.txt                # reverse sort
sort -t: -k3,3n /etc/passwd     # sort by the 3rd field only, numeric

# uniq: remove adjacent duplicates (sort first!)
sort file.txt | uniq            # unique lines
sort file.txt | uniq -c         # count occurrences
sort file.txt | uniq -d         # show one copy of each repeated line

# wc: line/word/byte count
wc file.txt                     # lines, words, bytes
wc -l file.txt                  # just line count

# tr: translate/delete characters
echo "hello" | tr 'a-z' 'A-Z'   # HELLO
echo "hello  world" | tr -s ' '  # squeeze spaces
โš ๏ธ Warning: uniq only removes adjacent duplicates. Always pipe through sort first if your data isn't already sorted.

Summary

You've explored the big three text-processing tools (grep for searching, sed for substitution, and awk for columnar data) plus supporting utilities like cut, sort, uniq, wc, and tr. Chained together with pipes, they cover most everyday searching, filtering, and reporting tasks on text data.

🧪 Test Your Knowledge

Answer the questions below to check your understanding of this lesson.