Text Processing | BashHero

Text Processing Power Tools

One of Linux's greatest strengths is its rich collection of text-processing utilities. Each one does a single job well, and combined with pipes they let you search, filter, transform, and analyze text data, often in a single command line.

grep — Search for Patterns

grep (Global Regular Expression Print) searches files, or standard input, for lines that match a pattern and prints those lines.

bash

# Basic search
grep "error" logfile.txt

# Case-insensitive
grep -i "error" logfile.txt

# Recursive search in directories
grep -r "TODO" ./src/

# Invert match (lines that do NOT match)
grep -v "debug" logfile.txt

# Count matches
grep -c "error" logfile.txt

# Show line numbers
grep -n "error" logfile.txt

# Extended regex: match any of several patterns
grep -E "error|warning|fatal" logfile.txt

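💡 Tip: Use grep -E for extended regular expressions. It supports +, ?, |, {}, and () without needing to escape them. The older egrep command does the same thing but is obsolescent, and recent GNU grep releases print a warning when you run it.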
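
To see the extended-regex operators side by side, here is a minimal sketch that feeds made-up lines to grep -E through printf instead of reading a real log file. The sample_log helper is hypothetical and exists only for this illustration.

```bash
# Hypothetical sample log, generated with printf instead of read from a file
sample_log() {
  printf '%s\n' 'ERROR: disk full' 'warning: low memory' 'info: ok' 'FATAL: kernel panic'
}

# () groups the alternatives and | means "or"; -i ignores case
# Prints: ERROR: disk full / FATAL: kernel panic
sample_log | grep -iE '^(error|fatal):'

# ? makes the preceding "u" optional, so "color" and "colour" match but "colr" does not
printf '%s\n' color colour colr | grep -E 'colou?r'

# + means "one or more": matches lines containing a run of digits
printf '%s\n' 'id 42' 'no digits' 'x7' | grep -E '[0-9]+'
```

Without -E, grep uses basic regular expressions, where +, ?, | and ( ) are literal characters unless escaped; the escaped forms \+, \? and \| are GNU extensions, so -E is the more portable choice.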

sed — Stream Editor

sed performs text transformations on an input stream. The most common use is find-and-replace.

bash

# Substitute first occurrence on each line
sed 's/old/new/' file.txt

# Substitute ALL occurrences on each line (global)
sed 's/old/new/g' file.txt

# Edit the file in place (GNU sed syntax; BSD/macOS sed needs: sed -i '' 's/old/new/g' file.txt)
sed -i 's/old/new/g' file.txt

# Delete lines matching a pattern
sed '/pattern/d' file.txt

# Delete line 5
sed '5d' file.txt

# Insert text before line 3 (GNU sed one-line form)
sed '3i\New line of text' file.txt

# Print only lines 10-20
sed -n '10,20p' file.txt
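
The difference between the first-match and global forms, and a safer way to edit in place, is easiest to see on a scratch file. This is a minimal sketch that assumes mktemp is available (it is on GNU/Linux and macOS); the file contents are made up for illustration.

```bash
# Work on a throwaway temp file so no real file is touched
tmp=$(mktemp)
trap 'rm -f "$tmp" "$tmp.bak"' EXIT    # clean up when the shell exits
printf '%s\n' 'old old old' 'nothing here' > "$tmp"

# Without g, only the first "old" on each line changes (first line: new old old)
sed 's/old/new/' "$tmp"

# With g, every "old" on each line changes (first line: new new new)
sed 's/old/new/g' "$tmp"

# -i.bak edits in place and saves the original as "$tmp.bak";
# this attached-suffix form is accepted by both GNU and BSD/macOS sed
sed -i.bak 's/old/new/g' "$tmp"
cat "$tmp"        # first line is now: new new new
cat "$tmp.bak"    # first line is still: old old old
```

Keeping a .bak copy costs one extra file but lets you diff or restore the original if the substitution matched more than you expected.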

awk — Pattern Processing Language

awk is a full-fledged text-processing language. It excels at working with columnar data.

bash

# Print specific columns (fields are separated by runs of whitespace by default)
awk '{print $1, $3}' data.txt

# Custom delimiter
awk -F: '{print $1, $7}' /etc/passwd

# Pattern matching
awk '/error/ {print $0}' logfile.txt

# BEGIN and END blocks
awk 'BEGIN {print "Name\tScore"} {print $1"\t"$2} END {print "Done"}' scores.txt

# Sum a column
awk '{sum += $2} END {print "Total:", sum}' sales.txt

# Conditional processing
awk '$3 > 90 {print $1, "passed with", $3}' grades.txt
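
The pattern, action, and END pieces above can be combined in one program. This minimal sketch assumes comma-separated name,score records; scores and report are hypothetical helper functions defined only for the example.

```bash
# Hypothetical input: one "name,score" record per line
scores() {
  printf '%s\n' 'alice,91' 'bob,78' 'carol,95'
}

# -F, splits each line on commas, so $1 is the name and $2 is the score
report() {
  awk -F, '
    $2 > 90 { print $1, "passed with", $2 }              # only lines whose score exceeds 90
    { sum += $2 }                                        # every line: running total
    END { if (NR) printf "Average: %.1f\n", sum / NR }   # once, after the last line (NR = lines read)
  '
}

scores | report
# alice passed with 91
# carol passed with 95
# Average: 88.0
```

The if (NR) guard in the END block avoids a division by zero when the input is empty.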

Other Essential Text Tools

bash

# cut — extract columns by delimiter or character position
cut -d: -f1 /etc/passwd         # first field, colon-delimited
cut -c1-10 file.txt             # first 10 characters of each line

# sort — sort lines
sort file.txt                   # alphabetical
sort -n numbers.txt             # numeric sort
sort -r file.txt                # reverse sort
sort -t: -k3,3n /etc/passwd     # numeric sort on the 3rd field only (UID)

# uniq — remove adjacent duplicates (sort first!)
sort file.txt | uniq            # unique lines
sort file.txt | uniq -c         # count occurrences
sort file.txt | uniq -d         # show only duplicates

# wc — word/line/character count
wc file.txt                     # lines, words, bytes (-m counts characters)
wc -l file.txt                  # just line count

# tr — translate/delete characters
echo "hello" | tr 'a-z' 'A-Z'   # HELLO
echo "hello  world" | tr -s ' ' # squeeze repeated spaces into one

⚠️ Warning: uniq only removes adjacent duplicates. Always pipe through sort first if your data isn't already sorted.
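
A quick way to convince yourself of this is to feed uniq a list where the duplicates are separated. In this sketch, fruit is a hypothetical helper that prints sample lines; sort -u is shown as the standard one-command shorthand for sort | uniq.

```bash
# Hypothetical input in which the two "apple" lines are NOT adjacent
fruit() { printf '%s\n' apple banana apple; }

fruit | uniq             # apple, banana, apple: nothing removed
fruit | sort | uniq      # apple, banana: sorting made the duplicates adjacent
fruit | sort -u          # same result in one command (-u = unique)
```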

Try It Yourself

Terminal

echo -e "apple\nbanana\napple\ncherry\nbanana\napple" | sort | uniq -c | sort -rn
echo "Hello World" | tr 'a-z' 'A-Z'
echo "user:Alice:admin" | cut -d: -f2

Summary

You've explored the big three text-processing tools — grep for searching, sed for substitution, and awk for columnar data — plus supporting utilities like cut, sort, uniq, wc, and tr. These tools, combined with pipes, give you immense power over text data.
