Quiz: Regex & Text Wrangling
7 questions
L0 (2 questions)
1. What is the difference between BRE (Basic Regular Expressions) and ERE (Extended Regular Expressions) when using grep?
Show answer
In BRE (the default for grep), metacharacters like +, ?, |, and () are literal unless escaped with a backslash. In ERE (grep -E), they are special by default without escaping. ERE is generally preferred for readability. *Common mistake:* Forgetting that plain grep uses BRE, where + means a literal plus sign, not 'one or more'.
2. What is the difference between .* (greedy) and .*? (lazy) in regex?
Show answer
Greedy (.*) matches as much as possible, then backtracks only as far as the rest of the pattern requires. Lazy (.*?) matches as little as possible, expanding only when needed. Example: on '<b>text</b>', '<.*>' matches the entire string, while '<.*?>' matches just '<b>'. Use lazy quantifiers when you want the shortest match.
L1 (3 questions)
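For the greedy versus lazy question in the L0 set, a minimal Python sketch may help make the difference concrete. It assumes Python's `re` module as the regex engine, and the sample string is a hypothetical input chosen only for illustration.

```python
import re

# Hypothetical sample string, chosen only for illustration.
sample = "<b>text</b>"

# Greedy: .* runs to the end of the string, then backtracks to the last '>'.
greedy = re.search(r"<.*>", sample).group(0)

# Lazy: .*? grows one character at a time and stops at the first '>'.
lazy = re.search(r"<.*?>", sample).group(0)

print(greedy)  # <b>text</b>
print(lazy)    # <b>
```

The two patterns differ only in the `?` after the quantifier, which is why the lazy form is the usual choice for pulling out a single delimited token.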
1. You need a portable regex to match IP addresses in a log file. Why should you avoid \d and use [0-9] instead?
Show answer
Among the grep and sed modes, \d is supported only by PCRE (grep -P). Standard grep -E and sed -E do not support \d. For portability across BRE, ERE, and PCRE, use [0-9] or the POSIX class [[:digit:]].
2. How do lookahead and lookbehind assertions work in regex?
Show answer
Lookahead (?=pattern) asserts what follows the current position without consuming it. Lookbehind (?<=pattern) asserts what precedes it. Negative forms: (?!pattern) and (?<!pattern).
3. What are named capture groups and backreferences in regex?
Show answer
Named groups: (?P<name>...)
L2 (1 question)
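For the lookaround and named-group questions in the L1 set, the following is a small sketch rather than a reference answer. It assumes Python's `re` module, whose `(?P<name>...)` and `(?P=name)` syntax is used here. The strings `line` and `doubled` are hypothetical inputs made up for illustration.

```python
import re

# Hypothetical inputs, used only for illustration.
line = "user=alice cost=$42 refund=-$7"
doubled = "the the cat"

# Lookbehind: digits preceded by '$'; the '$' itself is not part of the match.
amounts = re.findall(r"(?<=\$)[0-9]+", line)

# Negative lookbehind: '$' amounts that are NOT preceded by '-'.
positive = re.findall(r"(?<!-)\$[0-9]+", line)

# Lookahead: a lowercase word followed by '='; the '=' is not consumed.
keys = re.findall(r"[a-z]+(?==)", line)

# Named groups: refer to captures by name instead of by position.
pair = re.search(r"(?P<key>[a-z]+)=(?P<value>[a-z]+)", line)

# Named backreference: (?P=word) must repeat whatever 'word' captured.
repeat = re.search(r"\b(?P<word>[a-z]+) (?P=word)\b", doubled)

# Numbered backreference: \1 in the pattern and in the replacement.
deduped = re.sub(r"\b([a-z]+) \1\b", r"\1", doubled)

print(amounts)                               # ['42', '7']
print(positive)                              # ['$42']
print(keys)                                  # ['user', 'cost', 'refund']
print(pair.group("key"), pair.group("value"))  # user alice
print(repeat.group("word"))                  # the
print(deduped)                               # the cat
```

None of these constructs are available in grep's BRE or ERE modes. With grep they would need PCRE mode (grep -P).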
1. Write a sed one-liner that extracts just the HTTP status code from an Nginx access log line like: '192.168.1.1 - - [15/Mar/2024:14:23:01 +0000] "GET /api HTTP/1.1" 200 1234'.
Show answer
sed -E 's/.* "[A-Z]+ [^ ]+ HTTP\/[0-9.]+" ([0-9]+) .*/\1/' uses a capture group to grab the status code that follows the quoted request line, then replaces the whole line with it. *Common mistake:* Not escaping the forward slash in HTTP\/ (it is also the s command's delimiter), or trying to use \d, which does not work in sed.
L3 (1 question)
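For the status-code question in the L2 set, the same capture-group idea can be sketched in Python as a cross-check on the sed pattern. This is an illustrative equivalent under stated assumptions, not part of the quiz answer: the names `STATUS_RE` and `extract_status` are hypothetical, and the sample line is the one quoted in the question.

```python
import re

# Sample line taken from the quiz question.
log_line = '192.168.1.1 - - [15/Mar/2024:14:23:01 +0000] "GET /api HTTP/1.1" 200 1234'

# Same shape as the sed pattern: quoted request line, then the status code.
# [0-9] is used instead of \d to mirror the portable form.
# No escaping of '/' is needed here because Python has no pattern delimiter.
STATUS_RE = re.compile(r'"[A-Z]+ [^ ]+ HTTP/[0-9.]+" ([0-9]+) ')


def extract_status(line):
    """Return the status code as a string, or None if the line does not match."""
    match = STATUS_RE.search(line)
    return match.group(1) if match else None


print(extract_status(log_line))  # 200
```

One behavioural difference worth noting: the sed one-liner prints non-matching lines unchanged, whereas this sketch returns None for them.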
1. You have a 50GB log file and need to find all lines where a request took longer than 1 second (field format: duration=0.XXXs). How do you approach this efficiently?