#Introduction to Regular Expressions

A regular expression (regex) is a pattern, written in a compact syntax, that describes a set of strings. It is a powerful tool for searching, matching, and processing text.

For example, checking by hand, character by character, whether an input email address is well formed is tedious. Instead, a regular expression like:

^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$

can be used to validate its basic format.

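import re

# Validate email format (a simplified check, not full address validation)
email_pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
if re.match(email_pattern, "user@example.com"):  # placeholder address
    print("Valid email")
else:
    print("Invalid email")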
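As a slightly more reusable variant, the check can be wrapped in a small function. This is only a sketch: the names `EMAIL_RE` and `is_valid_email` and the sample addresses are illustrative assumptions, not part of the original example. It uses `re.fullmatch`, which requires the whole string to match, so the `^` and `$` anchors are dropped. One practical difference is that `$` also matches just before a trailing newline, while `fullmatch` does not accept one.

```python
import re

# Same character classes as the pattern above, without the anchors:
# fullmatch() already requires the entire string to match.
EMAIL_RE = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

def is_valid_email(address: str) -> bool:
    """Return True if the whole string fits the simplified email pattern."""
    return EMAIL_RE.fullmatch(address) is not None

# Hypothetical sample inputs
samples = [
    "user@example.com",                  # True
    "first.last+tag@mail.example.org",   # True
    "no-at-sign.example.com",            # False: no "@"
    "user@localhost",                    # False: no "." after the "@"
    "user@example.com\n",                # False: trailing newline rejected
]
for s in samples:
    print(repr(s), is_valid_email(s))

# For comparison, the anchored pattern with re.match accepts a trailing newline
anchored = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
print(re.match(anchored, "user@example.com\n") is not None)  # True
```

Whether the trailing-newline case matters depends on where the input comes from; for text read line by line from a file it often does.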

#Metacharacters

| Metacharacter | Meaning | Example |
| --- | --- | --- |
| `.` | Matches any single character (except newline, by default) | `a.c` → `abc`, `a1c` |
| `^` | Matches the beginning of the string | `^abc` → matches `abcxxxx` |
| `$` | Matches the end of the string | `abc$` → matches `xxxxabc` |
| `*` | Matches 0 or more repetitions of the preceding element | `a*` → `""`, `a`, `aa` |
| `+` | Matches 1 or more repetitions of the preceding element | `a+` → `a`, `aa` |
| `?` | Matches 0 or 1 repetition of the preceding element | `a?` → `""`, `a` |
| `{n}` | Matches exactly n repetitions | `a{2}` → `aa` |
| `{min,}` | Matches at least min repetitions | `a{2,}` → `aa`, `aaa`, `aaaa` |
| `{min,max}` | Matches between min and max repetitions | `a{2,3}` → `aa`, `aaa` |
| `[]` | Matches any one character inside the brackets | `[abc]` → `a`, `b`, `c` |
| `[^]` | Matches any one character not in the brackets | `[^abc]` → `d`, `e`, `f` |
| `[-]` | Indicates a range inside brackets | `[a-z]` → `a`, `b`, ..., `z` |
| `()` | Groups expressions | `(abc)+` → `abc`, `abcabc` |
| `\|` | OR operator (alternation) | `abc\|xyz` → `abc` or `xyz` |
| `\d` | Matches any digit, same as `[0-9]` for ASCII input | `\d` → `1`, `2`, `3` |
| `\D` | Matches any non-digit, same as `[^0-9]` for ASCII input | `\D` → `a`, `@`, `_` |
| `\w` | Matches an alphanumeric character or underscore, `[a-zA-Z0-9_]` for ASCII input | `\w` → `a`, `1`, `_` |
| `\W` | Matches a non-word character, `[^a-zA-Z0-9_]` for ASCII input | `\W` → `@`, `#` |
| `\s` | Matches any whitespace character | `\s` → space, `\t`, `\n`, etc. |
| `\S` | Matches any non-whitespace character | `\S` → `a`, `1`, `@` |
| `\b` | Matches a word boundary (a position, not a character) | `\bcat\b` → matches `cat` in a sentence |
| `\B` | Matches a position that is not a word boundary | `\Bcat\B` → matches `cat` in `scatter` |
| `\r` | Carriage return | - |
| `\n` | Newline | - |
| `\f` | Form feed | - |
| `\t` | Tab | - |
| `\v` | Vertical tab | - |
| `\` | Escape character to treat special characters literally | `\+` → `+` |
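To see several of these metacharacters in action, here is a short sketch using Python's `re` module. The sample strings and variable names are made up for illustration; the comments show the result of each call.

```python
import re

# . matches any single character except a newline
dot = re.findall(r'a.c', "abc a1c ac a\nc")            # ['abc', 'a1c']

# {min,max} takes as many repetitions as it can, up to max
counted = re.findall(r'a{2,3}', "a aa aaa aaaa")        # ['aa', 'aaa', 'aaa']

# [^...] matches any one character NOT listed in the brackets
negated = re.findall(r'[^abc]', "abcdef")               # ['d', 'e', 'f']

# () groups a sub-pattern so the quantifier applies to the whole group
grouped = re.fullmatch(r'(abc)+', "abcabc") is not None  # True

# | matches either the left or the right alternative
either = re.findall(r'abc|xyz', "abc-xyz-abd")          # ['abc', 'xyz']

# \b requires a word boundary, \B requires the absence of one
whole_word = re.findall(r'\bcat\b', "cat scatter catalog cat.")  # ['cat', 'cat']
inside_word = re.findall(r'\Bcat\B', "cat scatter catalog")      # ['cat']

# \d and \s are shorthand classes; \+ is a literal plus sign
digits = re.findall(r'\d+', "Room 12, floor 3")         # ['12', '3']
squeezed = re.sub(r'\s+', ' ', "a \t\n b")              # 'a b'
plus = re.findall(r'\+', "1+1=2")                       # ['+']

print(dot, counted, negated, grouped, either)
print(whole_word, inside_word, digits, repr(squeezed), plus)
```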

#Greedy vs Lazy Matching

By default, regex quantifiers are greedy: they match as much text as possible while still allowing the overall pattern to succeed. Appending `?` to a quantifier makes it lazy (non-greedy), so it matches as little text as possible.
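The difference is easiest to see on text with repeated delimiters. The following sketch uses a made-up HTML-like string (the variable names are illustrative) and compares a greedy and a lazy version of the same pattern:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the LAST ">" in the string, so there is one long match
greedy = re.findall(r'<.+>', html)
print(greedy)   # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the FIRST ">" it can, so each tag matches separately
lazy = re.findall(r'<.+?>', html)
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```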

| Greedy pattern | Greedy behavior | Lazy pattern | Lazy behavior |
| --- | --- | --- | --- |
| `.*` | Match 0 or more, longest possible | `.*?` | Match 0 or more, shortest possible |
| `.+` | Match 1 or more, longest possible | `.+?` | Match 1 or more, shortest possible |
| `.?` | Match 0 or 1, longest possible | `.??` | Match 0 or 1, shortest possible |
| `.{n,m}` | Match n to m times, longest possible | `.{n,m}?` | Match n to m times, shortest possible |
| `.{n,}` | Match at least n, longest possible | `.{n,}?` | Match at least n, shortest possible |
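Each greedy/lazy pair can be checked against the same input. The sketch below uses `a` in place of `.` and the hypothetical input `"aaaa"` so the results are easy to read. The last line shows that a lazy quantifier still expands as far as needed when the rest of the pattern requires it.

```python
import re

s = "aaaa"

# Greedy and lazy forms of each quantifier, matched at the start of s
patterns = [
    r'a*', r'a*?',
    r'a+', r'a+?',
    r'a?', r'a??',
    r'a{2,3}', r'a{2,3}?',
    r'a{2,}', r'a{2,}?',
]
results = {p: re.match(p, s).group() for p in patterns}

for pattern, matched in results.items():
    print(f"{pattern:10} -> {matched!r}")
# a*         -> 'aaaa'
# a*?        -> ''
# a+         -> 'aaaa'
# a+?        -> 'a'
# a?         -> 'a'
# a??        -> ''
# a{2,3}     -> 'aaa'
# a{2,3}?    -> 'aa'
# a{2,}      -> 'aaaa'
# a{2,}?     -> 'aa'

# Lazy does not mean "always empty": a*? must grow until "b" can match
forced = re.match(r'a*?b', "aaab").group()
print(forced)  # 'aaab'
```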