Pattern Matching Deep Dive
Introduction to Pattern Matching
While many programming languages rely on external regular expression libraries, Lua builds pattern matching into its standard library. This guide covers Lua’s pattern syntax, a lightweight alternative to full regex that handles most string manipulation tasks.
Lua patterns share similarities with regular expressions but use a distinct syntax optimized for the language’s minimalist philosophy. By the end of this guide, you’ll be comfortable extracting data, validating input, and transforming strings using patterns.
Understanding Character Classes
At the core of Lua’s pattern matching are character classes—shorthands that represent sets of characters.
| Class | Matches |
|---|---|
| . | Any character |
| %a | Letters (A-Z, a-z) |
| %d | Digits (0-9) |
| %l | Lowercase letters |
| %u | Uppercase letters |
| %w | Alphanumeric characters |
| %s | Whitespace characters |
| %p | Punctuation characters |
Character classes can be negated by capitalizing them: %A matches non-letters, %D matches non-digits, and so on.
You can also define your own custom character sets using brackets:
-- Match only vowels
local pattern = "[aeiou]"
-- Match anything except vowels (negated)
local pattern = "[^aeiou]"
-- Match hexadecimal digits
local pattern = "[0-9A-Fa-f]"
Here’s a basic example demonstrating character classes:
-- Check if a string contains only digits
local function is_numeric(s)
return s:match("^%d+$") ~= nil
end
print(is_numeric("12345")) -- true
print(is_numeric("12a45")) -- false
print(is_numeric("")) -- false
The ^ anchor pins the match to the start of the string and $ pins it to the end; together, they require the entire string to match the pattern. Since %d+ means “one or more digits,” matching an empty string returns nil, so is_numeric returns false.
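Negated classes are useful for cleanup tasks as well as validation. The following is a minimal sketch; the helper name digits_only is hypothetical and does not come from the guide itself.

```lua
-- Hypothetical helper: keep only the digits of a string.
-- %D matches any character that is NOT a digit.
local function digits_only(s)
  -- The outer parentheses drop gsub's second return value (the count)
  return (s:gsub("%D", ""))
end

print(digits_only("(555) 123-4567")) -- 5551234567
print(digits_only("no digits") == "") -- true
```

Wrapping the gsub call in parentheses keeps the function from leaking the replacement count to its caller.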
Captures: Extracting Matched Data
When part of your pattern is enclosed in parentheses, Lua captures that portion for later use:
-- Extract the filename and extension from a path
local path = "/home/user/documents/report.pdf"
local filename, extension = path:match("([^/]+)%.(%w+)$")
print(filename) -- report
print(extension) -- pdf
The pattern ([^/]+)%.(%w+)$ breaks down as:
- ([^/]+): capture one or more characters that aren’t forward slashes
- %.: match a literal period (an unescaped . matches any character)
- (%w+): capture one or more alphanumeric characters
- $: anchor to the end of the string
Captures become invaluable when parsing structured text:
-- Parse a simple CSV-like line
local line = 'Alice,25,Engineer'
local name, age, role = line:match("([^,]+),([^,]+),(.+)")
print(name) -- Alice
print(age) -- 25
print(role) -- Engineer
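The same capture technique applies to other line-oriented formats. As a rough sketch, assuming a hypothetical "key = value" settings format and a hypothetical helper named parse_setting:

```lua
-- Hypothetical helper: split "key = value" into its two parts.
local function parse_setting(line)
  -- [%w_]+ captures the key; %s* skips optional spaces around "="
  local key, value = line:match("^([%w_]+)%s*=%s*(.+)$")
  return key, value
end

local key, value = parse_setting("timeout = 30")
print(key)   -- timeout
print(value) -- 30

-- When the pattern does not match, both results are nil
print(parse_setting("not a setting")) -- nil nil
```

Captures are always returned as strings, so a caller that needs a number would still convert the value with tonumber.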
Practical Pattern Examples
Let’s explore real-world scenarios where Lua patterns prove useful.
Validating Email Addresses
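local function is_valid_email(email)
-- Name characters, an @, a domain, then a final dot-separated label
local pattern = "^[%w%.]+@[%w%.]+%.%w+$"
return email:match(pattern) ~= nil
end
print(is_valid_email("user@example.com")) -- true
print(is_valid_email("user@example.co.uk")) -- true
print(is_valid_email("invalid@email")) -- false
print(is_valid_email("@nodomain.com")) -- false
Because the domain class [%w%.]+ may itself contain dots, the pattern accepts single-label endings such as .com as well as multi-part ones like .co.uk and .com.au; it only requires at least one alphanumeric character after the final dot. Treat it as a loose sanity check rather than full address validation.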
Extracting Numbers from Text
local text = "The temperature is 23 degrees Celsius"
-- Find the first number in a string
local temp = text:match("%d+")
print(temp) -- 23
-- Extract all numbers using gmatch
local numbers = {}
for num in text:gmatch("%d+") do
table.insert(numbers, tonumber(num))
end
print(numbers[1]) -- 23
The gmatch function returns an iterator that yields each successive match, so a generic for loop can process every match without tracking positions manually:
-- Extract all words from a sentence
local sentence = "hello world lua"
for word in sentence:gmatch("%a+") do
print(word)
end
-- hello
-- world
-- lua
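When the pattern contains captures, gmatch yields the captures instead of the whole match. A small sketch, assuming a made-up query string as input:

```lua
-- gmatch with two captures yields both values on each iteration
local query = "name=lua&version=5&type=script"
local params = {}
for key, value in query:gmatch("(%w+)=(%w+)") do
  params[key] = value
end

print(params.name)    -- lua
print(params.version) -- 5
print(params.type)    -- script
```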
Replacing Text with gsub
The gsub function performs global substitution and returns both the modified string and the number of replacements:
local sentence = "the quick brown fox jumps over the lazy dog"
-- Replace all vowels with asterisks
local masked = sentence:gsub("[aeiou]", "*")
print(masked)
-- th* q**ck br*wn f*x j*mps *v*r th* l*zy d*g
-- Count replacements using a replacement function
local count = 0
sentence:gsub("the", function()
count = count + 1
end)
print(count) -- 2
Patterns are case-sensitive, so only the two lowercase occurrences of “the” are counted.
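A few other gsub features are worth a quick look. The sketch below uses only standard string.gsub behavior; the sample strings and variable names are illustrative.

```lua
local sentence = "the quick brown fox jumps over the lazy dog"

-- The second return value is the number of substitutions made
local _, count = sentence:gsub("the", "the")
print(count) -- 2

-- An optional final argument limits how many replacements are performed
local first_only = sentence:gsub("the", "a", 1)
print(first_only) -- a quick brown fox jumps over the lazy dog

-- A table replacement looks up the first capture as a key
local vars = { name = "Ada", lang = "Lua" }
local greeting = ("$name writes $lang"):gsub("%$(%w+)", vars)
print(greeting) -- Ada writes Lua
```

Reading the count from the second return value avoids the callback and the external counter.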
Using Captures in Replacements
You can reference captured groups in the replacement string using %1, %2, and so on:
local text = "john doe"
-- Swap first and last name
local swapped = text:gsub("(%a+)%s(%a+)", "%2, %1")
print(swapped) -- doe, john
This is particularly powerful for reformatting structured data:
-- Convert ISO date to readable format
local date = "2024-03-15"
local formatted = date:gsub("(%d+)-(%d+)-(%d+)", "%3/%2/%1")
print(formatted) -- 15/03/2024
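The replacement string also accepts %0, which stands for the whole match, and %% for a literal percent sign. A brief sketch with illustrative inputs:

```lua
-- %0 stands for the entire match, so no parentheses are needed
local tagged = ("hello world"):gsub("%a+", "<%0>")
print(tagged) -- <hello> <world>

-- A literal percent sign in the replacement string is written as %%
local rate = ("rate: 15"):gsub("%d+", "%0%%")
print(rate) -- rate: 15%
```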
Pattern Modifiers and Magic Characters
Certain characters have special meaning in patterns:
| Character | Meaning |
|---|---|
| + | One or more (greedy) |
| * | Zero or more (greedy) |
| - | Zero or more (lazy) |
| ? | Zero or one |
| % | Escape character |
Both + and * are greedy and take as many characters as they can, while ? takes at most one:
local text = "123"
print(text:match("%d*")) -- "123" (matches as many as possible)
print(text:match("%d+")) -- "123" (requires at least one)
print(text:match("%d?")) -- "1" (matches at most one)
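The difference between + and * only shows when the subject contains nothing for the class to match. A short sketch, using a string with no digits:

```lua
local letters = "abc"

-- * may match zero digits, so it succeeds with an empty match
print(letters:match("%d*") == "") -- true

-- + needs at least one digit, so the match fails
print(letters:match("%d+")) -- nil

-- find makes the empty match visible: the end index precedes the start
print(letters:find("%d*")) -- 1 0
```

This is why * can be a surprising choice in validation code: a pattern made only of starred items matches every string.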
The lazy quantifier - tries to match as few characters as possible:
local html = "<div>content</div>"
-- Greedy: matches as much as possible
print(html:match("<.+>")) -- <div>content</div>
-- Lazy: matches the minimum
print(html:match("<.->")) -- <div>
-- Negated set: cannot run past the first >
print(html:match("<[^>]+>")) -- <div>
The lazy pattern <.-> stops at the first closing >, so it returns only the opening tag. The negated set <[^>]+> gives the same result here. Spaces inside a pattern are literal: <.- > would require a space before the > and returns nil for this input.
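Lazy matching is also a common way to pull out delimited text such as quoted strings. A minimal sketch with an illustrative input line:

```lua
local line = 'say "hello" and "goodbye"'

-- Greedy: .* runs to the last quote on the line
print(line:match('"(.*)"')) -- hello" and "goodbye

-- Lazy: .- stops at the first closing quote
print(line:match('"(.-)"')) -- hello

-- Combined with gmatch, the lazy form yields each quoted string in turn
for quoted in line:gmatch('"(.-)"') do
  print(quoted)
end
-- hello
-- goodbye
```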
Balanced Pairs with %b
Lua supports matching balanced delimiters using %b:
-- Match balanced parentheses
local text = "func(a, b(c, d))"
local balanced = text:match("%b()")
print(balanced) -- (a, b(c, d))
-- Match balanced brackets
local text = "[outer [inner]]"
local balanced = text:match("%b[]")
print(balanced) -- [outer [inner]]
This is invaluable for parsing nested structures:
-- Extract the first balanced group, parentheses included
local code = "result = calculate(a, b(1, 2)) + other"
local params = code:match("%b()")
print(params) -- (a, b(1, 2))
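Since %b returns the delimiters along with the content, a little extra work is needed to get just the inside. One possible approach, sketched with illustrative strings:

```lua
local code = "result = calculate(a, b(1, 2)) + other"

-- %b() returns the delimiters too; sub(2, -2) trims one character per end
local group = code:match("%b()")
local inner = group:sub(2, -2)
print(inner) -- a, b(1, 2)

-- %b also works in gsub, for example to drop parenthesized asides
local note = "Lua (pronounced LOO-ah) is small (and fast)"
local trimmed = note:gsub("%s*%b()", "")
print(trimmed) -- Lua is small
```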
Escaping Special Characters
The % character itself must be escaped as %%:
local text = "50% off!"
-- Match a literal percent sign
local percent = text:match("%%")
print(percent) -- %
-- Match a number followed by a literal percent sign
local value = text:match("(%d+)%%")
print(value) -- 50
This is essential when processing strings that contain special pattern characters:
-- Match a filename made of letters, digits, dots, underscores, and hyphens
local path = "config.yaml"
local filename = path:match("([%w%._%-]+)$")
print(filename) -- config.yaml
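When the text to search for comes from user input or a variable, it may help to escape it before using it as a pattern. The helper below is a hypothetical sketch (escape_pattern is not a standard library function); it relies on the documented rule that any non-alphanumeric character can be preceded by % to stand for itself.

```lua
-- Hypothetical helper: escape every punctuation character so the
-- result matches the input text literally when used as a pattern.
local function escape_pattern(s)
  -- %p matches punctuation; %%%0 emits a literal % followed by the match
  return (s:gsub("%p", "%%%0"))
end

local needle = "config.yaml"
print(escape_pattern(needle)) -- config%.yaml

-- Unescaped, "." matches any character, so this wrongly succeeds
print(("configXyaml"):match(needle))                 -- configXyaml
-- Escaped, only a literal dot matches
print(("configXyaml"):match(escape_pattern(needle))) -- nil
```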
Working with String Find
The find function locates patterns and returns positions:
local text = "Hello, World!"
local start_pos, end_pos, capture = text:find("(%w+)", 1)
print(start_pos) -- 1
print(end_pos) -- 5
print(capture) -- Hello
This is particularly useful for extracting content between delimiters:
local text = "prefix[MIDDLE]suffix"
local _, _, content = text:find("%[([^%]]+)%]")
print(content) -- MIDDLE
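The find function also takes an optional start position and a plain flag that turns off pattern matching entirely. A short sketch with an illustrative string:

```lua
local text = "a.b.c"

-- With the plain flag set to true, "." is an ordinary character
print(text:find(".", 1, true)) -- 2 2

-- Without it, "." is the any-character class and matches at position 1
print(text:find("."))          -- 1 1

-- The init argument resumes the search after a previous hit
local first = text:find(".", 1, true)
print(text:find(".", first + 1, true)) -- 4 4
```

Plain search is often the simplest choice when the search text is a fixed string and no captures are needed.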
Conclusion
Lua’s pattern matching provides practical tools for everyday string work. For most extraction, validation, and transformation tasks, you won’t need external libraries—built-in patterns handle the job directly. The key is mastering character classes, captures, modifiers, and the %b balanced pair syntax for parsing nested structures.
While Lua patterns lack some features of full regex (like lookahead assertions), they remain remarkably capable. The built-in syntax is simpler to learn and debug, which often outweighs the limitations. Practice with the examples above, and pattern matching will become a natural part of your Lua toolkit.