
Pattern Matching in Lua: Strings, Captures, and Modifiers

Prerequisites

You’ll need a working Lua 5.1+ interpreter — the pattern matching functions described here (string.match, string.gmatch, string.gsub, string.find) are part of Lua’s standard library and work identically across PUC Lua and LuaJIT. No external modules are required. Basic familiarity with Lua strings and function calls is assumed. If you’re completely new to Lua strings, review the tables introduction first to understand how string methods work via the : syntax.

Introduction to pattern matching

While many programming languages rely on external regular expression libraries, Lua includes pattern matching capabilities built directly into its standard library. This guide covers Lua’s pattern syntax — a lightweight alternative to full regex that handles most string manipulation tasks directly.

Lua patterns share similarities with regular expressions but use a distinct syntax optimized for the language’s minimalist philosophy. By the end of this guide, you’ll be comfortable extracting data, validating input, and transforming strings using patterns.

Understanding character classes

At the core of Lua’s pattern matching are character classes—shorthands that represent sets of characters.

Class   Matches
.       Any character
%a      Letters (A-Z, a-z)
%d      Digits (0-9)
%l      Lowercase letters
%u      Uppercase letters
%w      Alphanumeric characters
%s      Whitespace characters
%p      Punctuation characters

Character classes can be negated by capitalizing them: %A matches non-letters, %D matches non-digits, and so on.
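As a small sketch of a negated class in use, the snippet below strips everything that is not a digit from a phone-number-like string. The helper name digits_only and the sample input are illustrative assumptions, not part of any library.

```lua
-- Hypothetical helper: keep only the digits of a string.
local function digits_only(s)
    -- %D matches any character that is NOT a digit; each one is replaced with "".
    -- The extra parentheses drop gsub's second return value (the match count).
    return (s:gsub("%D", ""))
end

print(digits_only("+1 (555) 010-9999"))  -- 15550109999
```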

You can also define your own custom character sets using brackets:

-- Match only vowels
local pattern = "[aeiou]"

-- Match anything except vowels (negated)
local pattern = "[^aeiou]"

-- Match hexadecimal digits
local pattern = "[0-9A-Fa-f]"

Custom character sets let you match exactly the characters relevant to your data. A range such as [0-9] is inclusive at both ends, so it covers every digit from zero through nine; ranges follow character-code order, which is why [0-9A-Fa-f] lists the uppercase and lowercase letters separately. A negated set such as [^aeiou] matches any single character that is not a lowercase vowel (consonants, digits, spaces, and punctuation included), which makes it useful for stripping or counting characters before further processing.
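One way to count characters with a set, sketched here under the assumption that you only need totals: gsub (covered later in this guide) returns the number of substitutions as its second value, so replacing each match with itself acts as a counter. The sample word is arbitrary.

```lua
local word = "pattern"

-- "%0" in the replacement stands for the whole match, so the string is
-- left unchanged and only the count (second return value) is of interest.
local _, vowels = word:gsub("[aeiou]", "%0")
local _, others = word:gsub("[^aeiou]", "%0")

print(vowels)  -- 2
print(others)  -- 5
```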

Here’s a basic example demonstrating character classes:

-- Check if a string contains only digits
local function is_numeric(s)
    return s:match("^%d+$") ~= nil
end

print(is_numeric("12345"))   -- true
print(is_numeric("12a45"))  -- false
print(is_numeric(""))       -- false

The ^ anchor ties the match to the beginning of the string and $ ties it to the end; together they require the entire string to match the pattern. Since %d+ means “one or more digits,” an empty string fails to match, match returns nil, and is_numeric returns false.
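To see what each anchor contributes on its own, the following sketch runs four variations against the same made-up string.

```lua
local s = "order 66 shipped"

print(s:match("%d+"))    -- 66       (found anywhere in the string)
print(s:match("^%d+"))   -- nil      (the string does not start with a digit)
print(s:match("%a+$"))   -- shipped  (letters that run up to the very end)
print(s:match("^%d+$"))  -- nil      (the whole string is not digits)
```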

Captures: extracting matched data

When part of your pattern is enclosed in parentheses, Lua captures that portion for later use:

-- Extract the filename and extension from a path
local path = "/home/user/documents/report.pdf"

local filename, extension = path:match("([^/]+)%.(%w+)$")

print(filename)    -- report
print(extension)  -- pdf

The pattern ([^/]+)%.(%w+)$ breaks down as:

  • ([^/]+): capture one or more characters that aren’t forward slashes
  • %.: match a literal period (an unescaped . would match any character)
  • (%w+): capture one or more alphanumeric characters
  • $: anchor to the end of the string

Captures become invaluable when parsing structured text:

-- Parse a simple CSV-like line
local line = 'Alice,25,Engineer'

local name, age, role = line:match("([^,]+),([^,]+),(.+)")

print(name)  -- Alice
print(age)   -- 25
print(role)  -- Engineer

Notice how match returns multiple values from a single call—a Lua idiom that pairs naturally with pattern captures. Each parenthesized group becomes a separate return value, assigned left-to-right to the variables on the left side of the assignment statement. If the pattern fails to match, match returns nil rather than empty strings, so you can test the result directly in a conditional without inspecting individual capture values separately.
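The nil-on-failure behavior can be used as in the sketch below. The helper parse_pair, its key=value input format, and the error message are assumptions made for illustration.

```lua
-- Hypothetical helper: split "key=value" and report failure explicitly.
local function parse_pair(line)
    local key, value = line:match("^(%w+)=(.*)$")
    if not key then
        -- match returned nil, so no capture was assigned.
        return nil, "not a key=value line: " .. line
    end
    return key, value
end

print(parse_pair("mode=fast"))  -- prints the two captures: mode and fast
print(parse_pair("no equals"))  -- prints nil followed by the error message
```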

Practical pattern examples

Let’s explore real-world scenarios where Lua patterns prove useful.

Validating email addresses

local function is_valid_email(email)
    local pattern = "^[%w%.]+@[%w%.]+%.%w%w+$"
    return email:match(pattern) ~= nil
end

print(is_valid_email("user@example.com"))     -- true
print(is_valid_email("user@example.co.uk"))   -- true
print(is_valid_email("invalid@email"))        -- false
print(is_valid_email("@nodomain.com"))        -- false

The validation above uses the ^ and $ anchors to insist on a full-string match: any extra characters before or after the address make match return nil. The %w class covers alphanumeric characters, and each %. matches a literal period. An unescaped . matches any single character, so escaping it is essential whenever you need a real dot, as in domain names and file extensions.

Because the domain part [%w%.]+ may itself contain dots, the pattern accepts multi-part domains such as .co.uk and .com.au, and the trailing %.%w%w+ requires at least two alphanumeric characters after the final dot. This is a deliberately loose check: it rejects valid addresses that use characters such as + or - and does not enforce the full address grammar.

Extracting numbers from text

local text = "The temperature is 23 degrees Celsius"

-- Find the first number in a string
local temp = text:match("%d+")
print(temp)  -- 23

-- Extract all numbers using gmatch
local numbers = {}
for num in text:gmatch("%d+") do
    table.insert(numbers, tonumber(num))
end

print(numbers[1])  -- 23

Unlike match which stops at the first hit, gmatch returns an iterator that yields every non-overlapping match across the entire string, making it the right choice for scanning through log files or data streams. Keep in mind that tonumber is needed to convert string captures into actual numbers—Lua patterns always produce string results, even when matching digit sequences with %d+, because the pattern engine operates purely on text.

The same iterator works with any pattern; each pass through the for loop receives the next match:

-- Extract all words from a sentence
local sentence = "hello world lua"
for word in sentence:gmatch("%a+") do
    print(word)
end
-- hello
-- world
-- lua

The iterator-based approach with gmatch is memory-efficient because matches are produced one at a time rather than collected into a list first. For small strings the difference is negligible, but when processing paragraphs or longer documents, avoiding an intermediate table of all matches can keep your program’s memory footprint low. This lazy evaluation pattern is a recurring theme across Lua’s iterator-based functions.
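When the pattern contains captures, gmatch yields the captures on each iteration instead of the whole match. The sketch below relies on that to total some values without building a table; the config string and its field names are made up for the example.

```lua
local config = "width=80, height=24, depth=3"
local total = 0

-- Each iteration yields two captures: a run of letters and a run of digits.
for name, value in config:gmatch("(%a+)=(%d+)") do
    print(name, value)
    total = total + tonumber(value)
end

print(total)  -- 107
```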

Replacing text with gsub

The gsub function performs global substitution and returns both the modified string and the number of replacements:

local sentence = "the quick brown fox jumps over the lazy dog"

-- Replace all vowels with asterisks
local masked = sentence:gsub("[aeiou]", "*")
print(masked)
-- th* q**ck br*wn f*x j*mps *v*r th* l*zy d*g

-- Count replacements using a replacement function
local count = 0
sentence:gsub("the", function()
    count = count + 1
end)
print(count)  -- 2

The replacement function trick counts occurrences by running a callback each time gsub finds a match. Because gsub iterates at the C level rather than inside a Lua loop, the callback approach sidesteps the need for an explicit loop around match. This pattern—using a function as the replacement argument—works for any substitution where the replacement value depends on what was matched, not just for counting.

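Patterns are case-sensitive, so "the" would not match "The"; both occurrences in this sentence are lowercase, which is why the count is 2. If the count is all you need, the second return value of gsub gives the same number without a callback.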
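As an illustration of a replacement that depends on what was matched, this sketch capitalizes each word. The callback receives the captures as arguments, and its return value replaces the match; the sample string is arbitrary.

```lua
local title = "the quick brown fox"

-- (%a) captures the first letter of a word, (%a*) the rest of it.
local capitalized = title:gsub("(%a)(%a*)", function(first, rest)
    return first:upper() .. rest
end)

print(capitalized)  -- The Quick Brown Fox
```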

Using captures in replacements

You can reference captured groups in the replacement string using %1, %2, and so on:

local text = "john doe"

-- Swap first and last name
local swapped = text:gsub("(%a+)%s(%a+)", "%2, %1")
print(swapped)  -- doe, john

The pattern (%a+)%s(%a+) captures two runs of letters separated by exactly one whitespace character. %s matches spaces, tabs, and newlines, but only one character at a time, so use %s+ if the names may be separated by several. Because gsub replaces every non-overlapping match, a string with more than two words has each successive pair swapped; pass a limit as the optional fourth argument to gsub if only the first pair should change.

This is particularly powerful for reformatting structured data:

-- Convert ISO date to readable format
local date = "2024-03-15"
local formatted = date:gsub("(%d+)-(%d+)-(%d+)", "%3/%2/%1")
print(formatted)  -- 15/03/2024
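Returning to the name-swapping substitution earlier in this section, the following sketch shows how it behaves on a string with more than two words, and how the optional fourth argument of gsub caps the number of substitutions. The input string is made up.

```lua
local names = "john doe jane roe"

-- No limit: every non-overlapping pair of words is swapped.
local all = names:gsub("(%a+)%s(%a+)", "%2, %1")
print(all)         -- doe, john roe, jane

-- Limit of 1: only the first pair is swapped.
local first_only = names:gsub("(%a+)%s(%a+)", "%2, %1", 1)
print(first_only)  -- doe, john jane roe
```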

Pattern modifiers and magic characters

Certain characters have special meaning in patterns:

Character   Meaning
+           One or more (greedy)
*           Zero or more (greedy)
-           Zero or more (lazy)
?           Zero or one
%           Escape character

The difference between + and * only shows up when zero repetitions are possible. On a string that begins with digits, both return the same result:

local text = "123"

print(text:match("%d*"))   -- "123" (matches as many as possible)
print(text:match("%d+"))   -- "123" (requires at least one)
print(text:match("%d?"))   -- "1"   (at most one digit)

These three quantifier examples show that match returns the longest possible sequence for * and +, while ? takes at most one character, here the first digit. On a string that does not begin with a digit, %d* and %d? would both succeed immediately with an empty match, whereas %d+ would move on to the first digit or return nil if there is none. The greedy-versus-lazy distinction becomes most important when you have delimiters and want to stop at the first closing marker rather than matching all the way to the last one, which is a frequent source of bugs in HTML and markup processing.
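The difference between * and + is easiest to see on input that does not start with a digit. In this sketch, the sample strings are arbitrary.

```lua
local letters = "abc"
local mixed = "abc123"

print(letters:match("%d+"))   -- nil  (at least one digit is required)
print(#letters:match("%d*"))  -- 0    (zero digits is a valid, empty match)

-- The empty match at position 1 wins, so * never reaches the digits.
print(#mixed:match("%d*"))    -- 0
print(mixed:match("%d+"))     -- 123
```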

The lazy quantifier - tries to match as few characters as possible:

local html = "<div>content</div>"

-- Greedy: matches as much as possible
print(html:match("<.+>"))    -- <div>content</div>

-- Lazy: stops at the first closing >
print(html:match("<.->"))    -- <div>

-- Whitespace in a pattern is literal: this looks for a space before >
print(html:match("<.- >"))   -- nil

-- Negated set: one or more characters that are not >
print(html:match("<[^>]+>")) -- <div>

Both <.-> and <[^>]+> stop at the first >. The negated set is the more common Lua idiom for extracting angle-bracket tags because it states exactly what may appear between the delimiters: “match <, then one or more characters that are not >, then >.” It can never run past a >, whereas a lazy .- expands as far as necessary to let the rest of the pattern match. It also requires at least one character, so an empty <> is not treated as a tag.

The pattern <.- > returns nil because patterns have no free-spacing mode: the space is matched literally, and the input has no space before the closing >.
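The sketch below contrasts greedy and lazy captures on a string with several tags, then collects every tag with gmatch and a negated set. The markup is a made-up sample, and this is not a substitute for a real HTML parser.

```lua
local markup = "<b>bold</b> and <i>italic</i>"

-- Greedy: runs from the first < to the LAST >.
print(markup:match("<(.+)>"))  -- b>bold</b> and <i>italic</i

-- Lazy: stops at the FIRST >.
print(markup:match("<(.-)>"))  -- b

-- Negated set with gmatch: every tag, in order.
local tags = {}
for tag in markup:gmatch("<[^>]+>") do
    tags[#tags + 1] = tag
end
print(table.concat(tags, " "))  -- <b> </b> <i> </i>
```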

Balanced pairs with %b

Lua supports matching balanced delimiters using %b:

-- Match balanced parentheses
local text = "func(a, b(c, d))"
local balanced = text:match("%b()")
print(balanced)  -- (a, b(c, d))

-- Match balanced brackets
local text = "[outer [inner]]"
local balanced = text:match("%b[]")
print(balanced)  -- [outer [inner]]

Lua’s %b operator is a unique feature not found in typical regular expression engines—it counts nesting levels internally and returns the substring from the first opening delimiter to the matching closing delimiter at the same depth. This makes it ideal for extracting function arguments, array contents, or any bracket-delimited region without worrying about embedded delimiters at inner nesting levels that would confuse a linear character-class approach.

This is invaluable for parsing nested structures:

-- Extract content within outermost parentheses
local code = "result = calculate(a, b(1, 2)) + other"
local params = code:match("%b()")
print(params)  -- (a, b(1, 2))

Balanced-pair parsing with %b works with any two distinct characters given after %b: %b<> handles template angle brackets and %b{} handles braces, correctly skipping nested inner pairs. The first character is the opening delimiter and the second is the closing delimiter, and the operator tracks nesting depth between them. Each delimiter must be a single character, so %b cannot match multi-character markers such as the /* and */ of C-style comments, and using the same character twice (as in %b//) disables nesting: the match simply ends at the next occurrence of that character. This depth-aware behavior is what separates %b from simpler scanning approaches.
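Because %b is an ordinary pattern item, it can also be used with gsub. This sketch removes a nested parenthetical aside and matches nested angle brackets; both input strings are invented for the example.

```lua
local note = "Lua (a small (embeddable) language) is fast"

-- %b() consumes the whole balanced group, nested parentheses included;
-- %s* also removes the whitespace in front of it.
local trimmed = note:gsub("%s*%b()", "")
print(trimmed)  -- Lua is fast

local decl = "Map<String, List<Int>> cache"
print(decl:match("%b<>"))  -- <String, List<Int>>
```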

Escaping special characters

The % character itself must be escaped as %%:

local text = "50% off!"

-- Match a literal percent sign
local percent = text:match("%%")
print(percent)  -- %

-- Match numbers followed by percent (the %% requires a literal % after the digits)
local value = text:match("(%d+)%%")
print(value)    -- 50

Because % is the escape character in Lua patterns (and also the format specifier in string.format), working with literal percent signs requires extra attention. The pattern %% tells Lua to treat the percent sign as a plain character rather than as the start of a character class like %d or %a. A backslash is different: it has no special meaning in patterns, but Lua interprets \ as an escape in quoted string literals, so a pattern that matches one literal backslash is written "\\" in source code.

This is essential when processing strings that contain special pattern characters:

-- Match a file path with dots
local path = "config.yaml"
local filename = path:match("([%w%._-]+)$")
print(filename)  -- config.yaml

The character class [%w%._-]+ shows how custom sets combine predefined classes and literal characters. Inside square brackets, %w still matches alphanumeric characters, while %. matches a literal dot and _- matches underscore and hyphen. Note that a literal hyphen should be placed at the end of the bracket expression or escaped as %-. Between two characters, Lua interprets it as a range specifier, so [a-z] means “a through z” while [az-] means “a, z, or a literal hyphen.”
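When the text to search for comes from data rather than from your source code, one common approach is to escape it before using it as a pattern. The helper below is a sketch: escape_pattern is a hypothetical name, and it relies on the rule that % followed by any non-alphanumeric character stands for that character literally. The last line shows a literal backslash, which needs escaping only at the string-literal level.

```lua
-- Hypothetical helper: escape every punctuation character so the
-- string can be used as literal text inside a pattern.
local function escape_pattern(s)
    -- %p matches punctuation; "%%%0" is a literal % followed by the match.
    return (s:gsub("%p", "%%%0"))
end

local needle = "1.5%"
print(escape_pattern(needle))                        -- 1%.5%%
print(("rate: 1.5%"):match(escape_pattern(needle)))  -- 1.5%
print(("rate: 1x5%"):match(escape_pattern(needle)))  -- nil

-- "\\" is a single backslash in a quoted Lua string; patterns treat it as plain.
print(("C:\\temp"):match("\\(%a+)"))                 -- temp
```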

Working with string find

The find function locates patterns and returns positions:

local text = "Hello, World!"

local start_pos, end_pos, capture = text:find("(%w+)", 1)

print(start_pos)   -- 1
print(end_pos)     -- 5
print(capture)     -- Hello

Unlike match, which returns captured substrings, find returns the start and end positions of the match, counted in bytes with 1-based indexing: the first character of the string is at position 1, not 0. Any captures follow as additional return values, so the third value here is the first parenthesized group. When you don’t need the positions, you can discard them with the _ placeholder, or simply call match, which returns only the captures.

This is particularly useful for extracting content between delimiters:

local text = "prefix[MIDDLE]suffix"

local _, _, content = text:find("%[([^%]]+)%]")
print(content)  -- MIDDLE
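find also accepts a start position as its third argument and a "plain" flag as its fourth, which turns off pattern matching entirely. The sketch below uses both to collect the position of every literal dot; the input string and variable names are illustrative.

```lua
local dotted = "a.b.c"
local positions = {}
local init = 1

while true do
    -- Third argument: where to start searching.
    -- Fourth argument: true makes "." a plain substring, not a pattern.
    local first, last = dotted:find(".", init, true)
    if not first then break end
    positions[#positions + 1] = first
    init = last + 1
end

print(table.concat(positions, ","))  -- 2,4

-- Without the plain flag, "." is a pattern that matches any character,
-- so the first match is the character at position 1.
print(dotted:find("."))
```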

Conclusion

Lua’s pattern matching provides practical tools for everyday string work. For most extraction, validation, and transformation tasks, you won’t need external libraries—built-in patterns handle the job directly. The key is mastering character classes, captures, modifiers, and the %b balanced pair syntax for parsing nested structures.

While Lua patterns lack some features of full regex (like lookahead assertions), they remain remarkably capable. The built-in syntax is simpler to learn and debug, which often outweighs the limitations. Practice with the examples above, and pattern matching will become a natural part of your Lua toolkit.

Next steps

Pattern matching is one part of Lua’s string-handling story. Continue to operator overloading with metamethods to see how metatables give your custom types arithmetic and comparison operators. For working with larger text files, check out file I/O in Lua to read and write data using patterns for parsing.
