Lua String Patterns vs Regular Expressions
Lua does not have regular expressions. It has something simpler and more focused: Lua string patterns. If you arrive from Python, JavaScript, or PHP, you will immediately notice similarities and start reaching for tools that do not exist. This guide covers exactly what Lua string patterns can and cannot do, and how they differ from the regex engine you already know, from magic characters and character classes to the frontier pattern and balanced-pair matching.
Why Lua has its own pattern system
Lua’s standard library is deliberately small. Rather than importing a full regex engine, the string library ships with a lightweight pattern syntax that handles the vast majority of everyday text processing tasks. The tradeoff is real: Lua patterns cannot do everything PCRE can, but they are faster to parse and simpler to reason about.
The key distinction is that a Lua pattern is written as an ordinary Lua string, not as a separate literal syntax. Because the backslash is already the escape character of Lua string literals, the pattern language uses % as its own escape character, not the backslash you might expect from other languages.
Magic Characters
Certain characters have special meaning inside a pattern. These are the magic characters:
( ) . % + - * ? [ ^ $
To match any of these literally, escape it with %. This is a fundamental shift from PCRE and other regex engines, where \ is the escape character. Because % is not special in Lua string literals, you avoid the double-escaping that plagues regex in many languages: a literal dot is just "%.", not "\\.". The % escape works on every non-alphanumeric character, so even characters that are not currently magic (like # or &) can be escaped without harm. A common mistake is reaching for \( or \) out of regex habit. Inside a Lua string literal, "\(" is an invalid escape sequence (an error in Lua 5.2 and later; Lua 5.1 silently reads it as a plain "("), so the pattern never sees an escaped parenthesis.
%. -- literal dot
%) -- literal close parenthesis
%% -- literal percent sign
%[ -- literal open bracket
When in doubt, escape it: % is a safe escape for every non-alphanumeric character. At the pattern level, \ has no special meaning at all; it only matters to the Lua string literal that contains the pattern.
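A minimal sketch of % escaping in practice. The sample strings and variable names are invented for illustration; string.find is used only because it makes the matched position visible.

```lua
-- An unescaped "." matches any character; "%." matches only a literal dot.
local any_pos = string.find("a+b.c", ".")    -- first character, "a"
local dot_pos = string.find("a+b.c", "%.")   -- the literal dot

-- "%+" is a literal plus sign.
local plus_pos = string.find("a+b.c", "%+")

-- A backslash is not magic in patterns. It is doubled here only because
-- "\\" is how a Lua string literal spells one backslash character.
local slash_pos = string.find("C:\\temp", "\\")

print(any_pos, dot_pos, plus_pos, slash_pos)  --> 1 4 2 3
```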
Character Classes
Single-character predefined classes let you match common character sets without building your own:
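| Class | Matches |
|---|---|
| . | Any character |
| %a | Letters (a-z, A-Z) |
| %c | Control characters |
| %d | Digits (0-9) |
| %g | Printable characters except space (Lua 5.2 and later) |
| %l | Lowercase letters |
| %p | Punctuation |
| %s | Space characters |
| %u | Uppercase letters |
| %w | Alphanumeric (letters + digits) |
| %x | Hexadecimal digits |
| %z | The null character (\0); Lua 5.1 only, deprecated since 5.2, where a pattern may contain "\0" directly |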
The uppercase version of any class is its complement:
%A -- any character that is NOT a letter
%S -- any character that is NOT a space
One important difference from PCRE: Lua’s %w does not include underscore. It matches [a-zA-Z0-9] only. PCRE’s \w includes _. If you need to match identifiers, use [_%w] or the full character class [_%a%d] instead.
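As a small illustration of the underscore difference, the sketch below matches an identifier at the start of a line. The sample line and variable names are made up; the pattern [%a_][%w_]* is one common way to spell "identifier", not the only one.

```lua
local line = "max_retry_count = 3"

-- %w+ stops at the first underscore.
local with_w = line:match("^%w+")

-- A set that adds "_" keeps going; the first character may not be a digit.
local identifier = line:match("^[%a_][%w_]*")

print(with_w, identifier)  --> max max_retry_count
```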
You can also build your own character sets with square brackets:
[aeiou] -- vowels
[0-9a-fA-F] -- hex digits, same as %x
[^0-7] -- any character that is NOT an octal digit
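[%[%]\\] -- literal brackets and backslash (as written in a Lua string literal)
One subtlety: inside a character set, most magic characters lose their special meaning, but % still works as an escape, and classes such as %d or %s can be used as members. To include a literal ] in your set, escape it as %]. To include a literal -, escape it as %- or put it at the start or end of the set. A literal ^ is safe anywhere except immediately after the opening [.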
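The sketch below exercises these set rules on invented sample strings. It escapes ] and - with %, which works regardless of where they sit in the set.

```lua
-- "%[", "%]" and "%-" put literal brackets and a hyphen in the set.
local cleaned = ("[2024-01-15]"):gsub("[%[%]%-]", "")
print(cleaned)  --> 20240115

-- Classes work inside sets too: digits or a literal dot.
local amount = ("price 12.50 USD"):match("[%d%.]+")
print(amount)   --> 12.50

-- "^" only negates when it comes first; later it is an ordinary member.
local caret = ("a^b"):match("[b^]")
print(caret)    --> ^
```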
Repetition Modifiers
Lua patterns offer four modifiers that control how many times a character class can repeat:
| Modifier | Meaning |
|---|---|
| + | One or more (greedy) |
| * | Zero or more (greedy) |
| - | Zero or more (non-greedy / lazy) |
| ? | Zero or one (optional) |
%d+ -- one or more digits (integer)
%a* -- zero or more letters (word)
[+-]?%d+ -- optional sign followed by digits (e.g. -12, +100, 42)
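To see these modifiers on real input, here is a small sketch. The helper name is_integer and the candidate strings are hypothetical; the ^ and $ anchors are added so the whole string has to match.

```lua
-- Optional sign, then one or more digits, and nothing else.
local function is_integer(text)
  return text:match("^[+-]?%d+$") ~= nil
end

for _, candidate in ipairs({ "-12", "+100", "42", "4-2", "12.5" }) do
  print(candidate, is_integer(candidate))
end
-- "-12", "+100" and "42" print true; "4-2" and "12.5" print false.

-- "*" is allowed to match nothing, so this succeeds with an empty string.
local empty = ("abc"):match("%d*")
print(#empty)  --> 0
```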
The greedy versus non-greedy distinction matters more than you might expect. In Lua, * is the greedy zero-or-more quantifier, matching as many characters as possible, while - is its non-greedy (lazy) counterpart, matching as few characters as possible. Think of - as the Lua equivalent of *? in PCRE. One common trap: Lua has no lazy version of +. If you need a non-greedy “one or more,” write the class once and follow it with a lazy repetition of the same class, for example %a%a- or ..-. The example below shows the classic comment-stripping scenario where choosing the wrong modifier changes the result entirely:
-- Greedy: matches from the FIRST "/*" to the LAST "*/"
string.gsub("/* x */ int y; /* z */", "/%*.*%*/", "<C>")
--> "<C>" (the whole string is one match)
-- Non-greedy: matches from each "/*" to the FIRST "*/" after it
string.gsub("/* x */ int y; /* z */", "/%*.-%*/", "<C>")
--> "<C> int y; <C>"
This trips up a lot of people. In PCRE, you add ? after a quantifier to make it lazy. In Lua, - is the lazy version of *. There is no lazy version of + or ? — that is a genuine limitation.
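One way to approximate a lazy "one or more" is to write the class once and then repeat it lazily. The sketch below uses invented key=value strings to compare the three spellings.

```lua
-- ".-" may match nothing at all, so an empty key slips through.
local empty_key = ("=value"):match("^(.-)=")

-- "..-" means one character, then as few more as possible.
local no_key = ("=value"):match("^(..-)=")
local lazy_key = ("key=value=x"):match("^(..-)=")

-- Greedy ".+" runs to the LAST "=" instead.
local greedy_key = ("key=value=x"):match("^(.+)=")

print(#empty_key, no_key, lazy_key, greedy_key)  --> 0 nil key key=value
```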
Balanced Pairs with %b
Lua has a built-in pattern for matching balanced delimiter pairs, something that requires recursion or complex backtracking in most regex engines:
%b() -- matched parentheses
%b{} -- matched braces
%b[] -- matched brackets
%b"" -- matched double quotes
%b'' -- matched single quotes
The %b pattern takes two characters that represent the open and close delimiters. You can use any two characters, not just the standard bracket pairs: %b<> matches an angle-bracketed span. The delimiters are always single characters, so %b cannot match multi-character delimiters such as the /* and */ of a C-style comment. Internally, Lua uses a simple counter: each opening character increments, each closing character decrements, and when the count hits zero the match ends. This is efficient and predictable but does mean that %b() cannot distinguish between function parentheses and arithmetic grouping parentheses in an expression. The example below shows %b() extracting a balanced pair from a function definition:
local s = "function foo(a, (b + c)) end"
print(string.match(s, "%b()")) --> (a, (b + c))
This is a character counter, not a parser. It handles nesting correctly, but it knows nothing about string literals, comments, or escapes: a closing delimiter inside a quoted string still decrements the count. It therefore cannot match something like if (a > (b + c)) then in a way that respects Lua’s grammar. But for delimited text with well-behaved contents, it is extremely useful.
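The sketch below shows both sides of %b on invented input: it handles nesting, but a delimiter inside a quoted string is counted like any other.

```lua
local expr = "sum(1, mul(2, 3)) + 4"

-- The match includes the outer delimiters; strip them with sub.
local group = expr:match("%b()")
local inner = group:sub(2, -2)
print(group)  --> (1, mul(2, 3))
print(inner)  --> 1, mul(2, 3)

-- %b counts characters only: the ")" inside the quotes closes the pair.
local fooled = ('f(")")'):match("%b()")
print(fooled) --> (")
```

If the input can contain quoted delimiters, a real tokenizer or LPeg is usually the safer route.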
The frontier pattern %f
The %f[set] pattern matches the boundary between a character not in set and a character that is in set. This is Lua’s way of detecting transitions:
%f[%w] -- transition from non-word to word (word start)
%f[^%w] -- transition from word to non-word (word end)
A frontier pattern consumes no characters — it only checks the adjacent characters at a position, like a zero-width assertion. This means you can chain it with other patterns without worrying about it eating input. The %f[%w] at the start of a pattern anchors to word boundaries, and %f[^%w] at the end ensures the match does not spill into adjacent word characters. The example below uses both to extract the first whole word from a string, something that would require \b in PCRE:
local text = "hello world foo bar"
print(string.match(text, "%f[%w]%a+%f[^%w]")) --> hello
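Frontier patterns are a Lua feature, documented since Lua 5.2 (Lua 5.1 supports them but does not document them). PCRE has no single equivalent construct: \b covers the word-boundary case, and a lookbehind/lookahead pair covers arbitrary sets. They are rarely needed but invaluable when you need to match words in context without lookahead.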
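A typical use is whole-word replacement. In the sketch below (sample sentence invented), %f[%w] and %f[%W] keep "cat" from matching inside "concatenates"; the start and end of the string count as non-word positions, so a word at either edge still matches.

```lua
local sentence = "the cat concatenates"
local replaced, count = sentence:gsub("%f[%w]cat%f[%W]", "dog")
print(replaced, count)  --> the dog concatenates 1

-- The frontier also holds at the very start and end of the subject.
local edge = ("cat"):gsub("%f[%w]cat%f[%W]", "dog")
print(edge)  --> dog
```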
The four core functions
Every pattern operation in Lua goes through one of four functions in the string library.
string.find
Searches for the first match and returns start and end indices:
local s = "hello world"
print(string.find(s, "world")) --> 7 11
Pass true as the fourth argument to do a plain text search with no pattern interpretation. This is critical when searching for strings that contain characters Lua treats as magic: ., (, ), %, and others. Without the plain flag, the dot in "$5.00" matches any character, unbalanced parentheses raise an error, and a search string ending in % raises a "malformed pattern" error. (A $ is only an anchor when it is the last character of a pattern, and ^ only when it is the first; elsewhere both match themselves.) The plain search also skips the pattern matcher entirely, which can make it faster. Use it whenever you are looking for a literal substring rather than a pattern:
-- Without plain=true, the dot would match any character
print(string.find("price: $5.00", "$5.00", 1, true)) --> 8 12
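To make the difference concrete, the sketch below runs the same search with and without the plain flag. The subject strings are invented for illustration.

```lua
local s = "10x50 vs 10.50"

-- As a pattern, "." matches the "x", so the first hit is the wrong one.
local pat_start, pat_end = string.find(s, "10.50")
-- As plain text, only the literal dot matches.
local lit_start, lit_end = string.find(s, "10.50", 1, true)

print(pat_start, pat_end)  --> 1 5
print(lit_start, lit_end)  --> 10 14

-- A search string ending in "%" is not a valid pattern at all.
local ok = pcall(string.find, "50% off", "50%")
local plain_start, plain_end = string.find("50% off", "50%", 1, true)
print(ok, plain_start, plain_end)  --> false 1 3
```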
string.match
Returns the matched substring or, when the pattern has captures, returns those captured values. Where string.find gives you indices, string.match gives you the text itself, which is usually what you want. The real power emerges with captures: parentheses () inside a pattern mark sections to extract. Each capture becomes a separate return value, letting you destructure a string into its parts in a single line. If nothing matches, string.match returns nil. This is how Lua programs parse dates, URLs, version numbers, and structured log lines without needing a separate parsing step:
local date = "30/05/1999"
local d, m, y = string.match(date, "(%d+)/(%d+)/(%d+)")
print(d, m, y) --> 30 05 1999
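Captures come back as strings, and a failed match returns nil, so real code usually converts and checks. A minimal sketch, with a hypothetical parse_version helper and an error message of my own choosing:

```lua
-- Parse "major.minor.patch"; anchors make the whole string have to match.
local function parse_version(text)
  local major, minor, patch = text:match("^(%d+)%.(%d+)%.(%d+)$")
  if not major then
    return nil, "not a version string: " .. text
  end
  return tonumber(major), tonumber(minor), tonumber(patch)
end

print(parse_version("5.4.6"))  --> 5 4 6
print(parse_version("5.4"))    --> nil not a version string: 5.4
```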
string.gmatch
Returns an iterator for stepping through all matches in a loop. Unlike string.match which returns only the first match, gmatch lets you walk through every occurrence in the string without knowing how many there are in advance. This is the Lua equivalent of Python’s re.finditer or JavaScript’s String.prototype.matchAll. Each call to the iterator advances through the string, and when captures are used in the pattern, gmatch yields them as multiple return values just like string.match does. The loop below shows the basic pattern of iterating over every number found in a space-separated string:
local s = "10 20 30 40"
for num in string.gmatch(s, "%d+") do
  print(num)
end
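With captures in the pattern, each iteration yields the captures instead of the whole match. The sketch below collects key=value pairs into a table; the input string and the settings table are made up for illustration.

```lua
local settings = {}
local input = "host=localhost port=8080 debug=true"

-- Two captures per match: the key and the value (both strings).
for key, value in string.gmatch(input, "(%w+)=(%w+)") do
  settings[key] = value
end

print(settings.host, settings.port, settings.debug)  --> localhost 8080 true
```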
string.gsub
Replaces every occurrence and returns the new string plus the number of substitutions made. The replacement can be a string, a table, or a function. This makes gsub far more flexible than a simple search-and-replace: you can use it as a lightweight templating engine, a translator, or even a code generator. When the replacement is a table, Lua looks up the first capture (or the whole match, if the pattern has no captures) as a key and substitutes the corresponding value. When it is a function, Lua calls it with the captures (or the whole match) and inserts the return value. In both cases a false or nil result leaves the match unchanged. Unlike string.match which stops at the first match, gsub processes the entire string, making it the go-to function for bulk transformations:
local s = "hello world"
print(string.gsub(s, "o", "0")) --> hell0 w0rld 2
-- Table-based replacement
string.gsub("hello", "%l", {h="H", e="E", l="L", o="O"}) --> "HELLO"
-- Function-based replacement
string.gsub("10 20 30", "%d+", function(n) return tonumber(n) * 2 end)
--> "20 40 60"
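A few more gsub forms, sketched on invented data: a table lookup keyed by a capture (a tiny template), %1-style references in a replacement string, and the optional fourth argument that limits the number of substitutions.

```lua
-- Template: "$name" looks up vars.name; unknown keys are left untouched.
local vars = { name = "Lua", version = "5.4" }
local rendered = ("Hello $name $version, $unknown"):gsub("%$(%w+)", vars)
print(rendered)  --> Hello Lua 5.4, $unknown

-- %1, %2, %3 in the replacement string refer to the captures.
local swapped = ("2024-01-15"):gsub("(%d+)%-(%d+)%-(%d+)", "%3/%2/%1")
print(swapped)   --> 15/01/2024

-- The fourth argument caps the number of replacements.
local limited, count = ("a-b-c-d"):gsub("%-", "+", 2)
print(limited, count)  --> a+b+c-d 2
```

Because % is special in a replacement string, a literal percent sign there has to be written as %%.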
What Lua patterns cannot do
Coming from PCRE, there are three gaps you will hit immediately.
No alternation. There is no | operator. To match one word or another, you need separate calls. This is one of the most glaring omissions for anyone arriving from a regex-heavy language like Perl, JavaScript, or Python. The rationale usually given is implementation size rather than semantics: a full regex engine would be larger than the rest of Lua’s standard library, so the pattern language leaves alternation out. In practice, you can often work around this by running multiple string.match calls with or, building a table of candidate patterns, or splitting the problem into multiple passes. The pattern below shows the typical fallback, testing each alternative separately:
local match = string.match(s, "foo") or string.match(s, "bar")
Or iterate over a table of alternatives. There is no elegant single-pattern workaround.
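One caveat with chaining or: it returns the first pattern that matches anywhere, not the match that occurs earliest in the string. The sketch below is one possible table-driven helper (find_first_of is a hypothetical name) that picks the leftmost match, with ties going to the pattern listed first.

```lua
-- Return start, end, and text of the leftmost match among several patterns.
local function find_first_of(s, patterns, init)
  local best_start, best_end
  for _, pattern in ipairs(patterns) do
    local from, to = string.find(s, pattern, init)
    if from and (not best_start or from < best_start) then
      best_start, best_end = from, to
    end
  end
  if not best_start then
    return nil
  end
  return best_start, best_end, s:sub(best_start, best_end)
end

local s = "say bar then foo"
print(string.match(s, "foo") or string.match(s, "bar"))  --> foo
print(find_first_of(s, { "foo", "bar" }))                --> 5 7 bar
```

Each pattern costs one scan of the string, which is fine for a handful of alternatives.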
No lookaround. No lookahead (?=, ?!) or lookbehind (?<=, ?<!). The %f frontier pattern covers some boundary cases, but it cannot match conditionally based on what follows.
No quantifiers on groups. (abc)+ does not repeat the group in Lua: a modifier only applies to a single character class, so the + after the closing parenthesis is taken as a literal plus sign. The workaround is to restructure your pattern or handle the repetition in code.
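The sketch below first shows what a modifier after a group does in practice, then one way to handle the repetition in code for a literal unit. count_repeats is a hypothetical helper, not part of the string library.

```lua
-- "(abc)+" is a capture followed by a literal "+", not a repeated group.
local literal_plus = ("abc+"):match("(abc)+")
local repeated = ("abcabc"):match("(abc)+")
print(literal_plus, repeated)  --> abc nil

-- Count consecutive copies of a literal unit starting at pos.
local function count_repeats(s, unit, pos)
  assert(#unit > 0, "unit must not be empty")
  pos = pos or 1
  local count = 0
  while s:sub(pos, pos + #unit - 1) == unit do
    count = count + 1
    pos = pos + #unit
  end
  return count, pos  -- pos is the index just past the last copy
end

print(count_repeats("abcabcabcx", "abc"))  --> 3 10
```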
Escaping user input
When building patterns from user input, you must escape all magic characters:
function escape_pattern(s)
  return (s:gsub("[%(%)%.%%%+%-%*%?%[%]%^%$]", "%%%0"))
end
local user_input = "hello (world)"
local pattern = escape_pattern(user_input) --> "hello %(world%)"
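Without this, parentheses in the user input are interpreted as captures (and unbalanced ones raise an error), dots match any character, and so on. This is a common source of bugs in Lua string pattern code. If all you need is a literal search, string.find with the plain flag avoids the escaping step altogether.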
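Escaping matters on the replacement side too, because % is also special in a gsub replacement string. The sketch below combines both into a literal search-and-replace; escape_replacement and replace_literal are hypothetical helper names, and escape_pattern is repeated so the block runs on its own.

```lua
-- Escape every magic character so the text matches itself.
local function escape_pattern(s)
  return (s:gsub("[%(%)%.%%%+%-%*%?%[%]%^%$]", "%%%0"))
end

-- In a replacement string only "%" is special; double it.
local function escape_replacement(s)
  return (s:gsub("%%", "%%%%"))
end

-- Replace every literal occurrence of old with new.
local function replace_literal(s, old, new)
  return (s:gsub(escape_pattern(old), escape_replacement(new)))
end

print(escape_pattern("hello (world)"))                         --> hello %(world%)
print(replace_literal("cost (usd): 10", "(usd)", "100% EUR"))  --> cost 100% EUR: 10
```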
Performance Notes
Lua does not compile patterns ahead of time. Each call to string.find, string.match, string.gmatch, or string.gsub interprets the pattern string directly while matching; there is no compiled form to cache or reuse. That cost is paid on every call, but the matcher is far smaller and simpler than PCRE’s compile-then-execute engine.
For tight loops with heavy pattern matching, the difference can matter. If you are processing millions of short strings, pre-parsing is not an option; Lua simply does not have that feature. In practice, for most tasks (parsing config files, extracting data from structured logs, validating input), the pattern parsing overhead is negligible compared to the actual string scanning.
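If you need maximum performance, LuaJIT can speed up the Lua code around your pattern calls considerably, but how much depends entirely on the workload, so measure before relying on any particular figure. The pattern functions themselves are implemented in C, and most of them are not JIT-compiled, so pattern-heavy loops may gain less than pure-Lua code does.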
Quick Reference
| Feature | Lua Patterns | PCRE/Regex |
|---|---|---|
| Escape character | % | \ |
| Any character | . | . |
| One or more | + | + |
| Zero or more greedy | * | * |
| Zero or more lazy | - | *? |
| Optional | ? | ? |
| Alternation | None | \| |
| Lookahead/behind | None | Yes |
| Quantifiers on groups | None | Yes |
| Matches anywhere unless anchored | Yes | Yes (some APIs, such as Python’s re.match, anchor at the start) |
| %w / \w includes underscore | No | Yes |
| Balanced pairs | %bxy | Recursive patterns |
| Frontier pattern | %f[set] | No single construct (\b, lookarounds) |
Lua string patterns are a deliberate tradeoff. They handle the common cases cleanly and leave the complex cases to external libraries or plain Lua code. Once you stop fighting that decision and work with the grain, you will find they cover a surprising amount of ground.
See Also
- Lua Metatables: Metatables let you redefine how string operations work on your own objects.
- Lua Weak Tables: Weak tables help manage memory when tables are used as pattern matchers or callbacks.
- Pattern Matching: A broader look at pattern facilities in Lua, including table patterns and the match construct.
- LPeg Parsing: When Lua patterns hit their limits, LPeg provides a full parsing expression grammar for complex text processing.
- Functional Patterns: Techniques for composing string operations into reusable, testable functions without side effects.