Parsing with LPeg and re

· 8 min read · Updated April 1, 2026 · intermediate
parsing lpeg re peg

Lua’s built-in string patterns handle most search-and-replace jobs, but they struggle with structured input like arithmetic expressions, config files, or CSV data. Writing a parser by hand is error-prone and tedious. LPeg solves this by giving you a formal grammar system backed by Parsing Expression Grammars (PEGs).

LPeg was created by Roberto Ierusalimschy, Lua’s lead designer. It is a third-party library, not part of the standard library in any Lua version, and it works on Lua 5.1 through 5.4 as well as LuaJIT. The companion re module ships with LPeg and layers a more regex-like textual syntax on top of LPeg’s programmatic API, which is what many developers reach for first.

Getting Started

LPeg is a third-party C library, so the first step on any Lua version (5.1 through 5.4, and LuaJIT) is to install it, typically via LuaRocks:

luarocks install lpeg

Then load it with require:

local lpeg = require("lpeg")

Once loaded, the core function you work with is lpeg.match():

local lpeg = require("lpeg")

local pattern = lpeg.P("hello")
local result = lpeg.match(pattern, "hello world")

print(result)  --> 6  (index of first character after the match)

lpeg.match() returns nil on failure, the position after the match on success, or captured values if your pattern includes captures.
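A quick sketch shows all three outcomes, plus one detail that surprises newcomers: lpeg.match() is anchored, so it only tries the pattern at the start of the subject.

```lua
local lpeg = require("lpeg")

local p = lpeg.P("hello")

-- Failure: the subject does not start with the pattern
print(lpeg.match(p, "goodbye"))              --> nil

-- lpeg.match is anchored: a match later in the string does not count
print(lpeg.match(p, "say hello"))            --> nil

-- Success without captures: the position after the match
print(lpeg.match(p, "hello world"))          --> 6

-- Success with a capture: the captured value replaces the position
print(lpeg.match(lpeg.C(p), "hello world"))  --> hello
```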

Building Basic Patterns

lpeg.P() converts values into patterns. Strings match literally (not as regex wildcards):

local P, R, S = lpeg.P, lpeg.R, lpeg.S

-- Match literal text
local hello = P("hello")

-- R matches character ranges
local lowercase = R("az")
local uppercase = R("AZ")
local letter = lowercase + uppercase

-- S matches any single character in a set
local op = S("+-*/%^")

R("az") matches one lowercase letter. S("+-*/") matches a single operator character. You can combine these with sequence (*) and ordered choice (+):

local P, R = lpeg.P, lpeg.R

-- Sequence: match both in order
local greeting = P("hello") * P(",") * P(" ")

-- Ordered choice: try left side first, then right
local opt_hello = P("hello") + P("hi") + P("yo")

The + operator has PEG semantics, not regex semantics. It tries the left side first and only moves to the right if the left side fails completely. This matters more than it sounds.
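A short demonstration makes the commitment visible (the pattern names here are just for illustration):

```lua
local lpeg = require("lpeg")
local P = lpeg.P

-- Once an alternative succeeds, LPeg commits to it: the choice is
-- never re-tried, even when the rest of the pattern later fails.
local committed = (P("a") + P("ab")) * P("c")
print(lpeg.match(committed, "abc"))  --> nil  (P("a") wins, then "b" ~= "c")

-- The fix is to order longer alternatives first
local reordered = (P("ab") + P("a")) * P("c")
print(lpeg.match(reordered, "abc"))  --> 4
```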

Repetition Operators

Patterns support a set of repetition operators that behave differently from their regex counterparts:

Operator   Meaning
p^0        Zero or more repetitions (greedy)
p^1        One or more repetitions (greedy)
p^n        n or more repetitions, for any non-negative n (greedy)
p^-n       At most n repetitions; p^-1 means zero or one (optional)

local R = lpeg.R

-- One or more lowercase letters
local word = R("az")^1
print(lpeg.match(word, "hello"))  --> 6

-- Zero or more digits (greedy)
local decimals = R("09")^0
print(lpeg.match(decimals, "123abc"))  --> 4  (matched "123")

-- At most one digit (optional)
print(lpeg.match(R("09")^-1, "123abc"))  --> 2  (matched "1")

LPeg repetition is greedy and possessive: it consumes as much as it can and never backtracks into a repetition. There is no lazy repetition; p^-n caps the repetition count at n, it does not make the match reluctant.
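The at-most-n form is what you reach for with optional elements. A small sketch parsing an integer with an optional minus sign:

```lua
local lpeg = require("lpeg")
local P, R, C = lpeg.P, lpeg.R, lpeg.C

-- p^-1 means "at most one": an optional minus sign
local integer = C(P("-")^-1 * R("09")^1) / tonumber

print(lpeg.match(integer, "-42"))  --> -42
print(lpeg.match(integer, "7"))    --> 7

-- p^-3 means "at most three" repetitions (still greedy within the cap)
print(lpeg.match(R("09")^-3, "12345"))  --> 4  (consumed "123")
```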

Lookahead

Two operators let you inspect what comes next without consuming input:

local P = lpeg.P

-- Positive lookahead (#p): asserts p matches here, consuming nothing
local ahead = P("foo") * #P("bar")

print(lpeg.match(ahead, "foobar"))  --> 4  (stopped before "bar")
print(lpeg.match(ahead, "foobaz"))  --> nil (lookahead failed)

-- Negative lookahead (-p): succeeds only if p cannot match here
local not_bar = P("foo") * -P("bar")

print(lpeg.match(not_bar, "foobaz"))  --> 4
print(lpeg.match(not_bar, "foobar"))  --> nil (negative lookahead failed)
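Negative lookahead is the standard way to enforce word boundaries, which LPeg has no special syntax for. A minimal sketch matching a keyword only when it is not the prefix of a longer identifier:

```lua
local lpeg = require("lpeg")
local P, R = lpeg.P, lpeg.R

-- Match the keyword "if" only when not followed by another letter
local keyword_if = P("if") * -R("az", "AZ")

print(lpeg.match(keyword_if, "if x then"))  --> 3
print(lpeg.match(keyword_if, "ifelse"))     --> nil
```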

Captures

Captures are where LPeg becomes genuinely useful. They extract matched substrings and transform them.

Simple Capture: lpeg.C()

lpeg.C() wraps a pattern and captures the matched substring:

local C, R = lpeg.C, lpeg.R

local word = C(R("az")^1)
print(lpeg.match(word, "hello world"))  --> hello

lpeg.C() returns a new pattern, not the captured value. You still call lpeg.match() on it.

Constant and Position Captures

lpeg.Cc() produces fixed values without consuming input. lpeg.Cp() returns the current position:

local Cc, Cp, P = lpeg.Cc, lpeg.Cp, lpeg.P

-- Tag matched text with a category
local tagged = P("dog") * Cc("animal")
print(lpeg.match(tagged, "dog"))  --> animal

-- Capture the position
local pos = P("hello") * Cp()
print(lpeg.match(pos, "hello world"))  --> 6

Table Capture: lpeg.Ct()

lpeg.Ct() collects all captures into a table. This is how you build an AST:

local Ct, C, P, R, V = lpeg.Ct, lpeg.C, lpeg.P, lpeg.R, lpeg.V

local name = C(R("az")^1)

-- Named group captures
local pair = lpeg.Cg(name, "key") * P("=") * lpeg.Cg(name, "value")
local record = Ct(pair)

local result = lpeg.match(record, "foo=bar")
print(result.key, result.value)  --> foo  bar

Fold Capture: lpeg.Cf()

lpeg.Cf() aggregates multiple captured values using a fold function:

local Cf, C, R = lpeg.Cf, lpeg.C, lpeg.R

local number = C(R("09")^1) / tonumber
local sum = Cf(number * ("," * number)^0, function(acc, x) return acc + x end)

print(lpeg.match(sum, "10,20,30"))  --> 60

The / operator applies a transformation to a capture. Here it converts the captured string "10" into the number 10.
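Besides functions, / also accepts a table: the captured value is used as a key, and the associated value becomes the new capture. A quick sketch of both forms:

```lua
local lpeg = require("lpeg")
local C, R, S = lpeg.C, lpeg.R, lpeg.S

-- p / function: the function receives the capture and returns the new one
local number = C(R("09")^1) / tonumber
print(lpeg.match(number, "42") + 1)  --> 43

-- p / table: the capture indexes the table
local bool = C(S("tf") * R("az")^0) / { ["true"] = true, ["false"] = false }
print(lpeg.match(bool, "true"))  --> true
```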

Grammars for Recursive Structures

Real parsers need recursion. A grammar is a table of named rules passed to lpeg.P; the entry at index 1 names the initial rule:

local lpeg = require("lpeg")
local P, R, S, C, V = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.V
local Cf, Cg = lpeg.Cf, lpeg.Cg

-- Fold step: the value so far, an operator, and the next operand
local function apply(lhs, op, rhs)
    if op == "+" then return lhs + rhs
    elseif op == "-" then return lhs - rhs
    elseif op == "*" then return lhs * rhs
    else return lhs / rhs end
end

local calc = P({
    "expr",
    expr   = Cf(V"term" * Cg(C(S("+-")) * V"term")^0, apply),
    term   = Cf(V"factor" * Cg(C(S("*/")) * V"factor")^0, apply),
    factor = V"number" + "(" * V"expr" * ")",
    number = C(R("09")^1) / tonumber,
})

V("rule") (usually written V"rule") references another rule by name. References are resolved when the table is converted to a pattern, so a rule may refer to rules defined anywhere in the table, including itself; the one real restriction is that rules must not be left-recursive.

The grammar evaluates as it matches, following the arithmetic example in the LPeg manual: Cg packages each operator together with its right operand into a single capture, and Cf folds those pairs into apply one at a time, so apply always receives the accumulated value, an operator, and the next operand. A thin wrapper turns a failed match into an error value:

local function evaluate(text)
    local result = lpeg.match(calc, text)
    if result == nil then return nil, "parse error" end
    return result
end

print(evaluate("2+3"))       --> 5
print(evaluate("10-3*2"))    --> 4
print(evaluate("(10-3)*2"))  --> 14

Precedence falls out of the grammar’s shape: expr only ever combines terms with + and -, while * and / are handled one level down in term, so multiplication binds tighter than addition. "2+3*4" evaluates as 2 + (3 * 4) = 14, and parentheses recurse back into expr to override that grouping.
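The same machinery handles any self-recursive structure. A minimal sketch, in the spirit of the balanced-parentheses grammar from the LPeg manual, shows a rule referring to itself through V:

```lua
local lpeg = require("lpeg")
local P, S, V = lpeg.P, lpeg.S, lpeg.V

-- A parenthesized group: non-paren characters or nested groups, repeated
local balanced = P({
    "S",
    S = "(" * ((1 - S("()")) + V"S")^0 * ")",
})

print(balanced:match("(a(b)c)"))  --> 8
print(balanced:match("(a(b)c"))   --> nil (unclosed group)
```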

The re Module

Writing grammars with lpeg.P tables is powerful but verbose. The re module provides a conventional syntax:

local re = require("re")

Key differences from raw LPeg:

re syntax    LPeg equivalent
"abc"        P("abc")
[abc]        S("abc")
[a-z]        R("az")
p*           p^0
p+           p^1
p?           p^-1
p1 p2        p1 * p2
p1 / p2      p1 + p2
{p}          C(p)  (simple capture)
{| p |}      Ct(p) (table capture)
name         V"name" (rule reference inside a grammar)
&p           #p (positive lookahead)
!p           -p (negative lookahead)

Compiling and Matching

re.compile() converts a re-syntax string into an LPeg pattern:

local lpeg = require("lpeg")
local re = require("re")

-- %space refers to the "space" entry in the definitions table
local word = re.compile("%space {[a-z]+}", { space = lpeg.S(" \t\n")^0 })
print(word:match("   hello"))  --> hello

The second argument is a table of definitions. Inside the pattern, %name looks up name in that table, which lets you splice prebuilt LPeg patterns into re grammars.

Use re.match(), re.find(), and re.gsub() for one-off operations. Note that the subject string comes first, and that literal text inside a re pattern must itself be quoted:

local re = require("re")

print(re.match("hello world", "'hello'"))   --> 6
local s, e = re.find("say hello world", "'hello'")
print(s, e)                                 --> 5  9

local result = re.gsub("hello world", "[aeiou]", string.upper)
print(result)                               --> hEllO wOrld

Parsing Key-Value Pairs with re

Here is a practical example building a simple key-value parser:

local re = require("re")

local grammar = re.compile([[
    record  <- {| pair (',' pair)* |}
    pair    <- {| {:key: [a-z]+ :} '=' {:value: [a-z]+ :} |}
]])

local result = grammar:match("foo=bar,baz=qux")
print(result[1].key, result[1].value)  --> foo  bar
print(result[2].key, result[2].value)  --> baz  qux

The {| ... |} braces create a table capture (LPeg’s Ct), and {:name: p :} stores the capture of p under a named field of the enclosing table. That is how key and value become fields of each pair.

Practical Example: Parsing CSV

CSV parsing exposes the real value of LPeg. Quoted fields, escaped quotes, and newlines all create edge cases that regex-based approaches stumble over:

local lpeg = require("lpeg")

local comma     = lpeg.P(",")
local newline   = lpeg.P("\n")
local quote     = lpeg.P('"')
local dquote    = quote * quote  -- escaped quote: ""

-- Unquoted field: any character except comma or newline
local field_nq = lpeg.C((lpeg.P(1) - comma - newline)^0)

-- Quoted field: strip the surrounding quotes and unescape "" to "
-- Cs is a substitution capture: dquote / '"' rewrites each escaped quote
local field_q  = quote * lpeg.Cs((dquote / '"' + (lpeg.P(1) - quote))^0) * quote

local field = field_q + field_nq
local row   = lpeg.Ct(field * (comma * field)^0)
local csv   = lpeg.Ct(row * (newline * row)^0)

local data = lpeg.match(csv, 'name,age,city\nAlice,30,NYC\nBob,25,LA')

for _, row in ipairs(data) do
    print(table.unpack(row))  -- use unpack(row) on Lua 5.1 / LuaJIT
end

Output:

name	age	city
Alice	30	NYC
Bob	25	LA

The pattern (lpeg.P(1) - comma - newline)^0 reads as “match any character that is not a comma and not a newline, zero or more times.” The - operator is pattern difference: p1 - p2 matches p1 only where p2 does not match, and it works between any two patterns, not just character classes.
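The same subtraction idiom gives you “everything up to a delimiter,” a staple of ad-hoc parsing:

```lua
local lpeg = require("lpeg")
local P, C = lpeg.P, lpeg.C

-- Capture every character up to (but not including) the first semicolon
local until_semi = C((P(1) - P(";"))^0)

print(lpeg.match(until_semi, "width=10;height=20"))  --> width=10
```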

Common Gotchas

Strings match literally, not as regex. P("a.") matches the two-character string "a.", not an "a" followed by any character. Use P(1) for “any character.”

The + operator is ordered and committed. With P("a") + P("ab"), the input "ab" always takes the first branch: P("a") succeeds on the "a", and LPeg never revisits the choice even if the rest of the pattern later fails. Put longer alternatives first. This is a PEG, not a regular expression.

Grammars reject left recursion. A rule like expr <- expr '+' term would send a PEG into an infinite loop, so lpeg.P raises an error when it detects a left-recursive grammar. Rewrite such rules with repetition instead: expr <- term ('+' term)*.

No Lua ships with LPeg. Neither PUC Lua nor LuaJIT (which most OpenResty installations use) bundles it; install it with luarocks install lpeg or your system’s package manager.

No built-in error messages. lpeg.match() returns nil on failure with no explanation. For better diagnostics, look at lpeglabel, which adds labeled error reporting to LPeg.

See Also

For Lua’s built-in string pattern syntax, which handles simpler search-and-replace tasks, see /tutorials/pattern-matching/.

To serialize parsed data back to a string format, read /guides/lua-serialization/ after this.

If you need to handle JSON input, /guides/lua-json-parsing/ covers the available libraries.