luaguides

Implementing Sets in Lua

Lua ships with tables — versatile structures that act as arrays, maps, or dictionaries depending on how you use them. What it does not ship with is a built-in Set type. Fortunately, tables are flexible enough that you can build one yourself in a few lines of idiomatic Lua. In this tutorial, you’ll implement sets using tables, learn the standard operations, avoid the common pairs() ordering trap, and cap it off with a read-only proxy pattern for cases where mutation just shouldn’t happen.

What Is a Set?

A set is a collection of unique values where order doesn’t matter and membership is the primary operation. The classic question a set answers is: “Is X in this collection?” — answerable in constant time, not by scanning the whole thing.

In Lua, the convention is to use a table where keys are members and every value is true. This is sometimes called the “true sentinel” pattern.

local fruits = { apple = true, banana = true, cherry = true }
```lua

Why keys rather than values? If you stored members as values in an array-style table, a membership check would require scanning the whole table — O(n). By using keys, Lua's hash table lookup gives you O(1) membership checks. The `true` value is just a convention; any truthy value works, but `true` is the clearest choice.

You can also build a set incrementally:

```lua
local primes = {}
primes[2] = true
primes[3] = true
primes[5] = true
primes[7] = true
-- primes is now the set {2, 3, 5, 7}
```lua

This is the foundation everything else in this tutorial builds on.

## Basic Operations

Once you understand the sentinel pattern, the basic operations fall out naturally.

### Adding an element

```lua
set[key] = true
```lua

Because setting a key that already exists overwrites the value with `true`, adding an element to a set is idempotent — inserting the same value twice has no extra effect.

### Removing an element

```lua
set[key] = nil
```lua

Setting a key to `nil` deletes it from the table entirely. This is idiomatic Lua cleanup. There's no `remove` method needed, no special API call — just assign `nil`.

### Checking membership

```lua
if set[key] then
    print("key is in the set")
end
```lua

Since `nil` is falsy and `true` is truthy, a direct lookup works as a membership test. No comparison against a special value needed.

Here are those three operations wrapped in small helper functions to make your intent explicit:

```lua
local function add(set, key)
    set[key] = true
end

local function remove(set, key)
    set[key] = nil
end

local function contains(set, key)
    return set[key] ~= nil
end

local vowels = {}
add(vowels, "a")
add(vowels, "e")
add(vowels, "i")
add(vowels, "o")
add(vowels, "u")

print(contains(vowels, "a"))  --> true
print(contains(vowels, "z"))  --> false

remove(vowels, "e")
print(contains(vowels, "e"))  --> false
```lua

## Set Operations

You'll often need to combine or compare sets. The standard operations — union, intersection, difference, and subset check — are all implementable with a few lines of Lua.

### Union — elements in either set (or both)

```lua
local function union(a, b)
    local result = {}
    for k in pairs(a) do result[k] = true end
    for k in pairs(b) do result[k] = true end
    return result
end

local set_a = { x = true, y = true }
local set_b = { y = true, z = true }
local set_u = union(set_a, set_b)
-- set_u = { x = true, y = true, z = true }
```lua

### Intersection — elements in both sets

```lua
local function intersection(a, b)
    local result = {}
    for k in pairs(a) do
        if b[k] then result[k] = true end
    end
    return result
end

local set_a = { x = true, y = true }
local set_b = { y = true, z = true }
local set_i = intersection(set_a, set_b)
-- set_i = { y = true }
```lua

### Difference — elements in A but not in B

```lua
local function difference(a, b)
    local result = {}
    for k in pairs(a) do
        if not b[k] then result[k] = true end
    end
    return result
end

local set_a = { x = true, y = true }
local set_b = { y = true, z = true }
local set_d = difference(set_a, set_b)
-- set_d = { x = true }
```lua

### Subset check

```lua
local function is_subset(sub, super)
    for k in pairs(sub) do
        if not super[k] then return false end
    end
    return true
end

local small = { a = true, b = true }
local large = { a = true, b = true, c = true, d = true }
print(is_subset(small, large))  --> true
print(is_subset(large, small))  --> false
```lua

All of these iterate with `pairs()`, which brings us to a critical gotcha.

## Iteration with `pairs()` — The Ordering Trap

`pairs()` iterates over all key-value pairs in a table. For sets, the values are always `true`, so `pairs()` gives you the keys — which is what you want. The problem is **iteration order**.

```lua
local vowels = { a = true, e = true, i = true, o = true, u = true }

for letter in pairs(vowels) do
    print(letter)
end
```lua

This will print the vowels, but not in alphabetical order. The order is implementation-defined — it might be insertion order today, hash bucket order tomorrow, and different again in a different Lua version. You cannot rely on it.

If order matters for your use case, you need to sort explicitly:

```lua
local sorted = {}
for k in pairs(vowels) do table.insert(sorted, k) end
table.sort(sorted)
for _, v in ipairs(sorted) do print(v) end
-- a, e, i, o, u — guaranteed sorted
```lua

Note that `ipairs()` only works with sequential integer-keyed tables. It cannot iterate over a set's string keys. Once you've sorted the keys into an array, `ipairs()` handles the ordered traversal correctly.

This is one of the most common sources of surprising behaviour when people first use tables-as-sets in Lua. The rule of thumb: if you need ordered output, sort before iterating.

## Sets vs Arrays — When to Use Which

Sets and arrays (which in Lua are just tables with sequential integer keys) serve different purposes. Here's a quick decision guide:

| Scenario | Best choice |
|---|---|
| Checking if a value exists | **Set**O(1) lookup |
| Traversing values in insertion order | **Array**`ipairs()` |
| Deduplicating a list of values | **Set** — keys are naturally unique |
| Collecting unique items | **Set** — inserting the same key twice is a no-op |
| Storing an ordered sequence | **Array** — integer keys preserve order |
| Fast edge lookup in a graph | **Set** per node |
| Tracking tags or flags | **Set**`tags["urgent"] = true` |

The telltale sign that you should switch from an array to a set: if you find yourself doing this:

```lua
-- Array with a manual linear scan for membership
local items = { "apple", "banana", "cherry" }
local function contains(arr, val)
    for _, v in ipairs(arr) do
        if v == val then return true end
    end
    return false
end
```lua

Replace it with a set and the lookup becomes a single hash access:

```lua
local items = { apple = true, banana = true, cherry = true }
if items["banana"] then
    -- O(1) — done
end
```lua

## Read-Only Sets with Metatables

Sometimes you want a set that's fixed after creation — a constant set that nothing can modify. This is common when modelling things like operators, keyword sets in a lexer, or valid configuration values.

Lua's metatable mechanism lets you create a **proxy** around a table that intercepts writes and throws an error.

```lua
local function readonly_set(source)
    local proxy = {}
    local mt = {
        __index = function(_, k) return source[k] end,
        __newindex = function(_, _, _)
            error("cannot modify a read-only set", 2)
        end
    }
    setmetatable(proxy, mt)
    return proxy
end

local operators = readonly_set({
    ["+"] = true, ["-"] = true,
    ["*"] = true, ["/"] = true
})

print(operators["+"])   --> true
print(operators["%"])   --> nil (not in the set)
operators["+"] = nil    --> error: cannot modify a read-only set
operators["^"] = true  --> error: cannot modify a read-only set
```lua

Here's how it works:

- `__index` forwards any read to the underlying source table, so the proxy behaves exactly like the original set.
- `__newindex` fires whenever code tries to assign to any key — including `proxy["+"] = nil`. It throws an error with a two-level stack trace (`2`) so the error points to the caller's line, not the metamethod itself.
- The source table lives in the closure and is never exposed directly, so there's no way for external code to bypass the proxy.

One thing to be aware of: if any other code holds a direct reference to the underlying `source` table, it can still modify it. The proxy only guards access through the proxy itself. Keep the source table private.

The practical payoff for this pattern is clean, safe constant sets:

```lua
local KEYWORDS = readonly_set({
    ["and"] = true, ["break"] = true, ["do"] = true,
    ["else"] = true, ["elseif"] = true, ["end"] = true,
    ["false"] = true, ["for"] = true, ["function"] = true,
    ["if"] = true, ["in"] = true, ["local"] = true,
    ["nil"] = true, ["not"] = true, ["or"] = true,
    ["repeat"] = true, ["return"] = true, ["then"] = true,
    ["true"] = true, ["until"] = true, ["while"] = true
})

local function is_keyword(word)
    return KEYWORDS[word] ~= nil
end
```lua

This is the pattern you'll see in lexers and parsers across the Lua ecosystem.

## Practical Examples

Let's look at three scenarios where sets genuinely earn their keep.

### Deduplication

If you receive a list of values and need only the unique ones:

```lua
local input = { "apple", "banana", "apple", "cherry", "banana", "apple" }

local unique = {}
for _, v in ipairs(input) do
    unique[v] = true
end

local result = {}
for k in pairs(unique) do
    table.insert(result, k)
end
table.sort(result)

for _, v in ipairs(result) do
    print(v)
end
-- apple, banana, cherry (sorted, deduplicated)
```lua

### Token classification in a lexer

When parsing text, you often need to quickly decide whether a character is a valid operator:

```lua
local operators = readonly_set({
    ["+"] = true, ["-"] = true, ["*"] = true, ["/"] = true,
    ["="] = true, ["<"] = true, [">"] = true,
    ["=="] = true, ["~="] = true, ["<="] = true, [">="] = true
})

local function tokenize_operator(chars, i)
    local two = chars[i] .. (chars[i + 1] or "")
    if operators[two] then
        return two, i + 2
    end
    local one = chars[i]
    if operators[one] then
        return one, i + 1
    end
    return nil, i
end

-- Test it
print(tokenize_operator({"=", "="}, 1))  --> ==  3
print(tokenize_operator({"+"}, 1))        --> +   2
print(tokenize_operator({"x"}, 1))        --> nil 1
```lua

### Graph node neighbours

In a graph represented as adjacency lists, storing each node's neighbours in a set gives fast edge existence checks:

```lua
local graph = {
    A = { B = true, C = true },
    B = { A = true, D = true },
    C = { A = true, D = true },
    D = { B = true, C = true }
}

local function has_edge(from, to)
    return graph[from] and graph[from][to] == true
end

print(has_edge("A", "B"))  --> true
print(has_edge("A", "D"))  --> false
```lua

## Summary

Sets in Lua are tables used in a specific way: keys are members, values are `true`. This simple convention gives you O(1) membership testing, idiomatic `nil`-based deletion, and a clean foundation for implementing union, intersection, difference, and subset operations.

The key takeaways:

- Use `set[key] = true` to add, `set[key] = nil` to remove, and `if set[key]` to check membership.
- `pairs()` iteration order is not guaranteed — sort keys explicitly if order matters.
- Reach for a set whenever you're doing repeated membership checks on a collection.
- Use a metatable proxy when you need an immutable set.

If you're building data structures in Lua, you'll also find it useful to understand how [hash maps work in Lua](/tutorials/ds-hash-maps/) — since that's what your sets are built on — and how [arrays and lists](/tutorials/ds-arrays-and-lists/) handle ordered data that sets deliberately don't care about.

## See Also

- [Implementing Hash Maps in Lua](/tutorials/ds-hash-maps/)
- [Implementing Arrays and Lists in Lua](/tutorials/ds-arrays-and-lists/)