Profiling Lua Code for Bottlenecks

April 18, 2026 · 6 min read · Updated April 18, 2026 · intermediate

Tags: performance, profiling, debugging, optimization, benchmarking

A profiler tells you which parts of your program are slow — the bottlenecks. Without profiling, you guess. You optimise what feels slow, miss the real culprit, and end up with code that is faster in all the wrong places.

Lua makes profiling straightforward because the standard library already includes the tools: os.clock() for timing and the debug library for hooks. You do not need special tooling or recompiled binaries. This guide walks through the main approaches, from quick manual timing to a statistical sampler written in pure Lua, plus the profiler built into LuaJIT.

Manual Timing with os.clock()

The fastest way to measure a function is to call it between two os.clock() readings:

local start = os.clock()
my_function(arg1, arg2)
local elapsed = os.clock() - start
print(("Took %.4f seconds"):format(elapsed))

os.clock() returns an approximation of the CPU time used by the program, in seconds. Its resolution comes from the underlying C clock() function and varies by platform, so treat it as reliable only for functions that take at least a few milliseconds.

For sub-millisecond measurements, the resolution is too coarse — system scheduling noise dominates the result. In that case, run the function many times and divide:

local N = 100000
local start = os.clock()
for i = 1, N do
  my_function(arg1, arg2)
end
local total = os.clock() - start
print(("%.6f ms per call"):format(total / N * 1000))
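The loop-and-divide pattern can be wrapped in a small helper. The sketch below is one possible shape, not a standard API: the names bench and work are hypothetical, and taking the best of several repeats is an assumption about how to reduce noise rather than something the timing functions require.

```lua
-- Hypothetical helper: time fn over `iterations` calls, repeat the whole
-- measurement `repeats` times, and return the best per-call time in seconds.
local function bench(fn, iterations, repeats)
  iterations = iterations or 100000
  repeats = repeats or 5
  local best = math.huge
  for _ = 1, repeats do
    local start = os.clock()
    for _ = 1, iterations do
      fn()
    end
    local per_call = (os.clock() - start) / iterations
    if per_call < best then best = per_call end
  end
  return best
end

-- Example workload (hypothetical): sum of squares
local function work()
  local s = 0
  for i = 1, 100 do s = s + i * i end
  return s
end

local best = bench(work, 10000, 3)
print(("%.6f ms per call (best of 3)"):format(best * 1000))
```

The measured time includes the overhead of the loop and of calling fn, so it is most useful for comparing two versions of the same function rather than as an absolute figure.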

Disable the Garbage Collector

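The garbage collector runs at points that are hard to predict. A collection step that lands inside a timed section adds time that has nothing to do with the code under test and makes runs differ from one another. Run a full collection, stop the GC before timing, and restart it afterwards:

collectgarbage("collect")  -- start from a clean heap
collectgarbage("stop")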
local start = os.clock()
for i = 1, N do
  my_function()
end
local elapsed = os.clock() - start
collectgarbage("restart")
print(("%.4f ms per call"):format(elapsed * 1000 / N))

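This gives you steadier, more repeatable measurements. Keep in mind that memory grows while the collector is stopped, and that the result excludes collection cost, which your program still pays in normal operation if the function allocates heavily.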
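Since a stopped collector hides allocation cost, it can help to measure allocation directly. The sketch below uses collectgarbage("count"), which reports the heap size in kilobytes; the helper name alloc_per_call and the sample function make_table are hypothetical.

```lua
-- Hypothetical helper: estimate how many KB a function allocates per call.
local function alloc_per_call(fn, iterations)
  collectgarbage("collect")                -- start from a clean heap
  collectgarbage("stop")                   -- nothing is freed while we count
  local before = collectgarbage("count")   -- heap size in KB
  for _ = 1, iterations do
    fn()
  end
  local after = collectgarbage("count")
  collectgarbage("restart")
  return (after - before) / iterations
end

-- Example workload (hypothetical): allocates one small table per call
local function make_table()
  return { 1, 2, 3, 4 }
end

local kb = alloc_per_call(make_table, 10000)
print(("%.1f bytes allocated per call"):format(kb * 1024))
```

A function that allocates a lot per call will trigger more collector work in production, even if its timing with the GC stopped looks fine.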

Statistical Sampling with debug.sethook()

Manual timing tells you how long a specific function takes. A statistical sampler tells you where your program spends its time overall — especially useful when you do not know where to start.

The debug.sethook() function registers a hook that fires on events: "call", "return", "line", or a count event that fires every N VM instructions. Hooking every call, return, and line records everything and slows the program badly. A sampler instead uses the count event to inspect the call stack at regular instruction intervals and build a picture of where the program was each time the hook fired.

Here is a minimal pure-Lua sampler:

local counts = {}

local function sampler()
  -- Walk the stack and record each active function.
  -- Level 1 is this hook, so start at level 2.
  for level = 2, 100 do
    local info = debug.getinfo(level, "nSl")
    if not info then break end
    local name = info.name or "(anon)"
    local key = ("%s:%d %s"):format(info.short_src, info.currentline, name)
    counts[key] = (counts[key] or 0) + 1
  end
end

-- Sample every 1000 VM instructions for 5 seconds of CPU time
debug.sethook(sampler, "", 1000)
local start = os.clock()
while (os.clock() - start) < 5 do end  -- replace with your program
debug.sethook()

-- Print top hotspots
local sorted = {}
for k, v in pairs(counts) do sorted[#sorted + 1] = {k, v} end
table.sort(sorted, function(a, b) return a[2] > b[2] end)
for i = 1, math.min(20, #sorted) do
  print(("%6d  %s"):format(sorted[i][2], sorted[i][1]))
end

The output lists source locations and function names sorted by hit count. Because every frame on the stack is counted, a function's count includes time spent in its callees. The highest counts are your hotspots. This works in Lua 5.1 through 5.4 without any C extensions.

The main limitation: the count hook samples by VM instructions executed, not by time. Time spent inside C functions or blocked on I/O produces no samples, so those costs are under-represented, and the hook itself slows the program down. Use it on representative workloads and treat the counts as relative.
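When profiling a single entry point, it can be convenient to wrap the hook setup so that the hook is always removed, even if the profiled code raises an error. The sketch below is one way to do that with a count hook; profile_call is a hypothetical name, the interval of 1000 instructions is an arbitrary choice, and the hit counts assume the stock Lua interpreter.

```lua
-- Hypothetical wrapper: run fn under a count hook and return hit counts
-- keyed by "source:line" of the innermost frame.
local function profile_call(fn, interval)
  local counts = {}
  local function hook()
    -- Level 2 is the function that was running when the hook fired
    local info = debug.getinfo(2, "Sl")
    if info then
      local key = ("%s:%d"):format(info.short_src, info.currentline)
      counts[key] = (counts[key] or 0) + 1
    end
  end
  debug.sethook(hook, "", interval or 1000)
  local ok, err = pcall(fn)
  debug.sethook()  -- always remove the hook, even after an error
  if not ok then error(err, 0) end
  return counts
end

-- Example workload (hypothetical)
local counts = profile_call(function()
  local s = 0
  for i = 1, 2000000 do s = s + i % 7 end
end, 1000)

for where, hits in pairs(counts) do
  print(hits, where)
end
```

Recording only the innermost frame keeps the hook cheap and yields self-time counts; walking the whole stack gives inclusive counts at a higher cost per sample.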

LuaJIT Profiling

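If you are running on LuaJIT 2.1, the jit.profile module gives you built-in statistical sampling:

local profile = require("jit.profile")

local counts = {}
-- "l" samples at line granularity. The callback receives the sampled
-- thread, the number of samples since the last callback, and the VM state.
profile.start("l", function(thread, samples, vmstate)
  local where = profile.dumpstack(thread, "l", 1)  -- innermost frame
  counts[where] = (counts[where] or 0) + samples
end)

-- your code here --

profile.stop()

The vmstate argument is a single character: N for compiled native code, I for the interpreter, C for C code, G for the garbage collector, and J for the JIT compiler. For a ready-made report, LuaJIT 2.1 also ships a command-line profiler: luajit -jp script.lua.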

A different view comes from the JIT compiler's own diagnostics. The jit.dump module reports each trace as it is compiled:

local dump = require("jit.dump")
dump.on("ti")  -- t: one line per started, ended, or aborted trace; i: IR

-- your code here --

dump.off()

This prints every trace along with its IR. The same output is available from the command line with luajit -jdump=ti script.lua. Look for aborted traces and for hot loops that never get compiled, since that code falls back to the interpreter.

The Strategy: Narrow Down, Then Measure

Profiling is not a one-shot operation. Follow this cycle:

1. Sample broadly: use the debug hook profiler to find the general area (a module, a function, a loop).
2. Isolate the culprit: comment out or stub sections until the hot spot disappears.
3. Time the fix: benchmark the specific function before and after.

Most bottlenecks live in a small part of the code. Finding that part is the hard part; once you know where to look, the fix is often straightforward.

The Stub-and-Time Technique

When a program has deep call chains, isolate the slow layer by replacing the innermost function with a no-op:

local function original_slow(path)
  -- do complex parsing
end

-- Temporarily stub it out
local function stub(path) return {} end
original_slow = stub

-- Now measure
local start = os.clock()
main_loop()
print(("With stub: %.4f seconds"):format(os.clock() - start))

If the total time drops significantly, the bottleneck is in original_slow or its callees. If it barely changes, the bottleneck is somewhere else. This tells you where to dig without any tools.
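If the function lives in a module table, the stub can be swapped in and restored automatically. The following is a sketch under that assumption; parser, main_loop, time_it, and with_stub are all hypothetical names standing in for your own code.

```lua
-- Hypothetical module table standing in for your own code
local parser = {}
function parser.parse(path)
  local n = 0
  for i = 1, 200000 do n = n + i % 3 end  -- pretend this is expensive
  return { path = path, n = n }
end

local function main_loop()
  for i = 1, 20 do parser.parse("file" .. i) end
end

-- Time fn once and return elapsed CPU seconds
local function time_it(fn)
  local start = os.clock()
  fn()
  return os.clock() - start
end

-- Run fn with tbl[name] temporarily replaced, restoring it afterwards
local function with_stub(tbl, name, stub, fn)
  local original = tbl[name]
  tbl[name] = stub
  local ok, result = pcall(fn)
  tbl[name] = original
  if not ok then error(result, 0) end
  return result
end

local baseline = time_it(main_loop)
local stubbed = with_stub(parser, "parse", function() return {} end,
  function() return time_it(main_loop) end)

print(("baseline %.4f s, stubbed %.4f s"):format(baseline, stubbed))
```

The stub has to return something its callers can handle, here an empty table. Restoring the original inside with_stub means a forgotten stub cannot leak into later measurements.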

Common Bottlenecks to Watch For

Even without a profiler, these patterns are frequent culprits:

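Repeated table resizing. Appending with t[#t+1] = val triggers a resize whenever the array part is full. In hot loops that build large tables, pre-size the table with table.create(n) where it exists (Lua 5.5 and Luau; Lua 5.1 through 5.4 do not have it) or with table.new(n, 0) in LuaJIT, which you load with require("table.new"):

local large = table.create(100000)  -- pre-allocated; needs a runtime that provides table.create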
for i = 1, 100000 do
  large[i] = compute(i)
end
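Because the pre-allocation function differs between runtimes, a portable script can select one at load time. This is a minimal sketch; new_array is a hypothetical helper name, and the fallback simply returns an empty table when no pre-allocation function is available.

```lua
-- Pick whatever pre-allocation function the runtime offers.
local new_array
if type(table.create) == "function" then
  new_array = function(n) return table.create(n) end
else
  local ok, table_new = pcall(require, "table.new")  -- LuaJIT extension
  if ok and type(table_new) == "function" then
    new_array = function(n) return table_new(n, 0) end
  else
    new_array = function() return {} end  -- no pre-allocation available
  end
end

local large = new_array(100000)
for i = 1, 100000 do
  large[i] = i * 2
end
print(#large)  --> 100000
```

The loop behaves the same in every case; only the number of internal resizes changes.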

Global lookups in tight loops. Each reference to math.sin costs two table lookups: one for the global math and one for its sin field. Cache the functions in locals:

local sin, cos = math.sin, math.cos  -- look up once
local sum = 0
for i = 1, N do
  sum = sum + sin(i) + cos(i)
end
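Whether caching matters in your program is worth measuring rather than assuming. The sketch below times both versions; the function names and the iteration count are arbitrary choices, and on a JIT-compiling runtime the gap may be small or absent.

```lua
local N = 1000000

local function with_global_lookup()
  local sum = 0
  for i = 1, N do
    sum = sum + math.sin(i) + math.cos(i)  -- looks up math, sin, cos each time
  end
  return sum
end

local function with_cached_locals()
  local sin, cos = math.sin, math.cos      -- look up once
  local sum = 0
  for i = 1, N do
    sum = sum + sin(i) + cos(i)
  end
  return sum
end

-- Return elapsed CPU seconds and the function's result
local function time_it(fn)
  local start = os.clock()
  local result = fn()
  return os.clock() - start, result
end

local t_global, sum_global = time_it(with_global_lookup)
local t_local, sum_local = time_it(with_cached_locals)
print(("global: %.3f s, cached: %.3f s"):format(t_global, t_local))
```

Returning the sums lets you confirm that both versions compute the same value before comparing their times.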

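String concatenation in loops. s = s .. chunk allocates a new string and copies the old one on every iteration, so the total cost grows quadratically with the output length. Collect the pieces in a table and join them once: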

local parts = {}
for line in lines do
  parts[#parts + 1] = process(line)
end
local result = table.concat(parts, "\n")
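To see the difference on your own data, build the same string both ways and time each. This is a rough sketch with hypothetical function names and an arbitrary N; the gap widens as N grows.

```lua
local N = 20000

local function build_with_concat()
  local s = ""
  for i = 1, N do
    s = s .. i .. "\n"          -- copies the whole string each time
  end
  return s
end

local function build_with_table()
  local parts = {}
  for i = 1, N do
    parts[#parts + 1] = i .. "\n"
  end
  return table.concat(parts)    -- one final join
end

local start = os.clock()
local a = build_with_concat()
local t_concat = os.clock() - start

start = os.clock()
local b = build_with_table()
local t_table = os.clock() - start

print(("concat: %.4f s, table: %.4f s, same result: %s")
  :format(t_concat, t_table, tostring(a == b)))
```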

Metatable __index on every access. If you rely on __index to provide defaults, the metamethod runs on every read of a missing key. Reading a field stored directly in the table is faster. For hot tables, copy the defaults in after construction, skipping keys that are already set:

for k, v in pairs(defaults) do
  if rawget(t, k) == nil then
    t[k] = v  -- now direct, no __index overhead
  end
end
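A counter inside the metamethod makes the effect visible. In this sketch the defaults table, its field names, and the counter index_calls are all made up for illustration.

```lua
local defaults = { width = 80, height = 24 }
local index_calls = 0

local t = setmetatable({ width = 120 }, {
  __index = function(_, key)
    index_calls = index_calls + 1
    return defaults[key]
  end,
})

local _ = t.height            -- missing key: goes through __index
local calls_before = index_calls

-- Copy defaults in, keeping any value the table already has
for k, v in pairs(defaults) do
  if rawget(t, k) == nil then t[k] = v end
end

local _ = t.height            -- now a plain field read
local calls_after = index_calls

print(calls_before, calls_after, t.width, t.height)  --> 1  1  120  24
```

The counter stays at 1 after the copy, which shows that later reads no longer reach the metamethod. The existing width of 120 is preserved because the copy skips keys that are already set.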

Reading Profilers Correctly

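A profiler shows where time was spent, not why it was slow. A line that appears in 80% of samples may look like the culprit, but the time may really belong to a slow C function it calls deep in a library, or the line may simply be reached far more often than it should be. Check what the line calls, then follow the call chain upward to see who calls it and how often.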

Also distinguish self time from total time. Self time counts only the work done inside the function itself; total time also includes everything it calls. A function that shows 30% of total time may be doing real work, or it may spend 28 of those 30 percentage points inside a single callee. Most profilers report both figures, so look at both.
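A hook-based sampler can report both figures by counting the innermost frame separately from the rest of the stack. The sketch below is one way to do it and assumes the stock Lua interpreter; inner and outer are hypothetical workloads, and functions are keyed by the line where they are defined.

```lua
local self_hits, total_hits = {}, {}

local function hook()
  local seen = {}
  for level = 2, 100 do                       -- level 1 is this hook
    local info = debug.getinfo(level, "Sn")
    if not info then break end
    local key = ("%s:%d %s"):format(info.short_src, info.linedefined,
                                    info.name or "(anon)")
    if level == 2 then
      self_hits[key] = (self_hits[key] or 0) + 1   -- was executing here
    end
    if not seen[key] then                          -- count recursion once
      seen[key] = true
      total_hits[key] = (total_hits[key] or 0) + 1
    end
  end
end

local function inner()
  local s = 0
  for i = 1, 200000 do s = s + i % 5 end
  return s
end

local function outer()
  local s = 0
  for _ = 1, 20 do s = s + inner() end
  return s
end

debug.sethook(hook, "", 1000)
outer()
debug.sethook()

for key, total in pairs(total_hits) do
  print(("%6d self  %6d total  %s"):format(self_hits[key] or 0, total, key))
end
```

In a run like this, outer should show a high total count but a low self count, because nearly all of its time is spent inside inner.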