
string.unpack

string.unpack(fmt, s [, pos])

string.unpack in Lua 5.4 is the inverse of string.pack. It reads a binary string produced by string.pack and decodes it back into Lua values, using the same format-string grammar. This page covers the signature, the format string, the streaming pos argument, and the failure modes you are most likely to hit in real code.

Signature and return values

string.unpack(fmt, s [, pos])
| Parameter | Type | Default | Description |
|---|---|---|---|
| fmt | string | required | Format string describing the layout of s. See string.pack for the full grammar. |
| s | string | required | The packed (binary) string to decode. |
| pos | integer | 1 | 1-based byte offset at which to start reading. Negative values count from the end (matching string.sub). A position past #s + 1 is not clamped; it raises "initial position out of string". |

string.unpack returns one Lua value for each data option in fmt, followed by a final integer: the 1-based offset of the first unread byte in s. That trailing offset is the part most people forget on first contact. It is not noise. It is the position to feed back into the next call when you are parsing a stream of concatenated records.

local a, b, c, next_pos = string.unpack(">i4Bi2", packed)
-- a, b, c  : the decoded values
-- next_pos: #packed + 1 when everything was consumed
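
To see the return count concretely, one option is to collect the results with table.pack, which records the number of values in its n field. This is a minimal sketch; the sample values (-7, 200, 1234) are made up for illustration:

```lua
-- Hypothetical sample data: a 4-byte int, an unsigned byte, a 2-byte int.
local packed = string.pack(">i4Bi2", -7, 200, 1234)

-- table.pack stores the number of returned values in the field n.
local results = table.pack(string.unpack(">i4Bi2", packed))

print(results.n)     -- 4: three decoded values plus the trailing offset
print(results[4])    -- 8, which is #packed + 1 for this 7-byte string
```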

The format string

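The format string fmt is a sequence of specifiers, each a single character optionally followed by a size, and most of them pull one value from the buffer. State specifiers like <, >, =, and ![n] change how later specifiers are interpreted. Spaces between specifiers are ignored; other whitespace, such as tabs or newlines, is not a valid option.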

| Option | Reads or writes |
|---|---|
| b | Signed byte. |
| B | Unsigned byte. |
| h / H | Signed or unsigned short (native size). |
| l / L | Signed or unsigned long (native size). |
| j | lua_Integer (typically 64-bit on standard builds). |
| J | lua_Unsigned. |
| T | size_t (native size). |
| i[n] / I[n] | Signed or unsigned integer of n bytes (1..16). Defaults to the native int size. |
| f | float (native size). |
| d | double (native size). |
| n | lua_Number (default double). |
| cn | Fixed-length string of n bytes. Not aligned. |
| z | Zero-terminated string. Not aligned. |
| s[n] | Length-prefixed string. The length is an unsigned integer of n bytes (default size_t). |
| x | One byte of padding, ignored on unpack. |
| Xop | Alignment-only item: aligns as option op would, but reads and writes nothing. |
| < / > / = | Set little-endian, big-endian, or native-endian. |
| ![n] | Set maximum alignment to n (1..16); ! with no n uses the native alignment. Without any !, the maximum alignment is 1 (no alignment). |
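
One way to sanity-check the fixed-size options in this table is string.packsize, which reports how many bytes a format occupies. The formats below are arbitrary examples, not layouts taken from any real protocol:

```lua
print(string.packsize("<i4"))      -- 4
print(string.packsize("<i2 B"))    -- 3 (the space is ignored)
print(string.packsize("c10"))      -- 10
print(string.packsize("<I3"))      -- 3: odd sizes are allowed for i[n] / I[n]

-- Variable-length options (s and z) have no fixed size,
-- so string.packsize raises an error for them.
local ok = pcall(string.packsize, "z")
print(ok)                          -- false
```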

String specifiers are where most decoding bugs hide, because c, z, and s have very different stopping conditions. A format that does not match the data either raises an error (truncated data, a z string with no terminator) or, worse, silently returns the wrong bytes. A short pack-and-unpack cycle shows all three in one shot:

-- Note: pack rejects z arguments that contain "\0" ("string contains zeros"),
-- so the terminator is never passed in by hand.
local p = string.pack("c4zs1", "HEAD", "tail", "body")
local a, b, c, pos = string.unpack("c4zs1", p)
-- a = "HEAD"   (c4: exactly 4 bytes, no terminator)
-- b = "tail"   (z: stops at the NUL terminator that pack appended)
-- c = "body"   (s1: 1-byte length prefix says 4 bytes follow)
-- pos = 15    (14 bytes consumed, so pos == #p + 1)
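
The failure side of the string specifiers can be sketched the same way. The inputs below are deliberately malformed, hand-written strings, and the variable names are only illustrative:

```lua
-- z needs a terminating NUL somewhere in the remaining data.
local ok_z, err_z = pcall(string.unpack, "z", "no terminator")

-- c4 needs 4 bytes; only 2 are available.
local ok_c, err_c = pcall(string.unpack, "c4", "ab")

-- s1 reads a length byte (here 9) and then needs that many bytes.
local ok_s, err_s = pcall(string.unpack, "s1", "\9abc")

print(ok_z, err_z)   -- false, plus the error message
print(ok_c, err_c)   -- false, plus the error message
print(ok_s, err_s)   -- false, plus the error message

-- A wrong but satisfiable layout does not raise; it returns other bytes.
local wrong = string.unpack("c2", "HEAD")
print(wrong)         -- HE
```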

If you need a deeper walkthrough of pack and unpack together, see the string.pack reference. In practice, the specifiers that show up most are b B h H i I f d s z c x, plus the !n, <, >, and = prefixes when you control the on-disk layout.

Reading at an offset with pos

The trailing position is the feature that makes string.unpack composable. Two records packed back-to-back can be read one at a time by feeding each call’s position to the next:

local buf = string.pack("<i2s1", 5, "hello")
          .. string.pack("<i2s1", 3, "lua")

local n1, s1, p1 = string.unpack("<i2s1", buf)      -- 5,  "hello", 9
local n2, s2, p2 = string.unpack("<i2s1", buf, p1)  -- 3,  "lua",   15

p1 is the offset of the first byte of the second record, and p2 is #buf + 1, the canonical “we consumed everything” sentinel. Wrap this in a loop and you have a binary record parser without ever slicing the buffer manually. For a one-shot decode of a single record, you can ignore the trailing value with _ or just not bind it.
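
The loop version could be shaped as an iterator. This is a sketch under two assumptions: each_record is a hypothetical helper name, and every record uses a format with exactly two data specifiers, like the "<i2s1" layout above:

```lua
-- Hypothetical helper: iterate over back-to-back records of one format.
local function each_record(fmt, buf)
    local pos = 1
    return function()
        if pos > #buf then return nil end   -- reached the #buf + 1 sentinel
        local n, s, next_pos = string.unpack(fmt, buf, pos)
        pos = next_pos                      -- feed the offset to the next call
        return n, s
    end
end

local buf = string.pack("<i2s1", 5, "hello")
          .. string.pack("<i2s1", 3, "lua")

for n, s in each_record("<i2s1", buf) do
    print(n, s)   -- first 5 and "hello", then 3 and "lua"
end
```

The closure keeps pos private, so callers never handle offsets themselves. A truncated buffer still raises from inside string.unpack.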

Endianness, alignment, and padding

Three things in the format string quietly shape the byte layout:

  1. Endianness is set by <, >, or =. Default is native. The choice is sticky: it applies to every later numeric specifier, integer or float, until you change it.
  2. Alignment is set by !n, which sets a maximum. Default is 1, which inserts no padding. Each option is aligned to the smaller of its own size and that maximum, so !4>i2 means “cap alignment at 4, then read a big-endian 2-byte integer aligned to 2 bytes.” On unpack, padding is silently consumed; you do not get separate return values for x or Xop.
  3. Padding is ignored, not validated. x skips a byte. Xop advances the offset to where op would have started, then reads nothing. Use these to skip headers or trailing cruft without branching.
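
The effect of ! and X on layout can be measured with string.packsize. The formats here are arbitrary illustrations:

```lua
-- Without !, the maximum alignment is 1, so nothing is padded.
print(string.packsize("B i4"))       -- 5

-- With !4, the i4 must start on a 4-byte boundary: 1 + 3 padding + 4.
print(string.packsize("!4 B i4"))    -- 8

-- Xi4 only aligns as i4 would; it contributes padding but no value.
print(string.packsize("!4 B Xi4"))   -- 4

-- An i2 is aligned to min(2, 4) = 2 bytes: 1 + 1 padding + 2.
print(string.packsize("!4 B i2"))    -- 4
```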

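Alignment in practice means bytes you do not see. A format of !4>i2xi2 packs two big-endian 2-byte integers with a maximum alignment of 4 and a pad byte between them. The second integer is aligned to 2 bytes, so one extra alignment byte follows the x, and the same format reads both values back cleanly: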

local p = string.pack("!4>i2xi2", 1, 2)
-- 6 bytes total: the first integer, 1 byte of x padding, 1 byte of alignment
-- padding, then the second integer

local a, b, next_pos = string.unpack("!4>i2xi2", p)
-- a = 1
-- b = 2
-- next_pos = 7  (1 past the last byte)

The sharp edge here is silent corruption, not an error. If you pack big-endian and unpack little-endian, you get a wrong number, no exception:

local p = string.pack(">i2", 0x0102)   -- bytes 01 02
local v = string.unpack("<i2", p)      -- reads as little-endian → 0x0201
-- v == 0x0201; no error raised

Pick the prefix deliberately and make sure pack and unpack agree. Native endianness (=) is only safe when the writer and the reader run on the same architecture with the same Lua build; reach for > (or <) when you need a stable wire format.
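
When two sides disagree, dumping the bytes is often the quickest check. The hex function below is a hypothetical debugging helper, not part of the standard library:

```lua
-- Hypothetical helper: render a binary string as space-separated hex bytes.
local function hex(s)
    return (s:gsub(".", function(c)
        return string.format("%02X ", c:byte())
    end))
end

print(hex(string.pack(">I4", 0x01020304)))   -- 01 02 03 04
print(hex(string.pack("<I4", 0x01020304)))   -- 04 03 02 01
print(hex(string.pack(">i2", -2)))           -- FF FE
```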

Errors and safety

Three classes of failure are common:

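  • Invalid format string. A bad option character ("Z", for example) raises an error such as "invalid format option 'Z'", and an integer size outside 1..16 raises an "out of limits" error.
  • Truncated data. If the data runs out mid-format, unpack raises "data string too short". A z string with no terminator raises "unfinished string for format 'z'". There is no partial-success mode.
  • Integer overflow on read. Reading an integer wider than lua_Integer (i16, for example) whose value does not fit raises a "does not fit into Lua Integer" error. Unsigned options (B H L J I) read the bytes as unsigned, but a J value above the signed maximum does not raise; it comes back as a negative Lua integer.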

If fmt comes from a config file or untrusted input, wrap the call in pcall and inspect the result:

local results = { pcall(string.unpack, fmt, data) }
if not results[1] then
    error("bad record: " .. tostring(results[2]))
end
local a, b, next_pos = results[2], results[3], results[4]

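pcall returns the success flag followed by the original return values, so everything shifts by one index. The snippet above assumes a format with two data specifiers, which fills results[2] through results[4]; three data specifiers plus the trailing offset would fill results[2] through results[5]. For raising your own errors from inside a decoder, see error.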
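
One possible way to hide the index shift is a small wrapper. try_unpack is a hypothetical helper name, and returning nil plus a message is just one convention among several:

```lua
-- Hypothetical helper: returns a packed table of results, or nil plus the error.
local function try_unpack(fmt, data, pos)
    local results = table.pack(pcall(string.unpack, fmt, data, pos))
    if not results[1] then
        return nil, results[2]
    end
    -- Drop the pcall flag so indices line up with string.unpack's returns.
    return table.pack(table.unpack(results, 2, results.n))
end

local good = try_unpack(">i2B", "\0\5\9")
print(good[1], good[2], good[3])                 -- 5  9  4

print(try_unpack(">i2B", "\0"))                  -- nil, truncated data
print(try_unpack("Z", "\0"))                     -- nil, bad option character
print(try_unpack("i17", string.rep("\0", 17)))   -- nil, size outside 1..16
print(try_unpack("<I16", string.rep("\1", 16)))  -- nil, value too wide
```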

Common mistakes

  • Confusing string.unpack with table.unpack (or the legacy global unpack). string.unpack decodes binary data; table.unpack expands a sequence into multiple return values. Same name, very different jobs.
  • Mismatched endianness between pack and unpack. Silent corruption, no error. Pin the prefix (<, >, or =) on both sides.
  • Forgetting the trailing position. If your format has three data specifiers, the call returns four values. Counting only three and ignoring the last one is a common source of off-by-one bugs in streaming code.
  • Assuming the returned position is end-relative. It is always 1-based, even if you passed a negative pos. If you call string.unpack("b", s, -5) on a 20-byte string, reading starts at byte 16 and the returned position is 17, not -4.
  • Trusting dynamic format strings. Always validate fmt (or wrap in pcall) if it is built from external input. A single bad option character raises.

Putting it together

A small but concrete parser reads a stream of records from a binary buffer, skips a 4-byte magic header at the front, and uses the trailing return value to advance through the rest:

local function parse_records(buf)
    local records = {}
    local pos = 5  -- skip a 4-byte header
    while pos <= #buf do
        local id, ts, payload, next_pos = string.unpack(">i4I2c4", buf, pos)
        records[#records + 1] = {
            id      = id,
            ts      = ts,
            payload = payload,
        }
        pos = next_pos
    end
    return records
end

local buf = "RECS"  -- placeholder 4-byte magic header that parse_records skips
          .. string.pack(">i4I2c4", 1001, 42, "cmd0")
          .. string.pack(">i4I2c4", 1002, 43, "cmd1")

parse_records(buf)
-- {
--   { id = 1001, ts = 42, payload = "cmd0" },
--   { id = 1002, ts = 43, payload = "cmd1" },
-- }

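The loop trusts next_pos to advance to the next record, and exits when pos is past the end of buf. That pos > #buf exit condition matches the #buf + 1 sentinel the function hands you back, which is why the trailing return is worth keeping instead of discarding. A truncated final record still raises "data string too short", so wrap the call in pcall when the buffer is untrusted.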
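
A more defensive variant might validate the buffer before decoding anything. This is a sketch, not a definitive design: the magic value "RECS", the name parse_checked, and the nil-plus-message convention are all assumptions made for illustration:

```lua
local RECORD_FMT  = ">i4I2c4"
local RECORD_SIZE = string.packsize(RECORD_FMT)   -- 10 bytes for this format
local MAGIC       = "RECS"                        -- placeholder magic value

-- Hypothetical helper: checks the header and length before unpacking.
local function parse_checked(buf)
    if buf:sub(1, #MAGIC) ~= MAGIC then
        return nil, "bad magic header"
    end
    if (#buf - #MAGIC) % RECORD_SIZE ~= 0 then
        return nil, "trailing partial record"
    end
    local records, pos = {}, #MAGIC + 1
    while pos <= #buf do
        local id, ts, payload
        id, ts, payload, pos = string.unpack(RECORD_FMT, buf, pos)
        records[#records + 1] = { id = id, ts = ts, payload = payload }
    end
    return records
end

local good = MAGIC .. string.pack(RECORD_FMT, 1001, 42, "cmd0")
                   .. string.pack(RECORD_FMT, 1002, 43, "cmd1")

print(#parse_checked(good))              -- 2
print(parse_checked("XXXX"))             -- nil  bad magic header
print(parse_checked(good:sub(1, -2)))    -- nil  trailing partial record
```

Checking the length up front works only because every option in this format has a fixed size; a format with s or z would need the pcall approach instead.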

Conclusion

string.unpack is the decoder half of Lua 5.4’s binary format facility. Read with the same fmt you wrote with string.pack, count the return values carefully, and use the trailing position to stream through concatenated records. Endianness and alignment are not optional: pin them in the format string on both sides, and you will not get silently wrong data.
