string.unpack
string.unpack(fmt, s [, pos])
string.unpack in Lua 5.4 is the inverse of string.pack. It reads a binary string produced by string.pack and decodes it back into Lua values, using the same format-string grammar. This page covers the signature, the format string, the streaming pos argument, and the failure modes you are most likely to hit in real code.
Signature and return values
string.unpack(fmt, s [, pos])
| Parameter | Type | Default | Description |
|---|---|---|---|
| fmt | string | required | Format string describing the layout of s. See string.pack for the full grammar. |
| s | string | required | The packed (binary) string to decode. |
| pos | integer | 1 | 1-based byte offset at which to start reading. Negative values count from the end (matching string.sub). A position greater than #s + 1 is not clamped; it raises an "initial position out of string" error. |
string.unpack returns one Lua value for each data option in fmt, followed by a final integer: the 1-based offset of the first unread byte in s. That trailing offset is the part most people forget on first contact. It is not noise. It is the position to feed back into the next call when you are parsing a stream of concatenated records.
local a, b, c, next_pos = string.unpack(">i4Bi2", packed)
-- a, b, c : the decoded values
-- next_pos: #packed + 1 when everything was consumed
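To make the return count concrete, the sketch below packs three values and counts what comes back. The values 1000, 7, and -2 are arbitrary test data chosen for this example.

```lua
-- Pack three arbitrary values: a 4-byte int, an unsigned byte, a 2-byte int.
local packed = string.pack(">i4Bi2", 1000, 7, -2)

-- select("#", ...) counts every returned value, including the trailing offset.
local count = select("#", string.unpack(">i4Bi2", packed))

local a, b, c, next_pos = string.unpack(">i4Bi2", packed)
print(count, a, b, c, next_pos)  --> 4  1000  7  -2  8
```

Three data options produce four return values; the fourth is #packed + 1 because the 7-byte string was fully consumed.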
The format string
The format string fmt is a sequence of specifiers, each an option character optionally followed by a size, and most of them pull one value from the buffer. State specifiers like <, >, =, and ![n] change how later specifiers are interpreted. Spaces between specifiers are ignored; other whitespace characters are not valid options.
| Option | Reads or writes |
|---|---|
| b | Signed byte. |
| B | Unsigned byte. |
| h / H | Signed or unsigned short (native size). |
| l / L | Signed or unsigned long (native size). |
| j | lua_Integer (typically 64-bit on standard builds). |
| J | lua_Unsigned. |
| T | size_t (native size). |
| i[n] / I[n] | Signed or unsigned integer of n bytes (1..16). Defaults to the native int size. |
| f | float (native size). |
| d | double (native size). |
| n | lua_Number (default double). |
| cn | Fixed-length string of exactly n bytes. Not aligned. |
| z | Zero-terminated string. Not aligned. |
| s[n] | Length-prefixed string. The length is an unsigned integer of n bytes (default: size_t). |
| x | One byte of padding, skipped on unpack. |
| Xop | Empty item that aligns as option op would; op itself is not read. |
| < / > / = | Set little-endian, big-endian, or native-endian. |
| ![n] | Set maximum alignment to n (1..16); a bare ! uses native alignment. Without any !, the maximum is 1 (no alignment). |
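One way to check how many bytes a fixed-size format covers is string.packsize, which accepts the same grammar except for the variable-length s and z options. A minimal sketch; only explicitly sized options are given expected values, since native sizes depend on the build:

```lua
-- Explicit sizes are the same on every platform.
local sizes = {
  string.packsize("b"),      -- 1
  string.packsize("i3"),     -- 3
  string.packsize("I16"),    -- 16
  string.packsize("c10"),    -- 10
  string.packsize("<i4 i2"), -- 6: the space between options is ignored
}
print(table.concat(sizes, " "))  --> 1 3 16 10 6

-- Native sizes vary with the C compiler and platform, so just print them.
print(string.packsize("h"), string.packsize("l"), string.packsize("j"))
```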
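String specifiers are where most decoding bugs hide, because c, z, and s have very different stopping conditions: c reads a fixed byte count, z reads up to a NUL terminator, and s reads a length prefix and then that many bytes. All three raise an error when the data runs out before the field is complete, but none of them validates the content of the string. A short pack-and-unpack cycle shows all three in one shot: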
local p = string.pack("c4zs1", "HEAD", "tail", "body")
local a, b, c, pos = string.unpack("c4zs1", p)
-- a = "HEAD" (c4: exactly 4 bytes, no terminator)
-- b = "tail" (z: stops at the NUL terminator that pack appended;
--             passing a string with an embedded "\0" to pack raises an error)
-- c = "body" (s1: 1-byte length prefix says 4 bytes follow)
-- pos = 15 (#p + 1: all 14 bytes were consumed)
If you need a deeper walkthrough of pack and unpack together, see the string.pack reference. In practice, the specifiers that show up most are b B h H i I f d s z c x, plus the !n, <, >, and = prefixes when you control the on-disk layout.
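Each string option fails in its own way when the input is cut short. The sketch below feeds each one a deliberately broken input; the input bytes are made up for illustration, and the messages are printed rather than spelled out because the exact wording may differ between Lua versions.

```lua
-- c4 wants 4 bytes but only 2 are present.
local ok_c, err_c = pcall(string.unpack, "c4", "ab")

-- z scans for a NUL terminator that never arrives.
local ok_z, err_z = pcall(string.unpack, "z", "abc")

-- s1 reads a length prefix of 5, but only 2 payload bytes follow.
local ok_s, err_s = pcall(string.unpack, "s1", "\5ab")

print(ok_c, err_c)
print(ok_z, err_z)
print(ok_s, err_s)
```

All three calls return false from pcall, so a decoder can treat any of them as a malformed record.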
Reading at an offset with pos
The trailing position is the feature that makes string.unpack composable. Two records packed back-to-back can be read one at a time by feeding each call’s position to the next:
local buf = string.pack("<i2s1", 5, "hello")
.. string.pack("<i2s1", 3, "lua")
local n1, s1, p1 = string.unpack("<i2s1", buf) -- 5, "hello", 9
local n2, s2, p2 = string.unpack("<i2s1", buf, p1) -- 3, "lua", 15
p1 is the offset of the first byte of the second record, and p2 is #buf + 1, the canonical “we consumed everything” sentinel. Wrap this in a loop and you have a binary record parser without ever slicing the buffer manually. For a one-shot decode of a single record, you can ignore the trailing value with _ or just not bind it.
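pos also accepts negative offsets, which can be handy for a fixed-size trailer at the end of a buffer. A sketch, assuming a hypothetical layout where the last two bytes hold a big-endian checksum field (the payload and the value 0xBEEF are invented for illustration):

```lua
-- Hypothetical layout: arbitrary payload followed by a 2-byte trailer.
local buf = "payload-bytes" .. string.pack(">I2", 0xBEEF)

-- -2 means "start two bytes before the end", like string.sub.
local checksum, next_pos = string.unpack(">I2", buf, -2)
print(checksum == 0xBEEF, next_pos == #buf + 1)  --> true  true

-- A start position beyond #buf + 1 is rejected rather than adjusted.
local ok, err = pcall(string.unpack, ">I2", buf, #buf + 2)
print(ok, err)  -- false, plus an "initial position out of string" message
```

Note that the returned position is still a positive 1-based offset even though the starting position was given as a negative number.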
Endianness, alignment, and padding
Three things in the format string quietly shape the byte layout:
- Endianness is set by <, >, or =. Default is native. The choice is sticky: it applies to every later numeric specifier (integers and floats) until you change it.
- Alignment is set by !n. Without it, the maximum alignment is 1, which inserts no padding. !4>i2 means "cap alignment at 4, then read a big-endian 2-byte integer"; the integer is aligned to 2 bytes, because each item aligns to the smaller of its own size and the maximum. On unpack, padding is silently consumed; you do not get separate return values for x or Xop.
- Padding is ignored, not validated. x skips a byte. Xop advances the offset to where op would have started, then reads nothing. Use these to skip headers or trailing cruft without branching.
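Alignment in practice means bytes you do not see. A format of !4>i2xi2 packs two big-endian 2-byte integers with a maximum alignment of 4 and a pad byte between them. Each i2 aligns to 2 bytes (its own size, which is below the cap), so one extra alignment byte follows the x. The same format reads them back cleanly: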
local p = string.pack("!4>i2xi2", 1, 2)
-- 6 bytes total: the first integer, 1 byte of x padding, 1 byte of alignment
-- padding, then the second integer
local a, b, next_pos = string.unpack("!4>i2xi2", p)
-- a = 1
-- b = 2
-- next_pos = 7 (1 past the last byte)
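To see the padding bytes directly, one option is to dump the packed string as hex. The hex helper below is a hypothetical convenience function written for this example, not part of the standard library.

```lua
-- Hypothetical helper: render each byte of s as two hex digits and a space.
local function hex(s)
  return (s:gsub(".", function(ch)
    return string.format("%02X ", ch:byte())
  end))
end

local p = string.pack("!4>i2xi2", 1, 2)
local dump = hex(p)
print(dump)  --> 00 01 00 00 00 02
-- bytes 1-2: first integer (big-endian 1)
-- byte 3:    the explicit x padding
-- byte 4:    alignment padding so the next i2 starts on an even offset
-- bytes 5-6: second integer (big-endian 2)
```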
The sharp edge here is silent corruption, not an error. If you pack big-endian and unpack little-endian, you get a wrong number, no exception:
local p = string.pack(">i2", 0x0102) -- bytes 01 02
local v = string.unpack("<i2", p) -- reads as little-endian → 0x0201
-- v == 0x0201; no error raised
Pick the prefix deliberately and make sure pack and unpack agree. Native endianness (=) is a safe choice only when both sides run on the same architecture, for example the same Lua build on the same machine; reach for > (or <) when you need a stable wire format.
Errors and safety
Three classes of failure are common:
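- Invalid format string. A bad option character ("Z", for example) raises "invalid format option 'Z'", and an integer size outside 1..16 raises an error reporting that the size is out of limits.
- Truncated data. If the data runs out mid-format, unpack raises "data string too short". There is no partial-success mode.
- Integer overflow on read. Reading an integer wider than a Lua integer (for example i16 or I16 with significant high bytes) raises a "does not fit into Lua Integer" error. Unsigned options of up to 8 bytes (B H L J I) never raise for this reason: a J or I8 value above math.maxinteger comes back as a negative Lua integer with the same bit pattern.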
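The three failure classes above can be reproduced in a few lines. This is a sketch with made-up inputs; it assumes the standard 64-bit lua_Integer, and it prints the messages rather than asserting their exact wording.

```lua
-- 1. Unknown option character.
local ok1, err1 = pcall(string.unpack, "Z", "")

-- 2. Format asks for 4 bytes, data has 2.
local ok2, err2 = pcall(string.unpack, "<i4", "\1\2")

-- 3. A 9-byte unsigned field with every bit set cannot fit a 64-bit integer.
local ok3, err3 = pcall(string.unpack, "<I9", string.rep("\255", 9))

print(err1)
print(err2)
print(err3)

-- By contrast, an 8-byte unsigned field wraps instead of raising.
local wrapped = string.unpack("<I8", string.rep("\255", 8))
print(wrapped)  --> -1
```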
If fmt comes from a config file or untrusted input, wrap the call in pcall and inspect the result:
local results = { pcall(string.unpack, fmt, data) }
if not results[1] then
  error("bad record: " .. tostring(results[2]))
end
local a, b, next_pos = results[2], results[3], results[4]
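pcall returns the success flag followed by the original return values, so every value shifts by one index. With two data specifiers, as in the snippet above, the decoded values are results[2] and results[3] and the trailing offset is results[4]; with three data specifiers the offset moves to results[5]. For raising your own errors from inside a decoder, see error.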
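When the number of data specifiers is not known in advance, counting indexes by hand gets fragile. One possible approach is a small wrapper that separates the decoded values from the trailing offset. safe_unpack is a hypothetical helper name, and this is a sketch rather than a hardened implementation:

```lua
-- Hypothetical helper: returns a table of decoded values plus the next
-- position on success, or nil plus an error message on failure.
local function safe_unpack(fmt, data, pos)
  local r = table.pack(pcall(string.unpack, fmt, data, pos))
  if not r[1] then
    return nil, r[2]
  end
  -- r[2] .. r[r.n - 1] are the decoded values; r[r.n] is the trailing offset.
  local next_pos = r[r.n]
  local values = table.move(r, 2, r.n - 1, 1, {})
  return values, next_pos
end

local values, next_pos = safe_unpack("<i2B", string.pack("<i2B", 300, 9))
print(values[1], values[2], next_pos)  --> 300  9  4

local bad, err = safe_unpack("<i2B", "\1")
print(bad, err)  -- nil, plus a "data string too short" message
```

table.pack records the exact number of results in r.n, so the wrapper does not need to know how many specifiers fmt contains.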
Common mistakes
- Confusing string.unpack with table.unpack (or the legacy global unpack). string.unpack decodes binary data; table.unpack expands a sequence into multiple return values. Similar names, very different jobs.
- Mismatched endianness between pack and unpack. Silent corruption, no error. Pin the prefix (<, >, or =) on both sides.
- Forgetting the trailing position. If your format has three data specifiers, the call returns four values. Counting only three and ignoring the last one is a common source of off-by-one bugs in streaming code.
- Assuming the returned position is end-relative. It is always a positive 1-based offset, even if you passed a negative pos. If you call string.unpack("b", s, -5) on a 20-byte string, reading starts at byte 16 and the returned position is 17, not -4.
- Trusting dynamic format strings. Always validate fmt (or wrap the call in pcall) if it is built from external input. A single bad option character raises.
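The first mistake is easiest to see side by side. A small sketch with arbitrary values:

```lua
-- table.unpack expands a sequence into separate values.
local x, y, z = table.unpack({ 10, 20, 30 })

-- string.unpack decodes bytes: here three unsigned bytes plus the offset.
local a, b, c, next_pos = string.unpack("BBB", "\10\20\30")

print(x, y, z)            --> 10  20  30
print(a, b, c, next_pos)  --> 10  20  30  4
```

The decoded values happen to match, but only string.unpack takes a format string and returns the extra trailing position.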
Putting it together
A small but concrete parser reads a stream of records from a binary buffer, skips a 4-byte magic header at the front, and uses the trailing return value to advance through the rest:
local function parse_records(buf)
  local records = {}
  local pos = 5 -- skip the 4-byte magic header
  while pos <= #buf do
    local id, ts, payload, next_pos = string.unpack(">i4I2c4", buf, pos)
    records[#records + 1] = {
      id = id,
      ts = ts,
      payload = payload,
    }
    pos = next_pos
  end
  return records
end
-- "MAGC" is a placeholder 4-byte magic header, followed by two records.
local buf = "MAGC"
  .. string.pack(">i4I2c4", 1001, 42, "cmd0")
  .. string.pack(">i4I2c4", 1002, 43, "cmd1")
parse_records(buf)
-- {
--   { id = 1001, ts = 42, payload = "cmd0" },
--   { id = 1002, ts = 43, payload = "cmd1" },
-- }
The loop trusts next_pos to advance to the next record, and exits when pos is past the end of buf. After the last record, next_pos equals #buf + 1, the same sentinel the function hands back whenever everything has been consumed, so the pos <= #buf test fails and the loop ends. That is why the trailing return is worth keeping instead of discarding.
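If the buffer might end in a partial record, the loop above would raise "data string too short" on the last iteration. One option is to compare the remaining length against string.packsize before each read. A sketch, assuming the same >i4I2c4 layout and a 4-byte header; parse_complete_records, the "HDR0" header, and the leftover return value are hypothetical names invented for this example.

```lua
local FMT = ">i4I2c4"
local RECORD_SIZE = string.packsize(FMT)  -- 10 bytes for this fixed layout

-- Hypothetical variant: stops before a trailing partial record and
-- reports how many bytes were left unread.
local function parse_complete_records(buf, start)
  local records = {}
  local pos = start or 1
  while #buf - pos + 1 >= RECORD_SIZE do
    local id, ts, payload, next_pos = string.unpack(FMT, buf, pos)
    records[#records + 1] = { id = id, ts = ts, payload = payload }
    pos = next_pos
  end
  return records, #buf - pos + 1
end

local buf = "HDR0"                       -- made-up 4-byte header
  .. string.pack(FMT, 1001, 42, "cmd0")
  .. string.pack(FMT, 1002, 43, "cmd1")
  .. "\0\0\3"                            -- truncated third record

local records, leftover = parse_complete_records(buf, 5)
print(#records, leftover)  --> 2  3
```

string.packsize only works for fixed-size formats, so this check does not carry over to layouts that use s or z; those still need pcall around the read.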
Conclusion
string.unpack is the decoder half of Lua 5.4’s binary format facility. Read with the same fmt you wrote with string.pack, count the return values carefully, and use the trailing position to stream through concatenated records. Endianness and alignment are not optional: pin them in the format string on both sides, and you will not get silently wrong data.
See also
- string.pack: the inverse encoder.
- string.byte and string.char: single-byte conversions for simpler cases.
- pcall and error: for safe decoding of untrusted buffers.
- table.unpack: the unrelated list-expander.
- Working with binary data in Lua: broader patterns around pack and unpack.
- Lua serialization: when to reach for binary pack versus a text format.