Reversing
This is a compact workflow reference for reversing small binaries: preserve C semantics in Python, drive local targets reproducibly, parse binary layouts explicitly, and inspect ELF shared-object boundaries.
For vectorized loops, packed math, and compiler intrinsics, use the related Intrinsics note.
C Semantics In Python
When porting C logic, preserve the behavior that Python does not model by default.
| C concern | Python handling |
|---|---|
| Fixed-width overflow | Mask after operations with x & ((1 << bits) - 1). |
| Signedness | Convert at input/output boundaries, not only at the end. |
| Unsigned right shift | Mask first, then shift. |
| Integer division | Truncate toward zero as C99 does, using integer arithmetic; Python // floors, and int(a / b) loses precision above 2**53. |
| Promotions | Re-apply width after operations where the C type matters. |
| switch fallthrough | Model explicitly; Python match does not fall through. |
| Pointers | Use offsets into bytearray, memoryview, ctypes, or mmap. |
| C strings | Slice bytes until \0; Python bytes do not stop there. |
| Struct layout | Use explicit struct formats or ctypes.Structure. |
| Unions | Reuse the same bytes with ctypes.Union or multiple unpack views. |
| Bitfields | Extract with masks and shifts; compiler packing can differ. |
| Float width | Python float is usually C double; use struct.pack("<f", ...) or numpy.float32 for 32-bit behavior. |
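Two rows of the table are easy to get subtly wrong, so here is a minimal sketch of each. The `f32` helper and the `classify` function, along with the C switch it mirrors, are hypothetical names made up for illustration and do not come from any particular target.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a C double) to the nearest 32-bit float.

    struct.pack raises OverflowError if x is outside the float32 range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

def classify(n: int) -> int:
    """Model of a hypothetical C switch with fallthrough:

        switch (n) {
        case 3: acc += 4;   /* falls through */
        case 2: acc += 2;   /* falls through */
        case 1: acc += 1; break;
        default: acc = -1;
        }
    """
    acc = 0
    if n not in (1, 2, 3):
        return -1          # default branch
    if n == 3:
        acc += 4           # case 3 body
    if n in (2, 3):
        acc += 2           # case 2 body, also reached from case 3
    acc += 1               # case 1 body, reached from every case
    return acc

print(f32(0.1) == 0.1)     # False: 0.1 is not representable in 32 bits
print(classify(3), classify(2), classify(1), classify(0))
```

Writing each case body once and guarding it with the set of labels that can reach it keeps the port close to the original control flow.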
Use these helpers when the exact integer width matters:
from ctypes import c_int32, c_uint32

def to_i32(x: int) -> int:
    return c_int32(x).value  # ctypes wraps silently, like a C cast

def to_u32(x: int) -> int:
    return c_uint32(x).value

def unsigned(x: int, bits: int = 32) -> int:
    return x & ((1 << bits) - 1)

def signed(x: int, bits: int = 32) -> int:
    x &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return x - (1 << bits) if x & sign else x

def c_div(a: int, b: int) -> int:
    # Truncate toward zero with integer math; int(a / b) goes through a
    # float and loses precision for 64-bit operands.
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def c_mod(a: int, b: int) -> int:
    return a - c_div(a, b) * b  # result takes the sign of the dividend

def shl(x: int, n: int, bits: int = 32) -> int:
    return unsigned(x << n, bits)

def shr_u(x: int, n: int, bits: int = 32) -> int:
    return unsigned(x, bits) >> n

def shr_s(x: int, n: int, bits: int = 32) -> int:
    return signed(x, bits) >> n  # arithmetic shift, as most compilers emit

def hx(x: int, bits: int = 32) -> str:
    return f"0x{unsigned(x, bits):0{bits // 4}X}"
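As a worked example of the masking discipline, the sketch below ports a 32-bit FNV-1a hash, chosen only because its constants and test vectors are widely published; it is an illustration, not something the target is assumed to use. The two helpers are repeated so the block runs on its own.

```python
def unsigned(x: int, bits: int = 32) -> int:
    return x & ((1 << bits) - 1)

def signed(x: int, bits: int = 32) -> int:
    x = unsigned(x, bits)
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

def fnv1a32(data: bytes) -> int:
    """Port of a C loop over uint32_t: h ^= byte; h *= prime."""
    h = 0x811C9DC5                     # FNV-1a 32-bit offset basis
    for byte in data:
        h ^= byte
        h = unsigned(h * 0x01000193)   # mask right after the multiply
    return h

h = fnv1a32(b"a")
print(hex(h))        # the value as a C uint32_t
print(signed(h))     # the same bits read as a C int32_t
print(-7 // 2)       # Python floors; C99 would give -3
```

Masking inside the loop, rather than once at the end, keeps every intermediate value comparable with what a debugger shows in the register.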
Binary layout helpers:
import struct

def p32(x: int) -> bytes:
    return struct.pack("<I", unsigned(x, 32))

def u32(buf: bytes, off: int = 0) -> int:
    return struct.unpack_from("<I", buf, off)[0]

def cstr(buf: bytes, off: int = 0) -> bytes:
    end = buf.find(b"\0", off)
    return buf[off:] if end < 0 else buf[off:end]
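The same helpers extend naturally to a whole record. The sketch below parses a made-up layout (4-byte magic, two 16-bit fields, a 32-bit payload length, then a payload that starts with a C string); the layout, the `RVSE` magic, and the flag bit positions are all assumptions for illustration.

```python
import struct

# Hypothetical header: char magic[4]; uint16 version; uint16 flags; uint32 length
HEADER = struct.Struct("<4sHHI")

def parse_record(buf: bytes, off: int = 0) -> dict:
    if len(buf) - off < HEADER.size:
        raise ValueError("short buffer: header truncated")
    magic, version, flags, length = HEADER.unpack_from(buf, off)
    start = off + HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("short buffer: payload truncated")
    payload = buf[start:end]
    nul = payload.find(b"\0")
    return {
        "magic": magic,
        "version": version,
        "kind": flags & 0x7,                    # assumed 3-bit field, bits 0-2
        "compressed": bool((flags >> 3) & 1),   # assumed 1-bit field, bit 3
        "name": payload if nul < 0 else payload[:nul],
        "payload": payload,
    }

sample = HEADER.pack(b"RVSE", 1, 0b1010, 8) + b"abc\0wxyz"
print(parse_record(sample))
```

Checking lengths before unpacking turns a truncated input into a clear error instead of a `struct.error` or a silently short slice.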
Before trusting a port, compare intermediate values against the original binary or a tiny C harness. Test 0, 1, -1, max signed, min signed, high-bit set values, short buffers, and null bytes.
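A table-driven comparison makes that check routine. In the sketch below, `ctypes` stands in for the tiny C harness, and `mismatches` and `EDGE_CASES` are hypothetical names; with a real target, the reference function would return values captured from the binary instead.

```python
from ctypes import c_int32

EDGE_CASES = [
    0, 1, -1,
    2**31 - 1,      # max signed 32-bit
    -(2**31),       # min signed 32-bit
    0x80000000,     # high bit set
    0xFFFFFFFF,
    2**32,          # one past the 32-bit range
]

def signed(x: int, bits: int = 32) -> int:
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

def mismatches(port, reference, cases):
    """Return (input, got, want) for every case where the port disagrees."""
    out = []
    for x in cases:
        got, want = port(x), reference(x)
        if got != want:
            out.append((x, got, want))
    return out

def reference(x: int) -> int:
    return c_int32(x).value    # stand-in for values from the real binary

print(mismatches(signed, reference, EDGE_CASES))                      # empty when the port agrees
print(mismatches(lambda x: x & 0xFFFFFFFF, reference, EDGE_CASES))    # a port that forgot signedness
```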
Python Process Harness
A harness should make the run reproducible: exact path, exact input, exact environment, captured stdout/stderr, return code, and timeout.
Minimal runner:
from pathlib import Path
import subprocess

target = Path("./program").resolve()

def run(payload: bytes = b"", args=(), timeout: float = 5, env=None):
    proc = subprocess.run(
        [str(target), *args],
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=False,
        env=env,
    )
    return proc.returncode, proc.stdout, proc.stderr

code, out, err = run(b"A" * 24 + b"\n")
print(out.decode(errors="replace"))
print(err.decode(errors="replace"))
print("returncode:", code)
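The minimal runner lets `subprocess.TimeoutExpired` propagate. One possible extension, sketched below, records a timeout as data and stores a hash of the input so a run can be matched to its payload later. The `run_record` name and the record fields are assumptions, and the usage lines run the Python interpreter as a stand-in for a real target.

```python
import hashlib
import subprocess
import sys

def run_record(cmd, payload: bytes = b"", timeout: float = 5.0, env=None) -> dict:
    record = {
        "cmd": list(cmd),
        "input_sha256": hashlib.sha256(payload).hexdigest(),
        "timed_out": False,
    }
    try:
        proc = subprocess.run(
            cmd, input=payload, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            timeout=timeout, env=env, check=False,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child on timeout; partial output may be None.
        record.update(timed_out=True, returncode=None,
                      stdout=exc.stdout or b"", stderr=exc.stderr or b"")
    else:
        record.update(returncode=proc.returncode,
                      stdout=proc.stdout, stderr=proc.stderr)
    return record

if __name__ == "__main__":
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_record(echo, b"abc"))
```

On POSIX, a negative return code means the process was terminated by that signal number, which is worth keeping in the record rather than collapsing into a generic failure.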
Make environment changes local to the harness:
import os
env = os.environ.copy()
env["LC_ALL"] = "C"
env["PYTHONHASHSEED"] = "0"
code, out, err = run(b"test\n", env=env)
assert code in (0, 1)
Use subprocess.Popen only when the target needs staged interaction. For a single exchange, communicate() still sends the input and collects both streams without deadlocking:
proc = subprocess.Popen(
    [str(target)],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate(b"input\n", timeout=5)
print(proc.returncode, out, err)
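For genuinely staged interaction, where each input depends on the previous reply, one approach is to write a line, flush, and read a line back. The sketch below assumes a line-oriented target that answers every input line with exactly one output line; the `CHILD` script (which reverses each line) and the `staged` function are hypothetical stand-ins for a real binary and its protocol.

```python
import subprocess
import sys

# Stand-in target: replies to each input line with the reversed line.
CHILD = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    sys.stdout.write(line.strip()[::-1] + "\n")
    sys.stdout.flush()
"""

def staged(lines):
    proc = subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    replies = []
    try:
        for line in lines:
            proc.stdin.write(line + b"\n")
            proc.stdin.flush()               # make sure the target sees the line now
            replies.append(proc.stdout.readline().rstrip(b"\r\n"))
        proc.stdin.close()                   # EOF lets the target exit cleanly
        proc.wait(timeout=5)
    finally:
        if proc.poll() is None:              # still running: do not leave it behind
            proc.kill()
            proc.wait()
        proc.stdout.close()
    return proc.returncode, replies

if __name__ == "__main__":
    print(staged([b"abc", b"hello"]))
```

Note that `readline()` blocks indefinitely if the target never sends a newline, so a harness for an unknown protocol may need a reader thread or a select-based timeout instead.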
For local preload experiments, keep the hook path explicit:
from pathlib import Path
import os
root = Path("./scratch").resolve()
target = root / "program"
hook = root / "tap.so"
env = os.environ.copy()
env["LD_PRELOAD"] = str(hook)
Then pass env into the runner. If the hook changes behavior, the harness should make the change visible in stdout, stderr, return code, or a saved trace.
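One way to make such a change visible is to run the same input under both environments and report only the fields that differ. This is a sketch: `capture`, `diff_runs`, and the `TAP_DEMO` variable are hypothetical, and the usage lines run a small Python command in place of a preloaded target.

```python
import os
import subprocess
import sys

def capture(cmd, payload: bytes, env: dict) -> dict:
    proc = subprocess.run(
        cmd, input=payload, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        timeout=5, env=env, check=False,
    )
    return {"returncode": proc.returncode,
            "stdout": proc.stdout, "stderr": proc.stderr}

def diff_runs(cmd, payload: bytes, base_env: dict, changed_env: dict) -> dict:
    """Map each differing field to its (baseline, changed) pair."""
    a = capture(cmd, payload, base_env)
    b = capture(cmd, payload, changed_env)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}

if __name__ == "__main__":
    cmd = [sys.executable, "-c",
           "import os, sys; sys.stdout.write(os.environ.get('TAP_DEMO', 'off'))"]
    base = os.environ.copy()
    base.pop("TAP_DEMO", None)
    changed = dict(base, TAP_DEMO="on")   # with a real hook: LD_PRELOAD=<hook path>
    print(diff_runs(cmd, b"", base, changed))
```

An empty result means the change had no observable effect on these three channels, which is itself worth recording before concluding that a hook is active.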
ELF And Shared Objects
Changing an ELF type field does not make an arbitrary executable behave like a shared library. Prefer one of these workflows:
- Inspect the binary to understand type, imports, exports, relocations, and entry points.
- Patch metadata on an actual shared object.
- Build a wrapper from source or a relocatable object.
- Load a real shared object from Python with an explicit ABI.
Inspect first:
file ./target
readelf -h -l -d ./target
readelf -Ws ./target | less
objdump -d -M intel ./target | less
objdump -T ./libtarget.so | less
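The same first look can be scripted. The sketch below reads only the fixed-position fields at the start of the ELF header (class, byte order, e_type, e_machine, e_entry); `elf_summary` is a hypothetical helper, and the usage lines build a fake header in memory rather than opening a real file.

```python
import struct

ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def elf_summary(data: bytes) -> dict:
    if len(data) < 16 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    entry_fmt = "I" if ei_class == 1 else "Q"      # e_entry is 4 or 8 bytes wide
    if len(data) < 24 + struct.calcsize(entry_fmt):
        raise ValueError("short buffer: ELF header truncated")
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    (e_entry,) = struct.unpack_from(endian + entry_fmt, data, 24)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "endian": "little" if ei_data == 1 else "big",
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "entry": e_entry,
    }

# Fake 64-bit little-endian header: e_type=3 (ET_DYN), e_machine=62 (x86-64).
fake = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
        + struct.pack("<HHIQ", 3, 62, 1, 0x1040))
print(elf_summary(fake))
```

ET_DYN is reported for shared objects and for position-independent executables alike, which is one reason the type field alone does not say whether a file can be loaded as a library.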
Patch metadata only when the input is already a suitable shared object:
patchelf --print-soname ./libtarget.so
patchelf --set-soname libtarget.so ./libtarget.so
patchelf --print-rpath ./libtarget.so
patchelf --set-rpath '$ORIGIN' ./libtarget.so
When a relocatable object is available, ideally one compiled as position-independent code so it can be linked into a shared object, build a wrapper:
/* wrapper.c */
extern int main(int, char **);

int target_main(int argc, char **argv) {
    return main(argc, argv);
}

gcc -fPIC -shared -o wrapper.so wrapper.c target.o
Load it from Python:
import ctypes
lib = ctypes.CDLL("./wrapper.so")
lib.target_main.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
lib.target_main.restype = ctypes.c_int
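Use ctypes only after defining argtypes and restype. Without them, ctypes assumes an int return value and guesses argument conversions, so a wrong ABI assumption can truncate pointers or corrupt arguments and produce misleading output instead of a clear error.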
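Calling `target_main` also needs a C-style argv. A minimal sketch, assuming the `wrapper.so` built above exists at the given path: `make_argv` and `call_target_main` are hypothetical helpers, and only `make_argv` is exercised here because the library is not available in this note.

```python
import ctypes

def make_argv(args):
    """Build (argc, argv) with the NULL terminator C programs expect."""
    encoded = [a.encode() if isinstance(a, str) else a for a in args]
    argv = (ctypes.c_char_p * (len(encoded) + 1))(*encoded, None)
    return len(encoded), argv

def call_target_main(lib_path: str, args) -> int:
    # Assumes lib_path exports target_main(int, char **) returning int.
    lib = ctypes.CDLL(lib_path)
    lib.target_main.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    lib.target_main.restype = ctypes.c_int
    argc, argv = make_argv(args)
    return lib.target_main(argc, argv)   # argv must stay alive during the call

argc, argv = make_argv(["program", "-v"])
print(argc, argv[0], argv[1], argv[2])
```

If the wrapped `main` calls `exit()`, the Python process exits with it, so running the call in a child process through the harness above is usually the safer arrangement.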