This project came from a first-year Master’s course at Université Grenoble Alpes: Conception des systèmes d’exploitation et programmation concurrente. The assignment was to implement malloc, free, realloc, and their underlying infrastructure on a fixed memory region with no system calls. No sbrk. No mmap. Just a static 4096-byte array, a custom allocator built on top of it, and a test suite that replaced the libc symbols entirely.
The point was not to write something competitive with glibc. The point was to understand what is actually happening below malloc. And once you do that, memory bugs stop being mysterious. You see exactly where the metadata lives, how fragmentation happens, why a double-free is catastrophic, and how the defenses against heap exploitation work.
The memory model
The allocator manages a single contiguous region defined by mem_space_get_addr() and mem_space_get_size(). Behind those functions is a trivial implementation: a static char memory[MEMORY_SIZE] array with the default size compiled in as 4096 bytes, overridable via -DMEMORY_SIZE=... at build time. The allocator never calls out to the OS at all. Everything it can ever hand out is in that array.
mem_init() runs once on first use, sets MEM_SPACE_MIN and MEM_SPACE_MAX as bounds for validity checks, and writes a single initial free block covering the entire memory region minus the header size:
free_block_list = (mem_free_block_t *)memory_area;
free_block_list->size = total_memory_size - MEM_FREE_BLOCK_SIZE;
free_block_list->next = NULL;
After that point, malloc and free are manipulating that initial block and its descendants.
Two structs, same memory
The design distinguishes two kinds of block through two different header structs. A free block uses:
struct mem_free_block_s {
size_t size;
struct mem_free_block_s *next;
};
An allocated block uses:
struct mem_allocated_block_s {
size_t size;
uint64_t guard;
};
Both sit at the start of the block they describe. The pointer you hand back to the caller is the address immediately after the header. When free is called, the allocator subtracts sizeof(mem_allocated_block_t) from the pointer to get back to the header, then reads the metadata from there.
The two structs are not the same size. On a 64-bit system, mem_free_block_t is 16 bytes (8 for size, 8 for the pointer). mem_allocated_block_t is also 16 bytes (8 for size, 8 for the uint64_t guard). The compiler pads the allocated block header to 16 bytes for alignment regardless of the guard type, which is actually why the guard is 64-bit. A 32-bit guard would give a 12-byte header, which the compiler pads to 16 anyway, wasting 4 bytes. By using uint64_t, the guard fills the padding that would otherwise be dead space.
The difference in meaning is important: a free block uses its second field to point to the next free block in the list. An allocated block uses the same memory location to store the guard value. Casting from one to the other is safe precisely because the layout is the same size; the interpretation is what changes.
The free block list
Free blocks are maintained as a singly linked list sorted by memory address. The sort order is not aesthetic. It makes coalescing — merging adjacent free blocks back into one — possible without an additional scan. When you insert a newly freed block, you can check the block immediately to its left and right in address space by looking at the list position, and fuse them if they are contiguous.
The project brief required keeping it as a singly linked list. A doubly linked list would have made some operations cleaner, but the constraint was explicit. The comment in the source reflects some frustration with this: // use of this structure is mandatory for the project, we are *not* allowed to switch to a doubly linked list :/.
Finding the insertion point requires search_block:
void search_block(mem_free_block_t *block, mem_free_block_t ***list) {
while ((**list) && ((**list) < block)) {
*list = &(**list)->next;
}
}
This walks the list until it finds the position where block should go (where the current node’s address is no longer less than block). The triple pointer is necessary: list is a pointer to the pointer-to-pointer that holds the current position in the list. Walking forward requires updating the pointer to the next field, not just reading it.
The Linus Torvalds trick
The remove_block and insert_block functions use a technique from a 2016 TED talk where Linus Torvalds demonstrates “good taste” in C by showing how to remove a node from a linked list without a special case for the head. The standard approach needs an if (prev == NULL) branch to handle removing the first element. The better approach passes a pointer to the next pointer that points at the target, making the head case and the middle case identical:
void remove_block(mem_free_block_t *block, mem_free_block_t **list) {
if (*list == block) {
*list = block->next;
block->next = NULL;
}
}
Because search_block hands back a pointer to the next field that should be updated, remove_block can always do the same thing regardless of position. No branch needed.
insert_block uses the same principle to detect whether the previous (left) block is contiguous for coalescing:
mem_free_block_t *prev = (mem_free_block_t *)(((char *)list) - offsetof(mem_free_block_t, next));
if (prev >= memory_area) {
if ((char *)prev + MEM_FREE_BLOCK_SIZE + prev->size == (char *)block) {
prev->size += MEM_FREE_BLOCK_SIZE + block->size;
prev->next = block->next;
} else {
prev->next = block;
}
}
list points to a next field somewhere in the list. Subtracting offsetof(mem_free_block_t, next) from it gives the address of the parent struct. That is the block to the left. If its end address equals block’s start address, they are contiguous and can be fused.
Guards: integrity checking without a separate table
Each allocated block carries two guard values. One lives in the header (the guard field of mem_allocated_block_t). The other sits at the end of the allocated data region, 8 bytes before where the next block begins. Both are computed the same way:
guard = ((uint64_t)block_address) ^ secret_number;
where secret_number = 0xDEADBEEFFEEDFACE.
Using the block’s own address in the guard means every block has a unique guard value. A buffer overflow that copies a valid guard from one block cannot satisfy the check on a different block because the XOR value is different. The secret makes the guard unpredictable to an attacker who does not know it (though a fixed compile-time constant is a weak secret; the source comment acknowledges this with // TODO: replace with logic to compute good secret number).
The tail guard at the end of the data region catches linear overwrites. If a user writes past the end of their allocation, the tail guard will be corrupted before the next block’s header is touched. When mem_free is called, it checks both guards with assert():
assert(block_to_free->guard == guard);
assert(*((uint64_t *)(...tail position...)) == guard);
A failed assert aborts the program. This is not graceful error recovery, but it is honest: a corrupted guard means the heap state cannot be trusted, and proceeding would make things worse.
mem_free also validates that the pointer falls within the allocator’s bounds before doing anything:
if (!zone || zone < MEM_SPACE_MIN || zone >= MEM_SPACE_MAX)
return;
This catches NULL frees and wild pointers. It does not catch double-frees — if a block has already been freed and its header overwritten with free block metadata, the guard check will fail on the second free and abort. The abort is correct behavior.
mem_alloc step by step
void *mem_alloc(size_t size) {
size += SECRET_SIZE; // reserve space for tail guard
mem_free_block_t *block = current_fit_function(free_block_list, size + MEM_FREE_ALL_DIFF);
if (!block) return NULL;
mem_free_block_t **list = &free_block_list;
search_block(block, &list);
remove_block(block, list);
if ((block->size - size - MEM_FREE_ALL_DIFF) <= MEM_MAX_BLOCK_SIZE + SECRET_SIZE) {
size = block->size - MEM_FREE_ALL_DIFF; // absorb rather than split
} else {
// split: write a new free block after the allocation
mem_free_block_t *new_free_block = (mem_free_block_t *)(((char *)block) + size + MEM_ALL_BLOCK_SIZE);
new_free_block->size = block->size - size - MEM_ALL_BLOCK_SIZE;
new_free_block->next = NULL;
insert_block(new_free_block, list);
}
*((mem_allocated_block_t *)block) = (mem_allocated_block_t){size, ((uint64_t)block) ^ secret_number};
*((uint64_t *)(tail position)) = ((uint64_t)block) ^ secret_number;
return (void *)(((char *)block) + MEM_ALL_BLOCK_SIZE);
}
Four stages. Find a free block using the current fit strategy. Remove it from the free list. Split it if there is enough remainder to form a valid new free block (the threshold is MEM_MAX_BLOCK_SIZE + SECRET_SIZE, which is the minimum useful free block size). Write both guards and return a pointer past the header.
MEM_FREE_ALL_DIFF is the difference between the free block header size and the allocated block header size. The fit function is called with size + MEM_FREE_ALL_DIFF because the block’s size field tracks the usable data space inside a free block, and a free block header is larger than an allocated block header by that difference. Adjusting the requested size upward accounts for the fact that the allocated header is smaller and will reclaim some bytes that the free header occupied.
Three fit strategies
The allocator supports three pluggable strategies, selected by calling mem_set_fit_handler:
First fit walks the list and returns the first block that satisfies the size requirement. Fastest on average, but produces fragmentation toward the beginning of the memory space over time as the easy cases accumulate small leftover gaps.
Best fit scans the entire list and returns the smallest block that satisfies the request. Minimizes waste per allocation but leaves behind many tiny fragments that individually cannot satisfy future requests. Also slower because it always scans the whole list.
Worst fit returns the largest available block. The reasoning is that a large block split by a modest request still leaves a large remainder that can satisfy future large requests. Works poorly in practice for most allocation patterns but can be useful when the request sizes are predictable and you want to preserve small blocks for small requests.
The strategy is a function pointer: static mem_fit_function_t *current_fit_function = mem_first_fit. Swapping strategies at runtime is one function call.
mem_realloc: the real complexity
mem_realloc is more complex than mem_alloc and mem_free combined. The function handles six distinct cases, and getting them wrong causes either memory leaks or corruption.
Case 1: zone == NULL — behave like mem_alloc(size). This matches the C standard.
Case 2: size == 0 — behave like mem_free(zone) followed by mem_alloc(0). Also matches the standard.
Case 3 (shrink) with right neighbor free: Extend the right neighbor to absorb the freed portion. This is in-place and avoids fragmentation.
Case 3 (shrink) with right neighbor occupied, remainder too small: Do nothing. The block stays at its current size. Better to waste a few bytes than to produce a fragment that can never be reused.
Case 3 (shrink) with right neighbor occupied, remainder large enough: Insert a new free block in the gap between the shrunk allocation and its right neighbor.
Case 4 (grow) with right neighbor not free or too small: Allocate a new block, memcpy the data, free the old block. This is the expensive path.
Case 4 (grow) with right neighbor free and large enough, but remainder too small after growth: Absorb the entire right neighbor into the allocation. The allocation ends up slightly larger than requested.
Case 4 (grow) with right neighbor free and large enough, remainder survives: Grow the allocation in place and split the remaining portion back into a new free block.
The in-place growth cases avoid the memcpy entirely. For a realloc that is growing a block by 10% and happens to have free space immediately to the right, this is a near-zero-cost operation.
The reentrancy problem
The malloc stub that replaces libc symbols has a subtle problem. The stub overrides malloc, calloc, realloc, and free globally. That means every call to any function that internally calls malloc also hits this code. printf allocates internally. So if malloc calls printf for debug output, and printf calls malloc, you get infinite recursion.
The fix uses a thread-local flag:
static __thread int gbl_in_lib = 0;
#define dprintf(args...) \
do { \
if (!gbl_in_lib) { \
gbl_in_lib = 1; \
printf(args); \
gbl_in_lib = 0; \
} \
} while (0)
__thread makes the flag per-thread, so each thread gets its own copy. The flag is set before calling printf and cleared after. If printf calls malloc, which checks the flag, dprintf inside that second malloc call finds the flag already set and skips the output. This is not recursive suppression — the actual malloc still runs, it just does not try to print again.
The stub replaces the libc symbols at link time. When you compile with -L./libs -lmalloc, the linker resolves malloc to your implementation instead of the standard library’s. Any program linked this way (including programs you did not write) will run your allocator under the hood.
Why this matters for security
The failure modes this project implements defenses against are the same ones that real heap exploitation relies on.
A use-after-free works by freeing a block and then writing to the same address again, corrupting what is now either a free block header or a new allocation at the same address. In this allocator, the guard check on mem_free would detect corruption of the header guard; an overwrite of the next pointer in a free block header would cause the list to reference arbitrary memory on the next allocation, producing a crash or a return of an attacker-controlled address.
A heap overflow past an allocation corrupts the tail guard. The next call to mem_free on that block will abort on the assert.
A double-free corrupts the free list the second time. The block’s size and next fields have been overwritten with free block metadata after the first free. The guard in the original allocated block header is gone. The guard check on the second call will fail.
These are exactly the conditions that exploitation techniques like tcache poisoning or unsafe unlink target in glibc. The defenses here — address-XOR guards, bounds checking, tail canaries — are simplified versions of what modern allocators use. Working through the implementation makes it concrete why corrupting a heap header is so powerful: the allocator unconditionally trusts the metadata it reads, and a corrupted header redirects control.
That translation from “bug in user code” to “attacker controls allocation return values” is what makes heap security work interesting, and implementing the allocator from scratch is one of the clearest ways to internalize it.