In this blog post I will be covering the memory manager I had to implement as homework for module 3. I will start off by explaining the data structure I used for this task. I will then explain the workings of memory allocation and justify the management strategy chosen for this homework. Then, I will cover memory de-allocation and advanced debug features. All of this will be done via an example use case scenario, as it might help visualise the behaviour of the memory manager. I will finish this post with considerations for how my implementation could be improved if I had to do this over again.
Keeping track of memory blocks
In order to manage memory, some sort of data structure is needed to keep track of available and allocated memory blocks. In my implementation, the structure is SMemoryBlock:
In a nutshell, it’s a singly-linked list that carries some extra information in it to keep track of memory. uBlockSize tells us how big a memory block is, psNextBlock points to the next block in the sequence and uAlignmentDelta records how many bytes were used to align the memory to the requested alignment. This is what you would call the Header of a memory block. This structure is used to keep track of available blocks, as well as allocated ones. However, the block layout in memory might look quite different when it’s free or when it’s occupied.
First, lets look at how the heap looks when it’s just been initialised with 2048 bytes of memory. A single free block is created, so our heap looks something like this:
To allocate, we first need to find a suitable block for the allocation. Various strategies exist for doing this, such as First-Fit, Best-Fit, Worst-Fit and so on. There is a basic trade-off between these strategies: fragmentation vs allocation speed. So for example, a First-Fit allocator will be fast at making allocations, but is likely to cause fragmentation down the line. On the other hand, Best-Fit is likely to reduce fragmentation, but allocations can be slow, as we need to search the full list of free memory to find the best fitting candidate.
I have chosen to use an Address-Aligned (fancy word for sorted) Best-Fit strategy, since it is one of the least fragmenting strategies according to the paper “The memory fragmentation problem: solved?” by Mark S. Johnstone and Paul R. Wilson. Furthermore, for applications in games development, our priority is to have the least amount of fragmentation (i.e. be able to fit everything we want), and speed is of second priority, since we would be pre-allocating everything for our game in advance as it loads.
The strategy is summarised in this diagram in the scenario of requesting 32 bytes of memory:
And my implementation does exactly this, searches through the free list, and finds the block that best matches the size of the requested memory block ( along with header/alignment/padding overheads ):
In the scenario where we make the first allocation, there is only one block, so we are going to make our allocation there. We first set the size of the block to what’s requested:
Then, we shift the block by the amount of bytes specified for padding, and we zero those bytes (only in debug!):
We add the same amount of padding to the back of the block we are allocating, and zero those bytes as well:
Next, we check if there’s enough space for a new free block after the allocation (i.e. if the free block was too big, so after we allocated the requested memory, we have tons left over). If so, we create a new free block which manages the left over memory:
Then, the allocated blocks nextBlock pointer is set to null (helps check integrity of header when memory is returned). In this case, the newly created block also becomes the first block in the list of free memory, so the m_psFreeList pointer is updated to point to it:
The first allocation is made successfully. Now assume that another allocation is sent, but this time the memory is not aligned by default, and so an additional step is required to align it. The whole process is repeated with the additional step in the diagram below. As noted in the diagram, when shifting the header for alignment, the size of the block remains written in the alignment area. In this case, the header was shifted by 4 bytes to match requested alignment:
For the sake of explaining the memory manager, fill up the memory with some more allocations. Note that after the last allocation, there is not enough space to create a new header for the remaining free memory ( the header takes up 12 bytes, but there is only 4 free ). In this situation, the memory manager simply assigns the left-over memory to the requested allocation. The client is not aware of this anyway, so it should not cause any issues:
When client code returns allocated memory, the memory manager receives a pointer to the start of the client data section in the memory block:
From this, the manager can retrieve the header of the memory block by shifting the pointer left by the amount of bytes that a header takes up (12):
Since the memory manager now knows all of the information about this block, it can start retrieving memory from it. In particular, it knows where the block starts and ends, so it can check and return the padding from the front and back:
Afterwards, if memory was used to provide alignment, the manager can retrieve that as well and shift the header left again:
All overheads particular to an allocated block of memory are cleared. Now, the memory manager needs to determine where to insert this block with regards to the list of free memory blocks. In this case, there are no free blocks, so the returned block is set as the start of the list:
Next, assume that the block that was allocated first is returned. The same steps are repeated to reclaim overheads, but this time it is immediately next to a free block. In this case, the newly returned block and the adjacent block are merged together:
Assume that allocation no. 4 is returned next. In this case, the memory manager finds that the block is further down in memory than any free block, and it is not immediately adjacent to any free blocks. This means that merging is not possible. Instead, the free block before the returned block is linked, and the list of free blocks now has 2 members in it:
Now, say that allocation no. 3 gets returned. The memory manager finds that it is between two free blocks. Furthermore, it is directly adjacent to the one before and the one after. The memory manager merges all 3 blocks together:
Next, lets assume that the client tries to de-allocate memory which has already been de-allocated. In this case, the memory points to somewhere in the list of free memory. The memory manager can check through the list of free blocks, and if the address is found within one of them, then the block has already been de-allocated:
Assume that the final block left to de-allocate has been under-run or over-run. In the simplest cases, the padding bytes are checked. If they are not set to 0, then an over-run or under-run occurred, depending on which padding (front or back) was corrupted:
However, the check for under-runs is not robust enough and could be improved upon. For some under-runs, the Header of the memory block might be corrupted itself. The memory manager has several checks it can run to verify if this happened. The header has three variables, so all of them need to be checked, starting with the alignment delta:
1. If the alignment delta is negative, it has been corrupted.
2. Given an alignment delta, we can find the start of the block returned (including padding). If the start of the block goes out of heap bounds, or overlaps with a block in the free list, then the delta must have been corrupted. This is illustrated in the example below:
Next, the block size is checked. Similar mechanisms are applied:
1. If block size is negative, it is corrupted.
2. If the end of the memory block is out of heap bounds, or overlaps with a free block in front of it, then block size is corrupted. This is illustrated in the figure below:
However, these mechanisms do not work if the under-written values of block size or alignment delta are either smaller than what they originally were, or if there are other allocated blocks around the returned block, and the under-written values point to these allocated blocks which are not tracked.
Therefore, an additional mechanism is provided. When allocating a block, before it is shifted for alignment, the block size is recorded. When the block is shifted for alignment, this value remains in the area used to achieve alignment ( as seen in diagrams ). This provides a way to check both alignment delta and block size for corruption at once, and is effective because these two variables both have to be uncorrupted, providing a strongly coupled check:
start of header – alignment delta = block size
If the alignment delta is under-written, then this equation will not point to the pre-shifted area, and the block size will not match the number from the offset. If block size is under-written, then start of header – alignment delta will reveal the true block size, and the mis-match will be found:
Note that alignment delta has to be at least 4 bytes in order to use this, so it’s a shortcoming of the method. However, all of these checks combined should greatly reduce the chances that an under-run would be undetected. Even if alignment delta is under-written to 0, the padding check will fail, since the alignment area hold a non-zero value (the block size).
Finally, to check the psNextBlock pointer for under-run, the manager simply needs to look if it’s set to nullptr. If it’s not, then it has been under-written.
Weaknesses of the Memory Manager and What could be done better
The memory manager passes all of the tests specified and implements all of the functionality required from it. However, if I wrote it again, or spent more time to further optimise it, I would do the following:
1. Put the front padding immediately before the Client Data area. This would reduce chances of the header being corrupt, increasing robustness of the memory manager even further.
2. In the case where the memory manager assigns left-over space to the allocated block since there is not enough to create a new free block, the extra memory is never checked for over-run. This poses a risk of an over-run of up to 12 bytes go unnoticed in this very special scenario. The current implementation does not track this extra memory, as it only occurs when the heap is full, but because of this it can’t check this memory either.
This special case could be refined by keeping a member variable to track bytes of useless memory at the end of the heap.
3. The implementation could potentially be optimised to be faster by reducing the amount of header shifts by performing accumulated, bigger shifts instead.
4. The Allocate and De-allocate functions take up to 150 lines of code each. A lot of it is comments and blank lines, and considerable effort was put into splitting up the functionality, but it would still be nice if the functions fit in a single visual studio window without scrolling.
5. The FindBestFitBlock function uses pointers to pointers. This is because it needs to overwrite the actual pointers supplied as arguments, because that’s how it returns the search results. The block search functionality was put in a function of its own to make it easier to switch to a different strategy for managing memory. However, the design could be further improved by creating an interface class such as CMemoryManagementStrategy which would provide a function such as FindFreeBlock, and specific strategies, such as Best-Fit and First-Fit could be implemented as sub-classes of this interface.
Also note: The full combined diagram of the example scenario is attached to this post.