Memory leakage describes a bug that gradually uses up the memory within a system until a request to use or access memory that should succeed fails instead. The term is analogous to a leaking bucket, whose contents gradually disappear; in an embedded system, the contents are memory. Leakage is often seen as a symptom of buffer problems, where data is read or written using locations outside the buffer.
The common symptoms are stack frame errors, caused by the stack overflowing its allocated memory space, and failures of malloc() or similar memory allocation calls. Several common programming faults cause this problem.
Stack frame errors
It is common within real-time systems, especially those with nested exceptions, to use the exception handler to clean up the stack before returning to the previously executing software thread or to a generic handler. The exception context information is typically stored on the stack, either automatically or as part of the initial exception routine. If the exception is caused by an error, then there is probably little need to return execution to the point where the error occurred. The stack, however, contains a frame with all this return information, and therefore the frames need to be removed by adjusting the stack pointer accordingly. It is normally during this adjustment that the memory leakage occurs. The typical faults are:
• Adjusting the stack for the wrong size frame. If the adjustment is too large, then other stack frames can be corrupted. If it is too small, then at best some stack memory can be lost and at worst the previous frame can be corrupted.
• Adjusting the stack pointer by the wrong value, e.g. using the number of words in the frame instead of the number of bytes.
• Setting the stack pointer to the wrong address, for example so that it is on an odd byte boundary.
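The word/byte confusion in the second fault can be illustrated with a small simulation. This is a minimal sketch, assuming a byte-addressed stack that grows downwards and a 32-bit word size. The names WORD_BYTES, pop_frame_correct, pop_frame_buggy and leaked_bytes are invented for illustration and are not taken from any particular kernel or processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES 4u   /* assumption: 32-bit words */

/* Correct adjustment: the frame size in words is converted to bytes
   before it is added to the (downward-growing) stack pointer. */
uintptr_t pop_frame_correct(uintptr_t sp, size_t frame_words)
{
    assert(sp % WORD_BYTES == 0);   /* stack pointer should be word aligned */
    return sp + frame_words * WORD_BYTES;
}

/* Faulty adjustment: the word count is used as if it were a byte count,
   so only part of the frame is removed from the stack. */
uintptr_t pop_frame_buggy(uintptr_t sp, size_t frame_words)
{
    return sp + frame_words;
}

/* Stack bytes left unreclaimed each time the faulty adjustment runs. */
size_t leaked_bytes(size_t frame_words)
{
    return frame_words * WORD_BYTES - frame_words;
}
```

With a five-word frame at address 0x1000, the correct adjustment moves the stack pointer to 0x1014, while the faulty one leaves it at 0x1005. That loses 15 bytes of stack on every exception and also places the pointer on an odd byte boundary.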
Failure to return memory to the memory pool
This is a common cause of bus and memory errors. It is caused by tasks requesting memory and then not releasing it when their need for it is over. It is good practice to ensure that when a routine uses malloc() to request memory, it also uses free() (or the system's equivalent release call) to return it and make it available for reuse. Even if a system has been designed with this in mind, two potential scenarios can result in a memory problem. In the first, the memory is never returned, so subsequent malloc() requests cannot be serviced when they should be. The second is similar but occurs only in certain circumstances. Both are nearly always caused by failure to return memory when it is finished with, but the error may not appear until far later in time. The task that triggers the problem may be the same one asking for memory again or a different one. As a result, it can be difficult to detect which task did not return the memory and is responsible for the problem.
In some cases, the task may return the memory at many different exit points within its code. This could be deemed bad programming practice, and it would be better to use a single exit sub-routine, for example. It is often a programming omission at one of these exit points that stops the memory being recycled.
It is difficult to identify when and where memory is allocated unless some form of record is obtained. With memory management systems, this can be derived from the descriptor tables, address translation cache entries and so on. These can be difficult to retrieve and decode, and so a simple transaction record of all malloc() and free() calls, along with a time stamp, can prove invaluable. This code can be enabled for debugging when needed by passing a DEBUG flag to the pre-processor. The code is compiled only if the flag is true.
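The single exit approach can be sketched in C as follows. This is one possible arrangement rather than a prescribed method: the routine process_message and its two work buffers are hypothetical, and the copy operations stand in for whatever real processing the task performs. Every error path jumps to one cleanup label, so there is only one place where memory is returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical task routine: returns 0 on success, -1 on any failure. */
int process_message(const char *msg, size_t len)
{
    int status = -1;        /* assume failure until the work completes */
    char *work = NULL;
    char *scratch = NULL;

    if (msg == NULL || len == 0)
        goto cleanup;

    work = malloc(len);
    if (work == NULL)
        goto cleanup;

    scratch = malloc(len);
    if (scratch == NULL)
        goto cleanup;

    memcpy(work, msg, len);         /* stand-in for the real processing */
    memcpy(scratch, work, len);
    assert(memcmp(scratch, msg, len) == 0);
    status = 0;

cleanup:
    /* Single exit point: both buffers are released here on every path.
       free(NULL) is defined to do nothing, so no extra checks are needed. */
    free(scratch);
    free(work);
    return status;
}
```

Because the pointers start as NULL and are only ever released at the cleanup label, adding a new error path later cannot introduce a missing release.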
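A transaction record of this kind might look like the sketch below. It assumes the standard C malloc() and free() calls and writes the log to stderr. The wrapper names mem_log_alloc and mem_log_free, the macros MEM_ALLOC and MEM_FREE, and the counter mem_live_blocks are all invented for illustration. DEBUG is given a default value here only so that the fragment stands alone; normally it would be supplied to the pre-processor on the compiler command line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#ifndef DEBUG
#define DEBUG 1   /* normally set on the command line, e.g. -DDEBUG=1 */
#endif

#if DEBUG
long mem_live_blocks = 0;   /* allocations that have not yet been released */

/* Allocate memory and record the request with a time stamp and call site. */
void *mem_log_alloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p != NULL)
        mem_live_blocks++;
    fprintf(stderr, "%ld ALLOC %p %lu bytes at %s:%d\n",
            (long)time(NULL), p, (unsigned long)size, file, line);
    return p;
}

/* Record the release with a time stamp and call site, then free the block. */
void mem_log_free(void *p, const char *file, int line)
{
    if (p != NULL)
        mem_live_blocks--;
    assert(mem_live_blocks >= 0);   /* more releases than allocations */
    fprintf(stderr, "%ld FREE  %p at %s:%d\n",
            (long)time(NULL), p, file, line);
    free(p);
}

#define MEM_ALLOC(size) mem_log_alloc((size), __FILE__, __LINE__)
#define MEM_FREE(p)     mem_log_free((p), __FILE__, __LINE__)
#else
/* With DEBUG false, the logging code is not compiled at all. */
#define MEM_ALLOC(size) malloc(size)
#define MEM_FREE(p)     free(p)
#endif
```

Matching the ALLOC and FREE entries in the log by address shows which call site requested a block that was never returned, and a non-zero mem_live_blocks when the system should be idle indicates that a leak exists.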
• Access rights not given
This is where a buffer is shared between tasks and only one has the right access permission. The pointer may be passed correctly by using a mailbox or message, but any access would result in a protection fault or, if the buffer is mapped incorrectly, in an access to the wrong memory location.
• Pointer corruption
It is very easy to get pointers mixed up and to use or update the wrong one and thus corrupt the buffer.
• Timing problems with high and low water marks
Water marks are used to provide early warning and should be adjusted to cope with worst case timings. If not, then despite their presence, it is possible to get data overrun and underrun errors.
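One way to relate the water marks to worst case timings is sketched below. It assumes a simple model that is not taken from the text: data arrives or drains at a fixed maximum rate per timer tick, and the other side needs a known worst case number of ticks to react once a mark is crossed. The function names and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* High water mark: leave enough headroom above the mark to absorb the data
   that can still arrive while the producer is being stopped. */
size_t high_mark_for(size_t capacity, size_t fill_per_tick,
                     size_t worst_case_ticks)
{
    size_t headroom = fill_per_tick * worst_case_ticks;
    assert(capacity > 0);
    /* If the buffer is too small for the timing, the mark collapses to 0. */
    return (headroom >= capacity) ? 0 : capacity - headroom;
}

/* Low water mark: keep enough data below the mark to feed the consumer
   while the producer is being restarted. */
size_t low_mark_for(size_t capacity, size_t drain_per_tick,
                    size_t worst_case_ticks)
{
    size_t reserve = drain_per_tick * worst_case_ticks;
    assert(capacity > 0);
    return (reserve >= capacity) ? capacity : reserve;
}
```

For a 1024-byte buffer filling at up to 16 bytes per tick with a worst case response of 8 ticks, the high water mark works out at 896 bytes. Setting it any higher on the basis of typical rather than worst case timing leaves the buffer open to overrun.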
Wrong memory specification
This can be a very difficult problem to track down as it can be very time dependent in nature. It may only happen in certain situations which can be hard to reproduce. The problem is essentially caused by a programming error where it is assumed that any type of success message that malloc() or a similar function returns actually means that all the memory requested is available. The software then starts to use what it thinks it has been allocated, only to find that it has not, with disastrous results.
This situation can occur when porting software from one system to another where the malloc() call is used but has different behaviour and return messages. In one system it may return an error if the complete memory specification cannot be met, while in another it will return the number of bytes that were allocated, which may be less than the total requested. In the first case an error message is returned; in the second a partial success is reported back. These are very different and can cause immense problems.
Other errors can occur with non-linear addressing processors, which may have restrictions on the pointer types and addressing range supported that are not present with a linear addressing architecture. This can be very common with 80x86 architectures and can cause problems or require a large redesign of the software.
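A defensive wrapper can hide this difference from the ported software. The sketch below is an illustration under stated assumptions: pool_alloc is a hypothetical allocator, simulated here on top of the standard malloc(), that reports success even when it grants fewer bytes than were requested, and POOL_MAX_GRANT is an invented limit used only to provoke that behaviour. The wrapper alloc_exact treats a partial grant as a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX_GRANT 64u   /* hypothetical largest block the pool can grant */

/* Hypothetical non-standard allocator: returns a valid pointer and stores the
   number of bytes actually granted, which may be less than requested. */
void *pool_alloc(size_t requested, size_t *granted)
{
    size_t n = (requested > POOL_MAX_GRANT) ? POOL_MAX_GRANT : requested;
    void *p;

    assert(granted != NULL);
    p = malloc(n);
    *granted = (p != NULL) ? n : 0;
    return p;
}

/* Wrapper giving all-or-nothing behaviour: a partial grant is released
   and reported to the caller as a failure. */
void *alloc_exact(size_t requested)
{
    size_t granted = 0;
    void *p = pool_alloc(requested, &granted);

    if (p != NULL && granted < requested) {
        free(p);
        p = NULL;
    }
    return p;
}
```

Code that only tests the returned pointer would accept the 64-byte block as if it were the 128 bytes it asked for and then write beyond it. Routing every request through the wrapper means the caller sees either the full amount or a clear failure.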