Memory management options in Win32
Author: George Mihaescu
Published: June 1, 2005
Category: Insight / Win32
Description: This article describes the various memory management APIs offered by Win32, explains each one and offers guidelines for choosing the appropriate one for your purpose.

Summary: this article presents the different options available to Win32 developers for memory management.

Overview of the memory management APIs in Win32

As a Win32 application developer, you have a number of options for managing memory. The diagram below displays them all, and the rest of the article briefly covers each one.

 

 

1.    Global and Local memory API: this API (consisting of the GlobalAlloc / GlobalFree and LocalAlloc / LocalFree families of functions) is provided only for backward compatibility with Win16. In the Win32 implementation, both local and global allocations made through this API are mapped internally to heap allocation functions. You should not use this API when writing a new Win32 application (I only listed it here for completeness).

2.    The Windows heaps API: this API gives an application access to heaps. A Windows heap should not be confused with the C++ heap: the two terms, although related (I'll explain below), are semantically different. A Windows heap is an object managed by the Win32 subsystem that represents one or more regions (not necessarily contiguous) in a process address space. When a Win32 process is created, the Win32 subsystem creates an initial heap for it, called the process default heap. All allocations made through the backward-compatible Global / Local API (described above) are done here, but they are slower and have more limitations than calls made directly to the heap allocation functions. Allocations made through the COM IMalloc allocator (or the CoTaskMemAlloc convenience wrapper function) are also done in the process default heap.
You can create additional heaps for your process with the HeapCreate function; these are called private heaps, and you are responsible for managing them (allocating and freeing memory in them, destroying them when you are done, etc.).

3.    Virtual Memory API: this API offers the lowest level of memory management available to a Win32 application. It allows the application to manipulate memory pages in the process address space. Allocation happens in two steps: you first reserve a range of addresses in the process address space (which consumes address space but no physical memory or paging-file space), then commit pages from the reserved range as you need them (which is what actually allocates storage). Conversely, when you no longer need some of the memory, you decommit those pages (which releases their storage while keeping the addresses reserved) and finally release the reserved range itself.

4.    Memory Mapped File API: this API allows an application to map disk files (or portions of them) into its address space, or to create a region of memory that is not backed by any disk file (such a mapping is backed by the system paging file instead). The typical use of this API is to share memory between two or more processes. When multiple processes access the shared memory (whether or not it is backed by a disk file), that access must be synchronized using kernel synchronization objects such as semaphores, mutexes or events.

 

If your application uses the C Run-Time Library (CRT), you also have the CRT memory management API available (malloc, new, etc.). Internally, the CRT memory management functions rely on the Win32 heaps API, and in some cases on the Virtual Memory API. Upon initialization, the CRT creates its own Windows heap (i.e. a private heap of the process that uses the library), in which allocations made through new / malloc usually take place, but not always. Versions 6 and below of the CRT have a special mechanism for addressing heap fragmentation. Fragmentation occurs when an application frequently allocates and frees small blocks of memory, leaving the heap with many small, non-contiguous free blocks (similar to disk fragmentation); allocation requests then become slower, or may even fail because no single contiguous free block is large enough to satisfy them. Those versions of the CRT deal with this problem by internally managing a "small block heap" (allocated through the Virtual Memory API) in which allocations below a certain size threshold are done, while larger allocations go, as expected, to the Windows heap that the CRT created when it was loaded into the process. Versions of the CRT above 6 no longer enable this mechanism by default, but it can be enabled with the _set_sbh_threshold function they offer ("set small block heap threshold"). My article Analyzing the heaps of a Win32 process describes in detail how the CRT manages its memory and what you can do to improve its handling of memory fragmentation.

More details

A natural question at this point is: "which one of those APIs should I use?" The answer depends on what you are trying to achieve. The table below lists a number of common scenarios and suggests which Win32 memory API is best used for that purpose.

 

1.    I want to implement my own memory management mechanism, to allow me to allocate / free blocks in a manner that is more efficient for my application.

2.    I want to share memory between processes.

3.    I want an easy-to-manage buffer of memory that can potentially grow very large.

4.    I want to optimize the memory allocations produced by the C/C++ memory management functions.

 

 

I want to implement my own memory management mechanism, to allow me to allocate / free blocks in a manner that is more efficient for my application

 

·         Your first choice would be the Virtual Memory API: first call VirtualAlloc with MEM_RESERVE to reserve an address range of a given size in the address space of your process. Your memory manager would then commit areas of this reserved range as allocation requests arrive (calling VirtualAlloc again, this time with MEM_COMMIT), and decommit them with VirtualFree (MEM_DECOMMIT) when a piece of memory it previously handed out is no longer needed. Finally, when the reserved address range itself is no longer needed, call VirtualFree with MEM_RELEASE to release it.
This is not very complex, but it does require a good level of experience with memory management techniques (among other reasons, because you have to deal with fragmentation yourself), so it's not something I would recommend for beginners. It does, however, offer the best memory (de)allocation performance available to applications, since your own allocator only needs to call into the system when it commits or decommits pages.

·         Your second choice would be the Heaps API. This API is more straightforward, so it is easier for beginners to tackle, and it has the advantage that on certain platforms the fragmentation issue is addressed for you. On Windows XP and above and on Windows Server 2003 (or Windows 2000 Server with a hotfix), Win32 heaps support an option called the Low Fragmentation Heap (LFH), which lets the Windows heap manager organize heap allocations so as to reduce fragmentation.
Start by creating your private heap with HeapCreate. Consider calling HeapSetInformation with the HeapCompatibilityInformation class and a value of 2 to enable the LFH and take advantage of the anti-fragmentation algorithms implemented on the platforms listed above (depending on how you allocate and free memory, this may or may not be needed). Then use HeapAlloc to allocate and HeapFree to free blocks within this heap. Finally, call HeapDestroy to get rid of the heap when it is no longer needed.
The Heaps API is slower than the Virtual Memory API, but it is substantially easier to implement a private memory manager on top of it.

·         The Memory Mapped File API is not a good candidate for this purpose, as it has more overhead (lower performance) than the other APIs and does not offer any additional benefit.

I want to share memory between processes


To share memory between any number of processes you should be using the Memory Mapped File API.

Any of the processes involved can start by creating the file mapping with CreateFileMapping, passing INVALID_HANDLE_VALUE as the handle of the file to map. This means that you are not mapping a disk file into memory; instead, the mapping is backed by the system paging file. Pass a name as the last parameter: this is the name that other processes must use to map the same memory into their address space. Then call MapViewOfFile on the handle returned by CreateFileMapping (which commits the memory) to get a pointer to the mapped memory. You can use this pointer to write to the memory; as you write to it, physical memory is used as needed, with no effort on your part. This is a very important feature of this API: it hides the complexity of allocating memory and lets you treat the mapped memory as a potentially very large buffer that only takes up RAM for the data that has actually been written to it.

Any other process would call OpenFileMapping using the same name passed when the mapped memory was created, then use MapViewOfFile to get a pointer to that memory.

All processes involved will have to call UnmapViewOfFile using the pointer they received from MapViewOfFile, then call CloseHandle to finally close the mapped memory object obtained through Create/OpenFileMapping.

Having multiple processes access the shared memory obviously raises issues of concurrency, so you will have to implement your own synchronization mechanism. What I usually do is create a DLL that wraps the access to the shared memory and takes care of the synchronization, then use the API exposed by this DLL in all the processes involved.

I want an easy-to-manage buffer of memory that can potentially grow very large

 

If you need a substantially large buffer (e.g. 100 MB) of which you may use only a portion, but you want the buffer to take up only the memory you are actually using, the Memory Mapped File API is again the answer. The key here is that a view of such a mapping is backed by the paging file, and only the portion that is actually used takes up physical memory.

Just as above, start by creating the file mapping with CreateFileMapping, passing INVALID_HANDLE_VALUE as the handle of the file to map, so that the mapping is backed by the system paging file rather than by a disk file. Pass NULL for the name of the mapping object, since you will not need to share it with other processes. Then call MapViewOfFile on the handle returned by CreateFileMapping (which commits the memory) to get a pointer to the mapped memory (pass 0 for the view size to get a view of the entire mapping). As you write through this pointer, physical memory is used as needed, with no effort on your part, just as in the shared-memory case.

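This effectively creates a buffer that can keep growing (up to the size specified when the mapping was created) but that takes up physical RAM only for the pages currently in use. Note, however, that with the default SEC_COMMIT attribute the whole size of the mapping counts against the system commit limit, even though physical RAM is only assigned to the pages you touch.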

When done with this buffer, call UnmapViewOfFile using the pointer you received from MapViewOfFile, then call CloseHandle to finally close the mapped memory object obtained through CreateFileMapping.

This is a very simple and effective technique for creating a buffer that can be potentially very large but that does not take up more physical RAM than it actually needs.
 

I want to optimize the performance of memory allocation functions in the C/C++ library


Depending on the version of the Microsoft CRT you are using, you may be able to improve the performance of the malloc / new families of allocators.

For long-running processes that allocate blocks of varying sizes in random patterns, fragmentation of the CRT heap may start to become a performance problem. You can substantially reduce this problem by:

·         Enabling the LFH (Low Fragmentation Heap) offered by the Win32 implementation on certain platforms: Windows XP and above, Windows Server 2003, and Windows 2000 with a specific hotfix. To enable this feature, call the CRT function _get_heap_handle (new in CRT 7.0, #include <malloc.h>). It returns the handle of the Win32 heap that the CRT uses for allocations made through malloc / new. Then use HeapSetInformation to switch that heap to the LFH. This is where you need to pay attention, because HeapSetInformation is not available on platforms earlier than those listed above, so don't call it directly: instead, call LoadLibrary("kernel32.dll"), then GetProcAddress with "HeapSetInformation" to get a pointer to the function, and call it through that pointer. If GetProcAddress returns NULL, you'll need to fall back on the following method:

·         Enabling the fragmentation protection in the CRT itself: since version 4.0, the CRT has had a built-in mechanism against heap fragmentation. Blocks above a certain size are allocated in the regular heap, as expected, but blocks below that size are allocated in a memory area called the "small block heap", which the CRT allocates and manages itself using the Virtual Memory API described above. More recent versions of the CRT don't enable this mechanism by default, but you can check whether it is enabled by calling the CRT function _get_sbh_threshold ("get small block heap threshold"). A return value of 0 means that the CRT's fragmentation protection is off; to turn it on, call _set_sbh_threshold with the size in bytes that you want to be treated as the threshold for a "small block". In CRT 4.0 this size was 420 bytes; you may want to start with that and experiment to see what works best for the allocation patterns of your application.

 

 


Reader comments:

On Jun 1, 2008 at 7:27 EST Barna said:

Hi, very good article! Could you please say something about opening \\.\PHYSICALDRIVEx as a memory mapped file? For me CreateFile() returns a valid handle, but CreateFileMapping() fails. The last error is ERROR_BAD_EXE_FORMAT aka STATUS_INVALID_FILE_FOR_SECTION. Any idea? Thx, Barna.

On Apr 30, 2007 at 19:03 EST George said:

Klein, sorry, I'm actually re-compiling the sample code and changing the project to VS2005 (used to be on 2003) and that's why the reference to the sample is missing from the article - expect it (source and all) in a couple of days, depending on how busy I am.

On Apr 30, 2007 at 18:51 EST klein m. said:

Hey, thanks for updating the article but it used to have a sample with source code that I could play with and pick the virtual memory walking code. Where's the sample now? And don't forget the source code..