Chapter 13. Using threads in conjunction with the BDE, Exceptions and DLLs.

In this chapter:

DLLs and Multiprocess programming.
Thread and process scope. A single threaded DLL.
Writing a multithreaded DLL.
DLL Set-up and Tear down.
Pitfall 1: The Delphi encapsulation of the Entry Point Function.
Writing a multiprocess DLL.
Global named objects.
The DLL in detail.
DLL Initialization.
An application using the DLL.
Pitfall 2: Thread context in Entry Point Functions.
Exception Handling.
The BDE.

DLLs and Multiprocess programming.

Dynamic link libraries, or DLLs, allow a programmer to share executable code between several processes. They are commonly used to provide shared library code for several programs. Writing code for DLLs is in most respects similar to writing code for executables. However, the shared nature of DLLs means that programmers familiar with multithreading often use them to provide system wide services: that is, code which affects several processes that have the DLL loaded. In this chapter, we will look at how to write code for DLLs that operates across more than one process.

Thread and process scope. A single threaded DLL.

Global variables in DLLs have process wide scope. This means that if two separate processes have a DLL loaded, all the global variables in the DLL are private to each process. This is not limited to variables in the user's code: it also includes all global variables in the Borland run time libraries, and any units used by code in the DLL. This has the advantage that novice DLL programmers can treat DLL programming in the same way as executable programming: if a DLL contains a global variable, then each process has its own copy. Furthermore, this also means that if a DLL is invoked by processes which each contain only one thread, then no special techniques are required: the DLL need not be thread safe, since all the processes have completely isolated incarnations of the DLL.

Writing a multithreaded DLL.

Writing a multithreaded DLL is mostly the same as writing multithreaded code in an application. The behaviour of multiple threads inside the DLL is the same as the behaviour of multiple threads in a particular application. As always, there are a couple of pitfalls for the unwary:

The main pitfall one can fall into is the behaviour of the Delphi memory manager. By default, the Delphi memory manager is not thread safe. This is for efficiency reasons: if a program only ever contains one thread, then it is pure wasted overhead to include synchronization in the memory manager. The Delphi memory manager can be made thread safe by setting the IsMultiThread variable to true. This is done automatically for a given module if a descendant class of TThread is created.

The problem is that an executable and the DLL consist of two separate modules, each with its own copy of the Delphi memory manager. Thus, if an executable creates several threads, its memory manager is multithreaded. However, if those threads call a DLL loaded by the executable, the DLL memory manager is not aware of the fact that it is being called by multiple threads. This can be solved by setting the IsMultiThread variable in the DLL. It is best to set this by using the DLL entry point function, covered later.
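As a minimal sketch of the simplest remedy, the DLL below sets IsMultiThread unconditionally in its main body, which runs once in each process that loads it. The library name and the exported routine are hypothetical, and this is the conservative variant rather than the entry point approach covered later: it pays the synchronization overhead even when only one thread ever calls the DLL.

```pascal
library MtSafeDll;

{ Hypothetical DLL project; the library and routine names are illustrative. }

uses
  SysUtils;

{ Allocates a temporary string on the DLL's own heap. With IsMultiThread
  set, the DLL's memory manager serializes such allocations when several
  threads call in at once. }
function DigitCount(Value: Integer): Integer; stdcall;
var
  S: string;
begin
  S := IntToStr(Value);
  Result := Length(S);
end;

exports
  DigitCount;

begin
  { The main body runs once per process, when the DLL is loaded. }
  IsMultiThread := True;
end.
```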

The second pitfall occurs as a result of the same problem; that of having two separate memory managers. Memory allocated by the Delphi memory manager that is passed from the DLL to the executable cannot be allocated in one and disposed of in the other. This occurs most often with long strings, but can occur with memory allocated using New or GetMem, and disposed using Dispose or FreeMem. The solution in this case is to include ShareMem, a unit which keeps the two memory managers in step using techniques discussed later.
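A minimal sketch of a DLL that returns a long string to its caller is given below. The library and function names are hypothetical. The points to note are that ShareMem is listed first in the uses clause, that the calling program must also list ShareMem first in its own project uses clause, and that BORLNDMM.DLL then has to be deployed alongside the application.

```pascal
library StrDll;

{ Hypothetical DLL project. ShareMem must be the first unit in the uses
  clause here, and also in the project file of every program that calls
  this DLL. }

uses
  ShareMem,
  SysUtils;

{ The result is a long string allocated inside the DLL and released by
  the caller, which is only safe when both modules use ShareMem. }
function Greeting(const Name: string): string;
begin
  Result := 'Hello, ' + Name;
end;

exports
  Greeting;

begin
end.
```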

DLL Set-up and Tear down.

Mindful of the fact that DLL programmers often need to be aware of how many threads and processes are active in a DLL at any given time, the Win32 system architects provide a method for DLL programmers to keep track of thread and process counts in a DLL. This method is known as the DLL Entry Point Function.

In an executable, the entry point (as specified in the module header) indicates where program execution should start. In a DLL, it points to a function that is executed whenever an executable loads or unloads the DLL, or whenever an executable that is currently using the DLL creates or destroys a thread. As seen from Delphi code, the function takes a single integer argument (the reason for the call), which can be one of the following values:

DLL_PROCESS_ATTACH: A process has attached itself to the DLL. If this is the first process, then the DLL has just been loaded.
DLL_PROCESS_DETACH: A process has detached from the DLL. If this is the only process using the DLL, then the DLL will be unloaded.
DLL_THREAD_ATTACH: A thread in the process has attached to the DLL. This will happen whenever the process creates a new thread after the DLL has been loaded. The thread that loads the DLL is reported via DLL_PROCESS_ATTACH instead.
DLL_THREAD_DETACH: A thread has detached from the DLL. This will happen whenever a thread in the process exits cleanly while the DLL is loaded.

As it turns out, DLL entry points have two characteristics which can lead to misunderstandings and problems when writing entry point code. The first characteristic occurs as a result of the Delphi encapsulation of the entry point function, and is relatively simple to work around. The second occurs as a result of thread context, and will be discussed later on.

Pitfall 1: The Delphi encapsulation of the Entry Point Function.

Delphi uses the DLL entry point function to manage initialization and finalization of units within a DLL as well as execution of the main body of DLL code. The DLL writer can put a hook into the Delphi handling by assigning an appropriate function to the variable DLLProc. The default Delphi handling works as follows:

The DLL is loaded, which results in the entry point function being called with DLL_PROCESS_ATTACH
Delphi uses this to call the initialization of all the units in the DLL, followed by the main body of the DLL code.
The DLL is unloaded, resulting in a call to the entry point function with the argument DLL_PROCESS_DETACH.

Now, the application writer only gets code to execute in response to the entry point function when the DLLProc variable points to a function. The correct point to set this up is in the main body of the DLL. However, the main body is itself executed in response to the initial DLL_PROCESS_ATTACH call to the entry point function, so the hook is installed too late to observe that call. In short, what this means is that when using the entry point function in the DLL, the Delphi programmer will never see the first process attachment to the DLL. As it turns out, this isn't such a huge problem: one can simply assume that the main body of the DLL is called in response to a process loading the DLL, and hence the process and thread counts are 1 at that point. Since the DLLProc variable is replicated on a per process basis, even if more processes attach themselves later, the same argument applies, since each incarnation of the DLL has separate global variables.
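The hooking mechanism can be sketched as follows. This is an illustrative outline rather than the example DLL discussed next: the library name, the hook procedure and the exported function are all hypothetical. The thread count starts at 1 on the assumption described above, namely that the main body runs on behalf of the thread that loaded the DLL.

```pascal
library HookDll;

{ Hypothetical DLL project illustrating a DLLProc hook. }

uses
  Windows;

var
  { Starts at 1: the loading thread is never reported to the hook. }
  ThreadCount: Integer = 1;

{ Called by the Delphi run time for every entry point notification that
  arrives after DLLProc has been assigned. }
procedure EntryPointHook(Reason: Integer);
begin
  case Reason of
    DLL_THREAD_ATTACH:
      InterlockedIncrement(ThreadCount);
    DLL_THREAD_DETACH:
      InterlockedDecrement(ThreadCount);
    DLL_PROCESS_DETACH:
      ; { per process clean-up would go here }
  end;
end;

function GetThreadCount: Integer; stdcall;
begin
  Result := ThreadCount;
end;

exports
  GetThreadCount;

begin
  { Main body: runs during the initial DLL_PROCESS_ATTACH call, so the
    hook never sees that first notification. }
  DLLProc := @EntryPointHook;
end.
```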

In case the reader is still confused, I'll present an example. Here is a modified DLL that contains a unit with a function that displays a message. As you can see, the main body, unit initialization and DLL entry point hooks all contain "ShowMessage" calls which enable one to trace what is going on. In order to test this DLL, here is a test application. It consists of a form with a button on it. When the button is clicked, a thread is created, which calls the procedure in the DLL, and then destroys itself. So, what happens when we run the program?

The DLL reports unit initialization
The DLL reports main DLL body execution
Every time the button is clicked the DLL reports:
- Entry point: thread attach
- Unit procedure.
- Entry point: thread detach
Note that if we spawn more than one thread from the application, whilst leaving existing threads blocked on the Unit Procedure message box, the total thread attachment count can increase beyond one.
When the program is closed, the DLL reports entry point: process detach, followed by unit finalization.

Writing a multiprocess DLL.

Armed with a knowledge of how to use the entry point function, we will now write a multiprocess DLL. This DLL will store some information on a system wide basis using memory shared between processes. It is worth remembering that when code accesses data shared between processes, the programmer must provide appropriate synchronization. Just as multiple threads in a single process are not inherently synchronized, so the main threads in different processes are also not synchronized. We will also look at some subtleties which occur when trying to use the entry point function to keep track of global threads.

This DLL will share a single integer between processes, as well as keeping a count of the number of processes and threads in the DLL at any one time. It consists of a header file shared between the DLL and applications that use the DLL, and the DLL project file. Before we look more closely at the code, it's worth reviewing some Win32 behaviour.

Global named objects.

The Win32 API allows the programmer to create various objects. Some of these objects may be created either anonymously, or with a certain name. Objects created anonymously are, on the whole, limited to use by a single process, the exception being that they may be inherited by child processes. Objects created with a name can be shared between processes. Typically, one process will create the object, specifying a name for that object, and other processes will open a handle to that object by specifying its name.

The delightful thing about named objects is that handles to these objects are reference counted throughout the system. That is, several processes can acquire handles to an object, and when all the handles to that object are closed, the object itself is destroyed, and not before. This includes the situation where an application crashes: typically Windows does a good job of cleaning up unused handles after a crash.

The DLL in detail.

Our DLL uses this property to maintain a memory mapped file. Normally, memory mapped files are used to create an area of memory which is a mirror image of a file on disk. This has many useful applications, not least "on demand" paging in of executable images from disk. For this DLL however, a special case is used whereby a memory mapped file is created with no corresponding disk image. This allows the programmer to allocate a section of memory which is shared between several processes. This is surprisingly efficient: once the mapping is set up, no memory copying is done between processes. Once the memory mapped file has been set up, a global named mutex is used to synchronize access to that portion of memory.

DLL Initialization.

Initialization consists of four main stages:

Creation of synchronization objects (global and otherwise).
Creation of shared data.
Initial increment of thread and process counts.
Hooking the DLL entry point function.

In the first stage, two synchronization objects are created, a global mutex, and a critical section. Little needs to be said about the critical section. The global mutex is created via the CreateMutex API call. This call has the beneficial feature that if the mutex is named, and the named object already exists, then a handle to the existing named object is returned. This occurs atomically. Were this not the case, then a whole range of unpleasant race conditions could potentially occur. Determining the precise range of possible problems and potential solutions (mainly involving optimistic concurrency control) is left as an exercise to the reader. Suffice to say that if operations on handles to global shared objects were not atomic, the application level Win32 programmer would be staring into an abyss...
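The create-or-open behaviour of CreateMutex can be sketched with a small console program. The mutex name is hypothetical; any name agreed on by the co-operating processes would do. Running two copies side by side should show the first creating the mutex and the second opening a handle to the existing one.

```pascal
program NamedMutexDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  { Hypothetical name shared by all co-operating processes. }
  MutexName = 'Example.Chapter13.Mutex';

var
  Mutex: THandle;
  LastErr: DWORD;
begin
  { Creates the mutex, or opens it if another process got there first. }
  Mutex := CreateMutex(nil, False, MutexName);
  LastErr := GetLastError; { read before making any other API call }
  if Mutex = 0 then
  begin
    Writeln('CreateMutex failed: ', SysErrorMessage(LastErr));
    Halt(1);
  end;
  if LastErr = ERROR_ALREADY_EXISTS then
    Writeln('Opened a handle to an existing mutex.')
  else
    Writeln('Created the mutex.');

  { Acquire, do some work, release. }
  WaitForSingleObject(Mutex, INFINITE);
  try
    Writeln('Holding the mutex; press Enter to release it.');
    Readln;
  finally
    ReleaseMutex(Mutex);
  end;
  CloseHandle(Mutex);
end.
```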

In the second stage the area of shared memory is set up. Since we have already set up the global mutex, it is used when setting up the file mapping. A view of the "file" is mapped, which maps the (virtual) file into the address space of the calling process. We also check whether we happened to be the process that originally created the file mapping, and if this is the case, then we zero out the data in our mapped view. This is why the procedure is wrapped in a mutex: CreateFileMapping has the same nice atomicity properties as CreateMutex, ensuring that race conditions on handles will never occur. In the general case, however, the same is not necessarily true for the data in the mapping. If the mapping had a backing file, then we might be able to assume validity of the shared data at start-up. For virtual mappings this is not assured. In this case we need to initialize the data in the mapping atomically with setting up a handle to the mapping, hence the mutex.
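A rough sketch of this stage, written as a stand-alone console program rather than as the DLL itself, might look like the following. The object names and the TSharedData record are assumptions made for illustration. Passing INVALID_HANDLE_VALUE as the file handle asks for a mapping with no disk file behind it, and the whole set-up is performed while holding the global mutex so that no other process can see the data before it has been initialized.

```pascal
program SharedIntDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  { Hypothetical object names. }
  MutexName   = 'Example.SharedInt.Mutex';
  MappingName = 'Example.SharedInt.Mapping';

type
  PSharedData = ^TSharedData;
  TSharedData = record
    Value: Integer;
  end;

var
  Mutex, Mapping: THandle;
  Data: PSharedData;
  Creator: Boolean;
begin
  Mutex := CreateMutex(nil, False, MutexName);
  if Mutex = 0 then
    raise Exception.Create('CreateMutex failed');
  WaitForSingleObject(Mutex, INFINITE);
  try
    { No backing file: the mapping lives in the system paging file. }
    Mapping := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE,
      0, SizeOf(TSharedData), MappingName);
    Creator := GetLastError <> ERROR_ALREADY_EXISTS;
    if Mapping = 0 then
      raise Exception.Create('CreateFileMapping failed');
    Data := MapViewOfFile(Mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if Data = nil then
      raise Exception.Create('MapViewOfFile failed');
    { Only the process that created the mapping initializes it. }
    if Creator then
      FillChar(Data^, SizeOf(TSharedData), 0);
    Inc(Data^.Value);
    Writeln('Shared value is now ', Data^.Value);
  finally
    ReleaseMutex(Mutex);
  end;
  Writeln('Press Enter to exit; start another copy first to see sharing.');
  Readln;
  UnmapViewOfFile(Data);
  CloseHandle(Mapping);
  CloseHandle(Mutex);
end.
```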

In the third stage, we perform our first manipulation on the globally shared data, by incrementing the process and thread counts, since the execution of the main body of the DLL is consistent with the addition of another thread and process to those using the DLL. Note that the AtomicIncThreadCount procedure increments both the local and global thread counts whilst both the global mutex and process local critical section have been acquired. This ensures that multiple threads from the same process see a fully consistent view of both counts.
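The locking pattern just described can be sketched as below. This is my own guess at the shape of such a procedure, not the code from the DLL: only the name AtomicIncThreadCount comes from the text, and the unit name, record layout and variables are hypothetical. The sketch assumes the mutex, critical section and mapped view have already been set up by the earlier initialization stages.

```pascal
unit ThreadCounts;

{ Hypothetical unit; assumes GlobalMutex, LocalLock and Shared are
  initialized elsewhere before either procedure is called. }

interface

uses
  Windows;

type
  PGlobalCounts = ^TGlobalCounts;
  TGlobalCounts = record
    ProcCount: Integer;
    ThreadCount: Integer;
  end;

var
  GlobalMutex: THandle;           { named mutex shared by all processes }
  LocalLock: TRTLCriticalSection; { protects the per process count }
  LocalThreadCount: Integer;
  Shared: PGlobalCounts;          { view of the shared memory mapping }

procedure AtomicIncThreadCount;
procedure AtomicDecThreadCount;

implementation

{ Adds Delta to both counts while holding both locks. The locks are
  always taken in the same order, which avoids deadlock. }
procedure AdjustThreadCount(Delta: Integer);
begin
  EnterCriticalSection(LocalLock);
  try
    WaitForSingleObject(GlobalMutex, INFINITE);
    try
      Inc(LocalThreadCount, Delta);
      Inc(Shared^.ThreadCount, Delta);
    finally
      ReleaseMutex(GlobalMutex);
    end;
  finally
    LeaveCriticalSection(LocalLock);
  end;
end;

procedure AtomicIncThreadCount;
begin
  AdjustThreadCount(1);
end;

procedure AtomicDecThreadCount;
begin
  AdjustThreadCount(-1);
end;

end.
```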

In the final stage, the DLLProc is hooked, thus ensuring that the creation and destruction of other threads in the process is monitored, and the final exit of the process is also registered.

An application using the DLL.

A simple application that uses the DLL is presented here. It consists of the global shared unit, a unit containing the main form, and a subsidiary unit containing a simple thread. Five buttons exist on the form, allowing the user to read the data contained in the DLL, increment, decrement and set the shared integer, and create one or more threads within the application, just to verify that local thread counts work. As expected, the thread counts increment whenever a new copy of the application is executed, or one of the applications creates a thread. Note that the thread need not directly use the DLL in order for the DLL to be informed of its presence.

Pitfall 2: Thread context in Entry Point Functions.

Instead of using a simple application, let's try one that does something more advanced. In this situation, the DLL is loaded manually by the application programmer, instead of being automatically loaded. This is possible by replacing the previous form unit with this one. An extra button is added which loads the DLL, and sets up the procedure addresses manually. Try running the program, creating several threads and then loading the DLL. You should find that the DLL no longer correctly keeps track of the number of threads in the various processes that use it. Why is this? The Win32 help file states that when using the entry point function with the arguments DLL_THREAD_ATTACH and DLL_THREAD_DETACH:

"DLL_THREAD_ATTACH Indicates that the current process is creating a new thread. When this occurs, the system calls the entry-point function of all DLLs currently attached to the process. The call is made in the context of the new thread. DLLs can use this opportunity to initialize a TLS slot for the thread. A thread calling the DLL entry-point function with the DLL_PROCESS_ATTACH value does not call the DLL entry-point function with the DLL_THREAD_ATTACH value.

Note that a DLL's entry-point function is called with this value only by threads created after the DLL is attached to the process. When a DLL is attached by LoadLibrary, existing threads do not call the entry-point function of the newly loaded DLL."

It drives the point home by also stating:

"DLL_THREAD_DETACH Indicates that a thread is exiting cleanly. If the DLL has stored a pointer to allocated memory in a TLS slot, it uses this opportunity to free the memory. The operating system calls the entry-point function of all currently loaded DLLs with this value. The call is made in the context of the exiting thread. There are cases in which the entry-point function is called for a terminating thread even if the DLL never attached to the thread.

The thread was the initial thread in the process, so the system called the entry-point function with the DLL_PROCESS_ATTACH value.
The thread was already running when a call to the LoadLibrary function was made, so the system never called the entry-point function for it"

This behaviour has two potentially unpleasant side effects.

It is not possible, in the general case, to keep track of how many threads are in the DLL on a global basis unless one can guarantee that an application loads the DLL before creating any child threads. One might mistakenly assume that an application loading a DLL would have the DLL_THREAD_ATTACH entry point called for already existing threads. This is not the case because, having guaranteed that thread attachments and detachments are notified to the DLL in the context of the thread attaching or detaching, it is impossible to call the DLL entry point in the correct context of threads that are already running.
Since the DLL entry point can be called by several different threads, race conditions can occur between the entry point function and DLL initialization. If a thread is created at about the same time as the DLL is loaded by an application, then it is possible that the DLL entry point might be called for the thread attachment whilst the DLL main body is still being executed. This is why it is always a good idea to set up the entry point function as the very last action in DLL initialization.
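Manual loading of the kind that triggers the first side effect can be sketched with a short console program. The DLL file name and the exported function are hypothetical stand-ins for any DLL that counts threads through its entry point hook. Any threads the program created before the LoadLibrary call would be invisible to such a DLL.

```pascal
program LateLoadDemo;
{$APPTYPE CONSOLE}

uses
  Windows;

type
  TGetThreadCount = function: Integer; stdcall;

const
  { Hypothetical DLL and export names. }
  DllName  = 'HookDll.dll';
  ProcName = 'GetThreadCount';

var
  Lib: HMODULE;
  GetThreadCount: TGetThreadCount;
begin
  { Threads already running at this point never generate
    DLL_THREAD_ATTACH for the DLL loaded below. }
  Lib := LoadLibrary(DllName);
  if Lib = 0 then
  begin
    Writeln('Could not load ', DllName);
    Halt(1);
  end;
  try
    { Set up the procedure address manually. }
    @GetThreadCount := GetProcAddress(Lib, ProcName);
    if Assigned(GetThreadCount) then
      Writeln('Threads seen by the DLL: ', GetThreadCount())
    else
      Writeln('Export not found: ', ProcName);
  finally
    FreeLibrary(Lib);
  end;
end.
```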

Readers would benefit from noting that both these side effects have repercussions when deciding when to set the IsMultiThread variable.

Exception Handling.

When writing robust applications, the programmer should always be prepared for things to go wrong. The same is true for multithreaded programming. Most of the examples presented in this tutorial have been relatively simple, and exception handling has mostly been omitted for clarity. In real world applications, this is likely to be unacceptable.

Recall that threads have their own call stack. This means that an exception in a thread does not fall through the standard VCL exception handling mechanisms. Instead of raising a user-friendly dialog box, an unhandled exception in a thread will terminate the application. As a result of this, the execute method of a thread is one of the few places where it can be useful to create an exception handler that catches all exceptions. Once an exception has been caught in a thread, dealing with it is also slightly different from ordinary VCL handling. It may not always be appropriate to show a dialog box. Quite often, a valid tactic is to let the thread communicate the fact that a failure has occurred to the main VCL thread, using whatever communication mechanisms are in place, and then let the VCL thread decide what to do. This is particularly useful if the VCL thread has created the child thread to perform a particular operation.
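One way of arranging this is sketched below. The class, its OnError event and the DoWork method are hypothetical names chosen for illustration, and Synchronize is only one of several possible ways of getting word back to the VCL thread. The handler in Execute catches everything, records a message, and leaves the decision about what to do to whatever the main thread has attached to OnError.

```pascal
unit SafeThread;

{ Hypothetical unit: a worker thread that never lets an exception
  escape from Execute. }

interface

uses
  Classes, SysUtils;

type
  TWorkerThread = class(TThread)
  private
    FErrorMessage: string;
    FOnError: TNotifyEvent;
    procedure ReportError; { runs in the VCL thread via Synchronize }
  protected
    procedure DoWork; virtual;
    procedure Execute; override;
  public
    property ErrorMessage: string read FErrorMessage;
    property OnError: TNotifyEvent read FOnError write FOnError;
  end;

implementation

procedure TWorkerThread.DoWork;
begin
  { Placeholder for the real operation; fails on purpose here. }
  raise Exception.Create('Simulated failure in worker thread');
end;

procedure TWorkerThread.ReportError;
begin
  { The main thread decides how to react, for example with a dialog. }
  if Assigned(FOnError) then
    FOnError(Self);
end;

procedure TWorkerThread.Execute;
begin
  FErrorMessage := '';
  try
    DoWork;
  except
    on E: Exception do
      FErrorMessage := E.Message;
    else
      FErrorMessage := 'Unknown exception';
  end;
  if FErrorMessage <> '' then
    Synchronize(ReportError);
end;

end.
```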

Despite this, there are some situations in threads where dealing with error cases can be particularly difficult. Most of these situations occur when using threads to perform continuous background operations. Recalling chapter 10, the BAB has a couple of threads that forward read and write operations from the VCL thread to a blocking buffer. If an error occurs in either of these threads, the error may show no clear causal relationship with any particular operation in the VCL thread, and it may be difficult to communicate failure instantly back to the VCL thread. Not only this, but an exception in either of these threads is likely to break them out of the read or write loop that they are in, raising the difficult question of whether these threads can be usefully restarted. About the best that can be done is to set some state indicating that all future operations should be failed, forcing the main thread to destroy and re-initialize the buffer.

The best solution is to build the possibility of such problems into the original application design, and to determine which best effort recovery attempts may be made.

The BDE.

In Chapter 7, I indicated that one potential solution to locking problems is to put shared data in a database, and use the BDE to perform concurrency control. The programmer should note that each thread must maintain a separate database connection for this to work properly. Hence, each thread should use a separate TSession object to manage its connection to the database. Each application has a TSessionList component called Sessions to enable this to be done easily. Detailed explanation of multiple sessions is beyond the scope of this document.
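For readers who want a starting point, a thread with its own session might be sketched as follows. This is an untested outline under several assumptions: the class name is hypothetical, the session name is simply derived from the thread ID to keep it unique, and the DBDEMOS alias and customer table are the samples that normally ship with the BDE and may not exist on a given machine.

```pascal
unit DbThread;

{ Hypothetical unit: a thread that runs a query through its own
  TSession rather than the default one used by the VCL thread. }

interface

uses
  Classes, SysUtils, DB, DBTables;

type
  TQueryThread = class(TThread)
  private
    FRowCount: Integer;
  protected
    procedure Execute; override;
  public
    property RowCount: Integer read FRowCount;
  end;

implementation

procedure TQueryThread.Execute;
var
  Session: TSession;
  Query: TQuery;
begin
  Session := TSession.Create(nil);
  try
    { Session names must be unique within the application. }
    Session.SessionName := Format('Thread%d', [ThreadID]);
    Session.Active := True;
    Query := TQuery.Create(nil);
    try
      Query.SessionName := Session.SessionName;
      Query.DatabaseName := 'DBDEMOS'; { assumed sample alias }
      Query.SQL.Text := 'select count(*) from customer';
      Query.Open;
      FRowCount := Query.Fields[0].AsInteger;
      Query.Close;
    finally
      Query.Free;
    end;
  finally
    Session.Free;
  end;
end;

end.
```

In a real application, the body of Execute would also want an exception handler of the kind discussed in the previous section, since database operations are a common source of run time errors.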