On occasion, developers may want to access and work with memory, and with pointers to memory locations, directly. This is necessary, for example, for certain operating system interactions as well as with certain types of time-critical algorithms. To support this capability, C# requires use of the unsafe code construct.
One of C#’s great features is that it is strongly typed and supports type checking throughout runtime execution. What makes this feature especially useful is that you can circumvent the type checking when you need to and manipulate memory and addresses directly. You would do so when working with memory-mapped devices, for example, or when implementing time-critical algorithms. The key is to designate a portion of the code as unsafe.
Unsafe code is an explicit code block and compilation option, as shown in Listing 23.10. The unsafe modifier has no effect on the generated CIL code itself, but rather is a directive to the compiler to permit pointer and address manipulation within the unsafe block. Furthermore, unsafe does not imply unmanaged.
You can use unsafe as a modifier to the type or to specific members within the type.
In addition, C# allows unsafe as a statement that flags a code block to allow unsafe code (see Listing 23.11).
Code within the unsafe block can include unsafe constructs such as pointers.
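As a minimal sketch of the two forms just described (the class and method names are hypothetical, and the code assumes the project is compiled with unsafe code enabled), the modifier and the statement might be used as follows:

```csharp
using System;

public static class UnsafeDemo
{
    // unsafe as a member modifier: the whole method body may use pointers.
    public static unsafe int ReadThroughPointer(int value)
    {
        int* pValue = &value;   // a value parameter is a fixed variable
        return *pValue;
    }

    public static int Increment(int value)
    {
        // unsafe as a statement: only this block may use pointers.
        unsafe
        {
            int* pValue = &value;
            *pValue += 1;
        }
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(ReadThroughPointer(42)); // 42
        Console.WriteLine(Increment(42));          // 43
    }
}
```

Limiting unsafe to the smallest block that needs it keeps the pointer-manipulating code easy to find and review.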
When you write unsafe code, your code becomes vulnerable to the possibility of buffer overflows and similar outcomes that may potentially expose security holes. For this reason, it is necessary to explicitly notify the compiler that unsafe code occurs. To accomplish this, set AllowUnsafeBlocks to true in your CSPROJ file, as shown in Listing 23.12. Alternatively, you can pass the property on the command line when running dotnet build (see Output 23.1). Or, if invoking the C# compiler directly, you need the /unsafe switch (see Output 23.2).
With Visual Studio, you can activate this feature by checking the Allow Unsafe Code checkbox from the Build tab of the Project Properties window.
The /unsafe switch enables you to directly manipulate memory and execute instructions that are unmanaged. Requiring /unsafe, therefore, makes explicit any exposure to potential security vulnerabilities that such code might introduce. With great power comes great responsibility.
Now that you have marked a code block as unsafe, it is time to look at how to write unsafe code. First, unsafe code allows the declaration of a pointer. Consider the following example:
byte* pData;
Assuming pData is not null, its value points to a location that contains one or more sequential bytes; the value of pData represents the memory address of the bytes. The type specified before the * is the referent type—that is, the type located where the value of the pointer refers. In this example, pData is the pointer and byte is the referent type, as shown in Figure 23.1.
Because pointers are simply integers that happen to refer to a memory address, they are not subject to garbage collection. C# does not allow referent types other than unmanaged types, which are types that are not reference types, are not generics, and do not contain reference types. (Starting with C# 8.0, a constructed generic struct whose fields are all unmanaged types also qualifies as an unmanaged type.) Therefore, the following declaration is not valid:
string* pMessage;
Likewise, this declaration is not valid:
ServiceStatus* pStatus;
where ServiceStatus is defined as shown in Listing 23.13. The problem, once again, is that ServiceStatus includes a string field.
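To illustrate the distinction, the following sketch contrasts two hypothetical structs, StatusWithText and StatusCodeOnly (neither name comes from the listings). It assumes a runtime that provides RuntimeHelpers.IsReferenceOrContainsReferences<T>(), which reports whether a type is, or contains, a reference type:

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical struct that contains a reference-type field,
// so it is not an unmanaged type.
struct StatusWithText
{
    public int State;
    public string Description;
}

// Hypothetical struct whose fields are all unmanaged types.
struct StatusCodeOnly
{
    public int State;
    public byte Flags;
}

public static class ReferentTypeDemo
{
    public static unsafe int ReadState()
    {
        StatusCodeOnly status = new StatusCodeOnly { State = 7, Flags = 1 };
        StatusCodeOnly* pStatus = &status;   // allowed: unmanaged referent type
        // StatusWithText* pBad;             // not allowed under the rule above
        return (*pStatus).State;
    }

    public static void Main()
    {
        Console.WriteLine(
            RuntimeHelpers.IsReferenceOrContainsReferences<StatusWithText>()); // True
        Console.WriteLine(
            RuntimeHelpers.IsReferenceOrContainsReferences<StatusCodeOnly>()); // False
        Console.WriteLine(ReadState());                                        // 7
    }
}
```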
In C/C++, multiple pointers within the same declaration are declared as follows:
int *p1, *p2;
Notice the * on p2; this makes p2 an int* rather than an int. In contrast, C# always places the * with the data type:
int* p1, p2;
The result is two variables of type int*. The syntax matches that of declaring multiple arrays in a single statement:
int[] array1, array2;
Pointers are an entirely new category of type. Unlike structs, enums, and classes, pointers don’t ultimately derive from System.Object and are not even convertible to System.Object. Instead, they are convertible (explicitly) to System.IntPtr (which can be converted to System.Object).
In addition to custom structs that contain only unmanaged types, valid referent types include enums, predefined value types (sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, and bool), and pointer types (such as byte**). Lastly, valid syntax includes void* pointers, which represent pointers to an unknown type.
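The following sketch pulls several of these rules together: a multiple-pointer declaration, a pointer to a pointer, a void* pointer, and the explicit conversion to System.IntPtr. The class and variable names are hypothetical, and the code assumes unsafe code is enabled for the project:

```csharp
using System;

public static class PointerTypesDemo
{
    public static unsafe bool RoundTrip()
    {
        int number = 5;

        int* p1 = &number, p2 = p1;   // both p1 and p2 are int*
        int** pp = &p1;               // pointer to a pointer
        void* pv = p1;                // any pointer converts implicitly to void*

        IntPtr address = (IntPtr)pv;  // explicit conversion to System.IntPtr
        object boxed = address;       // IntPtr, unlike a pointer, converts to object

        int* back = (int*)(IntPtr)boxed;
        return back == p2 && **pp == 5 && *back == 5;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip()); // True
    }
}
```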
Once code defines a pointer, it needs to assign a value before accessing it. Just like reference types, pointers can hold the value null, which is their default value. The value stored by the pointer is the address of a location. Therefore, to assign the pointer, you must first retrieve the address of the data.
You could explicitly cast an int or a long into a pointer, but this is rarely useful because you seldom know the address of a particular data value ahead of execution time. Instead, you use the address operator (&) to retrieve the address of a value. Applying it directly to an array element, however, fails to compile:
byte* pData = &bytes[0]; // Compile error
The problem is that in a managed environment, data can move, thereby invalidating the address. The resulting error message will be “You can only take the address of an unfixed expression inside of a fixed statement initializer.” In this case, the byte referenced appears within an array, and an array is a reference type (a movable type). Reference types appear on the heap and are subject to garbage collection and relocation. A similar problem occurs when referring to a value type field on a movable type:
int* a = &"message".Length;
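Either way, assigning the address of some data requires that the following criteria are met: the data must be classified as a variable, the data must be of an unmanaged type, and the variable must be classified as fixed, not movable.
If the data is a variable of an unmanaged type but is not fixed, use the fixed statement to fix the movable variable.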
To retrieve the address of a movable data item, it is necessary to fix, or pin, the data, as demonstrated in Listing 23.14.
Within the code block of a fixed statement, the assigned data will not move. In this example, bytes will remain at the same address, at least until the end of the fixed statement.
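A minimal sketch of pinning an array with the fixed statement might look like the following. The class and method names are hypothetical, and the method assumes the array holds at least two elements:

```csharp
using System;

public static class FixedDemo
{
    // Assumes bytes contains at least two elements.
    public static unsafe int SumFirstTwo(byte[] bytes)
    {
        // The array cannot move while execution is inside this block.
        fixed (byte* pData = &bytes[0])
        {
            return *pData + *(pData + 1);
        }
        // pData is out of scope here, and the array is movable again.
    }

    public static void Main()
    {
        Console.WriteLine(SumFirstTwo(new byte[] { 10, 20, 30 })); // 30
    }
}
```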
The fixed statement requires the declaration of the pointer variable within its scope. This avoids accessing the variable outside the fixed statement, when the data is no longer fixed. However, as a programmer, you are responsible for ensuring that you do not assign the pointer to another variable that survives beyond the scope of the fixed statement—possibly in an API call, for example. Unsafe code is called “unsafe” for a reason; you must ensure that you use the pointers safely, rather than relying on the runtime to enforce safety on your behalf. Similarly, using ref or out parameters will be problematic for data that will not survive beyond the method call.
Since a string is an invalid referent type, it would appear to be invalid to define pointers to strings. However, as in C++, internally a string is a pointer to the first character of an array of characters, and it is possible to declare pointers to characters using char*. Therefore, C# allows for declaring a pointer of type char* and assigning it to a string within a fixed statement. The fixed statement prevents the movement of the string during the life of the pointer. Similarly, it allows any movable type that supports an implicit conversion to a pointer of another type, given a fixed statement.
You can replace the verbose assignment of &bytes[0] with the abbreviated bytes, as shown in Listing 23.15.
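As a sketch of both forms (with hypothetical names, and assuming non-empty inputs), a string can be fixed through a char* and an array can be fixed using the abbreviated syntax:

```csharp
using System;

public static class FixedShorthandDemo
{
    // Fixes a string through a char* and reads its characters.
    public static unsafe int CountUpper(string text)
    {
        int count = 0;
        fixed (char* pText = text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (char.IsUpper(pText[i]))
                {
                    count++;
                }
            }
        }
        return count;
    }

    // Abbreviated form: bytes rather than &bytes[0].
    // Assumes bytes is non-empty.
    public static unsafe byte First(byte[] bytes)
    {
        fixed (byte* pData = bytes)
        {
            return *pData;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountUpper("Hello World"));   // 2
        Console.WriteLine(First(new byte[] { 7, 8 }));  // 7
    }
}
```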
Depending on how often they execute and how long they run, fixed statements can cause fragmentation in the heap because the garbage collector cannot compact fixed objects. To reduce this problem, the best practice is to pin blocks early in the execution and to pin fewer large blocks rather than many small blocks. Unfortunately, this preference must be tempered with the practice of pinning as little as possible for as short a time as possible, so as to minimize the chance that a collection will happen while the data is pinned. To some extent, .NET 2.0 reduces this problem through its inclusion of some additional fragmentation-aware code.
Potentially you might need to fix an object in place in one method body and have it remain fixed until another method is called; this is not possible with the fixed statement. If you are in this unfortunate situation, you can use methods on the GCHandle object to fix an object in place indefinitely. You should do so only if it is absolutely necessary, however; fixing an object for a long time makes it highly likely that the garbage collector will be unable to efficiently compact memory.
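The following is a minimal sketch of pinning with GCHandle, using hypothetical class and method names. For brevity it allocates and frees the handle within a single method; in the cross-method scenario described above, the handle would instead be stored (in a field, for example) and freed later:

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedBufferDemo
{
    public static byte ReadViaPinnedHandle(byte[] buffer, int index)
    {
        // Pin the array; it stays at the same address until Free() is called.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            return Marshal.ReadByte(address, index);
        }
        finally
        {
            // Always release the handle, or the array remains pinned.
            handle.Free();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadViaPinnedHandle(new byte[] { 1, 2, 3 }, 1)); // 2
    }
}
```

Wrapping the work in try/finally is one way to make sure the handle is released even if an exception occurs.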
You should use the fixed statement on an array to prevent the garbage collector from moving the data. However, an alternative is to allocate the array on the call stack. Stack-allocated data is not subject to garbage collection or to the finalizer patterns that accompany it. As with referent types, the requirement is that the stackalloc data is an array of unmanaged types. For example, instead of allocating an array of bytes on the heap, you can place it onto the call stack, as shown in Listing 23.16.
Because the data type is an array of unmanaged types, the runtime can allocate a fixed buffer size for the array and then restore that buffer once the pointer goes out of scope. Specifically, it allocates sizeof(T) * E, where E is the array size and T is the referent type. Given the requirement of using stackalloc only on an array of unmanaged types, the runtime restores the buffer back to the system by simply unwinding the stack, thereby eliminating the complexities of iterating over the f-reachable queue (see the “Garbage Collection” section and discussion of finalization in Chapter 10) and compacting reachable data. Thus, there is no way to explicitly free stackalloc data.
The stack is a precious resource. Although it is small, running out of stack space has a big effect: the program will crash. For this reason, you should make every effort to avoid running out of stack space. If a program does run out of stack space, the best thing that can happen is for the program to shut down/crash immediately. Generally, programs have less than 1MB of stack space (and possibly a lot less). Therefore, take great care to avoid allocating arbitrarily sized buffers on the stack.
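A minimal sketch of a stack allocation with a guarded size might look like this. The names and the 256-element cap are hypothetical, chosen only to show one way of refusing arbitrarily sized requests:

```csharp
using System;

public static class StackAllocDemo
{
    // Hypothetical cap that keeps the buffer small (256 * sizeof(int) bytes).
    private const int MaxCount = 256;

    public static unsafe int SumOfSquares(int count)
    {
        if (count < 0 || count > MaxCount)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        // Allocated on the call stack; reclaimed when the method returns.
        int* squares = stackalloc int[count];
        for (int i = 0; i < count; i++)
        {
            squares[i] = i * i;
        }

        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += squares[i];
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(4)); // 0 + 1 + 4 + 9 = 14
    }
}
```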
Accessing the data that a pointer refers to requires that you dereference the pointer by placing the indirection operator (*) before the expression. For example, byte data = *pData; dereferences the location of the byte referred to by pData and produces a variable of type byte. The variable provides read/write access to the single byte at that location.
Using this principle in unsafe code allows the unorthodox behavior of modifying the “immutable” string, as shown in Listing 23.17 with Output 23.3. In no way is this strategy recommended, even though it does expose the potential of low-level memory manipulation.
In this case, you take the original address and increment it by the size of the referent type (sizeof(char)), using the pre-increment operator. Next, you dereference the address using the indirection operator and then assign the location with a different character. Similarly, using the + and - operators on a pointer changes the address by operand * sizeof(T), where operand is the integer being added or subtracted and T is the referent type.
The comparison operators (==, !=, <, >, <=, and >=) also work to compare pointers. Thus, their use effectively translates to a comparison of address location values.
One restriction on the dereferencing operator is the inability to dereference a void*. The void* data type represents a pointer to an unknown type. Since the data type is unknown, it can’t be dereferenced to produce a variable. Instead, to access the data referenced by a void*, you must convert it to another pointer type and then dereference the latter type.
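The pointer arithmetic, comparison, and void* rules above can be sketched as follows. The names are hypothetical, and the example works on a small stack-allocated buffer rather than on a string:

```csharp
using System;

public static class PointerOperatorsDemo
{
    // Adding 2 to an int* moves the address by 2 * sizeof(int) bytes.
    public static unsafe long ByteDistance()
    {
        int* buffer = stackalloc int[4];
        int* first = buffer;
        int* third = buffer + 2;
        return (byte*)third - (byte*)first;
    }

    // Comparison operators compare the addresses themselves.
    public static unsafe bool IsOrdered()
    {
        int* buffer = stackalloc int[4];
        int* first = buffer;
        int* last = buffer + 3;
        return first < last && first != last;
    }

    // A void* must be converted to a typed pointer before dereferencing.
    public static unsafe int ReadThroughVoid()
    {
        int value = 99;
        void* pUnknown = &value;
        // int bad = *pUnknown;   // would not compile
        return *(int*)pUnknown;
    }

    public static void Main()
    {
        Console.WriteLine(ByteDistance() == 2 * sizeof(int)); // True
        Console.WriteLine(IsOrdered());                       // True
        Console.WriteLine(ReadThroughVoid());                 // 99
    }
}
```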
You can achieve the same behavior as implemented in Listing 23.17 by using the index operator rather than the indirection operator (see Listing 23.18 with Output 23.4).
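As a sketch of the index operator on a pointer, the following example overwrites characters through pText[i], which is equivalent to *(pText + i). To avoid modifying an immutable string, it deliberately operates on a char array copy; the names are hypothetical:

```csharp
using System;

public static class PointerIndexDemo
{
    public static unsafe string Overwrite()
    {
        // Work on a mutable copy rather than on the string itself.
        char[] text = "S5280ft".ToCharArray();

        fixed (char* pText = text)
        {
            pText[1] = 'm';      // same as *(pText + 1) = 'm'
            pText[2] = 'i';
            pText[3] = 'l';
            pText[4] = 'e';
            pText[5] = ' ';
            pText[6] = ' ';
        }

        return new string(text);
    }

    public static void Main()
    {
        Console.WriteLine(Overwrite().TrimEnd()); // Smile
    }
}
```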
Modifications such as those in Listing 23.17 and Listing 23.18 can lead to unexpected behavior. For example, if you reassigned text to "S5280ft" following the Console.WriteLine() statement and then redisplayed text, the output would still be Smile. The reason is that equal string literals are stored only once, so both occurrences of the literal refer to the same string in memory, which the unsafe code has already overwritten. In spite of the apparent assignment
text = "S5280ft";
after the unsafe code in Listing 23.17, the internals of the string assignment are an address assignment of the modified "S5280ft" location, so text is never set to the intended value.
Dereferencing a pointer produces a variable of the pointer’s underlying type. You can then access the members of the underlying type using the member access dot operator in the usual way. However, the rules of operator precedence require that *x.y means *(x.y), which is probably not what you intended. If x is a pointer, the correct code is (*x).y, which is an unpleasant syntax. To make it easier to access members of a dereferenced pointer, C# provides a special member access operator: x->y is a shorthand for (*x).y, as shown in Listing 23.19 with Output 23.5.
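A minimal sketch of the two member access forms, using a hypothetical Angle struct whose fields are all unmanaged types, might look like this:

```csharp
using System;

// Hypothetical struct used only for illustration.
struct Angle
{
    public int Hours;
    public int Minutes;
}

public static class MemberAccessDemo
{
    public static unsafe int TotalMinutes()
    {
        Angle angle = new Angle { Hours = 30, Minutes = 18 };
        Angle* pAngle = &angle;

        pAngle->Minutes = 20;   // shorthand for (*pAngle).Minutes = 20

        // Both forms access the same members of the same struct.
        return pAngle->Hours * 60 + (*pAngle).Minutes;
    }

    public static void Main()
    {
        Console.WriteLine(TotalMinutes()); // 30 * 60 + 20 = 1820
    }
}
```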