The fundamental types discussed so far are numeric types. C# includes some additional types as well: bool, char, and string.
Another C# primitive is a Boolean or conditional type, bool, which represents true or false in conditional statements and expressions. Allowable values are the keywords true and false. The BCL name for bool is System.Boolean. For example, to compare two strings in a case-insensitive manner, you call the string.Compare() method and pass a bool literal true (see Listing 2.10).
In this case, you make a case-insensitive comparison of the contents of the variable option with the literal text /Help and assign the result to comparison.
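Here is a minimal sketch of that comparison. The value assigned to option is a hypothetical stand-in for user input. The isHelpRequested variable is added only to show a bool holding the outcome.

```csharp
using System;

// Hypothetical user input; in practice option would come from the command line
string option = "/help";

// Passing the bool literal true requests a case-insensitive comparison
int comparison = string.Compare(option, "/Help", true);

// A bool variable can capture the outcome of the comparison
bool isHelpRequested = comparison == 0;
Console.WriteLine(isHelpRequested);   // True
```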
Although theoretically a single bit could hold the value of a Boolean, the size of bool is 1 byte.
A char type represents 16-bit characters whose set of possible values is drawn from the Unicode character set’s UTF-16 encoding. A char is the same size as a 16-bit unsigned integer (ushort), which represents values between 0 and 65,535. However, char is a unique type in C#, and code should treat it as such.
The BCL name for char is System.Char.
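You can check the size and conversion claims above directly. This sketch assumes a C# 9 or later compiler, which is needed for top-level statements. The variable names are illustrative.

```csharp
using System;

// sizeof works on built-in types without an unsafe context
Console.WriteLine(sizeof(bool));   // 1
Console.WriteLine(sizeof(char));   // 2
Console.WriteLine(sizeof(ushort)); // 2

// char converts implicitly to ushort, but the reverse requires a cast,
// because char is a distinct type rather than an alias for ushort
ushort code = 'A';            // 65
char letter = (char)66;       // 'B'
int maxChar = char.MaxValue;  // 65535
```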
Unicode is an international standard for representing characters found in most human languages. It provides computer systems with functionality for building localized applications—that is, applications that display the appropriate language and culture characteristics for different cultures.
Unfortunately, not all Unicode characters can be represented by just one 16-bit char. The original Unicode designers believed that 16 bits would be enough, but as more languages were supported, it was realized that this assumption was incorrect. As a result, some (rarely used) Unicode characters are composed of “surrogate pairs” of two char values.
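The sketch below shows a surrogate pair in practice. It stores U+1F600, a character outside the 16-bit range, and inspects its two char values. The names are illustrative, and it assumes .NET Core 3.0 or later.

```csharp
using System;

// U+1F600 (grinning face) lies outside the 16-bit range,
// so it is stored as a surrogate pair of two char values
string grin = "\U0001F600";

Console.WriteLine(grin.Length);                   // 2
Console.WriteLine(char.IsHighSurrogate(grin[0])); // True
Console.WriteLine(char.IsLowSurrogate(grin[1]));  // True

// Combine the pair back into a single Unicode code point
int codePoint = char.ConvertToUtf32(grin[0], grin[1]);
Console.WriteLine(codePoint.ToString("X"));       // 1F600
```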
To construct a literal char, place the character within single quotes, as in 'A'. Allowable characters comprise the full range of ANSI keyboard characters, including letters, numbers, and special symbols.
Some characters cannot be placed directly into the source code and instead require special handling. These characters are prefixed with a backslash (\) followed by a special character code. In combination, the backslash and special character code constitute an escape sequence. For example, \n represents a newline and \t represents a tab. Since a backslash indicates the beginning of an escape sequence, it can no longer identify a simple backslash; instead, you need to use \\ to represent a single backslash character.
Listing 2.11 writes out one single quote because the character represented by \' corresponds to a single quote.
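Here is a short sketch of a few escape sequences in char and string literals. The variable names are hypothetical.

```csharp
using System;

char singleQuote = '\'';        // escape needed inside a char literal
string backslash = "\\";        // a single backslash character
string tabbed = "Name\tValue";  // \t inserts a horizontal tab

Console.WriteLine(singleQuote);        // '
Console.WriteLine(backslash.Length);   // 1
Console.WriteLine(tabbed);
```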
In addition to showing the escape sequences, Table 2.5 includes the Unicode representation of characters.
Escape Sequence | Character Name | Unicode Encoding
\' | Single quote | \u0027
\" | Double quote | \u0022
\\ | Backslash | \u005C
\0 | Null character | \u0000
\a | Alert (system beep) | \u0007
\b | Backspace | \u0008
\f | Form feed | \u000C
\n | Line feed (often referred to as a newline) | \u000A
\r | Carriage return | \u000D
\t | Horizontal tab | \u0009
\v | Vertical tab | \u000B
\uxxxx | Unicode character in hexadecimal | \u0029
\x[n][n][n]n | Unicode character in hexadecimal (the first three placeholders are optional); variable-length version of \uxxxx | \x3A
\U00xxxxxx | Unicode escape sequence for characters outside the 16-bit range, which are stored as surrogate pairs (the maximum supported sequence is \U0010FFFF) | \U00020100
\uxxxx\uxxxx | Surrogate pair written explicitly as a high surrogate followed by a low surrogate | \uD83D\uDE00
You can represent any character using Unicode encoding. To do so, prefix the Unicode value with \u. You represent Unicode characters in hexadecimal notation. The letter A, for example, is the hexadecimal value 0x41. Listing 2.12 uses Unicode characters to display a smiley face, :), and Output 2.8 shows the results.
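The following sketch is in the spirit of Listing 2.12, though it is not claimed to match it. It shows \u escapes producing the letter A and the two characters of a smiley face.

```csharp
using System;

// The letter A is U+0041, so these two literals are the same char
char letterA = '\u0041';
Console.WriteLine(letterA == 'A');   // True

// ':' is U+003A and ')' is U+0029, which together form a smiley face
Console.Write('\u003A');
Console.WriteLine('\u0029');         // prints :)

// The same characters written as a single string literal
string smiley = "\u003A\u0029";
```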
A finite sequence of zero or more characters is called a string. The C# keyword is string, whose BCL name is System.String. The string type includes some special characteristics that may be unexpected to developers familiar with other programming languages. In addition to the string literal format discussed in Chapter 1, strings include a “verbatim string” prefix character of @, support for string interpolation with the $ prefix character, and the potentially surprising fact that strings are immutable. In C# 11, raw string literals were also added to the language.
You can enter a literal string into code by placing the text in double quotes ("), as you saw in the HelloWorld program. Strings are composed of characters, and consequently, character escape sequences can be embedded within a string.
In Listing 2.13, for example, two lines of text are displayed. However, instead of using Console.WriteLine(), the code listing shows Console.Write() with the newline character, \n. Output 2.9 shows the results.
The escape sequence for double quotes differentiates the printed double quotes from the double quotes that define the beginning and end of the string.
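As a hedged illustration, and not Listing 2.13 itself, the string below combines \n newlines with \" escapes. The text is made up for the example.

```csharp
using System;

// \n ends each line, and \" places literal double quotes inside the string
string text = "\"Truly\" is the first word.\nThis is the second line.\n";
Console.Write(text);
```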
In C#, you can use the @ symbol in front of a string to signify that a backslash should not be interpreted as the beginning of an escape sequence. In the resulting verbatim string literal, backslashes are not the only characters taken literally: whitespace is also preserved exactly as written. The triangle in Listing 2.14, for example, appears in the console exactly as typed, including the backslashes, newlines, and indentation. Output 2.10 shows the results.
Without the @ character, this code would not even compile. In fact, even if you changed the shape to a square, eliminating the backslashes, the code still would not compile because a newline cannot be placed directly within a string literal that is not prefaced with the @ symbol.
The only escape sequence that a verbatim string supports is "", which represents one double-quote character and does not terminate the string.
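Here is a minimal sketch of verbatim string behavior, using a hypothetical Windows-style path. Backslashes stay literal, "" yields one quote, and a newline typed in the source becomes part of the string.

```csharp
using System;

// In a verbatim string, backslashes are ordinary characters
string path = @"C:\Temp\new";
string escapedPath = "C:\\Temp\\new";
Console.WriteLine(path == escapedPath);   // True

// Doubling the quotes is the only escape a verbatim string recognizes
string quoted = @"She said ""hello"".";
Console.WriteLine(quoted);                // She said "hello".

// Newlines typed in the source become part of the string
string twoLines = @"first
second";
```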
Unlike C++, C# does not automatically concatenate literal strings. You cannot, for example, specify a string literal as follows:
"Major Strasser has been shot."
"Round up the usual suspects."
Rather, concatenation requires the use of the addition operator (+). (If the compiler can calculate the result at compile time, however, the resultant CIL code will be a single string.)
If the same literal string appears within an assembly multiple times, the compiler will define the string only once within the assembly and all variables will refer to the same string. That way, if the same string literal containing thousands of characters was placed multiple times into the code, the resultant assembly would reflect the size of only one of them.
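You can observe the sharing of identical literals with object.ReferenceEquals(). This sketch assumes the standard .NET runtime behavior of interning string literals. Concatenating two literals with + is a compile-time constant expression, so the combined value refers to the same object as well. The strings are illustrative.

```csharp
using System;

// Two identical literals resolve to the same string object
string first = "Round up the usual suspects.";
string second = "Round up the usual suspects.";
Console.WriteLine(object.ReferenceEquals(first, second));   // True

// Concatenating literals with + is evaluated at compile time,
// so the result is the same single literal
string combined = "Round up the " + "usual suspects.";
Console.WriteLine(object.ReferenceEquals(first, combined)); // True
```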
Earlier in the chapter, we mentioned that characters in .NET are all Unicode. Specifically, they are UTF-16, requiring two bytes per char (characters outside the 16-bit range need a surrogate pair of two chars). Therefore, strings, which are composed of characters, are all Unicode (UTF-16) as well.
Occasionally, especially when working with externally defined formats, it will be necessary to work with UTF-8 strings. C# 11 allows you to specify a UTF-8 string literal by appending the suffix u8 after the string’s closing quote. However, the resulting data type is not a .NET string (System.String) but rather the generic type ReadOnlySpan<byte>. An example of such a declaration and assignment, therefore, would be:
ReadOnlySpan<byte> rates = "€ rates"u8;
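This sketch contrasts the UTF-8 byte count of that literal with the UTF-16 length of the equivalent string. It assumes C# 11 or later on .NET 7 or later, and a UTF-8 encoded source file. Encoding.UTF8.GetString() is used only to convert back for comparison.

```csharp
using System;
using System.Text;

// The u8 suffix produces UTF-8 bytes rather than a UTF-16 string
ReadOnlySpan<byte> rates = "€ rates"u8;

// '€' (U+20AC) needs three bytes in UTF-8; each other character needs one
Console.WriteLine(rates.Length);      // 9

// The equivalent System.String holds seven UTF-16 chars
string ratesText = Encoding.UTF8.GetString(rates);
Console.WriteLine(ratesText.Length);  // 7
```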
As discussed in Chapter 1, strings can support embedded expressions when using the string interpolation format.4 The string interpolation syntax prefixes a string literal with a dollar symbol and then embeds the expressions within curly brackets. Listing 2.15 is an example in which firstName and lastName are simple expressions that refer to variables.
Starting in C# 11, you can place newlines between the curly braces (Listing 2.16). In fact, any valid C# expression that returns a single value is allowed. Prior to C# 11, newlines were supported only in verbatim interpolated strings.
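The sketch below illustrates both points and assumes a C# 11 compiler. It places arbitrary expressions inside the braces and includes a hole that spans several lines. The names are hypothetical sample values.

```csharp
using System;

string firstName = "Inigo";
string lastName = "Montoya";

// Any expression that produces a value can appear inside the braces
string greeting = $"Hello, {firstName} {lastName.ToUpper()} ({firstName.Length + lastName.Length} letters)";

// Starting in C# 11, the expression inside the braces may span lines
string initials = $"Initials: {
    firstName[0]
}{
    lastName[0]
}";

Console.WriteLine(greeting);  // Hello, Inigo MONTOYA (12 letters)
Console.WriteLine(initials);  // Initials: IM
```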
Note that verbatim string literals can be combined with string interpolation by specifying the $ prior to the @ symbol (or @$"..." starting in C# 8.0), as in Listing 2.17.
Since this is a verbatim string literal, the text is output on two lines. You can, however, make a similar line break in the code without incurring a line break in the output by placing the line feeds inside the curly braces of a verbatim string (even prior to C# 11). The problem with verbatim strings, however, is that the whitespace on all lines outside of the expressions is significant. Therefore, you have to outdent the code to keep its indentation from appearing in the output. To resolve this formatting idiosyncrasy, C# 11 introduced raw string literals.
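This brief sketch uses a hypothetical name value. It shows that $@ and @$ are interchangeable, and that a line break inside the braces of a verbatim interpolated string does not reach the output.

```csharp
using System;

string name = "Inigo";

// $@ and @$ produce the same verbatim interpolated string
string a = $@"Hello {name},
goodbye";
string b = @$"Hello {name},
goodbye";
Console.WriteLine(a == b);  // True

// A line break inside the braces does not appear in the output
string c = $@"Hello {
    name}, goodbye";
Console.WriteLine(c);       // Hello Inigo, goodbye
```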
Raw string literals were added in C# 11. Like verbatim string literals, they enable embedding arbitrary text, including whitespace and newlines. Unlike verbatim strings, they also allow quotes and other special characters without any escape sequences. The simpler form is a single-line raw string literal, such as in Listing 2.18.
The distinguishing characteristic, of course, is that both the opening and closing sets of quotes appear on the same line.
Notice that we don’t need to escape the quotes embedded within the string. However, there is a space after the closing quote of the chocolates quotation because we can’t end the string with four double quotes when it begins with only three. The closing quote count must match the opening quote count; if they don’t match, the compiler issues an error. (The solution to avoiding the extra space is provided by multiline raw string literals later in the section.)
The reason that raw string literals allow for more than three double quotes to delineate the raw string literal is to allow placement of three (or more) consecutive double quotes within the text. With five double quotes at the start and end of the raw string literal, for example, you can place four consecutive double quotes within the text and have them be interpreted literally as four consecutive double quotes.
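Here is a minimal sketch of single-line raw string literals (C# 11 or later). The text content is invented for illustration. The second literal uses five delimiting quotes so that a run of four quotes can appear inside it.

```csharp
using System;

// Three quotes delimit the literal, so embedded quotes need no escaping
string line = """He said "hello" and left.""";
Console.WriteLine(line);

// Five delimiting quotes allow a run of four quotes inside the text
string fourQuotes = """""A run of four: """" in the middle.""""";
Console.WriteLine(fourQuotes);
```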
You can also include interpolation in raw string literals, as in Listing 2.19.
However, an additional complexity is that the number of curly braces around each expression must match the number of dollar symbols ($) at the beginning of the string. For example, two dollar symbols require each expression to be surrounded by two curly braces. Listing 2.20 provides an example in which curly braces surround mobius in Output 2.11. (Two dollar symbols require two curly braces to render the name expression; because the code uses three curly braces on each side, one brace on each side appears in the output.)
Additionally, Listing 2.20 demonstrates the multiline form, known as multiline raw string literals.
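The brace-counting rule can be sketched with a multiline literal like the following (C# 11 or later). Here, name is a hypothetical stand-in for the mobius example, and the exact layout of Listing 2.20 is not reproduced.

```csharp
using System;

string name = "mobius";

// With $$, an expression needs two braces; the third brace on each side
// is emitted literally, so the name appears wrapped in braces.
// Single braces, as on the second content line, are ordinary text.
string text = $$"""
    Name: {{{name}}}
    Literal braces: { }
    """;
Console.WriteLine(text);
```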
Note that even though the code is indented in Listing 2.20, the output is not indented. All whitespace to the left of the closing triple quotes, on every line of the raw string literal, is removed. The start of the closing triple quotes, therefore, establishes the character column that will be leftmost in the output. For this reason, no line of the literal may contain anything other than whitespace to the left of the column where the closing triple quotes begin; otherwise, the compiler reports an error.
Also, with multiline raw string literals, no text may follow the opening set of double quotes on the first line or precede the closing set of double quotes on the last line. Anything other than whitespace following the $$""" opening sequence or preceding the closing """ will result in a compile error.
By using a multiline raw string literal, you can avoid the problematic final quote within your text, as shown in the assignment of the mamaSaid variable in Listing 2.21.
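The sketch below applies these rules to invented text (C# 11 or later). The closing quotes are indented four spaces, so four spaces are stripped from each content line. Extra indentation beyond that is kept, and the final line can end with a quote.

```csharp
using System;

// The closing quotes sit four spaces in, so four spaces of
// indentation are removed from every content line
string saying = """
    Life is like a box of chocolates.
      (indented two extra spaces)
    Mama said, "You never know what you're gonna get."
    """;

string[] lines = saying.Split('\n');
// Prints the second line with its two leading spaces intact
Console.WriteLine(lines[1].TrimEnd('\r'));

// The text can end with a quote because the delimiter is on its own line
Console.WriteLine(saying.EndsWith("get.\""));  // True
```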
The text for the jsonDialogue variable in Listing 2.21 is, in fact, a special text format called JSON (JavaScript Object Notation). The text is a collection of hierarchical name-value pairs that is compatible with JavaScript. Initially, JSON was prevalent mainly in web programming, but the format is now a common means of transferring data. Support for raw string literals is especially important for JSON because it allows direct placement of JSON strings into C# programs without complex and error-prone escape sequences.
The backslashes within the raw string literal assigned to jsonDialogue have no C# significance. Each string in JSON, both the name and the value, is surrounded by quotes. Therefore, any quote within a JSON value is escaped with a backslash. However, because the entire string value is a raw string literal in which backslashes have no special significance, the backslashes are just normal characters within the represented JSON string. This is a huge advantage of using raw string literals for placing JSON within C#. (Regular expressions are another format for which raw string literals prove extremely helpful.)
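The sketch below places JSON containing backslash-escaped quotes in a raw string literal. It then parses the text with System.Text.Json's JsonDocument, assuming .NET Core 3.0 or later. The property names are hypothetical. The JSON parser, not the C# compiler, turns \" into a quote.

```csharp
using System;
using System.Text.Json;

// Inside the raw string literal, \" is just a backslash followed by a quote;
// the JSON parser is what treats it as an escape
string json = """
    {
      "speaker": "Forrest",
      "line": "Mama always said, \"Life is like a box of chocolates.\""
    }
    """;

using JsonDocument document = JsonDocument.Parse(json);
var line = document.RootElement.GetProperty("line").GetString();
Console.WriteLine(line);  // Mama always said, "Life is like a box of chocolates."
```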
In addition, there is the invocation of the Replace() method embedded in the JSON. A list of the common string methods appears in the next section.
Conceptually, string interpolation is shorthand for invoking the string.Format() method. For example, a statement such as
Console.WriteLine(
$"Your full name is {firstName} {lastName}.");
will be transformed into the C# equivalent of
object[] args = new object[] { firstName, lastName };
Console.WriteLine(string.Format("Your full name is {0} {1}.", args));
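Whatever CIL the compiler actually emits, the resulting text is the same as the string.Format() form. This sketch uses hypothetical names. The array is named formatArgs because top-level statements already define an implicit args parameter.

```csharp
using System;

string firstName = "Inigo";
string lastName = "Montoya";

string interpolated = $"Your full name is {firstName} {lastName}.";

// The composite-format equivalent
object[] formatArgs = new object[] { firstName, lastName };
string formatted = string.Format("Your full name is {0} {1}.", formatArgs);

Console.WriteLine(interpolated == formatted);  // True
```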
This leaves in place support for localization in the same way it works with composite strings and doesn’t introduce any post-compile injection of code via strings.
The string type, like the Console type, includes several methods. There are methods, for example, for formatting, concatenating, and comparing strings.
The Format() method in Table 2.6 behaves similarly to the Console.Write() and Console.WriteLine() methods, except that instead of displaying the result in the console window, string.Format() returns the result to the caller. Of course, with string interpolation, the need for string.Format() is significantly reduced (except for localization support). Under the covers, however, string interpolation compiles down to a constant when the entire interpolated string is constant (supported starting in C# 10). Otherwise, it compiles to invocations of methods such as string.Concat() and string.Format() or, starting with C# 10, the System.Runtime.CompilerServices.DefaultInterpolatedStringHandler type.
All of the methods in Table 2.6 are static. This means that, to call the method, it is necessary to prefix the method name (e.g., Concat) with the type that contains the method (e.g., string). As illustrated later in this chapter, however, some of the methods in the string class are instance methods. Instead of prefixing the method with the type, instance methods use the variable name (or some other reference to an instance). Table 2.7 shows a few of these methods, along with examples of their use.
Statement | Example

static string string.Format(string format, ...) |
    string text, firstName, lastName;
    //...
    text = string.Format("Your full name is {0} {1}.",
        firstName, lastName);
    // Display
    // "Your full name is <firstName> <lastName>."
    Console.WriteLine(text);

static string string.Concat(string str0, string str1) |
    string text, firstName, lastName;
    //...
    text = string.Concat(firstName, lastName);
    // Display "<firstName><lastName>", notice
    // that there is no space between names
    Console.WriteLine(text);

static int string.Compare(string str0, string str1) |
    // 1.
    string option;
    //...
    // String comparison in which case matters
    int result = string.Compare(option, "/help");
    // Display:
    //     0 if equal
    //     negative if option < /help
    //     positive if option > /help
    Console.WriteLine(result);
    // 2.
    string option;
    //...
    // Case-insensitive string comparison
    int result = string.Compare(
        option, "/Help", true);
    // Display:
    //     0 if equal
    //     < 0 if option < /help
    //     > 0 if option > /help
    Console.WriteLine(result);
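Here is a runnable sketch of the three static methods in the preceding table. Hypothetical sample values replace the //... placeholders.

```csharp
using System;

string firstName = "Inigo";
string lastName = "Montoya";

// string.Format: composite formatting with numbered placeholders
string text = string.Format("Your full name is {0} {1}.", firstName, lastName);

// string.Concat: joins the arguments with no separator
string joined = string.Concat(firstName, lastName);

// string.Compare: 0 means equal; passing true ignores case
string option = "/HELP";
int caseSensitive = string.Compare(option, "/help");
int caseInsensitive = string.Compare(option, "/help", true);

Console.WriteLine(text);                  // Your full name is Inigo Montoya.
Console.WriteLine(joined);                // InigoMontoya
Console.WriteLine(caseSensitive != 0);    // True
Console.WriteLine(caseInsensitive);       // 0
```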
Statement | Example

bool StartsWith(string value)
bool EndsWith(string value) |
    string lastName;
    //...
    bool isPhd = lastName.EndsWith("Ph.D.");
    bool isDr = lastName.StartsWith("Dr.");

string ToLower()
string ToUpper() |
    string severity = "warning";
    // Display the severity in uppercase
    Console.WriteLine(severity.ToUpper());

string Trim()
string Trim(...)
string TrimEnd()
string TrimStart() |
    // 1.
    // Remove any whitespace from
    // both the start and end
    username = username.Trim();
    // 2.
    string text = "indiscriminate bulletin";
    // Remove 'i' and 'n' from both the start and end
    text = text.Trim("in".ToCharArray());
    // Display: discriminate bullet
    Console.WriteLine(text);

string Replace(string oldValue, string newValue) |
    string filename;
    //...
    // Remove ?'s from anywhere in the string
    filename = filename.Replace("?", "");
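This sketch exercises the instance methods from the preceding table with hypothetical sample values. The severity example uses "error" rather than "warning" so that the uppercase result does not depend on culture-specific casing rules for the letter i.

```csharp
using System;

string lastName = "Dr. Watson, Ph.D.";
bool isPhd = lastName.EndsWith("Ph.D.");
bool isDr = lastName.StartsWith("Dr.");

string severity = "error";
string upper = severity.ToUpper();   // severity itself is unchanged

string username = "  inigo  ";
string trimmed = username.Trim();    // whitespace removed from both ends

// Removes 'i' and 'n' from both the start and the end
string text = "indiscriminate bulletin".Trim("in".ToCharArray());

string filename = "report?.txt";
string cleaned = filename.Replace("?", "");

Console.WriteLine(text);             // discriminate bullet
```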
Whether you use string.Format() or string interpolation to construct complex formatting strings, a rich and complex set of composite formatting patterns is available to display numbers, dates, times, time spans, and so on. For example, if price is a variable of type decimal, then string.Format("{0,20:C2}", price) and the equivalent interpolation $"{price,20:C2}" both convert the decimal value to a string using the default currency formatting rules, rounded to two digits after the decimal point and right-justified in a 20-character-wide string. Space does not permit a detailed discussion of all the possible formatting strings; consult the MSDN documentation on composite formatting for a complete listing of possibilities.
If you want an actual left or right curly brace inside an interpolated string or formatted string, you can double the brace to indicate that it is not introducing a pattern. For example, the interpolated string $"{{ {price:C2} }}" might produce the string "{ $1,234.56 }".
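The sketch below covers alignment, format strings, and doubled braces. To keep the output independent of the machine's regional settings, it uses the invariant culture through CultureInfo.InvariantCulture and FormattableString.Invariant(). As a result, the currency symbol differs from the $ shown above, and the brace example uses F2 instead of C2 for the same reason. The price value is hypothetical.

```csharp
using System;
using System.Globalization;

decimal price = 1234.5m;

// Alignment 20 right-justifies; C2 applies currency formatting with two decimals
string viaFormat = string.Format(CultureInfo.InvariantCulture, "{0,20:C2}", price);
string viaInterpolation = FormattableString.Invariant($"{price,20:C2}");

Console.WriteLine($"[{viaFormat}]");
Console.WriteLine(viaFormat == viaInterpolation);  // True

// Doubled braces produce literal braces around the formatted value
string braced = FormattableString.Invariant($"{{ {price:F2} }}");
Console.WriteLine(braced);  // { 1234.50 }
```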
When writing out a newline, the exact characters for the newline depend on the operating system on which you are executing. On Microsoft Windows operating systems, the newline is the combination of both the carriage return (\r) and newline (\n) characters, while a single line feed is used on UNIX. One way to overcome the discrepancy between operating systems is simply to use Console.WriteLine() to output a blank line. Another approach, which is almost essential for working with newlines from the same code base on multiple operating systems, is to use System.Environment.NewLine. In other words, Console.WriteLine("Hello World") and Console.Write($"Hello World{Environment.NewLine}") are equivalent. However, on Windows, WriteLine() and Console.Write(Environment.NewLine) are equivalent to Console.Write("\r\n")—not Console.Write("\n"). In summary, rely on WriteLine() and Environment.NewLine rather than \n to accommodate Windows-specific operating system idiosyncrasies with the same code that runs on Linux and macOS.
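This sketch checks Environment.NewLine and uses a StringWriter to capture what WriteLine() appends, rather than writing to the console. It assumes .NET 5 or later for OperatingSystem.IsWindows().

```csharp
using System;
using System.IO;

// Environment.NewLine is "\r\n" on Windows and "\n" on Linux and macOS
string expected = OperatingSystem.IsWindows() ? "\r\n" : "\n";
Console.WriteLine(Environment.NewLine == expected);  // True

// WriteLine() appends the writer's NewLine, which defaults to Environment.NewLine;
// a StringWriter lets us inspect exactly what gets written
using StringWriter writer = new StringWriter();
writer.WriteLine("Hello World");
Console.WriteLine(writer.ToString() == $"Hello World{Environment.NewLine}");  // True
```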
The Length member referred to in the following section is not actually a method, as indicated by the fact that no parentheses follow it when it is accessed. Length is a property of string, and C# syntax allows access to a property as though it were a member variable (known in C# as a field). In other words, a property has the behavior of special methods called setters and getters, but the syntax for accessing that behavior is that of a field.
Examining the underlying CIL implementation of a property reveals that it compiles into two methods: set_<PropertyName> and get_<PropertyName>. Neither of these, however, is directly accessible from C# code, except through the C# property constructs. See Chapter 6 for more details on properties.
To determine the length of a string, you use a string member called Length. This particular member is called a read-only property. As such, it cannot be set, nor does calling it require any parameters. Listing 2.22 demonstrates how to use the Length property, and Output 2.12 shows the results.
The length for a string cannot be set directly; it is calculated from the number of characters in the string. Furthermore, the length of a string cannot change because a string is immutable.
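Here is a minimal sketch of reading Length, using a hypothetical string. The reflection lines only confirm the get_Length method described earlier and show that no setter exists.

```csharp
using System;

string palindrome = "madam";

// Length is accessed like a field: no parentheses follow it
int length = palindrome.Length;
Console.WriteLine(length);  // 5

// palindrome.Length = 10;   // would not compile: Length is read-only

// Reflection shows the getter method generated for the Length property;
// because Length is read-only, there is no setter
var lengthProperty = typeof(string).GetProperty("Length");
Console.WriteLine(lengthProperty?.GetMethod?.Name);  // get_Length
Console.WriteLine(lengthProperty?.CanWrite);         // False
```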
A key characteristic of the string type is that it is immutable. A string variable can be assigned an entirely new value, but there is no facility for modifying the contents of a string. It is not possible, therefore, to convert an existing string to all uppercase letters in place. It is trivial to create a new string that is an uppercase version of the old string, but the old string is not modified in the process. Consider Listing 2.23 as an example.
Output 2.13 shows the results of Listing 2.23.
At a glance, it would appear that text.ToUpper() should convert the characters within text to uppercase. However, strings are immutable and, therefore, text.ToUpper() will make no such modification. Instead, text.ToUpper() returns a new string that needs to be saved into a variable or passed to Console.WriteLine() directly. The corrected code is shown in Listing 2.24, and its output is shown in Output 2.14.
If the immutability of a string is ignored, mistakes like the one shown in Listing 2.23 can occur with other string methods as well.
To actually change the value of text, assign the value from ToUpper() back into text, as in the following code:
text = text.ToUpper();
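Here is a minimal sketch of the mistake and its fix, using a hypothetical lowercase string.

```csharp
using System;

string text = "hello";

// Mistake: ToUpper() returns a new string, and that result is discarded here
text.ToUpper();
Console.WriteLine(text);      // hello

// Fix: keep a reference to the original, then assign the new string to text
string original = text;
text = text.ToUpper();
Console.WriteLine(text);      // HELLO
Console.WriteLine(original);  // hello
```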
If considerable string modification is needed, such as when constructing a long string in multiple steps, you should use the data type System.Text.StringBuilder rather than string. The StringBuilder type includes methods such as Append(), AppendFormat(), Insert(), Remove(), and Replace(), some of which are also available with string. The key difference, however, is that with StringBuilder these methods will modify the data in the StringBuilder itself and will not simply return a new string.
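This short sketch shows StringBuilder modifying its own contents in place. The text and the sequence of calls are invented for illustration.

```csharp
using System;
using System.Text;

var builder = new StringBuilder();
builder.Append("Hello");                               // Hello
builder.Append(", World");                             // Hello, World
builder.Insert(0, ">> ");                              // >> Hello, World
builder.Replace("World", "C#");                        // >> Hello, C#
builder.Remove(0, 3);                                  // Hello, C#
builder.AppendFormat(" [{0} chars]", builder.Length);  // Hello, C# [9 chars]

// Each call above changed the same builder; ToString() produces the final string
string result = builder.ToString();
Console.WriteLine(result);
```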
Two additional keywords relating to types are null and void. The null value, identified with the null keyword, indicates that the variable does not refer to any valid object. void is used to indicate the absence of a type or the absence of any value altogether.
null can also be used as a type of “literal.” The value null indicates that a variable is set to nothing. Code that sets a variable to null explicitly assigns a “nothing” value. In fact, it is even possible to check whether a variable refers to a null value.
Assigning the value null is not equivalent to not assigning it at all. In other words, a variable that has been assigned null has still been set, whereas a variable with no assignment has not been set and, therefore, will often cause a compile error if used prior to assignment.
Note that assigning the value null to a string variable is distinctly different from assigning an empty string, "". Use of null indicates that the variable has no value, whereas "" indicates that there is a value—an empty string. This type of distinction can be quite useful. For example, the programming logic could interpret a string variable named homePhone of null to mean that the home phone number is unknown, while a homePhone value of "" could indicate that there is no home phone number.
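The homePhone distinction can be sketched with a hypothetical Describe() helper. The #nullable enable directive is included only so that the string? annotations compile without warnings.

```csharp
using System;

#nullable enable

// Hypothetical helper: null means unknown, "" means known to have no phone
static string Describe(string? phone) =>
    phone == null ? "unknown" :
    phone.Length == 0 ? "no home phone" :
    $"home phone: {phone}";

string? homePhone = null;       // the number is unknown
Console.WriteLine(Describe(homePhone));

homePhone = "";                 // known: there is no home phone
Console.WriteLine(Describe(homePhone));
```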
Listing 2.25 demonstrates assigning null to an integer variable by adding the nullable modifier, a question mark (?), to the type declaration.
Prior to the nullable modifier,6 there was no way to assign null to a variable of any of the types we have introduced so far (value types) except string (which is a reference type). (See Chapter 3 for more information on value types and reference types.)
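Here is a minimal sketch of a nullable integer. The variable age is hypothetical, and int? is shorthand for System.Nullable<int>.

```csharp
using System;

// Without the ?, int cannot hold null; int? (Nullable<int>) can
int? age = null;
Console.WriteLine(age.HasValue);  // False
Console.WriteLine(age == null);   // True

age = 42;
Console.WriteLine(age.Value);     // 42
```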
Furthermore, prior to C# 8.0, reference types (like string) supported null assignment by default, and as such, you couldn’t decorate a reference type with a nullable modifier. Since null could implicitly be assigned anyway, decorating it with a nullable modifier was redundant.
Prior to C# 8.0, since all reference types were nullable by default, there was no concept of a nullable reference type—all reference types just were nullable. In C# 8.0, however, the behavior became configurable such that reference types could be declared as nullable with the nullable modifier or default to not null otherwise. In so doing, C# 8.0 introduced the concept of nullable reference types. Reference type variables with the nullable modifier are nullable reference types. And, when nullable reference types are enabled, a warning will appear when assigning null to a variable of reference type without the nullable modifier.
The only reference type covered so far in this book is string. If configured to support the nullable modifier on reference types, you could, for example, declare a string variable such as
string? homeNumber = null;
To enable nullable reference types for a project, be sure that the Nullable element in the .csproj file is set to enable.
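As an alternative to the project-wide setting, a #nullable enable directive turns on nullable reference types from that point to the end of a source file. This sketch uses hypothetical phone-number variables.

```csharp
using System;

// Same effect as <Nullable>enable</Nullable>, but only for the rest of this file
#nullable enable

string? homeNumber = null;       // allowed: declared as a nullable reference type
string workNumber = "555-0100";  // non-nullable reference type
// workNumber = null;            // would produce a nullable warning

Console.WriteLine(homeNumber ?? "(unknown)");
Console.WriteLine(workNumber.Length);  // 8
```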
Sometimes the C# syntax requires a data type to be specified, but no data is passed. For example, if no return from a method is needed, C# allows you to specify void as the data type instead. The declaration of Main within the HelloWorld program (Listing 1.1) is an example. The use of void as the return type indicates that the method is not returning any data and tells the compiler not to expect a value. void is not a data type per se but rather an indication that there is no data being returned.
In both C++ and C#, void has two meanings: as a marker that a method does not return any data and to represent a pointer to a storage location of unknown type. In C++ programs, it is quite common to see pointer types such as void**. C# can also represent pointers to storage locations of unknown type using the same syntax, but this usage is comparatively rare in C# and typically encountered only when writing programs that interoperate with unmanaged code libraries.
________________________________________