In previous posts we discussed Runes and Variables. Today, let's continue our study of Go's basic types by learning more about Strings in Go. Since Strings in Go are not as obvious as in your favorite programming language, we recommend exploring this article at your own pace.
Declaring Strings in Go
You probably know how to declare variables in Go. Declaring strings is as simple as:
var s2 string
A string value can be written as a string literal, a sequence of bytes enclosed in double quotes. Strings in Go can also contain Unicode characters:
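As a minimal sketch of the declarations described above (the variable names and the `hello` helper are hypothetical, chosen only for illustration), a string can be declared with `var` or with a short declaration, and a literal may hold non-ASCII text directly:

```go
package main

import "fmt"

// hello is a hypothetical helper that prefixes name with a greeting.
func hello(name string) string {
	return "Hello, " + name
}

func main() {
	var s1 string        // the zero value of a string is ""
	s2 := "Hello Gopher" // short declaration with a string literal

	fmt.Println(s1 == "")      // true
	fmt.Println(s2)            // Hello Gopher
	fmt.Println(hello("世界")) // Hello, 世界
}
```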
We can also treat Strings as arrays to access parts of them. For example:
fmt.Println(s[6:]) // "Gopher"
fmt.Println(s[:]) // "Hello Gopher"
Concatenating Strings
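As a minimal sketch, strings can be joined with the + operator, and strings.Builder from the standard library is a common choice when a string is assembled piece by piece in a loop. The `joinWords` helper below is hypothetical and only illustrates the idea:

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords is a hypothetical helper that concatenates words with a
// single space between them using strings.Builder.
func joinWords(words []string) string {
	var b strings.Builder
	for i, w := range words {
		if i > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(w)
	}
	return b.String()
}

func main() {
	greeting := "Hello" + " " + "Gopher" // + produces a new string
	greeting += "!"                      // += assigns the new string back
	fmt.Println(greeting)                // Hello Gopher!

	fmt.Println(joinWords([]string{"Hello", "Gopher"})) // Hello Gopher
}
```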
Comparing Strings
Strings may be compared with operators like == and <. And since the comparison is done byte by byte, the result is a sweet natural lexicographic ordering:
name2 := "smith"
fmt.Println(name1 == name2) // false
fmt.Println(name1 > name2) // false
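A self-contained version of the comparison might look like the sketch below. The value of name1 is an assumption made for illustration, and `before` is a hypothetical helper. Note that byte-wise ordering puts every upper-case ASCII letter before every lower-case one:

```go
package main

import "fmt"

// before reports whether a sorts before b in byte-wise order.
func before(a, b string) bool {
	return a < b
}

func main() {
	name1 := "john" // assumed value, for illustration only
	name2 := "smith"

	fmt.Println(name1 == name2) // false
	fmt.Println(name1 > name2)  // false: 'j' (0x6A) is less than 's' (0x73)

	// 'Z' is 0x5A and 'a' is 0x61, so "Zoe" sorts before "adam".
	fmt.Println(before("Zoe", "adam")) // true
}
```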
Substrings
Go also makes it easy to access parts of a string. For example:
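The sketch below assumes a sample string and a hypothetical `firstWord` helper. The slice expression s[i:j] yields the bytes from offset i up to, but not including, offset j; the offsets count bytes, not characters:

```go
package main

import "fmt"

// firstWord returns s up to (not including) the first space,
// or all of s when it contains no space.
func firstWord(s string) string {
	for i := 0; i < len(s); i++ {
		if s[i] == ' ' {
			return s[:i]
		}
	}
	return s
}

func main() {
	s := "Hello Gopher"

	fmt.Println(s[0:5])       // Hello
	fmt.Println(s[:5])        // Hello (a missing start defaults to 0)
	fmt.Println(s[6:])        // Gopher (a missing end defaults to len(s))
	fmt.Println(firstWord(s)) // Hello
}
```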
String Length
If you thought that Go's built-in len function returns the number of characters in a string, think again. As per the official documentation, len applied to a string returns the number of bytes in it, not the number of characters. So if your string contains any non-ASCII character, the result will be larger than the character count. For example:
To solve the above problem, we should resort to the package unicode/utf8:
fmt.Println(utf8.RuneCountInString(s)) // yes! now we have an 8!
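To see both counts side by side, here is a small sketch with an assumed sample string ("café", written with an escape so the code point is unambiguous) and a hypothetical `counts` helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the rune count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "caf\u00e9" // "café": U+00E9 takes 2 bytes in UTF-8

	nBytes, nRunes := counts(s)
	fmt.Println(nBytes) // 5
	fmt.Println(nRunes) // 4
}
```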
Loops over Strings
As per the above, loops over strings should use range instead of indexing up to len, so that each iteration yields a whole character rather than a single byte. Example:
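A minimal sketch, assuming a sample string and a hypothetical `runesOf` helper: range decodes one rune per iteration and reports the byte offset at which that rune starts, so the index can skip values when a rune is wider than one byte.

```go
package main

import "fmt"

// runesOf collects the runes of s by ranging over it.
func runesOf(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := "h\u00e9llo" // "héllo"

	// i is the byte offset of each rune, so it jumps from 1 to 3
	// because U+00E9 occupies two bytes.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println() // the loop printed: 0:h 1:é 3:l 4:l 5:o

	fmt.Println(len(s), len(runesOf(s))) // 6 5
}
```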
Immutability
Another important concept of Strings in Go is that they are immutable. This means that once assigned, the byte sequence contained in a string value cannot be changed:
But, as expected, a string can be reassigned another value:
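Both points can be sketched as follows. The `upperFirst` helper is hypothetical: since the bytes of a string cannot be modified in place, it copies them into a byte slice, changes the copy, and builds a new string.

```go
package main

import "fmt"

// upperFirst returns a copy of s with its first byte upper-cased when
// that byte is an ASCII lower-case letter. The input is never modified.
func upperFirst(s string) string {
	if s == "" || s[0] < 'a' || s[0] > 'z' {
		return s
	}
	b := []byte(s) // copies the bytes; s itself stays unchanged
	b[0] -= 'a' - 'A'
	return string(b)
}

func main() {
	s := "gopher"
	// s[0] = 'G' // does not compile: string bytes cannot be assigned to

	t := upperFirst(s)
	fmt.Println(s, t) // gopher Gopher

	s = "hello"    // reassignment is fine: s now holds a different value
	fmt.Println(s) // hello
}
```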
Escape Sequences
Within a double-quoted string literal, escape sequences that begin with a backslash (\) can be used to insert arbitrary byte values into the string. The most common are:
- \a - “alert” or bell
- \b - backspace
- \f - form feed
- \n - newline
- \r - carriage return
- \t - tab
- \v - vertical tab
- \' - single quote (only in the rune literal '\'')
- \" - double quote (only within "..." literals)
- \\ - backslash
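A few of the escapes listed above in action, as a short sketch (the `tabbed` helper and the sample values are hypothetical):

```go
package main

import "fmt"

// tabbed joins two fields with a tab and ends the line with a newline.
func tabbed(a, b string) string {
	return a + "\t" + b + "\n"
}

func main() {
	fmt.Print(tabbed("name", "john")) // name<TAB>john, then a newline

	fmt.Println("She said \"hi\"") // She said "hi"
	fmt.Println("C:\\temp")        // C:\temp

	// Each escape sequence stands for a single byte in the string.
	fmt.Println(len("\n"), len("\\")) // 1 1
}
```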
Runes, ASCII, Unicode and UTF
And since we're talking Go Strings, Runes, ASCII and Unicode, let's review a little about these topics.
ASCII
ASCII (American Standard Code for Information Interchange) is a character encoding standard created in the 1960s and still widely used. ASCII only supports 128 characters, such as unaccented letters, digits, and a few other symbols.
Unicode
Due to ASCII's limitations, Unicode was created as a superset of it. Today it defines over 140k characters (and has room for more than a million code points), more than enough to cover most of the characters and symbols in use around the world. The Unicode standard defines the Unicode Transformation Formats (UTF) UTF-8, UTF-16, and UTF-32, along with several other encodings.
UTF-8
Today, UTF-8 is the most common encoding on the internet. UTF-8 was invented by Ken Thompson and Rob Pike, two of the creators of Go. It uses between 1 and 4 bytes to represent each rune: only one byte for ASCII characters, and 2 or 3 bytes for most other runes in common use. The first 128 Unicode code points represent the ASCII characters, which means that any ASCII text is also a UTF-8 text.
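The variable width can be checked with utf8.RuneLen from the unicode/utf8 package. The sketch below uses a hypothetical `encodedLen` wrapper and four sample runes chosen for illustration:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen returns how many bytes UTF-8 needs to encode r.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	samples := []rune{'A', '\u00e9', '\u4e16', '\U0001F600'}
	for _, r := range samples {
		fmt.Printf("%U needs %d byte(s)\n", r, encodedLen(r))
	}
	// U+0041 needs 1 byte(s)
	// U+00E9 needs 2 byte(s)
	// U+4E16 needs 3 byte(s)
	// U+1F600 needs 4 byte(s)
}
```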
Unicode Standard Notation
Unicode has a standard notation for code points: U+ followed by the code point in hexadecimal. For example, U+1F600 represents the Unicode character 😀. To get the Unicode value in Go, use the %U verb.
Printing Runes
Runes are usually printed with the following verbs:
- %c: to print the character
- %q: to print the character within quotes
- %U: to print the value of the character in Unicode notation (U+<value>)
For example:
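A minimal sketch of the three verbs, using a hypothetical `describe` helper and sample runes chosen for illustration:

```go
package main

import "fmt"

// describe formats r with the %c, %q and %U verbs, in that order.
func describe(r rune) string {
	return fmt.Sprintf("%c %q %U", r, r, r)
}

func main() {
	fmt.Println(describe('A'))          // A 'A' U+0041
	fmt.Println(describe('\u00e9'))     // é 'é' U+00E9
	fmt.Println(describe('\U0001F600')) // 😀 '😀' U+1F600
}
```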
Other formats can also be used, including:
- %b: base 2
- %o: base 8
- %d: base 10
- %x: base 16, with lower-case letters for a-f
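Because a rune is an integer type, the numeric verbs above print its code point in the chosen base. A short sketch with a hypothetical `bases` helper:

```go
package main

import "fmt"

// bases formats the code point of r in base 2, 8, 10 and 16.
func bases(r rune) string {
	return fmt.Sprintf("%b %o %d %x", r, r, r, r)
}

func main() {
	fmt.Println(bases('A'))      // 1000001 101 65 41
	fmt.Println(bases('\u00ff')) // 11111111 377 255 ff
}
```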
Raw String Literals
A raw string literal is written using backticks (`). Within raw string literals, no escape sequences are processed; the contents are taken literally. For example:
s := `{
"name": "john"
}`
fmt.Println(s)
Prints:
{
"name": "john"
}
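To contrast raw and interpreted literals directly, here is a small sketch (the `sameText` helper is hypothetical). A backslash in a raw literal is just a backslash, and a raw literal may span several lines:

```go
package main

import "fmt"

// sameText reports whether a raw literal and an interpreted literal
// hold exactly the same bytes.
func sameText(raw, interpreted string) bool {
	return raw == interpreted
}

func main() {
	// In the raw literal, \n is two bytes: a backslash and the letter n.
	fmt.Println(len(`\n`), len("\n")) // 2 1

	fmt.Println(sameText(`C:\temp`, "C:\\temp")) // true

	multi := `a
b`
	fmt.Println(sameText(multi, "a\nb")) // true
}
```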
Standard Library Support
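As a small sketch of the kind of help the standard library offers, the strings package covers searching, splitting, and case changes, and strconv converts between strings and numbers. The selection below and the `normalize` helper are assumptions made for illustration, not a complete tour:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalize trims surrounding white space and lower-cases s.
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

func main() {
	fmt.Println(normalize("  Hello Gopher "))           // hello gopher
	fmt.Println(strings.Contains("Hello Gopher", "Go")) // true
	fmt.Println(strings.Split("a,b,c", ","))            // [a b c]

	n, err := strconv.Atoi("42") // string to int
	fmt.Println(n, err)          // 42 <nil>

	fmt.Println(strconv.Itoa(n + 1)) // 43 (int to string)
}
```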
Conclusion
In this post we learned a little more about Strings in Go. Since manipulating Strings is an essential part of a programmer's life, understanding their particularities is important to master the Go programming language.
To summarize, here are some important particularities that you should know:
- strings in Go are immutable sequences of bytes
- strings in Go can contain human-readable text or arbitrary data, including raw bytes
- text strings in Go are conventionally interpreted as UTF-8-encoded sequences of Unicode code points (runes)
- Go source files are always encoded in UTF-8, so string literals can include Unicode code points directly
- strings in Go accept both ASCII characters and other Unicode code points
- a rune whose value is less than 256 can be written with a single hexadecimal escape (e.g., '\x41' for 'A'), but a \u or \U escape must be used for higher values
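The last point can be sketched as follows. The `toString` helper is hypothetical; \x takes two hexadecimal digits, \u takes four, and \U takes eight:

```go
package main

import "fmt"

// toString converts a single rune to its UTF-8 encoded string.
func toString(r rune) string {
	return string(r)
}

func main() {
	a := '\x41'       // hexadecimal escape, value below 256
	b := '\u4e16'     // \u followed by exactly 4 hex digits
	c := '\U0001F600' // \U followed by exactly 8 hex digits

	fmt.Printf("%c %c %c\n", a, b, c) // A 世 😀
	fmt.Println(a == 'A')             // true
	fmt.Println(len(toString(b)))     // 3 (bytes in UTF-8)
}
```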