Kodeclik Blog
What is a string?
A string is a data type used in programs to denote a sequence of characters. Strings can be used to represent names, addresses, documents, emails, and messages.
Strings are available in practically every programming language. We will use Python to illustrate strings but similar notation is available in other languages as well.
Strings can be used to denote names. For instance, “Mickey Mouse” is a string which we can store in a variable called, say, “name”. Here is some Python code for this purpose:
name = "Mickey Mouse"
print(name)
The first line declares the variable called “name” and stores “Mickey Mouse” in it. When we print this string variable, we obtain the output:
Mickey Mouse
Strings are case sensitive
Because a string is a sequence of characters, the specific sequence of characters uniquely defines the string. Thus, for instance, the following two variables do not denote the same string.
name1 = "Mickey Mouse"
name2 = "mickey mouse"
We can confirm that these are not the same as far as Python is concerned with the following piece of code:
name1 = "Mickey Mouse"
name2 = "mickey mouse"
print(name1 == name2)
The output is:
False
We can expand the above program to create a replica of the first string, like so:
name1 = "Mickey Mouse"
name2 = "mickey mouse"
name3 = "Mickey Mouse"
print(name1 == name2)
print(name1 == name3)
The output is, as expected:
False
True
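If you do want a comparison that ignores case, one common approach is to convert both strings to the same case before comparing. The sketch below uses Python's built-in lower() string method; the names exact_match and loose_match are just illustrative.

```python
name1 = "Mickey Mouse"
name2 = "mickey mouse"

# A plain comparison is case sensitive.
exact_match = name1 == name2

# Lowercasing both sides first makes the comparison ignore case.
loose_match = name1.lower() == name2.lower()

print(exact_match)  # False
print(loose_match)  # True
```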
Strings are taken literally
A string is just a collection of characters - nothing more, nothing less. For instance, the following is a string:
homework = "3+4"
print(homework)
The output is:
3+4
Note that “3+4” is a string because it is enclosed in quotes. Strings are not evaluated in any manner, so “3+4” when printed gives, literally, “3+4”.
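To make the contrast concrete, here is a small sketch comparing the quoted and unquoted forms. The variable names length, total, and joined are made up for this illustration.

```python
homework = "3+4"

# The string holds three characters: "3", "+", and "4".
length = len(homework)
print(length)  # 3

# Without quotes, 3 + 4 is arithmetic and is evaluated.
total = 3 + 4
print(total)  # 7

# Between two strings, + joins them together instead of adding.
joined = "3" + "4"
print(joined)  # 34
```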
Strings should not be confused with integers
Consider the following program:
x = 5
y = "5"
if x == y:
    print("They are the same")
else:
    print("They are different")
The program creates a variable called “x” and assigns it an integer value, namely the number 5. Then the program creates a variable called “y” and assigns it a string value, namely the string “5”. Python treats an integer and a string as different values even when they look alike, so the comparison x == y is false. As a result, the rest of the program prints:
They are different
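If you need to compare a number with a string that holds digits, you can convert one of them first. This is a minimal sketch using Python's built-in int() and str() functions; the names same_as_int and same_as_str are illustrative.

```python
x = 5
y = "5"

# int() turns the string "5" into the integer 5.
same_as_int = x == int(y)

# str() turns the integer 5 into the string "5".
same_as_str = str(x) == y

print(same_as_int)  # True
print(same_as_str)  # True
```

Note that int() raises an error if the string does not look like a whole number, so this only works when you know the string contains digits.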
When should I use strings?
You should use strings to store and represent anything that should be interpreted literally, without any calculation or computation. For instance, names, addresses, paragraphs of text, books, the content of a webpage, and messages (like emails and tweets) can all be stored as strings. Note that a complete document comprising multiple paragraphs can be stored as a single string using special characters such as “newlines” (denoted as “\n” in many languages). For instance, below is a short letter with embedded newlines:
letter = (
    "\nDear Mr. Sacks,\n\nHope you are doing well.\n\n"
    "I am going to miss music class tomorrow as I have a field trip "
    "and my return will be delayed.\n\nPlease excuse my absence.\n\n"
    "Thank you,\nTommy Student"
)
print(letter)
The output of this program will be:

Dear Mr. Sacks,

Hope you are doing well.

I am going to miss music class tomorrow as I have a field trip and my return will be delayed.

Please excuse my absence.

Thank you,
Tommy Student
How are strings stored in memory?
In most programming languages, strings are stored in memory as an array of bytes. An encoding scheme such as ASCII or a Unicode encoding (for example, UTF-8) determines which numbers represent which characters. In ASCII each character takes exactly one byte, while Unicode encodings may use several bytes for a single character. For instance, a string such as “Kodeclik” is stored as a sequence of numbers, each number being the code of one of the characters in the string. The code for “K” is 75, the code for “o” is 111, and so on (for these characters the ASCII and Unicode codes are the same). We can confirm this by using the ord() function in Python:
name = "Kodeclik"
for i in name:
    print(ord(i))
The output is:
75
111
100
101
99
108
105
107
Notice from the output above that characters near each other in the alphabet have nearby codes. For instance, “i” is represented as 105 and “k” as 107, because only one character (“j”, represented as 106) separates them.
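Python's built-in chr() function goes the other way, from a number back to a character. The sketch below uses it to step from “i” to the next letter and to rebuild a string from its codes; the variable names are illustrative.

```python
# ord() gives the code of a character; chr() gives the character for a code.
code_i = ord("i")
next_letter = chr(code_i + 1)
print(code_i)       # 105
print(next_letter)  # j

# Converting every character to its code and back rebuilds the string.
codes = [ord(ch) for ch in "Kodeclik"]
rebuilt = "".join(chr(c) for c in codes)
print(rebuilt)      # Kodeclik
```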
If you peer even deeper, strings are stored in consecutive bytes of memory. To mark where a string ends, some languages (C, for example) place a special “end of string” character after the last character, while others (Python among them) record the string's length instead. This is an implementation detail that most programmers will not need to worry about.
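To get a feel for the bytes behind a string, you can ask Python to encode it. This sketch assumes the UTF-8 encoding and shows one possible byte representation, not necessarily the exact layout Python uses internally. The second string, “café”, is our own example and is written with the escape \u00e9 for “é”.

```python
name = "Kodeclik"

# encode() converts the string into a sequence of bytes.
data = name.encode("utf-8")
print(list(data))            # [75, 111, 100, 101, 99, 108, 105, 107]
print(len(name), len(data))  # 8 8

# Characters outside ASCII can need more than one byte in UTF-8.
accented = "caf\u00e9"
accented_bytes = accented.encode("utf-8")
print(len(accented), len(accented_bytes))  # 4 5
```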
What operations can be performed on strings?
Given a string, you can extract prefixes, suffixes, and specific substrings. For instance, in Python a string’s characters can be accessed using integer indices (beginning at 0).
The following program:
name = "Kodeclik"
print(name[0])
print(name[2])
outputs:
K
d
because “K” is the first character (denoted by name[0]) and “d” is the third character (denoted by name[2]). You can extract substrings using slice notation, e.g.:
name = "Kodeclik"
print(name[0:4])
print(name[4:9])
This outputs:
Kode
clik
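The first line, namely print(name[0:4]), prints the characters at indices 0 to 3 (4 denotes the end point, one number higher than the last index needed), i.e., “Kode”. The second line, namely print(name[4:9]), prints the characters at indices 4 to 7, i.e., “clik”. Because “Kodeclik” has only 8 characters (indices 0 to 7), an end point of 8 would be enough; Python simply stops at the end of the string when the end point is larger.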
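Prefixes and suffixes can be written more compactly by leaving out one end of the slice. The sketch below also shows negative indices and the built-in startswith() and endswith() string methods; the variable names prefix, suffix, and last_char are illustrative.

```python
name = "Kodeclik"

prefix = name[:4]     # same as name[0:4]
suffix = name[4:]     # from index 4 to the end of the string
last_char = name[-1]  # negative indices count from the end

print(prefix)     # Kode
print(suffix)     # clik
print(last_char)  # k

# These methods check for a prefix or suffix without slicing.
print(name.startswith("Kode"))  # True
print(name.endswith("clik"))    # True
```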
There is more to learn in this space, but we hope we have whetted your appetite for strings and string variables. If you liked this blog post, check out our blog post on integers.