#Python's strings
A string can be thought of as an immutable sequence of characters. It is one of the most common and feature-rich types in Python.
We have already touched upon strings in the section Basic Syntax - Variables and Basic Types.
#Encoding and bytes
Computers store data in binary using electronic components, so characters must be mapped to binary values. This mapping is called encoding.
For example, the English letter `A` corresponds to the binary value `01000001` (decimal 65) in ASCII encoding.
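The mapping between a character and its number can be checked with the built-in `ord` and `chr` functions. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
# ord() returns the numeric code of a character; chr() is the inverse.
code = ord('A')
print(code)                    # 65

# format(..., '08b') renders the number as 8 binary digits.
binary = format(code, '08b')
print(binary)                  # 01000001

letter = chr(65)
print(letter)                  # A
```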
The smallest addressable storage unit in a computer is a byte, which consists of 8 bits.
In Python, there is a type called `bytes` that stores a sequence of bytes.
A `bytes` literal looks similar to a string but is prefixed with a `b`, such as `b'hello world'`.
Converting a string to bytes is called encoding, and converting bytes back to a string is called decoding. Both `encode()` and `decode()` use UTF-8 by default:
data: bytes = b'hello world'
print(data)
text: str = data.decode() # decode
print(text)
print(text.encode()) # encode
Bytes may look like strings with a `b` prefix, but there are key differences:
- The elements of a bytes object are individual bytes (integers from 0 to 255), while the elements of a string are characters (one character may occupy multiple bytes once encoded)
- Bytes can store non-text data, such as images
text: str = '你好世界'
print(text)
data: bytes = text.encode() # encode
print(data)
print(len(text), len(data)) # lengths differ
print(text[1], data[1]) # text[1] is the whole character '好', while data[1] is the integer 189, the second UTF-8 byte of '你'
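The number of bytes per character depends on the encoding that is chosen. The sketch below passes the codec name explicitly (`'utf-8'` and `'gbk'` are both shipped with standard Python) and shows that decoding with a codec that cannot represent the data raises an error. The variable names are illustrative only.

```python
text = '你好'

utf8_data = text.encode('utf-8')   # 3 bytes per character for this text
gbk_data = text.encode('gbk')      # 2 bytes per character for this text
print(len(utf8_data), len(gbk_data))   # 6 4

# Indexing bytes yields an int; slicing yields another bytes object.
first = utf8_data[0]
print(first, utf8_data[:3])        # 228 b'\xe4\xbd\xa0'

# Decoding has to use the codec the data was encoded with.
assert utf8_data.decode('utf-8') == text
try:
    utf8_data.decode('ascii')
except UnicodeDecodeError as exc:
    print('cannot decode as ASCII:', exc.reason)
```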
#Old-style string formatting
In programming, you often need to create strings based on variables.
You can use the `%` operator for formatting, with the syntax:
"format string" % (value1, value2, ...) # values are in a tuple
If there is only one value, you can omit the tuple:
"format string" % value
For example:
print("Pork price is %d yuan per jin" % 15)
print("%d jin of pork costs %d yuan" % (3, 3*15))
Here `%d` is a decimal integer placeholder that will be replaced by the corresponding value in decimal form. Common placeholders include:
- `%%`: literal percent sign
- `%d`: decimal integer
- `%o`: octal integer
- `%x`: hexadecimal integer (lowercase)
- `%X`: hexadecimal integer (uppercase)
- `%f`: floating-point number
- `%s`: string
This style is less common nowadays; see more details at printf-style String Formatting.
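As a quick illustration of the placeholders listed above, the following sketch formats the same number in several bases and combines `%f`, `%%`, and `%s`. It also shows one pitfall of omitting the tuple: when the single value is itself a tuple, it has to be wrapped in a one-element tuple. All variable names here are made up for the example.

```python
n = 255
bases = "%d in octal is %o, in hex is %x or %X" % (n, n, n, n)
print(bases)      # 255 in octal is 377, in hex is ff or FF

# %.1f keeps one decimal place; %% prints a literal percent sign.
ratio = "%.1f%% of %s" % (12.345, "total")
print(ratio)      # 12.3% of total

# Pitfall: a bare tuple on the right is unpacked as the argument list.
point = (3, 4)
try:
    "point = %s" % point          # two values for one placeholder
except TypeError as exc:
    print("TypeError:", exc)

wrapped = "point = %s" % (point,)  # wrap it in a one-element tuple
print(wrapped)    # point = (3, 4)
```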
#The format() method
The `format` method is more flexible than `%`. It uses curly braces `{}` as placeholders and supports formatting within the braces, for example:
print("Name: {}, Age: {}".format("Jerry", 18)) # positional replacement
print("Name: {1}, Age: {0}".format(19, "Tom")) # positional index
print("Name: {name}, Age: {age}".format(name="Tuffy", age=8)) # named replacement
You can specify width:
# Print multiplication table
for x in range(1, 10):
    for y in range(1, 10):
        print(' {:2} '.format(x * y), end='') # min width 2 chars
    print('')
Width can be a variable:
print("'{:{width}}'".format('txt', width=7))
You can specify alignment:
print("'{:<5}'".format('txt')) # left-align with width 5
print("'{:>5}'".format('txt')) # right-align with width 5
print("'{:^5}'".format('txt')) # center with width 5
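Width and alignment are most useful together, for example when lining up columns. The sketch below builds a small two-column table; the `scores` data and the column widths are arbitrary choices for illustration. When no alignment is given, strings are left-aligned and numbers are right-aligned by default, and a fill character may be placed before the alignment sign.

```python
scores = {'Tom': 88, 'Jerry': 99, 'Spike': 66}

rows = []
for name, score in scores.items():
    # Name: left-aligned in 8 columns; score: right-aligned in 5 columns.
    rows.append("|{:<8}|{:>5}|".format(name, score))
print('\n'.join(rows))
# |Tom     |   88|
# |Jerry   |   99|
# |Spike   |   66|

# A fill character goes before the alignment sign.
banner = "{:*^9}".format('txt')
print(banner)     # ***txt***
```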
A placeholder can index into a dict argument; the key is written without quotes inside the brackets:
score_list: dict[str, int] = {
    'Tom': 88,
    'Jerry': 99,
    'Spike': 66
}
print("Scores: Tom:{0[Tom]} Jerry:{0[Jerry]} Spike:{0[Spike]}".format(score_list))
To keep n decimal places, you can use the built-in `format` function with a format spec such as `'.2f'`:
print("Approximate value of pi is {}".format(format(3.1415926, '.2f'))) # two decimal places
See Format String Syntax.
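The same format spec can also be written directly inside the placeholder after a colon, which avoids the nested call. A minimal sketch, with variable names chosen for illustration:

```python
pi = 3.1415926

nested = "{}".format(format(pi, '.2f'))   # built-in format() inside str.format()
direct = "{:.2f}".format(pi)              # spec written in the placeholder
print(nested, direct)                     # 3.14 3.14

# Width and precision can be combined: 8 columns, 3 decimal places.
padded = "{:8.3f}".format(pi)
print(repr(padded))                       # '   3.142'
```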
#Formatted string literals (f-strings)
Formatted string literals use the syntax `f'xxxx'` or `f"xxxx"`, with expressions inside `{}` evaluated:
score_list: dict[str, int] = {
    'Tom': 88,
    'Jerry': 99,
    'Spike': 66
}
print(f"Scores: Tom:{score_list['Tom']} Jerry:{score_list['Jerry']} Spike:{score_list['Spike']}")
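The braces in an f-string accept arbitrary expressions, and the format specs used with `format` work here as well. The following sketch uses made-up variables to show a few common forms:

```python
name = 'Jerry'
score = 99
pi = 3.1415926

s1 = f"{name} scored {score + 1}"   # any expression is evaluated
s2 = f"pi is about {pi:.2f}"        # format spec after the colon
s3 = f"'{name:>7}'"                 # width and alignment
s4 = f"{name!r}"                    # !r applies repr()
s5 = f"{{literal braces}}"          # doubled braces are printed as-is

for s in (s1, s2, s3, s4, s5):
    print(s)
```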
#Raw strings
Raw string literals are written as `r'xxxx'` or `r"xxxx"`. Escape sequences are not processed in them, so `\n` is treated as two characters, `\` and `n`:
print(r'hello \n world')
Raw strings are useful for regular expressions or other scenarios requiring many backslashes.
Regular expressions will be covered later.
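The difference is easy to see by comparing lengths. The sketch below also uses a Windows-style path as an example of text with many backslashes; the path itself is made up.

```python
normal = 'a\nb'    # \n is one newline character
raw = r'a\nb'      # backslash and n are kept as two characters
print(len(normal), len(raw))   # 3 4

# Without the r prefix, every backslash would have to be doubled.
path = r'C:\new\table.txt'
escaped = 'C:\\new\\table.txt'
print(path == escaped)         # True
```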
#Multiline strings
Multiline strings use triple quotes (`'''` or `"""`), for example:
print('''
## Multiline strings
Multiline strings use triple quotes (`'''` or `"""`).
''')
Python has no dedicated multiline comment syntax, so a multiline string that is not assigned to anything is commonly used as one:
'''
This string is not assigned to a variable and its value is discarded,
so it acts as a comment.
'''
print("hello world")
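One related case is worth knowing: when a string literal is the first statement of a function, class, or module, Python keeps it as the docstring instead of discarding it. A small sketch, where `greet` is a hypothetical function used only for illustration:

```python
def greet(name: str) -> str:
    '''Return a greeting for name.'''
    # The string above is stored as the function's docstring.
    return f"hello {name}"

print(greet('world'))    # hello world
print(greet.__doc__)     # Return a greeting for name.
```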
Multiline strings can also be combined with the prefixes `b`, `f`, or `r` for bytes, formatted strings, or raw strings.
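As a sketch of how these combinations behave, the example below applies each prefix to a triple-quoted literal. The variable names and contents are arbitrary.

```python
user = 'Tom'

# f prefix: expressions are evaluated, line breaks are kept.
formatted = f'''Hello, {user}!
Welcome.'''

# r prefix: \n stays as two characters, the real line break is kept.
raw = r'''first\nline
second'''

# b prefix: the result is a bytes object.
data = b'''raw
bytes'''

print(formatted)
print(raw)
print(data)       # b'raw\nbytes'
```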