In this post we will learn about the Python string and several useful concepts related to it: strings with single quotes, double quotes or even triple quotes, a variant known as the raw string, Python escape sequences, and finally Unicode along with encoding and decoding. These concepts are common in other programming languages too, and without understanding them it is hard to get a full grip on Python strings.
Python string introduction
A Python string is nothing but a sequence of characters, where a character is a unit of information such as a letter, a digit or any other symbol on a keyboard.
A Python string can hold as many characters as you want to put in it, and an empty string is a string with zero characters.
A Python string is immutable. That means a string cannot be updated or modified in place; every time you need a changed string, you have to create a new string object. This sets it apart from mutable Python collections such as lists.
Representation of python String
A Python string literal can be delimited by either single or double quotes. That is not the case in many other popular languages: in C and Java single quotes denote a single character, and in PHP the two kinds of quotes behave differently. In Python, both of the strings below are valid.
>>> print('Sample string 1')
Sample string 1
>>> print("Sample string 2")
Sample string 2
Single quotes vs Double quotes
In simple words, there is no difference between strings with single quotes and strings with double quotes, other than one handy trick. Let me explain the trick.
If you want to use a single quote inside your string, use double quotes as the string delimiter; if you want to use a double quote inside your string, use single quotes as the delimiter. Here is the code:
>>> print("Hi I'm John, who are you?")
Hi I'm John, who are you?
>>> print('I have a "classical bike"')
I have a "classical bike"
In the first print statement we need a single quote (') inside the text, in the word I'm. If this string were also delimited by single quotes, the Python interpreter would treat that inner quote as the end of the string. The string would terminate there, and the leftover text with the original closing quote would produce a syntax error. To avoid this, the string is delimited by double quotes. The second print statement shows the opposite case.
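The failure described above can be observed safely with the built-in `compile()`, which parses source text without running it. This is only a sketch; `bad_source` and the `"<example>"` file name are illustrative choices.

```python
# bad_source holds the text you would get by delimiting  Hi I'm John
# with single quotes. compile() parses it without executing it, so we can
# catch the error instead of crashing this script.
bad_source = "print('Hi I'm John')"

try:
    compile(bad_source, "<example>", "exec")
    error_raised = False
except SyntaxError:
    error_raised = True

print(error_raised)   # True

# Delimiting with double quotes avoids the clash.
message = "Hi I'm John"
print(message)        # Hi I'm John
```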
But what do we do if a string contains both single and double quotes? Python has more than one solution for this problem. One is quite specific to Python and the other is common in other languages too. We will discuss them one by one.
The Python-specific solution is another representation of the string literal: triple quotes, written as three single or three double quote characters. A triple-quoted string can contain both kinds of quote, and it is generally used for multiline strings such as Python docstrings.
>>> print("""Hi I'm john, I am a "Software Engineer".""")
Hi I'm john, I am a "Software Engineer".
The presence of single or double quotes is not the only reason to use a triple-quoted string. It has another very strong use: if you want to create text with multiple lines, a triple-quoted string is the easiest option to pick.
>>> multiline = """one man
... two men
... three men"""
>>> print(multiline)
one man
two men
three men
Line breaks in the literal remain as newline characters in the resulting string object. Looking at the previous example, you may notice that the first line, "one man", does not line up with the next two lines in the source. To make it more readable we can add a line break before the first line. It becomes:
>>> multiline = """
... one man
... two men
... three men"""
>>> print(multiline)

one man
two men
three men
You can see that the line break we added before the first line of text also affects the output, which now starts with an empty line. To avoid this extra line while keeping the code readable, we can add a backslash (\) immediately after the opening triple quotes.
>>> multiline = """\
... one man
... two men
... three men"""
>>> print(multiline)
one man
two men
three men
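As a quick check of the three variants, the sketch below compares them as a script rather than in the interactive prompt. The names `plain`, `leading_break` and `escaped_break` are illustrative.

```python
# Three ways of writing the same three lines as a triple-quoted literal.
plain = """one man
two men
three men"""

leading_break = """
one man
two men
three men"""

escaped_break = """\
one man
two men
three men"""

# The backslash right after the opening quotes swallows the first line break,
# so the escaped version equals the plain one.
print(plain == escaped_break)           # True

# Without the backslash, the string really starts with a newline character.
print(leading_break.startswith("\n"))   # True
print(leading_break == "\n" + plain)    # True

# Each line break in the literal is one newline character in the string.
print(len(plain.splitlines()))          # 3
```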
In the ordinary single- or double-quoted strings discussed above, special characters such as a newline or a tab are written with a backslash (\): \n stands for a newline and \t for a tab. The backslash also gives us the second solution to the quoting problem, the one that is common in other languages: \' and \" put a literal quote inside a string that is delimited by the same kind of quote.
An escape character is a character that changes how the characters following it are interpreted. The escape character together with those following characters is called an "escape sequence". In Python the escape character is the backslash.
When we prepend a backslash to one of these characters, together they make an escape sequence.
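A few of the most common escape sequences are sketched below. The sample texts and variable names are made up for illustration.

```python
# Common escape sequences inside an ordinary (non-raw) string literal.
two_lines = "first\nsecond"    # \n is a single newline character
columns = "name\tage"          # \t is a single tab character

# Escaping the delimiter lets one string hold both kinds of quote.
both_quotes = 'I\'m a "Software Engineer"'
same_text = "I'm a \"Software Engineer\""

print(two_lines)
print(columns)
print(both_quotes)               # I'm a "Software Engineer"
print(both_quotes == same_text)  # True

# An escape sequence is typed as two characters but stored as one.
print(len("\n"))                 # 1
```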
Python Raw string
There is another variant of the string literal, the raw string. Its representation is very similar to a string literal with single or double quotes. The only difference is that it is prefixed with the character r or R.
A raw string is also a string, but escape sequences in it are not processed. In other words our escape character, the backslash (\), stays in the string as an ordinary character and does not change the meaning of the characters after it.
>>> print(r"Here I am \nWhere are you?")
Here I am \nWhere are you?
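To see what the r prefix changes inside the string object itself, here is a small sketch comparing a normal and a raw literal with the same source text. The names `normal` and `raw` are illustrative.

```python
normal = "Here I am \nWhere are you?"   # \n becomes one newline character
raw = r"Here I am \nWhere are you?"     # backslash and n are kept as two characters

# The raw string is exactly one character longer.
print(len(raw) - len(normal))   # 1

print("\n" in normal)           # True: contains a real newline
print("\n" in raw)              # False: no newline character at all
print("\\" in raw)              # True: contains a literal backslash
```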
After reading this definition of the Python raw string, a question will most probably arise in your mind: why do we need a raw string, or why would we ignore the special meaning of an escape sequence? Here is the answer.
Sometimes we have text in which we need the backslash as a character, not as an escape character, e.g. a Windows path:
>>> print("D:\\Users\nathon-arson")
D:\Users
athon-arson
>>> print(R"D:\Users\nathon-arson")
D:\Users\nathon-arson
Python also gives us another way to use a backslash as a literal character: add another backslash before it, which escapes the special meaning of the escape character itself. In other words, write \\ wherever a single backslash is wanted.
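The double-backslash approach can be sketched as follows, reusing the path from the raw string example. The names `escaped` and `raw` are illustrative.

```python
# Every backslash is doubled, so none of them starts an escape sequence.
escaped = "D:\\Users\\nathon-arson"

# The same text written as a raw string.
raw = r"D:\Users\nathon-arson"

print(escaped)          # D:\Users\nathon-arson
print(escaped == raw)   # True: two spellings of the same string
print("\n" in escaped)  # False: no newline sneaked in
```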
You can use either of the above solutions according to your need.
Encoding vs Decoding
As we know, a computer cannot store a character directly; it can only store a binary number that stands for it. Every character, whether it is a letter, a digit or any other symbol, is represented by a binary number. The conversion of characters to binary numbers (bytes) is known as encoding, and the reverse process is known as decoding.
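In Python this round trip is done with `str.encode()` and `bytes.decode()`. The sketch below assumes UTF-8 as the encoding, which is a choice made for this example; the sample word is also just an illustration.

```python
text = "caf\u00e9"              # the word café; \u00e9 is the letter é

# Encoding: characters -> bytes.
data = text.encode("utf-8")
print(data)                     # b'caf\xc3\xa9'
print(len(text), len(data))     # 4 5  (é takes two bytes in UTF-8)

# Decoding: bytes -> characters.
restored = data.decode("utf-8")
print(restored == text)         # True
```

The string has four characters but five bytes, which shows that characters and their stored form are not the same thing.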
So a more precise definition of a string is:
A string is nothing but a sequence of Unicode characters.
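The built-in functions `ord()` and `chr()` make this definition visible by converting between a character and its Unicode code point. A minimal sketch, with sample values chosen only for illustration:

```python
# Each character in a string corresponds to a Unicode code point (an integer).
print([ord(ch) for ch in "Hi"])    # [72, 105]

# chr() goes the other way, from code point to character.
euro = chr(0x20AC)                 # the euro sign
print(euro == "\u20ac")            # True

# len() counts characters, not bytes.
word = "Pyth\u00f6n"               # Python spelled with ö
print(len(word))                   # 6
```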
- Python string replace
- Python string split
- Python string find
- Python string substring
- Python string contains
- Python string concatenation
- Python string append
- Python string slice