Python String Data Type
In the previous article, we discussed Python variables, including string variables. String is a built-in Python data type that holds a sequence of characters, and you will need it whenever you do any text processing. In this article, I will share the various operations you can perform with the Python string data type.
Python string data type
In Python, you can define a string with single quotes, double quotes or triple quotes, and use the type() function to verify the data type of your variable. E.g.:
text1 = 'hello \n world!'
text2 = "bac;def,what$ is"
text3 = """this is also fine"""
print(type(text1), text1)
print(type(text2), text2)
print(type(text3), text3)
You should see the output below, with the data type shown as “str”. Note that the \n in text1 is printed as a line break:
<class 'str'> hello
 world!
<class 'str'> bac;def,what$ is
<class 'str'> this is also fine
By definition, a Python string is a sequence of characters, which means you can access each character by its index (the index starts from 0 for the first character):
print(text1[0], text2[1], text3[5])
h a i
And you can use the slice operation to get a substring of your string variable:
# get a substring starting from index 0 and ending at index 5 (exclusive)
print(text1[0:5])
# get a substring starting from index 5 and ending at index 7 (exclusive)
print(text3[5:7])
# get a substring starting from default index 0 and ending at index 4 (exclusive)
print(text3[:4])
# get a substring starting from index 5 and ending at the end of the string
print(text3[5:])
hello
is
this
is also fine
You can also use negative index values, which count from the end of the string. For example, text3[-1] is the last character, and text3[-4:] is the last four characters ('fine').
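Here is a small sketch of negative indexing and slicing. It re-declares text3 with the same value used earlier so that it runs on its own; the other variable names are only for illustration.

```python
# Same value as text3 earlier in the article, repeated so this runs standalone
text3 = "this is also fine"

last_char = text3[-1]            # the last character
last_word = text3[-4:]           # the last four characters
without_last_word = text3[:-5]   # everything except the last five characters
third_word = text3[-9:-5]        # from 9th-from-last up to 5th-from-last (exclusive)

print(last_char)           # e
print(last_word)           # fine
print(without_last_word)   # this is also
print(third_word)          # also
```

Note that a negative index only changes where you count from. The slice itself is still read from left to right.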
A slice also accepts an optional third value, the step, which can be any non-zero integer. For example, text3[::2] takes every second character, and text3[::-1] returns the string reversed.
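The slice step can be sketched as below. The value of text4 is assumed to be "abcdefg", which is what the outputs later in this article suggest; the other names are illustrative.

```python
# Assumed value of text4, inferred from the outputs shown later in the article
text4 = "abcdefg"

every_second = text4[::2]       # start at index 0, take every second character
every_second_odd = text4[1::2]  # start at index 1, take every second character
reversed_text = text4[::-1]     # a negative step walks the string backwards

print(every_second)       # aceg
print(every_second_odd)   # bdf
print(reversed_text)      # gfedcba

# A step of 0 is not allowed and raises a ValueError
try:
    text4[::0]
    zero_step_allowed = True
except ValueError:
    zero_step_allowed = False
print(zero_step_allowed)  # False
```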
Since we are able to get each individual character from a string, you may wonder whether we can assign something else to a particular position in the string, e.g.:
text4 = "abcdefg"
text4[0] = 'T'  # TypeError: 'str' object does not support item assignment
The error shows up because strings are immutable: you cannot change anything in the original content, and can only create a new string:
new_text4 = "T" + text4[1:]
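To see both behaviours side by side, here is a minimal sketch that catches the error and then builds a new string. It assumes text4 is "abcdefg", as the later outputs suggest.

```python
# Assumed value of text4, inferred from the outputs shown later in the article
text4 = "abcdefg"

# Item assignment on a string fails with a TypeError
try:
    text4[0] = "T"
    assignment_worked = True
except TypeError as err:
    assignment_worked = False
    print("TypeError:", err)

# Building a new string leaves the original untouched
new_text4 = "T" + text4[1:]
print(text4)       # abcdefg
print(new_text4)   # Tbcdefg
```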
+ and *
You may have noticed in the above example that strings can be concatenated with the “+” operator. There is one more operator, *, which repeats a string:
print(text4 + text3*2)
This will repeat text3 twice and concatenate the result to text4 as a single string:
abcdefgthis is also finethis is also fine
Formatting Python string data type
Below are some of the string formatting methods; most are self-explanatory from their names:
print("lower:", text4.lower())
# like lower(), but more aggressive; intended for caseless matching
print("casefold:", text4.casefold())
print("upper:", text4.upper())
print("title:", text4.title())
# unlike title(), only the first character of the whole string is capitalized
print("capitalize:", text4.capitalize())
print("swapcase:", text4.swapcase())
print("center:", text4.center(40, "*"))
print("ljust:", text4.ljust(40))
print("rjust:", text4.rjust(40, "*"))
print("zfill:", text4.zfill(40))
print("strip:", text4.strip("a"))
print("replace:", text4.replace("a", "A"))
Below is the output:
lower: abcdefg
casefold: abcdefg
upper: ABCDEFG
title: Abcdefg
capitalize: Abcdefg
swapcase: ABCDEFG
center: ****************abcdefg*****************
ljust: abcdefg
rjust: *********************************abcdefg
zfill: 000000000000000000000000000000000abcdefg
strip: bcdefg
replace: Abcdefg
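With a single lowercase word like text4, some of these methods produce the same result even though they behave differently in general. The sketch below uses sample strings that are not from the article, chosen only to make the differences visible.

```python
# Sample strings chosen for illustration; they do not appear in the article
german = "Straße"
phrase = "hello world"

# lower() keeps the German sharp s, casefold() expands it to "ss"
lowered = german.lower()
folded = german.casefold()
print(lowered, folded)        # straße strasse

# title() capitalizes every word, capitalize() only the first character
titled = phrase.title()
capitalized = phrase.capitalize()
print(titled)                 # Hello World
print(capitalized)            # Hello world

# strip("a") removes the character from both ends, but not from the middle
stripped = "aabcaa".strip("a")
print(stripped)               # bc
```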
There are also methods you can use for checking the string format:
print("isalnum:", text4.isalnum())
print("isalpha:", text4.isalpha())
print("isdecimal:", text4.isdecimal())
print("isdigit:", text4.isdigit())
print("isnumeric:", text4.isnumeric())
print("isidentifier:", text4.isidentifier())
print("islower:", text4.islower())
print("istitle:", text4.istitle())
print("isupper:", text4.isupper())
print("isspace:", text4.isspace())
print("isprintable:", text4.isprintable())
The output will be similar to the below:
isalnum: True
isalpha: True
isdecimal: False
isdigit: False
isnumeric: False
isidentifier: True
islower: True
istitle: False
isupper: False
isspace: False
isprintable: True
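The three numeric checks look alike but are not interchangeable. The sketch below compares them on a few sample characters that are my own choice (a plain number, a superscript two and a one-half fraction), not values from the article.

```python
# Sample values chosen for illustration: "123", superscript two, one-half, "abc"
samples = ["123", "\u00b2", "\u00bd", "abc"]

# Map each sample to (isdecimal, isdigit, isnumeric)
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for s, checks in results.items():
    print(repr(s), checks)
# '123' (True, True, True)
# '²' (False, True, True)
# '½' (False, False, True)
# 'abc' (False, False, False)
```

In short, isdecimal is the strictest of the three and isnumeric the most permissive.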
You can use relational operators such as ==, > and < to compare two strings. Python compares the strings character by character using their Unicode code points, so all uppercase ASCII letters come before the lowercase ones. Hence you will need to convert your texts into a standard format, e.g. all upper or lower case, in order to get the comparison result in alphabetical order.
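A short sketch of the effect described above. The sample words are my own and are not from the article; casefold is used here as one reasonable way to normalize case before comparing.

```python
# Sample words chosen for illustration
a = "Zebra"
b = "apple"

raw_less = a < b                          # "Z" (90) sorts before "a" (97)
normalized_less = a.lower() < b.lower()   # compares "zebra" with "apple"
print(raw_less)          # True
print(normalized_less)   # False

# The same applies when sorting a list of strings
words = ["banana", "apple", "Cherry"]
default_sorted = sorted(words)
alphabetical = sorted(words, key=str.casefold)
print(default_sorted)    # ['Cherry', 'apple', 'banana']
print(alphabetical)      # ['apple', 'banana', 'Cherry']
```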
To check if a string starts or ends with certain characters, you can use the startswith and endswith methods:
if text3.startswith("this"):
    print("yes, it starts with 'this'")
if text3.endswith("fine"):
    print("yes, it ends with 'fine'")
There is no method called contains (people sometimes get confused since the Java String class has a contains method), but you can use the in operator, or the find, index or rindex methods, to check whether the string contains a substring:
if "this" in text3:
    print("'this' is in text3")
else:
    print("not found")

if text3.find("this") > -1:
    print("found 'this' from text3")
else:
    print("not found")

if text3.find("this", 1, 20) > -1:
    print("found 'this' from text3")
else:
    print("'this' is not found from text3, starting from index 1 to 20")

if text3.index("this") > -1:
    print("found 'this' from text3, index >= 0")
else:
    print("not found")

# ValueError: substring not found
# idx = text3.index("this", 1, 20)
Both find and index return the index of the substring. The difference between the two is that index raises a ValueError when the substring is not found, while find just returns -1.
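The difference can be sketched as follows. The snippet re-declares text3 so that it runs on its own, and "xyz" is just a sample substring that does not occur in it.

```python
# Same value as text3 earlier in the article, repeated so this runs standalone
text3 = "this is also fine"

first_is = text3.find("is")     # index of the first match (inside "this")
last_is = text3.rindex("is")    # index of the last match (the word "is")
missing = text3.find("xyz")     # find returns -1 when nothing matches
print(first_is, last_is, missing)   # 2 5 -1

# index raises instead of returning -1, so wrap it when a miss is possible
try:
    text3.index("xyz")
    index_raised = False
except ValueError as err:
    index_raised = True
    print("ValueError:", err)
```

If you only need a yes/no answer, the in operator is usually the simplest choice. Use find or index when you also need the position.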
Split & Join texts
Often you may need to split a text by a certain delimiter, e.g. a newline (\n), a semicolon (;) or a space. You can use the split method to split the text into a list. If the delimiter is not found, split returns the original text as the only element of the list.
print("split by default delimiter:", text3.split())
print("split by s", text3.split('s'))
print("split by ;", text3.split(';'))
The output will be:
split by default delimiter: ['this', 'is', 'also', 'fine']
split by s ['thi', ' i', ' al', 'o fine']
split by ; ['this is also fine']
On the other hand, if you have a list of strings and would like to join them into one string, you can do the following:
print("join the words with ';':", ';'.join(text3.split()))
print("join the words without space:", ''.join(text3.split()))
And below is the output:
join the words with ';': this;is;also;fine
join the words without space: thisisalsofine
The count method can be used to count the occurrences of a substring in the original string, for instance:
print(text3*5)
print("'is' occurrence:", (text3*5).count("is"))
The result will be:
this is also finethis is also finethis is also finethis is also finethis is also fine
'is' occurrence: 10
With the above examples, we have covered most of the commonly used methods of the Python string data type. You may also check the official Python documentation to see whether there are any additional string methods you are interested in.