Python String Data Type

In the previous article, we discussed Python variables, including string variables. String is a built-in Python data type that holds a sequence of characters, and you will need it whenever you do any text processing. In this article, I will share the various operations you can perform with the Python string data type.

Python string data type

In Python, you can define a string variable with single quotes, double quotes or triple quotes, and use the type() function to verify the data type of your variable. E.g.:

text1 = 'hello \n world!'
text2 = "bac;def,what$ is"
text3 = """this is also fine"""
print(type(text1), text1)
print(type(text2), text2)
print(type(text3), text3)

You should see the output below, with the data type shown as “str”.

<class 'str'> hello 
 world!
<class 'str'> bac;def,what$ is
<class 'str'> this is also fine
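As a small sketch of when each quoting style is handy, the example below uses made-up variables (quote, reply and poem are just illustrative names, not from the earlier code):

```python
# Pick the quote style that avoids escaping the quotes inside the text.
quote = "It's a string"        # double quotes around an apostrophe
reply = 'She said "hello"'     # single quotes around double quotes

# Triple quotes may span several lines; each line break is stored as '\n'.
poem = """line one
line two"""

print(quote)
print(reply)
print(poem.count("\n"))  # number of line breaks stored in the string
```

All three variables are still plain str objects; the quoting style only affects how you type the text in your source code.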

Slice Operation

By definition, a Python string is a sequence of characters, which means you can access each character by its index (the index starts from 0 for the first character):

print(text1[0], text2[1], text3[2])
h a i

You can also use the slice operation to get a subset of your string variable:

#get a sub string starting from index 0 and ending at index 5 (exclusive)
print(text1[0:5])
#get a sub string starting from index 5 and ending at index 7 (exclusive)
print(text3[5:7])
#get a sub string starting from default index 0 and ending at index 4 (exclusive)
print(text3[:4])
#get a sub string starting from index 5 and ending at the end of the string
print(text3[5:])
hello
is
this
is also fine
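One thing worth knowing is how slices behave when the positions fall outside the string. The sketch below reuses text3 from above and shows that slicing is forgiving while single-index access is not:

```python
text3 = "this is also fine"

# A slice end beyond the string length is clamped to the end of the string.
clamped = text3[13:100]
print(clamped)

# A slice that starts past the end gives an empty string, not an error.
empty = text3[50:60]
print(repr(empty))

# Accessing a single position past the end raises IndexError instead.
try:
    text3[100]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)
```

So if you are not sure about the length of the text, slicing is usually the safer way to take "up to N characters".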

You can also specify a negative index, which counts from the end of the string (-1 is the last character):

print(text1[-1])
print(text3[-3:-1])
!
in
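To make the negative positions less mysterious, here is a minimal sketch (again using text3) showing that a negative index is just a shortcut for counting from the length of the string:

```python
text3 = "this is also fine"

# A negative index -k refers to the same character as len(text3) - k.
last = text3[-1]
same_last = text3[len(text3) - 1]
print(last, same_last)

# The slice is still read left to right; only the positions count from the end.
tail = text3[-4:]           # the last four characters
without_tail = text3[:-4]   # everything except the last four characters
print(tail)
print(repr(without_tail))
```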

There is also a third option you can use, the slice step, which can be any non-zero integer, e.g.:

text4 = "abcdefg"
print(text4[0::2])
print(text4[1::2])
aceg
bdf
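The step can also be negative. Below is a short sketch, assuming text4 holds "abcdefg" (this value is inferred from the output above):

```python
text4 = "abcdefg"  # assumed value, inferred from the output above

# Every third character, starting from index 0.
every_third = text4[::3]
print(every_third)

# A negative step walks the string backwards, so [::-1] reverses it.
reversed_text = text4[::-1]
print(reversed_text)

# A step of 0 is not allowed and raises ValueError.
try:
    text4[::0]
    zero_step_failed = False
except ValueError:
    zero_step_failed = True
print(zero_step_failed)
```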

Immutable nature

Since we are able to get each individual character from a string, you may wonder if we can assign something else to a particular position in the string, e.g.:

text4[0] = 'T'
#TypeError: 'str' object does not support item assignment

The error shows up because a string is immutable: you cannot change anything in its original content, you can only create a new string:

new_text4 = "T" + text4[1:]
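A quick way to convince yourself is to print both strings afterwards. This sketch assumes text4 is "abcdefg" and also shows that string functions such as replace return a new string rather than changing the original:

```python
text4 = "abcdefg"  # assumed value for illustration

new_text4 = "T" + text4[1:]
print(new_text4)
print(text4)  # the original string is unchanged

# replace() also returns a new string; the third argument limits
# the number of replacements to one.
replaced = text4.replace("a", "T", 1)
print(replaced)
print(replaced == new_text4)
```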

+ and *

As you may have noticed in the above example, strings can be concatenated with the “+” operator. There is one more operator, *, that can be used with strings to repeat them:

print(text4 + text3*2)

This repeats text3 twice and concatenates the result to text4 as a single string:

abcdefgthis is also finethis is also fine
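Two practical notes on these operators, shown as a small sketch (separator and label are illustrative names, and text4 is assumed to be "abcdefg"):

```python
text4 = "abcdefg"  # assumed value for illustration

# Repetition is handy for simple separators.
separator = "-" * 10
print(separator)

# "+" only works between strings; mixing str and int raises TypeError.
try:
    label = text4 + 7
except TypeError:
    # Convert the number explicitly before concatenating.
    label = text4 + str(7)
print(label)
```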

Formatting Python string data type

Below are some of the string formatting functions; most are self-explanatory from the function name:

print("lower:", text4.lower())
#like lower(), but more aggressive for some non-ASCII characters
print("casefold:", text4.casefold())

print("upper:", text4.upper())

print("title:", text4.title())
#unlike title(), capitalize() only upper-cases the first character of the whole string
print("capitalize:", text4.capitalize())

print("swapcase:", text4.swapcase())
print("center:", text4.center(40, "*"))
print("ljust:", text4.ljust(40))
print("rjust:", text4.rjust(40, "*"))
print("zfill:", text4.zfill(40))
print("strip:", text4.strip("a"))
print("replace:", text4.replace("a", "A"))

Below is the output:

lower: abcdefg
casefold: abcdefg
upper: ABCDEFG
title: Abcdefg
capitalize: Abcdefg
swapcase: ABCDEFG
center: ****************abcdefg*****************
ljust: abcdefg                                 
rjust: *********************************abcdefg
zfill: 000000000000000000000000000000000abcdefg
strip: bcdefg
replace: Abcdefg
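With a single lowercase word like text4, some of these functions give identical results, so the differences are easy to miss. Here is a minimal sketch with made-up sample strings (phrase and word are not from the examples above) that makes them visible:

```python
phrase = "hello python WORLD"

print(phrase.title())       # every word starts with an upper case letter
print(phrase.capitalize())  # only the first character of the string is upper case

# casefold() is more aggressive than lower(): the German "ß" becomes "ss".
word = "Straße"
print(word.casefold())
print(word.lower() == word.casefold())

# strip() only removes characters from both ends, never from the middle.
stripped = "aabcaa".strip("a")
print(stripped)
```

This is why casefold is usually the better choice when you compare texts regardless of case.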

There are also functions you can use for checking the string format:

print("isalnum:",text4.isalnum())	
print("isalpha:",text4.isalpha())
print("isdecimal:",text4.isdecimal())
print("isdigit:",text4.isdigit())
print("isnumeric:",text4.isnumeric())
print("isidentifier:",text4.isidentifier())
print("islower:",text4.islower())
print("istitle:",text4.istitle())
print("isupper:",text4.isupper())
print("isspace:",text4.isspace())
print("isprintable:",text4.isprintable())

The output will be:

isalnum: True
isalpha: True
isdecimal: False
isdigit: False
isnumeric: False
isidentifier: True
islower: True
istitle: False
isupper: False
isspace: False
isprintable: True
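The three numeric checks (isdecimal, isdigit and isnumeric) all return False for text4, so here is a small sketch with my own sample values showing where they differ. The special characters are written as Unicode escapes to keep the example portable:

```python
samples = {
    "plain digits": "123",
    "superscript two": "\u00b2",
    "fraction one half": "\u00bd",
    "decimal point": "12.5",
}

# For each sample, collect (isdecimal, isdigit, isnumeric).
checks = {
    name: (s.isdecimal(), s.isdigit(), s.isnumeric())
    for name, s in samples.items()
}
for name, result in checks.items():
    print(name, result)
```

Note that none of them accepts "12.5", because the decimal point is not a numeric character.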

Comparison operations

You can use relational operators such as ==, > and < to compare two strings. Python compares them character by character using the Unicode code point of each character, and all the uppercase letters come before the lowercase ones. Hence you will need to convert your texts into a standard format, e.g. all upper or lower case, in order to get the comparison result in alphabetical order.
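The following sketch illustrates the point with a made-up list of words: the plain sort puts the capitalized word first, while sorting on the lower case version gives the alphabetical order:

```python
words = ["banana", "Cherry", "apple"]

# Upper case letters have smaller code points than lower case ones.
print(ord("C"), ord("a"))
print("Cherry" < "apple")

plain_sorted = sorted(words)
alphabetical = sorted(words, key=str.lower)
print(plain_sorted)
print(alphabetical)
```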

To check if the string starts or ends with certain characters, you can use the startswith and endswith functions:

if text3.startswith("this"):
    print("yes, it starts with 'this'")
if text3.endswith("fine"):
    print("yes, it ends with 'fine'")

There is no function called contains (people sometimes get confused since the Java string has a contains method), but you can use the in operator or the find, index and rindex functions to check if the string has a certain sub string:

if "this" in text3:
    print("'this' is in text3")
else:
    print("not found")

if text3.find("this") > -1:
    print("found 'this' in text3")
else:
    print("not found")

if text3.find("this", 1, 20) > -1:
    print("found 'this' in text3")
else:
    print("'this' is not found in text3 between index 1 and 20")

# index() returns the position when the sub string is found (never -1)
print("found 'this' in text3 at index", text3.index("this"))

# index() raises ValueError when the sub string is not found
try:
    idx = text3.index("this", 1, 20)
except ValueError:
    print("'this' is not found in text3 between index 1 and 20")

Both the find and index functions return the index of the sub string. The difference between the two is that index raises ValueError when the sub string is not found, while find just returns -1.
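The rindex function mentioned earlier (and its sibling rfind) work the same way but search from the right. A short sketch with text3:

```python
text3 = "this is also fine"

first = text3.find("is")   # first occurrence, inside the word "this"
last = text3.rfind("is")   # last occurrence, the word "is"
print(first, last)

# rindex() gives the same position as rfind() when the sub string exists.
print(text3.rindex("is"))

# When nothing is found, rfind() returns -1 and rindex() raises ValueError.
missing = text3.rfind("xyz")
print(missing)
try:
    text3.rindex("xyz")
    rindex_failed = False
except ValueError:
    rindex_failed = True
print(rindex_failed)
```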

Split & Join texts

Often you may need to split the text by a certain delimiter, e.g. newline (\n), semicolon (;) or space. You can use the split function to split the text into a list. If the delimiter is not found, the split function will return the original text as the only item in a list.

print("split by default delimiter:", text3.split())
print("split by s", text3.split('s'))
print("split by ;", text3.split(';'))

The output will be:

split by default delimiter: ['this', 'is', 'also', 'fine']
split by s ['thi', ' i', ' al', 'o fine']
split by ; ['this is also fine']
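A few more details of split are worth a quick look. The sketch below uses my own sample strings (messy and the multi-line text are illustrative) together with text3:

```python
text3 = "this is also fine"

# The second argument (maxsplit) limits the number of splits;
# the remainder stays in one piece.
first_two = text3.split(" ", 2)
print(first_two)

# Without an argument, any run of whitespace counts as one delimiter.
messy = "  this   is\talso\nfine  "
print(messy.split())

# With an explicit delimiter, repeated delimiters produce empty strings.
with_empty = "a,,b".split(",")
print(with_empty)

# splitlines() splits on line breaks, including Windows style "\r\n".
lines = "first\nsecond\r\nthird".splitlines()
print(lines)
```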

On the other hand, if you have a list of strings and would like to join them into one string, you can do the following:

print("join the words with ';':", ';'.join(text3.split()))
print("join the words without space:", ''.join(text3.split()))

And below is the output:

join the words with ';': this;is;also;fine
join the words without space: thisisalsofine
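One common stumbling block is that join only accepts strings. Here is a minimal sketch with a made-up list of numbers showing the error and one way around it:

```python
numbers = [1, 2, 3]

# join() only accepts strings, so a list of ints raises TypeError.
try:
    joined = ",".join(numbers)
except TypeError:
    # Convert each item to str first.
    joined = ",".join(str(n) for n in numbers)
print(joined)

# Splitting and joining with the same delimiter gives back the original text.
text3 = "this is also fine"
round_trip = " ".join(text3.split(" "))
print(round_trip == text3)
```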

Count occurrence

The count function can be used to count the occurrences of a sub string in the original string, for instance:

print(text3*5)
print("'is' occurrence:", (text3*5).count("is"))

The result will be:

this is also finethis is also finethis is also finethis is also finethis is also fine
'is' occurrence: 10
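Notice that count matches the "is" inside "this" as well, which is why each copy of text3 contributes two. The sketch below shows a few related behaviors; counting whole words via split is just one possible approach, not the only one:

```python
text3 = "this is also fine"

# count() matches sub strings anywhere, including inside other words.
print(text3.count("is"))

# Occurrences are counted without overlapping.
overlapping = "aaaa".count("aa")
print(overlapping)

# Optional start (and end) positions restrict the search range.
from_index_4 = text3.count("is", 4)
print(from_index_4)

# To count whole words only, count on the list of words instead.
word_count = text3.split().count("is")
print(word_count)
```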

Conclusion

With the above examples, we have covered most of the commonly used functions for the Python string data type. You may also check the official Python documentation to see if there are any additional string functions you are interested in.
