

Pandas – split data into buckets with cut and qcut

If you do a lot of data analysis in your daily job, you may have encountered problems where you want to split data into buckets or groups based on certain criteria and then analyse the data within each group. For instance, you may want to check the popularity of your products or website within each age group, or find out what percentage of students fall into each score range. The most straightforward way might be to categorize your data based on the conditions and then summarize the information, but this usually requires some additional effort to massage the data. In this article, I will be sharing with you a simple way to bin your data with the pandas cut and qcut functions.

Prerequisite

You will need to install the pandas package if you do not have it yet in your working environment. Below is the command to install pandas with pip:

pip install pandas

And let’s import the necessary packages and create some sample sales data for our later examples.

import pandas as pd
import numpy as np

df = pd.DataFrame({"Consignee": ["Patrick", "Sara", "Randy", "John", "Patrick", "Joe"],
                   "Age": [44, 51, 23, 30, 44, 39],
                   "Order Date": pd.date_range(start="2020-08-01", end="2020-08-05", periods=6),
                   "Item Desc": ["White Wine", "Whisky", "Red Wine", "Whisky", "Red Wine", "Champagne"],
                   "Price Per Unit": [10, 20, 30, 20, 30, 30],
                   "Order Quantity": [50, 60, 40, 20, 10, 50],
                   "Order Dimensions": [0.52, 0.805, 0.48, 0.235, 0.12, 0.58]})

With the above code, we can take a quick look at what the data looks like:

[Screenshot: the sample data frame]

And let’s also calculate the total sales amount by multiplying the price per unit and the order quantity:

df["Total Amount"] = df["Price Per Unit"] * df["Order Quantity"]

Once this data is ready, let’s dive into the problems we are going to solve today.

split data into buckets by cut

If we would like to classify our customers into a few age groups and get an overall view of how much money each age group has spent on our products, how should we do it? As I mentioned earlier, we are not going to apply a lambda function with conditions like "if the age is less than 30, classify the customer as young", because this can easily drive you crazy when you have hundreds or thousands of groups to define. Instead, we will use the powerful pandas cut function to achieve this.

The cut function has two mandatory arguments:

  • x – an array of values to be binned
  • bins – indicate how you want to bin your values

For instance, if you supply df["Age"] as the first argument and set bins to 2, you are telling pandas to split your age data into 2 equal-width bins. In our case, the minimum age is 23 and the maximum age is 51, so the first bin runs from 23 to 23 + (51-23)/2 = 37, and the second bin from 37 to 51. When you run the code below:

pd.cut(df["Age"],2)

You should see output similar to the below:

[Screenshot: output of pd.cut(df["Age"], 2), two age intervals with dtype category]

Pandas has classified our age data into these two groups, and the output shows that the data type is a pandas category. This is very useful, as you can assign this category column back to the original data frame and do further analysis based on the categories from there.
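To see exactly where the two equal-width bins start and end, you can pass retbins=True, which makes cut also return the computed bin edges. Below is a minimal sketch that rebuilds only the Age column from the sample data; the variable names are just for illustration.

```python
import pandas as pd

# Same ages as in the sample data frame
ages = pd.Series([44, 51, 23, 30, 44, 39], name="Age")

# retbins=True returns the bin edges together with the binned values
binned, edges = pd.cut(ages, 2, retbins=True)

print(edges)         # lowest edge sits slightly below 23, then 37 and 51
print(binned.dtype)  # category
print(binned.value_counts().sort_index())
```

You may notice that pandas nudges the lowest edge slightly below the minimum (to about 22.972 here) so that the age 23 still falls inside the first interval, which is open on the left.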

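Since we don't want decimal points in the age data, we can set precision=0 (precision only controls how the edges appear in the default interval labels, so it has no visible effect once our own labels are supplied), and we also want to label our age data into 3 groups: Young, Mid-Aged and Old.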

Below is the code that assigns our binned age data to an "Age Group" column:

df["Age Group"] = pd.cut(df["Age"],3, precision=0, labels=["Young","Mid-Aged","Old"])

If you examine the data again, you would see:

[Screenshot: data frame with the new "Age Group" column]

Pandas split our age data into 3 equal-width bins between the min and max age values (roughly 23 to 32.3, 32.3 to 41.7 and 41.7 to 51). But you may have noticed that age 44 has been classified as "Old", which does not sound right. In this case, we want to give our own definition of young, mid-aged and old through the bins argument. Let's redo the "Age Group" column with the code below (assigning to the column again simply overwrites it):

df["Age Group"] = pd.cut(df["Age"],[20, 30, 50, 60], precision=0, labels=["Young","Mid-Aged","Old"])

With this list of bin edges, we are telling pandas to split our data into 3 groups, (20, 30], (30, 50] and (50, 60], and label them as Young, Mid-Aged and Old respectively. Here "(" means the edge is exclusive and "]" means it is inclusive.
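To make the exclusive/inclusive edges concrete, here is a small sketch with made-up ages placed on and around the edges (these ages are hypothetical, not from the sample data). Note that 20 falls outside every group because the first interval excludes its left edge.

```python
import pandas as pd

# Hypothetical ages sitting on and next to the bin edges 20, 30, 50 and 60
ages = pd.Series([20, 21, 30, 31, 50, 51, 60])

groups = pd.cut(ages, [20, 30, 50, 60], labels=["Young", "Mid-Aged", "Old"])

# 20 -> NaN, 21 and 30 -> Young, 31 and 50 -> Mid-Aged, 51 and 60 -> Old
print(pd.DataFrame({"Age": ages, "Age Group": groups}))
```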

If we check the data again:

df[["Age", "Age Group"]]

You should see the result we expected:

[Screenshot: "Age" and "Age Group" columns using the custom intervals]

Now with this additional column, you can easily find out how much each age group contributed to the total sales amount. For example:

df.groupby("Age Group").agg({"Total Amount": "sum"})[["Total Amount"]].apply(lambda x: 100*x/x.sum())

This calculates each group's percentage contribution to the total sales amount:

[Screenshot: percentage of total sales amount by age group]
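If you prefer, a slightly shorter way to get the same percentages is to select the column first and divide the group sums by their total. This sketch rebuilds only the columns it needs from the sample data. It also passes observed=False explicitly, which keeps every category (even an empty one) and, as far as I know, avoids the warning recent pandas versions emit about the changing default when grouping by a categorical.

```python
import pandas as pd

# Only the columns needed here, taken from the sample data
df = pd.DataFrame({"Age": [44, 51, 23, 30, 44, 39],
                   "Price Per Unit": [10, 20, 30, 20, 30, 30],
                   "Order Quantity": [50, 60, 40, 20, 10, 50]})
df["Total Amount"] = df["Price Per Unit"] * df["Order Quantity"]
df["Age Group"] = pd.cut(df["Age"], [20, 30, 50, 60], labels=["Young", "Mid-Aged", "Old"])

# Sum per group, then divide by the grand total to get each group's share in %
group_totals = df.groupby("Age Group", observed=False)["Total Amount"].sum()
share_pct = 100 * group_totals / group_totals.sum()

print(group_totals)      # Young 1600, Mid-Aged 2300, Old 1200
print(share_pct.round(2))
```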

If you do not wish to add any intermediate column (in our case, "Age Group") to your data frame, you can pass the output of cut directly into the groupby function:

df.groupby(pd.cut(df["Age"],[20, 30, 50, 60], precision=0, labels=["Young","Mid-Aged","Old"])).agg({"Total Amount": "sum"})[["Total Amount"]].apply(lambda x: 100*x/x.sum())

The above code will produce the same result as previously.

There are times when you may want to define your bins at a fixed interval between a start point and an end point, for instance, to find the total sales amount for each 0.1 range of order dimensions.

For such cases, we can use the arange function from the numpy package, e.g.:

np.arange(0, 1, 0.1)

This gives us an array of values from 0 up to (but not including) 1 in steps of 0.1, which we can supply as the bins to the cut function:

df.groupby(pd.cut(df["Order Dimensions"],np.arange(0, 1, 0.1))).agg({"Total Amount": "sum"})

With the above code, we can see that pandas split the order dimensions into chunks of 0.1 and then summarized the sales amount for each of these ranges:

[Screenshot: total sales amount for each 0.1 range of order dimensions]
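Because 0.1 cannot be represented exactly as a float, the NumPy documentation suggests numpy.linspace instead of arange when the step is not an integer. Below is a sketch of the same summary using 11 evenly spaced edges from 0 to 1, so the end point 1 is included. The Total Amount values are precomputed from Price Per Unit × Order Quantity in the sample data, and the rounding step is only there to keep the interval labels tidy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Order Dimensions": [0.52, 0.805, 0.48, 0.235, 0.12, 0.58],
                   "Total Amount": [500, 1200, 1200, 400, 300, 1500]})

# 11 edges from 0 to 1 inclusive -> 10 bins of width 0.1;
# rounding removes float noise such as 0.30000000000000004 from the labels
edges = np.round(np.linspace(0, 1, 11), 1)

by_dimension = (df.groupby(pd.cut(df["Order Dimensions"], edges), observed=False)
                  ["Total Amount"].sum())
print(by_dimension)
```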

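Note that arange does not include the stop value 1, so if you wish to include 1, you may want to add an extra step to the stop value, e.g. np.arange(0, 1 + 0.1, 0.1). The cut function also has two arguments, right and include_lowest, that control whether the edges are included: right=False makes each interval include its left edge and exclude its right edge, while include_lowest=True makes the first interval include its left edge when the intervals are right-closed (the default). E.g.: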

df.groupby(pd.cut(df["Order Dimensions"],np.arange(0, 1 + 0.1, 0.1), right=False, include_lowest=True)).agg({"Total Amount": "sum"})

This makes the left edge inclusive and the right edge exclusive, and the output will be similar to the below:

[Screenshot: total sales amount by order dimensions with left-inclusive intervals]
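To see how right and include_lowest treat the edges, here is a small sketch with made-up values that sit exactly on the bin edges. It prints the category codes, where -1 marks a value that falls outside every bin.

```python
import pandas as pd

values = pd.Series([0.0, 0.5, 1.0])
edges = [0, 0.5, 1]

# Default right=True: intervals (0, 0.5] and (0.5, 1], so 0.0 is left out
default_codes = pd.cut(values, edges).cat.codes.tolist()

# include_lowest=True also puts the left edge 0.0 into the first interval
lowest_codes = pd.cut(values, edges, include_lowest=True).cat.codes.tolist()

# right=False: intervals [0, 0.5) and [0.5, 1), so now 1.0 is left out
left_closed_codes = pd.cut(values, edges, right=False).cat.codes.tolist()

print(default_codes)      # [-1, 0, 1]
print(lowest_codes)       # [0, 0, 1]
print(left_closed_codes)  # [0, 1, -1]
```

With right=False the first interval already includes its left edge, so in this sketch the value that drops out is the top edge instead, which is worth keeping in mind when choosing the last bin edge.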

cut vs qcut

Pandas also provides another function, qcut, which splits your data based on quantiles (cut points based on the distribution of the data). For instance, if you use qcut on the "Age" column:

pd.qcut(df["Age"],2, duplicates="drop")

You will see that the age data has been split into two groups: (22.999, 41.5] and (41.5, 51.0].

[Screenshot: output of pd.qcut(df["Age"], 2)]

If you examine the data inside each group:

pd.qcut(df["Age"],2, duplicates="drop").value_counts()

You will see that qcut has split the 6 rows of age data equally into 2 groups, with the cut point at 41.5 (the median age):

[Screenshot: value counts of the two qcut age groups]
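As with cut, retbins=True shows the cut points qcut computed, which makes it easy to check that they are simply the quantiles of the data. A sketch using the sample ages: with q=2 the single cut point is the median, and with q=4 the edges are the quartiles. With only six rows and a tie at 44, the four groups cannot be exactly equal in size. The duplicates="drop" argument used above only matters when several quantiles land on the same value; here all edges are unique.

```python
import pandas as pd

ages = pd.Series([44, 51, 23, 30, 44, 39], name="Age")

# q=2: the only inner edge is the median
halves, half_edges = pd.qcut(ages, 2, retbins=True)
print(half_edges)      # edges 23.0, 41.5, 51.0
print(ages.median())   # 41.5, the same cut point

# q=4: the inner edges are the 25%, 50% and 75% quantiles
quarters, quarter_edges = pd.qcut(ages, 4, retbins=True)
print(quarter_edges)   # edges 23.0, 32.25, 41.5, 44.0, 51.0
print(quarters.value_counts().sort_index())
```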

So if you would like to split your customers into 4 age groups that each contain roughly the same number of rows (quartiles) and compare how much each group spent on your products, you can do as below:

df.groupby(pd.qcut(df["Age"],4, duplicates="drop")).agg({"Total Amount" : "sum"})

You will see the total sales amount for each of these 4 groups. Keep in mind that qcut balances the number of rows in each group, not the sales amount, so the totals can still differ from group to group:

[Screenshot: total sales amount for the 4 qcut age groups]
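A quick way to check both how evenly qcut spreads the rows and how much each group spends is to count and sum in one step with named aggregation. This sketch uses the sample ages with Total Amount precomputed from the sample data; the output column names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"Age": [44, 51, 23, 30, 44, 39],
                   "Total Amount": [500, 1200, 1200, 400, 300, 1500]})

# Quartile-based age groups, renamed so the grouper is clearly not the Age column
quartile = pd.qcut(df["Age"], 4).rename("Age Quartile")

summary = df.groupby(quartile, observed=False).agg(
    orders=("Total Amount", "count"),       # rows per group
    total_amount=("Total Amount", "sum"),   # sales per group
)
print(summary)
```

On this data the row counts come out as 2, 1, 2 and 1 while the totals are 1600, 1500, 800 and 1200, which shows that equal-sized groups do not imply equal spending.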

Conclusion

In this article, we have reviewed the pandas cut and qcut functions, which we can use to split our data into buckets either by self-defined intervals or by cut points based on the data distribution.

I hope this gives you some hints when you are solving problems similar to the ones discussed here.

 


Why we should use Python decorator

Introduction

Decorators are one of the most important features in Python, and you may have seen them in many places in Python code, for instance, functions decorated with @classmethod, @staticmethod, @property, etc. By definition, a decorator is a function that extends the functionality of another function without explicitly modifying it. It makes the code shorter and at the same time improves readability. In this article, I will be sharing with you how to use Python decorators.

Basic Syntax

If you have read my article about Python closures, you may remember that Python allows you to pass a function into another function as an argument. For example, suppose we have the functions below:

add_log – adds logging to inspect all the positional and keyword arguments of a function before actually calling it

send_email – to accept some positional and keyword arguments for sending out emails

def add_log(func):
    def log(*args, **kwargs):
        for arg in args:
            print(f"{func.__name__} - args: {arg}")
        for key, val in kwargs.items():
            print(f"{func.__name__} - {key}, {val}")
        return func(*args, **kwargs)
    return log

def send_email(subject, to, **kwargs):  
    #send email logic 
    print(f"email sent to {to} with subject {subject}.")

We can pass the send_email function to add_log as an argument, and then trigger the sending of the email through the function it returns:

sender = add_log(send_email)
sender("hello", "[email protected]", attachment="debug.log", urgent_flag=True)

This code generates the output below:

[Screenshot: the logged arguments followed by the "email sent" message]

You can see that the send_email function has been invoked successfully after all the arguments were printed out. This is exactly what a decorator does: it extends the functionality of the send_email function without changing its original code. When you call send_email directly again, you still see its original behavior without any change:

[Screenshot: calling send_email directly, without the log lines]

Python decorator as a function

Before Python 2.4, functions such as classmethod() and staticmethod() were applied by passing the function to be decorated as an argument and rebinding the name, e.g. f = staticmethod(f). Python 2.4 then introduced the @ syntax to make the code more concise and easier to read, especially when the functions are very long.
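The @ syntax is only shorthand for that "pass in and rebind" pattern. A minimal sketch with a hypothetical shout decorator shows that both styles give the same result:

```python
def shout(func):
    # Hypothetical decorator: upper-cases whatever the wrapped function returns
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

# Old style: pass the function in and rebind the name
def greet(name):
    return f"hello {name}"
greet = shout(greet)

# @ style: the same effect, written right above the definition
@shout
def greet_again(name):
    return f"hello {name}"

print(greet("john"))        # HELLO JOHN
print(greet_again("john"))  # HELLO JOHN
```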

So let's implement our own decorator with the @ syntax.

Suppose we have the decorator function below, and we want to check whether a user is in the whitelist before allowing them to access certain resources. We follow the Python convention of using wrapper as the name of the inner function (although you are free to use any name).

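class PermissionDenied(Exception):
    pass

def permission_required(func):
    whitelist = ["John", "Jane", "Joe"]
    def wrapper(*args, **kwargs):
        # the first positional argument is expected to be the user name
        user = args[0]
        if user not in whitelist:
            raise PermissionDenied(f"{user} is not in the whitelist")
        # return the result so the decorated function's return value is not lost
        return func(*args, **kwargs)
    return wrapper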

Next, we decorate our function with permission_required as per below:

@permission_required
def read_file(user, file_path):
    with open(file_path, "r") as f:
        #print out the first line of the file
        print(f.readline())

When we call our function as usual, the wrapper returned by the decorator runs first and checks whether the user is in the whitelist.

read_file("John", r"C:\pwd.txt")

You can see the below output has been printed out:

[Screenshot: first line of the file printed for user John]

If we pass in some user name not in the whitelist:

read_file("Johnny", r"C:\pwd.txt")

You will see the PermissionDenied exception raised, which shows everything works as expected.

[Screenshot: PermissionDenied raised for user Johnny]

But if you look carefully, you may find something strange when you inspect the decorated function's name and signature:

[Screenshot: inspecting read_file shows the wrapper's name and signature]

So there is a flaw in this implementation, although the functional requirement has been met: the function's name and signature have been replaced by those of the wrapper, which may confuse other people when they want to use your function.
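You can reproduce the problem without opening any file by inspecting the decorated function with the standard inspect module. A sketch of the decorator without @wraps; the docstring on read_file is added here only to show that it gets lost as well.

```python
import inspect

class PermissionDenied(Exception):
    pass

def permission_required(func):
    whitelist = ["John", "Jane", "Joe"]
    def wrapper(*args, **kwargs):  # deliberately no @wraps here
        if args[0] not in whitelist:
            raise PermissionDenied(args[0])
        return func(*args, **kwargs)
    return wrapper

@permission_required
def read_file(user, file_path):
    """Print the first line of file_path."""
    with open(file_path, "r") as f:
        print(f.readline())

# Only inspecting the function here, so no file is opened
print(read_file.__name__)            # wrapper
print(inspect.signature(read_file))  # (*args, **kwargs)
print(read_file.__doc__)             # None
```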

Use of the functools.wraps

To solve this problem, we can use one more Python module, functools, whose wraps decorator copies the original function's metadata back onto the wrapper.

Let's update our decorator function again by adding @wraps(func) to the wrapper function:

from functools import wraps

def permission_required(func):
    ...
    @wraps(func)
    def wrapper(*args, **kwargs):
       ...
    return wrapper

Finally, when we check the function's signature and name again, they show the correct information:

[Screenshot: read_file's original name and signature restored]
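Putting the pieces together, a complete version of the decorator with @wraps might look like the sketch below. The docstring on read_file is an addition for illustration, and the denied call fails before any file is opened.

```python
import inspect
from functools import wraps

class PermissionDenied(Exception):
    pass

def permission_required(func):
    whitelist = ["John", "Jane", "Joe"]

    @wraps(func)  # copy func's name, docstring, etc. onto wrapper
    def wrapper(*args, **kwargs):
        user = args[0]
        if user not in whitelist:
            raise PermissionDenied(user)
        return func(*args, **kwargs)
    return wrapper

@permission_required
def read_file(user, file_path):
    """Print the first line of file_path."""
    with open(file_path, "r") as f:
        print(f.readline())

print(read_file.__name__)            # read_file
print(inspect.signature(read_file))  # (user, file_path)

try:
    read_file("Johnny", "no_such_file.txt")
except PermissionDenied as exc:
    print("permission denied for", exc)
```

inspect.signature reports (user, file_path) because it follows the __wrapped__ attribute that wraps sets on the wrapper.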

What happens is that @wraps(func) calls the update_wrapper function, which automatically copies the metadata of the original function (such as __name__ and __doc__) onto the wrapper, so you no longer see the wrapper's own metadata. You may want to check the update_wrapper function in the functools module to further understand how the metadata is updated.
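You can also call update_wrapper yourself, which is roughly what @wraps does for you. This sketch, using a hypothetical original function, shows which attributes are copied (listed in functools.WRAPPER_ASSIGNMENTS) and that the original function remains reachable through __wrapped__.

```python
import functools

def original(x):
    """Return x doubled."""
    return 2 * x

def wrapper(*args, **kwargs):
    return original(*args, **kwargs)

# Copy the metadata from original onto wrapper and record original as wrapper.__wrapped__
functools.update_wrapper(wrapper, original)

print(functools.WRAPPER_ASSIGNMENTS)      # names of the attributes that get copied
print(wrapper.__name__, wrapper.__doc__)  # original Return x doubled.
print(wrapper.__wrapped__ is original)    # True
```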

Besides decorating normal functions, decorators can also be used on methods in a class; for instance, @staticmethod and @property are commonly used in Python code to decorate class methods.

Python decorator as a class

A decorator can also be implemented as a class, in case you find your wrapper function has grown too big or is nested too deeply. To make this work, the class's __init__ receives the decorated function, and you implement a __call__ method so that the class instance becomes callable; calling the decorated function then runs __call__.

Below is the code that implements our earlier example as a class:

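from functools import update_wrapper

class PermissionRequired:
    def __init__(self, func):
        # runs once, at decoration time, and receives the decorated function
        self._whitelist = ["John", "Jane", "Joe"]
        # copy func's metadata (name, docstring, ...) onto this instance
        update_wrapper(self, func)
        self._func = func

    def __call__(self, *args, **kwargs):
        # runs every time the decorated function is called
        user = args[0]
        if user not in self._whitelist:
            raise PermissionDenied
        return self._func(*args, **kwargs)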

Take note that we need to call the update_wrapper function to manually update the metadata for our decorated function. As before, we can use @ with the class name to decorate our function.

@PermissionRequired
def read_file(user, file_path):
    with open(file_path, "r") as f:
        #print out the first line of the file
        print(f.readline())
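To check the class-based version without a file on disk, here is a sketch that decorates a hypothetical greet function instead of read_file. The decorated name now refers to a PermissionRequired instance, yet update_wrapper makes it still report the original function's name.

```python
from functools import update_wrapper

class PermissionDenied(Exception):
    pass

class PermissionRequired:
    def __init__(self, func):
        # called once, at decoration time, with the decorated function
        self._whitelist = ["John", "Jane", "Joe"]
        update_wrapper(self, func)
        self._func = func

    def __call__(self, *args, **kwargs):
        # called every time the decorated function is called
        user = args[0]
        if user not in self._whitelist:
            raise PermissionDenied(user)
        return self._func(*args, **kwargs)

@PermissionRequired
def greet(user):
    """Hypothetical stand-in for read_file."""
    return f"welcome, {user}"

print(type(greet).__name__)  # PermissionRequired
print(greet.__name__)        # greet
print(greet("Jane"))         # welcome, Jane
```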

Conclusion

In this article, we have reviewed why Python decorators were introduced and the basic syntax for implementing our own decorators. We also discussed decorators implemented as functions and as classes, with some examples. Hopefully this article helps you enhance your understanding of Python decorators and guides you on how to use them in your projects.

 

Photo by Ali Yahya on Unsplash

Master python closure with 3 real-world examples

Introduction

A Python closure is a technique for binding a function to an environment, so that the function can access all the variables defined in its enclosing scope. Closures typically appear in programming languages with first-class functions, which means functions can be passed as arguments, returned as values, or assigned to variables.

This definition may sound confusing to Python beginners, and the examples found online are sometimes not intuitive enough: most of them illustrate the idea with print statements, so readers may not get the whole picture of why and how closures should be used. In this article, I will be using some real-world examples to explain how to use closures in your code.

Nested function in Python

To understand closures, we must first know that Python supports nested functions, where one function is defined inside another. For instance, inner_func below is a nested function, and outer_func returns its nested function as the return value.

def outer_func():    
    print("starting outer func")
    def inner_func():
        pi = 3.1415926
        print(f"pi is : {pi}")
    return inner_func

When you invoke the outer_func, it returns the reference to the inner_func, and subsequently you can call the inner_func. Below is the output when you run in Jupyter Notebook:

[Screenshot: calling outer_func and then the returned inner_func in Jupyter Notebook]

Now that you have a feel for nested functions, let's explore how they are related to closures. If we modify our previous function and move the pi variable into the outer function, surprisingly, it produces the same result as before.

def outer_func():    
    print("starting outer func")
    #move pi variable definition to outer function
    pi = 3.1415926
    def inner_func():
        print(f"pi is : {pi}")
    return inner_func

You may wonder: pi is defined in the outer function and is a local variable of outer_func, so why can inner_func access it, given that it is not in the global scope? This is exactly where the closure happens: inner_func has full access to the environment (variables) of its enclosing scope. inner_func refers to pi as a nonlocal (free) variable, since it has no local variable called pi.
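You can actually see the captured variable: CPython stores it in a "cell" attached to the returned function, exposed through __closure__, and lists the free variable names in __code__.co_freevars. A quick sketch with the modified outer_func:

```python
def outer_func():
    print("starting outer func")
    pi = 3.1415926
    def inner_func():
        print(f"pi is : {pi}")
    return inner_func

inner = outer_func()   # outer_func has already finished running here...
inner()                # ...but inner_func can still read pi

print(inner.__code__.co_freevars)          # ('pi',)
print(inner.__closure__[0].cell_contents)  # 3.1415926
```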

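If you want to reassign pi inside inner_func, you have to declare "nonlocal pi" before assigning to it; otherwise the assignment makes pi a new local variable of inner_func. This applies to any reassignment, whatever the data type, while mutating a mutable object in place (for example, adding a key to a dictionary) does not need nonlocal.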
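A small sketch of the difference, using a hypothetical counter: with nonlocal, the nested function rebinds the enclosing variable. Without it, the assignment turns count into a local variable, so reading it first raises UnboundLocalError.

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count   # rebind the enclosing variable instead of creating a local one
        count += 1
        return count
    return increment

def broken_counter():
    count = 0
    def increment():
        count += 1       # assignment makes count local to increment
        return count
    return increment

counter = make_counter()
print(counter(), counter())   # 1 2

try:
    broken_counter()()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```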

With the above understanding, now let’s walk through some real-world examples to see how we can use closure in our code.

Hide data with Python closure

Let's say we want to implement a counter that records how many times each word has been repeated. The first thing you may want to do is define a dictionary in the global scope, and then create a function that adds the words as keys to this dictionary and updates the number of times each one is repeated. Below is the sample code:

counter = {}

def count_word(word):    
    global counter
    counter[word] = counter.get(word, 0) + 1
    return counter[word]

The global keyword explicitly tells the Python interpreter that counter refers to the dictionary defined in the global scope. (Strictly speaking, it is only required when the function assigns a new object to counter; updating the dictionary in place, as here, would also work without it.)

Sample output:

[Screenshot: sample output of count_word]

The above code works as expected, but there are two potential issues. Firstly, the global variable is accessible to all other functions, so you cannot guarantee your data won't be modified by others. Secondly, the global variable stays in memory as long as the program is running, so you may not want to create many global variables unless necessary.

To address these two issues, let’s re-implement it with closure:

def word_counter():
    counter = {}
    def count(word):
        counter[word] = counter.get(word, 0) + 1
        return counter[word]
    return count

If we run it from Jupyter Notebook, you will see the below output:

[Screenshot: sample output of the closure-based word counter]

With this implementation, the counter dictionary is hidden from public access while the functionality remains the same. (You may notice that the returned count function still works even after the word_counter function itself has been deleted.)
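A sketch of both points: every call to word_counter creates its own private dictionary, and the returned functions keep working after the factory is deleted. The variable names are only for illustration.

```python
def word_counter():
    counter = {}
    def count(word):
        counter[word] = counter.get(word, 0) + 1
        return counter[word]
    return count

count_a = word_counter()
count_b = word_counter()   # a second, completely separate dictionary

count_a("python")
count_a("python")
count_b("python")

print(count_a("python"))   # 3
print(count_b("python"))   # 2

del word_counter           # the factory is gone...
print(count_a("java"))     # 1 (...but the closure still works)
```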

Convert small class to function with Python closure

Occasionally in your project, you may want to implement a small utility class to do some simple task. Let’s take a look at the below example:

import requests

class RequestMaker:
    def __init__(self, base_url):
        self.url = base_url
    def request(self, **kwargs):
        return requests.get(self.url.format_map(kwargs))

You can see the output below when you call the request method on an instance of RequestMaker:

[Screenshot: calling request on a RequestMaker instance]

As you have already seen in the word counter example, a closure can also hold data for later use, so the above class can be converted into a function with a closure:

import requests

def request_maker(url):
    def make_request(**kwargs):
        return requests.get(url.format_map(kwargs))
    return make_request

The code becomes more concise and achieves the same result. Note that in the above code, we can pass arguments into the nested function with **kwargs (or *args).

[Screenshot: calling the function returned by request_maker]
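If you want to try the pattern without making network calls, here is a sketch that keeps the same closure structure but only builds the URL instead of calling requests.get. The url_maker name and the example.com template are hypothetical.

```python
def url_maker(base_url):
    # base_url is captured by the closure, playing the role of self.url in RequestMaker
    def make_url(**kwargs):
        return base_url.format_map(kwargs)
    return make_url

user_posts = url_maker("https://api.example.com/users/{user_id}/posts?page={page}")
print(user_posts(user_id=42, page=2))
# https://api.example.com/users/42/posts?page=2
```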

Replace text with case matching

When you use regular expressions to find and replace text, you may realize that if you match text in case-insensitive mode, you will not be able to replace it with the proper case. For instance:

import re

paragraph = 'To start Python programming, you need to install python and configure PYTHON env.'
re.sub("python", "java", paragraph, flags=re.I)

Output from above:

[Screenshot: every match replaced with lowercase "java"]

It indeed replaced all occurrences of "python", but the case does not match the original text. To solve this problem, let's implement the replace function with a closure:

def replace_case(word):
    def replace(m):
        text = m.group()
        if text.islower():
            return word.lower()
        elif text.isupper():
            return word.upper()
        elif text[0].isupper():
            return word.capitalize()
        else:
            return word
    return replace

In the above code, the nested replace function has access to word, the replacement text captured from replace_case. When we detect the case of the matched text, we convert word to the same case and return it.

So in our original re.sub call, let's pass the function returned by replace_case("java") as the second argument. When the replacement argument is a function, re.sub calls it for every match with the match object and uses its return value as the replacement text (see the Python official documentation for details).

re.sub("python", replace_case("java"), paragraph, flags=re.IGNORECASE)

If we run the above again, you should see that the case is retained during the replacement:

[Screenshot: replacements that keep the case of the original text]
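For reference, here is the whole example in one runnable piece, printing the plain replacement next to the case-aware one so the difference is easy to compare:

```python
import re

def replace_case(word):
    def replace(m):
        text = m.group()
        if text.islower():
            return word.lower()
        elif text.isupper():
            return word.upper()
        elif text[0].isupper():
            return word.capitalize()
        else:
            return word
    return replace

paragraph = 'To start Python programming, you need to install python and configure PYTHON env.'

plain = re.sub("python", "java", paragraph, flags=re.IGNORECASE)
matched = re.sub("python", replace_case("java"), paragraph, flags=re.IGNORECASE)

print(plain)    # To start java programming, you need to install java and configure java env.
print(matched)  # To start Java programming, you need to install java and configure JAVA env.
```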

Conclusion

In this article, we have discussed the general reasons why Python closures are used and demonstrated how they can be used in your code with 3 real-world examples. In fact, Python decorators are also a use case of closures; I will be discussing this topic in the next article.

 


How to pack python program into exe file

After you have built your Python program, you may want to distribute it to your users to run by themselves. However, in most cases, your users may neither be able to install Python to execute the script nor know how to run a script from the command line. In this case, you will need a way to pack your program into an executable file, so that it can be run with a simple click like other apps. In this article, I will be sharing with you how to pack a Python program into an exe file with the PyInstaller library for Windows users.

Prerequisite

You will need to create a virtual environment for your Python program and activate it with the commands below. I will explain why this is needed later.

python -m venv test
test\Scripts\activate.bat

Then install PyInstaller library:

pip install pyinstaller

Let’s get started

Let me first explain why we need to set up a virtual environment for your program. If you are working on different projects concurrently and each of them uses a different set of Python libraries, these libraries may sometimes conflict with each other due to version differences or other dependencies. In this case, you can use the venv module to create an isolated Python environment for each of your projects, so that each virtual environment only has the libraries necessary for running that particular project.

The same applies when packing your program with PyInstaller: the virtual environment ensures that only the necessary libraries are packed into the executable file.

Build your Python program

For this article, our main objective is to demonstrate how to pack a Python program into an exe file, so let's just install a random library and write some dummy code.

pip install requests

And create a hello.py with the below code:

import requests
import sys, time

result = requests.get("https://www.google.com")
print(f"Google responded {result.status_code}")

with open("test.config") as f:
    print(f.read())

for i in range(15, 0, -1):
    sys.stdout.write("\r")
    sys.stdout.write(f"Window will be closed in {i:2d} seconds")
    sys.stdout.flush()
    time.sleep(1)

Let's also create a file called "test.config" in the current directory and write some random words in it, say "some configuration".

If you run it with python hello.py, you should get output similar to the below:

Google responded 200
some configuration
Window will be closed in  1 seconds

Everything is ready, so let's move to the next step: packing the Python program into an exe file.

Pack python program into exe file with PyInstaller

PyInstaller is actually quite easy to use, as every option comes with a default. E.g., if you do not specify any options and just run the command below:

pyinstaller hello.py

You will get a folder (onedir mode) under dist\hello, where you can find hello.exe. But if you double-click to run it, it will probably close after a few seconds before you can see any error message.

The problem here is that our program reads an external file, "test.config", and this file was not packed into the dist\hello folder. Of course, you could manually copy this file to dist\hello every time you build the program, but there is an option you can use to tell PyInstaller to include additional files.

--add-data option

The --add-data option can be used to include an additional file or directory, e.g.:

--add-data "source file or folder;destination folder"

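If you have multiple files to add, you can use this option multiple times. On Windows, the source and destination are separated by ";" (on macOS and Linux the separator is ":"). For binary files, you may consider using the --add-binary option.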

So you can re-run the command below to include the additional file, and also use --clean to clean PyInstaller's cache and remove temporary files before building again.

pyinstaller hello.py --add-data "test.config;." --clean
--noconfirm option

You may see a warning similar to the one below asking you to confirm the deletion of the old files; just key in "y" to confirm. The question can be skipped by adding the --noconfirm option.

WARNING: The output directory “c:\test\dist\hello” and ALL ITS CONTENTS will be REMOVED! Continue? (y/n)

Once the new exe file is generated, you should be able to run it and see the result below:

[Screenshot: running the rebuilt hello.exe]

So far so good, but it can still be better. Let's specify the name of the exe file and make it a single file rather than a directory.

--onefile vs --onedir

With the extra options --onefile and --name "SuperHero" below, we expect PyInstaller to pack the Python program into a single SuperHero.exe file.

pyinstaller --onefile hello.py --name "SuperHero" --add-data "test.config;." --clean

When we try to execute this exe file, you will see an error like the one below. This is because a one-file exe unpacks its bundled files into a temporary folder when it starts and stores that folder's path in sys._MEIPASS, while open("test.config") looks in the current working directory instead.

[Screenshot: error raised when SuperHero.exe cannot find test.config]

In this case, let’s modify our code again to cater for this:

import os
import sys

def get_resource_path(relative_path):
    try:
        # PyInstaller creates a temp folder and stores its path in sys._MEIPASS
        base_path = sys._MEIPASS
    except AttributeError:
        # not running from a PyInstaller bundle: use the current directory
        base_path = os.path.abspath(".")

    return os.path.join(base_path, relative_path)

with open(get_resource_path("test.config")) as f:
    print(f.read())

When you rebuild SuperHero.exe, this time you should be able to execute it without any issue. It also works if you rebuild your exe in --onedir mode.
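You can check the helper's logic without building an exe by setting sys._MEIPASS temporarily. The sketch below uses getattr with a default, which is equivalent to the try/except version; the temp folder path is made up purely for the simulation.

```python
import os
import sys

def get_resource_path(relative_path):
    # sys._MEIPASS only exists when running from a PyInstaller bundle
    base_path = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base_path, relative_path)

# Plain script: the path is relative to the current working directory
script_path = get_resource_path("test.config")
print(script_path)

# Simulate a bundle with a hypothetical temp folder, then clean up
sys._MEIPASS = os.path.join(os.sep, "tmp", "_MEI12345")
try:
    bundled_path = get_resource_path("test.config")
    print(bundled_path)
finally:
    del sys._MEIPASS
```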

--log-level

If you do not wish to see so many output messages when packing the program, you can reduce them with the --log-level option, whose value can be one of TRACE, DEBUG, INFO, WARN, ERROR, CRITICAL. For instance, --log-level=ERROR only shows errors, and you will not even see the "Building completed successfully" message after the build finishes, as it is logged at INFO level.

--noconsole

If you are working on an automation program, such as one that automatically sends emails or saves attachments, that does not need to interact with users, you can use the --noconsole option, so that no console window shows up when you click to run your exe file.

PyInstaller specification file

You may have noticed that a .spec file is generated after you run the pyinstaller command. This file keeps all the options you used for your last build, so if you just want to rebuild your executable without changing any option, you can use the command below:

pyinstaller SuperHero.spec

Conclusion

The options covered above should meet your basic needs for packing a Python program into an exe file. You may also refer to the official documentation for the other options PyInstaller offers.


Python String Data Type

In the previous article, we discussed Python variables, including string variables. String is a Python built-in data type that holds a sequence of characters, and you will need it whenever you do any text processing. In this article, I will be sharing with you the various operations you can perform with the Python string data type.

Python string data type

In Python, you can define a string variable with single quotes, double quotes or triple quotes, and use the type() function to verify the data type of your variable. E.g.:

text1 = 'hello \n world!'
text2 = "bac;def,what$ is"
text3 = """this is also fine"""
print(type(text1), text1)
print(type(text2), text2)
print(type(text3), text3)

You should see the output below, with the data type shown as "str".

<class 'str'> hello 
 world!
<class 'str'> bac;def,what$ is
<class 'str'> this is also fine
Slice Operation

As per the definition of the Python string data type, a string is a sequence of characters, which means you can access each character by its index (the index starts from 0 for the first character):

print(text1[0], text2[1], text3[2])
h a i

And you can use the slice operation to get a substring of your string variable:

#get a sub string starting from index 0 and ending at index 5 (exclusive)
print(text1[0:5])
#get a sub string starting from index 5 and ending at index 7 (exclusive)
print(text3[5:7])
#get a sub string starting from default index 0 and ending at index 4 (exclusive)
print(text3[:4])
#get a sub string starting from index 5 and ending at the end of the string
print(text3[5:])
hello
is
this
is also fine

You can also specify negative index values, which count from the right end of the string:

print(text1[-1])
print(text3[-3:-1])
!
in

There is actually a third option, the slice step, which can be any non-zero integer. E.g.:

text4 = "abcdefg"
print(text4[0::2])
print(text4[1::2])
aceg
bdf
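The step can also be negative, in which case the slice walks from right to left, and a step of 0 is rejected. A short sketch using the same text4 value:

```python
text4 = "abcdefg"

print(text4[::-1])    # gfedcba (a step of -1 reverses the string)
print(text4[-1::-2])  # geca (every second character, starting from the end)
print(text4[5:1:-1])  # fedc (start at index 5, stop before index 1)

try:
    text4[::0]
except ValueError as exc:
    print(exc)        # slice step cannot be zero
```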
Immutable nature

Since we are able to get each individual character from a string, you may wonder whether we can assign something else to a particular position in the string, e.g.:

text4[0] = 'T'
#TypeError: 'str' object does not support item assignment

The error shows up because strings are immutable: you cannot change their original content, but you can create a new string:

new_text4 = "T" + text4[1:]
+ and *

You may have noticed that strings can be concatenated with the "+" operator in the above example. There is also the * operator, which repeats a string:

print(text4 + text3*2)

This repeats text3 twice and concatenates everything into a single string:

abcdefgthis is also finethis is also fine
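Two edge cases worth knowing, shown in a small sketch (the variable count is a hypothetical example): repeating a string zero or a negative number of times gives an empty string, and "+" only concatenates strings with strings:

```python
text3 = "this is also fine"

# Repeating a string zero times, or a negative number of times, gives ""
zero_times = text3 * 0
negative_times = text3 * -1
print(repr(zero_times), repr(negative_times))

# + only joins str with str; other types must be converted with str() first
count = 3  # hypothetical value
try:
    message = "count: " + count
except TypeError:
    message = "count: " + str(count)
print(message)
```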
Formatting Python string data type

Below are some of the string formatting methods; most of them are self-explanatory from their names:

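print("lower:", text4.lower())
# similar to lower(), but more aggressive; intended for caseless matching
print("casefold:", text4.casefold())

print("upper:", text4.upper())

print("title:", text4.title())
# uppercases only the first character of the whole string (same result as title() for "abcdefg")
print("capitalize:", text4.capitalize())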

print("swapcase:", text4.swapcase())
print("center:", text4.center(40, "*"))
print("ljust:", text4.ljust(40))
print("rjust:", text4.rjust(40, "*"))
print("zfill:", text4.zfill(40))
print("strip:", text4.strip("a"))
print("replace:", text4.replace("a", "A"))

Below is the output:

lower: abcdefg
casefold: abcdefg
upper: ABCDEFG
title: Abcdefg
capitalize: Abcdefg
swapcase: ABCDEFG
center: ****************abcdefg*****************
ljust: abcdefg                                 
rjust: *********************************abcdefg
zfill: 000000000000000000000000000000000abcdefg
strip: bcdefg
replace: Abcdefg
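For the sample string "abcdefg", lower() and casefold() return the same result, and so do title() and capitalize(). A minimal sketch with hypothetical sample strings shows where they differ, and also that strip() removes any combination of the given characters from both ends rather than an exact prefix or suffix:

```python
# Hypothetical sample strings where the methods differ
word = "Stra\u00dfe"          # "Straße"
lowered = word.lower()        # lower() keeps the German sharp s
folded = word.casefold()      # casefold() turns it into "ss"

phrase = "hello world"
titled = phrase.title()            # uppercases the first letter of every word
capitalized = phrase.capitalize()  # uppercases only the first character

# strip() removes any mix of the given characters from both ends
stripped = "xxhixyx".strip("xy")
print(lowered, folded, titled, capitalized, stripped)
```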

There are also methods for checking the format of a string:

print("isalnum:",text4.isalnum())	
print("isalpha:",text4.isalpha())
print("isdecimal:",text4.isdecimal())
print("isdigit:",text4.isdigit())
print("isnumeric:",text4.isnumeric())
print("isidentifier:",text4.isidentifier())
print("islower:",text4.islower())
print("istitle:",text4.istitle())
print("isupper:",text4.isupper())
print("isspace:",text4.isspace())
print("isprintable:",text4.isprintable())

The output will be:

isalnum: True
isalpha: True
isdecimal: False
isdigit: False
isnumeric: False
isidentifier: True
islower: True
istitle: False
isupper: False
isspace: False
isprintable: True
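isdecimal, isdigit and isnumeric all return False for "abcdefg", but they accept different sets of characters. A small sketch with hypothetical samples; note that none of them accepts "12.5", so they cannot validate a decimal number string on their own (calling float() inside try/except is one common alternative):

```python
# Hypothetical samples: ASCII digits, superscript two, vulgar fraction one half, a decimal number
samples = ["123", "\u00b2", "\u00bd", "12.5"]

results = {}
for s in samples:
    # (isdecimal, isdigit, isnumeric) for each sample
    results[s] = (s.isdecimal(), s.isdigit(), s.isnumeric())
    print(repr(s), results[s])
```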
Comparison operations

You can use relational operators such as ==, > and < to compare two strings. Python compares strings character by character using each character's Unicode code point, and all uppercase ASCII letters (A to Z) come before all lowercase ones (a to z). To get comparison results in alphabetical order regardless of case, convert your texts into a standard format first, e.g. all upper or all lower case.
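To make the ordering rule concrete, here is a small sketch with a hypothetical word list. Sorting with key=str.lower is one way to apply the "standard format" idea without changing the stored strings:

```python
words = ["banana", "apple", "Cherry"]  # hypothetical sample list

# Each character is compared by its Unicode code point
print(ord("A"), ord("Z"), ord("a"), ord("z"))
zebra_first = "Zebra" < "apple"  # 'Z' (90) sorts before 'a' (97)

# The default sort is case-sensitive, so "Cherry" comes first
case_sensitive = sorted(words)
# Sorting with a case-normalizing key gives alphabetical order
case_insensitive = sorted(words, key=str.lower)
print(zebra_first, case_sensitive, case_insensitive)
```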

To check whether a string starts or ends with certain characters, use the startswith and endswith methods:

if text3.startswith("this"):
    print("yes, it starts with 'this'")
if text3.endswith("fine"):
    print("yes, it ends with 'fine'")
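Both methods also accept a tuple of candidates, and startswith takes an optional start position. A minimal sketch:

```python
text3 = "this is also fine"

# A tuple of candidates returns True if any one of them matches
starts_any = text3.startswith(("that", "this"))
ends_any = text3.endswith(("!", "fine"))
# An optional start index begins the check at that position
is_at_5 = text3.startswith("is", 5)
print(starts_any, ends_any, is_at_5)
```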

Python strings have no contains method (people coming from Java sometimes get confused, since Java's String class has one). Instead, you can use the in operator or the find, index or rindex methods to check whether a string contains a substring:

if "this" in text3:
    print("'this' is in text3")
else:
    print("not found")

if text3.find("this") > -1:
    print("found 'this' in text3")
else:
    print("not found")

# search only from index 1 up to (but not including) index 20
if text3.find("this", 1, 20) > -1:
    print("found 'this' in text3")
else:
    print("'this' is not found in text3 between index 1 and 20")

# index() raises ValueError instead of returning -1, so catch the exception
try:
    idx = text3.index("this")
    print("found 'this' in text3 at index", idx)
except ValueError:
    print("not found")

# ValueError: substring not found
# idx = text3.index("this", 1, 20)

Both find and index return the index of the first occurrence of the substring. The difference is that index raises a ValueError when the substring is not found, while find simply returns -1.
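rindex, mentioned earlier, searches from the right; its counterpart rfind behaves like find. A short sketch of the four methods side by side:

```python
text3 = "this is also fine"

first = text3.find("is")        # first match: the "is" inside "this"
last = text3.rfind("is")        # last match: the standalone word "is"
last_idx = text3.rindex("is")   # same position as rfind
missing = text3.rfind("xyz")    # -1 when not found, like find()

try:
    text3.rindex("xyz")         # rindex raises instead of returning -1
    rindex_raised = False
except ValueError:
    rindex_raised = True

print(first, last, last_idx, missing, rindex_raised)
```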

Split & Join texts

You will often need to split text by a certain delimiter, e.g. a newline (\n), a semicolon (;) or a space. The split method breaks the text into a list; without an argument, it splits on any run of whitespace. If the delimiter is not found, split returns a list containing the original text as its only element.

print("split by default delimiter:", text3.split())
print("split by s", text3.split('s'))
print("split by ;", text3.split(';'))

The output will be:

split by default delimiter: ['this', 'is', 'also', 'fine']
split by s ['thi', ' i', ' al', 'o fine']
split by ; ['this is also fine']
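The difference between calling split() with no argument and with an explicit space is easy to miss. A minimal sketch with a hypothetical string containing extra spaces, plus the maxsplit argument and splitlines():

```python
messy = "  this   is also fine "  # hypothetical text with extra spaces

# No argument: runs of whitespace act as one separator,
# and leading/trailing whitespace is ignored
words = messy.split()
# An explicit " ": every single space separates, so empty strings appear
pieces = messy.split(" ")
# maxsplit limits the number of splits; the remainder stays in the last item
first_and_rest = "this is also fine".split(" ", 1)
# splitlines() splits on line boundaries such as \n
lines = "line one\nline two".splitlines()

print(words)
print(pieces)
print(first_and_rest)
print(lines)
```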

Conversely, if you have a list of strings that you would like to join into one string, call join on the separator:

print("join the words with ';':", ';'.join(text3.split()))
print("join the words without space:", ''.join(text3.split()))

And below is the output:

join the words with ';': this;is;also;fine
join the words without space: thisisalsofine
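join only accepts strings, so a list of numbers has to be converted first. A small sketch with a hypothetical list:

```python
numbers = [1, 2, 3]  # hypothetical list of non-string items

# join() only accepts strings, so joining integers directly raises TypeError
try:
    ", ".join(numbers)
    join_raised = False
except TypeError:
    join_raised = True

# Convert each item to str first
joined = ", ".join(map(str, numbers))

# Splitting and joining with the same separator gives back the original text
text3 = "this is also fine"
roundtrip = " ".join(text3.split(" "))
print(join_raised, joined, roundtrip == text3)
```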
Count occurrences

The count method returns the number of occurrences of a substring in the original string, for instance:

print(text3*5)
print("'is' occurrence:", (text3*5).count("is"))

The result will be:

this is also finethis is also finethis is also finethis is also finethis is also fine
'is' occurrence: 10
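Two details of count are worth noting: matches do not overlap, and optional start/end arguments restrict the search like a slice. A short sketch:

```python
# count() counts non-overlapping occurrences from left to right
pairs = "aaaa".count("aa")

text3 = "this is also fine"
# Optional start/end arguments limit the search, like a slice
after_4 = text3.count("is", 4)      # only the standalone word "is"
first_4 = text3.count("is", 0, 4)   # only the "is" inside "this"
print(pairs, after_4, first_4)
```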

Conclusion

With the examples above, we have covered most of the commonly used methods of the Python string data type. You can also check the official Python documentation for any additional string methods you are interested in.