ken


Pandas Tricks – Calculate Percentage Within Group

Pandas groupby is probably one of the most frequently used functions when you need to analyse your data, as it is so powerful for summarizing and aggregating data. Often you still need to do some calculation on the summarized data, e.g. calculating each category's percentage of the total. In this article, I will be sharing with you some tricks to calculate percentage within groups of your data.

Prerequisite

You will need to install pandas if you have not installed it yet:

pip install pandas
#or conda install pandas

I am going to use a real-world example to demonstrate the kind of problems we are trying to solve. The sample data I am using is from this link, and you can also download it and try it yourself.

Let’s first read the data from this sample file:

import pandas as pd

# You can also replace the file path below with the URL of the file
df = pd.read_excel(r"C:\Sample Sales Data.xlsx", sheet_name="Sheet")

The data will be loaded into a pandas dataframe, and you will see something like the below:

[Image: preview of the loaded sample sales data]

Let’s first calculate the sales amount for each transaction by multiplying the quantity and unit price columns.

df["Total Amount"] = df["Quantity"] * df["Price Per Unit"]

You can see the calculated result below:

[Image: dataframe with the new “Total Amount” column]

Calculate percentage within group

With the above details, you may want to group the data by sales person and the items they sold, so that you have an overall view of each person's performance. You can do so with the below:

#df.groupby(["Salesman","Item Desc"])["Total Amount"].sum()
df.groupby(["Salesman", "Item Desc"]).agg({"Total Amount" : "sum"})

And you will be able to see the total amount per sales person and item:

[Image: total amount grouped by salesman and item]

This is good, as you can see the total sales for each person and product within the given period.

Calculate the best performer

Now let's see how we can get each sales person's % contribution to total revenue, so that we can immediately see who is the best performer.

To achieve that, we first need to group and sum up the “Total Amount” by “Salesman”, similar to what we did previously.

df.groupby(["Salesman"]).agg({"Total Amount" : "sum"})

And then we calculate each sales amount against the total of the entire group. Here we take “Total Amount” as a subset of the aggregated dataframe, and then use the apply function to calculate the current value vs the total. Take note that the default axis for apply is 0, so the lambda receives each column as a whole and x.sum() is the column total.

[["Total Amount"]].apply(lambda x: 100*x/x.sum())

With the above, we should be able to get the % of contribution to total sales for each sales person. Let's also sort the % from largest to smallest:

.sort_values(by="Total Amount", ascending=False)

Let's put it all together and run the below in Jupyter Notebook:

df.groupby(["Salesman"])\
.agg({"Total Amount" : "sum"})[["Total Amount"]]\
.apply(lambda x: 100*x/x.sum())\
.sort_values(by="Total Amount", ascending=False)

You shall be able to see the below result with the sales contribution in descending order. (Do not be confused by the column name “Total Amount”; pandas keeps the original column name for the aggregated data. You can rename it to whatever you want later.)

[Image: % of total sales per salesman, sorted in descending order]
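As a self-contained variant of the calculation above, the sketch below uses a few made-up rows instead of the sample file (the names and amounts are hypothetical; only the column names follow the article). It selects the single column as a Series, so the apply step is not needed:

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the sample sales data.
df = pd.DataFrame({
    "Salesman": ["Ann", "Ann", "Bob", "Cat"],
    "Item Desc": ["Whisky", "Red Wine", "Whisky", "Beer"],
    "Total Amount": [100.0, 200.0, 500.0, 200.0],
})

# Total sales per sales person, as a Series indexed by "Salesman"
totals = df.groupby("Salesman")["Total Amount"].sum()

# Dividing by the grand total gives each person's share in percent
pct = (100 * totals / totals.sum()).sort_values(ascending=False)
print(pct)
```

Working on a Series keeps the chain short when only one column is involved; the apply approach above is still the handier choice when several columns need the same treatment.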

 

Calculate the most popular products

Similarly, we can follow the same logic to find the most popular products. This time we want to summarize the sales by product, and calculate the % vs total for both “Quantity” and “Total Amount”. We also want to sort the data in descending order for both fields, e.g.:

df.groupby(["Item Desc"])\
.agg({"Quantity": "sum", "Total Amount" : "sum"})[["Quantity", "Total Amount"]]\
.apply(lambda x: 100*x/x.sum())\
.sort_values(by=["Quantity","Total Amount"], ascending=[False,False])

This will produce the below result, which shows “Whisky” is the most popular product in terms of quantity sold. But “Red Wine” contributes the most in terms of total revenue, probably because of its higher unit price.

[Image: % of total quantity and sales amount per product]

 

Calculate best sales by product for each sales person

What if we also want to understand, for each sales person, the % of sales of each product vs his/her total sales amount?

In this case, we shall first group by “Salesman” and “Item Desc” to get the total sales amount for each group. On top of that, we calculate the % within each “Salesman” group, which is achieved with groupby(level=0).apply(lambda x: 100*x/x.sum()).

Note: After grouping, the original dataframe becomes a multi-index dataframe, hence level=0 here refers to the top level index, which is “Salesman” in our case.

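# group_keys=False keeps the (Salesman, Item Desc) index unchanged after apply;
# newer pandas versions would otherwise add "Salesman" as a duplicate index level
df.groupby(["Salesman", "Item Desc"])\
.agg({"Total Amount" : "sum"})\
.groupby(level=0, group_keys=False).apply(lambda x: 100*x/x.sum())\
.sort_values(by=["Salesman", "Total Amount"], ascending=[True, False])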

You will be able to see the below result, which is sorted by % of sales contribution within each sales person.

[Image: % of sales per product within each salesman]
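An alternative worth knowing is groupby(...).transform("sum"), which returns each group's total aligned to every row, so the percentage becomes a plain column division. The following is a minimal sketch on made-up rows (the data and the “Pct Of Salesman” column name are assumptions for illustration, not part of the sample file):

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the sample sales data.
df = pd.DataFrame({
    "Salesman": ["Ann", "Ann", "Bob", "Bob"],
    "Item Desc": ["Whisky", "Red Wine", "Whisky", "Beer"],
    "Total Amount": [100.0, 300.0, 150.0, 50.0],
})

# Total per sales person and product, kept as regular columns
summary = df.groupby(["Salesman", "Item Desc"], as_index=False)["Total Amount"].sum()

# transform("sum") repeats each sales person's total on every row of that person
person_total = summary.groupby("Salesman")["Total Amount"].transform("sum")
summary["Pct Of Salesman"] = 100 * summary["Total Amount"] / person_total

summary = summary.sort_values(["Salesman", "Pct Of Salesman"], ascending=[True, False])
print(summary)
```

Because the result keeps the amounts next to the percentages, it is easy to check that the percentages of each sales person add up to 100.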

 

Conclusion

These are just some simple use cases where we calculate percentage within group with the pandas apply function; you may also be interested to see what else the apply function can do from here.

 


How to pack python program into exe file

After you have built your Python program, you may want to distribute it to your users to run by themselves. However, in most cases, your users may have neither the access to install Python for executing the script nor the knowledge to run a script from the command line. In this case, you will need to find a way to pack your program into an executable file, so that it can be run with a simple click like other apps. In this article, I will be sharing with you how to pack python program into exe file with the PyInstaller library for Windows users.

Prerequisite

You will need to create a virtual environment for your python program and activate it with the below command. I will explain why this is needed later.

python -m venv test
test\Scripts\activate.bat

Then install PyInstaller library:

pip install pyinstaller

Let’s get started

Let me first explain why we need to set up a virtual environment for your program. If you are concurrently working on different projects, and each of them uses a different set of Python libraries, sometimes these libraries may conflict with each other due to version differences or other dependencies. In this case, you will need to use the venv module to create an isolated Python environment for each of your projects, so that each virtual environment only has the libraries necessary for running that particular project.

The same applies when packing your program with PyInstaller: the virtual environment ensures that only the necessary libraries are packed when generating the executable file.

Build your Python program

For this article, our main objective is to demonstrate how to pack python program into exe file, so let’s just include some random library and write some dummy code.

pip install requests

And create a hello.py with the below code:

import requests
import sys, time

result = requests.get("https://www.google.com")
print(f"Google responded {result.status_code}")

with open("test.config") as f:
    print(f.read())

for i in range(15, 0, -1):
    sys.stdout.write("\r")
    sys.stdout.write(f"Window will be closed in {i:2d} seconds")
    sys.stdout.flush()
    time.sleep(1)

Let's also create a file in the current directory called “test.config” and write some random words in it, say “some configuration”.

If you run it with python hello.py, you shall get output similar to the below:

Google responded 200
some configuration
Window will be closed in  1 seconds

Everything is ready, let’s move to the next step to pack python program into exe file.

Pack python program into exe file with PyInstaller

PyInstaller is actually quite easy to use, as every option comes with a default. E.g., if you do not specify any parameter and just run the below:

pyinstaller hello.py

You will be able to get a folder (onedir mode) under dist\hello, where you can find a hello.exe. But if you click to run it, it probably will auto close after a few seconds before you can see any error message.

The problem here is that our program has some code to read an external file, “test.config”, and this file was not packed into the dist\hello folder. Of course you can manually copy this file to dist\hello every time after you build the Python program, but there is an option you can use to tell PyInstaller to include additional files.

--add-data option

This --add-data option can be used to include an additional file or directory, e.g.:

--add-data "src file or folder;dest file or folder"

If you have multiple files to add, you can use this option multiple times. (For binary files, you may consider using the --add-binary option.)
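To make the repeated --add-data usage concrete, here is a minimal sketch that assembles the argument list in Python. The helper name build_pyinstaller_args is hypothetical, and the sketch assumes the source/destination separator is os.pathsep (";" on Windows), which matches the commands used in this article:

```python
import os

def build_pyinstaller_args(script, data_files=(), name=None, onefile=False):
    """Assemble a PyInstaller argument list (hypothetical helper, for illustration)."""
    args = [script, "--clean", "--noconfirm"]
    for src, dest in data_files:
        # One --add-data option per file; os.pathsep is ";" on Windows
        args += ["--add-data", f"{src}{os.pathsep}{dest}"]
    if name:
        args += ["--name", name]
    if onefile:
        args.append("--onefile")
    return args

args = build_pyinstaller_args(
    "hello.py",
    data_files=[("test.config", ".")],
    name="SuperHero",
    onefile=True,
)
print(args)
```

The resulting list can be passed to PyInstaller.__main__.run(args) if you prefer to trigger the build from a Python script rather than typing the command.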

So you can re-run the below command to include the additional file, and also use --clean to clean up PyInstaller's cache and temporary files before generating the files again.

pyinstaller hello.py --add-data "test.config;." --clean

--noconfirm option

You may see a warning similar to the below asking you to confirm deleting the old files; you can just key in “y” to confirm. This question can be avoided with the --noconfirm option.

WARNING: The output directory “c:\test\dist\hello” and ALL ITS CONTENTS will be REMOVED! Continue? (y/n)

Once the new exe file is generated, you shall be able to run it and see the below result:

[Image: console output of the packaged exe]

So far so good, but still can be better. Let’s specify the name of the exe file, and make it one file rather than a directory.

--onefile vs --onedir

With the below extra options, --onefile and --name "SuperHero", we can expect the Python program to be packed into a single SuperHero.exe file.

pyinstaller --onefile hello.py --name "SuperHero" --add-data "test.config;." --clean

When you try to execute this exe file, you will see an error like the below. This is because, when running the exe, PyInstaller unpacks your data into a temp folder; the temp folder path is stored in sys._MEIPASS, which is different from your original file path.

[Image: error shown when running SuperHero.exe]

In this case, let’s modify our code again to cater for this:

import os
import sys

def get_resource_path(relative_path):
    # PyInstaller unpacks bundled files into a temp folder and stores its path in sys._MEIPASS;
    # the attribute does not exist when running as a normal script, so fall back to the current directory
    base_path = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base_path, relative_path)

with open(get_resource_path("test.config")) as f:
    print(f.read())

When you rebuild SuperHero.exe, this time you shall be able to execute it without any issue. It also works if you rebuild your exe in --onedir mode.

--log-level

If you do not wish to see so many output messages when packing the program, you can reduce them with the --log-level option, which can be one of TRACE, DEBUG, INFO, WARN, ERROR, CRITICAL. For instance, --log-level="ERROR" will only show errors, and you will not even see a “Building completed successfully” message after the build completes, as it is logged as INFO.

--noconsole

If you are working on an automation program, such as one that sends emails or saves attachments automatically, which does not need to interact with users, you can use the --noconsole option so that no console window shows up when you click to run your exe file.

PyInstaller specification file

You may have noticed that after you run the pyinstaller command, a .spec file is generated. This file keeps all the options you used for your last build. So if you just want to rebuild your executable without changing any option, you can pass the spec file to pyinstaller directly:

pyinstaller SuperHero.spec

Conclusion

The options covered above should meet your basic needs to pack python program into exe file. You may also refer to the official documentation for the other options PyInstaller offers.


Python String Data Type

In the previous article, we discussed Python variables, including string variables. A string is a Python built-in data type which holds a sequence of characters, and you will need it whenever you do any text processing. In this article, I will be sharing with you the various operations you can perform with the Python string data type.

Python string data type

In python, you can define a string variable with single quote, double quotes or triple quotes. And use type() function to verify the data type of your variable. E.g.:

text1 = 'hello \n world!'
text2 = "bac;def,what$ is"
text3 = """this is also fine"""
print(type(text1), text1)
print(type(text2), text2)
print(type(text3), text3)

You should be able to see the below output, and the data type is showing as “str”.

<class 'str'> hello 
 world!
<class 'str'> bac;def,what$ is
<class 'str'> this is also fine

Slice Operation

As per the definition of the Python string data type, it is a sequence of characters, which means you can access each character by its index. (The index starts from 0 for the first element.)

print(text1[0], text2[1], text3[2])
h a i

And you can use the slice operation to get a subset of your string variable:

#get a sub string starting from index 0 and ending at index 5 (exclusive)
print(text1[0:5])
#get a sub string starting from index 5 and ending at index 7 (exclusive)
print(text3[5:7])
#get a sub string starting from default index 0 and ending at index 4 (exclusive)
print(text3[:4])
#get a sub string starting from index 5 and ending at the end of the string
print(text3[5:])
hello
is
this
is also fine

You can also specify the negative index value to slice the string starting from right to left:

print(text1[-1])
print(text3[-3:-1])
!
in
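A few more slicing cases, as a small sketch on the same kind of text. Note that slices with an out-of-range end are simply cut off at the end of the string, whereas indexing a single out-of-range position raises IndexError:

```python
text = "this is also fine"

last_word = text[-4:]       # the last four characters
without_last = text[:-5]    # everything except the last five characters
clamped = text[10:100]      # the end is past the string length, so it stops at the end

print(last_word)
print(without_last)
print(clamped)

try:
    text[100]               # a single out-of-range index is an error
except IndexError as exc:
    print("IndexError:", exc)
```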

There is actually a third option you can use, the slice step, which can be any non-zero integer, e.g.:

text4 = "abcdefg"
print(text4[0::2])
print(text4[1::2])
aceg
bdf

Immutable nature

Since we are able to get each individual character from a string, you may wonder if we can re-assign something else to a particular position in the string. e.g.:

text4[0] = 'T'
#TypeError: 'str' object does not support item assignment

The error shows up because a string is immutable, and you cannot change anything in its original content unless you create a new string:

new_text4 = "T" + text4[1:]

+ and *

You may have noticed in the above example that different strings can be concatenated by using “+”. There is one more operator, *, that can be used with strings.

print(text4 + text3*2)

This will duplicate text3 twice and concatenate them into a single string:

abcdefgthis is also finethis is also fine

Formatting Python string data type

Below are some of the string formatting functions; most are self-explanatory from the function name:

print("lower:", text4.lower())
#like lower(), but more aggressive for non-ASCII characters (e.g. German "ß" becomes "ss")
print("casefold:", text4.casefold())

print("upper:", text4.upper())

print("title:", text4.title())
#capitalizes only the first character; same result as title() for a single word
print("capitalize:", text4.capitalize())

print("swapcase:", text4.swapcase())
print("center:", text4.center(40, "*"))
print("ljust:", text4.ljust(40))
print("rjust:", text4.rjust(40, "*"))
print("zfill:", text4.zfill(40))
print("strip:", text4.strip("a"))
print("replace:", text4.replace("a", "A"))

Below is the output:

lower: abcdefg
casefold: abcdefg
upper: ABCDEFG
title: Abcdefg
capitalize: Abcdefg
swapcase: ABCDEFG
center: ****************abcdefg*****************
ljust: abcdefg                                 
rjust: *********************************abcdefg
zfill: 000000000000000000000000000000000abcdefg
strip: bcdefg
replace: Abcdefg

And also there are functions you can use for checking the string format:

print("isalnum:",text4.isalnum())	
print("isalpha:",text4.isalpha())
print("isdecimal:",text4.isdecimal())
print("isdigit:",text4.isdigit())
print("isnumeric:",text4.isnumeric())
print("isidentifier:",text4.isidentifier())
print("islower:",text4.islower())
print("istitle:",text4.istitle())
print("isupper:",text4.isupper())
print("isspace:",text4.isspace())
print("isprintable:",text4.isprintable())

Output will be something similar to below:

isalnum: True
isalpha: True
isdecimal: False
isdigit: False
isnumeric: False
isidentifier: True
islower: True
istitle: False
isupper: False
isspace: False
isprintable: True

Comparison operations

You can use relational operators such as ==, >, < to compare two strings. Python compares them character by character using Unicode code points, and all the uppercase letters come before the lowercase ones; hence you will need to convert your texts into a standard format, e.g. all upper or lower case, in order to get the comparison result in alphabetical order.
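A short sketch of the effect described above, using a hypothetical word list. Passing key=str.casefold to sorted is one way to get a case-insensitive, alphabetical order without changing the original strings:

```python
words = ["banana", "apple", "Cherry"]

upper_first = "Apple" < "banana"    # True: "A" sorts before any lowercase letter
lower_last = "apple" < "Banana"     # False: "a" sorts after the uppercase "B"

default_order = sorted(words)                    # uppercase words come first
alphabetical = sorted(words, key=str.casefold)   # compares the case-folded form only

print(upper_first, lower_last)
print(default_order)
print(alphabetical)
```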

To check if the string starts/ends with any characters, you can use the startswith and endswith function:

if text3.startswith("this"):
    print("yes, it starts with 'this'")
if text3.endswith("fine"):
    print("yes, it ends with 'fine'")

There is no function called contains (sometimes people get confused since the Java string has a contains method), but you can use the in operator or the find, index or rindex functions to check if the string contains a substring:

if "this" in text3:
    print("'this' is in text3")
else:
    print("not found")

if text3.find("this") > -1:
    print("found 'this' from text3")
else:
    print("not found")

if text3.find("this",1, 20) > -1:
    print("found 'this' from text3")
else:
    print("'this' is not found from text3, starting from index 1 to 20 ")

if text3.index("this") >-1:
    print("found 'this' from text3, index >=0")
else:
    print("not found")

#ValueError: substring not found
#idx = text3.index("this",1, 20)

Both the find and index functions return the index of the substring; the difference between the two is that index will raise ValueError when the substring is not found, while find will just return -1.
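The difference can be seen side by side in the sketch below. The helper find_or_none is a hypothetical wrapper, shown only to illustrate handling the ValueError raised by index:

```python
def find_or_none(text, sub):
    """Return the index of sub in text, or None when it is absent (hypothetical helper)."""
    try:
        return text.index(sub)
    except ValueError:
        return None

text3 = "this is also fine"

found_at = text3.find("also")             # index of the first match
missing_find = text3.find("xyz")          # find signals "not found" with -1
missing_index = find_or_none(text3, "xyz")  # index raises, which the helper turns into None

print(found_at, missing_find, missing_index)
```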

Split & Join texts

A lot of times you may need to split the text by a certain delimiter, e.g. newline (\n), semicolon (;) or space. You can use the split function to split the text into a list. If the delimiter is not found, the split function returns the original text as the only element of a list.

print("split by default delimiter:", text3.split())
print("split by s", text3.split('s'))
print("split by ;", text3.split(';'))

The output will be:

split by default delimiter: ['this', 'is', 'also', 'fine']
split by s ['thi', ' i', ' al', 'o fine']
split by ; ['this is also fine']

On the other hand, if you have a list of strings and would like to join them into one string, you can do the following:

print("join the words with ';':", ';'.join(text3.split()))
print("join the words without space:", ''.join(text3.split()))

And below is the output:

join the words with ';': this;is;also;fine
join the words without space: thisisalsofine

Count occurrence

The count function can be used for counting the occurrences of a substring in the original string, for instance:

print(text3*5)
print("'is' occurrence:", (text3*5).count("is"))

Result will be:

this is also finethis is also finethis is also finethis is also finethis is also fine
'is' occurrence: 10

Conclusion

With all the above examples, we have covered most of the commonly used functions for the Python string data type. You may also check the official Python documentation to see if there are any additional string functions you are interested in.


Python Tutorial – Variables and Keywords

This article serves as a tutorial for Python beginners to gain the essential knowledge to start coding in Python. By completing this tutorial, you shall know how to correctly use Python variables as well as the Python keywords.

Python Variable

A variable is a name that refers to some value. Like other programming languages, Python allows you to define variables and manipulate them in your code logic.

Name convention

Python allows letters, digits (0 to 9) and underscores (_) in a variable name, but the name has to start with a letter or an underscore.

There is no limit on the length of a variable name, so you can choose anything meaningful to you in your code. Python's style guidelines, however, recommend lowercase names (with words separated by underscores) for variables and functions.

Below are some examples of valid variable names:

a = "a"
#Python variable name is case sensitive
A = "a"
 
module_name = "Python Tutorial for Variables & Keywords"
speed_of_gravity = 299792458
pi = 3.14159265359
is_matched = True

And some invalid variable names are shown below; if you use them in your code, Python raises a SyntaxError.

#invalid as a variable name cannot start with a digit
1st_name = "John"
#invalid as a variable name cannot contain special characters like /, whitespace, @, &, * etc., except _
first name = "John"
right/wrong = True
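If you are unsure whether a name follows these rules, the string method isidentifier can check it for you without running into a SyntaxError. A minimal sketch, using the names from the examples above plus a hypothetical _private:

```python
candidates = ["module_name", "_private", "1st_name", "first name", "right/wrong"]

for name in candidates:
    # isidentifier() only checks the characters, not whether the name is a reserved keyword
    print(f"{name!r}: {name.isidentifier()}")

valid = [name for name in candidates if name.isidentifier()]
print(valid)
```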

Use of underscore

Take note of the _: although it is allowed in variable names, it carries some special meaning when used at the beginning. E.g. a name like _salary in your class signals, by convention, that the attribute is meant for internal use; Python does not actually block access from outside the class, while a double leading underscore (__salary) triggers name mangling. This is out of scope for this topic, but do bear it in mind.
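The behaviour of leading underscores can be checked with a small class. The Employee class and its attributes below are hypothetical, used only for illustration:

```python
class Employee:
    def __init__(self, salary):
        self._salary = salary        # single underscore: "internal" by convention only
        self.__bonus = salary * 0.1  # double underscore: stored as _Employee__bonus

emp = Employee(1000)

print(emp._salary)               # still readable from outside the class
print(hasattr(emp, "__bonus"))   # False, the attribute is not stored under this name
print(emp._Employee__bonus)      # the mangled name is what actually exists
```

So the single underscore is a hint to other developers rather than real protection, and even the double underscore only renames the attribute.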

Also, if you use _ as your variable name, there will be a conflict in the Python interactive mode, where _ holds the result of the last evaluated expression; check more from this article.

You may also have noticed that variables can hold different sorts of values, e.g. single characters, multiple characters, numbers, and True or False. These are the different data types in Python, which we will come to in a later article.

Reserved Keywords

There are some other words we cannot use directly as variable names. These are the so-called Python reserved keywords, which Python uses to recognize the structure of the program.

Below are all the keywords reserved by Python 3; none of them can be used directly as a variable name.

False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield
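The same list is available at runtime from the standard keyword module, which is handy because the exact set depends on the Python version you are running. A minimal sketch:

```python
import keyword

# The reserved keywords of the running interpreter
print(keyword.kwlist)
print(len(keyword.kwlist))

# iskeyword() tells you whether a given name is reserved
is_reserved = keyword.iskeyword("class")
is_free = not keyword.iskeyword("klass")
print(is_reserved, is_free)
```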

For Python beginners, if you use an IDE like PyCharm or Jupyter Notebook, these keywords will be automatically highlighted in a different color, so you do not need to worry about mistakenly using them as variable names.

Besides these reserved keywords, there are a few more words you shall try to avoid when naming your variables. For instance:

str
int
float
list
dict
set
tuple
bytes

These are the Python built-in data types, which will be covered in the next tutorial. No error is raised immediately when you assign a value to them, but you will face issues when you later want to call the default behavior of the built-in data type. Below is an example:

[Image: str() failing after “Test” is assigned to str]

str() will throw an error once you have assigned “Test” to str, and it only works again after you delete “str” as a variable. Hence the best practice is not to use these words as variable names in your code, to prevent unexpected errors and confusion.
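The behaviour described above can be reproduced with the sketch below. It assumes the code runs at the top level of a script or in an interactive session, where del str removes your own variable and makes the built-in visible again:

```python
str = "Test"              # shadows the built-in str type with a plain string

try:
    str(123)              # now this calls the string "Test", not the built-in type
except TypeError as exc:
    error_message = f"{exc}"
    print("TypeError:", error_message)

del str                   # removes the shadowing variable only
restored = str(123)       # the built-in str works again
print(restored)
```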

 

How to close Windows process with python

When automating tasks in Windows, you may wonder how to automatically close a Windows process if you do not have direct control of the running application, or when the application has just been running for too long. In this article, I will be sharing with you how to close a Windows process with a Python library, to be more specific, the pywin32 library.

Prerequisites

You will need to install the pywin32 library if you have not installed it yet:

pip install pywin32

Find the process name from Windows Task Manager

You first need to find out the name of the application you intend to close, which can be found in the Windows Task Manager. E.g. if you expand the “Windows Command Processor” process, you can see the running process is “cmd.exe”.

[Image: Task Manager showing cmd.exe under Windows Command Processor]

Let’s get started with the code!

Import the below modules that we will be using later:

from win32com.client import GetObject
from datetime import datetime

import os

Next, we need to get the WMI (Windows Management Instrumentation) service via the below code, through which we can access the Windows processes. For more information about WMI, please check this.

WMI = GetObject('winmgmts:')

Then we will use a WQL (the SQL-like WMI Query Language) query to get the processes from the Win32_Process class by passing in the application name. Remember we have already found the application name earlier from the Task Manager.

 

for p in WMI.ExecQuery('select * from Win32_Process where Name="cmd.exe"'):
    #the date format is something like this 20200613144903.166769+480
    create_dt, *_ = p.CreationDate.split('.')
    diff = datetime.now() - datetime.strptime(create_dt,'%Y%m%d%H%M%S')

There are other properties such as Description, Status, Executable Path, etc. You can check the full list of the process properties from this win32-process documentation. Here we want to base on the creation date to calculate how much time the application has been running to determine if we want to kill it.

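Assume we need to close the Windows process after it has been running for more than 5 minutes.

    # total_seconds() covers the whole duration; diff.seconds would ignore full days
    if diff.total_seconds() / 60 > 5:
        print("Terminating PID:", p.ProcessId)
        os.system("taskkill /pid " + str(p.ProcessId))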

With this taskkill command, we will be able to terminate all the threads under this Windows process gracefully.
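The age calculation can be tried without a running Windows process. The sketch below factors it into a hypothetical helper, minutes_since, and feeds it the sample CreationDate string from the comment above; like the original code, it ignores the fractional seconds and the UTC offset:

```python
from datetime import datetime

def minutes_since(creation_date, now=None):
    """Minutes elapsed since a WMI CreationDate string (hypothetical helper)."""
    # Keep only the part before the dot, e.g. "20200613144903"
    timestamp, *_ = creation_date.split(".")
    started = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    now = now or datetime.now()
    return (now - started).total_seconds() / 60

# A process started on 2020-06-13 14:49:03, checked one day and six minutes later
age = minutes_since("20200613144903.166769+480", now=datetime(2020, 6, 14, 14, 55, 3))
print(age)
```

With diff.seconds the same interval would count as only 6 minutes, because the seconds attribute leaves out the full day; that is why total_seconds() is the safer choice for long-running processes.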

Conclusion

pywin32 is a super powerful Python library, especially when dealing with Windows applications. You can use it to read & save attachments from Outlook, send emails via Outlook, open Excel files and more. Do have a check on these articles.

As always, any comments or questions are welcome.