
Pandas tricks – pass multiple columns to lambda
Pandas is one of the most powerful tool for analyzing and manipulating data. In this article, I will be sharing with you the solutions for a very common issues you might have been facing with pandas when dealing with your data – how to pass multiple columns to lambda or self-defined functions.
Prerequisite
You will have to install pandas on your working environment:
pip install pandas
When dealing with data, you will always have the scenario that you want to calculate something based on the value of a few columns, and you may need to use lambda or self-defined function to write the calculation logic, but how to pass multiple columns to lambda function as parameters?
Let me use some real world example, so that easier for you to understand the issue that I am talking about. Below table shows partial of the e-com delivery charges offered by some company, so the delivery charges are determined by package size (H+L+W), package weight and the delivery mode you are choosing.
Size (cm/kg) | 3 hours express | Next Day Delivery | Same Day Delivery |
<60 CM (H+L+W) & MAX 1KG | 12 | 8 | 10 |
<80 CM (H+L+W) & MAX 5KG | 15 | 9 | 11 |
<100 CM (H+L+W) & MAX 8KG | 17 | 11 | 13 |
<120 CM (H+L+W) & MAX 10KG | 19 | 14 | 16 |
And assuming we have the below order data, and we want to simulate the delivery charges. Let’s create the data in a pandas dataframe.
import pandas as pd df = pd.DataFrame({ "Order#" : ["1", "2", "3", "4"], "Weight" : [5.0, 2.1, 8.1, 7.5], "Package Size" : [80, 45, 110, 90], "Delivery Mode": ["Same Day", "Next Day", "Express", "Next Day"]})
If you view dataframe from Jupyter Notebook (you can sign up here to use it for free), you shall be able to see the data as per below.
Let’s also implement a calculate_rate function where we need to pass in the weight, package size, and delivery mode in order to calculate the delivery charges:
def calculate_rate(weight, package_size, delivery_mode): #set the charges as $20 since we do not have the complete rate card charges = 20 if weight <=1 and package_size <60: if delivery_mode == "Express": charges = 13 elif delivery_mode == "Next Day": charges = 8 else: charges = 10 elif weight <=5 and package_size <80: if delivery_mode == "Express": charges = 15 elif delivery_mode == "Next Day": charges = 9 else: charges = 11 elif weight <=8 and package_size <100: if delivery_mode == "Express": charges = 17 elif delivery_mode == "Next Day": charges = 11 else: charges = 13 return charges
Pass multiple columns to lambda
Here comes to the most important part. You probably already know data frame has the apply function where you can apply the lambda function to the selected dataframe. We will also use the apply function, and we have a few ways to pass the columns to our calculate_rate function.
Option 1
We can select the columns that involved in our calculation as a subset of the original data frame, and use the apply function to it.
And in the apply function, we have the parameter axis=1 to indicate that the x in the lambda represents a row, so we can unpack the x with *x and pass it to calculate_rate.
df["Delivery Charges"] = df[["Weight", "Package Size", "Delivery Mode"]].apply(lambda x : calculate_rate(*x), axis=1)
If we check the df again in Jupyter Notebook, you should see the new column “Delivery Charges” with the figures calculated based on the logic we defined in calculate_rate function.
Option 2:
If you do not want to get a subset of the data frame and then apply the lambda, you can also directly use the apply function to the original data frame. In this case, you will need to select the columns before passing to the calculate_rate function. Same as above, we will need to specify the axis=1 to indicate it’s applying to each row.
df["Delivery Charges"] = df.apply(lambda x : calculate_rate(x["Weight"], x["Package Size"], x["Delivery Mode"]), axis=1)
This will produce the same result as option 1. And you can also use x.Weight instead of x[“Weight”] when passing in the parameter.
Conclusion
The two options we discussed to pass multiple columns to lambda are basically the same, and it’s either applying to the subset or the original data frame. I have not yet tested with a large set of data, so there might be some differences in terms of the performance, you may need to take a note if you are dealing with a lot of data.
You may also interested to read some other articles related to pandas.
This site really has all the information I wanted concerning this subject
and didn’t know who to ask.
Thanks for your comment!
Hi there! This article couldn’t be written much better!
Looking at this post reminds me of my previous
roommate! He constantly kept preaching about this. I most certainly will send this information to him.
Fairly certain he will have a very good read.
Thanks for sharing!
Thanks for your comment!
I am not sure where you are getting your info, but good topic.
I needs to spend some time learning more or understanding
more. Thanks for wonderful info I was looking for this info for my mission.
Thanks for your comments, it’s all from the practice -:)
I searched a lot, but didn’t quite know how to ask the question this article answered. I knew there had to be a clever way to work on multiple columns. Excellent!
Thanks for your reading 🙂
Another way to do it:
import numpy as np
decorate the function with following line:
@np.vectorize
then call the function:
calculate_rate(df.Weight, df[‘Package Size’], df[‘Delivery Mode’])
Hi Jianhong,
Brilliant! Your suggestion looks great and it shall return the calculated charges in an array. Let me check if any difference in terms of the performance between the np.vectorize and the df.apply.