Pandas tricks – pass multiple columns to lambda

Pandas is one of the most powerful tool for analyzing and manipulating data. In this article, I will be sharing with you the solutions for a very common issues you might have been facing with pandas when dealing with your data – how to pass multiple columns to lambda or self-defined functions.

Prerequisite

You will have to install pandas on your working environment:

`pip install pandas`

When dealing with data, you will always have the scenario that you want to calculate something based on the value of a few columns, and you may need to use lambda or self-defined function to write the calculation logic, but how to pass multiple columns to lambda function as parameters?

Let me use some real world example, so that easier for you to understand the issue that I am talking about. Below table shows partial of the e-com delivery charges offered by some company, so the delivery charges are determined by package size (H+L+W), package weight and the delivery mode you are choosing.

 Size (cm/kg) 3 hours express Next Day Delivery Same Day Delivery <60 CM (H+L+W) & MAX 1KG 12 8 10 <80 CM (H+L+W) & MAX 5KG 15 9 11 <100 CM (H+L+W) & MAX 8KG 17 11 13 <120 CM (H+L+W) & MAX 10KG 19 14 16

And assuming we have the below order data, and we want to simulate the delivery charges. Let’s create the data in a pandas dataframe.

```import pandas as pd

df = pd.DataFrame({
"Order#" : ["1", "2", "3", "4"],
"Weight" : [5.0, 2.1, 8.1, 7.5],
"Package Size" : [80, 45, 110, 90],
"Delivery Mode": ["Same Day", "Next Day", "Express", "Next Day"]})```

If you view dataframe from Jupyter Notebook (you can sign up here to use it for free), you shall be able to see the data as per below.

Let’s also implement a calculate_rate function where we need to pass in the weight, package size, and delivery mode in order to calculate the delivery charges:

```def calculate_rate(weight, package_size, delivery_mode):
#set the charges as \$20 since we do not have the complete rate card
charges = 20
if weight <=1 and package_size <60:
if delivery_mode == "Express":
charges = 13
elif delivery_mode == "Next Day":
charges = 8
else:
charges = 10
elif weight <=5 and package_size <80:
if delivery_mode == "Express":
charges = 15
elif delivery_mode == "Next Day":
charges = 9
else:
charges = 11
elif weight <=8 and package_size <100:
if delivery_mode == "Express":
charges = 17
elif delivery_mode == "Next Day":
charges = 11
else:
charges = 13
return charges```

Pass multiple columns to lambda

Here comes to the most important part. You probably already know data frame has the apply function where you can apply the lambda function to the selected dataframe. We will also use the apply function, and we have a few ways to pass the columns to our calculate_rate function.

Option 1

We can select the columns that involved in our calculation as a subset of the original data frame, and use the apply function to it.

And in the apply function, we have the parameter axis=1 to indicate that the x in the lambda represents a row, so we can unpack the x with *x and pass it to calculate_rate.

`df["Delivery Charges"] = df[["Weight", "Package Size", "Delivery Mode"]].apply(lambda x : calculate_rate(*x), axis=1)`

If we check the df again in Jupyter Notebook, you should see the new column “Delivery Charges” with the figures calculated based on the logic we defined in calculate_rate function.

Option 2:

If you do not want to get a subset of the data frame and then apply the lambda, you can also directly use the apply function to the original data frame. In this case, you will need to select the columns before passing to the calculate_rate function. Same as above, we will need to specify the axis=1 to indicate it’s applying to each row.

`df["Delivery Charges"] = df.apply(lambda x : calculate_rate(x["Weight"], x["Package Size"], x["Delivery Mode"]), axis=1)`

This will produce the same result as option 1. And you can also use x.Weight instead of x[“Weight”] when passing in the parameter.

Conclusion

The two options we discussed to pass multiple columns to lambda are basically the same, and it’s either applying to the subset or the original data frame. I have not yet tested with a large set of data, so there might be some differences in terms of the performance, you may need to take a note if you are dealing with a lot of data.

You may also interested to read some other articles related to pandas.

You may also like

Read and Generate QR Code With 5 Lines of Python Code

Article Rating
Subscribe
Notify of
Inline Feedbacks

This site really has all the information I wanted concerning this subject
and didn’t know who to ask.

Looking at this post reminds me of my previous
Fairly certain he will have a very good read.
Thanks for sharing!

I am not sure where you are getting your info, but good topic.
I needs to spend some time learning more or understanding
more. Thanks for wonderful info I was looking for this info for my mission.

I searched a lot, but didn’t quite know how to ask the question this article answered. I knew there had to be a clever way to work on multiple columns. Excellent!

Another way to do it:

import numpy as np

decorate the function with following line:
@np.vectorize

then call the function:
calculate_rate(df.Weight, df[‘Package Size’], df[‘Delivery Mode’])

10
0