Python

combine data in pandas with merge vs join

Pandas Tricks – Combine Data in Different Ways

Introduction

If you have used pandas for your data analysis work, you may already get some idea on how powerful and flexible it is in terms of data processing. Many times there are more than one way to solve your problem, and choosing the best approach become another tough decision. For instance, in one of my previous article, I tried to summarize the 20 ways to filter records in pandas which definitely is not a complete list for all the possible solutions. In this article, I will be discussing about the different ways to merge/combine data in pandas and when you shall use them since combining data probably is one of the necessary step you shall perform before starting your data analysis.

Prerequisites

If you have not yet installed pandas, you may use the below command to install it from PyPI:

pip install pandas

And import the module at the beginning of your code:

import pandas as pd

Let’s dive into the code examples.

Combine Data with Append vs Concat

Imagine you have below two data frames from different sources, now you would like to merge them into one data frame.

df1 = pd.DataFrame({"ID" : [1, 2, 3, 4, 5], 
"Name" : ["Aaron", "Jimmy", "Zoe", "Jill", "Jenny"]})
df2 = pd.DataFrame({"ID": [6], "Name" : ["Kelly"]})

The most straightforward way would be using the append method from the pandas DataFrame object:

df1.append(df2, ignore_index=True)

The append method allows to add rows to the end of the current data frame, and with the ignore_index parameter as True, the resulting axis will be relabeled starting from 0.

You would see the output as per below:

combine data in pandas with merge vs join

Alternatively, you can use the pandas concat method which is self-explanatory based on its name. It provides a few more parameters to manipulate the resulting data frame such as specifying the axis for the concatenation to be done as well as the join logic for either union or intersection operation.

You can use the below to generate the same output as previously:

pd.concat([df1, df2], ignore_index=True)

And if you would like to retain a reference to the sources in your result, you can use the keys as per below:

pd.concat([df1, df2], keys=["src_1", "src_2"])

This would return a multi-index data frame where you can easily refer back to the data by source (e.g. df.loc[“src_1”]).

combine data in pandas with merge vs join

Adding new data frame as columns can be also done with axis = 1, for instance:

df3 = pd.DataFrame({"Age" : [12, 13, 13, 12, 13]})
pd.concat([df1, df3], axis=1)

The data frame has been added as one column to the caller:

combine data in pandas with merge vs join

As concat method accepts a list of data frames, you can combine multiple data frames at one time, which would be much faster than using append to do one by one.

Merge Data with Join vs Merge

Beside appending rows or columns based on axis, sometimes you may need more sophisticated operations similar to the left/right join in a rational database. For such scenarios, you shall make use of the pandas merge or join method.

For the previous example to append df2 to df1, you can achieve it with merge as well:

df1.merge(df2, how="outer")

Output as following:

combine data in pandas with merge vs join

It would be more tedious if you want to achieve the same via join since it can only join the data frame based on index, so you will have to set the index to the correct columns you would like to use as key. Below is how you can do it via join:

df1.join(df2.set_index(["ID", "Name"]), 
        on=["ID", "Name"], how="outer").reset_index(drop=True)

Assuming you have the below student’s score for each subject, and you want to merge the student information (df1) and the below based on the “Name” column:

df4 = pd.DataFrame({"ID" : [1001, 1002, 1003, 1002, 1001],
                    "Subject": ["Science", "Math", "English", "Math", "Science"], 
                    "Name": ["Aaron", "Jimmy", "Jimmy", "Zoe", "Jenny"], 
                    "Score" : ["A", "B", "C", "B", "B"]})

With merge function, you can specify the joining logic as left join on “Name” column as per below:

df1.merge(df4, on="Name", how="left")

Pandas will automatically add suffix whenever there are columns with duplicate names (e.g. “ID” in df1 and df4) from the two data frames, below is the output you may see:

combine data in pandas with merge vs join

To generate the same output via join, you can use below code which you need to pre-set the index for df4 and specify the suffix for left and right data frame:

df1.join(df4.set_index("Name"), on="Name", lsuffix="_x", rsuffix="_y")

Of course, if you would like to perform the right join for the above two data frames, you can do as per below:

df1.merge(df4, on="Name", how="right")
# or
df1.join(df4.set_index("Name"), on="Name", how="right", lsuffix="_x", rsuffix="_y")

Output as per below:

combine data in pandas with merge vs join

Merge DataFrame with Duplicate Keys

When merging multiple DataFrame objects, you may occasionally encounter the scenario that there are duplicate values for the columns you want to use as keys for joining. For instance, you may have below records if one subject has more than one lecturers:

df5 = pd.DataFrame({"Subject": ["Science", "Science", "Math", "Math", "English"], 
                    "Lecturer": ["Michael", "John", "Tim", "Robert", "Alex"]})

When you merge this information with student score based on the subject with merge or join method:

df4.merge(df5, on="Subject", how="left")
#or 
df4.join(df5.set_index("Subject"), on="Subject", how="left")

You would see the below output with M x N records due to the duplicate key in the df5:

combine data in pandas with merge vs join

If your objective is to perform something similar to excel vlookup to return the first matched value, then you can use the drop_duplicates method to remove the duplicate records before joining. E.g.:

df4.merge(df5.drop_duplicates("Subject"), on="Subject", how="left")

This would allow you to combine the two data frames with the first matched record from df5:

combine data in pandas with merge vs join

And in case you do not want to lose the information from the lecturer data frame, you will need to perform some sort of data aggregation before joining, e.g.:

df4.merge(df5.groupby("Subject").agg({"Lecturer" : lambda x: ','.join(x)}),
 on="Subject", how="left")

With this aggregation on the lecturer values, you would be able to see the below output:

combine data in pandas with merge vs join

Based on the above examples, you may find that merge and join are interchangeable in most of the cases, and you may have to type a bit more when using join method due to the different default arguments used. Since it always works on the index, you will have to preset the index on the key columns before joining.

Conclusion

In this article, we have reviewed through a few methods pandas offered for combining data frames with some sample code. To wrap up, the append and concat are usually used for merging two or more data frames based on the row or column index, and concat has better performance over append when you have multiple data frames to be worked on. If you need some high performance in-memory join operations like SQL joining for rational database, you will need to use merge or join method which can be interchangeable in most of the scenario. In addition, if the data frame you worked on does not have a index on the joining row/column, using merge over join would probably save your some typing.

 

python logging, queuehandler

8 Tips for Using Python Logging

Python Logging is a built-in module typically used for capturing runtime events for diagnostic purposes. If you are using Python to build a tool for yourself or just for proof of concept, you may not necessarily need any logging. But if the code is to be used in production environment, it would be a better idea to add proper logging information to make your life easier later when you need to troubleshoot any production issues.

In this article, I will be sharing with you some tips for using Python logging module in your production code.

Basic Python Logging With basicConfig

When you have a very simple script and need a very basic logging mechanism to capture some runtime information, Python basicConfig is the best fit where everything comes with a default value.

Below is the code to create a root logger to write log to the console:

import logging
logging.basicConfig()
logging.warning('Authentication failure: %s', 'username or password incorrect.')

When running the above code, you shall see the below printed out in your console:

WARNING:root:Authentication failure: username or password incorrect.

It indicates the severity of the log record, the logger hierarchy as well as the actual log message. Usually you may also want to capture the timestamp of the log event. To specify what are the details (refer to the available attributes here) you want to add in your log record, you can provide your own logging format when initializing the root logger.

For instance the below code:

FORMAT = '[{asctime}] [{name}] [{levelname}] - {message}'
logging.basicConfig(format=FORMAT, style="{")

logging.warning('Authentication failure: %s', 'username or password incorrect.')

With the above custom formatting, you shall see the output as per below:

[2021-03-25 21:37:09,158] [root] [WARNING] - Authentication failure: username or password incorrect.

To capture the log into a file rather than printing it out in the console (sys.stderr), you can achieve it with a minor change to specify the filename parameter for the basicConfig method, you would then be able to write logs to the file you’ve specified.

logging.basicConfig(filename="a.log", format=FORMAT, style="{")

When filename parameter is not empty, python logging will create a FileHandler object to handle the I/O related operations.

Create Logger at Module Level

The Python logging module has a logger hierarchy for ancestors and descendant to share the configurations or passing around the messages. In the above examples, the root logger is initialized when calling the basicConfig.

A good practice is to create module-level logger so that you have the flexibility to handle the log messages differently for each module and track which module generated the logs.

Below is an example to create a module-level logger with getLogger method:

logger = logging.getLogger('module.x')
logger.setLevel(logging.DEBUG)

You can create also your own formatter and specify the handler you need for your module-level logger. Below is an example to create a StreamHandler with your own logging format:

FORMAT = '[{asctime}] [{name}] [{levelname}] - {message}'
formatter = logging.Formatter(FORMAT, style="{")

ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
logger.addHandler(ch)

To verify if your logger is working as expected, you can log some messages in different levels as per below:

def log_func(msg):
    logger.debug("Debugging now: %s", msg)
    logger.info("For your info only : %s", msg)
    logger.warning("This is a warning : %s", msg)
    logger.error("Error is happening : %s", msg)
    logger.critical("App is going to crash : %s", msg)

log_func("yolo")

You may see the output similar to below:

[2021-03-25 15:17:12,638] [module.x] [INFO] - For your info only : yolo
[2021-03-25 15:17:12,639] [module.x] [WARNING] - This is a warning : yolo
[2021-03-25 15:17:12,642] [module.x] [ERROR] - Error is happening : yolo
[2021-03-25 15:17:12,642] [module.x] [CRITICAL] - App is going to crash : yolo

Similar to Python package hierarchy, by specifying the logger name as “module.x.y”, a child logger will be created under “module.x”.

Pass Extra Information to LogRecord

When calling logger to log some event, a LogRecord instance is created automatically to pass to log handler. If you need some additional information to be added into the LogRecord, you can make use of the extra parameter together with the formatter.

d = {'clientip': '192.168.0.1'}
FORMAT = '[{clientip}] [{asctime}] [{name}] [{levelname}] - {message}'

logging.basicConfig(format=FORMAT, style="{")
logging.warning('Protocol problem: %s', 'connection reset', extra=d)

You shall see the output as per below:

[192.168.0.1] [2021-03-25 15:36:31,449] [root] [WARNING] - Authentication failure: password incorrect.

Create Multiple Log Files with RotatingFileHandler

There are cases where you need to switch to a new log file when the file size reaching to certain limit to avoid the particular log file growing too big. The RotatingFileHandler from the logging.handlers module provides a easy way to limit the max size of the file and number of backup files to be created.

Below is an example to set file size at 2KB and create max 5 backup files for logging:

from logging.handlers import RotatingFileHandler

logger = logging.getLogger("simple app")
logger.setLevel(logging.DEBUG)
fh = RotatingFileHandler(filename="app.log", maxBytes=2000, backupCount=5)
fh.setLevel(logging.DEBUG)
formatter = logging.Formatter('[{asctime}] - [{name}] - [{levelname}] - {message}', style="{")
fh.setFormatter(formatter)

logger.addHandler(fh)

When the app.log file grows to 2KB, the new logs will be automatically directed to a new file, and the old file will be renamed to app.log.X (until X reaches to 5).  If you’ve set backupCount as 1, the log file will be growing without changing to a new file.

Similarly, you can use TimedRotatingFileHandler to rotate the log files at certain time intervals such as every X hours/days etc.

Send Log to Email via SMTPHandler

If you would like to send log events via email to notify the recipients e.g. to notify the system administrator or support team on some critical runtime errors, you can achieve it with SMTPHandler.  Below is an example to use gmail to send out emails. (you will need to replace the email address and password to your own gmail account)

from logging.handlers import SMTPHandler

smtph = SMTPHandler(mailhost=("smtp.gmail.com", 587), 
fromaddr="from@gmail.com", 
toaddrs="to@gmail.com", 
subject="Test Error from Python Script", 
credentials=("from@gmail.com", "password"), 
secure=())
smtph.setLevel(logging.ERROR)
logger.addHandler(smtph)

logger.critical("App is going to crash")

Note that I have passed an empty tuple for the secure parameter in order for SMTPHandler to use TLS encryption for the SMTP connection.

(You will also need to enable the less secure apps for your google account in order to go through the authentication)

Use Queue to Process Log Asynchronously with QueueHandler

The previous example demonstrated how to use the SMTPHandler to send out email. If you have tried the code yourself, you may notice the sluggish behavior especially at the last line of the code. This is because sending email will block the main thread until it finishes, which may not be something you are expecting when you need an immediate response in your web application.

To make sure the performance-critical thread is not blocked by any of these slow operations, you shall consider to use queue to process them in separate threads. The QueueHandler and QueueListener are designed for this purpose. Below is an example to show you how to use queue handler and listener:

from logging.handlers import SMTPHandler, QueueHandler, QueueListener

log_queue = queue.Queue(-1)
queue_handler = QueueHandler(log_queue)

#smtph is the SMTPHandler created from previous example
listener = QueueListener(log_queue, smtph, respect_handler_level=True)
logger.addHandler(queue_handler)

listener.start()

#a log function created in previous example
log_func("Bingoooo")

listener.stop()

Take note that the respect_handler_level has been specified as True in order for the listener to filter the messages based on the severity level set in the handlers. E.g. the handler smtph has been set to handle messages with severity level above error, when respect_handler_level is True, only messages with severity level above error will be passed into the queue. Otherwise if respect_handler_level is False, all the messages regardless of the severity levels are passed into queue for processing.

Use dictConfig for Log Configuration

Setting up logger configurations line by line via code can be tedious sometimes especially when you have more than one handlers or formatters to be set up. The logging module also provides ways to load configurations from config files into dictionary object, and then load the configurations by calling the dictConfig method.

Below is an example for using dictionary to create a logger with StreamHandler and SMTPHandler:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'standard': {
            'format': '%(levelname)s %(asctime)s %(module)s %(message)s'
        }
    },
    'handlers': {
        'console':{
            'level':'DEBUG',
            'class':'logging.StreamHandler',
            'formatter': 'standard'
        },
        'email': {
            'level': 'ERROR',
            'formatter': 'standard',
            'class': 'logging.handlers.SMTPHandler',
            'mailhost': ("smtp.gmail.com", 587),
            'fromaddr': "from@gmail.com", 
            'toaddrs': "to@gmail.com", 
            'subject': "Test Error from Python Script",
            'credentials': ("from@gmail.com", "password"),
            'secure': ()            
        }
    },
    'loggers': {
        'app_log': {
            'handlers': ['console', 'email'],
            'level': 'INFO',
        }
    }
}

logging.config.dictConfig(LOGGING)
log = logging.getLogger("app_log")

In your project, you may put your configurations into a file in INI/JSON/YAML format, and load it as a dictionary to pass into the dictConfig method, so that you do not need to modify any source code whenever you want to change any logging configurations.

Suppress Error in Production Environment

Last but not the least, you shall consider to suppress any exceptions raised during the logging rather than letting it crash your entire application in a production environment. For instance, if your logs are written to a file and the disk is full or someone accidently updated the file permission to read-only, you would not wish such exceptions to bring down the entire application.

To prevent such issue, you can set raiseExceptions to False, so that any logging exceptions would not impact your application functionality.

logging.raiseExceptions = False

With above set to False, you shall not see any error even when there is some TypeError like below:

logging.info("For your info only : %s %s", msg)
# No error message will be showing and nothing will be logged

But make sure to set it as True (the default value) in your test environment, so that you can identify these type of errors before deploying your code into production. (refer to another topic for python suppressing stdout and stderror)

Conclusion

In this article, we have summarized 8 tips for using the Python logging module with some sample codes which hopefully will give you some ideas on how to use this module when you are developing some code for production use.

A lot of details are not covered such as the list of attributes for Formatter, LogRecord, Handler and Filter etc., you may need to go through the official documents if certain details you need is not discussed here in this article.

common python mistakes for beginners

8 Common Python Mistakes You Shall Avoid

Introduction

Python is a very powerful programming language with easily understandable syntax which allows you to learn by yourself even you are not coming from a computer science background. Through out the learning journey, you may still make lots mistakes due to the lack of understanding on certain concepts. Learning how to fix these mistakes will further enhance your understanding on the fundamentals as well as the programming skills.

In this article, I will be summarizing a few common Python mistakes that many people may have encountered when they started the learning journey and how they can be fixed or avoided.

Reload Modules after Modification

Have you ever wasted hours to debug and fix an issue and eventually realized you were not debugging on your modified source code? This usually happens to the beginners as they did not realize the entire module was only loaded into memory once when import statement was executed. So if you are modifying some code in separate module and import to your current code, you will have to reload the module to reflect the latest changes.

To reload a module, you can use the reload function from the importlib module:

from importlib import reload

# some module which you have made changes
import externallib

reload(externallib)

Naming Conflict for Global and Local Variables

Imagine you have defined a global variable named app_config, and you would like to use it inside the init_config function as per below:

app_config = "app.ini"

def init_config():
    app_config = app_config or "default.ini"
    print(app_config)

You may expect to print out “app.ini” since it’s already defined globally, but surprisedly you would get the “UnboundLocalError” exception due to the variable app_config is referenced before assignment. If you comment out the assignment statement and just print out the variable, you would see the value printed out correctly. So what is going on here?

The above exception is due to Python tries to create a variable in local scope whenever there is an assignment expression, and since the local variable and global variable have the same name, the global variable being shadowed in local scope. Thus Python throws an error saying your local variable app_config is used before it’s initialized.

To solve this naming conflict, you shall use different name for your global variable and local variables to avoid any confusion, e.g.:

app_config = "app.ini"

def init_config():
    config = app_config or "default.ini"
    print(config)

Checking Falsy Values

Examining true or false of a variable in if or while statement sometimes can also go wrong. It’s common for Python beginners to mix None value and other falsy values and eventually write some buggy code. E.g.:  assuming you want to check when price is not None and below 5, trigger some selling alert:

def selling_alert(price):
    if price and price < 5:
        print("selling signal!!!")

Everything looks fine, but when you test with price = 0, you would not get any alert:

selling_alert(0)
# Nothing has been printed out

This is due to both None and 0 are evaluated as False by Python, so the printing statement would be skipped although price < 5 is true.

In python, empty sequence objects such as “” (empty string), list, set, dict, tuple etc are all evaluated as False, and also zero in any numeric format like 0 and 0.0. So to avoid such issue, you shall be very clear whether your logic need to differentiate the None and other False values and then split the logic if necessary, e.g.:

if price is None:
   print("invalid input")
elif price < 5:
   print("selling signal!!!")

Default Value and Variable Binding

Default value can be used when you want to make your function parameter optional but still flexible to change. Imagine you need to implement a logging function with an event_time parameter, which you would like to give a default value as current timestamp when it is not given. You can happily write some code as per below:

from datetime import datetime

def log_event_time(event, event_time=datetime.now()):
    print(f"log this event - {event} at {event_time}")

And you would expect as long as the event_time is not provided during log_event_time function call, it shall log an event with the timestamp when the function is invoked. But if you test it with below:

log_event_time("check-in")

# log this event - check-in at 2021-02-21 14:00:56.046938

log_event_time("check-out")

# log this event - check-out at 2021-02-21 14:00:56.046938

You shall see that all the events were logged with same timestamp. So why the default value for event_time did not work?

To answer this question, you shall know the variable binding happens during the function definition time. For the above example, the default value of the event_time was assigned when the function is initially defined. And the same value will be used each time when the function is called.

To fix the issue, you can assign a None as default value and check to overwrite the event_time inside your function call when it is None. For instance:

def log_event_time(event, event_time=None):
    event_time = event_time or datetime.now()
    print(f"log this event - {event} at {event_time}")

Similar variable binding mistakes can happens when you implement your lambda functions. For your further reading, you may check my previous post why your lambda function does not work for more examples.

Default Value for Mutable Objects

Another mistake Python beginners trend to make is to set a default value for a mutable function parameter. For instance, the below user_list parameter in the add_white_list function:

def add_white_list(user, user_list=[]):
    user_list.append(user)
    return user_list

You may expect when user_list is not given, a empty list will be created and then new user will be added into this list and return back. It is working as expected for below:

my_list = add_white_list('Jack')

# ['Jack']

my_list = add_white_list('Jill', my_list)

#['Jack', 'Jill']

But when you want to start with a empty list again, you would see some unexpected result:

my_new_list = add_white_list('Joe')
# ['Jack', 'Jill', 'Joe']

From the previous variable binding example, we know that the default value for user_list is created only once at the function definition time. And since list is mutable, the changes made to the list object will be referred by the subsequent function calls.

To solve this problem, we shall give None as the default value for user_list and use a local variable to create a new list when user_list is not given during the call. e.g.:

def add_white_list(user, user_list=None):
    if user_list is None:
        user_list = []
    user_list.append(user)
    return user_list

Someone may get confused that datetime.now() shall create a Python class instance, which supposed to be mutable also. If you checked Python documentation, you would see the implementation of datetime is actually immutable.

Misunderstanding of Python Built-in Functions

Python has a lot of powerful built-in functions and some of them look similar by names, and if you do not spend some time to read through the documentation, you may end up using them in the wrong way.

For instance, you know built-in sorted function or list sort function both can be used to sort sequence object. But occasionally, you may make below mistake:

random_ints = [80, 53, 7, 92, 30, 31, 42, 10, 42, 18]

# The sorting is done in-place, and function returns None
even_integers_first = random_ints.sort(key=lambda x: x%2)

# Sorting is not done in-place, function returns a new list
sorted(random_ints)

Similarly for reverse and reversed function:

# The reversing is done in-place, and function returns None
random_ints = random_ints.reverse()
# reversing is not done in-place, function returns a new generator
reversed(random_ints)

And for list append and extend function:

crypto = ["BTC", "ETH"]

# the new list will be added as 1 element to crypto list
crypto.append(["XRP", "BNB"])

print(crypto)
#['BTC', 'ETH', ['XRP', 'BNB']]

# the new list will be flattened when adding to crypto list
crypto.extend(["UNI"])

print(crypto)
# ['BTC', 'ETH', ['XRP', 'BNB'], 'UNI']

Modifying Elements While Iterating

When iterating a sequence object, you may want to filter out some elements based on certain conditions.

For instance, if you want to iterate below list of integers and remove any elements if it is below 5. You probably would write the below code:

a = [1, 2, 3, 4, 5, 6, 2]

for b in a:
    if b < 5:
        a.remove(b)

But when checking the output of the list a, you would see the result is not as per you expected:

print(a)
# [4, 5, 6, 2]

This is because the for statement will evaluate the expression and create a generator for iterating the elements. Since we are deleting elements from the original list, it will also change the state of the generator, and then further cause the unexpected result. To fix this issue, you can make use of the list comprehension as per below if your filter condition is not complex:

[b for b in a if b >= 5]

Or if you wish, you can use the the filterfalse together with the lambda function:

from itertools import filterfalse
list(filterfalse(lambda x: x < 5, a))

Re-iterate An Exhausted Generator

Many Python learners started writing code without understanding the difference between generator and iterator. This would cause the error that re-iterating an exhausted generator. For instance the below generator, you can print out the values in a list:

some_gen = (i for i in range(10))
print(list(some_gen))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

And sometimes you may forget you have already iterated the generator once, and when you try to execute the below:

for x in some_gen:
    print(x)

You would not be able to see anything printed out.

To fix this issue, you shall save your result into a list first if you’re not dealing with a lot of data, or you can use the tee function from itertools module to create multiple copies of the generator so that you can iterate multiple times:

from itertools import tee

# create 3 copies of generators from the original iterable
x, y, z = tee(some_gen, n=3)

Conclusion

In this article, we have reviewed through some common Python mistakes that you may encounter when you start writing Python codes. There are definitely more mistakes you probably would make if you simply jump into the coding without understanding of the fundamentals. But as the old saying, no pain no gain, ultimately you will get there when you drill down all the mistakes and clear all the roadblocks.

Python one-liners with list comprehension and ternary operation

15 Most Powerful Python One-liners You Can’t Skip

Introduction

One-liner in Python refers to a short code snippet that achieves some powerful operations. It’s popular and widely used in Python community as it makes the code more concise and easier to understand. In this article, I will be sharing some most commonly used Python one-liners that would definitely speed up your coding without compromising any clarity.

Let’s start from the basis.

Ternary operations

Ternary operation allows you to evaluate a value based on the condition being true or false. Instead of writing a few lines of if/else statements, you can simply do it with one line of code:

x = 1
y = 2

result = 1 if x > 0 and y > x else -1
print(result)
# 1

#re-assign x to 6 if it is evaluated as False
x = x or 6

Assign values for multiple variables

You can assign values for multiple variables simultaneously as per below. (You may want to check this article to understand what is going on under the hood)

key, value = "user", "password"

print(key, value)
#('user', 'password')

Swap variables

To swap the values of the variables, simply perform the below without having a temp variable which is usually required by other programming languages like Java or C.

key, value = value, key

print(key, value) 
#('password', 'user')

Swap elements in a list

Imagine you have a list of users as per below, and you would like to swap the first element with last element:

users = ["admin", "anonymous1", "anonymous2"]

Since list is mutable, you can re-assign the values of first and last elements by swapping their sequence as per below:

users[0], users[2] = users[2], users[0]
# or users[0], users[-1] = users[-1], users[0]

print(users)
#['anonymous2', 'anonymous1', 'admin']

Further more, if the requirement is to swap the elements at the odd and even positions in a list, e.g. in the below list:

numbers = list(range(10))
#[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We would like to swap the elements of the 2nd and 1st, 4th and 3rd, and so on. It can be achieved by performing the below list slicing with assignment operation:

numbers[::2], numbers[1::2] = numbers[1::2], numbers[0::2]

print(numbers)
#[1, 0, 3, 2, 5, 4, 7, 6, 9, 8]

Replace elements in a list

To further expend the above example, if we want to replace the elements on every odd/even position in a list, for instance to 0, we can do the re-assignment with below:

numbers[1::2] = [0]*len(numbers[1::2])

print(numbers)
#[0, 0, 2, 0, 4, 0, 6, 0, 8, 0]

Of course there is an alternative way with list comprehension, we shall touch on it later.

Generate list with list comprehension

By using list comprehension, you can easily generate new a list with certain filtering conditions from the current sequence object. For instance, the below will generate a list of even numbers between 1 to 20:

even_nums = [i for i in range(1, 20) if i%2 == 0]

print(even_nums)
#[2, 4, 6, 8, 10, 12, 14, 16, 18]

Create sub list from a list

Similarly, you can get a sub list from the existing list with the list comprehension as per below:

[i for i in even_nums if i <5]
# 2, 4

Manipulating elements in the list

With list comprehension, you can also transform your list of elements into another format. For instance, to convert the integers to alphabets:

alphabets = [chr(65+i) for i in even_nums]
# ['C', 'E', 'G', 'I', 'K', 'M', 'O', 'Q', 'S']

Or convert the upper case into lower case:

[i.lower() for i in alphabets]
#['c', 'e', 'g', 'i', 'k', 'm', 'o', 'q', 's']

And all the above can be done without list comprehension as well:

list(map(lambda x : chr(65+x), even_nums))
#['C', 'E', 'G', 'I', 'K', 'M', 'O', 'Q', 'S']

list(map(str.lower, alphabets))
#['c', 'e', 'g', 'i', 'k', 'm', 'o', 'q', 's']

Another real world example would be to use list comprehension to list out all the .ipynb files from current folder and its sub folders (excluding the checkpoint files):

import os

[f for d in os.walk(".") if not ".ipynb_checkpoints" in d[0]
             for f in d[2] if f.endswith(".ipynb")]

Flatten a list of sequences

If you have a list of sequence objects as per below, and you would like to flatten them into 1 dimensional:

a = [[1,2], [3,4], [5,6,7]]

You can use multiple for expressions in list comprehension to flatten it:

b = [y for x in a for y in x]

print(b)
#[1, 2, 3, 4, 5, 6, 7]

Alternatively, you can make use of the itertools module to get the same result:

import itertools

list(itertools.chain.from_iterable(a))

Ternary operation with list comprehension

In the previous ternary operation example, we have discussed how to replace the elements in the even position of a list. Here is the alternative using list comprehension in conjunction with a ternary expression:

numbers = list(range(10))
[y if i % 2 == 0 else 0 for i, y in enumerate(numbers)]
#[0, 0, 2, 0, 4, 0, 6, 0, 8, 0]

Generate a dictionary with dictionary comprehension

To derive a dictionary from a list, you can use dictionary comprehension as per below:

even_nums_dict = {chr(65+i):v for i, v in enumerate(even_nums)}

#{'A': 2, 'B': 4, 'C': 6, 'D': 8, 'E': 10, 'F': 12, 'G': 14, 'H': 16, 'I': 18}

Generate a set with set comprehension

Similar operation is also available for set data type when you want to derive elements from a list to set:

even_nums_set = {chr(65+i) for i in even_nums}
#{'C', 'E', 'G', 'I', 'K', 'M', 'O', 'Q', 'S'}

When using the built-in data type set, you shall expect that it only keeps the unique values. For instance, you can use set to remove duplicate values:

a = [1,2,2,4,6,7]
unique = set(a)

print(unique)
#{1, 2, 4, 6, 7}

More Python comprehension examples can be found here.

Read file into generator

Reading files can be done in one-liner as per below:

text = (line.strip() for line in open('response.html', 'r'))

Take note the parentheses are used in above generator expression rather than [], when [] is used, it returns a list.

One-liner with Python -c command

Sometimes you may want to run code snippets without entering into the Python interactive mode. You can execute the code with Python -c option in command line window. For instance, check the current Python version:

python -c "import sys; print(sys.version.split()[0])"

#3.7.2

Or check the value of the environment variable:

python -c "import os;print(os.getenv('PATH').split(';'))"

Conclusion

In this article, we have reviewed through some commonly used Python one-liners which would greatly improve your code readability and coding productivity. There are definitely more to be covered, and every Pythonista would have his/her own list of favorite one-liners. Sometimes you will also need to consider the code performance before you use it rather simply pursuing the conciseness of the code.

As the general rule of thumb, you shall not use/innovate something that confusing ,difficult to read or totally not benefiting either in readability or productivity.

plot route on Google Maps with Python Google Maps API, optimize route with Google Maps API, Singapore

Plot Route on Google Maps with Python

Introduction

You probably use Google Maps a lot in your daily life, such as locate a popular restaurant, check the distance between one place to another, or find the nearest driving route and shortest travelling time etc. If you have been thrown with a big set of location data where you need check and validate them from a map, and then find the optimal routes, you probably would think how this can be done programmatically in Python. In this article, we will be exploring how all these can be achieved with Python Google Maps APIs.

Prerequisites

To be able to use Google Maps APIs, you will need to create a Google Cloud Service account and set up a billing method. If you are a new signed up user, you would get $300 credit for you to try out all the various Google APIs. You can follow the official docs to set up a new project, and from where you would get an API key for using its services.

You will also need to install the GoogleMaps package in your working environment, below is the pip command to install the package:

pip install -U googlemaps

Now let’s import the package at the beginning of our code, and initialize the service with your API Key:

import googlemaps

gmaps = googlemaps.Client(key='YOUR API KEY')

If you did not get any error up to this step, you are all set to go. Let’s start to explore the various function calls you can do with this package.

Get Geolocation of the Addresses

Often you need to check a location by address, postal code or even the name of the store/company, this can be done via the geocode function which is similar to search a place in Google Maps from your browser. The function returns a list of possible locations with the detailed address info such as the formatted address, country, region, street, lat/lng etc.

Below are a few possible address info you can pass to this API call:

#short form of address, such as country + postal code
geocode_result = gmaps.geocode('singapore 018956')

#full address
geocode_result = gmaps.geocode("10 Bayfront Ave, Singapore 018956")

#a place name
geocode_result = gmaps.geocode("zhongshan park")

#Chinese characters
geocode_result = gmaps.geocode('滨海湾花园')

#place name/restaurant name
geocode_result = gmaps.geocode('jumbo seafood east coast')

print(geocode_result[0]["formatted_address"]) 
print(geocode_result[0]["geometry"]["location"]["lat"]) 
print(geocode_result[0]["geometry"]["location"]["lng"])

Depends how complete the information you have supplied to the function call, you may get some sample output as per below:

1206 ECP, #01-07/08 East Coast Seafood Centre, Singapore 449883
1.3051669,
103.930673

Reverse Geocoding

You can also use the latitude and longitude to check the address information, this is called reverse geocoding. For instance, the below will give you the human readable addresses for the given lat/lng:

reverse_geocode_result = gmaps.reverse_geocode((1.3550021,103.7084641))

print(reverse_geocode_result[0]["formatted_address"])
#'87 Farrer Dr, Singapore 259287'

You may get multiple address info with the same lat/lng as Google just return the list of addresses closest to this lat/lng.

Check the Distance Between Locations

To check the distance between multiple locations, you can use the Google distance matrix API, where you can supply multiple locations, and Google will return the travelling distance between each location as well as the travelling time. E.g.:

from datetime import datetime, timedelta

gmaps.distance_matrix(origins=geocode_result[0]['formatted_address'], 
                      destinations=reverse_geocode_result[0]["formatted_address"], 
                      departure_time=datetime.now() + timedelta(minutes=10))

In above example, we have provided the departure time as 10 minutes later from current time for Google to calculate the travelling time. The departure time cannot be any time in the past.

The return JSON object would include the travelling distance and time based on the current traffic condition (default transport mode is driving):

{'destination_addresses': ['87 Farrer Dr, Singapore 259287'],
 'origin_addresses': ['1206 ECP, #01-07/08 East Coast Seafood Centre, Singapore 449883'],
 'rows': [{'elements': [{'distance': {'text': '22.2 km', 'value': 22219},
     'duration': {'text': '24 mins', 'value': 1442},
     'duration_in_traffic': {'text': '22 mins', 'value': 1328},
     'status': 'OK'}]}],
 'status': 'OK'}

Get Directions Between Locations

One of the most frequent usage of Google Maps is to check the direction from one place to another. You can use directions API to get the route info as per below:

directions_result = gmaps.directions(geocode_result[0]['formatted_address'],
                                     reverse_geocode_result[0]["formatted_address"],
                                     mode="transit",
                                     arrival_time=datetime.now() + timedelta(minutes=0.5))

For this directions API, it returns you the detailed routing information based on what type of travelling mode you’ve chosen.

In the above example, we have specified the mode as “transit”, so Google Maps will suggest a route that you can take the public transport whenever possible to reach your final destination. It may not meet your arrival time if it is really infeasible anyway.

Below is the sample return with detailed routing directions:

[{'bounds': {'northeast': {'lat': 1.3229677, 'lng': 103.9314612},
   'southwest': {'lat': 1.2925606, 'lng': 103.8056495}},
  'copyrights': 'Map data ©2021 Google',
  'legs': [{'arrival_time': {'text': '9:16pm',
     'time_zone': 'Asia/Singapore',
     'value': 1611321373},
    'departure_time': {'text': '7:59pm',
     'time_zone': 'Asia/Singapore',
     'value': 1611316750},
    'distance': {'text': '18.0 km', 'value': 17992},
    'duration': {'text': '1 hour 17 mins', 'value': 4623},
    'end_address': '87 Farrer Dr, Singapore 259287',
    'end_location': {'lat': 1.3132547, 'lng': 103.8070619},
    'start_address': '1206 ECP, #01-07/08 East Coast Seafood Centre, Singapore 449883',

...

{'distance': {'text': '0.1 km', 'value': 106},
        'duration': {'text': '1 min', 'value': 76},
        'end_location': {'lat': 1.305934, 'lng': 103.9306822},
        'html_instructions': 'Turn <b>left</b>',
        'maneuver': 'turn-left',
        'polyline': {'points': 'q{}Fk_jyRM?KAIAE@a@DQBGDEBGDIF_@LQD'},
        'start_location': {'lat': 1.3050507, 'lng': 103.9309369},
        'travel_mode': 'WALKING'},
...
}]

You can also supply the waypoints parameter in order to route multiple locations between your origin and destination. For instance, if you want to go for a one day tour in Singapore, you can provide a list of the attractions and let Google to optimize the route for you with the optimize_waypoints = True parameter. The sample code as per below:

waypoints = ["Chinatown Buddha Tooth Relic Temple", 
"Sentosa Island, Singapore", 
"National Gallery Singapore", 
"Botanic Garden, Singapore",
"Boat Quay @ Bonham Street, Singapore 049782"]

results = gmaps.directions(origin = "Fort Canning Park, Singapore",
                                         destination = "Raffles Hotel, Singapore",                                     
                                         waypoints = waypoints,
                                         optimize_waypoints = True,
                                         departure_time=datetime.now() + timedelta(hours=1)

for i, leg in enumerate(results[0]["legs"]):
    print("Stop:" + str(i),
        leg["start_address"], 
        "==> ",
        leg["end_address"], 
        "distance: ",  
        leg["distance"]["value"], 
        "traveling Time: ",
        leg["duration"]["value"]
    )

To get a good result, you will need to make sure all the locations you’ve provided can be geocoded by Google. Below is the output:

plot route on Google Maps with Python Google Maps API

Plot Route on Google Maps

To visualize your route or location on a map, you can make use of the Maps Static API. It allows you to plot your locations as markers on the map and draw the path between each location.

To get started, we shall define the markers for our locations. We can specify the color, size and label attributes for each location in a “|” separated string. As the label only allows a single character from {A-Z, 0-9}, we will just use A-Z to indicate the sequence of each location. E.g.:

locations = ["Fort Canning Park, Singapore",
          "Chinatown Buddha Tooth Relic Temple", 
          "Sentosa Island, Singapore", 
          "National Gallery Singapore", 
          "Boat Quay @ Bonham Street, Singapore 049782",
          "Botanic Garden, Singapore",
          "Raffles Hotel, Singapore"]

markers = ["color:blue|size:mid|label:" + chr(65+i) + "|" 
                   + r for i, r in enumerate(locations)]

In the static_map function, you can specify the scale, size and zoom to define how many pixels of your output and whether you want to show the details up to the city or individual building on your map.

The format and maptype parameters are used to specify the output image format and what type of maps you want to use.

Lastly, we can specify the path parameter to connect the different locations together. Similar to how we define the markers, we can specify the attributes in a “|” separated string. Below is the sample code:

result_map = gmaps.static_map(
                 center=locations[0],
                 scale=2, 
                 zoom=12,
                 size=[640, 640], 
                 format="jpg", 
                 maptype="roadmap",
                 markers=markers,
                 path="color:0x0000ff|weight:2|" + "|".join(locations))

And if you save the return result into a .jpg file as per below:

with open(“driving_route_map.jpg”, “wb”) as img:
    for chunk in result_map:
        img.write(chunk)

You shall see something similar to the below:

plot route on Google Maps with Python Google Maps API

The routing sequence is based on the list of locations you’ve supplied, so you can probably use directions function to return a optimal route and then plot them with static_map.

A few things to be noted are that the static map only support small size of image (up to 1280×1280), and you will have to plot in more points if you want to draw a nicer driver routes. For our above code, we only take 7 location points, so the lines are not making any sense for driving. If we use the location points suggested by Google from directions function, the result would look much better. E.g.:

marker_points = []
waypoints = []

#extract the location points from the previous directions function

for leg in results[0]["legs"]:
    leg_start_loc = leg["start_location"]
    marker_points.append(f'{leg_start_loc["lat"]},{leg_start_loc["lng"]}')
    for step in leg["steps"]:
        end_loc = step["end_location"]
        waypoints.append(f'{end_loc["lat"]},{end_loc["lng"]}')
last_stop = results[0]["legs"][-1]["end_location"]
marker_points.append(f'{last_stop["lat"]},{last_stop["lng"]}')
        
markers = [ "color:blue|size:mid|label:" + chr(65+i) + "|" 
           + r for i, r in enumerate(marker_points)]
result_map = gmaps.static_map(
                 center = waypoints[0],
                 scale=2, 
                 zoom=13,
                 size=[640, 640], 
                 format="jpg", 
                 maptype="roadmap",
                 markers=markers,
                 path="color:0x0000ff|weight:2|" + "|".join(waypoints))

We are extracting all the location points from the directions function and pass them to the path parameter to draw the connection lines. The output would be similar to below which makes more sense for driving:

plot route on Google Maps with Python Google Maps API, optimize route with Google Maps API, Singapore

Conclusion

In this article, we have reviewed through a few Google Maps APIs which hopefully will help you for any feasibility study or even in your real projects. For example, to cleanse the dirty address for a large set of data, or to calculate the distance/travelling time or get the optimal routes between multiple locations. If your objective is to generate a dynamic map, you probably have to go for the Maps JavaScript API where you can display a map on web page and make it interactive, but still you may find these Python APIs would be more efficient in processing your raw data rather than doing it in JavaScript code.

(last updated on 8-May-2021)