Tutorials

group consecutive rows of same values in pandas

How to group consecutive rows of same values in pandas

Problem Statement

You have a data set which you would like to group consecutive rows if one of the columns has the same values. If there is a different value in between the rows, these records shall be split into separate groups.

To better elaborate the issue, let’s use an example.

Assuming you have the connection log data for some devices such as Bluetooth. It triggers a event when the connection is established as well as when it’s disconnected from the paired device. In between, there may be additional events triggered out for connectivity test. Let’s load the data and visualize it:

import pandas as pd 

df = pd.read_excel("connection log.xlsx")

df.head(10)

You can see the below output from Jupyter Lab:

connection log data

If you would like to check the duration for each device per every connection, you probably want to group these records if the events are triggered during the same connection. To determine whether the records are within the same connection, you shall sort the event date in ascending order and if the device Id is not the same in the consecutive rows, then they must be some events for different connections. So can this be done in pandas?

Solution to group the consecutive rows

Let’s do some sorting to our data first to make sure the records are in chronological order based on the event date:

df.sort_values(["Event Time", "Device ID"], ascending=[True, True], inplace=True)

To compare the value of current row and subsequent row for a particular column, we can use the data series shift method. For instance, we can shift the “Device ID” values to next row and store the result into new column named “Device ID X”:

df["Device ID X"] = df["Device ID"].shift()

After the shifting, you shall see the updated data frame as per below:

shifted rows

If you try to compare the values of both columns:

df["Device ID"] != df["Device ID X"]

You can see the return value of the True/False in a data series form:

compare device id column

Now it is coming to the most critical step. Since we know that there is only one True value when device ID switched to a new ID, we can use the pandas cumsum method to sum up these True values accumulatively as per below:

df["cumsum"] = (df["Device ID"] != df["Device ID X"]).cumsum()

When doing the accumulative summary, the True values will be counted as 1 and False values will be counted as 0. So you would see the below output:

comparison result

You can see that the same values calculated for the rows we would like to group together, and you can make use of this value to re-group the records for further analysis.

You can even simply combine the above steps into one liner to get the earliest and latest event time for each group as per below:

df.groupby((df["Device ID"] != df["Device ID"].shift()).cumsum()).agg({"Event Time" : ["min", "max"]})

Output as per below:

one liner result

You may be also interested in some other similar topic from here.

 

python logging, queuehandler

8 Tips for Using Python Logging

Python Logging is a built-in module typically used for capturing runtime events for diagnostic purposes. If you are using Python to build a tool for yourself or just for proof of concept, you may not necessarily need any logging. But if the code is to be used in production environment, it would be a better idea to add proper logging information to make your life easier later when you need to troubleshoot any production issues.

In this article, I will be sharing with you some tips for using Python logging module in your production code.

Basic Python Logging With basicConfig

When you have a very simple script and need a very basic logging mechanism to capture some runtime information, Python basicConfig is the best fit where everything comes with a default value.

Below is the code to create a root logger to write log to the console:

import logging
logging.basicConfig()
logging.warning('Authentication failure: %s', 'username or password incorrect.')

When running the above code, you shall see the below printed out in your console:

WARNING:root:Authentication failure: username or password incorrect.

It indicates the severity of the log record, the logger hierarchy as well as the actual log message. Usually you may also want to capture the timestamp of the log event. To specify what are the details (refer to the available attributes here) you want to add in your log record, you can provide your own logging format when initializing the root logger.

For instance the below code:

FORMAT = '[{asctime}] [{name}] [{levelname}] - {message}'
logging.basicConfig(format=FORMAT, style="{")

logging.warning('Authentication failure: %s', 'username or password incorrect.')

With the above custom formatting, you shall see the output as per below:

[2021-03-25 21:37:09,158] [root] [WARNING] - Authentication failure: username or password incorrect.

To capture the log into a file rather than printing it out in the console (sys.stderr), you can achieve it with a minor change to specify the filename parameter for the basicConfig method, you would then be able to write logs to the file you’ve specified.

logging.basicConfig(filename="a.log", format=FORMAT, style="{")

When filename parameter is not empty, python logging will create a FileHandler object to handle the I/O related operations.

Create Logger at Module Level

The Python logging module has a logger hierarchy for ancestors and descendant to share the configurations or passing around the messages. In the above examples, the root logger is initialized when calling the basicConfig.

A good practice is to create module-level logger so that you have the flexibility to handle the log messages differently for each module and track which module generated the logs.

Below is an example to create a module-level logger with getLogger method:

logger = logging.getLogger('module.x')
logger.setLevel(logging.DEBUG)

You can create also your own formatter and specify the handler you need for your module-level logger. Below is an example to create a StreamHandler with your own logging format:

FORMAT = '[{asctime}] [{name}] [{levelname}] - {message}'
formatter = logging.Formatter(FORMAT, style="{")

ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
logger.addHandler(ch)

To verify if your logger is working as expected, you can log some messages in different levels as per below:

def log_func(msg):
    logger.debug("Debugging now: %s", msg)
    logger.info("For your info only : %s", msg)
    logger.warning("This is a warning : %s", msg)
    logger.error("Error is happening : %s", msg)
    logger.critical("App is going to crash : %s", msg)

log_func("yolo")

You may see the output similar to below:

[2021-03-25 15:17:12,638] [module.x] [INFO] - For your info only : yolo
[2021-03-25 15:17:12,639] [module.x] [WARNING] - This is a warning : yolo
[2021-03-25 15:17:12,642] [module.x] [ERROR] - Error is happening : yolo
[2021-03-25 15:17:12,642] [module.x] [CRITICAL] - App is going to crash : yolo

Similar to Python package hierarchy, by specifying the logger name as “module.x.y”, a child logger will be created under “module.x”.

Pass Extra Information to LogRecord

When calling logger to log some event, a LogRecord instance is created automatically to pass to log handler. If you need some additional information to be added into the LogRecord, you can make use of the extra parameter together with the formatter.

d = {'clientip': '192.168.0.1'}
FORMAT = '[{clientip}] [{asctime}] [{name}] [{levelname}] - {message}'

logging.basicConfig(format=FORMAT, style="{")
logging.warning('Protocol problem: %s', 'connection reset', extra=d)

You shall see the output as per below:

[192.168.0.1] [2021-03-25 15:36:31,449] [root] [WARNING] - Authentication failure: password incorrect.

Create Multiple Log Files with RotatingFileHandler

There are cases where you need to switch to a new log file when the file size reaching to certain limit to avoid the particular log file growing too big. The RotatingFileHandler from the logging.handlers module provides a easy way to limit the max size of the file and number of backup files to be created.

Below is an example to set file size at 2KB and create max 5 backup files for logging:

from logging.handlers import RotatingFileHandler

logger = logging.getLogger("simple app")
logger.setLevel(logging.DEBUG)
fh = RotatingFileHandler(filename="app.log", maxBytes=2000, backupCount=5)
fh.setLevel(logging.DEBUG)
formatter = logging.Formatter('[{asctime}] - [{name}] - [{levelname}] - {message}', style="{")
fh.setFormatter(formatter)

logger.addHandler(fh)

When the app.log file grows to 2KB, the new logs will be automatically directed to a new file, and the old file will be renamed to app.log.X (until X reaches to 5).  If you’ve set backupCount as 1, the log file will be growing without changing to a new file.

Similarly, you can use TimedRotatingFileHandler to rotate the log files at certain time intervals such as every X hours/days etc.

Send Log to Email via SMTPHandler

If you would like to send log events via email to notify the recipients e.g. to notify the system administrator or support team on some critical runtime errors, you can achieve it with SMTPHandler.  Below is an example to use gmail to send out emails. (you will need to replace the email address and password to your own gmail account)

from logging.handlers import SMTPHandler

smtph = SMTPHandler(mailhost=("smtp.gmail.com", 587), 
fromaddr="from@gmail.com", 
toaddrs="to@gmail.com", 
subject="Test Error from Python Script", 
credentials=("from@gmail.com", "password"), 
secure=())
smtph.setLevel(logging.ERROR)
logger.addHandler(smtph)

logger.critical("App is going to crash")

Note that I have passed an empty tuple for the secure parameter in order for SMTPHandler to use TLS encryption for the SMTP connection.

(You will also need to enable the less secure apps for your google account in order to go through the authentication)

Use Queue to Process Log Asynchronously with QueueHandler

The previous example demonstrated how to use the SMTPHandler to send out email. If you have tried the code yourself, you may notice the sluggish behavior especially at the last line of the code. This is because sending email will block the main thread until it finishes, which may not be something you are expecting when you need an immediate response in your web application.

To make sure the performance-critical thread is not blocked by any of these slow operations, you shall consider to use queue to process them in separate threads. The QueueHandler and QueueListener are designed for this purpose. Below is an example to show you how to use queue handler and listener:

from logging.handlers import SMTPHandler, QueueHandler, QueueListener

log_queue = queue.Queue(-1)
queue_handler = QueueHandler(log_queue)

#smtph is the SMTPHandler created from previous example
listener = QueueListener(log_queue, smtph, respect_handler_level=True)
logger.addHandler(queue_handler)

listener.start()

#a log function created in previous example
log_func("Bingoooo")

listener.stop()

Take note that the respect_handler_level has been specified as True in order for the listener to filter the messages based on the severity level set in the handlers. E.g. the handler smtph has been set to handle messages with severity level above error, when respect_handler_level is True, only messages with severity level above error will be passed into the queue. Otherwise if respect_handler_level is False, all the messages regardless of the severity levels are passed into queue for processing.

Use dictConfig for Log Configuration

Setting up logger configurations line by line via code can be tedious sometimes especially when you have more than one handlers or formatters to be set up. The logging module also provides ways to load configurations from config files into dictionary object, and then load the configurations by calling the dictConfig method.

Below is an example for using dictionary to create a logger with StreamHandler and SMTPHandler:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'standard': {
            'format': '%(levelname)s %(asctime)s %(module)s %(message)s'
        }
    },
    'handlers': {
        'console':{
            'level':'DEBUG',
            'class':'logging.StreamHandler',
            'formatter': 'standard'
        },
        'email': {
            'level': 'ERROR',
            'formatter': 'standard',
            'class': 'logging.handlers.SMTPHandler',
            'mailhost': ("smtp.gmail.com", 587),
            'fromaddr': "from@gmail.com", 
            'toaddrs': "to@gmail.com", 
            'subject': "Test Error from Python Script",
            'credentials': ("from@gmail.com", "password"),
            'secure': ()            
        }
    },
    'loggers': {
        'app_log': {
            'handlers': ['console', 'email'],
            'level': 'INFO',
        }
    }
}

logging.config.dictConfig(LOGGING)
log = logging.getLogger("app_log")

In your project, you may put your configurations into a file in INI/JSON/YAML format, and load it as a dictionary to pass into the dictConfig method, so that you do not need to modify any source code whenever you want to change any logging configurations.

Suppress Error in Production Environment

Last but not the least, you shall consider to suppress any exceptions raised during the logging rather than letting it crash your entire application in a production environment. For instance, if your logs are written to a file and the disk is full or someone accidently updated the file permission to read-only, you would not wish such exceptions to bring down the entire application.

To prevent such issue, you can set raiseExceptions to False, so that any logging exceptions would not impact your application functionality.

logging.raiseExceptions = False

With above set to False, you shall not see any error even when there is some TypeError like below:

logging.info("For your info only : %s %s", msg)
# No error message will be showing and nothing will be logged

But make sure to set it as True (the default value) in your test environment, so that you can identify these type of errors before deploying your code into production. (refer to another topic for python suppressing stdout and stderror)

Conclusion

In this article, we have summarized 8 tips for using the Python logging module with some sample codes which hopefully will give you some ideas on how to use this module when you are developing some code for production use.

A lot of details are not covered such as the list of attributes for Formatter, LogRecord, Handler and Filter etc., you may need to go through the official documents if certain details you need is not discussed here in this article.

plot route on Google Maps with Python Google Maps API, optimize route with Google Maps API, Singapore

Plot Route on Google Maps with Python

Introduction

You probably use Google Maps a lot in your daily life, such as locate a popular restaurant, check the distance between one place to another, or find the nearest driving route and shortest travelling time etc. If you have been thrown with a big set of location data where you need check and validate them from a map, and then find the optimal routes, you probably would think how this can be done programmatically in Python. In this article, we will be exploring how all these can be achieved with Python Google Maps APIs.

Prerequisites

To be able to use Google Maps APIs, you will need to create a Google Cloud Service account and set up a billing method. If you are a new signed up user, you would get $300 credit for you to try out all the various Google APIs. You can follow the official docs to set up a new project, and from where you would get an API key for using its services.

You will also need to install the GoogleMaps package in your working environment, below is the pip command to install the package:

pip install -U googlemaps

Now let’s import the package at the beginning of our code, and initialize the service with your API Key:

import googlemaps

gmaps = googlemaps.Client(key='YOUR API KEY')

If you did not get any error up to this step, you are all set to go. Let’s start to explore the various function calls you can do with this package.

Get Geolocation of the Addresses

Often you need to check a location by address, postal code or even the name of the store/company, this can be done via the geocode function which is similar to search a place in Google Maps from your browser. The function returns a list of possible locations with the detailed address info such as the formatted address, country, region, street, lat/lng etc.

Below are a few possible address info you can pass to this API call:

#short form of address, such as country + postal code
geocode_result = gmaps.geocode('singapore 018956')

#full address
geocode_result = gmaps.geocode("10 Bayfront Ave, Singapore 018956")

#a place name
geocode_result = gmaps.geocode("zhongshan park")

#Chinese characters
geocode_result = gmaps.geocode('滨海湾花园')

#place name/restaurant name
geocode_result = gmaps.geocode('jumbo seafood east coast')

print(geocode_result[0]["formatted_address"]) 
print(geocode_result[0]["geometry"]["location"]["lat"]) 
print(geocode_result[0]["geometry"]["location"]["lng"])

Depends how complete the information you have supplied to the function call, you may get some sample output as per below:

1206 ECP, #01-07/08 East Coast Seafood Centre, Singapore 449883
1.3051669,
103.930673

Reverse Geocoding

You can also use the latitude and longitude to check the address information, this is called reverse geocoding. For instance, the below will give you the human readable addresses for the given lat/lng:

reverse_geocode_result = gmaps.reverse_geocode((1.3550021,103.7084641))

print(reverse_geocode_result[0]["formatted_address"])
#'87 Farrer Dr, Singapore 259287'

You may get multiple address info with the same lat/lng as Google just return the list of addresses closest to this lat/lng.

Check the Distance Between Locations

To check the distance between multiple locations, you can use the Google distance matrix API, where you can supply multiple locations, and Google will return the travelling distance between each location as well as the travelling time. E.g.:

from datetime import datetime, timedelta

gmaps.distance_matrix(origins=geocode_result[0]['formatted_address'], 
                      destinations=reverse_geocode_result[0]["formatted_address"], 
                      departure_time=datetime.now() + timedelta(minutes=10))

In above example, we have provided the departure time as 10 minutes later from current time for Google to calculate the travelling time. The departure time cannot be any time in the past.

The return JSON object would include the travelling distance and time based on the current traffic condition (default transport mode is driving):

{'destination_addresses': ['87 Farrer Dr, Singapore 259287'],
 'origin_addresses': ['1206 ECP, #01-07/08 East Coast Seafood Centre, Singapore 449883'],
 'rows': [{'elements': [{'distance': {'text': '22.2 km', 'value': 22219},
     'duration': {'text': '24 mins', 'value': 1442},
     'duration_in_traffic': {'text': '22 mins', 'value': 1328},
     'status': 'OK'}]}],
 'status': 'OK'}

Get Directions Between Locations

One of the most frequent usage of Google Maps is to check the direction from one place to another. You can use directions API to get the route info as per below:

directions_result = gmaps.directions(geocode_result[0]['formatted_address'],
                                     reverse_geocode_result[0]["formatted_address"],
                                     mode="transit",
                                     arrival_time=datetime.now() + timedelta(minutes=0.5))

For this directions API, it returns you the detailed routing information based on what type of travelling mode you’ve chosen.

In the above example, we have specified the mode as “transit”, so Google Maps will suggest a route that you can take the public transport whenever possible to reach your final destination. It may not meet your arrival time if it is really infeasible anyway.

Below is the sample return with detailed routing directions:

[{'bounds': {'northeast': {'lat': 1.3229677, 'lng': 103.9314612},
   'southwest': {'lat': 1.2925606, 'lng': 103.8056495}},
  'copyrights': 'Map data ©2021 Google',
  'legs': [{'arrival_time': {'text': '9:16pm',
     'time_zone': 'Asia/Singapore',
     'value': 1611321373},
    'departure_time': {'text': '7:59pm',
     'time_zone': 'Asia/Singapore',
     'value': 1611316750},
    'distance': {'text': '18.0 km', 'value': 17992},
    'duration': {'text': '1 hour 17 mins', 'value': 4623},
    'end_address': '87 Farrer Dr, Singapore 259287',
    'end_location': {'lat': 1.3132547, 'lng': 103.8070619},
    'start_address': '1206 ECP, #01-07/08 East Coast Seafood Centre, Singapore 449883',

...

{'distance': {'text': '0.1 km', 'value': 106},
        'duration': {'text': '1 min', 'value': 76},
        'end_location': {'lat': 1.305934, 'lng': 103.9306822},
        'html_instructions': 'Turn <b>left</b>',
        'maneuver': 'turn-left',
        'polyline': {'points': 'q{}Fk_jyRM?KAIAE@a@DQBGDEBGDIF_@LQD'},
        'start_location': {'lat': 1.3050507, 'lng': 103.9309369},
        'travel_mode': 'WALKING'},
...
}]

You can also supply the waypoints parameter in order to route multiple locations between your origin and destination. For instance, if you want to go for a one day tour in Singapore, you can provide a list of the attractions and let Google to optimize the route for you with the optimize_waypoints = True parameter. The sample code as per below:

waypoints = ["Chinatown Buddha Tooth Relic Temple", 
"Sentosa Island, Singapore", 
"National Gallery Singapore", 
"Botanic Garden, Singapore",
"Boat Quay @ Bonham Street, Singapore 049782"]

results = gmaps.directions(origin = "Fort Canning Park, Singapore",
                                         destination = "Raffles Hotel, Singapore",                                     
                                         waypoints = waypoints,
                                         optimize_waypoints = True,
                                         departure_time=datetime.now() + timedelta(hours=1)

for i, leg in enumerate(results[0]["legs"]):
    print("Stop:" + str(i),
        leg["start_address"], 
        "==> ",
        leg["end_address"], 
        "distance: ",  
        leg["distance"]["value"], 
        "traveling Time: ",
        leg["duration"]["value"]
    )

To get a good result, you will need to make sure all the locations you’ve provided can be geocoded by Google. Below is the output:

plot route on Google Maps with Python Google Maps API

Plot Route on Google Maps

To visualize your route or location on a map, you can make use of the Maps Static API. It allows you to plot your locations as markers on the map and draw the path between each location.

To get started, we shall define the markers for our locations. We can specify the color, size and label attributes for each location in a “|” separated string. As the label only allows a single character from {A-Z, 0-9}, we will just use A-Z to indicate the sequence of each location. E.g.:

locations = ["Fort Canning Park, Singapore",
          "Chinatown Buddha Tooth Relic Temple", 
          "Sentosa Island, Singapore", 
          "National Gallery Singapore", 
          "Boat Quay @ Bonham Street, Singapore 049782",
          "Botanic Garden, Singapore",
          "Raffles Hotel, Singapore"]

markers = ["color:blue|size:mid|label:" + chr(65+i) + "|" 
                   + r for i, r in enumerate(locations)]

In the static_map function, you can specify the scale, size and zoom to define how many pixels of your output and whether you want to show the details up to the city or individual building on your map.

The format and maptype parameters are used to specify the output image format and what type of maps you want to use.

Lastly, we can specify the path parameter to connect the different locations together. Similar to how we define the markers, we can specify the attributes in a “|” separated string. Below is the sample code:

result_map = gmaps.static_map(
                 center=locations[0],
                 scale=2, 
                 zoom=12,
                 size=[640, 640], 
                 format="jpg", 
                 maptype="roadmap",
                 markers=markers,
                 path="color:0x0000ff|weight:2|" + "|".join(locations))

And if you save the return result into a .jpg file as per below:

with open(“driving_route_map.jpg”, “wb”) as img:
    for chunk in result_map:
        img.write(chunk)

You shall see something similar to the below:

plot route on Google Maps with Python Google Maps API

The routing sequence is based on the list of locations you’ve supplied, so you can probably use directions function to return a optimal route and then plot them with static_map.

A few things to be noted are that the static map only support small size of image (up to 1280×1280), and you will have to plot in more points if you want to draw a nicer driver routes. For our above code, we only take 7 location points, so the lines are not making any sense for driving. If we use the location points suggested by Google from directions function, the result would look much better. E.g.:

marker_points = []
waypoints = []

#extract the location points from the previous directions function

for leg in results[0]["legs"]:
    leg_start_loc = leg["start_location"]
    marker_points.append(f'{leg_start_loc["lat"]},{leg_start_loc["lng"]}')
    for step in leg["steps"]:
        end_loc = step["end_location"]
        waypoints.append(f'{end_loc["lat"]},{end_loc["lng"]}')
last_stop = results[0]["legs"][-1]["end_location"]
marker_points.append(f'{last_stop["lat"]},{last_stop["lng"]}')
        
markers = [ "color:blue|size:mid|label:" + chr(65+i) + "|" 
           + r for i, r in enumerate(marker_points)]
result_map = gmaps.static_map(
                 center = waypoints[0],
                 scale=2, 
                 zoom=13,
                 size=[640, 640], 
                 format="jpg", 
                 maptype="roadmap",
                 markers=markers,
                 path="color:0x0000ff|weight:2|" + "|".join(waypoints))

We are extracting all the location points from the directions function and pass them to the path parameter to draw the connection lines. The output would be similar to below which makes more sense for driving:

plot route on Google Maps with Python Google Maps API, optimize route with Google Maps API, Singapore

Conclusion

In this article, we have reviewed through a few Google Maps APIs which hopefully will help you for any feasibility study or even in your real projects. For example, to cleanse the dirty address for a large set of data, or to calculate the distance/travelling time or get the optimal routes between multiple locations. If your objective is to generate a dynamic map, you probably have to go for the Maps JavaScript API where you can display a map on web page and make it interactive, but still you may find these Python APIs would be more efficient in processing your raw data rather than doing it in JavaScript code.

(last updated on 8-May-2021)

create animated charts and gif in python with pandas-alive

Create Animated Charts In Python

Introduction

If you are working as a data analyst or data scientist for some time, you may have already known how to use matplotlib to visualize and present data in various charts. The matplotlib library provides an animation module to generate dynamic charts to make your data more engaging, however it still takes you a few steps to format your data, and initialize and update the data into the charts. In this article, I will demonstrate you another Python library – pandas-alive which allows you to generate animated charts directly from pandas data without any format conversion.

Prerequisites

You can install this library via pip command as per below if you do not have it in your working environment yet:

pip install pandas-alive

It will also install its dependencies such as pandas, pillow and numpy etc.

For demonstration of our later examples, let’s grab some sample covid-19 data from internet, you can download it from here.

Before we start, we shall import all the necessary modules and do a preview of our sample data:

import pandas as pd
import pandas-alive

df_covid = pd.read_excel("covid-19 sample data.xlsx")

The data we will be working on would be something similar to the below:

create animated charts and gif in python with pandas-alive

Now with all above ready, let’s dive into the code examples.

Generate animated bar chart race

Bar chart is the most straightforward way to present data, it can be drawn in horizontal or vertical manner. Let’s do a minor formatting on our data so that we can use date as horizontal or vertical axis to present the data.

df_covid = df_covid.pivot(index="date", columns="location", values="total_cases").fillna(0)

To create an animated bar chart horizontally, you can simply call plot_animated as per below:

df_covid.plot_animated("covid-19-h-bar.gif", period_fmt="%Y-%m", title="Covid-19 Cases")

The plot_animated function has default parameters kind=”race” and orientation = “h”, hence the output gif would be generated as per below:

create animated charts and gif in python with pandas-alive

You can change the default values of these two parameters to generate a vertical bar chart race:

df_covid.plot_animated("covid-19-v-bar.gif", 
                     period_fmt="%Y-%m", 
                     title="Covid-19 Cases", 
                     orientation='v')

The output chart would be something similar to below:

create animated charts and gif in python with pandas-alive

 

Generate animated line chart

To create an animated line chart, you just need to change the parameter kind = “line” as per below:

df_covid.plot_animated("covid-19-line.gif",
                     title="Covid-19 Cases",
                     kind='line',
                     period_fmt="%Y-%m",
                     period_label={
                         'x':0.25,
                         'y':0.9,
                         'family': 'sans-serif',
                         'color':  'darkred'
                    })

There are some other parameters such as period_label to control the format of the label, or n_visible to constrain how many records to be shown on the chart. The output chart would be as per the below:

create animated charts and gif in python with pandas-alive

 

Generate animated pie chart

Similar to other charts, you can create a simple pie chart with below parameters:

df_covid.plot_animated(filename='covid-19-pie-chart.gif',
                     kind="pie",                     
                     rotatelabels=True,
                     tick_label_size=5,
                     dpi=300,
                     period_fmt="%Y-%m",
                     )

You can also use other Axes.Pie parameters to define the pie chart behavior. The output from above code would be:

 

 

create animated charts and gif in python with pandas-alive

 

 

Generate scatter chart

Generate scatter chart or bubble chart is slightly complicated than other charts, but for our sample data, it does not make much sense to visualize it in this type of charts. E.g.

df_covid.plot_animated(filename='covid-19-scatter-chart.gif',
                     kind="scatter",
                     period_label={'x':0.05,'y':0.9},
                     steps_per_period=5
                    )

You shall see the output is similar to the line chart:

create animated charts and gif in python with pandas-alive

 

Conclusion

Pandas-Alive provides very convenient ways to generate all sorts of animated charts from pandas data frame with the underlying support from the Matplotlib library. It accepts most of the parameters you used in matplotlib, so you don’t have to learn a lot of new things before applying it for your charts.

There are many more features beyond the basis I have covered in above, such as supplying custom figures, generating GeoSpatial charts or combining multiple animated charts in one view. You can check more examples from its project page.

 

generate password with python random and python secrets

You Might Have Misused Python Random

Introduction

Python random module provides a convenient way for generating pseudo-random numbers in case you need some unpredictable results for your application such as the computer games, a lucky draw system or even the shuffling logic for your music player. Since it provides various functions to generate results in “unpredictable” manner, developers attempted to use this feature to produce random password or authentication token for security purpose without understanding of it’s fundamental implementation. In this article, we will be discussing how the Python random module has been misunderstood and misused for the scenarios which it shall not be used.

Basic usage of Python random module

Let’s take a look at some basic usage of this module. You can use it to generate random integers, float numbers or bytes as per below:

#Generate a random integer between 1 to 10
random.randint(1,10) 
#2

#generate a random floating point number between 0 to 1
random.random()
#0.3103975786510934

#Generate random number between 1 to 2 in uniform distribution
random.uniform(1, 2) 
#1.9530600469459607 

#Generate random number between 1 to 100, with step as 2
random.randrange(1, 100, 2) 
#43 

#Generate random bytes, available in Python version 3.9
random.randbytes(8)
#b'v\xf7\xb2v`\xc8U]'
#Generate a integer within 8 bits
random.getrandbits(8)
#68

And shuffling or sampling the items in a sequence can be achieved easily with below:

slangs = ["Boomer", "Cap", "Basic", "Retweet", "Fit", "Fr", "Clout"]
random.shuffle(slangs)
#['Fit', 'Basic', 'Fr', 'Clout', 'Cap', 'Retweet', 'Boomer']

random.sample(slangs, k=5)
['Fit', 'Fr', 'Clout', 'Retweet', 'Basic']

You can also use the choice function to choose a random option from a given sequence, for instance:

random.choice(["Boomer", "Cap", "Basic", "Retweet", "Fit", "Fr", "Clout"])
#'Retweet'

With this function, It’s easy to implement a lucky draw logic where all the participants’ name are passed in and a winner is selected randomly which seems to be the perfect use case of it. The problem comes when developers try to use this function to generate  password or security related tokens which they believe it’s random enough and resistant to predication. But it is indeed wrong.

Why the random numbers generated are not random enough?

To answer this question, we will need to take a further look at the implementation of this Python random module. This module uses Mersenne Twister algorithm as the core generator which is designed for modelling and simulation purpose rather than security or cryptography. Some study show it’s not difficult to reconstruct the internal state of the MT to predict the outcome and it can be attacked through this MT randomness.

Actually random module does provide a SystemRandom class which uses the sources from operation system to generate random numbers without relying on the software state and the result is not reproducible. For instance, depends on the actual implementation, some OS uses the atmospheric noise or the exact time of the key presses as the source for generating unpredictable result which is more suitable for cryptography.

You can use it in the same way as the random class except the state is not available:

sys_random = random.SystemRandom()

sys_random.random()
#0.04883551877802528

sys_random.randint(1, 100)
#72

sys_random.choice(["Boomer", "Cap", "Basic", "Retweet", "Fit", "Fr", "Clout"])
#'Clout'

Unfortunately, this class has been overlooked for many years and some developers continue using the functions from random module for generating password or security tokens which exposed a lot of security vulnerability. To address these concerns, an enhancement proposal raised to add a new secrets module for some common security-related functions, so that people won’t mix up it with the random module.

Python secrets module

Let’s take a look at the functions in secrets module:

#Generate random integer between 0 to 50
secrets.randbelow(50)
#2

#Generate random integer with 8 bits
secrets.randbits(8)
#125

#Generate random bytes
secrets.token_bytes(20)
#b'\x8a\xb3\xa92)S:\xf2\xac\x90\xaf\xb1\xb3Q\xc5\xfe\x80\xdb\xc2`'

#Generate random string in hexadecimal with default 32 bytes
secrets.token_hex()
#'cf4f964edf810ca7f6ad6b533038fdcf538c73bd59e11d3340a838e7fd88fdf9'

#Generate URL-safe text string
secrets.token_urlsafe(10)
#z9zQQsxcq6SRug

import string
secrets.choice(string.ascii_letters + string.digits + string.punctuation)
#'k'

The functions are pretty similar to what we have seen in random module, just that results are generated from os.urandom function which meant to be unpredictable enough for cryptographic applications.

Implementing a strong password generator with Python secrets

With all above said, let’s implement a strong password generator with the secrets module. Assuming we have below requirements on our password:

  • Length between 8 to 16
  • At least 1 lowercase
  • At least 1 uppercase
  • At least 1 number
  • At least 1 special characters among !#$%&@_~

We can use the choice function to randomly choose 1 character each time for a random iteration between 8 to 16 times, then test if all the requirements are met for the generated password. Below the sample code:

def generate_strong_password():
    special_characters = '!#$%&@_~'
    password_choices = string.ascii_letters + string.digits + special_characters
    while True:
        password = ''.join(secrets.choice(password_choices) for _ in range(random.randint(8, 16)))
        if (any(c.islower() for c in password)
                and any(c.isupper() for c in password)
                and sum(special_characters.find(c) > -1 for c in password) == 1
                and any(c.isdigit() for c in password)):
            break
    return password

To check the randomness, you can try the below:

[generate_strong_password() for _ in range(10)]
# sample output
['C9A&PbswcLMpJ',
 'DHrbuzZU&io7Io',
 'B1zx4YqKUS@01m',
 'Q!F8tsZJsw',
 '9QVF8oASt_jceq2',
 '2~EKErrHAc9jvbyZ',
 '8ceU_XZbYdqv8b',
 'h8oS@tZ6',
 '3w@OX33tw12J6',
 'FX~mkO0aNatqp']

Conclusion

In this article, we have reviewed through the most commonly used functions in Python random module for generating pseudo-random results for modelling and simulation. Due to misuse of random module for security related applications from the earlier Python developers, Python has introduced a new secrets module to focus on password, authentication or security related tokens. We have also demonstrated an example on how to use secrets module to implement a strong password generator. Although you can generate a strong password with the combination of the alphanumeric and special characters, you shall never store your password in plaintext or simple encrypted format, a good practice is to always use libraries like hashlib or bcrypt to salt and hash it before storing.