
5 Useful Tips for Reading Email From Outlook In Python

Introduction

Pywin32 is one of the most popular packages for automating your daily work with Microsoft Outlook, Excel, etc. In my previous post, we discussed how to use this package to read emails and save attachments from Outlook. As quite a few questions raised in the comments were not covered in the original post, this article reviews some of the more advanced topics for reading emails from Outlook via the Python pywin32 package.

If you have not yet read through the previous post, you may check it out from here.

Prerequisites:

Assuming you have already installed the latest pywin32 package and imported the necessary packages below in your script, you should not encounter any error when executing the GetNamespace method to establish the Outlook connection:

import win32com.client

#other libraries to be used in this script 
import os 
from datetime import datetime, timedelta

outlook = win32com.client.Dispatch('outlook.application') 
mapi = outlook.GetNamespace('MAPI')

When using the below code to iterate through the Accounts property, you shall see whichever accounts you have configured in your Outlook:

for account in mapi.Accounts: 
    print(account.DeliveryStore.DisplayName)

#Assuming below accounts have been configured:
#[email protected]
#[email protected]

Now let’s move on to the topics we are going to discuss in this article.

Reading Email from Multiple Outlook Accounts

If you have multiple accounts configured in your Outlook application, you can access a particular account via the Folders collection, specifying either the account name or the index of the account, e.g.:

for idx, folder in enumerate(mapi.Folders):
    #index starts from 1
    print(idx+1, folder)

#Assuming below output:
# 1  [email protected]
# 2  [email protected]

And to access the subfolders under a particular email account, you can continue to use the Folders collection, specifying the subfolder name or the index of the folder. Before that, you may want to check the available subfolders and their index values as per below:

for idx, folder in enumerate(mapi.Folders("[email protected]").Folders):
    print(idx+1, folder)
# or using index to access the folder
for idx, folder in enumerate(mapi.Folders(1).Folders): 
    print(idx+1, folder)

You shall see the list of subfolders together with their index values, e.g. Deleted Items, Inbox, Sent Items and so on.

With the above folder index and name, you shall be able to access the email messages as per below:

messages = mapi.Folders("[email protected]").Folders("Inbox").Items
# or
messages = mapi.Folders(1).Folders(2).Items
for msg in list(messages):
    print(msg.Subject)

Although the indexes do not change when you move folders up or down in Outlook, using folder names rather than indexes obviously makes your code much more readable.
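For deeper folder structures, you can simply keep chaining the Folders collection. Below is a minimal sketch, assuming a subfolder named "Reports" exists under the Inbox:

reports = mapi.Folders("[email protected]").Folders("Inbox").Folders("Reports")
for msg in list(reports.Items):
    print(msg.Subject)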

Filter Email Based on Receiving Time Window

When reading emails from your Outlook inbox, you may want to zoom into the emails within a specific receiving time window rather than scanning through the thousands of emails you have received. To filter emails based on certain conditions, you can use the Restrict method together with logical operators.

For instance, to filter the emails received from the 1st day of the current month until today 12am:

today = datetime.today()

# first day of the current month, at midnight
start_time = today.replace(day=1, hour=0, minute=0, second=0).strftime('%Y-%m-%d %H:%M %p')

# today 12am
end_time = today.replace(hour=0, minute=0, second=0).strftime('%Y-%m-%d %H:%M %p')

messages = messages.Restrict("[ReceivedTime] >= '" + start_time
                             + "' And [ReceivedTime] <= '" + end_time + "'")

With logical operators like AND, OR and NOT, you are able to combine multiple criteria. For instance, to check for emails with a certain subject but not from a particular sender address:

messages = messages.Restrict("[Subject] = 'Sample Report'" 
                             + " And Not ([SenderEmailAddress] = '[email protected]')")

And you can also use the Restrict method as many times as you wish, if that makes your code more readable than combining all conditions in one filter, e.g.:

messages = messages.Restrict("[Subject] = 'Sample Report'")
messages = messages.Restrict("Not ([SenderEmailAddress] = '[email protected]')")

Getting First N emails

When using the Restrict method for filtering email messages, you are not able to specify the maximum number of emails to read. If you wish to get the first/last N emails based on receiving time, you can use the Sort method to sort the messages on a particular email property before slicing the list. Below is sample code to get the latest 10 email messages based on receiving time:

messages.Sort("[ReceivedTime]", Descending=True)

#read only the first 10 messages
for message in list(messages)[:10]:
    print(message.Subject, message.ReceivedTime, message.SenderEmailAddress)

Wildcard Matching for Filtering

With the Restrict method alone, you cannot do wildcard matching, such as searching whether the email subject or body contains certain keywords. To achieve that, you will need to use a DASL query.

For instance, with the below DASL query syntax, you can filter for email subjects containing the keyword "Sample Report":

messages = messages.Restrict("@SQL=(urn:schemas:httpmail:subject LIKE '%Sample Report%')")

You may want to check here to see which fields are supported in a DASL query and the correct namespace to use.

Include/Exclude Multiple Email Domains

To filter for emails only from a particular domain, you can use a DASL query similar to the previous example:

messages = messages.Restrict("@SQL=(urn:schemas:httpmail:SenderEmailAddress LIKE '%company.com')")

And to exclude emails from a few domains, you can combine multiple conditions with logical operators:

messages = messages.Restrict("@SQL=(Not(urn:schemas:httpmail:senderemail LIKE '%@abc%') \
And Not(urn:schemas:httpmail:senderemail LIKE '%@123%') \
And Not(urn:schemas:httpmail:senderemail LIKE '%@xyz%'))")

Conclusion

In this article, we have reviewed some advanced usage of the pywin32 package for filtering emails. You may not find many Python tutorials for this package online, but you shall be able to find the equivalent VBA code on its official website for most of the code you have seen in this article. In the event that you cannot find a solution for your problem, you may check whether there is something implemented in VBA that you can convert into Python syntax.

Link for the previous post Reading Email From Outlook In Python.


3 Ways for Managing Python Virtual Environment

Introduction

A Python virtual environment is an isolated execution environment for managing Python versions, dependencies, and, indirectly, permissions. When you are working on multiple projects with potentially conflicting requirements, such as different Python versions or libraries, you need to consider using a virtual environment so that installing packages for one project will not impact another.

In this article, we will discuss the different ways to create Python virtual environments for your projects.

Using venv or virtualenv

Python 3.3 introduced the lightweight venv module for creating virtual environments, so you do not need to install any additional tools for it. But if you are working on projects with older Python versions, you will need to install the popular virtualenv package for managing virtual environments. Using Windows as an example, the steps to create a virtual environment are as simple as below:

  • Go to your command line window (Win + R)
  • Use the "cd" command to switch to a writable folder where you want your Python virtual environment to be created, e.g. a dedicated folder called "py_venv"
  • Use the below command to create a virtual environment for your project:
python -m venv project_name
  • It takes a few seconds for the above to complete, and you shall see a folder structure containing "Include", "Lib", "Scripts" and a pyvenv.cfg file created under "project_name"

The pyvenv.cfg file describes the home directory, Python version, etc. If you are using virtualenv, it also indicates the version of that package used.
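For reference, a typical pyvenv.cfg generated by venv on Windows looks something like the below (the home path and version will differ on your machine):

home = C:\Program Files\Python39
include-system-site-packages = false
version = 3.9.7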

  • Under the “Scripts” folder, you can see the following files:

activate, activate.bat, Activate.ps1, deactivate.bat, as well as pip.exe and python.exe

Now the virtual environment has been created successfully. From the command line window, go to the "Scripts" folder, type "activate" or "activate.bat" and hit Enter. You shall see the prompt change as per below:

(project_name) C:\py_venv\project_name\Scripts>

When you see the project name prefix at the beginning of the prompt, you are already in the virtual environment for this project, and you can proceed to install the necessary packages and build and debug your code in this isolated environment. There is also a system-site-packages option to specify whether you want to inherit the packages from the global environment, which can be useful when you have some heavy packages that you do not want to re-install in each of your virtual environments.
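For instance, the below creates a virtual environment that inherits the globally installed packages:

python -m venv project_name --system-site-packages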

To exit from the current virtual environment, you can run the "deactivate.bat" script:

deactivate

Or if you need to switch from one virtual environment to another, for instance another virtual environment called "test", you can simply activate "test"; it will exit "project_name" automatically and switch to the "test" environment.

From the above steps, you may wonder how to specify the different Python version when creating your virtual environment.

Assuming you have already installed both Python 3.7 and Python 3.8 and, during the installation, added the Python installation directories to your PATH variable, running "where python" will list the paths of all the Python versions you have installed.

As venv does not provide an option to specify the Python version (it always uses the interpreter it was run with), you will need to use virtualenv for this case, for instance:

virtualenv test1 -p python3.7
virtualenv test2 -p python3.8

This shall create the virtual environments based on the Python version you've specified; you can verify the version_info in the pyvenv.cfg file in the folder.

Many people wonder whether the virtual environment folder shall be created separately or placed together with the source code.

Generally there is no right or wrong place to create your virtual environment folder, but you shall exclude this folder when you submit your code to the repository, since other people may not be able to re-use the packages you've installed, e.g. due to a different OS.

My personal preference is to have a dedicated folder for all the virtual environments, and to export the installed packages with the below command when submitting to the repository, so that the source code folders stay cleaner:

python -m pip freeze > requirements.txt
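Whoever checks out your code can then re-create an equivalent environment by installing the pinned packages inside their own virtual environment:

python -m pip install -r requirements.txt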

Sometimes you may find it tedious to switch between your existing virtual environments using the activate/deactivate scripts; the virtualenvwrapper package provides an easier way to switch environments by name. You can read more in its official documentation.
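As a quick sketch of the virtualenvwrapper workflow (on Windows you would install the virtualenvwrapper-win variant):

pip install virtualenvwrapper-win
mkvirtualenv project_name
workon project_name
deactivate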

Managing Python Virtual Environment with Conda

Conda is an open-source package and environment management tool which you can easily use for handling Python projects with conflicting dependency requirements. If you have never used it before, I would suggest you do a quick review of the user guide on the official website, and download Miniconda for your OS.

To create a virtual environment with Conda, you can run the below from your command line:

conda create -n test python=3.9

You can see that Conda allows you to specify the Python version; it will check and download the specified Python version if it is not installed yet, and set up the virtual environment in its default envs folder.

To activate your virtual environment:

conda activate test

Once it's activated, you shall see the prefix added to your command line. And you can use the below command to check which packages are available in your current environment:

conda list

To deactivate your virtual environment, you can use the below:

conda deactivate
# or
conda activate other_project

Note that you can use either Conda or pip to install new packages, e.g.:

pip install numpy
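Or equivalently with Conda itself:

conda install numpy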

You can see that when a package is installed via pip, its channel will be indicated as pypi in the conda list output.

For more usage of Conda, you may refer to their official documentation here.

Managing Python Virtual Environment with IDE

If you are using an IDE for Python programming, such as PyCharm, Visual Studio Code or Sublime Text, it usually provides an option to set up a virtual environment when creating a new project. This saves you the effort of manually setting up the virtual environment, and you do not have to worry about potential conflicts among your multiple projects.

Using PyCharm as an example, when creating/importing a new project, you are able to use the virtualenv tool to create a project-specific isolated virtual environment. The virtualenv tool comes bundled with PyCharm, so you do not need to install it separately.

For the detailed steps, you can refer to the PyCharm official documentation. Similar guides can be found for the other IDEs.

Conclusion

In this article, we have discussed the purpose of using Python virtual environments and the different ways you can set up an isolated environment for your Python project. If you are new to this topic, you may get confused by the many variants of virtual environment tools people talk about, such as pyenv, pyvenv, pipenv, pyenv-virtualenv etc. Some of them are already deprecated in later Python versions, so for a start, you shall concentrate on the built-in venv module, and then explore virtualenv and virtualenvwrapper for more advanced features.


Pandas Tricks – Combine Data in Different Ways

Introduction

If you have used pandas for your data analysis work, you may already have some idea of how powerful and flexible it is in terms of data processing. Many times there is more than one way to solve a problem, and choosing the best approach becomes another tough decision. For instance, in one of my previous articles, I tried to summarize 20 ways to filter records in pandas, which is definitely not a complete list of all the possible solutions. In this article, I will discuss the different ways to merge/combine data in pandas and when you shall use them, since combining data is usually a necessary step to perform before starting your data analysis.

Prerequisites

If you have not yet installed pandas, you may use the below command to install it from PyPI:

pip install pandas

And import the module at the beginning of your code:

import pandas as pd

Let’s dive into the code examples.

Combine Data with Append vs Concat

Imagine you have the below two data frames from different sources, and you would like to merge them into one data frame.

df1 = pd.DataFrame({"ID" : [1, 2, 3, 4, 5],
                    "Name" : ["Aaron", "Jimmy", "Zoe", "Jill", "Jenny"]})
df2 = pd.DataFrame({"ID": [6], "Name" : ["Kelly"]})

The most straightforward way would be using the append method from the pandas DataFrame object:

df1.append(df2, ignore_index=True)

The append method adds rows to the end of the current data frame, and with the ignore_index parameter set to True, the resulting axis is relabeled starting from 0. (Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so for newer versions you shall use the concat method described next.)

You would see the output as per below:

   ID   Name
0   1  Aaron
1   2  Jimmy
2   3    Zoe
3   4   Jill
4   5  Jenny
5   6  Kelly

Alternatively, you can use the pandas concat method, which is self-explanatory from its name. It provides a few more parameters to manipulate the resulting data frame, such as the axis along which the concatenation is done, as well as the join logic for either a union or intersection operation.

You can use the below to generate the same output as previously:

pd.concat([df1, df2], ignore_index=True)

And if you would like to retain a reference to the sources in your result, you can use the keys parameter as per below:

pd.concat([df1, df2], keys=["src_1", "src_2"])

This would return a multi-index data frame where you can easily refer back to the data by source (e.g. df.loc[“src_1”]).

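For example, to pull back only the rows that came from the first source:

combined = pd.concat([df1, df2], keys=["src_1", "src_2"])
print(combined.loc["src_1"])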

Adding a new data frame as columns can also be done with axis=1, for instance:

df3 = pd.DataFrame({"Age" : [12, 13, 13, 12, 13]})
pd.concat([df1, df3], axis=1)

The data frame has been added as a new column to the caller:

   ID   Name  Age
0   1  Aaron   12
1   2  Jimmy   13
2   3    Zoe   13
3   4   Jill   12
4   5  Jenny   13

As the concat method accepts a list of data frames, you can combine multiple data frames at one time, which is much faster than using append to add them one by one.
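As a sketch, assuming you have accumulated a list of data frames with the same columns (df_list below is hypothetical), concatenating them in a single call is preferable to appending in a loop:

df_list = [df1, df2]  # hypothetical list of data frames to combine
result = pd.concat(df_list, ignore_index=True)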

Merge Data with Join vs Merge

Besides appending rows or columns along an axis, sometimes you may need more sophisticated operations similar to the left/right joins in a relational database. For such scenarios, you shall make use of the pandas merge or join method.

For the previous example to append df2 to df1, you can achieve it with merge as well:

df1.merge(df2, how="outer")

Output as follows:

   ID   Name
0   1  Aaron
1   2  Jimmy
2   3    Zoe
3   4   Jill
4   5  Jenny
5   6  Kelly

It would be more tedious to achieve the same via join, since join can only join data frames based on the index, so you will have to set the index to the columns you would like to use as keys. Below is how you can do it via join:

df1.join(df2.set_index(["ID", "Name"]), 
        on=["ID", "Name"], how="outer").reset_index(drop=True)

Assuming you have the below students' scores for each subject, and you want to merge the student information (df1) with it based on the "Name" column:

df4 = pd.DataFrame({"ID" : [1001, 1002, 1003, 1002, 1001],
                    "Subject": ["Science", "Math", "English", "Math", "Science"], 
                    "Name": ["Aaron", "Jimmy", "Jimmy", "Zoe", "Jenny"], 
                    "Score" : ["A", "B", "C", "B", "B"]})

With merge function, you can specify the joining logic as left join on “Name” column as per below:

df1.merge(df4, on="Name", how="left")

Pandas will automatically add suffixes ("_x" and "_y" by default) whenever there are columns with duplicate names (e.g. "ID" in df1 and df4) in the two data frames. Below is the output you may see:

   ID_x   Name    ID_y  Subject Score
0     1  Aaron  1001.0  Science     A
1     2  Jimmy  1002.0     Math     B
2     2  Jimmy  1003.0  English     C
3     3    Zoe  1002.0     Math     B
4     4   Jill     NaN      NaN   NaN
5     5  Jenny  1001.0  Science     B

To generate the same output via join, you can use the below code, where you need to pre-set the index for df4 and specify the suffixes for the left and right data frames:

df1.join(df4.set_index("Name"), on="Name", lsuffix="_x", rsuffix="_y")

Of course, if you would like to perform the right join for the above two data frames, you can do as per below:

df1.merge(df4, on="Name", how="right")
# or
df1.join(df4.set_index("Name"), on="Name", how="right", lsuffix="_x", rsuffix="_y")

Output as per below:

   ID_x   Name  ID_y  Subject Score
0     1  Aaron  1001  Science     A
1     2  Jimmy  1002     Math     B
2     2  Jimmy  1003  English     C
3     3    Zoe  1002     Math     B
4     5  Jenny  1001  Science     B

Merge DataFrame with Duplicate Keys

When merging multiple DataFrame objects, you may occasionally encounter the scenario where there are duplicate values in the columns you want to use as keys for joining. For instance, you may have the below records if one subject has more than one lecturer:

df5 = pd.DataFrame({"Subject": ["Science", "Science", "Math", "Math", "English"], 
                    "Lecturer": ["Michael", "John", "Tim", "Robert", "Alex"]})

When you merge this information with the student scores based on the subject, using either the merge or join method:

df4.merge(df5, on="Subject", how="left")
#or 
df4.join(df5.set_index("Subject"), on="Subject", how="left")

You would see an output with M x N records due to the duplicate keys in df5, e.g. each Science row from df4 matched with both Michael and John.

If your objective is to perform something similar to Excel VLOOKUP, returning the first matched value, then you can use the drop_duplicates method to remove the duplicate records before joining. E.g.:

df4.merge(df5.drop_duplicates("Subject"), on="Subject", how="left")

This would combine the two data frames using only the first matched record from df5 (e.g. Science always maps to Michael).

And in case you do not want to lose the information from the lecturer data frame, you will need to perform some sort of data aggregation before joining, e.g.:

df4.merge(df5.groupby("Subject").agg({"Lecturer" : lambda x: ','.join(x)}),
          on="Subject", how="left")

With this aggregation on the lecturer values, each subject in the output carries the comma-separated list of its lecturers (e.g. "Michael,John" for Science).

Based on the above examples, you may find that merge and join are interchangeable in most cases, though you may have to type a bit more when using the join method due to its different default arguments. Since join always works on the index, you will have to preset the index on the key columns before joining.

Conclusion

In this article, we have reviewed a few methods pandas offers for combining data frames, with some sample code. To wrap up, append and concat are usually used for merging two or more data frames based on the row or column index, and concat has better performance than append when you have multiple data frames to work on. If you need high-performance in-memory join operations like the SQL joins of a relational database, you will need the merge or join method, which are interchangeable in most scenarios. In addition, if the data frame you work on does not have an index on the joining row/column, using merge over join will probably save you some typing.


How to group consecutive rows of same values in pandas

Problem Statement

You have a data set which you would like to group by consecutive rows where one of the columns has the same value. If a different value appears in between the rows, the records shall be split into separate groups.

To better elaborate the issue, let’s use an example.

Assuming you have the connection log data for some devices, such as Bluetooth devices: an event is triggered when the connection is established as well as when it is disconnected from the paired device, and in between, additional events may be triggered for connectivity tests. Let's load the data and visualize it:

import pandas as pd 

df = pd.read_excel("connection log.xlsx")

df.head(10)

You can see the below output from Jupyter Lab:

[Screenshot: the first 10 rows of the connection log, with Device ID and Event Time columns]

If you would like to check the duration of each connection per device, you probably want to group the records whose events were triggered during the same connection. To determine whether records belong to the same connection, you shall sort the events by date in ascending order; if the Device ID differs between consecutive rows, they must be events from different connections. So can this be done in pandas?

Solution to group the consecutive rows

Let's do some sorting on our data first to make sure the records are in chronological order based on the event time:

df.sort_values(["Event Time", "Device ID"], ascending=[True, True], inplace=True)

To compare the value of the current row with the subsequent row for a particular column, we can use the Series shift method. For instance, we can shift the "Device ID" values down one row and store the result in a new column named "Device ID X":

df["Device ID X"] = df["Device ID"].shift()

After the shifting, you shall see the updated data frame as per below:

[Screenshot: the data frame with the shifted values in the new "Device ID X" column]

If you try to compare the values of both columns:

df["Device ID"] != df["Device ID X"]

You can see the True/False return values in a data series form:

[Screenshot: the True/False comparison result as a data series]

Now comes the most critical step. Since there is exactly one True value each time the device ID switches to a new ID, we can use the pandas cumsum method to sum up these True values accumulatively as per below:

df["cumsum"] = (df["Device ID"] != df["Device ID X"]).cumsum()

When doing the accumulative summary, the True values are counted as 1 and the False values as 0. So you would see the below output:

[Screenshot: the data frame with the accumulated values in the "cumsum" column]

You can see that the same value is calculated for the rows we would like to group together, and you can make use of this value to re-group the records for further analysis.

You can even combine the above steps into a one-liner to get the earliest and latest event time for each group, as per below:

df.groupby((df["Device ID"] != df["Device ID"].shift()).cumsum()).agg({"Event Time" : ["min", "max"]})

Output as per below:

[Screenshot: the min and max Event Time aggregated per group]
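Building on the same grouping key, the sketch below also derives the duration of each connection (assuming the "Event Time" column was parsed as datetime when loading the Excel file):

groups = (df["Device ID"] != df["Device ID"].shift()).cumsum()
summary = df.groupby(groups).agg(
    device=("Device ID", "first"),
    start=("Event Time", "min"),
    end=("Event Time", "max"),
)
# duration of each connection session
summary["duration"] = summary["end"] - summary["start"]
print(summary)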

You may also be interested in some other similar topics from here.


8 Tips for Using Python Logging

Python logging is a built-in module typically used for capturing runtime events for diagnostic purposes. If you are using Python to build a tool for yourself or just a proof of concept, you may not necessarily need any logging. But if the code is to be used in a production environment, it is a better idea to add proper logging information to make your life easier later when you need to troubleshoot production issues.

In this article, I will be sharing with you some tips for using Python logging module in your production code.

Basic Python Logging With basicConfig

When you have a very simple script and need a basic logging mechanism to capture some runtime information, Python's basicConfig is the best fit, where everything comes with a default value.

Below is the code to create a root logger to write log to the console:

import logging
logging.basicConfig()
logging.warning('Authentication failure: %s', 'username or password incorrect.')

When running the above code, you shall see the below printed out in your console:

WARNING:root:Authentication failure: username or password incorrect.

It indicates the severity of the log record, the logger hierarchy, as well as the actual log message. Usually you may also want to capture the timestamp of the log event. To specify which details (refer to the available attributes here) you want to add to your log record, you can provide your own logging format when initializing the root logger.

For instance the below code:

FORMAT = '[{asctime}] [{name}] [{levelname}] - {message}'
logging.basicConfig(format=FORMAT, style="{")

logging.warning('Authentication failure: %s', 'username or password incorrect.')

With the above custom formatting, you shall see the output as per below:

[2021-03-25 21:37:09,158] [root] [WARNING] - Authentication failure: username or password incorrect.

To capture the log in a file rather than printing it to the console (sys.stderr), you can specify the filename parameter for the basicConfig method, and the logs will then be written to the file you've specified:

logging.basicConfig(filename="a.log", format=FORMAT, style="{")

When the filename parameter is not empty, Python logging creates a FileHandler object to handle the I/O related operations.

Create Logger at Module Level

The Python logging module has a logger hierarchy that allows ancestors and descendants to share configurations and pass messages around. In the above examples, the root logger was initialized when calling basicConfig.

A good practice is to create a module-level logger so that you have the flexibility to handle the log messages differently for each module and can track which module generated the logs.

Below is an example to create a module-level logger with getLogger method:

logger = logging.getLogger('module.x')
logger.setLevel(logging.DEBUG)

You can also create your own formatter and specify the handler you need for your module-level logger. Below is an example of creating a StreamHandler with your own logging format:

FORMAT = '[{asctime}] [{name}] [{levelname}] - {message}'
formatter = logging.Formatter(FORMAT, style="{")

ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
logger.addHandler(ch)

To verify that your logger is working as expected, you can log some messages at different levels as per below:

def log_func(msg):
    logger.debug("Debugging now: %s", msg)
    logger.info("For your info only : %s", msg)
    logger.warning("This is a warning : %s", msg)
    logger.error("Error is happening : %s", msg)
    logger.critical("App is going to crash : %s", msg)

log_func("yolo")

You may see the output similar to below:

[2021-03-25 15:17:12,638] [module.x] [INFO] - For your info only : yolo
[2021-03-25 15:17:12,639] [module.x] [WARNING] - This is a warning : yolo
[2021-03-25 15:17:12,642] [module.x] [ERROR] - Error is happening : yolo
[2021-03-25 15:17:12,642] [module.x] [CRITICAL] - App is going to crash : yolo

Similar to the Python package hierarchy, by specifying the logger name as "module.x.y", a child logger is created under "module.x".
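A quick sketch of this propagation: the child logger below has no handlers of its own, so its records bubble up to the handlers attached to "module.x":

child = logging.getLogger('module.x.y')
child.warning("handled by the StreamHandler attached to module.x")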

Pass Extra Information to LogRecord

When calling the logger to log an event, a LogRecord instance is created automatically and passed to the log handlers. If you need some additional information to be added into the LogRecord, you can make use of the extra parameter together with the formatter.

d = {'clientip': '192.168.0.1'}
FORMAT = '[{clientip}] [{asctime}] [{name}] [{levelname}] - {message}'

logging.basicConfig(format=FORMAT, style="{")
logging.warning('Protocol problem: %s', 'connection reset', extra=d)

You shall see the output as per below:

[192.168.0.1] [2021-03-25 15:36:31,449] [root] [WARNING] - Protocol problem: connection reset

Create Multiple Log Files with RotatingFileHandler

There are cases where you need to switch to a new log file when the file size reaches a certain limit, to avoid a particular log file growing too big. The RotatingFileHandler from the logging.handlers module provides an easy way to limit the max size of the file and the number of backup files to be created.

Below is an example to set file size at 2KB and create max 5 backup files for logging:

from logging.handlers import RotatingFileHandler

logger = logging.getLogger("simple app")
logger.setLevel(logging.DEBUG)
fh = RotatingFileHandler(filename="app.log", maxBytes=2000, backupCount=5)
fh.setLevel(logging.DEBUG)
formatter = logging.Formatter('[{asctime}] - [{name}] - [{levelname}] - {message}', style="{")
fh.setFormatter(formatter)

logger.addHandler(fh)

When the app.log file grows to 2KB, new logs will be automatically directed to a new file, and the old file will be renamed to app.log.X (until X reaches 5). Note that if either maxBytes or backupCount is left at 0, rollover never occurs and the log file will keep growing.

Similarly, you can use TimedRotatingFileHandler to rotate the log files at certain time intervals, such as every X hours/days etc.
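For instance, a minimal sketch (reusing the logger and formatter above) that starts a new log file at midnight and keeps the last 7 days as backups:

from logging.handlers import TimedRotatingFileHandler

th = TimedRotatingFileHandler(filename="app.log", when="midnight", interval=1, backupCount=7)
th.setFormatter(formatter)
logger.addHandler(th)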

Send Log to Email via SMTPHandler

If you would like to send log events via email, e.g. to notify the system administrator or support team of critical runtime errors, you can achieve it with SMTPHandler. Below is an example using Gmail to send out emails (you will need to replace the email addresses and password with your own Gmail account details):

from logging.handlers import SMTPHandler

smtph = SMTPHandler(mailhost=("smtp.gmail.com", 587),
                    fromaddr="[email protected]",
                    toaddrs="[email protected]",
                    subject="Test Error from Python Script",
                    credentials=("[email protected]", "password"),
                    secure=())
smtph.setLevel(logging.ERROR)
logger.addHandler(smtph)

logger.critical("App is going to crash")

Note that I have passed an empty tuple for the secure parameter in order for SMTPHandler to use TLS encryption for the SMTP connection.

(You will also need to enable the less secure apps for your google account in order to go through the authentication)

Use Queue to Process Log Asynchronously with QueueHandler

The previous example demonstrated how to use the SMTPHandler to send out email. If you tried the code yourself, you may have noticed the sluggish behavior, especially at the last line of the code. This is because sending email blocks the main thread until it finishes, which may not be something you expect when you need an immediate response, e.g. in a web application.

To make sure the performance-critical threads are not blocked by any of these slow operations, you shall consider using a queue to process them in a separate thread. The QueueHandler and QueueListener are designed for this purpose. Below is an example showing how to use the queue handler and listener:

import queue
from logging.handlers import SMTPHandler, QueueHandler, QueueListener

log_queue = queue.Queue(-1)
queue_handler = QueueHandler(log_queue)

#smtph is the SMTPHandler created from previous example
listener = QueueListener(log_queue, smtph, respect_handler_level=True)
logger.addHandler(queue_handler)

listener.start()

#a log function created in previous example
log_func("Bingoooo")

listener.stop()

Take note that respect_handler_level has been set to True in order for the listener to filter the messages based on the severity level set on each handler. E.g. the smtph handler has been set to handle messages with severity level ERROR and above; when respect_handler_level is True, only messages at that level or above will be passed to it. Otherwise, if respect_handler_level is False, all the messages pulled from the queue are passed to every handler regardless of severity level.

Use dictConfig for Log Configuration

Setting up logger configurations line by line in code can be tedious, especially when you have more than one handler or formatter to set up. The logging module also provides a way to load configurations from config files into a dictionary object, and then load the configurations by calling the dictConfig method.

Below is an example for using dictionary to create a logger with StreamHandler and SMTPHandler:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'standard': {
            'format': '%(levelname)s %(asctime)s %(module)s %(message)s'
        }
    },
    'handlers': {
        'console':{
            'level':'DEBUG',
            'class':'logging.StreamHandler',
            'formatter': 'standard'
        },
        'email': {
            'level': 'ERROR',
            'formatter': 'standard',
            'class': 'logging.handlers.SMTPHandler',
            'mailhost': ("smtp.gmail.com", 587),
            'fromaddr': "[email protected]", 
            'toaddrs': "[email protected]", 
            'subject': "Test Error from Python Script",
            'credentials': ("[email protected]", "password"),
            'secure': ()            
        }
    },
    'loggers': {
        'app_log': {
            'handlers': ['console', 'email'],
            'level': 'INFO',
        }
    }
}

import logging.config

logging.config.dictConfig(LOGGING)
log = logging.getLogger("app_log")

In your project, you may put your configurations into a file in INI/JSON/YAML format, and load it as a dictionary to pass into the dictConfig method, so that you do not need to modify any source code whenever you want to change a logging configuration.
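For instance, a minimal sketch assuming the above dictionary is stored in a JSON file named logging.json (a hypothetical file name):

import json
import logging.config

# load the logging configuration dictionary from the JSON file
with open("logging.json") as f:
    logging.config.dictConfig(json.load(f))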

Suppress Errors in Production Environment

Last but not least, you shall consider suppressing any exceptions raised during logging rather than letting them crash your entire application in a production environment. For instance, if your logs are written to a file and the disk is full, or someone accidentally changed the file permission to read-only, you would not want such exceptions to bring down the entire application.

To prevent such issues, you can set raiseExceptions to False, so that any logging exceptions will not impact your application's functionality.

logging.raiseExceptions = False

With the above set to False, you shall not see any error even when there is a genuine TypeError like below:

logging.info("For your info only : %s %s", msg)
# No error message will be showing and nothing will be logged

But make sure to set it to True (the default value) in your test environment, so that you can identify these types of errors before deploying your code into production. (Refer to another topic on suppressing stdout and stderr in Python.)

Conclusion

In this article, we have summarized 8 tips for using the Python logging module, with some sample code which hopefully gives you ideas on how to use this module when developing code for production use.

A lot of details are not covered here, such as the full list of attributes for Formatter, LogRecord, Handler, Filter etc.; you may need to go through the official documentation if certain details you need are not discussed in this article.