ken

python read email from outlook and save attachment

How to read email from outlook in python

There are always scenarios that you may wonder how to have a program to automatically read email from outlook and do some processing based on certain criteria. The most common use case is that you want to auto process email attachments when receiving some scheduled reports. In this article, I will be explaining to you how to use python to read email from outlook and save attachment files into the specified folder.

Prerequisites:

In order to be able to access the outlook native application, we will need to make use of the pywin32 library. Make sure you have installed this library and imported into your script.

import win32com.client
#other libraries to be used in this script
import os
from datetime import datetime, timedelta

Let’s get started!

Like communicating with other system or app, you will need to initiate a session in the first place. By calling the GetNamespace function, you can get the outlook session for the subsequent operations.

outlook = win32com.client.Dispatch('outlook.application')
mapi = outlook.GetNamespace("MAPI")

if you have configured multiple accounts in your outlook, you need to pass in the account name when accessing it’s folders, we can cover this topic in another article. For this article, let assume we only have 1 account configured in outlook.

for account in mapi.Accounts:
	print(account.DeliveryStore.DisplayName)

To access the inbox folder, you will need to pass in the folder type – 6 in the below function. You may refer to this doc to understand the full list of folder types, such as the draft, outbox, sent, deleted items folder etc.

inbox = mapi.GetDefaultFolder(6)

What if your email is in a sub folder under your inbox? The GetDefaultFolder has the Folders attribute where you can access to the sub folder by it’s name. For instance, to access the “your_sub_folder” under the inbox folder:

inbox = mapi.GetDefaultFolder(6).Folders["your_sub_folder"]

Read email from outlook

Now you are accessible to the inbox and it’s sub folder. You can view all the messages by getting the items as per below. But you may want filter the messages by certain criteria, such as the receiving date, from, subject etc. To do that, we can apply some filter conditions to the messages.

messages = inbox.Items

Use Restrict function to filter your email message. For instance, we can filter by receiving time in past 24 hours, and email sender as “[email protected]” with subject as “Sample Report”

received_dt = datetime.now() - timedelta(days=1)
received_dt = received_dt.strftime('%m/%d/%Y %H:%M %p')
messages = messages.Restrict("[ReceivedTime] >= '" + received_dt + "'")
messages = messages.Restrict("[SenderEmailAddress] = '[email protected]'")
messages = messages.Restrict("[Subject] = 'Sample Report'")

Save attachment files

With all the above filters, we shall only have the messages that we are interested in. Let’s loop through the message and check for the details.

#Let's assume we want to save the email attachment to the below directory
outputDir = r"C:\attachment"
try:
    for message in list(messages):
	try:
	    s = message.sender
	    for attachment in message.Attachments:
	        attachment.SaveASFile(os.path.join(outputDir, attachment.FileName))
	        print(f"attachment {attachment.FileName} from {s} saved")
	except Exception as e:
		print("error when saving the attachment:" + str(e))
except Exception as e:
		print("error when processing emails messages:" + str(e))

There are other attributes like Body, Size, Subject, LastModificationTime etc., please check this Microsoft documentation for more details.

If the particular problem you are trying to solve is not covered in this article, you may check my another post 5 Tips For Reading Email From Outlook In Python. And you may be also interested to see how to send email from outlook in python, please check this article.

As per always, welcome any comments or questions.

Fix the CompDocError when reading excel file with xlrd

CompDocError

You may have seen this CompDocError before if you used python xlrd library to read the older version of the excel file (.xls). When directly opening the same file from Microsoft Excel, it is able to show the data properly without any issue.

This usually happens if the excel file is generated from 3rd party application, the program did not follow strictly on the Microsoft Excel standard format, although the file is readable by Excel but it fails when opening it with xlrd library due to the non-standard format or missing some meta data. As you may have no control on how the 3rd party application generate the file, you will need to find a way to handle this CompDocError in your code.

 

SOLUTIONS FOR COMPDOCERROR

 

Option 1:

If you look at the error message, the error raised from  the line 427 in the compdoc.py in your xlrd package. Since you confirm there is no problem with the data in your excel file except the minor format issue, you can open the compdoc.py and comment out the lines for raising CompDocError exception.

while s >= 0:
    if self.seen[s]:
        pass
        #print("_locate_stream(%s): seen" % qname, file=self.logfile); dump_list(self.seen, 20, self.logfile)
        #raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s]))

Option 2:

You may notice if you open your file in Microsoft Excel and save it, you will be able to use xlrd to read and no exception will be raised. This is because Excel already fixed the issues for you when saving the file. You can use the same approach in your code to fix this problem.

To do that, you can use the pywin32 library to open the native Excel application and re-save the file.

 

import win32com.client as win32

excel_app = win32.Dispatch('Excel.Application')
wb = excel_app.Workbooks.open("test.xls")
excel_app.DisplayAlerts = False #do not show any alert when closing the excel
wb.Save()
excel_app.quit()

 

Conclusion

 

For option 1, it is good if your program only reads the files generated from the same source. If your program needs to read different excel files from different sources, it may not be a good to always assume the “CompDocError” can be ignored.

 

For option 2, when calling the excel_app.quit(), the entire Excel application will be closed without any alert. If you have other excel files opening at the time, it will be all closed together. So this solution is good if your program will run in a standalone environment or you confirm no other process/people will be using excel when running your code.

 

If you would like to understand more about how to read & write excel file with xlrd, please check this article.

 

python read excel with xlrd

How to use xlrd library to read excel file

Python xlrd is a very useful library when you are dealing with some older version of the excel files (.xls). In this tutorial, I will share with you how to use this library to read data from .xls file.

Let’s get started with xlrd

You will need to install this library with below pip command:

pip install xlrd

and import the library to your source code:

import xlrd
import os

To open your excel file, you will need to pass in the full path of your file into the open_workbook function.It returns the workbook object, and in the next line you will be able to access the sheet in the opened workbook.

workbook = xlrd.open_workbook(r"c:\test.xls")

There are multiple ways for doing it, you can access by sheet name, sheet index, or loop through all the sheets

sheet = workbook.sheet_by_name("Sheet")
#getting the first sheet
sheet_1 = workbook.sheet_by_index(0)

for sh in workbook.sheets():
    print(sh.name)

To get the number of rows and columns in the sheet, you can access the following attributes. By default,
all the rows are padded out with empty cells to make them same size, but in case you want to ignore the
empty columns at the end, you may consider ragged_rows parameter when you call the open_workbook function.

row_count = sheet.nrows
col_count = sheet.ncols
# use sheet.row_len() to get the effective column length when you set ragged_rows = True

With number of rows and columns, you will be able to access the data of each of the cells

for cur_row in range(0, row_count):
    for cur_col in range(0, col_count):
        cell = sheet.cell(cur_row, cur_col)
        print(cell.value, cell.ctype)

Instead of accessing the data cell by cell, you can also access it by row or by column, e.g. assume your first row is the column header, you can get all the headers into a list as below:

header = sheet.row_values(0, start_colx=0, end_colx=None)
# row_slice returns the cell object(both data type and value) in case you also need to check the data type
#row_1 = sheet.row_slice(1, start_colx=0, end_colx=None)

Get the whole column values into a list:

col_a = sheet.col_values(0, start_rowx=0, end_rowx=None)
# col_slice returns the cell object of the specified range
col_a = sheet.col_slice(0, start_rowx=0, end_rowx=None)

There is a quite common error when handling the xls files, please check this article for fixing the CompDocError.

Conclusion

xlrd is a clean and easy to use library for handling xls files, but unfortunately there is no active maintenance for this library as Excel already evolved to xlsx format. There are other libraries such as openpyxl which can handle xlsx files very well for both data and cell formatting. I would suggest you to use xlsx file format in your new project whenever possible, so that more active libraries are supported.

If you would like to understand more about openpyxl , please read my next article about this library.

As per always, welcome to any comments or questions. Thanks.