pywin32

reading email from outlook with python pywin32

5 Useful Tips for Reading Email From Outlook In Python

Introduction

Pywin32 is one of the most popular packages for automating your daily work for Microsoft outlook/excel etc. In my previous post, we discussed about how to use this package to read emails and save attachments from outlook. As there were quite many questions raised in the comments which were not covered in the original post, this article is intended to review through some of the advanced topic for reading emails from outlook via Python Pywin32 package.

If you have not yet read through the previous post, you may check it out from here.

Prerequisites:

Assuming you have already installed the latest Pywin32 package and imported below necessary packages in your script, and you shall not encounter any error after executing the GetNamespace method to establish the outlook connection:

import win32com.client

#other libraries to be used in this script 
import os 
from datetime import datetime, timedelta

outlook = win32com.client.Dispatch('outlook.application') 
mapi = outlook.GetNamespace('MAPI')

When using below code to iterate the Accounts property, you shall see whichever accounts you have configured in your outlook:

for account in mapi.Accounts: 
    print(account.DeliveryStore.DisplayName)

#Assuming below accounts have been configured:
#[email protected]
#[email protected]

Now let’s move on to the topics we are going to discuss in this article.

Reading Email from Multiple Outlook Accounts

If you have multiple accounts configured in your outlook application, to access one of the accounts, you can use the Folders method and specify the account name or index of the account, e.g.:

for idx, folder in enumerate(mapi.Folders):
    #index starts from 1
    print(idx+1, folder)

#Assuming below output:
# 1  [email protected]
# 2  [email protected]

And to access the sub folders under a particular email account, you can continue to use the folders method to specify the sub folder name or index of the folder. Before that, you may want to check what are the available sub folders and it’s index value as per below:

for idx, folder in enumerate(mapi.Folders("[email protected]").Folders):
    print(idx+1, folder)
# or using index to access the folder
for idx, folder in enumerate(mapi.Folders(1).Folders): 
    print(idx+1, folder)

You shall see something similar to the below:

reading email from outlook with Python pywin32

With the above folder index and name, you shall be able to access the email messages as per below:

messages = mapi.Folders("[email protected]").Folders("Inbox").Items
# or
messages = mapi.Folders(1).Folders(2).Items
for msg in list(messages):
    print(msg.Subject)

Although the index would not get changed when you move up/down of your folders in outlook, obviously using folder name still is much better than index in terms of readability of the code.

Filter Email Based on Receiving Time Window

When reading emails from outlook inbox, you may want to zoom into the emails within a specific receiving time window rather than scanning through thousands of emails you have received in the inbox. To filter emails based on certain conditions, you can use restrict method together with the logical operators.

For instance, to filter the emails received from 1st day of the current month until today 12am:

today = datetime.today()

# first day of the month
start_time = today.replace(month=1, hour=0, minute=0, second=0).strftime('%Y-%m-%d %H:%M %p')

#today 12am
end_time = today.replace(hour=0, minute=0, second=0).strftime('%Y-%m-%d %H:%M %p')

messages = messages.Restrict("[ReceivedTime] >= '" + start_time
+ "' And [ReceivedTime] <= '" + end_time + "'")

With logical operators like AND, OR and NOT, you are able to combine multiple criteria together. For instance, to check the email with certain subject but not from a particular sender email:

messages = messages.Restrict("[Subject] = 'Sample Report'" 
                             + " And Not ([SenderEmailAddress] = '[email protected]')")

And you can also use the Restrict method as many times as you wish if it makes your code more readable than combining all conditions in one filter, e.g.:

messages = messages.Restrict("[Subject] = 'Sample Report'")
messages = messages.Restrict("Not ([SenderEmailAddress] = '[email protected]')")

Getting First N emails

When using Restrict method for filtering email messages, you would not be able to specify max number of emails you want to read. If you wish to get the first/last N emails based on the receiving time, you can use the Sort method to sort the messages based on certain email properties before you slice the list. Below is the sample code to get the latest 10 email messages based on the receiving time:

messages.Sort("[ReceivedTime]", Descending=True)

#read only the first 10 messages
for message in list(messages)[:10]:
    print(message.Subject, message.ReceivedTime, message.SenderEmailAddress)

Wildcard Matching for Filtering

With the Restrict method, you cannot do wildcard matching such as searching whether the email subject or body contains certain keywords. To be able to achieve that, you will need to use the DASL query.

For instance, with the below DASL query syntax, you can filter email subject which contains “Sample Report” keyword:

messages = messages.Restrict("@SQL=(urn:schemas:httpmail:subject LIKE '%Sample Report%')")

You may want to check here to see what are the fields supported in ADSL query and the correct namespace to be used.

Include/Exclude Multiple Email Domains

To filter the emails only from a particular domain, you can use the ADSL query similar to the previous example:

messages = messages.Restrict("@SQL=(urn:schemas:httpmail:SenderEmailAddress LIKE '%company.com')")

And to exclude the emails from a few domains, you can use multiple conditions with logical operators:

messages = messages.Restrict("@SQL=(Not(urn:schemas:httpmail:senderemail LIKE '%@abc%') \
And Not(urn:schemas:httpmail:senderemail LIKE '%@123%') \
And Not(urn:schemas:httpmail:senderemail LIKE '%@xyz%'))")

Conclusion

In this article, we have reviewed through some advanced usage of the Pywin32 package for filtering emails. You may not find many Python tutorials for this package from online directly, but you shall be able to see the equivalent VBA code from its official website for most of the code you have seen in this article. In the event that you cannot find a solution for you problem, you may check and see whether there is something implemented in VBA code that you can convert it into Python syntax.

Link for the previous post Reading Email From Outlook In Python.

How to close Windows process with python

When automating some tasks in Windows OS, you may wonder how to automatically close Windows process if you do not have the direct control of the running application or when the application is just running for too long time. In this article, I will be sharing with you how to close the Windows process with some python library, to be more specific, the pywin32 library.

Prerequisites

You will need to install the pywin32 library if you have not yet installed:

pip install pywin32

Find the process name from Windows Task Manager

You will need to first find out the application name which you intend to close, the application name can be found from the Windows task manager. E.g. If you expand the “Windows Command Processor” process, you can see the running process is “cmd.exe”.

python close Windows process

Let’s get started with the code!

Import the below modules that we will be using later:

from win32com.client import GetObject
from datetime import datetime

import os

And we need to get the WMI (Windows Management Instrumentation) service via the below code, where we can further access the window processes. For more information about WMI, please check this.

WMI = GetObject('winmgmts:')

Next, we will use the WMI SQL query to get the processes from the Win32_Process table by passing in the application name. Remember we have already found the application name earlier from the task manager.

 

for p in WMI.ExecQuery('select * from Win32_Process where Name="cmd.exe"'):
    #the date format is something like this 20200613144903.166769+480
    create_dt, *_ = p.CreationDate.split('.')
    diff = datetime.now() - datetime.strptime(create_dt,'%Y%m%d%H%M%S')

There are other properties such as Description, Status, Executable Path, etc. You can check the full list of the process properties from this win32-process documentation. Here we want to base on the creation date to calculate how much time the application has been running to determine if we want to kill it.

Assuming we need to close windows process after it is running for 5 minutes.

    if diff.seconds/60 > 5:		
        print("Terminating PID:", p.ProcessId)
	os.system("taskkill /pid "+str(p.ProcessId))

With this taskkill command, we will be able to terminate all the threads under this Windows process peacefully.

Conclusion

The pywin32 is super powerful python library especially when dealing with the Windows applications. You can use it to read & save attachments from outlook, send emails via outlookopen excel files and some more. Do have a check on these articles.

As per always, welcome any comments or questions.

python read email from outlook and save attachment

How to read email from outlook in python

There are always scenarios that you may wonder how to have a program to automatically read email from outlook and do some processing based on certain criteria. The most common use case is that you want to auto process email attachments when receiving some scheduled reports. In this article, I will be explaining to you how to use python to read email from outlook and save attachment files into the specified folder.

Prerequisites:

In order to be able to access the outlook native application, we will need to make use of the pywin32 library. Make sure you have installed this library and imported into your script.

import win32com.client
#other libraries to be used in this script
import os
from datetime import datetime, timedelta

Let’s get started!

Like communicating with other system or app, you will need to initiate a session in the first place. By calling the GetNamespace function, you can get the outlook session for the subsequent operations.

outlook = win32com.client.Dispatch('outlook.application')
mapi = outlook.GetNamespace("MAPI")

if you have configured multiple accounts in your outlook, you need to pass in the account name when accessing it’s folders, we can cover this topic in another article. For this article, let assume we only have 1 account configured in outlook.

for account in mapi.Accounts:
	print(account.DeliveryStore.DisplayName)

To access the inbox folder, you will need to pass in the folder type – 6 in the below function. You may refer to this doc to understand the full list of folder types, such as the draft, outbox, sent, deleted items folder etc.

inbox = mapi.GetDefaultFolder(6)

What if your email is in a sub folder under your inbox? The GetDefaultFolder has the Folders attribute where you can access to the sub folder by it’s name. For instance, to access the “your_sub_folder” under the inbox folder:

inbox = mapi.GetDefaultFolder(6).Folders["your_sub_folder"]

Read email from outlook

Now you are accessible to the inbox and it’s sub folder. You can view all the messages by getting the items as per below. But you may want filter the messages by certain criteria, such as the receiving date, from, subject etc. To do that, we can apply some filter conditions to the messages.

messages = inbox.Items

Use Restrict function to filter your email message. For instance, we can filter by receiving time in past 24 hours, and email sender as “[email protected]” with subject as “Sample Report”

received_dt = datetime.now() - timedelta(days=1)
received_dt = received_dt.strftime('%m/%d/%Y %H:%M %p')
messages = messages.Restrict("[ReceivedTime] >= '" + received_dt + "'")
messages = messages.Restrict("[SenderEmailAddress] = '[email protected]'")
messages = messages.Restrict("[Subject] = 'Sample Report'")

Save attachment files

With all the above filters, we shall only have the messages that we are interested in. Let’s loop through the message and check for the details.

#Let's assume we want to save the email attachment to the below directory
outputDir = r"C:\attachment"
try:
    for message in list(messages):
	try:
	    s = message.sender
	    for attachment in message.Attachments:
	        attachment.SaveASFile(os.path.join(outputDir, attachment.FileName))
	        print(f"attachment {attachment.FileName} from {s} saved")
	except Exception as e:
		print("error when saving the attachment:" + str(e))
except Exception as e:
		print("error when processing emails messages:" + str(e))

There are other attributes like Body, Size, Subject, LastModificationTime etc., please check this Microsoft documentation for more details.

If the particular problem you are trying to solve is not covered in this article, you may check my another post 5 Tips For Reading Email From Outlook In Python. And you may be also interested to see how to send email from outlook in python, please check this article.

As per always, welcome any comments or questions.

Fix the CompDocError when reading excel file with xlrd

CompDocError

You may have seen this CompDocError before if you used python xlrd library to read the older version of the excel file (.xls). When directly opening the same file from Microsoft Excel, it is able to show the data properly without any issue.

This usually happens if the excel file is generated from 3rd party application, the program did not follow strictly on the Microsoft Excel standard format, although the file is readable by Excel but it fails when opening it with xlrd library due to the non-standard format or missing some meta data. As you may have no control on how the 3rd party application generate the file, you will need to find a way to handle this CompDocError in your code.

 

SOLUTIONS FOR COMPDOCERROR

 

Option 1:

If you look at the error message, the error raised from  the line 427 in the compdoc.py in your xlrd package. Since you confirm there is no problem with the data in your excel file except the minor format issue, you can open the compdoc.py and comment out the lines for raising CompDocError exception.

while s >= 0:
    if self.seen[s]:
        pass
        #print("_locate_stream(%s): seen" % qname, file=self.logfile); dump_list(self.seen, 20, self.logfile)
        #raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s]))

Option 2:

You may notice if you open your file in Microsoft Excel and save it, you will be able to use xlrd to read and no exception will be raised. This is because Excel already fixed the issues for you when saving the file. You can use the same approach in your code to fix this problem.

To do that, you can use the pywin32 library to open the native Excel application and re-save the file.

 

import win32com.client as win32

excel_app = win32.Dispatch('Excel.Application')
wb = excel_app.Workbooks.open("test.xls")
excel_app.DisplayAlerts = False #do not show any alert when closing the excel
wb.Save()
excel_app.quit()

 

Conclusion

 

For option 1, it is good if your program only reads the files generated from the same source. If your program needs to read different excel files from different sources, it may not be a good to always assume the “CompDocError” can be ignored.

 

For option 2, when calling the excel_app.quit(), the entire Excel application will be closed without any alert. If you have other excel files opening at the time, it will be all closed together. So this solution is good if your program will run in a standalone environment or you confirm no other process/people will be using excel when running your code.

 

If you would like to understand more about how to read & write excel file with xlrd, please check this article.