Resources

Best Tips for Python, Data Science and Automation



8 Common Python Mistakes You Shall Avoid

Introduction Python is a very powerful programming language with easy-to-understand syntax, which allows you to learn it by yourself even if you do not come from a computer science background. Throughout the learning journey, you may still make lots of mistakes due to a lack of understanding of certain concepts. Learning how to fix these mistakes […]


15 Most Powerful Python One-liners You Can't Skip

Introduction One-liners in Python are short code snippets that achieve some powerful operations. They are popular and widely used in the Python community, as they make the code more concise and easier to understand. In this article, I will be sharing some of the most commonly used Python one-liners that would definitely speed up your coding without […]


Web Scraping From Scratch With 3 Simple Steps

Introduction Web scraping or crawling refers to the technique of extracting information from a website and transforming it into structured data for later analysis. There are generally a few reasons why you may need to implement a web scraping script to automate the data collection process: There isn’t any public API available for you to […]


Read and write Google Sheet with 5 lines of Python code

Introduction Google Sheet is a very powerful tool in terms of collaboration; it allows multiple users to work on the same rows of data simultaneously. It also provides fine-grained APIs in various programming languages for your application to connect to and interact with Google Sheet. Sometimes when you just need some simple operations like reading/writing data […]


Python recipes: suppress stdout and stderr messages

Introduction If you have worked on projects that require API calls to external parties or use 3rd party libraries, you may sometimes run into the problem that you get the correct return results, but they also come back with a lot of noise in stdout and stderr. For instance, […]


How to calculate date difference between rows in pandas

Problem: You have some data with date (or numeric) columns. You already know you can use the - operator directly to calculate the difference between columns, but now you would like to calculate the date difference between two consecutive rows. For instance, you have some sample data for GPS tracking, and it has the […]


Python cache – the must read tips for code performance

Introduction

Most of us have experienced scenarios where we need to implement computationally expensive logic, such as recursive functions, or read from I/O or the network multiple times. These functions typically require more resources and longer CPU time, and can eventually cause performance issues if handled without care. In such cases, you should pay special attention to them once you have completed all the functional requirements, as the additional cost in resources and time may eventually lead to user experience issues. In this article, I will be sharing how we can make use of caching (aka memoization) to improve code performance.

Prerequisites:

To follow the examples below, you will need to have the requests package installed in your working environment. You may use the pip command below to install it:

pip install requests

With this ready, let’s dive into the problem we are going to solve today.

As I mentioned before, computationally expensive logic such as recursive functions or reading from I/O or the network usually has a significant impact on runtime, and is a common target for optimization. Let me illustrate with a specific example: assume we need to call some external API to get the rates:

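import requests

def inquire_rate_online(dimension):
    # postman-echo.com echoes the query parameters back, so "dim" returns unchanged
    result = requests.get(f"https://postman-echo.com/get?dim={dimension}")
    if result.status_code == requests.codes.OK:
        data = result.json()
        return data["args"]["dim"]
    # return an empty string if the request did not succeed
    return ''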

This function makes a call over the network and returns the result (for demo purposes, this API just echoes the input back as the result). If you provide this as a service to everybody, there is a high chance that different people will inquire about the rate with the same dimension value. In that case, you may wish to store the result somewhere after the first inquiry, so that you can return the stored result for subsequent inquiries rather than making the API call again. This sort of caching mechanism should speed up your code.

Implement cache with global dictionary

For the above example, the most straightforward way to implement a cache is to store the arguments and results in a dictionary, and check this dictionary for the key every time before calling the external API. We can implement this logic in a separate function as per below:

cached_rate = {}
def cached_inquire(dim):
    if dim in cached_rate:
        print(f"cached value: {cached_rate[dim]}")
        return cached_rate[dim]
    cached_rate[dim]= inquire_rate_online(dim)
    print(f"result from online : {cached_rate[dim]}")
    return cached_rate[dim]

With this code, the previous keys and results are cached in the dictionary, so subsequent calls with the same argument are answered by a dictionary lookup rather than an external API call. This should dramatically improve your code performance, since reading from a dictionary is much faster than making an API call over the network.

You can quickly test it from Jupyter Notebook with the %time magic:

%time cached_inquire(1)

The first time you call it, you should see that it takes over 1 second (depending on the network conditions):

result from online : 1
Wall time: 1.22 s

When calling it again with the same argument, we should expect the cached result to kick in:

%time cached_inquire(1)

You can see the total time taken dropped to 997 microseconds for this call, which is over 1,200 times faster than before:

cached value: 1
Wall time: 997 µs
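If you are not working in Jupyter, a minimal sketch like the one below reproduces the same experiment in a plain script, using time.perf_counter instead of the %time magic. To keep it runnable offline, it assumes a hypothetical slow_inquire function that sleeps for 0.3 seconds in place of the real API call.

```python
import time

def slow_inquire(dim):
    """Hypothetical stand-in for inquire_rate_online: sleeps instead of calling the API."""
    time.sleep(0.3)
    return str(dim)

cached_rate = {}

def cached_inquire(dim):
    # Serve repeated arguments from the dictionary; only call slow_inquire on a miss
    if dim not in cached_rate:
        cached_rate[dim] = slow_inquire(dim)
    return cached_rate[dim]

def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

first, first_secs = timed(cached_inquire, 1)
second, second_secs = timed(cached_inquire, 1)
print(f"first call: {first_secs:.3f} s, second call: {second_secs * 1e6:.0f} µs")
```

The first call pays for the sleep, while the second one is only a dictionary lookup.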

With this additional global dictionary, we can see a big performance improvement. But some people may be concerned about the additional memory used to hold these values in a dictionary, especially if the result is a huge object such as an image file or an array. Python has a separate module called weakref that can help with this problem.

Implement cache with weakref

The weakref module allows you to create weak references to objects. A weak reference alone does not keep an object alive: when the only remaining references to an object are weak references, the garbage collector is free to destroy it and reuse its memory.

Because instances of built-in types such as str, int and float cannot be weakly referenced, let’s modify our earlier code to return a Rate class instance as the inquiry result:

class Rate():
    def __init__(self, dim, amount):
        self.dim = dim
        self.amount = amount
    def __str__(self):
        return f"{self.dim} , {self.amount}"

def inquire_rate_online(dimension):
    result = requests.get(f"https://postman-echo.com/get?dim={dimension}")
    if result.status_code == requests.codes.OK:
        data = result.json()
        return Rate(float(data["args"]["dim"]), float(data["args"]["dim"]))
    return Rate(0.0,0.0)

And instead of a normal Python dictionary, we will use a WeakValueDictionary to hold weak references to the returned objects. Below is the updated code:

import weakref

wkrf_cached_rate = weakref.WeakValueDictionary()
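def wkrf_cached_inquire(dim):
    # get() returns None if the entry was never stored or has already been reclaimed
    rate = wkrf_cached_rate.get(dim)
    if rate is not None:
        print(f"cached value: {rate}")
        return rate

    result = inquire_rate_online(dim)
    print(f"result from online : {result}")
    wkrf_cached_rate[dim] = result
    # return the strong reference we hold rather than looking up the weak entry again
    return result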

With the above changes, if you run wkrf_cached_inquire twice, you should see a significant performance improvement on the second call:

[Image: timing results of running wkrf_cached_inquire twice]

The dictionary does not hold the Rate instance itself, only a weak reference to it, so it does not keep the object in memory on its own. Once nothing else (for example, a variable holding the returned Rate, or Jupyter's output history) references the instance any more, it can be reclaimed, and the corresponding entry is removed from the dictionary automatically. Subsequent calls will then go to the external API again, just like the first time.
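The sketch below shows this behaviour without any network call. It reuses a minimal Rate class and drops the last strong reference by hand; in CPython the entry disappears immediately because objects are freed as soon as their reference count reaches zero, while other Python implementations may take longer. It also shows why a plain float result could not be cached this way.

```python
import weakref

class Rate:
    """Minimal stand-in for the Rate class above."""
    def __init__(self, dim, amount):
        self.dim = dim
        self.amount = amount

cache = weakref.WeakValueDictionary()

rate = Rate(1.0, 1.0)
cache[1] = rate      # the dictionary keeps only a weak reference
print(1 in cache)    # True: the variable `rate` still refers to the instance

del rate             # drop the last strong reference
print(1 in cache)    # False on CPython: the entry is removed right away

try:
    cache[2] = 1.0   # floats, like ints and strs, cannot be weakly referenced
except TypeError as exc:
    print(f"TypeError: {exc}")
```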

If you stop reading here, you will miss the most important part of this article: the approaches above work, but they are not perfect, due to the issues below:

  • In the example, the inquire_rate_online function has only 1 argument. Things get tedious if you have more arguments, since all of them have to be combined into the dictionary key. In that case, re-implementing the caching as a decorator would probably be easier
  • Sometimes you do not want the garbage collector to determine which values stay cached longer than others; rather, you want your cache to follow certain logic, for instance, keeping the most recently used entries and evicting the least recently used ones (LRU)
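For the first issue, a decorator can build the dictionary key from all the arguments. The sketch below is a minimal, non-thread-safe version; memoize and inquire_rate are hypothetical names, and inquire_rate stands in for an expensive multi-argument call. Keyword arguments are sorted so that their order does not create duplicate entries.

```python
import functools

def memoize(func):
    """Cache func's results in a dict keyed on all of its arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Sorting the keyword arguments means f(a=1, b=2) and f(b=2, a=1) share one entry
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache  # expose the dict for inspection or manual clearing
    return wrapper

calls = []

@memoize
def inquire_rate(dimension, currency="USD"):
    """Hypothetical expensive lookup with more than one argument."""
    calls.append((dimension, currency))
    return dimension * 1.5

inquire_rate(2, currency="SGD")
inquire_rate(2, currency="SGD")  # served from the cache, so calls is unchanged
```

Note that inquire_rate(2) and inquire_rate(2, currency="USD") still produce different keys, because a default value is not part of the call, and all arguments must be hashable to be used in the key.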

If a least recently used (LRU) cache makes sense for your use case, you should consider the lru_cache decorator from the functools module, which will save you a lot of effort reinventing the wheel.

Cache with lru_cache

The lru_cache decorator accepts two arguments:

  • maxsize limits the size of the cache (128 entries by default); when it is None, the cache can grow without bound
  • typed, when set to True, caches arguments of different types separately, e.g. inquire_rate_online(1) and inquire_rate_online(1.0) will be cached as different entries
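To see both arguments in action without the network call, the sketch below decorates a hypothetical square function with a small cache and records each real execution; cache_info() then reports the hits and misses.

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=2, typed=True)
def square(x):
    calls.append(x)  # only runs on a cache miss
    return x * x

square(1)    # miss
square(1)    # hit
square(1.0)  # miss: typed=True stores int 1 and float 1.0 separately
square(2)    # miss: the cache is full, so the least recently used entry (int 1) is evicted
square(1)    # miss again, because int 1 was evicted
print(square.cache_info())
# CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```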

With the understanding of the lru_cache, let’s decorate our inquire_rate_online function to have the cache capability:

from functools import lru_cache

@lru_cache(maxsize=None)
def inquire_rate_online(dimension):
    result = requests.get(f"https://postman-echo.com/get?dim={dimension}")
    if result.status_code == requests.codes.OK:
        data = result.json()
        return Rate(float(data["args"]["dim"]), float(data["args"]["dim"]))
    return Rate(0.0,0.0)

If we run inquire_rate_online twice, we can see the same performance improvement as before:

[Image: timing results of running the lru_cache-decorated inquire_rate_online twice]

With this decorator, you can also see how the cache is being used. The hits field shows the number of calls that have been answered from the cached results:

inquire_rate_online.cache_info()
#CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)

You can also clear the whole cache manually, which resets the hits and misses to 0:

inquire_rate_online.cache_clear()

Limitations:

Let’s also talk about the limitations of the solutions we discussed above:

  • The cache mechanism works best for deterministic functions, meaning that given the same inputs, they always return the same results. Caching a nondeterministic function such as the one below is usually a mistake, because every later call with the same argument would return the first cached value instead of a new random one:
import random

def random_x(x):
    return x*random.randint(1,1000)
  • For keyword arguments, calls that pass the same keywords in a different order (e.g. f(a=1, b=2) and f(b=2, a=1)) may be cached as separate entries
  • The arguments must be hashable, so immutable types such as int, str and tuple work, while passing a list or dict raises a TypeError
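The last limitation comes from lru_cache building a dictionary key out of the arguments, so every argument must be hashable. A minimal sketch with a hypothetical total function shows the error, and the usual workaround of converting a list to a tuple before the call:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def total(values):
    """Sum a sequence; the argument becomes part of the cache key."""
    return sum(values)

numbers = [1, 2, 3]

try:
    total(numbers)  # lists are unhashable, so lru_cache cannot build a key
except TypeError as exc:
    print(f"TypeError: {exc}")

print(total(tuple(numbers)))  # convert to a tuple first; prints 6
```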

Conclusion

In this article, we have discussed different ways of creating a cache to improve code performance whenever you have computationally expensive operations or heavy I/O or network reads. Although the lru_cache decorator provides an easy and clean solution for caching, it is still better to understand the underlying implementation before using it.

We also discussed the limitations of these solutions that you may need to take note of before implementing them. Nevertheless, these methods can still help you improve your code performance in a lot of scenarios.


Split or merge PDF files with 5 lines of Python code

There are many cases where you may want to extract a particular page from a big PDF file, or merge PDF files into one, for various reasons. You can use PDF editor tools to do this, but you may realize the split or merge functions are usually not available in the free versions, or the work is too tedious when there are many pages or files to be processed. In this article, I will be sharing a simple solution to split or merge multiple PDF files with a few lines of Python code.

Prerequisite

We will be using a Python library called PyPDF2, so you will need to install this package in your working environment. Below is an example with pip:

pip install PyPDF2

Let’s get started

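The PyPDF2 package has 4 major classes, PdfFileWriter, PdfFileReader, PdfFileMerger and PageObject, whose names are quite self-explanatory. If you need to do more than split or merge PDF pages, you may want to check the documentation to find out more about what you can do with this library. Note that the examples below use the older class and method names; PyPDF2 3.0 replaced them (e.g. PdfReader, PdfWriter, PdfMerger, add_page), so with a recent release you may need to adjust the names or pin an older version.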

Split PDF file

When you want to extract a particular page from a PDF file into a separate PDF file, you can use PdfFileReader to read the original file, and then get a particular page by its page number (page numbers start from 0). With PdfFileWriter, you can use the addPage method to add the page to a new PDF object and save it.

Below is sample code that extracts the first page of file1.pdf and saves it as a separate PDF file named first_page.pdf:

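from PyPDF2 import PdfFileWriter, PdfFileReader
input_pdf = PdfFileReader("file1.pdf")
output = PdfFileWriter()
output.addPage(input_pdf.getPage(0))  # page numbers start from 0
# open the output file in binary mode; older releases only accept a file object here
with open("first_page.pdf", "wb") as output_file:
    output.write(output_file)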

input_pdf.getPage(0) returns a PageObject, which allows you to modify some attributes of the PDF page, such as rotating or scaling it, so you may want to read more about it in the documentation.

Merge PDF files

To merge multiple PDF files into one file, you can use PdfFileMerger. Although you can also do it with PdfFileWriter, PdfFileMerger is probably more straightforward when you do not need to edit the pages before merging them.

Below is sample code that uses the append method of PdfFileMerger to append multiple PDF files and write them into one PDF file named merged.pdf:

from PyPDF2 import PdfFileReader, PdfFileMerger
pdf_file1 = PdfFileReader("file1.pdf")
pdf_file2 = PdfFileReader("file2.pdf")
output = PdfFileMerger()
output.append(pdf_file1)
output.append(pdf_file2)
output.write("merged.pdf")

If you do not want to include all pages from your original file, you can pass a (start, stop) tuple as the pages argument of append, so that only the specified pages are added to the new PDF file. The page numbers are zero-based, and as with range(), the stop page itself is not included.

The append method always adds the new pages at the end. If you want to specify where to insert your pages, use the merge method instead, which takes the position as its first argument.

Conclusion

The PyPDF2 package is a very handy toolkit for editing PDF files. In this article, we have reviewed how to use this library to split or merge PDF files with some sample code. You can modify the code to suit your needs and automate the task when you have many files or pages to process. There is also a pdfcat script included in the project folder, which allows you to split or merge PDF files from the command line. You may want to take a look at it if you only deal with one or two PDF files at a time.

In case you are interested in other topics related to Python automation, you may check here. Thanks for reading.

PyInstaller upx-dir and icon options

In the previous article, we discussed most of the commonly used options of the PyInstaller library. There are two more very useful options, but you may encounter some issues when using them for the first time. In this article, we will discuss the common issues with the PyInstaller --icon and --upx-dir options.

Customize icon for your exe file with --icon

PyInstaller has the --icon option to specify your own icon when creating the executable file. If this option is not given, the exe file will be generated with the default icon as per below.

[Image: default PyInstaller icon]

You can use --icon followed by an image file name to let PyInstaller use your own icon. You may see errors when you try to use a normal image format as the icon; in that case, convert your image file into .ico format and run the command again.

For demo purposes, I downloaded an icon from this website into my project folder to use for my app. With the command below, I should get a new look for my exe file.

pyinstaller --onefile hello.py --name "SuperHero" --add-data "test.config;." --icon "superhero.ico" --clean

Below is how it looks when the new exe file is generated:

[Image: the generated exe file with the custom icon]

Sometimes, you may also find that the icon does not change after you rebuild the executable file, even though the new icon is displayed when you check the “General” tab in the file properties. This is due to the Windows icon cache; you may try deleting the cache files from the directory below and retry.

User\AppData\Local\Microsoft\Windows\Explorer\IconCacheToDelete

Alternatively, if you specify a new name for your exe file, you should see the new icon applied.

 

Reduce file size with PyInstaller --upx-dir option

When you use a lot of libraries or resource files, your executable file can grow very big and become difficult to distribute. In this case, you can use UPX to compress your exe file.

You can download the UPX executable to your PC and pass the full path of its folder as the value of the --upx-dir option, e.g.:

pyinstaller --onefile hello.py --name "SuperHero" --add-data "test.config;." --icon "superhero.ico" --upx-dir "c:\upx-3.96-win64" --clean

Sometimes, even though there is no error when you build the executable file, you may get a runtime error such as the one below, stating that VCRUNTIME140.dll is either not designed to run on Windows or contains an error.

[Image: VCRUNTIME140.dll runtime error dialog]

This issue happens because the DLL files are modified when they are compressed with UPX during packing. The workaround is to use the --upx-exclude option to exclude the particular DLL files (you only need the file name, not the path):

pyinstaller --onefile hello.py --name "SuperHero" --add-data "test.config;." --icon "superhero.ico" --upx-dir "c:\upx-3.96-win64" --upx-exclude "VCRUNTIME140.dll" --clean

Conclusion

Besides the issues discussed above, you may occasionally encounter other errors. In that case, check both your Python and PyInstaller versions to see whether it is a compatibility issue. Also, not all Python libraries are supported by PyInstaller, so you will need to check this list to see whether you have used any libraries that PyInstaller does not support.


Python split text with multiple delimiters

There are cases where you want to split text that may use different symbols (delimiters) to separate its elements. For instance, if the given text is in csv or tsv format, each field can be separated by a comma (,) or a tab (\t), and you will need to write your code logic to support both delimiters. In this article, I will be sharing a few possible ways to split text with multiple delimiters in Python.

Checking if certain delimiter exists before splitting

If you are pretty sure the text will only contain one type of delimiter at a time, you can check whether that delimiter exists before splitting, e.g.

text = 'field1,field2,field3,field4'
#or 
text = 'field1;field2;field3;field4'

You can write a one-liner that splits by comma if a comma exists, and otherwise splits by semicolon:

text.split(",") if "," in text else text.split(";")

But if there are a lot of possible delimiters, or different delimiters can be mixed in the text, writing the above if-else logic becomes very tedious. You might have thought about using the replace function (see the full list of string functions in this article) to convert all the different delimiters into a single delimiter. It may work for your case, but it is far from an elegant solution.

So for such case, let’s move to the second option.

Using re to split text with multiple delimiters

The re (regular expression) module has a split function that splits text by a pattern. You can list all the possible delimiters separated by “|” to split the text by multiple delimiters at once.

For instance, the code below extracts field1 to field5 into a list.

import re

text1 = "field1\tfield2,field3;field4 field5"
fields = re.split(r",|;|\s|\t", text1)

The fields variable will be a list with all the data fields we want:

['field1', 'field2', 'field3', 'field4', 'field5']
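Two details are worth knowing about this pattern: \s already matches tabs (as well as spaces and newlines), so the separate \t alternative is redundant, and alternation matches one delimiter at a time, so consecutive delimiters produce empty strings. The sketch below uses a hypothetical text2 to show the difference a character class with + makes:

```python
import re

text2 = "field1,,field2; field3\tfield4"

# Alternation matches one delimiter at a time, so runs of delimiters leave empty strings
print(re.split(r",|;|\s", text2))
# ['field1', '', 'field2', '', 'field3', 'field4']

# A character class with "+" treats any run of delimiters as a single separator
print(re.split(r"[,;\s]+", text2))
# ['field1', 'field2', 'field3', 'field4']
```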

What if you also want to keep the delimiters in the list for later use (e.g. to rebuild the text)? You can use a capture group () in the regular expression, so that the matched delimiters also appear in the result.

fields = re.split(r'(,|;|\s|\t)', text1)

Result of fields variable:

['field1', '\t', 'field2', ',', 'field3', ';', 'field4', ' ', 'field5']
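Because the pattern has exactly one capture group, the result alternates between fields and delimiters, with the fields at the even positions. The sketch below relies on that to separate the two and to rebuild the text, here after upper-casing each field as an illustrative change:

```python
import re

text1 = "field1\tfield2,field3;field4 field5"
parts = re.split(r"(,|;|\s)", text1)

fields = parts[0::2]      # even positions: the data fields
delimiters = parts[1::2]  # odd positions: the captured delimiters

# Joining all parts unchanged restores the original text exactly
original = "".join(parts)

# Modify the fields and keep the original delimiters in between
new_parts = [p.upper() if i % 2 == 0 else p for i, p in enumerate(parts)]
rebuilt = "".join(new_parts)
print(rebuilt)
```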

Conclusion

It is quite common to write code that splits text by multiple delimiters. There are possibly other ways to solve this problem, but re.split is still one of the most straightforward.

How to close a Windows process with Python

When automating tasks in Windows, you may wonder how to automatically close a Windows process when you do not have direct control of the running application, or when the application has been running for too long. In this article, I will be sharing how to close a Windows process with a Python library, to be more specific, the pywin32 library.

Prerequisites

You will need to install the pywin32 library if you have not installed it yet:

pip install pywin32

Find the process name from Windows Task Manager

First, find out the name of the application you intend to close; it can be found in the Windows Task Manager. E.g. if you expand the “Windows Command Processor” process, you can see the running process is “cmd.exe”.

[Image: Windows Task Manager showing the process name]

Let’s get started with the code!

Import the below modules that we will be using later:

from win32com.client import GetObject
from datetime import datetime

import os

Next, we need to get the WMI (Windows Management Instrumentation) service with the code below, through which we can access the Windows processes. For more information about WMI, please check Microsoft's WMI documentation.

WMI = GetObject('winmgmts:')

Then we use a WMI query (written in WQL, a SQL-like query language) to get the processes from the Win32_Process class by passing in the application name we found earlier in the Task Manager.

 

for p in WMI.ExecQuery('select * from Win32_Process where Name="cmd.exe"'):
    #the date format is something like this 20200613144903.166769+480
    create_dt, *_ = p.CreationDate.split('.')
    diff = datetime.now() - datetime.strptime(create_dt,'%Y%m%d%H%M%S')

There are other properties such as Description, Status, ExecutablePath, etc. You can check the full list of process properties in the Win32_Process documentation. Here we want to use the creation date to calculate how long the application has been running, to determine whether we want to kill it.

Assume we need to close the Windows process after it has been running for 5 minutes:

    # total_seconds() includes whole days, unlike diff.seconds
    if diff.total_seconds() / 60 > 5:
        print("Terminating PID:", p.ProcessId)
        os.system("taskkill /pid " + str(p.ProcessId))

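The taskkill command then terminates the process. Without the /F flag, taskkill requests a graceful close, which some processes (console programs in particular) may refuse; add /F if you need to force the process to end.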
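Note that timedelta.seconds only holds the seconds component of a difference and ignores whole days, so total_seconds() is the safer choice for an age check. The sketch below wraps the check in a hypothetical is_older_than helper so it can be tried without WMI; like the loop above, it ignores the UTC offset suffix of CreationDate and compares against local time.

```python
from datetime import datetime

def is_older_than(creation_date, minutes, now=None):
    """Return True if a WMI CreationDate string is more than `minutes` old.

    Assumes the format shown above, e.g. '20200613144903.166769+480'.
    The UTC offset suffix is ignored and local time is used.
    """
    create_dt, *_ = creation_date.split('.')
    created = datetime.strptime(create_dt, '%Y%m%d%H%M%S')
    now = now or datetime.now()
    # total_seconds() counts whole days too; timedelta.seconds would drop them
    return (now - created).total_seconds() / 60 > minutes

stamp = '20200613144903.166769+480'
print(is_older_than(stamp, 5, now=datetime(2020, 6, 13, 14, 52, 0)))  # False: under 3 minutes
print(is_older_than(stamp, 5, now=datetime(2020, 6, 14, 14, 50, 0)))  # True: just over a day
```

With diff.seconds, the second check would report False, because the seconds component of a difference of 1 day and 57 seconds is only 57.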

Conclusion

pywin32 is a super powerful Python library, especially when dealing with Windows applications. You can use it to read and save attachments from Outlook, send emails via Outlook, open Excel files and more. Do have a look at these articles.

As always, comments and questions are welcome.


How to auto switch browser tabs

Imagine you have a big monitor and you would like to display content from multiple web links. Wouldn't it be nice if there were a way to automatically switch between multiple browser tabs at a fixed interval? In this article, I will be sharing how to auto switch browser tabs with Selenium, an automated testing tool.

The Python Selenium library has very detailed documentation, which you may want to use as a starting point. For this article, I will just walk through the complete code for this automation, so that you can use it as a reference if you are trying to implement something similar.

Let’s get started!

To launch the browser automatically, we first need to download the web driver for the browser. For instance, if you are using Chrome, you may download the driver file from https://chromedriver.chromium.org/downloads. Do check your browser version to make sure you download the driver for the correct version.

As the prerequisite, you will also need to run the below command to install the selenium package in your working environment.

pip install selenium

Launch the browser

Then import all the necessary modules into your script. For this article, we will need to use the below modules:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import SessionNotCreatedException

import time
import os, sys

Let’s assume we want to display the 3 links below in the browser and switch between them automatically:

url_1 = "https://www.google.com/maps/@1.3085909,103.8403575,14z"
url_2 = "https://weather.com/en-SG/weather/today"
url_3 = "https://edition.cnn.com/"

Assume you have already downloaded the ChromeDriver file and put it into the current script folder. Then let’s initialize the web driver to launch the browser:

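from selenium.webdriver.chrome.service import Service

options = Options()
options.add_experimental_option('useAutomationExtension', False)

try:
    # Selenium 4 passes the driver path through a Service object;
    # the older executable_path argument has been removed in recent releases
    service = Service(os.path.join(os.getcwd(), "chromedriver.exe"))
    driver = webdriver.Chrome(service=service, options=options)
except SessionNotCreatedException as e:
    print(e)
    print("please upgrade the chromedriver.exe from https://chromedriver.chromium.org/downloads")
    sys.exit(1)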

You may wonder why we need the options parameter here. It is actually optional, but without setting useAutomationExtension to False, you may see the “Loading of unpacked extensions is disabled by the administrator” warning. There are plenty of other options to control the browser behavior; check here for the documentation.

Chrome releases new versions frequently, and a new version may no longer work with an old driver file. So it is better to catch this exception and show an error message that guides users to upgrade the driver.

You can set the Chrome window position as below, but this does not matter if you wish to maximize the window later.

driver.set_window_position(2000, 1)

Let’s open the first link and maximize the window (this can also be done by calling options.add_argument("start-maximized") before launching the browser). We also execute some JavaScript to adjust the zoom level so that the content can be seen clearly.

#open window 1
driver.get(url_1)
driver.maximize_window()
driver.execute_script("document.body.style.zoom='120%'")
time.sleep(1)

To open the second tab, we use JavaScript to open a blank tab, and then switch the active tab to it. driver.window_handles keeps a list of handles for the opened windows, so window_handles[1] refers to the second tab.

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[1])

Next, we open the second link. For this tab, let’s scroll down 300px to skip the ads section at the page header.

#open second link
driver.get(url_2)
driver.execute_script("document.body.style.zoom='90%'")
driver.execute_script("window.scrollBy(0,300);")
time.sleep(1)

Similarly, we can open the third tab with the below code:

#open window 3
driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[2])
driver.get(url_3)		
driver.execute_script("document.body.style.zoom='90%'")
driver.execute_script("window.scrollBy(0,200);")
time.sleep(1)

Auto switch between tabs

Once everything is ready, we can write the logic to switch between the tabs automatically at a certain interval. To do that, we need to know how to do the 3 things below:

  • Identify which link is currently active

We can use the driver.title attribute to check whether the page title contains a certain keyword for a particular website, so that we know which page is active now

  • Switch to a new tab

We can continue to use driver.switch_to.window to switch the tab, but we need to have logic to determine which is the next tab we want to switch to

  • Refresh the page (in case there is any updates)

We can use driver.refresh() to refresh the page, but we will lose the setting such as zooming in/out, so we need to set it again

So let’s take a look at the complete code:

nextIndex = 2

start = time.time()

while True:
	
	#stop running after 5 minutes
	if (time.time() - start >= 5*60):
		break
		
	if "Google Maps" in driver.title:
		driver.refresh()
		driver.execute_script("document.body.style.zoom='120%'")
		time.sleep(3)
		nextIndex = 0 if nextIndex + 1 > 2 else nextIndex + 1
		
	elif "CNN" in driver.title:
		driver.refresh()
		driver.execute_script("document.body.style.zoom='90%'")
		time.sleep(5)
		nextIndex = 0 if nextIndex + 1 > 2 else nextIndex + 1
		
	elif "Weather" in driver.title:
		driver.refresh()
		driver.execute_script("document.body.style.zoom='90%'")
		time.sleep(2)
		nextIndex = 0 if nextIndex + 1 > 2 else nextIndex + 1
		
	driver.switch_to.window(driver.window_handles[nextIndex])

So each tab will be active for a few seconds before switching to the next one, and after 5 minutes, the loop will stop.
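Identifying the page by its title works, but the loop keeps spinning without any pause if a title ever stops matching all of the keywords. As an alternative design, the sketch below cycles through the tabs by index from a list of per-tab settings. rotate_tabs and TAB_SETTINGS are hypothetical names; the function only uses the driver calls already shown above (window_handles, switch_to.window, refresh and execute_script), and it takes the sleep and clock functions as parameters so the logic can be tested without a browser.

```python
import itertools
import time

# Hypothetical per-tab settings in the order the tabs were opened:
# (zoom level to re-apply after refresh, seconds to stay on the tab)
TAB_SETTINGS = [("120%", 3), ("90%", 2), ("90%", 5)]

def rotate_tabs(driver, tab_settings, run_for=5 * 60, sleep=time.sleep, clock=time.time):
    """Show each tab in turn until `run_for` seconds have passed."""
    start = clock()
    for index, (zoom, dwell) in itertools.cycle(enumerate(tab_settings)):
        if clock() - start >= run_for:
            break
        driver.switch_to.window(driver.window_handles[index])
        driver.refresh()
        # refresh() resets the zoom, so apply it again
        driver.execute_script(f"document.body.style.zoom='{zoom}'")
        sleep(dwell)
```

With the driver created earlier, rotate_tabs(driver, TAB_SETTINGS) would replace the while loop; scrolling could be added to the per-tab settings in the same way.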

If we wish to close all tabs at the end of the script, we can perform the below:

for window in driver.window_handles:
	driver.switch_to.window(window)
	driver.close()

That’s it! Congratulations, you have completed a new automation project to auto switch browser tabs in Chrome. As always, comments and questions are welcome.


How to send email with attachment via python smtplib

In one of my previous articles, I discussed how to send email from the Outlook application. That approach assumes you have already installed Outlook and configured your email account on the machine where you run your script. In this article, I will be sharing how to automatically send email with attachments via a lower-level API, to be more specific, Python's smtplib, where you do not need to set up anything in your environment to make it work.

For this article, I will demonstrate how to send an HTML format email with an attachment from a Gmail account. So besides the smtplib module, we will need two other modules: ssl and email.

Let’s get started!

First, you will need to find the SMTP server and port info for sending email via a Google account. You can find this information from this link; for easy reading, I have captured it in the screenshot below.

[Image: Google SMTP server configuration info]

So we are going to use the server smtp.gmail.com and port 587 for our case (port 587 is the TLS/STARTTLS port; you may search online for more info about SSL and TLS, as we will not discuss them much in this article).

Let’s start to import all the modules we need:

import smtplib, ssl
from email.mime.multipart import MIMEMultipart 
from email.mime.text import MIMEText 
from email.mime.application import MIMEApplication

As we are going to send the email in HTML format (which unlocks a lot of features such as adding styles, drawing tables, etc.), we will need MIMEText, plus MIMEMultipart and MIMEApplication for the attachment.

Build up the email message

To build our email message, we need to create a MIMEMultipart object of the mixed subtype so that we can send both text and attachments. Next, we specify the From, To, CC and Subject headers.

smtp_server = 'smtp.gmail.com'
smtp_port = 587  # STARTTLS port
# Replace with your own Gmail account
gmail = 'yourmail@gmail.com'
password = 'your password'  # Gmail may require an app password here
# Recipients (also used later when calling sendmail)
to = 'contact@codeforests.com'
cc = 'contact@codeforests.com'

message = MIMEMultipart('mixed')
message['From'] = 'Contact <{sender}>'.format(sender=gmail)
message['To'] = to
message['CC'] = cc
message['Subject'] = 'Hello'

You probably do not want anybody to see your hard-coded password here, so you may consider putting the email account info into a separate configuration file. Check my other post on reading/writing configuration files.
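As one way of keeping the credentials out of the script, here is a minimal sketch using the standard-library configparser module. The section name gmail, the keys account and password, and the file name email.ini are hypothetical; the sample text stands in for the file so the snippet runs on its own.

```python
import configparser

# Hypothetical contents of a separate config file such as email.ini
SAMPLE_CONFIG = """
[gmail]
account = yourmail@gmail.com
password = your password
"""


def load_credentials(config_text):
    """Return (account, password) from the [gmail] section of the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["gmail"]
    return section["account"], section["password"]


gmail, password = load_credentials(SAMPLE_CONFIG)
```

With a real file, you would call parser.read("email.ini") instead of read_string(), and keep that file out of version control.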

For the HTML message content, we wrap it in a MIMEText object and then attach it to our MIMEMultipart message:

msg_content = '<h4>Hi There,<br> This is a testing message.</h4>\n'
body = MIMEText(msg_content, 'html')
message.attach(body)

Let’s assume you want to attach a PDF file from your C drive. You can read it in binary mode and pass it to MIMEApplication with pdf as the MIME subtype. Take note of the additional Content-Disposition header, where you need to specify the name of your attachment file.

attachmentPath = "c:\\sample.pdf"
try:
	with open(attachmentPath, "rb") as attachment:
		p = MIMEApplication(attachment.read(), _subtype="pdf")
		# Name the attachment after the file (without its directory)
		p.add_header('Content-Disposition', 'attachment', filename=attachmentPath.split("\\")[-1])
		message.attach(p)
except OSError as e:
	# e.g. the file does not exist or cannot be read
	print(str(e))

If you have a list of attachments, you can loop through the list and attach them one by one with the above code.
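A minimal sketch of that loop, wrapped in a hypothetical attach_files helper: it guesses each file’s MIME type from its extension with the standard mimetypes module, keeps the subtype for application/* types such as pdf, and falls back to application/octet-stream for everything else. It also uses os.path.basename for the attachment name instead of splitting on a backslash.

```python
import mimetypes
import os
from email.mime.application import MIMEApplication


def attach_files(message, paths):
    """Attach every file in `paths` to `message` (hypothetical helper)."""
    for path in paths:
        # Guess the MIME type from the extension; default to generic binary data
        mime_type, _ = mimetypes.guess_type(path)
        maintype, _, subtype = (mime_type or "application/octet-stream").partition("/")
        if maintype != "application":
            subtype = "octet-stream"
        with open(path, "rb") as f:
            part = MIMEApplication(f.read(), _subtype=subtype)
        # Use only the file name (not the full path) as the attachment name
        part.add_header("Content-Disposition", "attachment",
                        filename=os.path.basename(path))
        message.attach(part)
```

For example, attach_files(message, [r"c:\sample.pdf", r"c:\report.xlsx"]) would attach both files to the message built above.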

Once everything is set up, we can convert the message object to a string:

msg_full = message.as_string()

Send email

Now comes the most important part: we need to create a TLS context and use it to communicate with the SMTP server.

context = ssl.create_default_context()

Next, we open the connection to the SMTP server, upgrade it to TLS with our context, and complete the handshake.

Then we authenticate our Gmail account. In the sendmail method, you specify the sender, the list of recipients (To and CC addresses combined; CC is optional) and the message string.

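with smtplib.SMTP(smtp_server, smtp_port) as server:
	server.ehlo()
	server.starttls(context=context)  # upgrade the connection to TLS
	server.ehlo()  # identify ourselves again over the encrypted connection
	server.login(gmail, password)
	server.sendmail(gmail,
				to.split(";") + (cc.split(";") if cc else []),
				msg_full)
	# No explicit server.quit() needed: leaving the with block sends QUIT

print("email sent out successfully")

Once sendmail completes and the with block exits, the context manager sends the QUIT command and closes the connection for us, so an explicit server.quit() call is not required.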

With all of the above, you should receive the email sent by your code. You may want to wrap this code into a class so that you can reuse it as a service library across multiple projects.
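A minimal sketch of such a class is below. The class name GmailSender and its methods are hypothetical; it only repackages the steps from this article (STARTTLS on port 587, login, sendmail) and accepts ";"-separated address strings as in the code above. Since starttls() and login() send EHLO themselves when needed, the explicit ehlo() calls are left out here.

```python
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


class GmailSender:
    """Hypothetical reusable wrapper around the steps in this article."""

    def __init__(self, account, password, server="smtp.gmail.com", port=587):
        self.account = account
        self.password = password
        self.server = server
        self.port = port

    @staticmethod
    def split_addresses(addresses):
        """Turn a ';'-separated string into a list of trimmed addresses."""
        return [a.strip() for a in addresses.split(";") if a.strip()]

    def build_message(self, subject, html, to, cc=""):
        """Build a mixed MIMEMultipart message with an HTML body."""
        message = MIMEMultipart("mixed")
        message["From"] = self.account
        # Address headers separate multiple recipients with commas
        message["To"] = ", ".join(self.split_addresses(to))
        if cc:
            message["CC"] = ", ".join(self.split_addresses(cc))
        message["Subject"] = subject
        message.attach(MIMEText(html, "html"))
        return message

    def send(self, message, to, cc=""):
        """Send the message over STARTTLS to all To and CC recipients."""
        recipients = self.split_addresses(to) + self.split_addresses(cc)
        context = ssl.create_default_context()
        with smtplib.SMTP(self.server, self.port) as server:
            server.starttls(context=context)
            server.login(self.account, self.password)
            server.sendmail(self.account, recipients, message.as_string())
```

Usage would look like sender = GmailSender(gmail, password), then message = sender.build_message('Hello', msg_content, 'contact@codeforests.com'), attach any files with the code above, and finally sender.send(message, 'contact@codeforests.com').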

 

As always, please share if you have any questions or comments.


How to print colored message on command line terminal window

When you are developing a Python script that prints output messages in the terminal window, you may find it a little boring that all the messages are printed in black and white, especially when some messages are warnings and some are just for information. You may wonder how to print colored messages so that your users can pay special attention to the warning or error messages.

In this article, I will share a library that allows you to print colored messages in your terminal.

Let’s get started!

The library I am going to introduce is called colorama, a small and clean library for styling your messages on Windows, Linux and macOS.

Prerequisite:

You will need to install this library to run the code in this article:

pip install colorama

To start using this library, import the modules and call the init() method at the beginning of your script or in your class initialization method.

import colorama
from colorama import Fore, Back, Style
colorama.init()

Print colored message with colorama

The init method also accepts some keyword arguments to override its default behavior. For example, by default the style is not reset after a message is printed, so subsequent messages keep the same style. You can pass autoreset=True to the init method so that the style is reset after each print statement.

Below are the options you can use for the foreground (font) color, background color and style.

Fore: BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE, RESET.
Back: BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE, RESET.
Style: DIM, NORMAL, BRIGHT, RESET_ALL

To use them, wrap your messages with the styles as below:

print(Fore.CYAN + "Cyan messages will be printed out just for info only" + Style.RESET_ALL)
print(Fore.RED + "Red messages are meant for warnings or errors" + Style.RESET_ALL)
print(Fore.YELLOW + Back.GREEN + "Yellow messages are debugging info" + Style.RESET_ALL)

This is how it looks in your terminal:

[Screenshot: colored messages printed with colorama]

As mentioned earlier, if you don’t set autoreset to True, you need to reset the style at the end of each message so that different messages can have different styles.

What if you want to apply the styles when asking for user input? Let’s see an example:

print(Fore.YELLOW)
choice = input("Enter YES to confirm:")
print(Style.RESET_ALL)
if choice.upper() in ["YES", "Y"]:
    print(Fore.GREEN + "You have just confirmed to proceed." + Style.RESET_ALL)
else:
    print(Fore.RED + "You did not enter yes, let's stop here" + Style.RESET_ALL)

By wrapping the input call between Fore.YELLOW and Style.RESET_ALL, the same style is applied to everything printed in between, including both the prompt and the user’s entry.

Let’s put all the above into a script and run it in the terminal to see how it looks.

[Screenshot: colored input prompt printed with colorama]

Yes, that’s exactly what we want to achieve! Now you can wrap your print statements in a function, e.g. print_colored_message, so that you do not need to repeat the code everywhere.
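Here is a minimal sketch of such a helper. The names format_colored_message and LEVEL_STYLES, and the info/warning/error levels, are my own illustration rather than part of colorama. colorama’s Fore and Style constants are plain ANSI escape strings, so the ImportError fallback uses the same sequences for terminals that understand ANSI codes; on Windows you would still call colorama.init() once at startup as shown earlier.

```python
try:
    from colorama import Fore, Style

    # colorama's constants are ANSI escape strings
    LEVEL_STYLES = {"info": Fore.CYAN, "warning": Fore.YELLOW, "error": Fore.RED}
    RESET = Style.RESET_ALL
except ImportError:
    # Fallback: the same ANSI sequences colorama uses for CYAN, YELLOW, RED, RESET_ALL
    LEVEL_STYLES = {"info": "\x1b[36m", "warning": "\x1b[33m", "error": "\x1b[31m"}
    RESET = "\x1b[0m"


def format_colored_message(message, level="info"):
    """Wrap message in the style for `level` and reset the style afterwards."""
    return LEVEL_STYLES[level] + message + RESET


def print_colored_message(message, level="info"):
    """Print a message colored according to its level (hypothetical helper)."""
    print(format_colored_message(message, level))
```

Then print_colored_message("Disk almost full", "warning") prints a yellow line, and the trailing reset means the next message starts with the default style even without autoreset.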

As always, please share if you have any comments or questions.

 


Python how to unpack tuple, list and dictionary

There are various cases where you want to unpack Python objects such as tuples, lists or dictionaries into individual variables, so that you can easily access the individual items. In this article, I will share how to unpack these different Python objects and how unpacking can be useful when working with *args and **kwargs in functions.

Let’s get started.

Unpack python tuple objects

Let’s say we have a tuple called shape that describes the height, width and number of channels of an image. We can unpack it into 3 separate variables as below:

shape = (500, 300, 3)
height, width, channel = shape
print(height, width, channel)

You can see each item inside the tuple has been assigned to an individual variable with a meaningful name, which improves the readability of your code. Below is the output:

500 300 3

It’s definitely more elegant than accessing each item by index, e.g. shape[0], shape[1], shape[2].

What if we only need a few items from a big tuple? Here we need to introduce _ (by convention, the name for a variable you don’t care about) and * (which collects an arbitrary number of items).

For example, if we only want to extract the first and last items from the tuple below, we can let the rest of the items go into the _ variable.

toto_result = (4,11,14,23,28,47,24)
first, *_, last = toto_result
print(first, last)

The above gives the following output:

4 24

If you are curious what is inside _, try printing it out. You will see it’s actually a list of the remaining items between the first and last items:

[11, 14, 23, 28, 47]
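To make the behaviour of the starred variable concrete, here is a small sketch: the starred target always collects items into a list (even when unpacking a tuple), it can end up empty, and it can also sit at the end. The names middle, rest, head and tail are just illustrative; a descriptive name like these also avoids reusing _.

```python
toto_result = (4, 11, 14, 23, 28, 47, 24)

# The starred target collects the remaining items into a list, even from a tuple
first, *middle, last = toto_result
print(first, middle, last)  # 4 [11, 14, 23, 28, 47] 24

# With only two items, the starred target is simply an empty list
a, *rest, b = (1, 2)
print(rest)  # []

# The starred target can also come last
head, *tail = toto_result
print(head, tail)  # 4 [11, 14, 23, 28, 47, 24]
```

Only one starred target is allowed per assignment, since Python could not otherwise decide how to split the items.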

The most popular use case of packing and unpacking is passing arguments to a function that accepts an arbitrary number of arguments (*args). Let’s look at an example:

# Note: naming this function sum shadows Python's built-in sum()
def sum(*numbers):
    total = 0
    for n in numbers:
        total += n
    return total

The above sum function accepts any number of arguments and sums up their values. The * here packs all the positional arguments passed to the function into a tuple called numbers. If you want to sum up all the items in toto_result, passing toto_result directly would not work, because the whole tuple would arrive as a single argument:

toto_result = (4,11,14,23,28,47,24)
#sum(toto_result) would raise TypeError

So what we can do is unpack the items from the tuple and then pass them to the sum function:

total = sum(*toto_result)
print(total)
#output should be 151

Unpack python list objects

Unpacking a list object works the same way as unpacking a tuple object. If we replace the tuple with a list in the above examples, everything still works perfectly.

shape = [500, 300, 3]
height, width, channel = shape
print(height, width, channel)
#output shall be 500 300 3

toto_result = [4,11,14,23,28,47,24]
first, *_, last = toto_result
print(first, last)
#output shall be 4 24

total = sum(*toto_result)
print(total)
#output should be also 151

Unpack python dictionary objects

Unlike a list or tuple, unpacking a dictionary is mostly useful when you want to pass the dictionary as keyword arguments to a function (**kwargs).
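To see why, here is a quick sketch: plain assignment unpacking of a dictionary only gives you its keys (in insertion order, Python 3.7+), so you need items() to get the key/value pairs.

```python
headers = {'Host': 'www.codeforests.com', 'referer': 'https://www.codeforests.com'}

# Plain unpacking iterates over the dictionary, which yields its keys
first_key, second_key = headers
print(first_key, second_key)  # Host referer

# items() yields (key, value) pairs, which can be unpacked as nested tuples
(key1, value1), (key2, value2) = headers.items()
print(key1, value1)  # Host www.codeforests.com
```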

For instance, with the function below, you can pass in your keyword arguments one by one.

def print_header(**headers):
    # headers is a dict built from the keyword arguments
    for name, value in headers.items():
        print(name, value)

print_header(Host="www.codeforests.com", referer="https://www.codeforests.com")

Or, if you have a dictionary like the one below, you can simply unpack it and pass it to the function:

headers = {'Host': 'www.codeforests.com', 'referer' : 'https://www.codeforests.com'}
print_header(**headers)

It will generate the same result as previously, but the code is more concise.

Host www.codeforests.com
referer https://www.codeforests.com

With the ** unpacking operator, you can also merge multiple dictionaries as below:

headers = {'Host': 'www.codeforests.com', 'referer' : 'https://www.codeforests.com'}
extra_header = {'user-agent': 'Mozilla/5.0'}

new_header = {**headers, **extra_header}

The resulting new_header looks like this:

{'Host': 'www.codeforests.com',
 'referer': 'https://www.codeforests.com',
 'user-agent': 'Mozilla/5.0'}
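One detail worth knowing when merging: if the same key appears in more than one dictionary, the value from the dictionary unpacked last wins. A quick sketch (the override dictionary and its URL are hypothetical):

```python
headers = {'Host': 'www.codeforests.com', 'referer': 'https://www.codeforests.com'}
override = {'referer': 'https://www.example.com'}  # hypothetical value

# When keys collide, the dictionary unpacked later wins
merged = {**headers, **override}
print(merged['referer'])  # https://www.example.com

# Swapping the order changes which value survives
merged_reverse = {**override, **headers}
print(merged_reverse['referer'])  # https://www.codeforests.com
```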

Conclusion

Unpacking is very useful, especially when dealing with *args and **kwargs. One thing worth noting is the unnamed variable (_) I mentioned earlier. Please use it with caution: the Python interactive interpreter also uses _ to store the result of the last evaluated expression, and once you assign to _ yourself, typing _ returns your value instead of that last result. So do take note of this potential conflict. See the example below:

[Screenshot: _ conflicting with the interactive interpreter]

As always, comments and questions are welcome.

Get file names by extension from a directory

Whenever you work with directories and files, you will probably need a function that gets file names by extension from a particular directory. For instance, you may want to check and process all the Excel files in a folder, or do some housekeeping to remove old log files. In this article, I will explain a few ways of implementing such a function.

Let’s get started!

There are plenty of libraries/modules you can use to achieve this, but let’s start with the most commonly used ones.

Option 1

Since you will probably import the os module anyway to handle file operations, you can make use of the functions from this module.

For instance, you can list all the files and sub-directories under the current directory and check whether each file name ends with a certain extension:

import os

pyfiles = []
for file in os.listdir("."):
    if file.lower().endswith(".ipynb"):
        pyfiles.append(file)

You can further sort the files by last modified time, from latest to earliest:

pyfiles.sort(key=os.path.getmtime, reverse=True)

What if you want to check multiple file extensions? Don’t worry: str.endswith also accepts a tuple of suffixes, so only a minor change to the if condition is needed:

if file.lower().endswith((".ipynb", ".xlsx")):
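Putting Option 1 together, a minimal sketch of a reusable function might look like this (the name get_files_by_extension is hypothetical). Two details matter once the directory is not the current one: os.listdir returns bare names, so the path must be joined before calling os.path.getmtime, and sub-directories whose names happen to end with the extension should be skipped.

```python
import os


def get_files_by_extension(directory, extensions):
    """Return full paths of files in `directory` ending with one of `extensions`, newest first."""
    suffixes = tuple(ext.lower() for ext in extensions)
    matches = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip sub-directories; endswith() accepts a tuple of suffixes
        if os.path.isfile(path) and name.lower().endswith(suffixes):
            matches.append(path)
    # getmtime needs the full path, not just the name returned by listdir()
    matches.sort(key=os.path.getmtime, reverse=True)
    return matches
```

For example, get_files_by_extension(".", (".ipynb", ".xlsx")) gives the same files as the loop above, already sorted from latest to earliest.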

Option 2

The os module also has the scandir function, which achieves the same and returns os.DirEntry objects that also carry file type and file attribute info.

files = []
for file in os.scandir("."):
    if file.name.lower().endswith((".ipynb", ".xlsx")):
        files.append(file.name)
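A sketch of the scandir variant that makes use of that extra information: each os.DirEntry can tell whether it is a regular file and provide its stat() result, so the sort key comes straight from the entry. The function name scan_files_by_extension is hypothetical.

```python
import os


def scan_files_by_extension(directory, extensions):
    """Return (name, mtime) pairs for matching regular files, newest first."""
    suffixes = tuple(ext.lower() for ext in extensions)
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() uses cached type info, usually without an extra system call
            if entry.is_file() and entry.name.lower().endswith(suffixes):
                matches.append((entry.name, entry.stat().st_mtime))
    matches.sort(key=lambda item: item[1], reverse=True)
    return matches
```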

 

Option 3

If you don’t like the way the above code matches file names, you can use fnmatch instead. For example:

import fnmatch
files = []
for file in os.listdir("."):
    if fnmatch.fnmatch(file, "*.ipynb") or fnmatch.fnmatch(file, "*.xlsx"):
        files.append(file)
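fnmatch also offers fnmatch.filter, which applies one pattern to a whole list of names. A small sketch, using a hard-coded list of hypothetical file names in place of os.listdir(".") so it runs anywhere. Keep in mind that fnmatch.fnmatch normalizes case with os.path.normcase, so matching is case-insensitive on Windows but case-sensitive on Linux and macOS.

```python
import fnmatch

# Hypothetical names, standing in for os.listdir(".")
names = ["report.xlsx", "analysis.ipynb", "notes.txt", "draft.ipynb"]

# fnmatch.filter() applies a single pattern to the whole list
notebooks = fnmatch.filter(names, "*.ipynb")
print(notebooks)  # ['analysis.ipynb', 'draft.ipynb']

# For several patterns, keep a name if any pattern matches it
patterns = ("*.ipynb", "*.xlsx")
matches = [n for n in names if any(fnmatch.fnmatch(n, p) for p in patterns)]
print(matches)  # ['report.xlsx', 'analysis.ipynb', 'draft.ipynb']
```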

 

Option 4

Python also has a glob module, which matches file names using Unix shell-style patterns. To match the files with a certain extension, you can simply do the following:

import glob
files = glob.glob("*.ipynb")

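And then sort by file creation time, from latest to earliest (note that os.path.getctime returns the creation time on Windows, but the time of the last metadata change on Unix-like systems):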

files.sort(key=os.path.getctime, reverse=True)

If you want to match multiple file extensions, you can do something like this:

files = []
file_types = ("*.ipynb", "*.xlsx")
for file_type in file_types:
    files.extend(glob.glob(file_type))

files.sort(key=os.path.getctime, reverse=True)

As I mentioned earlier, there are far more ways of doing this and it would not be possible to list them all, so I will stop here. Please leave your comments if you have better ideas.