Automation

gspread read and write google sheet

Read and write Google Sheet with 5 lines of Python code

Introduction

Google Sheet is a very powerful tool in terms of collaboration, it allows multiple users to work on the same rows of data simultaneously. It also provides fine-grained APIs in various programming languages for your application to connect and interact with Google Sheet. Sometimes when you just need some simple operations like reading/writing data from a sheet, you may wonder if there is any higher level APIs that can complete these simple tasks easily. The short answer is yes. In this article, we will be discussing how can we read/write Google Sheet in 5 lines of Python code.

Prerequisites

As the prerequisite, you will need to have a Google service account in order for you to go through the Google cloud service authentication for your API calls. You can follow the guide from here for a free account setup. Once you have completed all the steps, you shall have a JSON file similar to below which contains your private key for accessing the Google cloud services. You may rename it to “client_secret.json” for our later use.

{
  "type": "service_account",
  "project_id": "new_project",
  "private_key_id": "xxxxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\xxxxx\n-----END PRIVATE KEY-----\n",
  "client_email": "xxx@developer.gserviceaccount.com",
  "client_id": "xxx",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/xxx%40developer.gserviceaccount.com"
}

From this JSON file, you can also find the email address for your newly created service account, if you need to access your existing Google Sheet files, you will need to grant access of your files to this email address.

Note: There is a limit of 100 requests per every 100 seconds for the free Google service account, you may need to upgrade your account to the paid account if this free quota is not sufficient for your business.

In addition to the service account, we need another two libraries google-auth and gspread which are the core modules to be used for authentication and manipulating the Google Sheet file.

Below is the pip command to install the two libraries:

pip install gspread
pip install google-auth

Lastly, let’s create a Google Sheet file namely “spreadsheet1” with some sample data from US 2020 election result:

gspread write and read google sheet

Once you have all above ready, let’s dive into our code examples.

Read Google Sheet data into pandas

Let’s first import the necessary libraries at the top of our script:

import gspread
from google.oauth2.service_account import Credentials
import pandas as pd

To get the access to Google Sheet, we will need to define the scope (API endpoint). For our case, we specify the scope to read and write the Google Sheet file.  If you would like to restrict your program from updating any data, you can specify spreadsheets.readonly and drive.readonly in the scope.

scope = ['https://www.googleapis.com/auth/spreadsheets',
        'https://www.googleapis.com/auth/drive']

And then we can build a Credentials object with our JSON file and the above defined scope:

creds = Credentials.from_service_account_file("client_secret.json", scopes=scope)

Next, we call the authorize function from gspread library to pass in our credentials:

client = gspread.authorize(creds)

With this one line of code, it will be going through the authentication under the hood. Once authentication passed, it establishes the connection between your application and the Google cloud service. From there, you can send request to open your spreadsheet file by specifying the file name:

google_sh = client.open("spreadsheet1")

Besides opening file by name, you can also use open_by_key with the sheet ID or open_by_url with the URL of the sheet.

If the proper access has been given to your service account, you would be able to gain the control to your Google Sheet, and you can continue to request to open a particular spreadsheet tab.

For instance, below returns the first sheet of the file:

sheet1 = google_sh.get_worksheet(0)

With the above, you can simply read all records into a dictionary with get_all_records function, and pass into a pandas DataFrame:

df = pd.DataFrame(data=sheet1.get_all_records())

Now if you examine the df object, you shall see the below output:

gspread write and read google sheet

So that’s it! With a few lines of code, you’ve successfully downloaded your data from Google Sheet into pandas, and now you can do whatever you need in pandas.

If you have duplicate column names in your Google Sheet, you may consider to use get_all_values function to get all the values into a list, so that duplicate column remains:

df = pd.DataFrame(data=sheet1.get_all_values())

All the column and row labels will default to RangeIndex as per below:

gspread write and read google sheet

Now let’s take a further look at what else we can achieve with the this library.

Add/Delete work sheets

With gspread, you can easily add new sheets or duplicate from the existing sheets. Below is an example to create a new sheet namely “Sheet2” with max number of rows and columns specified. The index parameter tells Google Sheet where you want to insert your new sheet. index=0 indicates the new sheet to be inserted as the first sheet.

sheet2 = google_sh.add_worksheet(title="Sheet2", rows="10", cols="10", index=0)

Duplicating an existing sheet can be done by specifying the source sheet ID and the new sheet name:

google_sh.duplicate_sheet(source_sheet_id=google_sh.worksheet("Sheet1").id, 
    new_sheet_name="Votes Copy")

Similarly, you can delete an existing sheet by passing in the worksheet object as per below:

google_sh.del_worksheet(sheet2)

If you would like to re-order your worksheets, you can do it with reorder_worksheets function. Assuming you want the sheet2 to be shown before sheet1:

google_sh.reorder_worksheets([sheet2, sheet1])

Read/Write Google Sheet cells

The worksheet object has the row_count and col_count properties which indicate the max rows and columns in the sheet file. But it’s not that useful when you want to know how many rows and columns of actual data you have:

print(sheet1.row_count, sheet1.col_count)
#1000, 26

To have a quick view of the number of rows and columns of your data, you can use:

#Assuming the first row and first column have the full data
print("no. of columns:", len(sheet1.row_values(1)))
#no. of columns: 3
print("no. of rows:", len(sheet1.col_values(1)))
#no. of rows: 8

To access the individual cells, you can either specify the row and column indexes, or use the A1 notation. For instance:

#Access the row 1 and column 2
sheet1.cell(1, 2).value
# or using A1 notation
sheet1.acell('B1').value

Note: the row/column index and A1 notation are all one-based numbers which is similar to the MS excel

Similarly, you can update the value for each individual cell as per below:

sheet1.update_cell(1, 2, "BIDEN VOTES")
#or
sheet1.update_acell("B1", "BIDEN VOTES")

To update multiple cells, you shall use the worksheet update function with the list of cells and values to be updated. For instance, below code will replace the values in row 8:

sheet1.update("A8:C8", [["Texas", 5261485, 5261485]])

Or use batch_update to update multiple ranges:

sheet1.batch_update([{"range": "A8:C8", 
                    "values" : [["Texas", 5261485, 5261485]]},
                     {"range": "A9:C9", 
                    "values" : [["Wisconsin", 1630673, 1610065]]},
                    ])

or use append_rows to insert a row at the last row:

sheet1.append_rows(values=[["Pennsylvania", 3458312, 3376499]])

Besides updating cell values, you can also update the cell format such as the font, background etc. For instance, the below will update the background of the 6th row to red color:

sheet1.format("A6:C6", 
              {"backgroundColor": {
                  "red": 1.0,
                  "green": 0.0,
                  "blue": 0.0,
                  "alpha": 1.0
              }
    })

Note that Google is using RGBA color model, so the color values must be numbers between 0-1.

Below is how it looks like in Google Sheet:

gspread write and read google sheet , format google sheet

Sometimes, it might be difficult to locate the exact index of the cell to be updated. You can find the cell by it’s text with the find function. It will return the first item from the matches.

cell = sheet1.find("Michigan")
print(cell.row, cell.col, cell.value)
#6, 1, 'Michigan'

You can also use Python regular express to find all matches. For instance, to find all cells with text ending as “da”:

import re
query = re.compile(".*da")
cells = sheet1.findall(query)
print(cells)
#[<Cell R4C1 'Florida'>, <Cell R7C1 'Nevada'>]

Add/Remove permission for your Google Sheet

Adding or removing permission for a particular Google Sheet file can be also super easy with gspread. Before adding/removing permission, you shall check who are the users currently have access to your file. You can use list_permission function to retrieve the list of users with their roles:

google_sh.list_permissions()

To give access of your file to other users, you can use:

#perm_type can be : user, group or domain
#role can be : owner, writer or reader
google_sh.share('someone@gmail.com', perm_type='user', role='reader')

When you check your file again, you shall see the email address you shared is added into the list of authorized users.

To revert back the access for a particular user, you can use remove_permissions function. By default, it removes all the access that has been granted to the user:

google_sh.remove_permissions('someone@gmail.com', role="writer")

When the role you’ve specifying does not match with the roles the user currently has, the function returns without doing anything.

Conclusion

Google Sheet API provides comprehensive interfaces for manipulating the sheets from the normal operations like reading/writing of the data, to validation, formatting, building pivot tables and charts etc. Often you find that you may just need some simple APIs to read and write Google Sheet files.

In this article, we have reviewed though the gspread package which provides the high level APIs for working with Google Sheets to serve for this purpose. With gspread, you are able to open existing sheet or create new Google Sheet file, read/write the data as well as do simply formatting. In fact, there are a few other libraries such as gspread-formatting and gspread-pandas which offer extensive functionalities for sheet formatting and interacting sheets with pandas dataframe, you may take a look in case you need something more complicated than what we have covered here.

split or merge PDF files with PyPDF2

Split or merge PDF files with 5 lines of Python code

There are many cases you want to extract a particular page from a big PDF file or merge PDF files into one due to various reasons. You can make use of some PDF editor tools to do this, but you may realize the split or merge functions are usually not available in the free version, or it is too tedious when there are just so many pages or files to be processed. In this article, I will be sharing a simple solution to split or merge multiple PDF files with a few lines of Python code.

Prerequisite

We will be using a Python library called PyPDF2, so you will need to install this package in your working environment. Below is an example with pip:

pip install PyPDF2

Let’s get started

The PyPDF2 package has 4 major classes PdfFileWriter, PdfFileReader, PdfFileMerger and PageObject which looks quite self explanatory from class name itself. If you need to do something more than split or merge PDF pages, you may want to check this document to find out more about what you can do with this library.

Split PDF file

When you want to extract a particular page from the PDF file and make it a separate PDF file, you can use PdfFileReader to read the original file, and then you will be able to get a particular page by it’s page number (page number starts from 0). With the PdfFileWriter, you can use addPage function to add the PDF page into a new PDF object and save it.

Below is the sample code that extracts the first page of the file1.pdf and split it as a separate PDF file named first_page.pdf

from PyPDF2 import PdfFileWriter, PdfFileReader
input_pdf = PdfFileReader("file1.pdf")
output = PdfFileWriter()
output.addPage(input_pdf.getPage(0))
with open("first_page.pdf", "wb") as output_stream:
    output.write(output_stream)

The input_pdf.getPage(0) returns the PageObject which allows you to modify some of the attributes related to the PDF page, such as rotate and scale the page etc. So you may want to understand more from here.

Merge PDF files

To merge multiple PDF files into one file, you can use PdfFileMerger to achieve it. Although you can also do with PdfFileWriter, but PdfFileMerger probably is more straightforward when you do not need to edit the pages before merging them.

Below is the sample code which using append function from PdfFileMerger to append multiple PDF files and write into one PDF file named merged.pdf

from PyPDF2 import PdfFileReader, PdfFileMerger
pdf_file1 = PdfFileReader("file1.pdf")
pdf_file2 = PdfFileReader("file2.pdf")
output = PdfFileMerger()
output.append(pdf_file1)
output.append(pdf_file2)

with open("merged.pdf", "wb") as output_stream:
    output.write(output_stream)

If you do not want to include all pages from your original file, you can specify a tuple with starting and ending page number as pages argument for append function, so that only the pages specified would be add to the new PDF file.

The append function will always add new pages at the end, in case you want to specify the position where you wan to put in your pages, you shall use merge function. It allows you to specify the position of the page where you want to add in the new pages.

Conclusion

PyPDF2 package is a very handy toolkit for editing PDF files. In this article, we have reviewed how we can make use of this library to split or merge PDF files with some sample codes. You can modify these codes to suit your needs in order to automate the task in case you have many files or pages to be processed. There is also a pdfcat script included in this project folder which allows you to split or merge PDF files by calling this script from the command line. You may also want to take a look in case you just simply deal with one or two PDF files each time.

In case you are interested in other topics related to Python automation, you may check here. Thanks for reading.

How to close Windows process with python

When automating some tasks in Windows OS, you may wonder how to automatically close Windows process if you do not have the direct control of the running application or when the application is just running for too long time. In this article, I will be sharing with you how to close the Windows process with some python library, to be more specific, the pywin32 library.

Prerequisites

You will need to install the pywin32 library if you have not yet installed:

pip install pywin32

Find the process name from Windows Task Manager

You will need to first find out the application name which you intend to close, the application name can be found from the Windows task manager. E.g. If you expand the “Windows Command Processor” process, you can see the running process is “cmd.exe”.

python close Windows process

Let’s get started with the code!

Import the below modules that we will be using later:

from win32com.client import GetObject
from datetime import datetime

import os

And we need to get the WMI (Windows Management Instrumentation) service via the below code, where we can further access the window processes. For more information about WMI, please check this.

WMI = GetObject('winmgmts:')

Next, we will use the WMI SQL query to get the processes from the Win32_Process table by passing in the application name. Remember we have already found the application name earlier from the task manager.

 

for p in WMI.ExecQuery('select * from Win32_Process where Name="cmd.exe"'):
    #the date format is something like this 20200613144903.166769+480
    create_dt, *_ = p.CreationDate.split('.')
    diff = datetime.now() - datetime.strptime(create_dt,'%Y%m%d%H%M%S')

There are other properties such as Description, Status, Executable Path, etc. You can check the full list of the process properties from this win32-process documentation. Here we want to base on the creation date to calculate how much time the application has been running to determine if we want to kill it.

Assuming we need to close windows process after it is running for 5 minutes.

    if diff.seconds/60 > 5:		
        print("Terminating PID:", p.ProcessId)
	os.system("taskkill /pid "+str(p.ProcessId))

With this taskkill command, we will be able to terminate all the threads under this Windows process peacefully.

Conclusion

The pywin32 is super powerful python library especially when dealing with the Windows applications. You can use it to read & save attachments from outlook, send emails via outlookopen excel files and some more. Do have a check on these articles.

As per always, welcome any comments or questions.

auto switch browser tabs

How to auto switch browser tabs

Imagine you have a big monitor and you would like to display something from multiple web links, would it be nice if there is a way to auto switch between the multiple browser tabs in a fixed period? In this article, I will be sharing with you how to auto switch browser tabs via selenium, an automated testing tool.

There is a very detailed documentation on the python selenium library, you may want to check this document as the starting point. For this article, I will just walk through the complete code for this automation, so that you can use it as a reference in case you are tying to implement something similar.

Let’s get started!

To auto launch the browser, we need to first download the web driver for the browser. For instance, if you are using chrome browser, you may download the driver file here. Do check your browser version to make sure you download the driver for the correct version.

As the prerequisite, you will also need to run the below command to install the selenium package in your working environment.

pip install selenium

Launch the browser

Then import all the necessary modules into your script. For this article, we will need to use the below modules:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import SessionNotCreatedException

import time
import os, sys

Let’s assume we want to display the below 3 links in your browser and make them auto switching between each other:

url_1 = "https://www.google.com/maps/@1.3085909,103.8403575,14z"
url_2 = "https://weather.com/en-SG/weather/today"
url_3 = "https://edition.cnn.com/"

Assuming you’ve already downloaded the chrome driver file and put it into the current script folder. Then let’s start to initiate the web driver to launch the browser:

options = Options()
options.add_experimental_option('useAutomationExtension', False)

try:	
	driver = webdriver.Chrome(executable_path=os.getcwd() + "\\chromedriver.exe", options=options)
except SessionNotCreatedException as e:
	print(e)
	print("please upgrade the chromedriver.exe from https://chromedriver.chromium.org/downloads")
	sys.exit(1)

You may wonder why we need a options parameter here?  It’s actually optional, but you may see the “Loading of unpacked extensions is disabled by the administrator” warning without setting useAutomationExtension to False. There are plenty of other options to control the browser behavior, check here for the documentation.

As frequently you will see there is a new version of chrome, and it may not work with old driver file anymore. So, it’s better we catch this exception and show some error message to guide users to upgrade the driver.

You can set the chrome window position by doing the below, but it does not matter if you wish to maximize the window later.

driver.set_window_position(2000, 1)

Let’s open the first link and maximize our window (This also can be done by options.addArguments("start-maximized")). And we want to execute some JavaScript to zoom out a bit so that we can see clearly.

#open window 1
driver.get(url_1)
driver.maximize_window()
driver.execute_script("document.body.style.zoom='120%'")
time.sleep(1)

To open the second tab, we need to use JavaScript to open a blank tab, and switch the active tab to the second tab. The driver.window_handles keeps a list of handlers for the opened windows, so window_handles[1] refers to the second tab.

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[1])

Next, we will open the second link. And for this tab, let’s scroll down 300px to skip the ads second at the page header.

#open second link
driver.get(url_2)
driver.execute_script("document.body.style.zoom='90%'")
driver.execute_script("window.scrollBy(0,300);")
time.sleep(1)

Similarly, we can open the third tab with the below code:

#open window 3
driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[2])
driver.get(url_3)		
driver.execute_script("document.body.style.zoom='90%'")
driver.execute_script("window.scrollBy(0,200);")
time.sleep(1)

Auto switch between tabs

Once everything is ready, we shall write the logic to auto switch between the different tabs at certain interval. To do that, we need to know how to perform the below 3 things:

  • Identify what is the active link showing now

We can use driver.title attribute to check if the page title contains certain keyword for the particular website, so that we know which page is active now

  • Switch to a new tab

We can continue to use driver.switch_to.window to switch the tab, but we need to have logic to determine which is the next tab we want to switch to

  • Refresh the page (in case there is any updates)

We can use driver.refresh() to refresh the page, but we will lose the setting such as zooming in/out, so we need to set it again

So let’s take a look at the complete code:

nextIndex = 2

start = time.time()

while True:
	
	#stop running after 5 minutes
	if (time.time() - start >= 5*60):
		break
		
	if "Google Maps" in driver.title:
		driver.refresh()
		driver.execute_script("document.body.style.zoom='120%'")
		time.sleep(3)
		nextIndex = 0 if nextIndex + 1 > 2 else nextIndex + 1
		
	elif "CNN" in driver.title:
		driver.refresh()
		driver.execute_script("document.body.style.zoom='90%'")
		time.sleep(5)
		nextIndex = 0 if nextIndex + 1 > 2 else nextIndex + 1
		
	elif "Weather" in driver.title:
		driver.refresh()
		driver.execute_script("document.body.style.zoom='90%'")
		time.sleep(2)
		nextIndex = 0 if nextIndex + 1 > 2 else nextIndex + 1
		
	driver.switch_to.window(driver.window_handles[nextIndex])

So each of the tab will be active for a few seconds before switching to the next tab. And after 5 minutes, this loop will be stopped.

If we wish to close all tabs at the end of the script, we can perform the below:

for window in driver.window_handles:
	driver.switch_to.window(window)
	driver.close()

So that’s it and congratulations that you have completed a new automation project to auto switch browser tabs for Chrome. As per always, welcome any comments or questions.

python send email with attachment via smtplib

How to send email with attachment via python smtplib

In one of my previous article, I have discussed about how to send email from outlook application. That has assumed you have already installed outlook and configured your email account on the machine where you want to run your script. In this article, I will be sharing with you how to automatically send email with attachments via lower level API, to be more specific, by using python smtplib where you do not need to set up anything in your environment to make it work.

For this article, I will demonstrate to you to send a HTML format email from a gmail account with some attachment. So besides the smtplib module, we will need to use another two modules – ssl and email.

Let’s get started!

First, you will need to find out the SMTP server and port info to send email via google account. You can find this information from this link. For your easy reading, I have captured in the below screenshot.

codeforests - google smtp server configuration info

So we are going to use the server: smtp.gmail.com and port 587 for our case. (you may search online to find out more info about the SSL & TLS, we will not discuss much about it in this article)

Let’s start to import all the modules we need:

import smtplib, ssl
from email.mime.multipart import MIMEMultipart 
from email.mime.text import MIMEText 
from email.mime.application import MIMEApplication

As we are going to send the email in HTML format (which are you able to unlock a lot features such as adding in styles, drawing tables etc.), we will need to use the MIMEText. And also the MIMEMultipart and MIMEApplication for the attachment.

Build up the email message

To build up our email message, we need to create mixed type MIMEMultipart object so that we can send both text and attachment. And next, we shall specify the from, to, cc and subject attributes.

smtp_server = 'smtp.gmail.com'
smtp_port = 587 
#Replace with your own gmail account
gmail = 'yourmail@gmail.com'
password = 'your password'

message = MIMEMultipart('mixed')
message['From'] = 'Contact <{sender}>'.format(sender = gmail)
message['To'] = 'contact@company.com'
message['CC'] = 'contact@company.com'
message['Subject'] = 'Hello'

You probably do not want anybody can see your hard coded password here, you may consider to put this email account info into a separate configuration file. Check my another post on the read/write configuration files.

For the HTML message content, we will wrap it into the MIMEText, and then attach it to our MIMEMultipart message:

msg_content = '<h4>Hi There,<br> This is a testing message.</h4>\n'
body = MIMEText(msg_content, 'html')
message.attach(body)

Let’s assume you want to attach a pdf file from your c drive, you can read it in binary mode and pass it into MIMEApplication with MIME type as pdf. Take note on the additional header where you need to specify the name your attachment file.

attachmentPath = "c:\\sample.pdf"
try:
	with open(attachmentPath, "rb") as attachment:
		p = MIMEApplication(attachment.read(),_subtype="pdf")	
		p.add_header('Content-Disposition', "attachment; filename= %s" % attachmentPath.split("\\")[-1]) 
		message.attach(p)
except Exception as e:
	print(str(e))

If you have a list of the attachments, you can loop through the list and attach them one by one with the above code.

Once everything is set properly, we can convert the message object into to a string:

msg_full = message.as_string()

Send email

Here comes to the most important part, we will need to initiate the TLS context and use it to communicate with SMTP server.

context = ssl.create_default_context()

And we will initialize the connection with SMTP server and set the TLS context, then start the handshaking process.

Next it authenticate our gmail account, and in the send mail method, you can specify the sender, to and cc (as a list), as well as the message string. (cc is optional)

with smtplib.SMTP(smtp_server, smtp_port) as server:
	server.ehlo()  
	server.starttls(context=context)
	server.ehlo()
	server.login(gmail, password)
	server.sendmail(gmail, 
				to.split(";") + (cc.split(";") if cc else []),
				msg_full)
	server.quit()

print("email sent out successfully")

Once sendmail completed, you will disconnect with the server by server.quit().

With all above, you shall be able to receive the email triggered from your code. You may want to wrap these codes into a class, so that you can reuse it as service library in your multiple projects.

 

As per always, please share if you have any questions or comments.