Google Sheet is a very powerful tool in terms of collaboration, it allows multiple users to work on the same rows of data simultaneously. It also provides fine-grained APIs in various programming languages for your application to connect and interact with Google Sheet. Sometimes when you just need some simple operations like reading/writing data from a sheet, you may wonder if there is any higher level APIs that can complete these simple tasks easily. The short answer is yes. In this article, we will be discussing how can we read/write Google Sheet in 5 lines of Python code.
As the prerequisite, you will need to have a Google service account in order for you to go through the Google cloud service authentication for your API calls. You can follow the guide from here for a free account setup. Once you have completed all the steps, you shall have a JSON file similar to below which contains your private key for accessing the Google cloud services. You may rename it to “client_secret.json” for our later use.
"private_key": "-----BEGIN PRIVATE KEY-----\xxxxx\n-----END PRIVATE KEY-----\n",
From this JSON file, you can also find the email address for your newly created service account, if you need to access your existing Google Sheet files, you will need to grant access of your files to this email address.
Note: There is a limit of 100 requests per every 100 seconds for the free Google service account, you may need to upgrade your account to the paid account if this free quota is not sufficient for your business.
In addition to the service account, we need another two libraries google-auth and gspread which are the core modules to be used for authentication and manipulating the Google Sheet file.
Below is the pip command to install the two libraries:
pip install gspread
pip install google-auth
Lastly, let’s create a Google Sheet file namely “spreadsheet1” with some sample data from US 2020 election result:
Once you have all above ready, let’s dive into our code examples.
Read Google Sheet data into pandas
Let’s first import the necessary libraries at the top of our script:
from google.oauth2.service_account import Credentials
import pandas as pd
To get the access to Google Sheet, we will need to define the scope (API endpoint). For our case, we specify the scope to read and write the Google Sheet file. If you would like to restrict your program from updating any data, you can specify spreadsheets.readonly and drive.readonly in the scope.
scope = ['https://www.googleapis.com/auth/spreadsheets',
And then we can build a Credentials object with our JSON file and the above defined scope:
creds = Credentials.from_service_account_file("client_secret.json", scopes=scope)
Next, we call the authorize function from gspread library to pass in our credentials:
client = gspread.authorize(creds)
With this one line of code, it will be going through the authentication under the hood. Once authentication passed, it establishes the connection between your application and the Google cloud service. From there, you can send request to open your spreadsheet file by specifying the file name:
google_sh = client.open("spreadsheet1")
Besides opening file by name, you can also use open_by_key with the sheet ID or open_by_url with the URL of the sheet.
If the proper access has been given to your service account, you would be able to gain the control to your Google Sheet, and you can continue to request to open a particular spreadsheet tab.
For instance, below returns the first sheet of the file:
sheet1 = google_sh.get_worksheet(0)
With the above, you can simply read all records into a dictionary with get_all_records function, and pass into a pandas DataFrame:
df = pd.DataFrame(data=sheet1.get_all_records())
Now if you examine the df object, you shall see the below output:
So that’s it! With a few lines of code, you’ve successfully downloaded your data from Google Sheet into pandas, and now you can do whatever you need in pandas.
If you have duplicate column names in your Google Sheet, you may consider to use get_all_values function to get all the values into a list, so that duplicate column remains:
df = pd.DataFrame(data=sheet1.get_all_values())
All the column and row labels will default to RangeIndex as per below:
Now let’s take a further look at what else we can achieve with the this library.
Add/Delete work sheets
With gspread, you can easily add new sheets or duplicate from the existing sheets. Below is an example to create a new sheet namely “Sheet2” with max number of rows and columns specified. The index parameter tells Google Sheet where you want to insert your new sheet. index=0 indicates the new sheet to be inserted as the first sheet.
sheet2 = google_sh.add_worksheet(title="Sheet2", rows="10", cols="10", index=0)
Duplicating an existing sheet can be done by specifying the source sheet ID and the new sheet name:
Similarly, you can delete an existing sheet by passing in the worksheet object as per below:
If you would like to re-order your worksheets, you can do it with reorder_worksheets function. Assuming you want the sheet2 to be shown before sheet1:
Read/Write Google Sheet cells
The worksheet object has the row_count and col_count properties which indicate the max rows and columns in the sheet file. But it’s not that useful when you want to know how many rows and columns of actual data you have:
To have a quick view of the number of rows and columns of your data, you can use:
#Assuming the first row and first column have the full data
print("no. of columns:", len(sheet1.row_values(1)))
#no. of columns: 3
print("no. of rows:", len(sheet1.col_values(1)))
#no. of rows: 8
To access the individual cells, you can either specify the row and column indexes, or use the A1 notation. For instance:
#Access the row 1 and column 2
# or using A1 notation
Note: the row/column index and A1 notation are all one-based numbers which is similar to the MS excel
Similarly, you can update the value for each individual cell as per below:
sheet1.update_cell(1, 2, "BIDEN VOTES")
sheet1.update_acell("B1", "BIDEN VOTES")
To update multiple cells, you shall use the worksheet update function with the list of cells and values to be updated. For instance, below code will replace the values in row 8:
sheet1.update("A8:C8", [["Texas", 5261485, 5261485]])
Or use batch_update to update multiple ranges:
"values" : [["Texas", 5261485, 5261485]]},
"values" : [["Wisconsin", 1630673, 1610065]]},
or use append_rows to insert a row at the last row:
sheet1.append_rows(values=[["Pennsylvania", 3458312, 3376499]])
Besides updating cell values, you can also update the cell format such as the font, background etc. For instance, the below will update the background of the 6th row to red color:
Note that Google is using RGBA color model, so the color values must be numbers between 0-1.
Below is how it looks like in Google Sheet:
Sometimes, it might be difficult to locate the exact index of the cell to be updated. You can find the cell by it’s text with the find function. It will return the first item from the matches.
cell = sheet1.find("Michigan")
print(cell.row, cell.col, cell.value)
#6, 1, 'Michigan'
You can also use Python regular express to find all matches. For instance, to find all cells with text ending as “da”:
query = re.compile(".*da")
cells = sheet1.findall(query)
#[<Cell R4C1 'Florida'>, <Cell R7C1 'Nevada'>]
Add/Remove permission for your Google Sheet
Adding or removing permission for a particular Google Sheet file can be also super easy with gspread. Before adding/removing permission, you shall check who are the users currently have access to your file. You can use list_permission function to retrieve the list of users with their roles:
To give access of your file to other users, you can use:
#perm_type can be : user, group or domain
#role can be : owner, writer or reader
google_sh.share('email@example.com', perm_type='user', role='reader')
When you check your file again, you shall see the email address you shared is added into the list of authorized users.
To revert back the access for a particular user, you can use remove_permissions function. By default, it removes all the access that has been granted to the user:
When the role you’ve specifying does not match with the roles the user currently has, the function returns without doing anything.
Google Sheet API provides comprehensive interfaces for manipulating the sheets from the normal operations like reading/writing of the data, to validation, formatting, building pivot tables and charts etc. Often you find that you may just need some simple APIs to read and write Google Sheet files.
In this article, we have reviewed though the gspread package which provides the high level APIs for working with Google Sheets to serve for this purpose. With gspread, you are able to open existing sheet or create new Google Sheet file, read/write the data as well as do simply formatting. In fact, there are a few other libraries such as gspread-formatting and gspread-pandas which offer extensive functionalities for sheet formatting and interacting sheets with pandas dataframe, you may take a look in case you need something more complicated than what we have covered here.