python visualize google trends data with word cloud

Python – Visualize Google Trends Data in Word Cloud

Christmas is just around the corner, the snowfall, beautiful festive lights and joyful songs from the last year still floating in your mind. But this year, things are getting unusual due to the Covid-19. A lot of celebration events are cancelled or suspended and people are advised to avoid gathering and stay at home as much as possible. Although staying at home became new norm, there is still a way that we can get to know what people are thinking about during this festive season since nowadays most of us search a lot from Google every day. With a few lines of Python code, we will be able to extract and visualize the data from Google Trends.

Let’s dive into the code examples.

Python to get Google trends data

To get the search trends from Google, we will need to use a Python package – pytrends. It’s not an official API for Google trends but It provides a convenience way to automatically download Google trends data same as what we can do manually from Google Trends website.

You can use the pip command to install the package:

pip install --upgrade pytrends

And import the necessary modules at the beginning of our code:

from pytrends.request import TrendReq

To use it, we can initiate the request object by providing the language for searching as well as the time zone information. For instance, I am specifying English as the language and time zone offset as -480 which is UCT + 8 in the below. The default value for this time zone offset is 360 (CST), so you can roughly see how this offset is calculated based on the UCT time zone.

pytrend = TrendReq(hl='en-US', tz=-480)

To get the search trends for a particular keyword, we shall specify it in a keyword list. For example, we use “christmas” to see what people have searched in Google related to this keyword. There are a few more parameters you need to specify in the build_payload function in order to narrow down the results:

cat – The category you are interested in, you can see the full list here.

timeframe – The date range when the search happened. You can specify the range as past X hours/days/months/years (the list of available options you can see from Google Trends web page) or even a specific start date and end date. For our case, we use “now 7-d” for the past 7 days.

geo –  The geolocation which can be two characters country code or leave it empty to see the results from globally

gprop – The source which you can leave it as empty for web search, other options can be images, news, youtube, or froogle

Let’s build up our query as per below:

kw_list = ["christmas"]
pytrend.build_payload(kw_list, cat=0, timeframe='now 7-d', geo='SG', gprop='')

With all these criteria, we can check what are the related topics people searched in Google from Singapore. The related_queries function will give you a dictionary of both top & rising queries related to the keywords:

trends = pytrend.related_queries()

If you examine the trends variable, you shall see something similar to below:

python visualize google trends data with word cloud


The dictionary consists of results for both “top” and “rising” results in pandas dataframe objects, and you can access the top queries as per below:

df_sg = trends["christmas"]["top"]

Examine the first a few records in df_sg, you can see that people in Singapore are still in celebration mood as most of records are related to greetings, light shows or gifts etc.

python visualize google trends data with word cloud

On the other hand, let’s also take a look at the search trends for UK since It has just announced some new restrictions on travelling recently.

pytrend.build_payload(kw_list, cat=0, timeframe='now 7-d', geo='GB', gprop='')
trends = pytrend.related_queries()
df_gb = trends["christmas"]["top"]

Examining the df_gb variable, you can see some people started worrying about the new rules and restrictions for this Christmas although majority of the searching results are still around of the festival celebration.

python visualize google trends data with word cloud


Visualize the results in word cloud

Since we have all the keywords and popularity that people used for search, the most straightforward to visualize them would be using word cloud to generate a picture. To do so, we will need use another python package – wordcloud which is a pure Python library for generating word cloud image. And you also need to use some supporting packages like PIL and numpy for manipulating the images.

You can use pip command to install these packages if you do not have them yet:

pip install --upgrade wordcloud
pip install Pillow==2.2.2
pip install --upgrade numpy

Let’s import all the necessary modules into our code:

from wordcloud import WordCloud, ImageColorGenerator, STOPWORDS
from PIL import Image
import os
import numpy as np

From previous section, we have already got the search keywords in dataframe. wordcloud supports both text string and words frequencies, for simplicity, let’s convert only keywords into a space separated string and forget about the value (popularity).

text = ' '.join(df_sg["query"].to_list())

And as all the keywords contain “christmas”, we shall filter out this word before generating the word cloud. In wordcloud package, it has a list of predefined words to be excluded,  and you can append more words to be excluded as per the below:

stopwords = set(STOPWORDS) 

Now let’s use this featured image as our background for generating word cloud. We shall load it as a 3-demensional array as the background mask for later use:

bg_mask = np.array(, "christmas tree.jpg")))

With all these ready, we can initiate a word cloud object with below parameters. The name of the parameters are quite self-explanatory, so I will not go through them one by one. You can check the official document from here.

wc = WordCloud(
    width = 600, 
    height = 1000,    
    background_color = 'white',
    colormap = 'rainbow',
    mask = bg_mask,
    stopwords = stopwords,
    max_words = 1000,
    max_font_size = 150,
    min_font_size = 15,
    contour_width = 2, 
    contour_color = 'dodgerblue'

Then we can supply our words to the generate_from_text function which will process the text and generate the image. Next we can save the output into an image file as per below code:


When opening the output image file, you shall see something like the below. Isn’t that cool?

python visualize google trends data with word cloud

Similarly, when you pass the UK searching result and generate the word cloud, you would see “covid” and “rules” are most concerned by UK people.

python visualize google trends data with word cloud

Note: since we are passing through a text string, the frequency is based on how many times the words repeated rather than the popularity from Google.


In this article, we have discussed how to use pytrends to automatically get the Google search data for any particular keyword and then use wordcloud to visualize the information. It only covers some basic usage of these two packages, you may check further on their documents to understand what else are provided in these packages. One thing to take note is that pytrends is using some scrapping techniques to get the data from Google Trends, it may break when there is any structural change in the way that Google makes the requests or sends the response. So frequent code upgrade is required by the project team. By the way, they are looking for maintainers, just in case you are interested.


web scraping with python requests and lxml

Web Scraping From Scratch With 3 Simple Steps


Web scraping or crawling refers to the technique to extract the information from a website and transform into structured data for later analysis. There are generally a few reasons that you may need to implement a web scraping scripts to automate the data collection process:

  • There isn’t any public API available for you to get data from the source sites
  • The information is updated from time to time, such as the exchange rate, you cannot manage it in a manual way
  • The final data you need is piecemeal from multiple sites; and so on

Before you decide to implement a scraping script, you will also need to check to be sure that you are not violating the term of use for the data you are going to scrape. Some sites are against the scraping robot. This article is intended for education purpose to help you to understand the overall processes of web scraping, so we will assume you already know the implication of the web scraping and possible legal issues on how the data is used.

Scraping a website sometimes can be difficult depends on how the target website is designed and where the data is resided. But generally you can split the process into 3 steps. Let’s walk through them one by one.

Understand the structure of your target website

As the first step, you shall take a quick look at your target website to see how the front end interacts with the backend, and how the data is populated to the web page. To keep our example simple, let’s assume user authentication is not required and our target is to extract the price change for the top 20 cryptocurrencies from coindesk for further analysis.

The first thing we shall do is to understand how this information is organized on the website. Below is the screenshot of the data presented on the web page:

web scraping with python requests and lxml

In Chrome browser, if you right click on the web page to inspect the HTML elements, you shall see that the entire data table is under <section class=”cex-table”>…</section>. You can verify this by hovering your mouse to this element, you would see there is a light blue overlay on the data table as per below:web scraping in python with requests and lxml

Next, you may want to inspect each text field on the page to further understand how the table header and records are arranged. For instance, when you check the “Asset” text field, you would see the below HTML structure:

<section class="cex-table">
	<section class="thead">
		<div class="tr-wrapper">
			<div class="tr-left">
				<div class="tr">
					<div style="flex:7" class="th">
						<span class="cell">
						<i class="sorting-icon">
						<span class="cell-text">Asset</span>

And similarly you can find the structure of the first row in the table body as per below:

<section class="tbody">
	<section class="tr-section">
		<a href="/price/bitcoin">
			<div class="tr-wrapper">
				<div class="tr-left">
					<div class="tr">
						<div style="flex:2" class="td">
							<span class="cell cell-rank">
						<div style="flex:7" class="td">
							<span class="cell cell-asset">
							<strong class="cell-asset-title">Bitcoin</strong>
							<span class="cell-asset-iso">BTC</span>

You may notice that majority of these HTML elements does not have a id or name attribute as the unique identifier, but the style sheet (“class” attribute) is quite consistent for the same row of data. So in this case, we shall consider to use the style sheet as a reference to find our data elements.

Locate and parse the target data element with XPath

With the initial understanding on HTML structure of our target website, we shall start to find a way to locate the data elements programmatically.

For this demonstration, we will use requests and lxml libraries to send the http requests and parse the results. There are other package for parsing DOM such as beautifulsoup, but personally I find using XPath expression is more straightforward when locating an element although the syntax may not as intuitive as the way beautifulsoup does.

Below is the pip command if you do not have these two packages installed:

pip install requests
pip install lxml

Let’s import the packages and send a GET request to our target URL:

import requests
from lxml import html

target_url = ""
result = requests.get(target_url)

Our target URL does not require any parameters, in case you need to pass in parameters, you can pass via the params argument as per below:

payload = {"q" : "bitcoin", "s" : "relevant"}
result = requests.get("", params=payload)

The result is a response object which has a status_code attribute to indicate if correct response has been returned from the target website. To simplify the code, let’s assume we can always get the correct response with the return HTML in string format from the text attribute.

We then pass our HTML string to lxml and use it to parse the DOM tree as per below:

tree = html.fromstring(result.text)

Now we come to the most important step, we will need to use XPath syntax to locate the data elements we want and extract the data out.

Since the id or name attributes are not available for these elements, we will need to use the style sheet to locate our data elements. To locate the table header, we need to perform the below:

  • Find the section tag with style sheet class as “cex-table” from the entire DOM
  • Find its child section node with style sheet class as “thead
  • Further find its child div node with style sheet as “tr-wrapper

Below is how the syntax looks like in XPath:

table_header = tree.xpath("//section[@class='cex-table']/section[@class='thead']/div[@class='tr-wrapper']")

It will scan through the entire DOM tree to find if any element matches this structure and return a list of nodes matched.

If everything goes well, the table_header list should only contain 1 element which is the div with “tr-wrapper” style sheet. Sometimes if it returns multiple nodes, you may need recheck your path expression to see how you can fine-tune it to get only the unique node that you need.

From the wrapper div, there are still a few levels before we can reach to the node with the text. But you may notice that all the data fields we need are under the span tag which has a style name “cell-text“. So we can actually locate all these span tags with CSS class and extract its text with text() function. Below is how it works in XPath expression:

headers = table_header[0].xpath(".//span[@class='cell']/span[@class='cell-text']/text()")

Note that “.” means to start from current node, and “//” is to indicate the following path expression is relative path

If you examine the headers now, you can see all the column headers are extracted into a list as per below:

 'Market Cap',
 'Total Exchange Volume',
 'Returns (24h)',
 'Total Supply',
 'Value Proposition',
 'Consensus Mechanism']

Let’s continue to move the table body. Following the same logic, we shall be able to locate to the section with “tr-section” in below syntax:

table_body = tree.xpath("//section[@class='cex-table']/section[@class='tbody']/section[@class='tr-section']")

This means that we have already collected all the nodes for rows in the table body. We can now loop through the rows to get the elements. We will use the style sheet to locate our elements, but for the “Asset” column, it actually contains a few child nodes with different style sheet, so we need to handle them separately from the rest of the columns. Below is the code to extract the data row by row and add it into a record list:

records = []
for row in table_body:    
    tokens = row.xpath(".//span[contains(@class, 'cell-asset-iso')]/text()")
    ranks = row.xpath(".//span[contains(@class, 'cell-rank')]/strong/text()")
    assets = row.xpath(".//span[contains(@class, 'cell-asset')]/strong/text()")
    spans = row.xpath(".//div[contains(@class,'tr-right-wrapper')]/div/span[contains(@class, 'cell')]")
    rest_cols = [span.text_content().strip() for span in spans]
    row_data = ranks + tokens + assets + rest_cols

Note that we are using “contains” in order to match the node with class like cell cell-rank“, and use text_content() to extract all the text from its current nodes and child nodes.

Occasionally you may find that the number of columns we extracted does not tally with the original column header due to header column merged or hidden, such as our above ranking and token ticker column. So let’s also give them column name as “Rank” and “Token”:

column_header = ["Rank", "Token"] + headers

Save the scraping result

With both the header and data ready, we can easily load the data into pandas as per below:

import pandas as pd
df = pd.DataFrame(records, columns=column_header)

You can see the below result in pandas dataframe, which looks pretty good except some formatting to be done to convert all the amount into proper number format.

web scraping to get cryptocurrency price

Or you can also write the scrapped data into a csv file with the csv module:

import csv
with open("token_price.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    for row in records:

Limitations & Constraints

In your real scraping project, you may encounter more complicated scenarios rather than directly getting the data from a GET request. So it’s better to understand how are the constraints/limitations for our above mentioned approach.

  • Go through the authentication process can be time-consuming with requests

If your target website requires authentication before you can retrieve the data, you may need to create a session and send multiple POST/GET requests to the server in order to get yourself authorized. Depends on how complicated the authentication process is, you will need to understand what are the parameters to be supplied and how the requests are chained together. This process may take some time and effort.

  • You cannot trigger JavaScript code to get your data

If the response from your target website returns some JavaScript code to populate the data, or you need to trigger some JavaScript function in order to have the data populated on the web page, you may find requests package simply would not work.

For both scenarios, you may consider to use selenium which I have mentioned in one of my past post. It has a headless mode where you can simulate user’s action such as key in user credentials or click buttons without actually showing the browser, and you can also execute JavaScript code to interact with the web page. The downside is that you will have to periodically upgrade your driver file to match with the browser’s version.


In this article, we have reviewed through a very basic example to scrape data with requests and lxml packages, and we have also discussed a few limitations where you may start looking for alternatives such as selenium or even the scrapy framework in case you have more complicated scenarios to be handled. No matter which libraries you choose to use, the fundamental remains the same. Hope this article gives you some hints on how to start your web scraping journey.


gspread read and write google sheet

Read and write Google Sheet with 5 lines of Python code


Google Sheet is a very powerful tool in terms of collaboration, it allows multiple users to work on the same rows of data simultaneously. It also provides fine-grained APIs in various programming languages for your application to connect and interact with Google Sheet. Sometimes when you just need some simple operations like reading/writing data from a sheet, you may wonder if there is any higher level APIs that can complete these simple tasks easily. The short answer is yes. In this article, we will be discussing how can we read/write Google Sheet in 5 lines of Python code.


As the prerequisite, you will need to have a Google service account in order for you to go through the Google cloud service authentication for your API calls. You can follow the guide from here for a free account setup. Once you have completed all the steps, you shall have a JSON file similar to below which contains your private key for accessing the Google cloud services. You may rename it to “client_secret.json” for our later use.

  "type": "service_account",
  "project_id": "new_project",
  "private_key_id": "xxxxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\xxxxx\n-----END PRIVATE KEY-----\n",
  "client_email": "",
  "client_id": "xxx",
  "auth_uri": "",
  "token_uri": "",
  "auth_provider_x509_cert_url": "",
  "client_x509_cert_url": ""

From this JSON file, you can also find the email address for your newly created service account, if you need to access your existing Google Sheet files, you will need to grant access of your files to this email address.

Note: There is a limit of 100 requests per every 100 seconds for the free Google service account, you may need to upgrade your account to the paid account if this free quota is not sufficient for your business.

In addition to the service account, we need another two libraries google-auth and gspread which are the core modules to be used for authentication and manipulating the Google Sheet file.

Below is the pip command to install the two libraries:

pip install gspread
pip install google-auth

Lastly, let’s create a Google Sheet file namely “spreadsheet1” with some sample data from US 2020 election result:

gspread write and read google sheet

Once you have all above ready, let’s dive into our code examples.

Read Google Sheet data into pandas

Let’s first import the necessary libraries at the top of our script:

import gspread
from google.oauth2.service_account import Credentials
import pandas as pd

To get the access to Google Sheet, we will need to define the scope (API endpoint). For our case, we specify the scope to read and write the Google Sheet file.  If you would like to restrict your program from updating any data, you can specify spreadsheets.readonly and drive.readonly in the scope.

scope = ['',

And then we can build a Credentials object with our JSON file and the above defined scope:

creds = Credentials.from_service_account_file("client_secret.json", scopes=scope)

Next, we call the authorize function from gspread library to pass in our credentials:

client = gspread.authorize(creds)

With this one line of code, it will be going through the authentication under the hood. Once authentication passed, it establishes the connection between your application and the Google cloud service. From there, you can send request to open your spreadsheet file by specifying the file name:

google_sh ="spreadsheet1")

Besides opening file by name, you can also use open_by_key with the sheet ID or open_by_url with the URL of the sheet.

If the proper access has been given to your service account, you would be able to gain the control to your Google Sheet, and you can continue to request to open a particular spreadsheet tab.

For instance, below returns the first sheet of the file:

sheet1 = google_sh.get_worksheet(0)

With the above, you can simply read all records into a dictionary with get_all_records function, and pass into a pandas DataFrame:

df = pd.DataFrame(data=sheet1.get_all_records())

Now if you examine the df object, you shall see the below output:

gspread write and read google sheet

So that’s it! With a few lines of code, you’ve successfully downloaded your data from Google Sheet into pandas, and now you can do whatever you need in pandas.

If you have duplicate column names in your Google Sheet, you may consider to use get_all_values function to get all the values into a list, so that duplicate column remains:

df = pd.DataFrame(data=sheet1.get_all_values())

All the column and row labels will default to RangeIndex as per below:

gspread write and read google sheet

Now let’s take a further look at what else we can achieve with the this library.

Add/Delete work sheets

With gspread, you can easily add new sheets or duplicate from the existing sheets. Below is an example to create a new sheet namely “Sheet2” with max number of rows and columns specified. The index parameter tells Google Sheet where you want to insert your new sheet. index=0 indicates the new sheet to be inserted as the first sheet.

sheet2 = google_sh.add_worksheet(title="Sheet2", rows="10", cols="10", index=0)

Duplicating an existing sheet can be done by specifying the source sheet ID and the new sheet name:

    new_sheet_name="Votes Copy")

Similarly, you can delete an existing sheet by passing in the worksheet object as per below:


If you would like to re-order your worksheets, you can do it with reorder_worksheets function. Assuming you want the sheet2 to be shown before sheet1:

google_sh.reorder_worksheets([sheet2, sheet1])

Read/Write Google Sheet cells

The worksheet object has the row_count and col_count properties which indicate the max rows and columns in the sheet file. But it’s not that useful when you want to know how many rows and columns of actual data you have:

print(sheet1.row_count, sheet1.col_count)
#1000, 26

To have a quick view of the number of rows and columns of your data, you can use:

#Assuming the first row and first column have the full data
print("no. of columns:", len(sheet1.row_values(1)))
#no. of columns: 3
print("no. of rows:", len(sheet1.col_values(1)))
#no. of rows: 8

To access the individual cells, you can either specify the row and column indexes, or use the A1 notation. For instance:

#Access the row 1 and column 2
sheet1.cell(1, 2).value
# or using A1 notation

Note: the row/column index and A1 notation are all one-based numbers which is similar to the MS excel

Similarly, you can update the value for each individual cell as per below:

sheet1.update_cell(1, 2, "BIDEN VOTES")
sheet1.update_acell("B1", "BIDEN VOTES")

To update multiple cells, you shall use the worksheet update function with the list of cells and values to be updated. For instance, below code will replace the values in row 8:

sheet1.update("A8:C8", [["Texas", 5261485, 5261485]])

Or use batch_update to update multiple ranges:

sheet1.batch_update([{"range": "A8:C8", 
                    "values" : [["Texas", 5261485, 5261485]]},
                     {"range": "A9:C9", 
                    "values" : [["Wisconsin", 1630673, 1610065]]},

or use append_rows to insert a row at the last row:

sheet1.append_rows(values=[["Pennsylvania", 3458312, 3376499]])

Besides updating cell values, you can also update the cell format such as the font, background etc. For instance, the below will update the background of the 6th row to red color:

              {"backgroundColor": {
                  "red": 1.0,
                  "green": 0.0,
                  "blue": 0.0,
                  "alpha": 1.0

Note that Google is using RGBA color model, so the color values must be numbers between 0-1.

Below is how it looks like in Google Sheet:

gspread write and read google sheet , format google sheet

Sometimes, it might be difficult to locate the exact index of the cell to be updated. You can find the cell by it’s text with the find function. It will return the first item from the matches.

cell = sheet1.find("Michigan")
print(cell.row, cell.col, cell.value)
#6, 1, 'Michigan'

You can also use Python regular express to find all matches. For instance, to find all cells with text ending as “da”:

import re
query = re.compile(".*da")
cells = sheet1.findall(query)
#[<Cell R4C1 'Florida'>, <Cell R7C1 'Nevada'>]

Add/Remove permission for your Google Sheet

Adding or removing permission for a particular Google Sheet file can be also super easy with gspread. Before adding/removing permission, you shall check who are the users currently have access to your file. You can use list_permission function to retrieve the list of users with their roles:


To give access of your file to other users, you can use:

#perm_type can be : user, group or domain
#role can be : owner, writer or reader
google_sh.share('', perm_type='user', role='reader')

When you check your file again, you shall see the email address you shared is added into the list of authorized users.

To revert back the access for a particular user, you can use remove_permissions function. By default, it removes all the access that has been granted to the user:

google_sh.remove_permissions('', role="writer")

When the role you’ve specifying does not match with the roles the user currently has, the function returns without doing anything.


Google Sheet API provides comprehensive interfaces for manipulating the sheets from the normal operations like reading/writing of the data, to validation, formatting, building pivot tables and charts etc. Often you find that you may just need some simple APIs to read and write Google Sheet files.

In this article, we have reviewed though the gspread package which provides the high level APIs for working with Google Sheets to serve for this purpose. With gspread, you are able to open existing sheet or create new Google Sheet file, read/write the data as well as do simply formatting. In fact, there are a few other libraries such as gspread-formatting and gspread-pandas which offer extensive functionalities for sheet formatting and interacting sheets with pandas dataframe, you may take a look in case you need something more complicated than what we have covered here.

The key for understanding python positional and keyword arguments

In one of the previous article, we have summarized the different ways of passing arguments to Python script. In this article, we will be reviewing through the various approaches for defining and passing arguments into a function in python.

First, let’s start from the basis.

Parameter vs argument

By definition, parameters are the identifiers you specified when you define your function, while arguments are the actual values you supplied to the function when you make the function call. Sometimes you may see people mix up the word parameter and argument, but ultimately they are all referring to the same thing.

Basically Python function supports two types of parameters: positional and keyword arguments. Positional argument is designed to accept argument by following its position during the definition time, while for keyword arguments, you will need to specify the identifier (keyword) followed by the values.

You are allowed to use a combination of positional and keyword arguments when you define your function parameters. Below are the 4 types of variations:

Positional or keyword parameter

By default when you define a Python function, you can either pass in the arguments as positional or keyword. For instance, the below function requires two parameters – file_name and separator:

def read_file(file_name, separator):    
    print(f"file_name={file_name}, separator={separator}")
    file_text  = "Assuming this is the first paragraph from the file."
    return file_text.split(separator)

You can make the function call by supplying the arguments by parameter position or providing the parameter keywords:

read_file("text.txt", " ")
read_file(file_name="text.txt", separator=" ")
read_file("text.txt", separator=" ")
read_file(separator=" ", file_name="text.txt")

All above 4 calls would give you the same results as per below:

file_name=text.txt, separator= 
['Assuming', 'this', 'is', 'the', 'first', 'paragraph', 'from', 'the', 'file.']

Python accepts these arguments regardless of the arguments are provided in positional form or keyword form. When all arguments are by keywords, you can provide them in any order. But take note that positional arguments must be placed before the keyword arguments. For instance, the below throws syntax error,

read_file(file_name="text.txt", " ")

which shows “positional argument follows keyword argument”.

python positional argument and keyword argument

keyword only parameter

For clarity, you may want to implement functions that only accept keyword arguments, and your callers are restricted to only use keyword arguments to invoke the function. To achieve that, you can tweak a bit on your function definition with an additional “*” to indicate all parameters after it must be passed as keywords arguments. E.g.:

def write_file(file_name, *, separator, end, flush):
    print(f"file={file_name}, sep={separator}, end={end}, flush flag={flush}")

For the above function, the separator, end, flush parameters will only accept keyword arguments. You can call it as per below:

write_file("test.txt", separator=" ", end="\r\n", flush=True)

And you shall see output as per below:

file=test.txt, sep= , end=
, flush flag=True

But if you try to pass in all as positional arguments:

write_file("test.txt", " ", "\n", False)

You would see the below error message, which shows the last 3 positional arguments were not accepted.

python positional argument and keyword argument

To further restrict all parameters to be keyword only, you just need to shift the “*” to the beginning of all parameters:

def write_file(*, file_name, separator, end, flush):
    print(f"file={file_name}, sep={separator}, end={end}, flush flag={flush}")

This would make all parameters to only accept keyword arguments. And you can unpack an existing dictionary and pass it arguments into the function:

options = dict(file_name="test.txt", separator=",", end="\n", flush=True)

Positional only arguments

Many Python built-in functions only accept positional arguments, for instance the pow, divmod and abs etc. Prior to Python version 3.8, there is no easy way to define your own function with positional-only parameters. But from version 3.8 onward, you can restrict your function to only accept positional arguments by specifying the “/” (PEP 570) in the function definition. All the parameters come before “/” will be positional-only arguments. For example:

def read(file_name, separator, /):
    print(f"file_name={file_name}, separator={separator}")

For these parameters, if you try to pass in the keyword argument as per below:

read("test.txt", separator=",")

You would see error message indicating the particular parameter is positional-only argument.

    read("test.txt", separator=",")
TypeError: read() got some positional-only arguments passed as keyword arguments: 'separator'

Arbitrary number of arguments

There are cases that you have a number of positional or keyword arguments and you do not want to have a long list of parameters in the function definition. For such case, you can use *args and **kwargs to define arbitrary number of arguments to be accepted by the function. For instance:

def log(*args,**kwargs):
    print(f"args={args}, kwargs={kwargs}")

The above function accepts any number of positional arguments and keyword arguments. All the positional arguments will be packed into a tuple, and keyword arguments are packed into a dictionary.

log("start", "debugging", program_name="python", version=3.7)

When you make a function call as per above, you can see the below output:

args=('start', 'debugging'), kwargs={'program_name': 'python', 'version': 3.7}

This is especially useful when you are just trying to capture a snapshot of all the input arguments (such as logging) or implement a wrapper function for decorator where you do not need to know the exact arguments being passed in. The arbitrary arguments give the callers more flexibility on what they want to pass into your function.

The disadvantage are also obvious, it’s unclear to the new reader what are the parameters to be provided in order to get the correct result from the function call; and you shall not expect all the arguments to be present during the call, so you will have to write some logic to handle the various scenarios when the particular parameters are missing or present.

The best practice for function arguments

When you only have 1 or 2 parameters for the function, you won’t typically see any issue with the code readability/clarity. Problems usually emerge when you have more parameters and some are mandatory and some are optional. Consider the below send_email example:

def send_email(subject, to, cc, 
               bcc, message, attachment, 
               onbehalf, important, read_receipt_requested):
    print(f"{subject}, {to}, {cc}, \
          {bcc} {onbehalf}, {message}, \
          {attachment}, {important}, {read_receipt_requested}")

When you try to include as many parameters as possible to make the function generic for everybody, you’ll have to maintain a very long list of the parameters in the function definition. Calling this function by passing the positional arguments can be very confusing, as you will have to follow exactly the same sequence to provide the arguments as per in the function definition without omitting any single optional argument. E.g.:

send_email("hello", "", "",
           None, None, 
           "hello", None, False, True)

Without referring back to the function definition, it would be very hard to know which argument represents for which parameter. The best way to make handle such scenario is to always use keyword arguments and set the optional arguments with a default value. For example, the below improved version:

def send_email(subject, to, message, 
               cc=None, bcc=None, attachment=None, 
               onbehalf=None, important=False, read_receipt_requested=False):
    print(f"{subject}, {to}, {cc}, \
          {bcc} {onbehalf}, {message}, \
          {attachment}, {important}, {read_receipt_requested}")

#specify all parameters with keyword arguments

With keyword arguments, it improves readability of your code and also allows you to specify default values for the optional arguments. Further more, it gives you the flexibility to extend your parameter list in the future without refactoring your existing code, so that it provides the backwards compatibility at the very beginning.


In this article, we have reviewed through the different approaches for defining and passing arguments to Python function as well as their advantages and disadvantages. Based on your own scenario, you may need to evaluate whether to use positional-only, keyword-only, mix of positional and keyword, or even arbitrary arguments. For code clarity, the general recommendation is to use keyword argument as much as possible, so that all the arguments are understandable at the first glance, and reduces the chances of error.

python suppress stdout and stderr Photo by Yeshi Kangrang on Unsplash

Python recipes- suppress stdout and stderr messages


If you have worked on some projects that requires API calls to the external parties or uses 3rd party libraries, you may sometimes run into the problem that you are able to get the correct return results but it also comes back with a lot of noises in the stdout and stderr. For instance, the developer may leave a lot of “for your info” messages in the standard output or some warning or error messages due to the version differences in some of the dependency libraries.

All these messages would flood your console and you have no control on the source code, hence you cannot change its behavior. To reduce these noises, one option is to suppress stdout and stderr messages during making the function call. In this article, we will discuss about some recipes to suppress the messages for such scenarios.

Unexpected messages from stdout and stderr

To further illustrate the issue, let’s take a look at the below example. Assuming we have below check_result function in a python file, and this represents an external library.

import sys

def check_result():
    print("stdout message from externallib")
    print("stderr message from externallib", file=sys.stderr)
    return True

If you import the module and call the check_result function, you would be definitely getting the result as True, but you would see both the stdout and stderr messages from your console as well.

import externallib

result = externallib.check_result()

Both stdout and stderr messages were printed out in the console:

Python suppress stdout and stderr

suppress stdout and stderr with context manager

To stop these messages from printing out, we need to suppress stdout and stderr in the way that it redirects the output into a devnull file (similiar to /dev/null in Linux which is typically used for disposing of unwanted output streams) right before calling the function, and then redirect the outputs back after the call completed.

To do that, the best approach is to use a context manager, so that it is automatically directed/redirected upon the entry and exit of the context manager.

So let’s implement a context manager to perform the below:

  • Use suppress_stdout and suppress_stderr flags to indicate which stream to be suppressed
  • Save the state of the sys.stdout and sys.stderr in the __enter__ function, and redirect them to devnull based on the suppress_stdout and suppress_stderr flags
  • Restore back the state for sys.stdout and sys.stderr in __exit__

Below is the code snippet:

import os, sys

class suppress_output:
    def __init__(self, suppress_stdout=False, suppress_stderr=False):
        self.suppress_stdout = suppress_stdout
        self.suppress_stderr = suppress_stderr
        self._stdout = None
        self._stderr = None

    def __enter__(self):
        devnull = open(os.devnull, "w")
        if self.suppress_stdout:
            self._stdout = sys.stdout
            sys.stdout = devnull

        if self.suppress_stderr:
            self._stderr = sys.stderr
            sys.stderr = devnull

    def __exit__(self, *args):
        if self.suppress_stdout:
            sys.stdout = self._stdout
        if self.suppress_stderr:
            sys.stderr = self._stderr

And if you call the check_result again within this context manager as per below:

with suppress_output(suppress_stdout=True, suppress_stderr=True):
    result = externallib.check_result()

You would not see any messages printed out from check_result function, and the return result would remain as True. This is exactly what we are expecting!

Since we are using context manager, you may wonder to use contextlib to make our code more concise. So let’s make use of the contextlib package, and re-implement the above context manager using decorator as per below:

from contextlib import contextmanager

def nullify_output(suppress_stdout=True, suppress_stderr=True):
    stdout = sys.stdout
    stderr = sys.stderr
    devnull = open(os.devnull, "w")
        if suppress_stdout:
            sys.stdout = devnull
        if suppress_stderr:
            sys.stderr = devnull
        if suppress_stdout:
            sys.stdout = stdout
        if suppress_stderr:
            sys.stderr = stderr

With the above decorator implementation, you shall be able to get the same result when you call the function:

with nullify_output(suppress_stdout=True, suppress_stderr=True):
    result = externallib.check_result()

Everything seems to be good as of now, are we going to conclude here? Wait, there is something else we can still improve – instead of totally discard the messages, can we collect them into logging file?

Suppress stdout and stderr with redirect_stdout and redirect_stderr

If you scroll down the Python contextlib documentation further, you will notice there are two methods related to stdout and stderr: redirect_stdout and redirect_stderr . They are quite self-explanatory by their names, and also accept a file-like object as the redirect target.

With these two functions, we shall be able to make our code even more concise, meanwhile we can easily collect back the output message into our log file.

from contextlib import redirect_stdout, redirect_stderr
import io, logging
logging.basicConfig(filename='error.log', level=logging.DEBUG)

f = io.StringIO()
with redirect_stdout(f), redirect_stderr(f):
    result = externallib.check_result()

If you check the log file, you shall see the stdout and stderr messages were collected correctly.

suppress stdout and stderr with redirect_stdout or redirect_stderr

Of course, if you wish to continue disposing these messages, you can still specify the target file as devnull, so that nothing will be collected.


With all the above examples and explanations, hopefully you are able to use the code snippets and customize it to meet the objective in your own project. Directly disposing the stderr sometimes may not be a good idea in case there are some useful information for your later troubleshooting, so I would recommend to collect it into a log file as much as possible and do proper housekeeping to ensure the logs are not growing too fast.

If you are looking for solution to suppress certain known python exceptions, you may check out the suppress function from contextlib package.