Christmas is just around the corner, and the snowfall, festive lights and joyful songs from last year are probably still floating in your mind. But this year things are unusual because of Covid-19. Many celebration events have been cancelled or suspended, and people are advised to avoid gatherings and stay at home as much as possible. Although staying at home has become the new norm, there is still a way to find out what people are thinking about during this festive season, since most of us search on Google every day. With a few lines of Python code, we can extract and visualize that data from Google Trends.
Let’s dive into the code examples.
Python to get Google trends data
To get the search trends from Google, we will use a Python package, pytrends. It is not an official API for Google Trends, but it provides a convenient way to automatically download the same data that you could download manually from the Google Trends website.
You can use the pip command to install the package:
pip install --upgrade pytrends
And import the necessary modules at the beginning of our code:
from pytrends.request import TrendReq
To use it, we create the request object by providing the search language and the time zone offset. For instance, below I specify English as the language and -480 as the time zone offset, which corresponds to UTC+8. The default value for this offset is 360 (US CST, which is UTC-6), so you can see that the offset is the difference from UTC in minutes, with the sign reversed.
pytrend = TrendReq(hl='en-US', tz=-480)
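The relationship between a UTC offset and the tz value can be sketched as below. This is a minimal illustration, assuming tz is the offset from UTC in minutes with the sign reversed, which is consistent with the two values mentioned above. The helper name pytrends_tz is hypothetical and not part of pytrends.

```python
def pytrends_tz(utc_offset_hours):
    """Convert a UTC offset in hours (e.g. 8 for UTC+8) into a tz value in minutes.

    Assumption: tz is the UTC offset in minutes with the sign flipped,
    matching the values 360 (UTC-6) and -480 (UTC+8) used in this article.
    """
    return int(round(-utc_offset_hours * 60))


print(pytrends_tz(8))   # UTC+8 -> -480
print(pytrends_tz(-6))  # US CST (UTC-6) -> 360
```

The result can then be passed as the tz argument, for example TrendReq(hl='en-US', tz=pytrends_tz(8)).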
To get the search trends for a particular keyword, we specify it in a keyword list. For example, we use “christmas” to see what people have searched for on Google in relation to this keyword. There are a few more parameters to specify in the build_payload function to narrow down the results:
cat – The category you are interested in; you can see the full list here.
timeframe – The date range in which the searches happened. You can specify it as the past X hours/days/months/years (the available options are listed on the Google Trends web page) or as a specific start date and end date. In our case, we use “now 7-d” for the past 7 days.
geo – The geolocation, given as a two-letter country code; leave it empty to see global results.
gprop – The Google property to search; leave it empty for web search, or use images, news, youtube or froogle.
Let’s build up our query as per below:
kw_list = ["christmas"]
pytrend.build_payload(kw_list, cat=0, timeframe='now 7-d', geo='SG', gprop='')
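For the specific start date and end date form of timeframe, pytrends expects both dates in a single string, separated by a space. Here is a small sketch for building that string; date_range_timeframe is a hypothetical helper name and the dates are only examples.

```python
from datetime import date


def date_range_timeframe(start, end):
    """Build a 'YYYY-MM-DD YYYY-MM-DD' timeframe string from two dates."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return f"{start:%Y-%m-%d} {end:%Y-%m-%d}"


tf = date_range_timeframe(date(2020, 12, 1), date(2020, 12, 25))
print(tf)  # 2020-12-01 2020-12-25

# The string can then replace 'now 7-d' in the query:
# pytrend.build_payload(kw_list, cat=0, timeframe=tf, geo='SG', gprop='')
```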
With all these criteria, we can check which related topics people in Singapore searched for on Google. The related_queries function returns a dictionary of both top and rising queries related to the keywords:
trends = pytrend.related_queries()
If you examine the trends variable, you will see a dictionary keyed by keyword. For each keyword it holds the “top” and “rising” results as pandas DataFrame objects, and you can access the top queries as below:
df_sg = trends["christmas"]["top"]
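When a keyword has very little search volume, one of the entries may be missing or empty rather than a DataFrame, so a slightly more defensive lookup can help. The sketch below assumes only the dictionary layout described above; related_table is a hypothetical helper name.

```python
def related_table(trends, keyword, kind="top"):
    """Return the 'top' or 'rising' table for a keyword, or None if it is absent."""
    per_keyword = trends.get(keyword) or {}
    return per_keyword.get(kind)


# Usage with the result of pytrend.related_queries():
# df_sg = related_table(trends, "christmas")
# if df_sg is not None:
#     print(df_sg.head())
```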
Examining the first few records in df_sg, you can see that people in Singapore are still in a celebratory mood, as most of the records are related to greetings, light shows, gifts and so on.
On the other hand, let’s also take a look at the search trends for the UK, since it has recently announced some new travel restrictions.
pytrend.build_payload(kw_list, cat=0, timeframe='now 7-d', geo='GB', gprop='')
trends = pytrend.related_queries()
df_gb = trends["christmas"]["top"]
Examining the df_gb variable, you can see that some people have started worrying about the new rules and restrictions for this Christmas, although the majority of the search results are still about celebrating the festival.
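To make the difference between the two regions easier to spot, you could list the queries that appear in one result but not in the other. This is only a sketch: unique_queries is a hypothetical helper, and the sample lists in the code are made up for illustration, not real Google Trends output.

```python
def unique_queries(queries_a, queries_b):
    """Return the queries in queries_a that do not appear in queries_b (case-insensitive)."""
    seen = {q.lower() for q in queries_b}
    return [q for q in queries_a if q.lower() not in seen]


# Made-up sample data, for illustration only
gb_sample = ["christmas rules", "christmas tree", "christmas gifts"]
sg_sample = ["christmas tree", "christmas gifts", "christmas lights"]
print(unique_queries(gb_sample, sg_sample))  # ['christmas rules']

# With the real results from above:
# unique_queries(df_gb["query"].to_list(), df_sg["query"].to_list())
```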
Visualize the results in word cloud
Since we have all the keywords people searched for and their popularity, the most straightforward way to visualize them is to generate a word cloud picture. To do so, we need another Python package, wordcloud, which is a Python library for generating word cloud images. You also need some supporting packages, such as PIL and numpy, for manipulating the images.
You can use the pip command to install these packages if you do not have them yet:
pip install --upgrade wordcloud
pip install --upgrade Pillow
pip install --upgrade numpy
Let’s import all the necessary modules into our code:
from wordcloud import WordCloud, ImageColorGenerator, STOPWORDS
from PIL import Image
import os
import numpy as np
In the previous section, we got the search keywords in a DataFrame. wordcloud supports both text strings and word frequencies. For simplicity, let’s convert only the keywords into a space-separated string and ignore the value (popularity).
text = ' '.join(df_sg["query"].to_list())
As all the keywords contain “christmas”, we should filter out this word before generating the word cloud. The wordcloud package has a predefined list of words to exclude, and you can add more words to it as below:
stopwords = set(STOPWORDS)
stopwords.add("christmas")
Now let’s use this featured image as the background for generating the word cloud. We load it as a 3-dimensional array to use as the background mask later:
bg_mask = np.array(Image.open(os.path.join(os.getcwd(), "christmas tree.jpg")))
With all of this ready, we can create a word cloud object with the parameters below. The parameter names are quite self-explanatory, so I will not go through them one by one. You can check the official documentation here.
wc = WordCloud(
    width=600,
    height=1000,
    background_color='white',
    colormap='rainbow',
    mask=bg_mask,
    stopwords=stopwords,
    max_words=1000,
    max_font_size=150,
    min_font_size=15,
    contour_width=2,
    contour_color='dodgerblue'
)
Then we can pass our text to the generate_from_text function, which processes the text and generates the image, and save the output to an image file with the to_file function.
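A minimal sketch of this step, assuming the wc object and the text string built earlier. The helper name save_wordcloud and the output file name are hypothetical choices for illustration.

```python
def save_wordcloud(wc, text, out_file="christmas_wordcloud.png"):
    """Generate the word cloud from a text string and write it to an image file."""
    wc.generate_from_text(text)  # process the text and lay out the words
    wc.to_file(out_file)         # write the rendered image to disk
    return out_file


# Usage with the objects built earlier in this article:
# save_wordcloud(wc, text)
```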
When you open the output image file, you should see a word cloud shaped by the background image. Isn’t that cool?
Similarly, when you pass in the UK search results and generate the word cloud, you will see that “covid” and “rules” are what people in the UK are most concerned about.
Note: since we are passing in a text string, the word sizes are based on how many times each word is repeated in the text rather than on its popularity on Google.
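If you would rather size the words by the popularity value from Google, wordcloud can also take a dictionary of frequencies through generate_from_frequencies. The sketch below assumes the DataFrame has the “query” and “value” columns, and it strips “christmas” from each query by hand because, as far as I know, the stopwords setting is not applied to frequency input. The helper name query_frequencies is hypothetical.

```python
def query_frequencies(rows, drop_words=("christmas",)):
    """Map each query (minus the dropped words) to its popularity value.

    rows is an iterable of (query, value) pairs. If two queries collapse
    to the same phrase, the higher value is kept.
    """
    freqs = {}
    for query, value in rows:
        words = [w for w in query.split() if w.lower() not in drop_words]
        phrase = " ".join(words)
        if phrase:  # skip queries that consisted only of dropped words
            freqs[phrase] = max(value, freqs.get(phrase, 0))
    return freqs


# Usage with the objects built earlier in this article:
# freqs = query_frequencies(zip(df_sg["query"], df_sg["value"]))
# wc.generate_from_frequencies(freqs)
```

Note that with this approach each remaining phrase is drawn as one unit, so the picture will look different from the text-based version.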
In this article, we have discussed how to use pytrends to automatically get the Google search data for a particular keyword and then use wordcloud to visualize the information. It only covers some basic usage of these two packages; you can check their documentation to see what else they provide. One thing to note is that pytrends uses scraping techniques to get the data from Google Trends, so it may break whenever Google changes the structure of its requests or responses. The project team therefore has to update the code frequently. By the way, they are looking for maintainers, in case you are interested.