xlrd is a very useful Python library when you are dealing with older Excel files (.xls). In this tutorial, I will show you how to use this library to read data from an .xls file.
Let’s get started with xlrd
You will need to install this library with the pip command below:
```shell
pip install xlrd
```
Then import the library in your source code:
```python
import xlrd
import os
```
To open your Excel file, pass the path of your file into the open_workbook function (a full path as below, or a path relative to the current working directory). It returns a workbook object (an xlrd Book), and in the next step you will be able to access the sheets in the opened workbook.
```python
workbook = xlrd.open_workbook(r"c:\test.xls")
```
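Since os is already imported, it can be used to check the path before handing it to xlrd. Below is a minimal sketch with a hypothetical helper, resolve_xls_path (not part of xlrd), that builds an absolute path and fails early with a clear message if the file is missing or is not an .xls file. It only uses the standard library, so its result can then be passed to xlrd.open_workbook.

```python
import os


def resolve_xls_path(path):
    """Return the absolute path of an existing .xls file.

    Hypothetical helper (not part of xlrd): it only checks the extension and
    that the file exists, so problems surface before xlrd.open_workbook() runs.
    """
    full_path = os.path.abspath(path)
    # This tutorial targets the legacy .xls format, so reject other extensions early
    if os.path.splitext(full_path)[1].lower() != ".xls":
        raise ValueError(f"expected an .xls file, got {full_path!r}")
    if not os.path.isfile(full_path):
        raise FileNotFoundError(full_path)
    return full_path
```

For example: workbook = xlrd.open_workbook(resolve_xls_path("test.xls"))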
There are multiple ways to do this: you can access a sheet by its name, by its index, or loop through all the sheets:
```python
# access a sheet by its name
sheet = workbook.sheet_by_name("Sheet")
# getting the first sheet by its index
sheet_1 = workbook.sheet_by_index(0)
# loop through all the sheets
for sh in workbook.sheets():
    print(sh.name)
```
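When the sheet name comes from user input or a config file, it can help to check it against workbook.sheet_names() first. The sketch below uses a hypothetical helper, pick_sheet, that falls back to the first sheet when no name is given and otherwise reports the available sheet names if the requested one does not exist. It only calls sheet_names(), sheet_by_index() and sheet_by_name() on the workbook returned by xlrd.open_workbook.

```python
def pick_sheet(workbook, name=None):
    """Return the sheet called `name`, or the first sheet when no name is given.

    Hypothetical helper: `workbook` is assumed to be the Book returned by
    xlrd.open_workbook(). Checking sheet_names() first lets the error message
    list the sheets that do exist.
    """
    if name is None:
        return workbook.sheet_by_index(0)
    names = workbook.sheet_names()
    if name not in names:
        raise KeyError(f"no sheet named {name!r}; available sheets: {names}")
    return workbook.sheet_by_name(name)
```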
To get the number of rows and columns in the sheet, you can read the nrows and ncols attributes. By default, all rows are padded out with empty cells so that they all have the same length (ncols). If you want to ignore the empty cells at the end of each row instead, pass ragged_rows=True when you call the open_workbook function.
```python
row_count = sheet.nrows
col_count = sheet.ncols
# use sheet.row_len(rowx) to get the effective column length of a row when you set ragged_rows=True
```
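To see the effect of ragged_rows, you can compare row_len() for each row with ncols. The hypothetical helper below, effective_row_widths, is a minimal sketch that assumes sheet is an xlrd Sheet. With the default padding every entry should equal ncols, while with ragged_rows=True rows that end in empty cells can come out shorter.

```python
def effective_row_widths(sheet):
    """Return the number of stored cells in each row of an xlrd sheet.

    Hypothetical helper. With the default ragged_rows=False every value
    equals sheet.ncols; with ragged_rows=True rows are not padded at the
    end, so some values can be smaller than ncols.
    """
    return [sheet.row_len(rowx) for rowx in range(sheet.nrows)]
```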
With the number of rows and columns, you can access the data in each cell. Each cell object has a value and a ctype (an integer code for the cell's data type):
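```python
for cur_row in range(row_count):
    # row_len() equals col_count by default, and avoids an IndexError
    # on shorter rows when the workbook is opened with ragged_rows=True
    for cur_col in range(sheet.row_len(cur_row)):
        cell = sheet.cell(cur_row, cur_col)
        print(cell.value, cell.ctype)
```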
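The ctype printed above is a small integer. The sketch below assumes the type codes xlrd defines (XL_CELL_EMPTY = 0 through XL_CELL_BLANK = 6) and uses a hypothetical generator, iter_cells, to yield each cell together with a readable type name. It loops over row_len() rather than ncols so that it also works when the workbook was opened with ragged_rows=True.

```python
# Cell type codes as defined by xlrd (xlrd.XL_CELL_EMPTY is 0 ... xlrd.XL_CELL_BLANK is 6)
CELL_TYPE_NAMES = {
    0: "empty",
    1: "text",
    2: "number",
    3: "date",
    4: "boolean",
    5: "error",
    6: "blank",
}


def iter_cells(sheet):
    """Yield (row, col, type name, value) for every stored cell of an xlrd sheet.

    Hypothetical helper; unknown ctype codes are reported as "unknown".
    """
    for rowx in range(sheet.nrows):
        for colx in range(sheet.row_len(rowx)):
            cell = sheet.cell(rowx, colx)
            yield rowx, colx, CELL_TYPE_NAMES.get(cell.ctype, "unknown"), cell.value
```

Note that date cells (type 3) hold a float serial number rather than a date; xlrd.xldate_as_datetime(cell.value, workbook.datemode) converts such a value into a datetime.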
Instead of accessing the data cell by cell, you can also access it by row or by column. For example, assuming your first row holds the column headers, you can get all the headers into a list as below:
```python
header = sheet.row_values(0, start_colx=0, end_colx=None)
# row_slice returns the cell objects (both data type and value) in case you also need to check the data type
# row_1 = sheet.row_slice(1, start_colx=0, end_colx=None)
```
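Once you have the header row, a common next step is to turn each data row into a dictionary keyed by column name. The function rows_as_dicts below is a hypothetical sketch that assumes the headers sit in row 0 (configurable through header_row) and only uses sheet.nrows and sheet.row_values().

```python
def rows_as_dicts(sheet, header_row=0):
    """Map every row below the header row to a dict keyed by the header values.

    Hypothetical helper for an xlrd sheet; it reads whole rows with
    row_values() instead of going cell by cell.
    """
    header = sheet.row_values(header_row)
    records = []
    for rowx in range(header_row + 1, sheet.nrows):
        # zip() stops at the shorter list, so short rows simply yield fewer keys
        records.append(dict(zip(header, sheet.row_values(rowx))))
    return records
```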
Similarly, you can get all the values of a column into a list:
```python
col_a = sheet.col_values(0, start_rowx=0, end_rowx=None)
# col_slice returns the cell objects of the specified range
col_a_cells = sheet.col_slice(0, start_rowx=0, end_rowx=None)
```
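Because col_slice() returns cell objects, you can filter on the data type before using the values. The sketch below, built around a hypothetical helper numeric_column, keeps only number cells in a column (skipping the header row by default), so a column that also contains text or empty cells can still be summed safely. The constant 2 matches xlrd.XL_CELL_NUMBER; it is hard-coded here only to keep the snippet self-contained.

```python
XL_CELL_NUMBER = 2  # same code as xlrd.XL_CELL_NUMBER


def numeric_column(sheet, colx, start_rowx=1):
    """Return the numeric values in one column of an xlrd sheet.

    Hypothetical helper: start_rowx=1 skips a header row by default, and
    text, empty or error cells are dropped by checking each cell's ctype.
    """
    return [
        cell.value
        for cell in sheet.col_slice(colx, start_rowx=start_rowx)
        if cell.ctype == XL_CELL_NUMBER
    ]
```

For example, sum(numeric_column(sheet, 0)) adds up the numbers in the first column.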
A fairly common error when handling .xls files is CompDocError; please check this article for how to fix it.
xlrd is a clean and easy-to-use library for handling .xls files, but unfortunately it is no longer actively maintained, as Excel has already moved on to the .xlsx format (since version 2.0, xlrd only reads .xls files). Other libraries such as openpyxl handle .xlsx files very well, for both data and cell formatting. I would suggest using the .xlsx format in your new projects whenever possible, so that you can rely on actively maintained libraries.
If you would like to learn more about openpyxl, please read my next article about this library.
As always, comments and questions are welcome. Thanks.