Get file names by extension from a directory

Whenever you work with directories and files, you will probably need a function that gets the names of files with a particular extension from a directory. For instance, you may want to check and process all the Excel files in a folder, or do some housekeeping by removing all the old log files. In this article, I will explain a few ways to implement such a function.

Let’s get started!

There are plenty of libraries and modules you can use for this, but let's start with the most commonly used ones.

Option 1

Since you will most likely import the os module anyway to handle file operations, you can make use of its functions.

For instance, you can list all the files and sub-directories under the current directory and check whether each name ends with a certain file extension, as below:

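import os

pyfiles = []
# os.listdir() returns the bare names of both files and sub-directories
for file in os.listdir("."):
    # Lower-case the name so that e.g. "Notebook.IPYNB" also matches
    if file.lower().endswith(".ipynb"):
        pyfiles.append(file)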

You can then sort the files by last modified time, from the latest to the earliest:

pyfiles.sort(key=os.path.getmtime, reverse=True)
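The sort above works because os.listdir(".") returns bare names, which os.path.getmtime() resolves relative to the current working directory. For any other folder, the names have to be joined with the folder path first. Below is a minimal sketch of a reusable version; the helper name get_files_by_extension is hypothetical, and it also skips sub-directories whose names happen to end with a matching extension.

```python
import os


def get_files_by_extension(directory, extensions):
    """Hypothetical helper: return full paths of files in `directory` whose
    names end with one of `extensions` (case-insensitive), newest first."""
    # str.endswith() accepts a tuple, so normalise the extensions into one
    suffixes = tuple(ext.lower() for ext in extensions)
    paths = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip sub-directories that happen to have a matching name
        if os.path.isfile(path) and name.lower().endswith(suffixes):
            paths.append(path)
    # Full paths let getmtime() work regardless of the working directory
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths
```

For example, get_files_by_extension("reports", [".xlsx", ".ipynb"]) would list the matching files in a (placeholder) reports folder.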

What if you want to check multiple file extensions? Don't worry, you can still do it with a minor change to the if condition, because str.endswith() also accepts a tuple of suffixes:

if file.lower().endswith((".ipynb", ".xlsx")):
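One thing to watch out for: str.endswith() accepts a str or a tuple of str, but not a list, and passing a list raises a TypeError. If your extensions come from a list (say, read from a config file), convert it with tuple() first. A small sketch with made-up file names:

```python
# Extensions might arrive as a list, e.g. from a config file (made-up values)
wanted = [".ipynb", ".xlsx"]

names = ["Report.XLSX", "notes.txt", "analysis.ipynb", "data.csv"]

# str.endswith() needs a str or a tuple, so convert the list first
suffixes = tuple(wanted)
matches = [name for name in names if name.lower().endswith(suffixes)]
print(matches)  # ['Report.XLSX', 'analysis.ipynb']
```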

Option 2

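The os module also has a scandir() function that achieves the same result. Instead of plain names, it yields os.DirEntry objects, which also expose the file type and file attribute information (for example entry.is_file() and entry.stat()).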

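files = []
# scandir() yields os.DirEntry objects; the with block closes the iterator
with os.scandir(".") as entries:
    for entry in entries:
        # is_file() skips sub-directories, usually without an extra system call
        if entry.is_file() and entry.name.lower().endswith((".ipynb", ".xlsx")):
            files.append(entry.name)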
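Because DirEntry objects cache the result of stat(), scandir() is also handy when you want to sort by modification time without calling os.path.getmtime() separately. A minimal sketch, assuming a hypothetical helper named scan_files_by_extension that returns (name, mtime) pairs:

```python
import os


def scan_files_by_extension(directory, extensions):
    """Hypothetical helper: return (name, mtime) pairs for files whose names
    end with one of `extensions` (case-insensitive), newest first."""
    suffixes = tuple(ext.lower() for ext in extensions)
    results = []
    # The context manager closes the scandir iterator when we are done
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith(suffixes):
                # entry.stat() is cached on the DirEntry object after the first call
                results.append((entry.name, entry.stat().st_mtime))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```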

 

Option 3

If you prefer Unix shell-style wildcards over plain string comparison, you can use the fnmatch module to do this job. For example:

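import fnmatch
import os

files = []
for file in os.listdir("."):
    # fnmatch.fnmatch() applies os.path.normcase(), so the match is
    # case-insensitive on Windows but case-sensitive on Linux/macOS
    if fnmatch.fnmatch(file, "*.ipynb") or fnmatch.fnmatch(file, "*.xlsx"):
        files.append(file)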
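Since fnmatch.fnmatch() only ignores case on platforms where os.path.normcase() does, results can differ between Windows and Linux. One way to get the same behaviour everywhere, and to accept any number of patterns, is to lower-case both sides and use fnmatch.fnmatchcase(). A minimal sketch; the helper name match_patterns is hypothetical:

```python
import fnmatch
import os


def match_patterns(names, patterns):
    """Hypothetical helper: return the names matching any of the shell-style
    patterns, compared case-insensitively on every platform."""
    lowered = [p.lower() for p in patterns]
    # fnmatchcase() never normalises case, so we lower-case both sides ourselves
    return [
        name for name in names
        if any(fnmatch.fnmatchcase(name.lower(), p) for p in lowered)
    ]


files = match_patterns(os.listdir("."), ["*.ipynb", "*.xlsx"])
```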

 

Option 4

Python also has a glob module, which finds path names matching a Unix shell-style pattern. To match files with a certain extension, you can simply do the following:

import glob
files = glob.glob("*.ipynb")

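Then sort the files by creation time, from the latest to the earliest. Note that os.path.getctime() returns the creation time on Windows, but the time of the last metadata change on Unix systems: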

files.sort(key=os.path.getctime, reverse=True)
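To search a folder other than the current one, you can put the folder path in front of the pattern. The returned paths then include the folder prefix, so they work directly with os.path functions. If the folder name itself may contain wildcard characters such as "[", glob.escape() makes them literal. A minimal sketch with a hypothetical helper name, glob_in_directory:

```python
import glob
import os


def glob_in_directory(directory, pattern):
    """Hypothetical helper: glob `pattern` inside `directory`."""
    # Escape the folder path so characters like "[" or "*" in it are taken
    # literally; only `pattern` is treated as a wildcard
    return glob.glob(os.path.join(glob.escape(directory), pattern))
```

Without glob.escape(), a folder named data[1] would be read as the pattern "data" followed by the character class [1], which matches data1 instead.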

If you want to match multiple file extensions, you can do something like this:

files = []
file_types = ("*.ipynb", "*.xlsx")
for file_type in file_types:
    files.extend(glob.glob(file_type))

files.sort(key=os.path.getctime, reverse=True)
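If your patterns can overlap (for example "*.xlsx" and "report*"), the loop above adds the same file twice. A minimal sketch that collects the matches in a set instead, keeps only regular files and sorts by modification time, which, unlike getctime(), means the same thing on Windows and Unix. The helper name glob_many is hypothetical:

```python
import glob
import os


def glob_many(directory, patterns):
    """Hypothetical helper: files in `directory` matching any pattern,
    without duplicates, newest modification time first."""
    base = glob.escape(directory)  # treat the folder path literally
    found = set()
    for pattern in patterns:
        # A set drops paths matched by more than one pattern
        found.update(glob.glob(os.path.join(base, pattern)))
    # Keep regular files only, then sort newest first
    files = [path for path in found if os.path.isfile(path)]
    files.sort(key=os.path.getmtime, reverse=True)
    return files
```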

As I mentioned earlier, there are many more ways to do this and it is not possible to list them all, so I will stop here. Please leave a comment if you have better ideas.

 
