How do you sort files numerically?

Question:

I’m processing some files in a directory and need the files to be sorted numerically. I found some examples of sorting, specifically using the lambda pattern, at wiki.python.org, and put this together:

#!/usr/bin/env python
# Python 2 code (uses the cmp argument and the print statement)
import re

tiffFiles = """ayurveda_1.tif
ayurveda_11.tif
ayurveda_13.tif
ayurveda_2.tif
ayurveda_20.tif
ayurveda_22.tif""".split('\n')

# Capture the one- or two-digit number between "_" and "."
numPattern = re.compile(r'_(\d{1,2})\.', re.IGNORECASE)

tiffFiles.sort(cmp, key=lambda tFile:
                   int(numPattern.search(tFile).group(1)))

print tiffFiles

I’m still fairly new to Python and would like to ask the community whether this can be improved: shorter code (e.g. removing the lambda), performance, or style/readability?

Thank you,
Zachary

Answer #1:

This is called “natural sorting” or “human sorting” (as opposed to lexicographical sorting, which is the default). Ned B wrote up a quick version of one.

import re

def tryint(s):
    # Return s as an int if it is all digits, otherwise unchanged
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]

def sort_nicely(l):
    """ Sort the given list in the way that humans expect.
    """
    l.sort(key=alphanum_key)

It’s similar to what you’re doing, but a bit more general.
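To make the difference concrete, here is a minimal sketch comparing the default lexicographic order with a natural-sort key on the file names from the question. The name natural_key is hypothetical; instead of calling tryint it converts chunks by position, relying on the fact that re.split with a capturing group returns text at even indices and digit runs at odd indices.

```python
import re

def natural_key(s):
    # re.split with a capturing group alternates: text, digits, text, ...
    # so odd indices are always digit runs and can be converted to int.
    return [int(c) if i % 2 else c for i, c in enumerate(re.split(r'(\d+)', s))]

files = ["ayurveda_1.tif", "ayurveda_11.tif", "ayurveda_13.tif",
         "ayurveda_2.tif", "ayurveda_20.tif", "ayurveda_22.tif"]

print(sorted(files))                   # default: character-by-character order
print(sorted(files, key=natural_key))  # natural order: 1, 2, 11, 13, 20, 22
```

Because every key alternates str and int in the same positions, two keys never end up comparing an int with a str.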

Answered By: Zach Young

Answer #2:

Just use:

tiffFiles.sort(key=lambda var:[int(x) if x.isdigit() else x for x in re.findall(r'[^0-9]|[0-9]+', var)])

It avoids try/except, which can make it faster.
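One caveat worth checking under Python 3: this key splits text into single characters, so a digit run in one name can line up with a letter in another, and Python 3 refuses to order int against str. The sketch below uses hypothetical names (findall_key, split_key, and the sample file names) to show the failure and the re.split-based alternative, which keeps text and numbers in alternating positions.

```python
import re

def findall_key(var):
    # The key from this answer: single non-digit characters and digit runs.
    return [int(x) if x.isdigit() else x for x in re.findall(r'[^0-9]|[0-9]+', var)]

def split_key(var):
    # Alternating chunks: even indices are text, odd indices are digit runs.
    return [int(c) if i % 2 else c for i, c in enumerate(re.split(r'([0-9]+)', var))]

names = ["img10.tif", "img2.tif", "imgA.tif"]

try:
    sorted(names, key=findall_key)
except TypeError as exc:
    # Position 3 holds 10 or 2 for some names but 'A' for "imgA.tif".
    print("findall_key failed:", exc)

print(sorted(names, key=split_key))
```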

Answered By: Daniel DiPaolo

Answer #3:

If you are using key= in your sort call, you shouldn’t also pass cmp, which was removed in Python 3. key should be set to a function that takes one element as input and returns an object that compares in the order you want the list sorted. It doesn’t need to be a lambda; a standalone function may be clearer. Also, regular expressions can be relatively slow to evaluate.
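If you do have an old-style comparison function, functools.cmp_to_key adapts it for key= in Python 3. A minimal sketch, with hypothetical helper names number_key and compare_by_number, assuming names shaped like "ayurveda_11.tif":

```python
from functools import cmp_to_key

def number_key(name):
    # Hypothetical helper: text after the last '_' and before the first '.', as int.
    return int(name.rsplit('_', 1)[1].split('.')[0])

def compare_by_number(a, b):
    # Old-style comparison: negative, zero or positive, like Python 2's cmp().
    # It runs once per comparison, so number_key is recomputed repeatedly.
    na, nb = number_key(a), number_key(b)
    return (na > nb) - (na < nb)

files = ["ayurveda_11.tif", "ayurveda_2.tif", "ayurveda_1.tif"]

# Adapting the comparison function for Python 3's key-only sort
by_cmp = sorted(files, key=cmp_to_key(compare_by_number))

# Plain key function: simpler, and each element's key is computed once
by_key = sorted(files, key=number_key)

print(by_key)
```

Both give the same order; the key version is usually the one to prefer.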

You could try something like the following to isolate and return the integer part of the file name:

def getint(name):
    # partition returns (head, sep, tail); keep only the part before the '.'
    basename = name.partition('.')[0]
    alpha, num = basename.split('_')
    return int(num)

tiffFiles.sort(key=getint)

Answered By: dkmatt0

Answer #4:

@April provided a good solution in "How is Pythons glob.glob ordered?" that you could try:

# First, get the files
# (img_folder and output_image_format are defined elsewhere)
import glob
import re

files = glob.glob1(img_folder, '*' + output_image_format)

# Sort files according to the first run of digits in each file name
files = sorted(files, key=lambda x: int(re.findall(r"\d+", x)[0]))
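glob.glob1 is an undocumented helper, so a variation using pathlib may be more future-proof. This sketch assumes every matching name contains at least one digit run; the function name sort_by_first_number and its pattern parameter are hypothetical. It matches against p.name rather than the full path, so digits in the folder name are ignored.

```python
import re
import tempfile
from pathlib import Path

def sort_by_first_number(folder, pattern="*.tif"):
    # Sort matching files by the first run of digits in the file name.
    # re.search returns None for names without digits, so .group() would
    # raise AttributeError; this sketch assumes that never happens.
    return sorted(Path(folder).glob(pattern),
                  key=lambda p: int(re.search(r"\d+", p.name).group()))

# Demo in a throwaway directory using the question's naming scheme
with tempfile.TemporaryDirectory() as folder:
    for n in (11, 2, 1, 20):
        (Path(folder) / f"ayurveda_{n}.tif").touch()
    names = [p.name for p in sort_by_first_number(folder)]

print(names)
```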
Answered By: Don O’Donnell

Answer #5:

str.partition returns a 3-tuple, so unpack it:

def getint(name):
    (basename, part, ext) = name.partition('.')
    (alpha, num) = basename.split('_')
    return int(num)
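The split('_') unpacking above breaks if a name contains more than one underscore. A possible variant, assuming the number always follows the last underscore: os.path.splitext strips only the final extension, and str.rpartition splits at the last '_'. The sample names are hypothetical.

```python
import os

def getint(name):
    # "scan_v2_10.tif" -> ("scan_v2_10", ".tif")
    stem, _ext = os.path.splitext(name)
    # rpartition splits at the LAST '_', so earlier underscores stay in the prefix
    _prefix, _sep, num = stem.rpartition('_')
    return int(num)

print("ayurveda_11.tif".partition('.'))  # a 3-tuple: ('ayurveda_11', '.', 'tif')
print(sorted(["scan_v2_10.tif", "scan_v2_9.tif"], key=getint))
```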
Answered By: yoonghm

Answer #6:

This is a modified version of @Don O’Donnell’s answer, because I couldn’t get it working as-is, but I think it’s the best answer here as it’s well explained.

def getint(name):
    _, num = name.split('_')
    num, _ = num.split('.')
    return int(num)

print(sorted(tiffFiles, key=getint))

Changes:

1) The alpha string doesn’t get stored, as it’s not needed (hence _, num)

2) Use num.split('.') to separate the number from .tif

3) Use sorted instead of list.sort, per https://docs.python.org/2/howto/sorting.html
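The practical difference behind change 3 is that sorted() returns a new list and leaves its input alone, while list.sort() reorders the list in place and returns None. A short sketch reusing the getint above on a hypothetical three-file list:

```python
def getint(name):
    # Same parsing as this answer (assumes exactly one '_' and one '.')
    _, num = name.split('_')
    num, _ = num.split('.')
    return int(num)

tiffFiles = ["ayurveda_11.tif", "ayurveda_2.tif", "ayurveda_1.tif"]

ordered = sorted(tiffFiles, key=getint)  # new list; tiffFiles not yet changed
returned = tiffFiles.sort(key=getint)    # reorders tiffFiles in place

print(ordered)
print(returned)  # None, so avoid writing tiffFiles = tiffFiles.sort(...)
```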

Answered By: Prabhath Kota
