Delete multiple files matching a pattern

Question :

I have made an online gallery using Python and Django. I’ve just started to add editing functionality, starting with a rotation. I use sorl.thumbnail to auto-generate thumbnails on demand.

When I edit the original file, I need to clean up all the thumbnails so new ones are generated. There are three or four of them per image (I have different ones for different occasions).

I could hard-code the file variants, but that’s messy, and if I change the way I do things I’ll need to revisit the code.

Ideally I’d like to do a regex-delete. In regex terms, all my originals are named like so:

^(?P<photo_id>\d+)\.jpg$

So I want to delete:

^(?P<photo_id>\d+)[^\d].*jpg$

(Where I replace photo_id with the ID I want to clean.)

Asked By: Oli

Answer #1:

Try something like this:

import os, re

def purge(dir, pattern):
    for f in os.listdir(dir):
        if re.search(pattern, f):
            os.remove(os.path.join(dir, f))

Then you would pass the directory containing the files and the pattern you wish to match.
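As a rough usage sketch, the pattern for one photo could be built from its ID before calling the function. The helper name thumbnail_pattern is hypothetical, and the negative lookahead is an added assumption: without it, a pattern of the form `<id>[^\d].*jpg` would also match the original `<id>.jpg`, because the non-digit class can match the dot. This copy of purge also returns the removed names, purely to make the result easy to inspect.

```python
import os
import re


def purge(directory, pattern):
    """Delete every file in `directory` whose name matches `pattern`.

    Returns the sorted list of removed file names (an addition for inspection).
    """
    removed = []
    for name in os.listdir(directory):
        if re.search(pattern, name):
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return sorted(removed)


def thumbnail_pattern(photo_id):
    """Hypothetical helper: match '<id><non-digit>...jpg' but not '<id>.jpg'."""
    # re.escape keeps the ID literal; (?!\.jpg$) skips the original file.
    return r"^%s(?!\.jpg$)\D.*jpg$" % re.escape(str(photo_id))
```

With this, purge(thumbnail_dir, thumbnail_pattern(123)) would remove names such as 123_small.jpg while leaving 123.jpg and 1234_small.jpg alone; thumbnail_dir stands for whatever directory holds the generated files.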

Answered By: Andrew Hare

Answer #2:

A variation on the glob approach, that will work with Python 3:

import glob, os
for f in glob.glob("P*.jpg"):
    os.remove(f)

Edit: In Python 3.4+ you may want to use pathlib:

from pathlib import Path
for p in Path(".").glob("P*.jpg"):
    p.unlink()

Answered By: Sam Bull

Answer #3:

If you need recursion into several subdirectories, you can use this method:

import os, re, os.path
pattern = r"^(?P<photo_id>\d+)[^\d].*jpg$"
mypath = "Photos"
for root, dirs, files in os.walk(mypath):
    for file in filter(lambda x: re.match(pattern, x), files):
        os.remove(os.path.join(root, file))

You can safely remove subdirectories on the fly from dirs, which contains the list of the subdirectories to visit at each node.
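A minimal sketch of that pruning, assuming a hypothetical directory name ("originals") that should never be visited. The function name purge_tree and the skip_dirs parameter are illustrative and not part of the answer's code.

```python
import os
import re


def purge_tree(top, pattern, skip_dirs=("originals",)):
    """Walk `top`, deleting matching files, without descending into `skip_dirs`."""
    regex = re.compile(pattern)
    removed = []
    for root, dirs, files in os.walk(top):  # top-down is the default
        # Slice assignment edits the same list object that os.walk consults,
        # so the pruned directories are never visited.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            if regex.match(name):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return removed
```

Rebinding the name instead (dirs = [...]) would have no effect on the walk; the list has to be modified in place, and this only works when walking top-down.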

Note that if you are in a directory, you can also get files corresponding to a simple pattern expression with glob.glob(pattern). In this case you would have to subtract the set of files to keep from the whole set, so the code above is more efficient.

Answered By: RedGlyph

Answer #4:

How about this?

import glob, os, multiprocessing
p = multiprocessing.Pool(4)
p.map(os.remove, glob.glob("P*.jpg"))

Mind you this does not do recursion and uses wildcards (not regex).

UPDATE
In Python 3 the built-in map() function returns an iterator, not a list. This is useful since you will probably want to do some kind of processing on the items anyway, and an iterator is usually more memory-efficient for that.

If, however, a list is what you really need, wrap the call in list(). (Pool.map(), as used above, already returns a list, so this only matters for the built-in map().)

...
list(p.map(os.remove, glob.glob("P*.jpg")))

I agree it’s not the most functional way, but it’s concise and does the job.

Answered By: Valeriu Paloş

Answer #5:

It’s not clear to me that you actually want to do any named-group matching — in the use you describe, the photoid is an input to the deletion function, and named groups’ purpose is “output”, i.e., extracting certain substrings from the matched string (and accessing them by name in the match object). So, I would recommend a simpler approach:

import re
import os

def delete_thumbnails(photoid, photodirroot):
  # Note: \d+ requires at least one more digit after photoid;
  # drop it if you pass complete IDs.
  matcher = re.compile(r'^%s\d+\D.*jpg$' % photoid)
  numdeleted = 0
  for rootdir, subdirs, filenames in os.walk(photodirroot):
    for name in filenames:
      if not matcher.match(name):
        continue
      path = os.path.join(rootdir, name)
      os.remove(path)
      numdeleted += 1
  return "Deleted %d thumbnails for %r" % (numdeleted, photoid)

You can pass the photoid as a normal string, or as a RE pattern piece if you need to remove several matchable IDs at once (e.g., r'abc[def]' to remove abcd, abce, and abcf in a single call). That’s the reason I’m inserting it literally in the RE pattern, rather than inserting the string re.escape(photoid) as would be normal practice. Certain parts, such as counting the number of deletions and returning an informative message at the end, are obviously frills which you should remove if they give you no added value in your use case.

Others, such as the “if not … // continue” pattern, are highly recommended practice in Python (flat is better than nested: bailing out to the next leg of the loop as soon as you determine there is nothing to do on this one is better than nesting the actions to be done within an if), although of course other arrangements of the code would work too.
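For comparison, here is a sketch of the "normal practice" variant mentioned above, where the ID is escaped and treated as literal text. The function name delete_thumbnails_literal is hypothetical. The pattern is also an assumption: it expects the complete ID followed by a non-digit, and uses a lookahead so the original `<id>.jpg` is left alone.

```python
import os
import re


def delete_thumbnails_literal(photoid, photodirroot):
    """Delete thumbnails for one literal photo ID; return how many were removed."""
    # re.escape neutralises regex metacharacters such as '.' or '[' in the ID.
    matcher = re.compile(r"^%s(?!\.jpg$)\D.*jpg$" % re.escape(str(photoid)))
    numdeleted = 0
    for rootdir, subdirs, filenames in os.walk(photodirroot):
        for name in filenames:
            if not matcher.match(name):
                continue  # nothing to do for this file
            os.remove(os.path.join(rootdir, name))
            numdeleted += 1
    return numdeleted
```

The trade-off is that a single call can no longer target several IDs through a pattern fragment; in exchange, an ID such as "a.b" cannot accidentally match "aXb".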

Answered By: Alex Martelli

Answer #6:

My recommendation:

import os, re

def purge(dir, pattern, inclusive=True):
    regexObj = re.compile(pattern)
    for root, dirs, files in os.walk(dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            if bool(regexObj.search(path)) == bool(inclusive):
                os.remove(path)
        for name in dirs:
            path = os.path.join(root, name)
            if len(os.listdir(path)) == 0:
                os.rmdir(path)

This will recursively remove every file that matches the pattern by default, and every file that doesn’t match if inclusive is False. Note that the pattern is searched against the full path, not just the file name. It will then remove any empty folders from the directory tree.

Answered By: DRayX

Answer #7:

import os

def main():
    mypath = "<Path to Root Folder to work within>"
    for root, dirs, files in os.walk(mypath):
        for file in files:
            p = os.path.join(root, file)
            # Or any pattern you want
            if os.path.isfile(p) and p.endswith(".jpg"):
                os.remove(p)

if __name__ == "__main__":
    main()

Answered By: Charlie

Answer #8:

I find Popen(["rm " + file_name + "*.ext"], shell=True, stdout=PIPE).communicate() to be a much simpler solution to this problem. Although this is prone to injection attacks, I don’t see any issues if your program is using this internally.
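If the injection risk is a concern, roughly the same effect can be had without a shell. The sketch below is an alternative technique, not this answer's method: it uses glob and os.remove, and the function name remove_by_prefix and its ext parameter are hypothetical.

```python
import glob
import os


def remove_by_prefix(file_name, ext=".ext"):
    """Delete files named '<file_name>*<ext>' without invoking a shell."""
    removed = []
    # glob.escape keeps characters such as '*', '?' and '[' in the inputs
    # literal, so only the '*' added here acts as a wildcard.
    pattern = glob.escape(file_name) + "*" + glob.escape(ext)
    for path in glob.glob(pattern):
        os.remove(path)
        removed.append(path)
    return sorted(removed)
```

Because no command line is ever built, a file_name containing spaces or shell metacharacters is treated as part of a file name rather than as something to execute.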

Answered By: Kartos
