Q: how to batch-verify image files

Discussion in 'Design and Graphics' started by ScottR, Sep 21, 2014.

  1. ScottR macrumors member

    Joined:
    May 11, 2007
    #1
    I had to recover a trashed drive with hundreds of thousands of images. An unfortunately large number of the recovered images are unusable: when trying to open them Preview reports "The file 'xxxxx.JPEG' could not be opened. It may be damaged or use a file format that Preview doesn’t recognize." GraphicConverter gives a similar error.

    Is there any utility that'll scan through all those images and help isolate those that are unusable?
     
  2. superscape macrumors 6502a

    Joined:
    Feb 12, 2008
    Location:
    East Riding of Yorkshire, UK
    #2
    Hi Scott,

    You could probably use a relatively simple AppleScript to do the job. Here's one that checks a single image: it tries to open the file in Preview and, if that fails, moves it into a folder of your choice.

    Code:
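    -- Ask for the image to check and for the folder that collects failures
    set theFile to choose file with prompt "Please choose a file to verify"
    set theErroredFolder to choose folder with prompt "Where would you like to put the failed images?"
    
    -- Ask Preview to open the file and keep whatever it returns.
    -- Note: "close every document" also closes any other documents open in Preview.
    tell application "Preview"
    	set theReturnValue to open theFile
    	close every document
    end tell
    
    -- The script treats a missing return value as a failed open and moves the file aside
    tell application "Finder"
    	if theReturnValue is missing value then
    		move theFile to theErroredFolder
    	end if
    end tell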

    Obviously, you'd want to scan more than one image. The best way to do that depends on where the images are on your drive. For example, are they all in a single folder? Are they all over the place on the drive? Or would you want to drop a file or files as required?

    I guess you could probably adapt some of the code above to work as part of an Automator action too.
     
  3. ScottR thread starter macrumors member

    Joined:
    May 11, 2007
    #3
    Thanks for the reply. I think the problem with this approach is that I have literally hundreds of thousands of images to verify. I don't think scripting opening/moving them one at a time is a viable solution.
     
  4. superscape macrumors 6502a

    Joined:
    Feb 12, 2008
    Location:
    East Riding of Yorkshire, UK
    #4
    Do you have an example of one of the corrupted jpegs that you could share with me?
     
  5. chown33 macrumors 604

    Joined:
    Aug 9, 2009
    #5
    There's a command-line tool that might be useful. Its name is 'sips':
    https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/sips.1.html

    As noted in its man page, its functionality is also accessible through AppleScript's "Image Events". It is also available through Automator's image-processing actions.

    As a strategy, you could probably use any of the 'sips' query functions to test the validity of the image. If the query fails, then the image file is damaged. If the query succeeds, it might be wise to try a more time-consuming and comprehensive test, such as format conversion, rotation, flip, crop, resample, etc.
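
    A minimal sketch of the query idea, written in Python and driving 'sips' through a subprocess. It assumes macOS (where 'sips' is installed) and uses the 'pixelWidth' property as the query. The function names are made up for illustration. As far as I know, 'sips' can print a placeholder such as '<nil>' instead of a number when it cannot read a property, so the sketch only accepts a numeric value and does not rely on the exit status.

```python
import subprocess
from typing import Optional


def parse_pixel_width(sips_output: str) -> Optional[int]:
    """Return the pixelWidth value from `sips -g pixelWidth` output, or None.

    Expects lines shaped like "  pixelWidth: 1024"; anything non-numeric
    (for example "<nil>") is treated as "no usable width".
    """
    for line in sips_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "pixelWidth" and value.strip().isdigit():
            return int(value.strip())
    return None


def sips_can_read(path: str) -> bool:
    """Query one file with sips (macOS only) and report whether a width came back."""
    result = subprocess.run(
        ["sips", "-g", "pixelWidth", path],
        capture_output=True,
        text=True,
    )
    # Judge by the parsed output rather than the exit status, which this
    # sketch does not assume to be reliable for damaged files.
    return parse_pixel_width(result.stdout) is not None
```

    Calling sips_can_read("/path/to/image.jpg") would then return False for a file that 'sips' cannot query. A file that passes still deserves the slower conversion-style test described above.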

    Another quick way to test an image file is the 'file' command:
    https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/file.1.html

    If the output says it's an image file, then at least it's undamaged enough to determine it's an image file. There may well be damage unseen by 'file', so again, it's safest to run a 'sips' check on files that quick checks say are ok.
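
    The 'file' check can be sketched the same way. This assumes the installed 'file' supports the '--brief' and '--mime-type' options, so that the output is a bare MIME type such as 'image/jpeg'. The function names are hypothetical.

```python
import subprocess


def looks_like_image(mime_type: str) -> bool:
    """True when a MIME type string reported by `file` is in the image/ family."""
    return mime_type.strip().startswith("image/")


def quick_check(path: str) -> bool:
    """Ask `file` for the MIME type of one path and test whether it is an image."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and looks_like_image(result.stdout)
```

    A True result only means the file header was recognizable. It says nothing about damage further into the file.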

    The reason for a quick check and a separate slow one is simply speed. If a large portion of the files are found to be damaged using the quick check, then the entire process completes faster.
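
    One way to express the quick-then-slow ordering, as a sketch only: the two checks are passed in as plain functions (hypothetical names, each taking a path and returning True for "looks fine"), and the slow one is only called for files that pass the quick one.

```python
from typing import Callable, Iterable, List


def find_damaged(
    paths: Iterable[str],
    quick_check: Callable[[str], bool],
    slow_check: Callable[[str], bool],
) -> List[str]:
    """Return the paths that fail either check, running slow_check sparingly."""
    damaged = []
    for path in paths:
        # `and` short-circuits, so slow_check never runs on a file that
        # quick_check has already rejected.
        if not (quick_check(path) and slow_check(path)):
            damaged.append(path)
    return damaged
```

    The more files the quick check rejects, the fewer times the expensive check runs.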


    As to the approach for processing that many images, a shell script using 'find' to traverse directories would work. An AppleScript can also traverse directories. Step one, however, is going to be "Back up all the files", in case anything goes wrong during the traversal.

    Step two is to set up a separate disk or partition with only a small number of known-good and known-bad files, and only run the scripts on that data. A few dozen files of each should suffice. Only after the entire process has been tested and verified on limited data should it be run on the full set of images. That is, make sure it works on a pilot run before full deployment.
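
    A rough sketch of such a pilot run, assuming the test data sits in two folders (one of known-good files, one of known-bad files) and that the checker is any function taking a path and returning True for "valid". The folder layout and all names here are assumptions for illustration. It walks both folders recursively, only reads files, and reports every file the checker got wrong.

```python
import os
from typing import Callable, Dict, List


def list_files(root: str) -> List[str]:
    """Recursively collect every file path under root, in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


def pilot_run(
    good_dir: str,
    bad_dir: str,
    is_valid: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Return the files the checker misclassified in each known set."""
    return {
        # Known-good files the checker wrongly rejected.
        "good_flagged_as_damaged": [
            p for p in list_files(good_dir) if not is_valid(p)
        ],
        # Known-bad files the checker wrongly accepted.
        "bad_passed_as_valid": [
            p for p in list_files(bad_dir) if is_valid(p)
        ],
    }
```

    If both lists come back empty on the pilot data, that is a reasonable sign the checker is ready for the full set.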
     
