Script finding and copying files from large library




svantejerling
Jun 11, 2013, 09:01 PM
Hi, I am working on a photo book project.

I have a folder with subfolders that together contain over 6M photos (JPEGs).
I have a list of filenames (roughly 3000) that I want to copy to another folder.

Some filenames recur, e.g. IMG_0042.jpg, and I need both files. The script shouldn't overwrite a file whose name recurs, but rename it.

I can manually remove false positives (e.g. 2 out of 3 files named IMG_0042).
I have the list of images I need in a text file.

Text file:
IMG_6763.jpg
IMG_0025.jpg
DSC_0077.jpg
DSC_0727.jpg
IMG_3091.jpg
IMG_9109.jpg
IMG_4128.jpg

Would really appreciate the help. And so would my poor assistant who would love to spend two weeks of her life doing other things than finding these pictures manually :)



subsonix
Jun 12, 2013, 04:19 AM
So, you have a list of files that reside somewhere in a directory tree, and you want to find them all and copy them to a specified location. Is that correct?


Some filenames recur, e.g. IMG_0042.jpg, and I need both files. The script shouldn't overwrite a file whose name recurs, but rename it.


What exactly do you mean by this? It seems like a completely different objective than finding and copying files.

What is it you want to do exactly?

svantejerling
Jun 12, 2013, 09:12 AM
Sorry for being unclear.
1. They reside somewhere in a directory tree (two folders deep).


2. Recurring filenames: It just means that if there are two source files with the same name, the second one shouldn't overwrite the first.

E.g. there might be several files named IMG_0001 in different subfolders.
If that filename is in the list, then all such files should be copied into the target folder.

subsonix
Jun 12, 2013, 09:22 AM
Sorry for being unclear.
1. They reside somewhere in a directory tree (two folders deep).

This will copy the files listed in files.txt from /source/directory to /target/directory.


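#!/bin/bash

# Read files.txt one name per line; -r keeps backslashes literal.
while IFS= read -r line
do
    # Copy every match. A later file with the same name overwrites an earlier one.
    find /source/directory -type f -name "$line" -exec cp {} /target/directory \;
done < files.txt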
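
Since the loop above starts one find per line of files.txt, the whole tree gets walked once per name, which could be slow with roughly 3000 names and millions of files. A possible variation, only a sketch, is to walk the tree once and filter the paths against the list with awk. The function name list_matches is made up for illustration, and it assumes no file name contains a newline.

```bash
#!/bin/bash

# list_matches LIST SOURCE_DIR  (hypothetical helper, not from the thread)
# Walks SOURCE_DIR once and prints the path of every regular file whose
# name appears in LIST. Assumes file names contain no newlines.
list_matches() {
    local list="$1" source_dir="$2"
    find "$source_dir" -type f | awk -F/ '
        NR == FNR { wanted[$0] = 1; next }   # first input: the name list
        ($NF in wanted)                      # second input: paths from find
    ' "$list" -
}

# Possible usage (still overwrites same-named files, like the loop above):
#   list_matches files.txt /source/directory |
#       while IFS= read -r file; do cp "$file" /target/directory; done
```

The awk part loads the names into a lookup table first, then prints each path whose last component is in that table, so the cost of the tree walk is paid only once.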



2. Recurring filenames: It just means that if there are two source files with the same name, the second one shouldn't overwrite the first.

E.g. there might be several files named IMG_0001 in different subfolders.
If that filename is in the list, then all such files should be copied into the target folder.

If there are several files with the same name, all of them will be copied, but with the above script cp will overwrite any existing file with the same name in the target directory, so only the last copy survives.

There are ways around this. There is a command called pax that can rename files according to a regex pattern and, for example, add a numeric suffix to the name, but it's more involved; I'd have to look into that. The simplest case is if the filenames are unique, if you can arrange that somehow.
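
One way the overwrite problem might be handled without pax is to check whether the target name is taken and, if so, add _1, _2, ... before the extension. This is only a sketch using a plain shell loop, not the pax approach mentioned above. The function names copy_no_clobber and copy_listed and the suffix format are made up for illustration, and it assumes no file name contains a newline.

```bash
#!/bin/bash

# copy_no_clobber FILE TARGET_DIR  (hypothetical helper, not from the thread)
# Copies FILE into TARGET_DIR. If the name is already taken, tries
# name_1.ext, name_2.ext, ... until a free name is found.
copy_no_clobber() {
    local src="$1" target_dir="$2"
    local name="${src##*/}"          # file name without its directory
    local base ext dest n
    case "$name" in
        *.*) base="${name%.*}"; ext=".${name##*.}" ;;   # IMG_0042 + .jpg
        *)   base="$name"; ext="" ;;                    # no extension
    esac
    dest="$target_dir/$name"
    n=1
    while [ -e "$dest" ]; do
        dest="$target_dir/${base}_$n$ext"
        n=$((n + 1))
    done
    cp -- "$src" "$dest"
}

# copy_listed LIST SOURCE_DIR TARGET_DIR  (also hypothetical)
# For each name in LIST, copies every match found under SOURCE_DIR.
copy_listed() {
    local list="$1" source_dir="$2" target_dir="$3" line file
    while IFS= read -r line; do
        [ -n "$line" ] || continue   # skip blank lines
        find "$source_dir" -type f -name "$line" |
            while IFS= read -r file; do
                copy_no_clobber "$file" "$target_dir"
            done
    done < "$list"
}
```

It would be called as copy_listed files.txt /source/directory /target/directory. With two source files named IMG_0042.jpg, the target ends up with IMG_0042.jpg and IMG_0042_1.jpg; which source file gets which name depends on the order in which find reports them.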