Is there an app that can cross reference HDD's?

MikeNL

macrumors newbie
Original poster
Aug 20, 2009
13
1
Hey there,

So I've got a NAS that holds most of my data, but I still have some old hard drives with variations of what's on the NAS now. Specifically, I once moved 30% of my files to Amazon because of their great unlimited plan. After I had subscribed and uploaded a large part of my files, I found out that they were going to get rid of that plan, and that all the files I had already uploaded had lost their original timestamps.

So I am looking for an app that can check all the files (byte for byte) on my NAS and keep the results in a database. Then I want to grab one of those HDDs and have the app check it against that database to see whether it still holds files that I don't have on my NAS, and give me the option of copying them over.

I've tried Duplicate File Finder Pro, but its Duplicate Folder function doesn't clearly show which HDD a file came from, or make it easy to select. It also makes me re-scan my NAS each time I want to check a new HDD, which takes a long, long time ;).

I hope there is an app out there that can do this properly. If so, could anybody nudge me in the right direction? ;)

Thank you!
Mike
 

Partron22

macrumors 68030
Apr 13, 2011
2,521
725
Yes
I use Synkron to keep folders/drives neat and equal between Windows and Mac formatted disks. It may not do precisely what you are asking for, but probably comes pretty close. It comes with lots of options.
-Sourceforge and free, also fast if the differences are minor, and it supplies a full report of what it did.
 

MikeNL

macrumors newbie
Original poster
Aug 20, 2009
13
1
It seems like a good sync app indeed! It just can't do what I need it to do, since the folder structures on the old drives are different. Thank you anyway!
 

chown33

Moderator
Staff member
Aug 9, 2009
8,552
4,612
inter-prandial
You might be able to make something that solves the problem, by combining data from different apps.

First is the issue of cataloging your NAS or other storage, and keeping that in a database of some kind.

The standard way to accomplish this is to calculate a hash for each file of interest, and store that in a database along with the pathname of the file (i.e. where it's located in the folder hierarchy, along with its name). With a cryptographic hash such as SHA-256, each distinct file content will, for all practical purposes, have its own distinct hash: if there's a single bit of difference, the hashes will differ, and identical files always produce identical hashes.
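
As a rough illustration of that idea, here is a minimal Python sketch that walks a folder, hashes each file's data with SHA-256, and stores the hash and pathname in a SQLite database. The function names, the table layout, and the choice of SQLite are assumptions made for this example, not something any of the apps mentioned here is known to do.

```python
import hashlib
import os
import sqlite3


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's data, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def catalog_folder(root, db_path):
    """Hash every regular file under root and record (hash, path) rows.

    The table name and columns are hypothetical. Returns the number of
    files cataloged.
    """
    connection = sqlite3.connect(db_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS files (sha256 TEXT, path TEXT PRIMARY KEY)"
    )
    count = 0
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, sockets, and the like
            connection.execute(
                "INSERT OR REPLACE INTO files (sha256, path) VALUES (?, ?)",
                (sha256_of_file(path), path),
            )
            count += 1
    connection.commit()
    connection.close()
    return count
```

Because the catalog is saved to disk, the NAS would only need to be scanned once; later comparisons can read the stored hashes instead of re-reading every file.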

Here's an example of a Mac App Store app that can calculate several different hashes, recursively scan directories, and export that to different file formats (e.g. CSV).
https://apps.apple.com/us/app/hash-file/id1338927486?mt=12

So if you set it up to calculate the SHA-256 for everything in a folder, then export to CSV, you can then import that into a spreadsheet.

Repeat as needed for all the folders you want to catalog. If it were me, I'd do things about 50 files at a time, in their sub-folders, and export each CSV to a separate file. If something goes wrong, you can just do the smaller piece again, instead of needing to redo hundreds or thousands of files.


The second issue is finding duplicates.

Start by importing all the CSV files into a single spreadsheet. Next, sort by the column that holds the hash value.

After sorting by hash value, any duplicates will have the same hash, so any lines where 2 or more files have the same hash will tell you which are duplicates, and the pathnames will show you where the files are. You can then manually delete the duplicates, keeping only one version.
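
The same sort-and-compare step can be sketched in Python instead of a spreadsheet. This is only an illustration: it assumes each exported CSV has a header row with columns named `sha256` and `path`, which are made-up names, so the real export from whatever hashing app you use would need to be checked and the column names adjusted.

```python
import csv
from collections import defaultdict


def find_duplicates(csv_paths, hash_column="sha256", path_column="path"):
    """Group file paths by hash across several CSV exports.

    The column names are assumptions about the export format.
    Returns {hash: [paths]} for every hash that appears more than once.
    """
    paths_by_hash = defaultdict(list)
    for csv_path in csv_paths:
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                paths_by_hash[row[hash_column]].append(row[path_column])
    return {
        file_hash: sorted(paths)
        for file_hash, paths in paths_by_hash.items()
        if len(paths) > 1
    }
```

Grouping by hash gives the same result as sorting and scanning for runs of equal values, without needing to eyeball a long spreadsheet.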


Obviously, this would be a lot simpler if it were entirely automated.

In theory, that's certainly possible, since there are command-line tools that can recursively walk folders (man find), calculate hashes (apropos hash), then sort by hash value (man sort). In practice, it's somewhat more complicated than that, but it would certainly be something that a moderately skilled shell programmer could do with a shell script, and then put into an Automator app.

There's also another complication: not all of a file's data will be hashed, only the data "fork". Excluded will be Finder Info, text encoding, tags, and other things that reside in extended attributes (xattr's) (see: man xattr). Ideally, all those should be hashed as well, either with the data, or perhaps individually, so their hashes can be compared individually. Some files store only incidental data in xattr's, while others might store data that's important for the proper use of the file.


I suggest starting by looking at apps that calculate hashes for entire folders, and can output the hash in a form that other apps can use. You may find that some such apps let you compare hashes using a file of previously calculated hashes. I don't know, that just seems like an obvious thing to me, because hashes are fundamentally useful only as values to be compared.
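
To show what comparing against a file of previously calculated hashes could look like, here is a small Python sketch. It assumes you already have a set of SHA-256 hex digests for everything on the NAS (called `known_hashes` here, a hypothetical name), and it lists the files on an old drive whose contents don't appear in that set. Folder structure and file names play no part in the comparison.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's data, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_missing_from_catalog(drive_root, known_hashes):
    """Return paths under drive_root whose content hash is not in known_hashes.

    known_hashes is assumed to be a set of hex digests from an earlier scan.
    """
    missing = []
    for folder, _subfolders, names in os.walk(drive_root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and sha256_of_file(path) not in known_hashes:
                missing.append(path)
    return sorted(missing)
```

Only the old drive is read during this step, so checking another HDD doesn't require scanning the NAS again.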

I have no experience with the app I linked above, it's just one that I found that did the necessary things (folder traversal, hash, export). There may be better ones on the Mac App Store.
 

HDFan

macrumors 68000
Jun 30, 2007
1,787
427
I've tried the program Duplicate File Finder Pro and the Duplicate Folder function doesn't easily show the source HDD/makes it easy to select.
A duplicate file finder can work if you use one directory as a master and then match it against the other directories, adding any missing files to the master. Araxis Find Duplicate Files allows the comparison of multiple directories at a time. The file paths are included in the results, with options like "select first duplicate, select all duplicates" so you can delete them. This is more of a corporate than a consumer product, but it works well.
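
For the "add missing files to the master" step, a hedged Python sketch might look like the following. It is not how Araxis or any other product works; the function name and the decision to mirror each file's relative path under the master folder are assumptions for illustration. It uses `shutil.copy2`, which copies the data and also attempts to preserve timestamps.

```python
import os
import shutil


def copy_into_master(missing_paths, source_root, master_root):
    """Copy each missing file into master_root, keeping its relative path.

    Existing files in the master are never overwritten. Returns the list of
    destination paths that were actually written.
    """
    copied = []
    for source in missing_paths:
        relative = os.path.relpath(source, source_root)
        destination = os.path.join(master_root, relative)
        if os.path.exists(destination):
            continue  # a file with that name is already there; review by hand
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.copy2(source, destination)  # copy2 also tries to keep timestamps
        copied.append(destination)
    return copied
```

Skipping existing destinations is a deliberately cautious choice: a name clash with different contents is left for manual review instead of being resolved automatically.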