(Shell) Can rsync be used to reorganize files?

Discussion in 'Mac Programming' started by 0002378, Jun 3, 2017.

  1. 0002378, Jun 3, 2017
    Last edited: Jun 3, 2017

    0002378 Suspended


    May 28, 2017
    Let me explain the scenario.

    I have a huge folder of documentaries on my internal HDD, and I also keep them backed up on my external drive.

    Let's say that one day, I decide to reorganize my documentaries on my local drive into neat categories (folders). I don't add any new files, I just move 'em around locally.

    So, my internal drive structure looks like:
    blabla.../Documentaries/War/WW2 From Space.mp4
    blabla.../Documentaries/Disaster/Runaway Train.mp4

    Now, those same files already exist on my external (backup) drive, but are not yet neatly organized into categories/subfolders:
    blabla.../Documentaries/WW2 From Space.mp4
    blabla.../Documentaries/Runaway Train.mp4

    So, when I now run rsync --update with the local drive as src and the external drive as dest, it will think that /War/WW2 From Space.mp4 doesn't exist, because it is not under the same relative path on the external drive. So it will (needlessly) copy it over into a new /War directory.

    Can I tell rsync to be smarter and look for a src file anywhere in the dest Documentaries folder, and if found, simply move it, creating new folders as needed, so that dest ends up organized the same way as src (the local Documentaries folder)? This would save me a TON of time, because a copy is obviously much more time-consuming (and space-consuming) than a simple move.

    So far, I have written a Java program that does exactly what I want (Java is my go-to for stuff like this), but I'm wondering if I really need to reinvent the wheel here.

    If not rsync, is there any other utility?
  2. chown33 macrumors 604

    Aug 9, 2009
    Sailing beyond the sunset
    I don't think 'rsync' can do this, and a quick search of its man page for the word "move" doesn't suggest anything. I'm not surprised by this, since rsync is designed to sync things, not reorganize them. Part of syncing is maintaining structure, in addition to file contents, so I don't see why rsync would have this capability.

    To move things around within a structure would require that rsync maintain all the hashes for all the files, so it could know which things are actually identical; it certainly can't rely on names alone to do that. The thing is, maintaining all the hashes only has a purpose when moving things around, which implies altering structure rather than maintaining it. And as noted above, rsync is designed to maintain structure, not alter it.

    If I were doing this, I might make an awk script. That's mainly because I'm thinking of the inputs and the actions, not the language to write it in.

    One input would be the list of filenames in their uncategorized locations (plain leaf names). Another input would be the list of files in their categorized locations (partial pathnames, with dirnames). The action is to match each uncategorized name to its categorized name (pattern matching is awk's forte), and emit a 'mv' command line that performs the move. Then you manually check the output list of 'mv' commands to make sure it's sensible, and finally feed it to a bash shell as input.

    The inputs might be produced with 'find' or 'ls -R' or maybe some other source.

    If you wanted the awk script to match src and dst according to something other than leaf filename, you'd put that as an additional field in both the inputs (uncategorized and categorized name lists). Then awk would match those instead of leaf names.

    You could even write the script so it always matches on a dedicated key field, then structure your inputs so that the "hash" in that field is just the leaf filename. That way you could use different criteria (metadata, hashes, leaf names, etc.) when building your inputs, yet the script that actually produces the 'mv' commands would stay the same, since all it's doing is matching field-A of input set 1 with field-A of input set 2, and producing a 'mv -fv {field_B_set_1} {field_B_set_2}'. Awk's "associative arrays" would be instrumental here.

    Tools other than awk could also do this. Perl springs to mind, but I don't know it well enough to say how to accomplish this.

    Clearly, a Java app could also do this (Map), since the structured inputs are pretty easy to parse, and outputting a series of mv's is child's play.
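
    For illustration, here is a minimal sketch of that matching step in Python rather than awk or Java. It assumes both inputs are lists of paths relative to the top of each Documentaries tree, that leaf names are unique within the backup, and that the emitted commands would be run from the top of the backup tree. The function name `plan_moves` and its parameters are made up for this example. It only prints the commands, so they can be checked by eye before being fed to a shell.

```python
import posixpath
import shlex


def plan_moves(backup_paths, source_paths):
    """Match files by leaf name and return shell commands that reorganize the backup.

    Both arguments are lists of paths relative to the top of each tree.
    Assumption: leaf names are unique within backup_paths.
    """
    # Key (leaf name) -> where the file currently sits in the backup.
    current = {posixpath.basename(p): p for p in backup_paths}

    commands = []
    for wanted in source_paths:
        have = current.get(posixpath.basename(wanted))
        if have is None or have == wanted:
            continue  # not in the backup yet, or already in the right place
        parent = posixpath.dirname(wanted)
        if parent:
            commands.append("mkdir -p " + shlex.quote(parent))
        commands.append("mv -fv " + shlex.quote(have) + " " + shlex.quote(wanted))
    return commands


backup = ["WW2 From Space.mp4", "Runaway Train.mp4"]
source = ["War/WW2 From Space.mp4", "Disaster/Runaway Train.mp4"]
for line in plan_moves(backup, source):
    print(line)
```

    Files that exist only in the source are skipped here on purpose; those are the ones a normal rsync run would still have to copy afterwards.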
  3. 0002378 thread starter Suspended


    May 28, 2017
    Thanks, chown. By "hashes", do you mean the inodes?

    I've already written a Java program to do this directly using java.io.File objects. It was trivial to do. I was just hoping that someone else has already written something that has been out there forever and hence been tested to death and deemed efficient and reliable.

    Thanks, in any case.
  4. chown33 macrumors 604

    Aug 9, 2009
    Sailing beyond the sunset
    inode numbers would only work if both copies lived on the same filesystem (a copy on an external drive gets its own, unrelated inode number), but anything that's unique to the file and the same on both sides will do. In the schemes I outlined it's just a key, used to match a source "thing" to a destination "place". Reorganization consists of matching keys between 2 sets, then generating the action that transforms one into the other.
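
    As a sketch of what such a key could look like, the two hypothetical helpers below (the names `cheap_key` and `content_key` are invented for this example) derive a key from a file on disk. The first uses leaf name plus size, which is quick but can collide. The second hashes the full contents, which is far more reliable but means reading every byte, so on a library of this size it may cost as much time as the copy it is meant to avoid.

```python
import hashlib
import os


def cheap_key(path):
    """Key a file by (leaf name, size in bytes). Fast, but not collision-proof."""
    return (os.path.basename(path), os.path.getsize(path))


def content_key(path, chunk_size=1 << 20):
    """Key a file by the SHA-256 digest of its bytes, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Either function could supply the key field for the matching described here; which one is appropriate depends on how much you trust names and sizes to identify a file.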

    If "reorganization" amounts to "move THIS to THERE" repeated over a set of THIS'es and THERE's, then all that's needed is a unique key for each THIS to match to its corresponding THERE. The circumstances will dictate what happens if a THERE doesn't have a THIS, or there are multiple THIS'es for a THERE, or even multiple THERE's for a THIS. In a corporate reorg, the set of THIS'es with no THERE's are let go (fired).
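
    The cases listed above can be separated mechanically before any move is generated. The following is only a sketch under assumptions: each input is a list of (key, path) pairs, and the name `match_by_key` and the four result names are made up for illustration. What to do with each bucket is still up to the circumstances.

```python
from collections import defaultdict


def match_by_key(this_items, there_items):
    """Pair (key, path) records from two sets and separate the awkward cases.

    Returns (moves, unplaced, unfilled, ambiguous):
      moves     - (this_path, there_path) pairs where the key is unique on both sides
      unplaced  - THIS paths whose key has no THERE
      unfilled  - THERE paths whose key has no THIS
      ambiguous - keys with several THIS'es or several THERE's
    """
    this_by_key = defaultdict(list)
    there_by_key = defaultdict(list)
    for key, path in this_items:
        this_by_key[key].append(path)
    for key, path in there_items:
        there_by_key[key].append(path)

    moves, unplaced, unfilled, ambiguous = [], [], [], []
    for key, this_paths in this_by_key.items():
        there_paths = there_by_key.get(key, [])
        if not there_paths:
            unplaced.extend(this_paths)
        elif len(this_paths) == 1 and len(there_paths) == 1:
            moves.append((this_paths[0], there_paths[0]))
        else:
            ambiguous.append(key)
    for key, there_paths in there_by_key.items():
        if key not in this_by_key:
            unfilled.extend(there_paths)
    return moves, unplaced, unfilled, ambiguous
```

    In the backup scenario from the first post, the "unfilled" bucket would be the files that really are new and still need copying, while "ambiguous" keys are the ones worth checking by hand.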
  5. 0002378 thread starter Suspended


    May 28, 2017
    :eek: ... :p:D
  6. Senor Cuete macrumors 6502

    Nov 9, 2011
    If the files are organized in folders as you want them on the external drive, couldn't you just delete the folder from your internal drive and replace it with the backup?
  7. 0002378 thread starter Suspended


    May 28, 2017
    Yes, I could do that. But that would be very time-consuming, as my movies directory is huge! I'm talking > 1 TB. So replacing it each time would not be feasible.
  8. superscape macrumors 6502a


    Feb 12, 2008
    East Riding of Yorkshire, UK
    Or, taking a slightly sideways look at the problem, why not dump the idea of cataloging with a folder structure? Maybe you could use tags instead?


    Just a thought!
  9. Les Kern macrumors 68040

    Apr 26, 2002
    Further, if there's no THERE there and no there here, then it stands to reason the here that's here is not where the here was to begin with. The user must then make the assumption that neither here nor there is actually here or there, but here, not there, and is actually WHERE. See?
