Compression and extraction don't give the same folder

Discussion in 'macOS' started by gwelmarten, Feb 12, 2013.

  1. gwelmarten macrumors 6502

    Joined:
    Jan 17, 2011
    Location:
    England!
    #1
    Hi There

    I have a very delicate folder that I am trying to upload to a server, via either FTP or FTPS. Speed is not important, but it is critical that it does not get corrupted.

    Initially, I thought that transferring lots of individual files via FTP would mean a higher chance of corruption (is this right?). To deal with this, I zipped the folder using the 'zip -r' command in Terminal. But when I extract the archive again on my own computer, I keep seeing a small size difference (a roughly 1.5 GB folder comes back about 400 KB smaller). Surely that means some data has been lost? What is OS X doing with this!?

    What would be the most reliable way of getting this file onto my server? Either using an SSH terminal session or an FTP/FTPS client (FileZilla)?

    Thanks,

    Sam
     
  2. freejazz-man macrumors regular

    Joined:
    May 12, 2010
    #2
    you can just 'scp localfile user@remotehost.net:/path/to/file'

    I prefer scp to FTP. As for compression, I wouldn't worry. What is the remote filesystem? The difference is likely down to that - dotfiles being ignored, HFS+ metadata being dropped. You know - stuff like that.

    If the zip were actually corrupted it wouldn't unarchive properly. If you want to be sure, do an md5sum of the zip before and after the transfer.
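
    If you'd rather not zip at all, scp can copy the whole folder with the -r flag (the host and paths here are just placeholders):

    scp -r /path/to/folder user@remotehost.net:/path/to/destination/

    Zipping first does make the checksum step easier, though, since you only have one file to hash.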
     
  3. gwelmarten thread starter macrumors 6502

    Joined:
    Jan 17, 2011
    Location:
    England!
    #3
    So are you saying that if even one bit had changed unexpectedly during compression, the archive would not then unarchive correctly? Isn't that a bit bad for general purposes? And am I, in general, more likely to get corruption uploading lots of files separately?
     
  4. freejazz-man macrumors regular

    Joined:
    May 12, 2010
    #4
    Yeah, it wouldn't unarchive correctly. And if you were concerned about one bit changing during transmission, doing the checksum before and after the transfer will solve that issue.

    I don't think corruption happens anywhere near as often as you seem to think.

    Dropped packets? TCP takes care of that. FTP itself runs over TCP; anything that used UDP would have to manage reliability at a higher layer. Most of these protocols are designed so the transmission itself doesn't cause corruption. However, no one can control whether or not a gamma ray passes through your HDD and flips a bit.

    Where do you think corruption is happening? Run a checksum before and after and you will find it doesn't happen. I think maybe you are a bit too worried?

    On the OS X command line: 'md5 /path/to/file'
     
  5. gwelmarten thread starter macrumors 6502

    Joined:
    Jan 17, 2011
    Location:
    England!
    #5
    It's a folder, so I don't think I can do a checksum on it without a great deal of trouble. I think the corruption may be happening during the transfer.
     
  6. freejazz-man macrumors regular

    Joined:
    May 12, 2010
    #6
    zip the folder

    md5 the zip

    transfer the zip

    md5 on the server


    You are going to find there is no corruption
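
    Concretely, with made-up names (and assuming the server is a Linux box, where the command is md5sum rather than OS X's md5):

    zip -r project.zip /path/to/folder
    md5 project.zip
    scp project.zip user@remotehost.net:/path/to/destination/
    ssh user@remotehost.net md5sum /path/to/destination/project.zip

    The same hash at both ends means the copy on the server is bit-for-bit identical to the one you uploaded.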
     
  7. gwelmarten thread starter macrumors 6502

    Joined:
    Jan 17, 2011
    Location:
    England!
    #7
    How will I check that there is no corruption in the zipping/unzipping of the folder?
     
  8. freejazz-man macrumors regular

    Joined:
    May 12, 2010
    #8
    you can do individual md5sums on everything

    or... you could get over yourself

    if the folder unzips - it's a pretty good sign there is no corruption

    you're worrying unnecessarily, and you aren't willing to do what is required to get the reassurance you want. So either md5sum everything so you can trust the archiving step, or just accept it

    have you recognized that the difference in the archive size is related to the difference in the filesystems on the two different computers? Because that's a key first step to realizing you are having a completely irrational worry
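
    One more thing: zip stores a CRC-32 for every file and unzip checks them as it extracts, so a damaged archive will throw errors rather than silently hand you bad files. You can also test it without extracting anything (archive name made up):

    unzip -t project.zip

    If it finishes with "No errors detected", the archive is internally consistent.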
     
  9. gwelmarten thread starter macrumors 6502

    Joined:
    Jan 17, 2011
    Location:
    England!
    #9
    MD5s on everything are out of the question really - there are 8,000 files.

    Thanks for the info on unzipping - that gives me more confidence. I think I'll take that route, because then I can easily do an MD5 on the zip.

    You asked originally why it is so important. This is a computer science research project (I'm studying at university), and I need to get the files to Switzerland for further analysis by 4 PM GMT tonight.
     
  10. freejazz-man macrumors regular

    Joined:
    May 12, 2010
    #10
    OK - but you realize every Linux distro, and every developer, puts out code in compressed formats with only a checksum for the compressed archive and nothing else? I don't think you have to worry.

    Seeing as you are studying computer science - why not use the find command to run md5 recursively, pipe the output to a file, and then use diff to compare the local and remote versions?
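
    Something like this, say (paths are made up, and it assumes no spaces in the file names; 'md5 -r' on OS X puts the hash first, so apart from whitespace - which the awk step normalizes - its output lines up with Linux's md5sum):

    cd /path/to/folder && find . -type f -exec md5 -r {} \; | awk '{print $1, $2}' | sort > ~/local.md5

    and on the server, inside the unzipped copy:

    cd /path/to/folder && find . -type f -exec md5sum {} \; | awk '{print $1, $2}' | sort > ~/remote.md5

    Then copy one listing next to the other and run 'diff local.md5 remote.md5' - any line that shows up is a file that differs.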
     
  11. balamw Moderator

    Staff Member

    Joined:
    Aug 16, 2005
    Location:
    New England
    #11
    You could also use something like parchive to be able to recover in case of transmission errors.
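
    For example, with the par2 command-line tool (not included with OS X - you would have to install it yourself, e.g. via MacPorts; file names below are placeholders):

    par2 create -r10 archive.zip.par2 archive.zip
    par2 verify archive.zip.par2
    par2 repair archive.zip.par2

    The first command makes roughly 10% worth of recovery blocks to upload alongside the zip; verify (and, if needed, repair) are then run on the server's copy.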

    Exactly. So what if there are 8,000 files? They should all be the same, so diffing the two checksum listings would either show no differences or highlight exactly which file has an error.

    B
     