
gwelmarten

macrumors 6502
Original poster
Jan 17, 2011
476
0
England!
Hi There

I have a very delicate folder that I am trying to upload to a server, via either FTP or FTPS. Speed is not important, but it is critical it does not get corrupted.

Initially, I thought that more individual transfers via FTP would lead to a higher chance of corruption (is this right?). To deal with this, I zipped the folder using the 'zip -r' command in Terminal. When I extract it again on my computer, I keep seeing a small size difference (1.5 GB file, goes down by about 400 KB). Surely that means some data has been lost? What is OS X doing with this!?

What would be the most reliable way of getting this file onto my server? Either using an SSH terminal session or an FTP/FTPS client (FileZilla)?

Thanks,

Sam
 
you can just 'scp localfile user@remotehost.net:/path/to/file'

I prefer scp to FTP. As for compression, I wouldn't worry. What is the remote filesystem? The difference is likely down to that - ignoring .files and getting rid of the HFS+ journal and any other metadata. You know - stuff like that.

If the zip were actually corrupted it wouldn't unarchive properly. If you want to be sure, do an md5sum of the zip before and after the transfer.
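
To illustrate why a damaged archive fails to unarchive: the zip format stores a CRC-32 for every file, and extraction tools check it. Here is a minimal sketch using Python's standard `zipfile` module instead of the zip/unzip commands, purely so it runs the same on any machine. The entry name `data.txt`, its contents, and the helper name `first_bad_entry` are made up for the demonstration.

```python
import io
import zipfile

def first_bad_entry(archive_bytes):
    """Return the name of the first entry whose CRC-32 check fails, or None."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.testzip()

# Build a small archive in memory (hypothetical file name and contents).
payload = b"research data " * 100
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("data.txt", payload)
good = buf.getvalue()

# Flip a single bit inside the stored file data.
damaged = bytearray(good)
damaged[good.index(payload)] ^= 0x01
damaged = bytes(damaged)

print(first_bad_entry(good))     # None
print(first_bad_entry(damaged))  # data.txt
```

A CRC-32 catches accidental damage like this, but it is not a substitute for comparing a checksum of the whole archive before and after the transfer.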
 
So are you saying even if just one bit had changed unexpectedly during compression, the archive would not then unarchive correctly? Isn't that a bit bad for general purposes? Am I more likely in general to get more corruption uploading multiple files separately?
 
Yeah, it shouldn't. And if you were concerned about one bit changing in the transmission, doing the checksum prior to and post transmission will solve that issue.

I don't think corruption happens anywhere near as often as you seem to think.

Dropped packets? TCP takes care of that, and FTP runs over TCP. (It's TFTP that uses UDP, and it handles acknowledgement and retransmission itself at a higher layer.) Most of these protocols are designed so the transmission itself doesn't cause corruption. However, no one can control whether or not a gamma ray passes through your HDD and flips a bit.

Where do you think corruption is happening? md5sum before and after and you will find it doesn't happen. I think maybe you are a bit too worried?

On the OS X command line, that's 'md5 /path/to/file'.
 
It's a folder, so I don't think I can do a checksum without a great deal of trouble. I think the corruption may happen during the transfer.
 
zip the folder

md5 the zip

transfer the zip

md5 on the server


You are going to find there is no corruption
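
The two "md5" steps above can also be done with a few lines of Python, which sidesteps the fact that the command is `md5` on OS X but usually `md5sum` on a Linux server. This is only a sketch and assumes Python is installed on both machines. `file_md5` is a hypothetical helper name.

```python
import hashlib

def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so a 1.5 GB archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Call it on the zip before the upload and again on the server afterwards. If the two hex strings match, the archive arrived intact.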
 
you can do individual md5sums on everything

or... you could get over yourself

if the folder unzips - it's a pretty good sign there is no corruption

you are worrying unnecessarily and you aren't willing to do what is required to get the reassurance you want. so either md5sum everything so you can get over the act of archiving, or just accept it

have you recognized that the difference in the archive size is related to the difference in the filesystems on the two different computers? Because that's a key first step to realizing you are having a completely irrational worry
 
MD5s on everything are out of the question really - there are 8000 files.

Thanks for the info on unzipping - that gives more confidence. I think I'll take that route because then I can easily do an MD5 on the zip.

You asked originally why it is so important. This is a Computer Science Research Project (I'm studying at University) and I need to get the files to Switzerland for further analysis by 4PM GMT tonight.
 
Ok - but you realize most Linux distros and developers publish their code as compressed archives, with only a checksum for the archive itself and nothing else? I don't think you have to worry.

Seeing as you are studying computer science - why not use the find command to run md5sum recursively and redirect the output to a file, so you can use diff to compare the local and remote versions?
 
You could also use something like parchive to be able to recover in case of transmission errors.

Exactly. So what if there are 8,000 files? They should all be the same, so diffing the two outputs from md5sum would either show no differences or highlight the one file that differs.

B
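
For reference, the per-file checksum-and-diff idea suggested above could look something like this. It is sketched in Python rather than with find and diff, so the output format is identical on OS X and on the server. The function names `md5_manifest` and `compare_manifests` are hypothetical, and it assumes Python is available on both ends.

```python
import hashlib
import os

def md5_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digest = hashlib.md5()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[rel] = digest.hexdigest()
    return manifest

def compare_manifests(local, remote):
    """Return three sorted lists: paths missing on the remote side,
    paths that exist only on the remote side, and paths whose digests differ."""
    missing = sorted(set(local) - set(remote))
    extra = sorted(set(remote) - set(local))
    changed = sorted(p for p in set(local) & set(remote) if local[p] != remote[p])
    return missing, extra, changed
```

Build one manifest on each machine, copy the small remote one back (as JSON, say), and compare. Three empty lists mean all 8000 files match.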
 