
l008com

macrumors regular
Original poster
Jan 20, 2004
I have a script that regularly has to duplicate large files, up to 1 GB, on the same disk. So it is reading and writing at the same time on the same drive. Sometimes I'm doing this on an SSD, but usually it's running on hard drives. So with all that extra seek time, it really bogs down the process.

It occurred to me that with the amount of RAM in computers these days, it would be very easy to queue up the file copies and do them one at a time, where it reads one whole file into RAM and then writes the whole file back out to disk. So you'd be maxing out throughput in both directions instead of seeking out of control.

Is there an "easy" way to do copies like this?
The only way I can think to do it is to have my script create a RAM disk, then copy from the drive to the RAM disk, then copy back from the RAM disk to the drive. That should work and it should be faster, but I've worked with RAM disks before in macOS scripts and they tend to be buggy and poorly supported. It would be great if there were some command like `cp` that natively supported copying like this?

I don't even know what to call it, hence the weird name of this post.
 
mbuffer lets you assign a RAM buffer and will likely speed this up on a rotating disk. You need Xcode and, e.g., Homebrew (to install the coreutils) in order to compile and install it: decompress the source archive, change into the source folder, then run make and sudo make install (you can ignore the warnings).

Then run something like:

mbuffer -m 512M -i original_file -o copy_of_original_file

to copy your large file(s). You probably have to experiment with the -m parameter to optimize; run mbuffer --help to see all options.

Using the Duplicate command in the Finder and then moving the duplicated file to whatever location on the same drive might be faster than your current approach. :cool:
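Since this already runs from a script, the Finder route can be automated too. A rough sketch using osascript (the path is just a placeholder); Finder's duplicate command puts the copy in the same folder as the original:

Bash:
osascript -e 'tell application "Finder" to duplicate (POSIX file "/path/to/original_file" as alias)'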
 
I have a script that regularly has to duplicate large files, up to 1 GB, on the same disk. So it is reading and writing at the same time on the same drive.
What filesystem? APFS doesn't actually write the whole file twice, just metadata.
See the Clones section of Apple's APFS documentation.
 
These are real files that really need to get copied, and usually the filesystem is HFS+.
 
You can create a classical RAM disk, from the hdiutil(1) man page:

Bash:
     Creating a RAM-backed device and filesystem:

           NUMSECTORS=128000       # a sector is 512 bytes
           mydev=`hdiutil attach -nomount ram://$NUMSECTORS`
           newfs_hfs $mydev
           mkdir /tmp/mymount
           mount -t hfs $mydev /tmp/mymount
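
Once the RAM disk is mounted, the copy just goes through it, and you tear it down afterwards. A sketch building on the man page snippet above (paths are placeholders):

Bash:
cp /path/to/original_file /tmp/mymount/        # read from the drive, write to RAM
cp /tmp/mymount/original_file /path/to/copy    # read from RAM, write back to the drive
umount /tmp/mymount                            # unmount the RAM disk
hdiutil detach $mydev                          # detach and free the memory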
 
You could also use rsync(1) to copy and try the --block-size option with a large value.
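For example, something like this (a sketch; the block-size value is arbitrary and worth benchmarking):

Code:
rsync --block-size=131072 /path/to/original_file /path/to/copy_of_original_file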
 
You can use vmtouch to copy files, as in:

Bash:
#!/usr/bin/env bash

for file in "$@"
do
    vmtouch -t "$file"
    cp "$file" /new/path/"$file"
    vmtouch -e "$file" /new/path/"$file"
done

In this example I'm using vmtouch -t to "touch" all of the pages of the file into memory, then copying it, then using vmtouch -e to evict both files from memory. I do the eviction because there's no reason to hold onto the memory once you've copied; it's better to inform the OS you won't be needing those pages anymore.

If your files are larger than your available RAM, you can use the -p parameter to specify byte/megabyte/gigabyte ranges.

As usual, vmtouch is available via MacPorts/Homebrew.
 
Last edited:
I wonder if dd would help your situation? You can specify a block size. 1G = 1 GiB.
Code:
dd if="/inputfilepath" of="/outputfilepath" bs=1G iflag=fullblock status=none
Try with and without the iflag and status options; iflag=fullblock in particular is a GNU dd extension that the stock macOS dd may not accept. Compare with the other methods to see which is faster.

If you remove the status=none, then you may get progress messages and it will tell you how long it takes.

For any command, you can use time to measure how long it takes.
Code:
time dd if="/inputfilepath" of="/outputfilepath" bs=1G iflag=fullblock status=none

It also works for a group of commands:
Code:
time (
sleep 2
dd if="/inputfilepath" of="/outputfilepath" bs=1G iflag=fullblock status=none
sleep 1
)
 
I hereby declare @!!! to be the winner, unless you're as lazy as I am:
Code:
$ cp inputfile /dev/null  # put inputfile into the buffer cache
$ cp inputfile outputfile # copy will be from the buffer cache
The os will reuse least-recently-used pages from the cache as needed.
 
HAH that totally works! I don't really understand exactly why or how. Is this going to cause any potential problems if I keep doing it over and over thousands of times? Will the system fully manage the buffer without leaking memory and grinding things to a halt? I'll test this out tonight and see if it works at scale. This is the kind of simplicity I was looking for, all the other solutions seemed overly complex for what I was trying to do.

Also, on a quick test on a hard drive, I get about 30 MB/sec or so read and write at the same time doing a normal direct copy. Using this method, I'm getting about 150 MB/sec or so read and write. Not at the same time, but even if you cut those numbers in half, it's still about 75 MB/sec, which is MUCH faster. It also copies so fast, it's hard to take super accurate stats.
 
You have to be careful when timing things. The second cp can exit before the data has been flushed from the cache to the disk. Use the sync command to ensure that pending writes are complete.
Code:
$ cp inputfile /dev/null  # put inputfile into the buffer cache
$ cp inputfile outputfile # copy will be from the buffer cache
$ sync                    # wait until writes are complete
You could wrap that up in a shell script:
Code:
#!/usr/bin/env bash -e
# usage: copyit inputfile outputfile
cp "$1" /dev/null
cp "$1" "$2"
sync
Save that as copyit and mark it as executable by typing chmod +x ./copyit, then you can time a copy by typing
Code:
/usr/bin/time ./copyit inputfile outputfile
A production script would have error checking, but try it out before getting fancy.
PS - sync is a blunt instrument: it waits until all pending writes are complete, whether to your disk or otherwise, but it's probably sufficient for your purposes. Also, not tested at all for files whose names contain spaces, you've been warned.
 
150 MB/sec is pretty much the max you could expect for a HDD. And you'd have to have a very high quality drive to get that.

1 GB files actually aren't that large. In my speed test app, my largest test buffer is 1 GB. Originally I was using low-level C I/O, but my results were too good that way. I changed it to use higher-level Apple APIs so it would be more representative of what people would see in the real world.

Using the buffer cache would be a good way to do one read and many writes of the same data. If your files are only 1 GB, then you shouldn't have any memory issues, as long as all you are doing is writing this one file. Any other activity has the potential to bump that data out of cache.

A higher-level script like Perl would allow you to do the same thing, but have more control over the process.

You can easily create a RAM disk using something like: hdiutil attach -nomount ram://20971520
That will give you a device that you can pass to this: diskutil erasevolume HFS+ "RAMDisk" /dev/<whatever>
And detach when done: hdiutil detach /dev/<whatever>

Still not as clever as the buffer cache, but it does give you more control over the process.

Also, you might want to be concerned about metadata, resource forks, etc. Those may get lost with these simple copies. The highest-fidelity copy would probably be with a RAM disk and ditto, as in the sketch below.
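
A rough sketch of that RAM disk + ditto approach, putting the commands above together (the sector count gives roughly a 1 GB disk; names and paths are just examples):

Bash:
dev=$(hdiutil attach -nomount ram://2097152 | awk '{print $1}')   # 2097152 512-byte sectors = 1 GiB
diskutil erasevolume HFS+ "RAMDisk" "$dev"                        # format and mount at /Volumes/RAMDisk
ditto "/path/to/original_file" /Volumes/RAMDisk/original_file     # ditto preserves metadata and resource forks
ditto /Volumes/RAMDisk/original_file "/path/to/copy"
hdiutil detach "$dev"                                             # unmount and free the memory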
 
Manually setting up the RAM disk was my last resort. But simply copying to /dev/null first seems to be working perfectly. It does exactly what I want: fully read the file from disk, then fully write the file to disk, no simultaneous read-writes.
Also, HDDs seem to be pretty fast these days; I can regularly get sustained speeds over 150. Not WAY over, but this hard drive out of a 27" iMac reads at 185 MB/s sustained and peaks over 200. Which is quite a boost from the sometimes-as-slow-as 30 MB/s I get when the read and write happen at the same time.

Speaking of syncing, Grumpus, I don't understand your use of the sync command. Are you saying the cp command will finish before it's actually done copying?
 
Speaking of syncing, Grumpus, I don't understand your use of the sync command. Are you saying the cp command will finish before it's actually done copying?
I'm saying that cp can, and probably does, exit before the OS has actually written all of the data to disk. The sync command on macOS tells the OS to flush any pending writes to disk and waits until that's done before exiting.
 
I have a script that regularly has to duplicate large files, up to 1 GB, on the same disk.
While the cp-to-/dev/null or vmtouch approach will/should work well with HFS+, I want to encourage you to follow @bogdanw's advice and look into APFS. (You WILL need an SSD though.)

- Copies with cp are nearly instantaneous.
- Copies with cp don't occupy additional space on the disk.
Why? The Finder shows you two files, but because their content is identical, they point to the same data blocks. Only when you change/edit one of the files is the difference written out to disk.
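
If you do move to APFS, you can also ask for a clone explicitly from a script: macOS's cp has a -c flag that copies using clonefile(2). A minimal example (paths are placeholders):

Code:
cp -c /path/to/original_file /path/to/copy   # copy-on-write clone: near-instant, no extra space until modified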
 
This is a drive stress-testing script whose job is specifically to fill the drive and then run it under load constantly. So the /dev/null trick under HFS+ is just what I need.
 