Writeback efficiency

Discussion in 'Mac Programming' started by Sydde, Feb 24, 2011.

  1. Sydde macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #1
    I have organized my document data into discrete pieces that I selectively write back to the original file when it is saved (none of that namby-pamby safe write stuff). Specifically, the file is a hand-crafted tarball. The reason for doing this is that some of the pieces may be quite large: if they can be captured low in the file where they will be unaltered, it will not be necessary to write them back every time.

    So, I start writing content (files) at an arbitrary offset (wherever stuff has changed or been inserted; I am trying to keep directories contiguous), up to the end, by constructing the header and appending the data to an NSMutableData object, writing it out, then reusing it for the next piece. Now, it occurs to me that I could run through and build one single data object (basically a memory image of the file) and cast it out in one write, and it seems like that would be somewhat more efficient (up to the point that the data starts to crowd out memory).

    Yet, how much actually happens when you write? To the best of my knowledge, operating systems have for quite a while been caching disk writes (though perhaps less so as solid state drives become more common), so if I do the single object write, I will briefly be triplicating the data (in the document, in the NSMutableData object, and in the disk cache). If I write out discretely (one piece at a time), does disk caching make that comparable in efficiency to a single write (bearing in mind that the NSMutableData object is being resized with each piece)?
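    To make the comparison concrete, a rough sketch of the two strategies I am weighing (the pieces array and the offset are stand-ins, not my real code):

    [CODE]
    // Sketch only: one write per piece vs. one big buffer.
    static void writePiecewise(NSString *path, NSArray *pieces,
                               unsigned long long offset)
    {
        NSFileHandle *fh = [NSFileHandle fileHandleForWritingAtPath:path];
        [fh seekToFileOffset:offset];        // start where content changed
        for (NSData *piece in pieces)
            [fh writeData:piece];            // one write(2) per piece
        [fh synchronizeFile];                // force the cache to disk
        [fh closeFile];
    }

    static void writeSingleBuffer(NSString *path, NSArray *pieces,
                                  unsigned long long offset)
    {
        NSMutableData *image = [NSMutableData data];
        for (NSData *piece in pieces)
            [image appendData:piece];        // memory image of the tail
        NSFileHandle *fh = [NSFileHandle fileHandleForWritingAtPath:path];
        [fh seekToFileOffset:offset];
        [fh writeData:image];                // one big write(2)
        [fh synchronizeFile];
        [fh closeFile];
    }
    [/CODE]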
     
  2. Catfish_Man macrumors 68030

    Catfish_Man

    Joined:
    Sep 13, 2001
    Location:
    Portland, OR
    #2
    I suggest looking into mmap()ing the file; it was designed for circumstances just such as this.
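    Something along these lines (untested, error handling trimmed):

    [CODE]
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("archive.tar", O_RDWR);
        struct stat st;
        fstat(fd, &st);

        /* Map the file shared and writable: stores through p become
           file content, pushed to disk by msync() or page eviction. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'x';                     /* edit in place */
            msync(p, st.st_size, MS_SYNC);  /* flush dirty pages */
            munmap(p, st.st_size);
        }
        close(fd);
        return 0;
    }
    [/CODE]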
     
  3. Sydde thread starter macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #3
    That is pretty nice; I was not aware it was available. Unfortunately, the Darwin implementation seems to be slightly lacking and might take a little extra work.

    I could subclass NSData and just implement -bytes, -getBytes, and any other methods that return the actual data.

    However, I really do want to keep the pieces (file blocks) in order in the archive (adjacent to their directories), which would require me to do a lot of copying (for insertions). But more than that, I am not seeing how one would go about resizing the file. The Linux man pages list those functions, but not Darwin.
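    (If I am reading the man pages right, the missing call is Linux's mremap(); ftruncate() does exist on Darwin, so presumably the dance would be unmap, truncate, remap. Guesswork:)

    [CODE]
    #include <sys/mman.h>
    #include <unistd.h>

    /* Guesswork: grow a mapped file on Darwin, which lacks mremap().
       fd is the open descriptor, p/oldsize the current mapping. */
    void *grow_mapping(int fd, void *p, size_t oldsize, size_t newsize)
    {
        munmap(p, oldsize);
        if (ftruncate(fd, newsize) != 0)    /* extend the file itself */
            return MAP_FAILED;
        return mmap(NULL, newsize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    }
    [/CODE]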

    It may prove darn handy in future projects, though.
     
  4. jiminaus macrumors 65816

    jiminaus

    Joined:
    Dec 16, 2010
    Location:
    Sydney
    #4
    Sorry, these are suggestions phrased as questions rather than answers.

    How would it go using NSDataReadingMapped and/or NSDataReadingUncached passed as the options to initWithContentsOfURL:options:error:?

    These are defined for NSData, but do they affect NSMutableData's writing?
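    Something like this is what I have in mind (untried; url stands for the document's file URL):

    [CODE]
    NSError *err = nil;
    NSData *data =
        [[NSData alloc] initWithContentsOfURL:url
                                      options:NSDataReadingMapped |
                                              NSDataReadingUncached
                                        error:&err];
    // The options only govern reading; a mutableCopy is an ordinary
    // in-memory NSMutableData, written back the usual way.
    NSMutableData *working = [data mutableCopy];
    [/CODE]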
     
  5. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #5
    Have you measured any times? And how big is big? You know that on an average hard drive, the time for a write access is about the same as the time for writing about 600 KBytes, right? So writing 4K, skipping 596K, writing 4K, skipping 596K and so on isn't any faster than writing everything in one go?
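    (Rough arithmetic behind that figure: something like a 10 ms average seek times roughly 60 MB/s of sustained throughput works out to about 600 KBytes transferred in the time a single seek takes.)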

    Plus, the namby-pamby safe write stuff will save your users a lot of trouble if the Mac loses power at the wrong moment. Nothing is more horrible than overwriting a document that has been worked on for a long time.
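    And the safe version is essentially a one-liner (documentData and documentURL being whatever your document class holds):

    [CODE]
    NSError *err = nil;
    // NSDataWritingAtomic writes to a temporary file first and only
    // renames it over the original once the write has fully succeeded,
    // so a power failure cannot leave a half-written document behind.
    BOOL ok = [documentData writeToURL:documentURL
                               options:NSDataWritingAtomic
                                 error:&err];
    [/CODE]

    Of course that rewrites the whole file every time, which is exactly the cost you are trying to avoid. That is the trade-off.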
     
  6. Sydde thread starter macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #6
    Actually, NSData has an -initWithContentsOfMappedFile: which would, I think, do much the same thing. However, getting it to write back to the file is not documented at all. As far as I can tell, this method is only useful for reading: once you have the data in memory you can alter it all you want, but the init sets the mmap protection to PROT_READ, so nothing ever actually gets written back. Then there is the matter that you cannot lose the file while you have it mapped (e.g., if another computer on your network containing the file suddenly drops off).
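    So the only workable pattern with it seems to be read mapped, copy, write back whole (sketch, manual retain/release era):

    [CODE]
    // Read side may be mmap'd (read-only); mutation happens on a copy.
    NSData *mapped = [[NSData alloc] initWithContentsOfMappedFile:path];
    NSMutableData *work = [mapped mutableCopy];  // private writable copy
    [mapped release];
    // ... alter bytes in work ...
    [work writeToFile:path atomically:YES];      // ordinary write path
    [work release];
    [/CODE]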

    (Never saw the noparse tag before; that looks kind of handy.)
    Your points are well taken. I always assume that if I am not imposing bounds or limits on what a program can do, it might produce or be asked to handle arbitrarily large files, in which case time conceivably could be a consideration.

    The safe thing, well, yeah, I probably should handle that more carefully.
     
  7. Catfish_Man macrumors 68030

    Catfish_Man

    Joined:
    Sep 13, 2001
    Location:
    Portland, OR
    #7
    Ah, yeah, the "mmap blows up on removable media" thing sucks. Check the Lion API docs for NSData if you have access ;)
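    (The likely candidate is NSDataReadingMappedIfSafe: it maps the file only when losing the backing volume cannot take the process down, and falls back to an ordinary read otherwise.)

    [CODE]
    // Presumed Lion-era option: map only when it is safe to do so.
    NSData *data = [NSData dataWithContentsOfURL:url
                                         options:NSDataReadingMappedIfSafe
                                           error:NULL];
    [/CODE]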
     
