Writeback efficiency

Discussion in 'Mac Programming' started by Sydde, Feb 24, 2011.

  1. Sydde macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #1
    I have organized my document data into discrete pieces that I selectively write back to the original file when it is saved (none of that namby-pamby safe write stuff). Specifically, the file is a hand-crafted tarball. The reason for doing this is that some of the pieces may be quite large: if they can be captured low in the file where they will be unaltered, it will not be necessary to write them back every time.

    So, I start writing content (files) at an arbitrary offset (wherever stuff has changed or been inserted; I am trying to keep directories contiguous), up to the end, by constructing the header and appending the data to an NSMutableData object, writing it out, then reusing it for the next piece. Now, it occurs to me that I could run through and build one single data object (basically a memory image of the file) and cast it out in one write, and it seems like that would be somewhat more efficient (up to the point that the data starts to crowd out memory).

    Yet, how much actually happens when you write? To the best of my knowledge, operating systems have for quite a while been caching disk writes (though perhaps less so as solid state drives become more common), so if I do the single object write, I will briefly be triplicating the data (in the document, in the NSMutableData object, and in the disk cache). If I write out discretely (one piece at a time), does disk caching make that comparable in efficiency to a single write (bearing in mind that the NSMutableData object is being resized with each piece)?
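    To make the comparison concrete, a rough sketch of the two strategies I am weighing (the pieces array and the offset are stand-ins, not my real code):

    [CODE]
    // Sketch only: one write per piece vs. one big buffer.
    static void writePiecewise(NSString *path, NSArray *pieces,
                               unsigned long long offset)
    {
        NSFileHandle *fh = [NSFileHandle fileHandleForWritingAtPath:path];
        [fh seekToFileOffset:offset];        // start where content changed
        for (NSData *piece in pieces)
            [fh writeData:piece];            // one write(2) per piece
        [fh synchronizeFile];                // force the cache to disk
        [fh closeFile];
    }

    static void writeSingleBuffer(NSString *path, NSArray *pieces,
                                  unsigned long long offset)
    {
        NSMutableData *image = [NSMutableData data];
        for (NSData *piece in pieces)
            [image appendData:piece];        // memory image of the tail
        NSFileHandle *fh = [NSFileHandle fileHandleForWritingAtPath:path];
        [fh seekToFileOffset:offset];
        [fh writeData:image];                // one big write(2)
        [fh synchronizeFile];
        [fh closeFile];
    }
    [/CODE]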
     
  2. Catfish_Man macrumors 68030

    Catfish_Man

    Joined:
    Sep 13, 2001
    Location:
    Portland, OR
    #2
    I suggest looking into mmap()ing the file; it was designed for circumstances just such as this.
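    Something along these lines (untested, error handling trimmed):

    [CODE]
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("archive.tar", O_RDWR);
        struct stat st;
        fstat(fd, &st);

        /* Map the file shared and writable: stores through p become
           file content, pushed to disk by msync() or page eviction. */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'x';                     /* edit in place */
            msync(p, st.st_size, MS_SYNC);  /* flush dirty pages */
            munmap(p, st.st_size);
        }
        close(fd);
        return 0;
    }
    [/CODE]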
     
  3. Sydde thread starter macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #3
    That is pretty nice; I was not aware it was available. Unfortunately, the Darwin implementation seems to be slightly lacking and might take a little extra work.

    I could subclass NSData and just implement -bytes, -getBytes, and any other methods that return the actual data.

    However, I really do want to keep the pieces (file blocks) in order in the archive (adjacent to their directories), which would require me to do a lot of copying (for insertions). But more than that, I am not seeing how one would go about resizing the file. The Linux man pages list those functions, but not Darwin.
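    (If I am reading the man pages right, the missing call is Linux's mremap(); ftruncate() does exist on Darwin, so presumably the dance would be unmap, truncate, remap. Guesswork:)

    [CODE]
    #include <sys/mman.h>
    #include <unistd.h>

    /* Guesswork: grow a mapped file on Darwin, which lacks mremap().
       fd is the open descriptor, p/oldsize the current mapping. */
    void *grow_mapping(int fd, void *p, size_t oldsize, size_t newsize)
    {
        munmap(p, oldsize);
        if (ftruncate(fd, newsize) != 0)    /* extend the file itself */
            return MAP_FAILED;
        return mmap(NULL, newsize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    }
    [/CODE]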

    It may prove darn handy in future projects, though.
     
  4. jiminaus macrumors 65816

    jiminaus

    Joined:
    Dec 16, 2010
    Location:
    Sydney
    #4
    Sorry, these are suggestions phrased as questions rather than answers.

    How would it go using NSDataReadingMapped and/or NSDataReadingUncached passed as the options to initWithContentsOfURL:options:error:?

    These are defined for NSData, but do they affect NSMutableData's writing?
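    Something like this is what I have in mind (untried; url stands for the document's file URL):

    [CODE]
    NSError *err = nil;
    NSData *data =
        [[NSData alloc] initWithContentsOfURL:url
                                      options:NSDataReadingMapped |
                                              NSDataReadingUncached
                                        error:&err];
    // The options only govern reading; a mutableCopy is an ordinary
    // in-memory NSMutableData, written back the usual way.
    NSMutableData *working = [data mutableCopy];
    [/CODE]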
     
  5. gnasher729 macrumors P6

    gnasher729

    Joined:
    Nov 25, 2005
    #5
    Have you measured any times? And how big is big? You know that on an average hard drive, the time for a write access is about the same as the time for writing about 600 KBytes, right? So writing 4K, skipping 596K, writing 4K, skipping 596K and so on isn't any faster than writing everything in one go?
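    (Rough arithmetic behind that figure: something like a 10 ms average seek times roughly 60 MB/s of sustained throughput works out to about 600 KBytes transferred in the time a single seek takes.)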

    Plus, the namby-pamby safe write stuff will save your users a lot of trouble if the Mac loses power at the wrong moment. Nothing is more horrible than overwriting a document that has been worked on for a long time.
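    And the safe version is essentially a one-liner (documentData and documentURL being whatever your document class holds):

    [CODE]
    NSError *err = nil;
    // NSDataWritingAtomic writes to a temporary file first and only
    // renames it over the original once the write has fully succeeded,
    // so a power failure cannot leave a half-written document behind.
    BOOL ok = [documentData writeToURL:documentURL
                               options:NSDataWritingAtomic
                                 error:&err];
    [/CODE]

    Of course that rewrites the whole file every time, which is exactly the cost you are trying to avoid. That is the trade-off.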
     
  6. Sydde thread starter macrumors 68020

    Sydde

    Joined:
    Aug 17, 2009
    #6
    Actually, NSData has an -initWithContentsOfMappedFile: which would, I think, do much the same thing. However, getting it to write back to the file is not documented at all. As far as I can tell, this method is only useful for reading: once you have the data in memory you can alter it all you want, but the init sets the mmap protection to PROT_READ, so nothing ever actually gets written back. Then there is the matter that you cannot lose the file while you have it mapped (e.g., if another computer on your network containing the file suddenly drops off).
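    So the only workable pattern with it seems to be read mapped, copy, write back whole (sketch, manual retain/release era):

    [CODE]
    // Read side may be mmap'd (read-only); mutation happens on a copy.
    NSData *mapped = [[NSData alloc] initWithContentsOfMappedFile:path];
    NSMutableData *work = [mapped mutableCopy];  // private writable copy
    [mapped release];
    // ... alter bytes in work ...
    [work writeToFile:path atomically:YES];      // ordinary write path
    [work release];
    [/CODE]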

    (Never saw the noparse tag before; that looks kind of handy.)
    Your points are well taken. I always assume that if I am not imposing bounds or limits on what a program can do, it might produce or be asked to handle arbitrarily large files, in which case time conceivably could be a consideration.

    The safe thing, well, yeah, I probably should handle that more carefully.
     
  7. Catfish_Man macrumors 68030

    Catfish_Man

    Joined:
    Sep 13, 2001
    Location:
    Portland, OR
    #7
    Ah, yeah, the "mmap blows up on removable media" thing sucks. Check the Lion API docs for NSData if you have access ;)
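    (The likely candidate is NSDataReadingMappedIfSafe: it maps the file only when losing the backing volume cannot take the process down, and falls back to an ordinary read otherwise.)

    [CODE]
    // Presumed Lion-era option: map only when it is safe to do so.
    NSData *data = [NSData dataWithContentsOfURL:url
                                         options:NSDataReadingMappedIfSafe
                                           error:NULL];
    [/CODE]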
     
