1. Welcome to the new MacRumors forums. See our announcement and read our FAQ

NSMutableSet ridiculously poor performance ?

Discussion in 'Mac Programming' started by phjo, Feb 6, 2008.

  1. macrumors regular

    I have not been able to spend much time programming since ages, but I thought I would be sad not to put my new mac pro and xcode 3 to some use... (Automatic completion is a pleasure to use, and looks much easier than xcode 2 for what I can remember, but I do not intend to use garbage collection before I understand proper memory management...)

    As far as cocoa goes, I certainly do qualify as a newbie, although I did my homework (I spent some time with the Hillegass book, and browsing documentation on the web...)

    (But I did previously some Assembly, Pascal, C++)

    I tackled a small exercice to get a little more accustomed to cocoa memory management, as well as NSSet, NSArray classes. The problem is to write a little procedure taking one NSSet as an argument, and return the set of all subsets of this argument. (Exemple : the subsets of {a,b} are {},{a},{b} and {a,b}...)

    First algorithm is a recursive one : you remove one element in E, you <<calculate>> the subsets of that new set, then the result is obtained by duplicating each subset adding, or not, the element you first substracted.

    After many trials, I came to this procedure :

    NSMutableSet * subsets(NSSet * set)
    	id element;
    	if (element = [set anyObject])
    		NSMutableSet * setMinusOneElement = [[NSMutableSet alloc] initWithSet: set];
    		[setMinusOneElement removeObject:element];
    		NSMutableSet * subsetsOfSetMinusOneElement = [subsets(setMinusOneElement) retain];
    		NSMutableSet * output = [[[NSMutableSet alloc] initWithSet:subsetsOfSetMinusOneElement] autorelease];
    		NSEnumerator * enumerateSet = [subsetsOfSetMinusOneElement objectEnumerator];
    		id setObject;
    		while (setObject = [enumerateSet nextObject])
    			NSMutableSet * newSubset = [[NSMutableSet alloc] initWithSet:setObject];
    			[newSubset addObject:element];
    			[output addObject:newSubset];
    			[newSubset release];
    		[subsetsOfSetMinusOneElement release];
    		[setMinusOneElement release];
    		return output;
    		NSMutableSet * output = [[[NSMutableSet alloc] init] autorelease];
    		[output addObject: [[NSMutableSet alloc] init]];// The subsets set of the empty set is { {} }
    		return output;
    It works, but the performance is ridiculously poor. Granted : the design of this procedure using NSMutableSet is quite bad, as each time you add an object to a NSMutableSet, it has to be checked against every other one.

    Still, I was very surprised when I saw that the time needed to obtain the subsets of a 16 elements set (65536 of them of course, which should not frighten my macpro) was over 2 minutes !

    The same exercise done with Maple V with the same algorithm takes, on the same computer less than half a sec...

    Here is the relevant maple procedure :
    subsets:=proc(S :: set) local i,ret;
     if nops(S)=0 then return({ {} }); fi;
     ret:=subsets(S minus {op(1,S)});
     {seq(op(i,ret),i=1..nops(ret)),seq(op(i,ret) union {op(1,S)},i=1..nops(ret))};
    Note that the Maple code I quoted is interpreted so the former Objective-C code I pasted should have a big advantage (although certainly not concision!), and it looks to me the work being done behind the scenes by Maple is the same than what is being done with the objective-c code I pasted before.

    In my experience, Maple is quite slow, so the results of these tests were a huge surprise.

    Do you see an obvious misstake in the code I pasted that could explain how slow it runs ? (Except in the algorithm itself of course...)

    Note that to minimize the effects of NSMutableSet addObject checkings, I tried another procedure :

    NSMutableArray * nonRecursiveSubsets(NSSet * set)
    	int n = 1 << [set count];
    	NSMutableArray * output = [[[NSMutableArray alloc] init] autorelease];
    	int i;
    	for (i=0; i<n; i++)
    		NSEnumerator * enumerateSet = [set objectEnumerator];
    		NSMutableSet * newSubset = [[NSMutableSet alloc] init];
    		id element;
    		int j=i;
    		while (j) {
    			element = [enumerateSet nextObject];
    			if (j & 1) {
    				[newSubset addObject:element];
    			j >>= 1;
    		[output addObject:newSubset];
    		[newSubset release];
    and this second procedure, this is no surprise, is much faster...

    Any comment is welcome,

  2. macrumors 6502a


    Now you know why certain first-release Cocoa apps run reeeeely slowly. Cocoa isn't the perfect solution to every problem. Sometimes doing things in naked C is best.

    (got my 16GB iPhone - hawwwwwt!)
  3. macrumors regular

    Well, it is not a surprise my first design is not efficient. But to see with the same design a Maple procedure going 250 times faster than objective-c makes me think I must have missed something obvious (apart from the algorithm itself) in my little piece of code... Or it just feels that there must be something really wrong in NSMutableSet implementation.

  4. Moderator emeritus


    Don't be so quick to blame NSMutableSet. Use Shark to profile your code and figure out where it actually is running slowly.

    Is that code called recursively? One thing that isn't good is allocating memory in a tight loop, and you are creating a new NSMutableSet every time. Just a thought.
  5. macrumors 6502

    What kind of objects are you using as the elements of your set? The performance of NSMutableSet depends greatly on the implementation of -hash and -isEqual by the objects in the set.
  6. macrumors regular

    Don't get me wrong : I did put a rather provocative title to this thread, but my best bet is still that I made a serious misstake here or there creating a huge bottleneck which might explain the poor performance...

    I did not find why or where yet...

    I tried, but honestly I don't know what to do with the result... Much of the time is spent on parts of CoreFoundation of course, which is no surprise considering the code I posted.

    (_CFSetFindBuckets1b, _CFStringHash, _CFSetEqual, CFEqual, CFSetGetValueIfPresent...)

    First procedure is recursive but there are only as many calls to the procedure as there are elements to the set...

    As for allocating a lot of NSMutableSet, well I do have to create/allocate one for every subset that will be part the set the procedure is returning, which, I hope, is exactly what is done there.

  7. macrumors regular

    Yes, there might be something there. I had the same idea but dismissed it because I thought that comparing subsets would always be comparing NSSets of NSObjects, and that the nature of the NSObjects themselves would not be important to compare the sets themselves... (Not sure I am quite clear there...)

    So my tests uses (very short) NSStrings, but I shall test NSNumbers too, to see if it makes a huge difference.

    (I just don't have time to do it right now, but will return with the results in a few hours...)

  8. macrumors 68040


    Your code seems to be making a lot of unneccessary copies of sets and sets of sets.

    For example, I think you don't need to make a copy of "subsetsOfSetMinusOneElement" for "output".

    You can just use subsetsOfSetMinusOneElement directly where you are using output now (the 1st output). Though you'll need to adjust your memory management a little, of course, and maybe rename it.
  9. macrumors 6502

    It might make a big difference.

    NSMutableSet uses a hash table internally to speed up the process of locating objects within the set. (Essentially, to avoid having to call -isEqual on every single object in the set every time that it needs to determine whether a particular object is a member of the set.)

    NSObject implements -hash and -isEqual to give results based on the pointer of the object - that's the only thing that really makes sense for a generic, abstract class such as NSObject. Most subclasses of NSObject override these two methods so that they make sense in the context of the value of the object. For example:

    NSString stringA = @"A string.";
    NSString stringB = [NSString stringWithString:stringA];
    if (stringA == stringB)
    	...		//  This may not be true.
    if ([stringA isEqual:stringB])
    	...		//  This is definitely true.
    The performance of NSMutableSet depends greatly on how unique the hash provided by the object is. See here for a detailed explanation of why this is.
  10. macrumors G5


    What about using a proper data structure instead.

    I'll just assume that you are not interested in the powerset of S itself (because that is quite boring, you _know_ its elements, no need to store them), but in arbitrary sets of subsets of S. So we make a class to represent sets of subsets of S.

    If S has N elements, then every subset of S can be represented easily as an array of N bits, or equivalent as a single unsigned N-bit integer. There are 2^N possible subsets of S. Likewise, any set of subsets of S can be represented as an array of (2^N) bits.

    So here is what you do: You count the elements in S, giving a number N. If N > 32 then you give up (N = 33 would require a whole GB to represent every possible set of subsets of S). If N <= 32, you allocate an array of 2^N bits or 2^N / 8 bytes. This is easily done by calling malloc (N < 3 ? 1 : 1ul << (N - 3)).

    Now you have a data structure that can represent _any_ set of subsets of S. To set such a data structure to represent the powerset of S, call memset to set all the bits in the bit array to 1.

    Implement this, and your execution time should go down to maybe a dozen microseconds. Set operations like unions and set differences on this kind of set will take microseconds as well (when you start with a set of 16 elements, giving an array of 8192 bytes).

    There was an article a few weeks ago in Slashdot, about a computer science prof complaining that students learn Java, learn to use a class library, and have no clue what is going on behind their back. This may not be restricted to Java.
  11. macrumors G4


    A comment on the Maple performance, Maple is a mathematical package, so doing this kind of operation will probably be comparatively fast.

    Also NSMutableSet is likely to be slow as its mutable, which is really for convenience for doing simple set operations, like adding a single element.

    If I wanted to find the set of subsets in Cocoa, I'd be working with NSSets, as you shouldn't need to change the size of any sets. An NSArray for the original set of entries is probably better too, as you need to find specific items in the set for the subset.

    Finally as the set of subsets of a given set is usually massive, this code is going to be performance critical so is probably better written C/C++ or even Assembler, however decent performance should be able to be gotten out of Objective-C with decent code.
  12. macrumors regular

    Well, I'd be curious to see the code you suggest for that, because I don't see how to do what you propose... It feels like I just can't at the same time have a NSEnumerator object enumerating elements of output and adding objects to the same variable output...

    I didn't deny it, and I did the tests I mentioned, using NSNumbers instead of NSString for the base elements of the set... And you're right, it does make a significant difference...

    Here is the relevant code :

    int main (int argc, const char * argv[]) {
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
        NSMutableSet * set = [[NSMutableSet alloc] init];
    	int i;
    	for (i=0; i<16; i++)
    		//[set addObject: [NSNumber numberWithUnsignedInt:i]];
    		[set addObject: [NSString stringWithFormat:@"%i",i]];
    	NSMutableSet * subsetsOfE = [subsets(set) retain];
    	NSLog(@"There are %i parts",[subsetsOfE count]);// Yes, the answer is known : 2^16...
    	[set release];
    	[subsetsOfE release];
        [pool drain];
        return 0;
    And the results are very, very strange :

    on my mac pro running Leopard :
    about 2mn for the set with NSStrings
    about 4mn for the set with NSNumbers (that's right, twice as slow)

    on my first generation macbook (dual core) running Tiger :
    about 8mn for both tests (a tiny bit faster with NSNumbers, but it's close.)

    That's an interesting article, thanks.

    You missed the point... I don't intend to do anything with this set of subsets. I coded these little procedures to learn a bit about cocoa, and proper memory management. I certainly learned a lot doing it so it is a success, but I am a kind of a curious guy, and seeing how slowly my first procedure ran I wanted to understand why...

    I still do, and am as confused as ever,

  13. macrumors G5


    Tell me again, what is the title of this thread?

    The title is "NSMutableSet ridiculously poor performance". The answer matching the thread title is "don't complain, it's your own fault for using the wrong tool for the task".

    The long answer is: Most software requires trade-offs. The developer will tend to want good performance that she or he sees as typical use-cases. Any use far away from the typical use-case will tend to be slow.

    Your code is atypical in many ways. First of all, it does something in a very inefficient way - that makes it atypical by definition. No developer will care to make their code fast when it is used a non-typical way. Now think about how many times CFRetain is called for each set element. Do you think what you are doing might be a bit non-typical use again? Now look at all the mutable sets you create, 65536 of them (maybe a few more, you can check yourself if you can be bothered). You create a set, add a few elements, and that's it. That is not typical use. In a typical use, a set is created and lives for a long time - that makes it a good idea to use a data structure that is a bit more expensive to set up but allows most operations to be faster. Your use makes sure that this setup cost is mostly wasted.

    Apple developers tend very much to write code that is optimized for typical use-case, not for benchmarks. That leads to strange accusations at times, like "NSMutableSet ridiculously poor performance". Or other accusations of poor performance in some micro-benchmark.
  14. macrumors 68040


    That's a good point. I'd accumulate the new sets into an array (initialized with appropriate capacity) during the enumeration and then add them as a group afterwards.

    The speed of NSNumber vs. NSString on your MP is interesting. And further that there isn't a similar difference on the MB. I wonder if it is something to do with 64-bit vs. 32-bit Mac OS?
  15. macrumors regular

    Well yes, it looks like Maple does not do that bad there, so there must be something in the internal representation of a set in Maple that makes these kinds of operations (adding objects or unions for example) more efficient than what provides NSMutableSet.

    Wrong. You forgot, or chose to ignore, a very important item : the question mark.

    Exactly ! I do create, doing very little for each, a bit over 65536 (small) mutable sets !

    It might sound huge (and certainly stupid, I admit), but honestly, we're talking about cpus running at more than 2Ghz, with gigabytes of memory.

    So my mac pro needs several minutes to do that ?

    There are probably some very good explanations for that and Nutter gave
    me a few things to think about (and learn, again), but I still have no real understanding as to this slowness I perceive.

    To understand my motivations, one of my projects is to code some scientific applications to look for patterns of chaos and strange attractors in ordinary differential equations, and for that, optimization, even on a fast computer, is essential. (I do have some other very cpu intensive projects...)

    I need some scientific graphing library for that purpose as well, and it is where cocoa will be the most important, and where I need to know what is fast and efficient, and what is not...

    I have no deadline, and I do this mostly for fun but the long-term linux user that I am and who switched to osx 18 months ago is torn between coding cocoa or c++ with trolltech qt... (in this last case there is the excellent graphing library Qwt I had good experience with...)

    I had much rather go cocoa/objective-c but this little experiment refrains me a bit I must say...

  16. macrumors 6502a

    I think the large difference in time between your recursive NSSet method and the Maple one could, perhaps, simply be down to the fact that they are not equivalent when it comes to the number of times the set hash calculation is performed (I admit I don't know Maple well enough to compare the two programs or even know if Maple hashes sets at all, but it's a possibility).

    I'd find it interesting to see an estimate of the total number of times NSSet calculates it's hash. Perhaps have a global counter in subsets(), and every time you create a set, or add a member to the set, increment the counter. I guess the number will increase exponentially with the size of the initial set, so perhaps some comparisons of different set sizes might be useful. If I'm right, this exponential growth might be the killer.

    b e n
  17. macrumors regular

    Yes, using arrays is certainly the way to go for better performance (most of the computing time is lost checking if added objects are already there I suppose...)

    (Which led me to the second, not recursive, procedure I pasted, which is several hundreds of times faster, but returning a NSMutableArray instead of a set...)

    My bet would be some difference, optimization?, in the hashing done for NSStrings in leopard, as the comparizon of the times spent between the macbook and the macpro goes much farther than the difference of power of the two cores for NSStrings... (I can expect one core of my 2.8Ghz mac pro to be about 2 times faster than one core of my 2Ghz first gen macbook, but certainly not 4 times !)

  18. macrumors 68040


    Note that my suggestion doesn't change the number of objects inserted into sets--it just avoids creating extra copies of large sets. Nor does it change the algorithm.

    I see what you mean. It would be an interesting problem for someone to run down...

    I wouldn't let this cause you to refrain from using Cocoa and Objective-C.
    One has to gain an understanding of any new toolset in order to use it effectively. Wasn't your second implementation hundreds of times faster than your first? It seems like you're well on your way... :)
  19. macrumors newbie

    Without really looking at your code too carefully, two other things to consider:

    1) You should try Obj-C 2.0's Fast Enumeration. The old style object enumerators are very slow in comparison.

    2) You should try enabling garbage collection just for comparison. You are alloc'ing and autoreleasing lots of objects. There are costs associated with doing so particularly in loops. In a non-GC environment, you might tune things by creating temporary autorelease pools but it might turn out that turning on GC in this case will perform better without you having to deal with all the memory fine tuning.
  20. macrumors G5


    That's what open source is there for. The source code for the core foundation classes (same as the objective C classes) can be found at http://www.opensource.apple.com/darwinsource/10.4.10.x86/ under "CF 368.28". Download it (requires free ADC membership, extract the zip file, check CFSet.c. (CFSet is the same as NSSet and NSMutableSet).

    And then you find that

    1. CFSet requires that the objects stored in a set have a good hash function.
    2. CFSets do not have a good hash function.

    It seems like creating sets of sets is not a typical use case.
  21. macrumors regular

    I just tried fast enumeration. It certainly gives nicer code, and it is a bit faster (2mn 6sec agains 2mn 9sec) but the real bottleneck is elsewhere.

    I have yet to try garbage collection.

    Btw, is there a good book to read about objective-c 2 ?

    Many thanks ! This explains a lot and makes perfect sense.

    The only remaining mystery would be the differences between NSStrings and NSNumbers in sets showing a significant advantage in leopard when using NSStrings there... (For the second procedure I quoted, the penalty for using NSNumbers is not near 100% as with the first one, but is still significant, in the order of 30%)

  22. macrumors newbie

    I do hope you know it's really easy to try. Just change a single setting from off to on in Xcode.

    Not yet. I think some big names in Cocoa are working on some though.

    In the meantime, there are some blogs that discuss them. Check out Scott Stevenson's blog for one. He has a 2-part introduction to it.

    Total speculation, but it could be the default hash function for NSNumbers is slow and inefficient (too many collisions) for your data set.
  23. macrumors G5


    There is a gazillion calls to calculate the hash codes for these NSStrings and NSNumbers. For an NSNumber, the hash code is not much more than the number itself. For the strings, it is a checksum calculated from the first eight and last 16 characters; that takes longer.

    As an experiment, you can do the following: Create 65536 different NSNumbers, measure the time, insert them all into a set, again measure the time. Then the same with 65536 strings, then with 65536 sets each containing just one NSNumber. The last one should be horribly slow compared to the other operations.
  24. macrumors regular

    Yes, I know. So I did try, and for what it's worth, my little tests now run 8% slower, which sounds reasonable.

    I did. Scott Stevenson's blog is a favorite of mine. And after all, apple documentation is quite good.

    Still, it is quite surprising that when looking for books about osx programming, you find only books that are slightly or severely outdated...

    Maybe, but it sounds really weird if the hashing function is indeed just, from what I read, the absolute value of the integer there... There should be no collision... at all.

    As seen later, the weirdest slowness occurs when comparing sets of numbers, not numbers themselves.

    Yeah, what you say corroborates what I read (the link Nutter posted for example)

    I did not exactly the experiment you suggest, but close :

    #import <Foundation/Foundation.h>
    int main (int argc, const char * argv[]) {
        NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    	NSMutableSet * set = [[NSMutableSet alloc] init];
    	int i,j;
    	for (j = 0; j<4; j++) {
    	for (i = 0 ; i < 10000; ++i)
    		//[set addObject: [NSString stringWithFormat:@"%i",i]];
    		//[set addObject: [NSNumber numberWithUnsignedInt:i]];
    		//[set addObject: [NSSet setWithObject:[NSString stringWithFormat:@"%i",i+10000*j]]];
    		//[set addObject: [NSSet setWithObject:[NSNumber numberWithUnsignedInt:i+10000*j]]];
    		//[set addObject: [NSSet setWithObject:[NSString stringWithFormat:@"%i",i]]];
    		[set addObject: [NSSet setWithObject:[NSNumber numberWithUnsignedInt:i]]];
    	[set release];
        NSLog(@"Set released");
        [pool drain];
        return 0;
    And the results are :

    - first two tests are very quick, and only much larger sets are usable for comparisons, but insertion of NSNumbers is quicker by about 30%.
    - with the fourth remaining tests, with exactly the code I pasted, here are the times needed to run them, on my mac pro running leopard :

    2m2 secs (inserting 40000 singletons with one string)
    3m38 secs (inserting 40000 singletons with one integer nsnumber)
    22 secs (inserting 10000 singletons of string duplicated 4 times each)
    40 secs (inserting 10000 singletons of numbers duplicated 4 times each)

    Of course you're quite right stating that insertion of sets is not typical and very slow... The only thing which is still a mystery to me is why sets of NSNumber take longer to uniquely identify than sets of NSStrings... (under leopard...)

  25. macrumors 6502a


    phjo , out of curiosity I converted your recursive algorithm to use the C++ equivalent std::set. Here are approximate times for NSSet and std::set:-

    For NSSet
    N = 12, time = 1 second
    N = 13, time = 2 seconds
    N = 14, time = 9 seconds
    N = 15, time = 31 seconds
    N = 16, time = 119 seconds

    For std::set
    N = 16, time = 1 second
    N = 17, time = 2 seconds
    N = 18, time = 4 seconds
    N = 19, time = 8 seconds
    N = 20, time = 18 seconds
    N = 21, time = 31 seconds

    Looks like std::set is taking 2^N time to run (which is what I would expect for the recursize algorithm), but NSSet is taking 2^(2N). At that rate it would take one and a half days for NSSet to reach N=21!

    b e n

    By the way N=22 kills the memory in my 4Gig iMac!

Share This Page