macOS C++ XCode v. 4.6 Shuffling an Array

thrill4rishabh · Apr 9, 2013

Hi people,

I am trying to shuffle an array of integers and the function definition I came up with is as follows:

Code:

void shuffle (int array[], int & size)
{
    cout << "\nShuffling cards.... ....\n";
    for (int i = 0; i < size; i++)
    {
        int random = abs(rand()/size);
        swap (array[i], array[random]);
    }
    cout<<"\n cards shuffled \n";
}

It gives me the line : "Shuffling cards.... ....:"

But then there is a 'Bus error : 10'

Could anyone help me please?

Thanks

subsonix · Apr 9, 2013

You probably want to mod your result of rand() with size (%), not divide it. Example if size is 10 and rand() generates 1000, then the index you get is 100, which is out of bounds for your array.

mfram · Apr 9, 2013

Well, for one you probably want:

Code:

int random = rand() % size;

chown33 · Apr 9, 2013

What do you expect the value of the variable 'random' to be? Clearly, you expect it to be random, but within what range of values?

What is the actual value of the variable? Is it within the expected range of values? If not, then break down the calculation into smaller parts (there are only two), and work out where it's going wrong.

The essence of debugging is to confirm your expectations. So write out exactly what you expect. Then confirm it by looking at the actual value, maybe using a suitable print to cout, or by using the debugger.

The above applies to all programs: confirm expectations.

You should also learn a bit about shuffle algorithms. Example:
http://en.wikipedia.org/wiki/FisherYates_shuffle

Found by searching Wikipedia for search terms shuffle algorithm

thrill4rishabh · Apr 9, 2013

That was stupid of me. Thanks.

lloyddean · Apr 9, 2013

Since you're using C++ why not use what's already available in 'algorithm'?

Two versions of 'random_shuffle' the simpler one is exampled below and there is also a version which can be passed a random number generator of your own design.

Code:

#include <algorithm>
#include <random>
#include <iterator>
#include <iostream>
 
int main()
{
    using std::cout;
    using std::endl;
    using std::random_shuffle;
    using std::vector;
    
    vector<int>    cards = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 };
 
    srand(time(NULL));
    random_shuffle(cards.begin(), cards.end());

    for ( vector<int>::iterator itr = cards.begin(); itr != cards.end(); itr++ )
    {
        cout << *itr << " ";
    }

    cout << endl;

    return 0;
}

chown33 · Apr 9, 2013

thrill4rishabh said:
That was stupid of me. Thanks.

The bigger mistake is that you apparently have no debugging strategy. If you have no debugging or testing strategy, and no way to examine variables or confirm that values are as expected, then you have no way to find or fix mistakes except by having others do your debugging for you.

A programmer who can't debug or test their own code is about as useful as a cook whose food is never eaten. It may be possible to make amazing things with lots of ingredients, and it may look wonderful, but if no one tastes the result, then it's not really cooking. "The proof of a pudding is in the eating".

gnasher729 · Apr 9, 2013

thrill4rishabh said:
Hi people,

I am trying to shuffle an array of integers and the function definition I came up with is as follows:

Code:

void shuffle (int array[], int & size) { cout << "\nShuffling cards.... ....\n"; for (int i = 0; i < size; i++) { int random = abs(rand()/size); swap (array[i], array[random]); } cout<<"\n cards shuffled \n"; }

It gives me the line : "Shuffling cards.... ....:"

But then there is a 'Bus error : 10'

Could anyone help me please?

Thanks

1. Why is size an int& and not an int? Doesn't make sense at all.
2. I'd recommend that things like array sizes should have type size_t.
3. I'd recommend using std:: for standard library functions, like std::swap.

4. To see what's going wrong: What is the range of values that rand () could return? If that is divided by size, what is the range of possible values? And what range does random need to be in?

5. Challenge question: If the problem is fixed, and if the variable random were set to a truly random value from 0 to size-1 in each iteration of the loop, would the result permutation of the array be perfectly random, or would some permuations be more likely than others?

ArtOfWarfare · Apr 9, 2013

chown33 said:
A programmer who can't debug or test their own code is about as useful as a cook whose food is never eaten. It may be possible to make amazing things with lots of ingredients, and it may look wonderful, but if no one tastes the result, then it's not really cooking. "The proof of a pudding is in the eating".

This analogy struck me as odd. I think a closer one would be "a cook who can't eat their own food."

Left on their own, the cook can't ever develop new recipes - only around the lunch and dinner rush can they determine that their current techniques are yielding undesirable food.

RiskyMr · Apr 21, 2013

Challenge Question Answer

Challenge question: If the problem is fixed, and if the variable random were set to a truly random value from 0 to size-1 in each iteration of the loop, would the result permutation of the array be perfectly random, or would some permutations be more likely than others?

The result would be perfectly random as each element of the array would be swapped with a random value. (And I'm happy to be proven wrong as I love these kinds of questions.)

I also agree with lloyddean and suggest using the standard library whenever possible.

ArtOfWarfare · Apr 21, 2013

gnasher729 said:
5. Challenge question: If the problem is fixed, and if the variable random were set to a truly random value from 0 to size-1 in each iteration of the loop, would the result permutation of the array be perfectly random, or would some permuations be more likely than others?

Imagine a set of size 2 containing A and B. We have 4 possible sequences of two random numbers being drawn:

00: BA
01: AB
10: AB
11: BA

Looks like an even distribution of permutations to me, but I'll try with a set of size 3 just so I can be a bit more confident.

Imagine a set of size 3 containing A, B, and C. We have 27 possible sequences of these 3 random numbers being drawn

000: CAB
001: BCA
002: BAC
010: CBA
011: ACB
012: ABC

So far we're perfectly distributed but an issue just occurred to me. There's 27 possible sequences of random numbers (with repeats) and 6 permutations of 3 elements. Thus, at best we'd have 3 permutations show up 4 times and 3 permutations show up 5 times

I believe this code is perfectly random:

Code:

for (int i = 0; i < size; i++) {
    int rand = // get random number between i and size-1
    swap (array[i], array[rand]);
}

It has size! possible sequences of random numbers and size! possible permutations... or am I missing something? Is not every permutation possible? Is some permutation repeated?

gnasher729 · Apr 21, 2013

ArtOfWarfare said:
It has size! possible sequences of random numbers and size! possible permutations... or am I missing something? Is not every permutation possible? Is some permutation repeated?

The original algorithm doesn't produce a random sequence. I actually calculated one of the probabilities involved and it was quite wrong: Assume for example we are using this algorithm to shuffle 52 cards. What is the probability that card #52 ends up in position 51? It should be 1/52 (the card should be in any of 52 positions with equal probability); if not then the shuffling isn't random.

In the first 51 exchanges, card 52 will either stay where it is, or move to a different position. If it moves to a different position, it cannot move back to position 52. In each exchange, the probability that the card in position 52 stays in place is 51/52. So the probability that card 52 is in position 52 after 51 moves is (51/52)^51.

In exchange #51, each card will be moved to position 51 with equal probability 1/52. So the probability that card 52 is in position 51 after 51 moves is 1/52.

So what's the probability that card 52 is in position 51 after 52 moves? The chance is 1/52 if it was in position 52 after 51 moves, and 51/52 if it was in position 51 after 51 moves; in all other cases it is 0. So the total probability is

(1/52)(51/52)^51 + (51/52)(1/52) = (1/52) ((51/52)^51 + 51/52) which is quite a bit more than 1/52.

(For n ≥ 3 the original algorithm used n^n random sequences to produce n factorial permuations, which cannot be random because n^n is not divisible by n factorial when n ≥ 3; a much simpler argument. )

Your algorithm should be correct because every permutation can be produced by exactly one sequence of random numbers: To produce the order x0, x1, x2, ..., xn-1, the first random number must be x0, the first random number must be the index where x1 was stored after the first swap, and so on.

lloyddean · Apr 21, 2013

My experience writing large game projects in assembler on various processors a couple of decades ago and then in C/C++ would have me do the loop as -

Code:

for ( size_t i = size; i--; )
{
    int rand = // get random number between i and size-1
    swap (array[i], array[rand]);
}

ArtOfWarfare · Apr 21, 2013

lloyddean said:
My experience writing large game projects in assembler on various processors a couple of decades ago and then in C/C++ would have me do the loop as -

Code:

for ( size_t i = size; i--; ) { int rand = // get random number between i and size-1 swap (array[i], array[rand]); }

Does that save an instruction on each cycle by using BEZ (or do actual ISAs have support for actually doing BGT and BLT and BEQ? My class on computer architecture focused on MIPS - we glanced at ARM and x86 during a few assignments.)

gnasher729 · Apr 22, 2013

lloyddean said:
My experience writing large game projects in assembler on various processors a couple of decades ago and then in C/C++ would have me do the loop as -

Code:

for ( size_t i = size; i--; ) { int rand = // get random number between i and size-1 swap (array[i], array[rand]); }

You deliberately wrote the loop in a highly unreadable style. Why? You also changed the algorithm which was quite obviously correct, and you will have a much much harder time to prove that it is actually correct.

ArtOfWarfare said:
Does that save an instruction on each cycle by using BEZ (or do actual ISAs have support for actually doing BGT and BLT and BEQ? My class on computer architecture focused on MIPS - we glanced at ARM and x86 during a few assignments.)

1. Your compiler will optimize the C code. If there are different ways to write the code, they will all produce the same executable code, except unusual coding styles might not be handled well by the compiler. Like this one.

2. Calculating execution times on a modern processor is basically impossible. You have to measure it. If you don't measure it, forget about optimising.

3. Whatever savings you get by writing a simple loop in an awful style is dwarved by the execution time of your random number generator and therefore not worth it.

ArtOfWarfare · Apr 22, 2013

gnasher729 said:
You deliberately wrote the loop in a highly unreadable style. Why? You also changed the algorithm which was quite obviously correct, and you will have a much much harder time to prove that it is actually correct.

1. Your compiler will optimize the C code. If there are different ways to write the code, they will all produce the same executable code, except unusual coding styles might not be handled well by the compiler. Like this one.

2. Calculating execution times on a modern processor is basically impossible. You have to measure it. If you don't measure it, forget about optimising.

3. Whatever savings you get by writing a simple loop in an awful style is dwarved by the execution time of your random number generator and therefore not worth it.

I would use Sandy Bridge's built in random number generator, which I understand is quite quick. Also, would the compiler see the code as the same? Would it really change a forward iteration to a backwards iteration? It seems like the fact that they'll yield the samel results is the same is unobvious on account of the fact that you're moving elements about in the array that you're iterating over. It seems to me the assembly code will have one line fewer in the loop, which didn't have many lines, so it'll go somewhere around 20% faster. People like talking about how new macs are 20% faster than old macs, so I don't see why they wouldn't want their code to execute 20% faster, too. I understand its not changing the big O class, but O(n) is hard to beat anyways.

Catfish_Man · Apr 22, 2013

ArtOfWarfare said:
I would use Sandy Bridge's built in random number generator, which I understand is quite quick. Also, would the compiler see the code as the same? Would it really change a forward iteration to a backwards iteration? It seems like the fact that they'll yield the samel results is the same is unobvious on account of the fact that you're moving elements about in the array that you're iterating over. It seems to me the assembly code will have one line fewer in the loop, which didn't have many lines, so it'll go somewhere around 20% faster. People like talking about how new macs are 20% faster than old macs, so I don't see why they wouldn't want their code to execute 20% faster, too. I understand its not changing the big O class, but O(n) is hard to beat anyways.

It won't be 20% faster. Not even close. Not all lines of assembly take the same amount of time (in fact, the same line, in a different loop iteration, may take a wildly different amount of time).

lloyddean · Apr 22, 2013

I'm in no mood to argue in these forums. I'm not going to beat my chest trying to convince you of anything. I'm here to learn things myself, if not about the technologies then about the people and their way of thinking. And mostly as an observer.

But your statement ...

gnasher729 said:
You deliberately wrote the loop in a highly unreadable style.

... is very subjective and in my experiences having done systems and embedded (I'm NOT promoting my experience over yours, but for me) I see C/C++ as high-level assemblers and there are times when I want the algorithms to be what I write and NOT what the optimizer mangles it into.

Not all architectures have quality, good or bug free optimizers that can be counted on.

I will agree with your statement -

Calculating execution times on a modern processor is basically impossible. You have to measure it. If you don't measure it, forget about optimising.

That's what I've done and my experiences on "some" platforms leaves me with different conclusions.

Enjoy

ghellquist · Apr 23, 2013

ArtOfWarfare said:
.... it'll go somewhere around 20% faster.

Hi.
The thing about small-scale optimizations is that it should be done when and where it makes a difference. I do find it very difficult to stay away from that kind of optimizations myself, but maybe we should be aware about the problems it can create.

I also started out as an assembler programmer doing hard real time systems software (process control and such). This is a very special area where it sometimes is necessary to write "ugly" but fast code. The problem is that it always becomes expensive to maintain the special things after a while. It is the same in a high level language like C or C++, special code is expensive to maintain.

It is much better to write well-structured code using standard, well-known coding conventions. In my experience C sort of invites you to, sometimes, write rather convoluted code lines doing a lot of things on each line. My suggestion is to avoid that at the first pass. There are many reasons for this:
- first of all, the code actually has to do what you intend for it to do. Better write readable code than optimized. You will most likely have to read the code several times to try to find the bugs.
- secondly most code has to be maintained, often by a different programmer than the one writing it from the beginning.
- yet another aspect is that is very difficult to really understand what effect a source level optimization actually has. Often on a large processor the loop variables can be stored in registers, and the loop code cached in the fastest L1 cache. The part of the code taking long time then might be accessing memory variables (and these might be cached as well). Many optimizations on "small scale" might actually make the code run slower. A typical example is that starting by accessing an array from high adresses down to low might take a lot longer time than accessing the same array starting from low adresses. The reason is that caches intervene and might prefetch data towards higher adresses only. On a different processor, the effect might be totally different.
- the important point as I see it, is to try to measure where an optimization really does a difference. Typically this is not in a "run-once" internal initialization routine like setting up the deck for a card game. A more typical place might instead be where sprites and things are move around on the screen. Doing graphic calls without thinking can create a rather sluggish user experience.

Well, my ramblings, worth about as much as the paper it is written on (what? no paper! ).

// gunnar

subsonix · Apr 23, 2013

softwareguy256 said:
Lloyd, this is why I don't give out anything too valuable on these internet forums. It's an effort in futility.

In this case it's nonsense. All you would have to do is to profile it or generate assembly instead of speculating.

Short for loop:

Code:

LBB3_5:                                 ##   Parent Loop BB3_3 Depth=1
                                        ## =>  This Inner Loop Header: Depth=2
        movabsq $-1, %rax
        movq    -72(%rbp), %rcx
        movq    %rcx, -104(%rbp)        ## 8-byte Spill
        addq    %rax, %rcx
        movq    %rcx, -72(%rbp)
        movq    -104(%rbp), %rax        ## 8-byte Reload
        cmpq    $0, %rax
        je      LBB3_7
## BB#6:                                ##   in Loop: Header=BB3_5 Depth=2
        movl    $1048575, %eax          ## imm = 0xFFFFF
        movl    %eax, -108(%rbp)        ## 4-byte Spill
        callq   _rand
        movl    $-2147481599, %ecx      ## imm = 0xFFFFFFFF80000801
        movl    %eax, -112(%rbp)        ## 4-byte Spill
        imull   %ecx
        movl    -112(%rbp), %eax        ## 4-byte Reload
        leal    (%rdx,%rax), %ecx
        movl    %ecx, -116(%rbp)        ## 4-byte Spill
        shrl    $31, %ecx
        movl    -116(%rbp), %edx        ## 4-byte Reload
        sarl    $19, %edx
        addl    %edx, %ecx
        imull   $1048575, %ecx, %ecx    ## imm = 0xFFFFF
        subl    %ecx, %eax
        movl    -112(%rbp), %ecx        ## 4-byte Reload
        movl    %eax, -52(%rbp)
        movq    -72(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r8
        imulq   %rsi, %r8
        addq    %rdi, %r8
        movslq  -52(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r9
        imulq   %rsi, %r9
        addq    %rdi, %r9
        movq    %r8, %rdi
        movq    %r9, %rsi
        movl    %ecx, -120(%rbp)        ## 4-byte Spill
        callq   _swap
        jmp     LBB3_5

Idiomatic for loop:

Code:

LBB3_5:                                 ##   Parent Loop BB3_3 Depth=1
                                        ## =>  This Inner Loop Header: Depth=2
        movq    -72(%rbp), %rax
        cmpq    $1048576, %rax          ## imm = 0x100000
        jae     LBB3_8
## BB#6:                                ##   in Loop: Header=BB3_5 Depth=2
        movl    $1048575, %eax          ## imm = 0xFFFFF
        movl    %eax, -100(%rbp)        ## 4-byte Spill
        callq   _rand
        movl    $-2147481599, %ecx      ## imm = 0xFFFFFFFF80000801
        movl    %eax, -104(%rbp)        ## 4-byte Spill
        imull   %ecx
        movl    -104(%rbp), %eax        ## 4-byte Reload
        leal    (%rdx,%rax), %ecx
        movl    %ecx, -108(%rbp)        ## 4-byte Spill
        shrl    $31, %ecx
        movl    -108(%rbp), %edx        ## 4-byte Reload
        sarl    $19, %edx
        addl    %edx, %ecx
        imull   $1048575, %ecx, %ecx    ## imm = 0xFFFFF
        subl    %ecx, %eax
        movl    -104(%rbp), %ecx        ## 4-byte Reload
        movl    %eax, -52(%rbp)
        movq    -72(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r8
        imulq   %rsi, %r8
        addq    %rdi, %r8
        movslq  -52(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r9
        imulq   %rsi, %r9
        addq    %rdi, %r9
        movq    %r8, %rdi
        movq    %r9, %rsi
        movl    %ecx, -112(%rbp)        ## 4-byte Spill
        callq   _swap
## BB#7:                                ##   in Loop: Header=BB3_5 Depth=2
        movabsq $1, %rax
        movq    -72(%rbp), %rcx
        addq    %rax, %rcx
        movq    %rcx, -72(%rbp)
        jmp     LBB3_5

I tested both loop types on 100.000.000 iterations of the loop, it made no difference what so ever with a result around 48 - 50ms.

softwareguy256 · Apr 23, 2013

I have no idea what you are talking about or trying to prove here. I have a feeling though you are on the wrong track. Better listen to Lloyd he knows what he is talking about.

subsonix said:

In this case it's nonsense. All you would have to do is to profile it or generate assembly instead of speculating.

Short for loop:

Code:

LBB3_5:                                 ##   Parent Loop BB3_3 Depth=1
                                        ## =>  This Inner Loop Header: Depth=2
        movabsq $-1, %rax
        movq    -72(%rbp), %rcx
        movq    %rcx, -104(%rbp)        ## 8-byte Spill
        addq    %rax, %rcx
        movq    %rcx, -72(%rbp)
        movq    -104(%rbp), %rax        ## 8-byte Reload
        cmpq    $0, %rax
        je      LBB3_7
## BB#6:                                ##   in Loop: Header=BB3_5 Depth=2
        movl    $1048575, %eax          ## imm = 0xFFFFF
        movl    %eax, -108(%rbp)        ## 4-byte Spill
        callq   _rand
        movl    $-2147481599, %ecx      ## imm = 0xFFFFFFFF80000801
        movl    %eax, -112(%rbp)        ## 4-byte Spill
        imull   %ecx
        movl    -112(%rbp), %eax        ## 4-byte Reload
        leal    (%rdx,%rax), %ecx
        movl    %ecx, -116(%rbp)        ## 4-byte Spill
        shrl    $31, %ecx
        movl    -116(%rbp), %edx        ## 4-byte Reload
        sarl    $19, %edx
        addl    %edx, %ecx
        imull   $1048575, %ecx, %ecx    ## imm = 0xFFFFF
        subl    %ecx, %eax
        movl    -112(%rbp), %ecx        ## 4-byte Reload
        movl    %eax, -52(%rbp)
        movq    -72(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r8
        imulq   %rsi, %r8
        addq    %rdi, %r8
        movslq  -52(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r9
        imulq   %rsi, %r9
        addq    %rdi, %r9
        movq    %r8, %rdi
        movq    %r9, %rsi
        movl    %ecx, -120(%rbp)        ## 4-byte Spill
        callq   _swap
        jmp     LBB3_5

Idiomatic for loop:

Code:

LBB3_5:                                 ##   Parent Loop BB3_3 Depth=1
                                        ## =>  This Inner Loop Header: Depth=2
        movq    -72(%rbp), %rax
        cmpq    $1048576, %rax          ## imm = 0x100000
        jae     LBB3_8
## BB#6:                                ##   in Loop: Header=BB3_5 Depth=2
        movl    $1048575, %eax          ## imm = 0xFFFFF
        movl    %eax, -100(%rbp)        ## 4-byte Spill
        callq   _rand
        movl    $-2147481599, %ecx      ## imm = 0xFFFFFFFF80000801
        movl    %eax, -104(%rbp)        ## 4-byte Spill
        imull   %ecx
        movl    -104(%rbp), %eax        ## 4-byte Reload
        leal    (%rdx,%rax), %ecx
        movl    %ecx, -108(%rbp)        ## 4-byte Spill
        shrl    $31, %ecx
        movl    -108(%rbp), %edx        ## 4-byte Reload
        sarl    $19, %edx
        addl    %edx, %ecx
        imull   $1048575, %ecx, %ecx    ## imm = 0xFFFFF
        subl    %ecx, %eax
        movl    -104(%rbp), %ecx        ## 4-byte Reload
        movl    %eax, -52(%rbp)
        movq    -72(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r8
        imulq   %rsi, %r8
        addq    %rdi, %r8
        movslq  -52(%rbp), %rsi
        movq    -48(%rbp), %rdi
        movabsq $4, %r9
        imulq   %rsi, %r9
        addq    %rdi, %r9
        movq    %r8, %rdi
        movq    %r9, %rsi
        movl    %ecx, -112(%rbp)        ## 4-byte Spill
        callq   _swap
## BB#7:                                ##   in Loop: Header=BB3_5 Depth=2
        movabsq $1, %rax
        movq    -72(%rbp), %rcx
        addq    %rax, %rcx
        movq    %rcx, -72(%rbp)
        jmp     LBB3_5

I tested both loop types on 100.000.000 iterations of the loop, it made no difference what so ever with a result around 48 - 50ms.

Catfish_Man · Apr 23, 2013

softwareguy256 said:
I have no idea what you are talking about or trying to prove here. I have a feeling though you are on the wrong track. Better listen to Lloyd he knows what he is talking about.

That's one of the most depressing replies I've seen on this forum.

"I have no idea what you said, but you're wrong" is so aggressively ignorant that I actually hope you're just trying to provoke him, and not actually serious.

subsonix · Apr 24, 2013

softwareguy256 said:
I have no idea what you are talking about or trying to prove here.

We are discussing the impact of: 'for(int i = SIZE; i--; )' over: 'for(int i = 0; i < SIZE; i++)'

What I suggested was to output the assembly produced by the compiler for each loop type, and I also provided it in the post. The first example is the shorter for loop with both the evaluation and decrement step in the middle part as Lloyd suggested. This loop produces assembly with all branching code in the beginning, up until the ##BB#6 comment, (not including the jmp instruction at the very end). The idiomatic for loop does the incrementation at the end of the loop after ##BB#7 and the branching in the beginning before ##BB#6, much like the loop in C. However, the idiomatic for loop appears to actually use 1 less instruction for this.

When I tested the loop types I used an array of size 1.000.000, and shuffled it 100 times which gives 100.000.000 iterations in total. I added a timestamp with mach_absolute_time() before and after the loop and converted it to nano seconds with AbsoluteToNanoseconds(), I then divided the result with 100 to get an average time spent shuffling an array of 1.000.000. The reason for this last step is to suppress the impact the OS and scheduler may have on our process. Skipping this last step would however show a difference (if any) more clearly afaik.

Edit:

Running the shorter for loop test 5 times:
Elapsed time: 43960619ns
Elapsed time: 44968580ns
Elapsed time: 44714770ns
Elapsed time: 45216508ns
Elapsed time: 45167888ns

Running the idiomatic for loop test 5 times:
Elapsed time: 43724876ns
Elapsed time: 43464096ns
Elapsed time: 43796514ns
Elapsed time: 43297202ns
Elapsed time: 45620971ns

gnasher729 · Apr 24, 2013

softwareguy256 said:
I have no idea what you are talking about or trying to prove here. I have a feeling though you are on the wrong track. Better listen to Lloyd he knows what he is talking about.

1. He measured. That's the most important thing of all. If you don't measure, you don't know whether the code you want to make faster is actually worth spending your time on, and you don't know whether the time you spent to make it faster is actually making it faster, so it's just a waste of time. So A says "I write the code this way because it runs faster". B says "I checked, it doesn't".

2. He looked at the assembler code. _If_ you go down to that level, then writing unreadable code in the hope that it runs faster, without actually checking, is again just a waste of time. He looked at the code, and there was no benefit from writing a loop that was harder to understand. Actually, that loop performed two branches, which are most likely to hurt performance. So A says "I write the code this way because the assembler code is better". B says "I checked, it isn't".

Micro-optimisations on that level rarely ever pay. Here's what might pay: Use vectors.

Code:

typedef unsigned char vec_uint8 __attribute__((__vector_size__(16)));
typedef int vec_int32 __attribute__((__vector_size__(16)));
typedef float vec_float __attribute__((__vector_size__(16)));

This declares types that are vectors of 16 bytes, 4 ints, or 4 floats. Both MacOS X and iOS compilers fully support these types, so you can do up to 16 operations in a single instruction. That can give you a huge factor in speed.

Use multiple threads with GCD. No problem running 8 threads on a 15" MBP, or two threads on an iPhone. 2 to 8 times less time without problems.

Make sure that you know about your caches. When you have lots of data, do all the work on a subset that fits into cache, then another subset that fits into cache, and so on. This can make a _huge_ difference.

All these things operate on a much higher level, and that's where the money is.

ArtOfWarfare · Apr 24, 2013

gnasher729 said:
Here's what might pay: Use vectors.

Code:

typedef unsigned char vec_uint8 __attribute__((__vector_size__(16))); typedef int vec_int32 __attribute__((__vector_size__(16))); typedef float vec_float __attribute__((__vector_size__(16)));

This declares types that are vectors of 16 bytes, 4 ints, or 4 floats. Both MacOS X and iOS compilers fully support these types, so you can do up to 16 operations in a single instruction. That can give you a huge factor in speed.

How would you utilize these in an array shuffling algorithm? Or are we past the point of actually talking about shuffling arrays at all?

It's interesting that I've never heard of these types before. I can see how they'd be very helpful with performing vector operations... but it seems like I rarely need to do anything with those without an API separating me from the vector.

macOS C++ XCode v. 4.6 Shuffling an Array

macrumors newbie

macrumors 68040

Contributor

Moderator

macrumors newbie

macrumors 65816

Moderator

Suspended

macrumors G3

macrumors newbie

macrumors G3

Suspended

macrumors 65816

macrumors G3

Suspended

macrumors G3

macrumors 68030

macrumors 65816

macrumors regular

macrumors 68040

macrumors regular

macrumors 68030

macrumors 68040

Suspended

macrumors G3

Our Staff