Register FAQ / Rules Forum Spy Search Today's Posts Mark Forums Read
 MacRumors Forums sorting question.

 Feb 5, 2013, 06:14 PM #1 farmerdoug macrumors 6502a   Join Date: Sep 2008 sorting question. I need the top ten values in a set of results. I could either store all the results as they are calculated and then sort; or I could test each value as it is calculated. If is larger than the largest ten up to that point, I could discard the smallest. Looking for suggestions on the best way. Thanks. 0
Feb 5, 2013, 06:31 PM   #2
SnowLeopard2008
macrumors 603

Join Date: Jul 2008
Location: California
Quote:
 Originally Posted by farmerdoug I need the top ten values in a set of results. I could either store all the results as they are calculated and then sort; or I could test each value as it is calculated. If is larger than the largest ten up to that point, I could discard the smallest. Looking for suggestions on the best way. Thanks.
Pick a sorting method. Selection, bubble, etc. I would process the top ten after calculation. Simplifies things a bit.
__________________
MacBook Air | iPhone 5 | iPad mini | Apple TV | AirPort Extreme 802.11ac | iPad signed by Steve Wozniak
0
 Feb 5, 2013, 06:34 PM #3 Giuly macrumors 68040     Join Date: Oct 2009 Location: That depends whether you ask for timezone, state of mind or GPS coordinates. Depending on how large your dataset is and which language you're using, you might as well make use of the higher-level sort functions of arrays, dictionaries, lists, triples, what have you and then cut the first 10 elements. Last edited by Giuly; Feb 5, 2013 at 07:32 PM. 0
 Feb 5, 2013, 10:48 PM #4 dmi macrumors member   Join Date: Dec 2010 I'd suggest scanning through the results while maintaining a ten element heap structure with the top values seen so far. You could also do a recursive K-median selection algorithm analogous to quicksort, but although the expected time is O(n), unfortunate choices could make it O(n^2) A median of medians algorithm can guarantee O(n) performance, but the complication may not be worth it compared to maintaining a heap of just 10 elements. Last edited by dmi; Feb 5, 2013 at 11:57 PM. 0
 Feb 6, 2013, 07:14 AM #5 farmerdoug Thread Starter macrumors 6502a   Join Date: Sep 2008 At the moment I have been doing what dmi has suggest, or close. I scan the top ten every time I have a new value; placing in the right position if is large enough to be included; and shifting the rest down. My guess is that the number of greater values is far less the very large number of total values. The language is c. 0
 Feb 6, 2013, 08:42 AM #6 dmi macrumors member   Join Date: Dec 2010 With a min heap structure, you can make the scan of the top ten with O( log(10) ) comparisons and shifts, instead of O(10) comparisons and shifts, but 10 may be small enough that it the difference is not worth worrying about. 0
Feb 6, 2013, 09:13 AM   #7
Efrem
macrumors member

Join Date: Jul 2009
Quote:
 Originally Posted by farmerdoug At the moment I have been doing what dmi has suggest, or close. I scan the top ten every time I have a new value; placing in the right position if is large enough to be included; and shifting the rest down. My guess is that the number of greater values is far less the very large number of total values. The language is c.
If, as you seem to have said, you keep the top ten in order, then you don't have to scan them. Just compare the new value with the bottom one in the current Top 10 list.
0
Feb 6, 2013, 02:58 PM   #8
dmi
macrumors member

Join Date: Dec 2010
Quote:
 Originally Posted by Efrem Just compare the new value with the bottom one in the current Top 10 list.
And if it is greater than the bottom one, it would have to be inserted in order.
This is easy with a heap structure, but a top 10 list is small enough that shifting the everything down instead may not be too bad.
0
 Feb 6, 2013, 03:02 PM #9 ytk macrumors regular   Join Date: Jul 2010 The simplest way I can think of to do this is to maintain an array of 11 items, and initially populate it with MINVAL for your data type. As you read each item, place it into a temporary variable. Iterate backwards over the array, starting at position 9 (the tenth element). If the value at that position is less than the read value, move it to the next position up and continue iterating. If it is equal or greater, place the read value at the next position up, and repeat. If you reach the bottom of the list, place the read value in position 0. This approach is nice because it doesn't require any temporary variables to actually perform the sort other than the one holding the read value and an index variable, and it's probably pretty fast because it only needs to read and write elements that potentially need to be shifted. If you wanted to improve performance a tiny bit (and we're getting really obsessive here ), you could eliminate the last conditional (testing whether you've reached the bottom of the list). One way to do this would be with a goto statement (instead of a break) after you've placed the value in its place. The other would be to make your array size 12 items, and populating the the first item with MAXVAL, and all the rest with MINVAL. Then when you are done, you just take [1..10] as your top ten list. 0
 Feb 6, 2013, 03:31 PM #10 farmerdoug Thread Starter macrumors 6502a   Join Date: Sep 2008 You guys have done yeoman service here. I thought about just looking at the last value or something similar and for some (obviously incorrect) reason rejected it. However, I should do it. Regarding some of the other suggestions, I need to think about them. I didn't grasp the ideas at first reading but will check in to them. Thanks. 0
Feb 6, 2013, 03:34 PM   #11
chown33
macrumors 603

Join Date: Aug 2009
Quote:
 Originally Posted by dmi And if it is greater than the bottom one, it would have to be inserted in order.
His point is that it's not necessary to test the new candidate against every member of the top-ten list.

For a fast-rejection, it's only necessary to compare the new candidate with the smallest item already in the top-ten list. If the candidate value is less, then it's rejected because it will certainly be less than every other value already in the top-ten list. We can deduce this because the initial comparison is performed on the smallest value already in the list. Reject/accept is thus determined with one comparison for each new candidate value

This is in direct contrast to the OP's algorithm: "I scan the top ten every time I have a new value". I.e. ten comparisons for each new candidate value.

After reject/accept has been decided, the new candidate replaces the smallest value already in the list (it will certainly be the item removed, since it's smallest). The list is then resorted. I'd probably start with insertion-sort because the list is so small, and only worry about a faster sort if performance was measurably bad. But after eliminating 9X pointless comparisons, my first guess is the gain from that would be worth more than trying to find a faster sort than insertion-sort.

Last edited by chown33; Feb 8, 2013 at 12:20 PM. Reason: lowest -> smallest
0
Feb 8, 2013, 02:14 AM   #12
dmi
macrumors member

Join Date: Dec 2010
Quote:
 Originally Posted by chown33 This is in direct contrast to the OP's algorithm: "I scan the top ten every time I have a new value".
When you scan an ordered top ten list,
(and being ordered seemed to be implied by the statement about placing it in the right position and shifting everything down)
you can stop as soon as you know that everything else in the list is greater than the value you are scanning for.
If the values come in a random order, the expected number of comparisons needed to check n new values would be about n+10*log(n).
The worst case would be 10*n with a linear scan, or log(10)*n with a heap or binary search.

If you keep the list in a heap order rather than a linear order, then instead of 10*log(10) operations to re-sort the list when you replace an element, you 'd only need log(10) operations to restore the heap order.
But, again, 10 may be small enough that even 10^2 operations to re-sort may not be an excessive burden.
0
 Feb 8, 2013, 06:45 AM #13 farmerdoug Thread Starter macrumors 6502a   Join Date: Sep 2008 I am only comparing to the last item on list; then if I have a new item to insert I figure out where it goes; move everything down; and insert it. I also checked my numbers, I'm saving the top ten out of millions. Saving then sort would definitely be a bad idea. What would help now is if you could tell me how to move a structure from one location to another without doing it element by element. Thanks. 0
Feb 8, 2013, 08:57 AM   #14
gnasher729
macrumors G5

Join Date: Nov 2005
Quote:
 Originally Posted by farmerdoug I am only comparing to the last item on list; then if I have a new item to insert I figure out where it goes; move everything down; and insert it. I also checked my numbers, I'm saving the top ten out of millions. Saving then sort would definitely be a bad idea. What would help now is if you could tell me how to move a structure from one location to another without doing it element by element. Thanks.
For example:

Code:
`memmove (&array [3], &array [2], 7 * sizeof (array [2]));`
The first parameter is the destination, the second parameter is the source, and the third parameter is the number of bytes. This particular call will move seven array elements starting with element #2 to the location starting with element #3, and handles overlapping source and destination correctly. In other words, it moves #2 to #8 into the places #3 to #9.

On MacOS X, memmove, memcpy and memset use hand-optimised assembler code that is actually optimised for the particular machine it is running on. There is no way to do the same thing faster. (memcpy cannot be used here because source and destination are overlapping).
0
Feb 8, 2013, 11:02 AM   #15
chown33
macrumors 603

Join Date: Aug 2009
Quote:
 Originally Posted by farmerdoug I am only comparing to the last item on list; ...
If "last" means "the largest of the top ten", then that's inefficient. If "last" means "the smallest of the top ten", then that's much better. If the reason isn't clear, reread post #11.

A sorted array with the smallest item first and the largest last is called "ascending" order. An array with the largest first and the smallest last is called "descending" order.

Quote:
 What would help now is if you could tell me how to move a structure from one location to another without doing it element by element.
To move a structure (i.e. a single struct), simply assign it:
Code:
```struct example {
int foo;
char bar[10];
};

struct example arry[10];
arry[3] = arry[8];  // copies 8 to 3

struct example * a;
a = &arry[4];
*a = arry[9];  // copies 9 to 4```
To move one or more structs in a single move, use memmove or memcpy.

The distinction between the two things -- a single structure vs. one or more -- is that one is a single item, and the other is an array of items.

For the single struct case, it doesn't matter that a single struct contains an array (of char). The struct itself is a single item.

Last edited by chown33; Feb 8, 2013 at 12:18 PM. Reason: highest&lowest -> largest&smallest
0
 Feb 9, 2013, 06:49 AM #16 dmi macrumors member   Join Date: Dec 2010 In C++ you can maintain a heap with Code: `std::pop_heap` and Code: `std::push_heap` you can either reverse the Compare operator or negate the values to make it a min heap instead of a max heap In C, you could build your own like Code: ```#define heap_size 10 int heap[heap_size]; void down_heap(int p,int root){ int child; while( (child=root*2+1) < heap_size ){ if( child+1 < heap_size && heap[child+1] < heap[child] ){ ++child; } if( heap[child] >= p ){ return; } heap[root]=heap[child]; heap[child]=p; root=child; } }``` and add maintain it like Code: ```i=heap_size; while( scanf("%d",&p ) ){ if( i==0?heap[0]

 MacRumors Forums