Register FAQ / Rules Forum Spy Search Today's Posts Mark Forums Read
Go Back   MacRumors Forums > Apple Systems and Services > Programming > Mac Programming

Reply
 
Thread Tools Search this Thread Display Modes
Old Feb 5, 2013, 05:14 PM   #1
farmerdoug
macrumors 6502a
 
Join Date: Sep 2008
sorting question.

I need the top ten values in a set of results. I could either store all the results as they are calculated and then sort; or I could test each value as it is calculated. If is larger than the largest ten up to that point, I could discard the smallest.
Looking for suggestions on the best way.
Thanks.
farmerdoug is offline   0 Reply With Quote
Old Feb 5, 2013, 05:31 PM   #2
SnowLeopard2008
macrumors 603
 
SnowLeopard2008's Avatar
 
Join Date: Jul 2008
Location: Silicon Valley
Send a message via AIM to SnowLeopard2008
Quote:
Originally Posted by farmerdoug View Post
I need the top ten values in a set of results. I could either store all the results as they are calculated and then sort; or I could test each value as it is calculated. If is larger than the largest ten up to that point, I could discard the smallest.
Looking for suggestions on the best way.
Thanks.
Pick a sorting method. Selection, bubble, etc. I would process the top ten after calculation. Simplifies things a bit.
__________________
YouTube | @beautifulcode
Mac Pro | Thunderbolt Display | iPhone 5 | iPad mini | Apple TV | AirPort Extreme | iPad signed by Steve Wozniak
SnowLeopard2008 is offline   0 Reply With Quote
Old Feb 5, 2013, 05:34 PM   #3
Giuly
macrumors 68040
 
Giuly's Avatar
 
Join Date: Oct 2009
Location: That depends whether you ask for timezone, state of mind or GPS coordinates.
Depending on how large your dataset is and which language you're using, you might as well make use of the higher-level sort functions of arrays, dictionaries, lists, triples, what have you and then cut the first 10 elements.

Last edited by Giuly; Feb 5, 2013 at 06:32 PM.
Giuly is offline   0 Reply With Quote
Old Feb 5, 2013, 09:48 PM   #4
dmi
macrumors member
 
Join Date: Dec 2010
I'd suggest scanning through the results while maintaining a ten element heap structure with the top values seen so far.

You could also do a recursive K-median selection algorithm analogous to quicksort, but although the expected time is O(n), unfortunate choices could make it O(n^2)
A median of medians algorithm can guarantee O(n) performance, but the complication may not be worth it compared to maintaining a heap of just 10 elements.

Last edited by dmi; Feb 5, 2013 at 10:57 PM.
dmi is offline   0 Reply With Quote
Old Feb 6, 2013, 06:14 AM   #5
farmerdoug
Thread Starter
macrumors 6502a
 
Join Date: Sep 2008
At the moment I have been doing what dmi has suggest, or close. I scan the top ten every time I have a new value; placing in the right position if is large enough to be included; and shifting the rest down. My guess is that the number of greater values is far less the very large number of total values.

The language is c.
farmerdoug is offline   0 Reply With Quote
Old Feb 6, 2013, 07:42 AM   #6
dmi
macrumors member
 
Join Date: Dec 2010
With a min heap structure, you can make the scan of the top ten with O( log(10) ) comparisons and shifts, instead of O(10) comparisons and shifts, but 10 may be small enough that it the difference is not worth worrying about.
dmi is offline   0 Reply With Quote
Old Feb 6, 2013, 08:13 AM   #7
Efrem
macrumors member
 
Join Date: Jul 2009
Quote:
Originally Posted by farmerdoug View Post
At the moment I have been doing what dmi has suggest, or close. I scan the top ten every time I have a new value; placing in the right position if is large enough to be included; and shifting the rest down. My guess is that the number of greater values is far less the very large number of total values.

The language is c.
If, as you seem to have said, you keep the top ten in order, then you don't have to scan them. Just compare the new value with the bottom one in the current Top 10 list.
Efrem is offline   0 Reply With Quote
Old Feb 6, 2013, 01:58 PM   #8
dmi
macrumors member
 
Join Date: Dec 2010
Quote:
Originally Posted by Efrem View Post
Just compare the new value with the bottom one in the current Top 10 list.
And if it is greater than the bottom one, it would have to be inserted in order.
This is easy with a heap structure, but a top 10 list is small enough that shifting the everything down instead may not be too bad.
dmi is offline   0 Reply With Quote
Old Feb 6, 2013, 02:02 PM   #9
ytk
macrumors regular
 
Join Date: Jul 2010
The simplest way I can think of to do this is to maintain an array of 11 items, and initially populate it with MINVAL for your data type. As you read each item, place it into a temporary variable. Iterate backwards over the array, starting at position 9 (the tenth element). If the value at that position is less than the read value, move it to the next position up and continue iterating. If it is equal or greater, place the read value at the next position up, and repeat. If you reach the bottom of the list, place the read value in position 0.

This approach is nice because it doesn't require any temporary variables to actually perform the sort other than the one holding the read value and an index variable, and it's probably pretty fast because it only needs to read and write elements that potentially need to be shifted.

If you wanted to improve performance a tiny bit (and we're getting really obsessive here ), you could eliminate the last conditional (testing whether you've reached the bottom of the list). One way to do this would be with a goto statement (instead of a break) after you've placed the value in its place. The other would be to make your array size 12 items, and populating the the first item with MAXVAL, and all the rest with MINVAL. Then when you are done, you just take [1..10] as your top ten list.
ytk is offline   0 Reply With Quote
Old Feb 6, 2013, 02:31 PM   #10
farmerdoug
Thread Starter
macrumors 6502a
 
Join Date: Sep 2008
You guys have done yeoman service here. I thought about just looking at the last value or something similar and for some (obviously incorrect) reason rejected it. However, I should do it. Regarding some of the other suggestions, I need to think about them. I didn't grasp the ideas at first reading but will check in to them.

Thanks.
farmerdoug is offline   0 Reply With Quote
Old Feb 6, 2013, 02:34 PM   #11
chown33
macrumors 603
 
Join Date: Aug 2009
Quote:
Originally Posted by dmi View Post
And if it is greater than the bottom one, it would have to be inserted in order.
His point is that it's not necessary to test the new candidate against every member of the top-ten list.

For a fast-rejection, it's only necessary to compare the new candidate with the smallest item already in the top-ten list. If the candidate value is less, then it's rejected because it will certainly be less than every other value already in the top-ten list. We can deduce this because the initial comparison is performed on the smallest value already in the list. Reject/accept is thus determined with one comparison for each new candidate value

This is in direct contrast to the OP's algorithm: "I scan the top ten every time I have a new value". I.e. ten comparisons for each new candidate value.

After reject/accept has been decided, the new candidate replaces the smallest value already in the list (it will certainly be the item removed, since it's smallest). The list is then resorted. I'd probably start with insertion-sort because the list is so small, and only worry about a faster sort if performance was measurably bad. But after eliminating 9X pointless comparisons, my first guess is the gain from that would be worth more than trying to find a faster sort than insertion-sort.

Last edited by chown33; Feb 8, 2013 at 11:20 AM. Reason: lowest -> smallest
chown33 is offline   0 Reply With Quote
Old Feb 8, 2013, 01:14 AM   #12
dmi
macrumors member
 
Join Date: Dec 2010
Quote:
Originally Posted by chown33 View Post
This is in direct contrast to the OP's algorithm: "I scan the top ten every time I have a new value".
When you scan an ordered top ten list,
(and being ordered seemed to be implied by the statement about placing it in the right position and shifting everything down)
you can stop as soon as you know that everything else in the list is greater than the value you are scanning for.
If the values come in a random order, the expected number of comparisons needed to check n new values would be about n+10*log(n).
The worst case would be 10*n with a linear scan, or log(10)*n with a heap or binary search.

If you keep the list in a heap order rather than a linear order, then instead of 10*log(10) operations to re-sort the list when you replace an element, you 'd only need log(10) operations to restore the heap order.
But, again, 10 may be small enough that even 10^2 operations to re-sort may not be an excessive burden.
dmi is offline   0 Reply With Quote
Old Feb 8, 2013, 05:45 AM   #13
farmerdoug
Thread Starter
macrumors 6502a
 
Join Date: Sep 2008
I am only comparing to the last item on list; then if I have a new item to insert I figure out where it goes; move everything down; and insert it. I also checked my numbers, I'm saving the top ten out of millions. Saving then sort would definitely be a bad idea. What would help now is if you could tell me how to move a structure from one location to another without doing it element by element.

Thanks.
farmerdoug is offline   0 Reply With Quote
Old Feb 8, 2013, 07:57 AM   #14
gnasher729
macrumors G5
 
gnasher729's Avatar
 
Join Date: Nov 2005
Quote:
Originally Posted by farmerdoug View Post
I am only comparing to the last item on list; then if I have a new item to insert I figure out where it goes; move everything down; and insert it. I also checked my numbers, I'm saving the top ten out of millions. Saving then sort would definitely be a bad idea. What would help now is if you could tell me how to move a structure from one location to another without doing it element by element.

Thanks.
For example:

Code:
memmove (&array [3], &array [2], 7 * sizeof (array [2]));
The first parameter is the destination, the second parameter is the source, and the third parameter is the number of bytes. This particular call will move seven array elements starting with element #2 to the location starting with element #3, and handles overlapping source and destination correctly. In other words, it moves #2 to #8 into the places #3 to #9.

On MacOS X, memmove, memcpy and memset use hand-optimised assembler code that is actually optimised for the particular machine it is running on. There is no way to do the same thing faster. (memcpy cannot be used here because source and destination are overlapping).
gnasher729 is offline   0 Reply With Quote
Old Feb 8, 2013, 10:02 AM   #15
chown33
macrumors 603
 
Join Date: Aug 2009
Quote:
Originally Posted by farmerdoug View Post
I am only comparing to the last item on list; ...
If "last" means "the largest of the top ten", then that's inefficient. If "last" means "the smallest of the top ten", then that's much better. If the reason isn't clear, reread post #11.

A sorted array with the smallest item first and the largest last is called "ascending" order. An array with the largest first and the smallest last is called "descending" order.


Quote:
What would help now is if you could tell me how to move a structure from one location to another without doing it element by element.
To move a structure (i.e. a single struct), simply assign it:
Code:
struct example {
  int foo;
  char bar[10];
};

struct example arry[10];
arry[3] = arry[8];  // copies 8 to 3

struct example * a;
a = &arry[4];
*a = arry[9];  // copies 9 to 4
To move one or more structs in a single move, use memmove or memcpy.

The distinction between the two things -- a single structure vs. one or more -- is that one is a single item, and the other is an array of items.

For the single struct case, it doesn't matter that a single struct contains an array (of char). The struct itself is a single item.

Last edited by chown33; Feb 8, 2013 at 11:18 AM. Reason: highest&lowest -> largest&smallest
chown33 is offline   0 Reply With Quote
Old Feb 9, 2013, 05:49 AM   #16
dmi
macrumors member
 
Join Date: Dec 2010
In C++ you can maintain a heap with
Code:
std::pop_heap
and
Code:
std::push_heap
you can either reverse the Compare operator or negate the values to make it a min heap instead of a max heap

In C, you could build your own like
Code:
#define heap_size 10
int heap[heap_size];

void down_heap(int p,int root){
  int child;
  while( (child=root*2+1) < heap_size ){
    if( child+1 < heap_size && heap[child+1] < heap[child] ){
      ++child;
    }
    if( heap[child] >= p ){ return; }
    heap[root]=heap[child];
    heap[child]=p;
    root=child;
  }
}
and add maintain it like
Code:
i=heap_size;
while( scanf("%d",&p ) ){
     if( i==0?heap[0]<p:i-- ){ heap[i]=p; down_heap(p,i); }
}
dmi is offline   0 Reply With Quote

Reply
MacRumors Forums > Apple Systems and Services > Programming > Mac Programming

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Similar Threads
thread Thread Starter Forum Replies Last Post
Sorting Tech198 Apple TV and Home Theater 3 Dec 10, 2013 12:16 PM
iMovie Sorting downingp Mac Applications and Mac App Store 0 Nov 24, 2013 10:30 PM
Lack of sorting? JakobAloudMadge Mac Basics and Help 1 Aug 26, 2013 06:19 PM
Double question about storage/sorting? JakobAloudMadge MacBook Pro 12 Jul 25, 2013 01:22 PM
Resolved: Array Sorting Question larswik iPhone/iPad Programming 2 Sep 3, 2012 08:56 PM

Forum Jump

All times are GMT -5. The time now is 02:58 PM.

Mac Rumors | Mac | iPhone | iPhone Game Reviews | iPhone Apps

Mobile Version | Fixed | Fluid | Fluid HD
Copyright 2002-2013, MacRumors.com, LLC