SQL question

Discussion in 'Mac Programming' started by iShater, May 13, 2009.

  1. iShater macrumors 604


    Joined:
    Aug 13, 2002
    Location:
    Chicagoland
    #1
    Hopefully somebody has some thoughts or can point me to a location with information.

    I have a fairly complex query that ends up with results that are considered duplicates when four columns are the same. Each duplicate has a unique PK ID, and I need to get the greatest one.

    I need that result set to be compared against itself so that only one of the duplicate rows, the one with the max PK_ID, is kept.

    Any thoughts?
     
  2. lee1210 macrumors 68040


    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #2
    select max(id),uniq1,uniq2,uniq3,uniq4 from (...) group by uniq1,uniq2,uniq3,uniq4;
    where ... is your existing query. This should group the rows by the four columns that must be unique, then you can perform an aggregate function over each group.

    -Lee
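Lee's GROUP BY approach can be tried end to end with Python's built-in sqlite3; the table, column names, and data below are invented for illustration:

```python
import sqlite3

# In-memory table with invented data; rows 1 and 2 are "duplicates"
# because their four uniq columns match, and the greater id should win.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, uniq1, uniq2, uniq3, uniq4)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, "a", "b", "c", "d"),
    (2, "a", "b", "c", "d"),  # duplicate of row 1
    (3, "x", "y", "z", "w"),
])

# The subquery stands in for the existing complex query.
rows = conn.execute(
    "SELECT max(id), uniq1, uniq2, uniq3, uniq4"
    " FROM (SELECT * FROM t)"
    " GROUP BY uniq1, uniq2, uniq3, uniq4"
    " ORDER BY 1"
).fetchall()
print(rows)  # [(2, 'a', 'b', 'c', 'd'), (3, 'x', 'y', 'z', 'w')]
```

Note that GROUP BY only returns the grouped columns plus the aggregate; pulling any other column of the surviving row takes an extra join back on the max id.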
     
  3. iShater thread starter macrumors 604


    Joined:
    Aug 13, 2002
    Location:
    Chicagoland
    #3
    Hi Lee,

    Thanks for your reply. You just confirmed the route I ended up using. We had a crazy WHERE clause instead, and the performance was pretty bad. I'm hoping this will work better; we are testing right now.

    P.S. I knew you would be responding. :p
     
  4. lee1210 macrumors 68040


    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #4
    If you end up with some sort of self-join in a situation like this, you're probably not going to end up with an ideal solution even if it works. I have had to write things with self-joins, but if the table has more than a few hundred or thousand rows, the Cartesian product of the self-join is going to be huge, and doing any sort of aggregate against it will be very expensive. If you've already filtered the data down to what you need, there's no shame in doing a subquery instead of hunting for the end-all, be-all join and WHERE setup. At least not in my book.

    -Lee
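For contrast, here is a minimal sketch of the self-join formulation of the same dedup (using Python's sqlite3; table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, uniq1, uniq2)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "a", "b"), (2, "a", "b"), (3, "x", "y")])

# Keep a row only if no duplicate with a higher id exists. Every row is
# matched against every other row in its group, which is what makes this
# formulation expensive on large tables.
rows = conn.execute("""
    SELECT t.id, t.uniq1, t.uniq2
    FROM t
    LEFT JOIN t AS newer
           ON newer.uniq1 = t.uniq1
          AND newer.uniq2 = t.uniq2
          AND newer.id > t.id
    WHERE newer.id IS NULL
    ORDER BY t.id
""").fetchall()
print(rows)  # [(2, 'a', 'b'), (3, 'x', 'y')]
```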
     
  5. AlmostThere macrumors 6502a

    #5
    Depending on your database and hardware (i.e. something that supports, IIRC, SQL:1999 window functions, and multi-processor hardware), you may find you get better results using the RANK function partitioned over your uniq fields, selecting only the rows where the rank = 1.

    e.g.
    Code:
    select * from
    (
      select pk, uniq1, uniq2, uniq3, uniq4, blah, bleh, meh, 
      rank() over (partition by uniq1, uniq2, uniq3, uniq4 order by pk desc)  as rk
      from ...
    ) as subq
    where rk = 1
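The same window-function query can be exercised against SQLite 3.25 or newer via Python's sqlite3 (table and column names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, uniq1, uniq2, blah)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                 [(1, "a", "b", "old"), (2, "a", "b", "new"), (3, "x", "y", "only")])

# Rank the rows within each (uniq1, uniq2) partition, highest pk first,
# then keep only the top-ranked row of each partition.
rows = conn.execute("""
    SELECT pk, uniq1, uniq2, blah FROM (
        SELECT pk, uniq1, uniq2, blah,
               rank() OVER (PARTITION BY uniq1, uniq2 ORDER BY pk DESC) AS rk
        FROM t
    ) AS subq
    WHERE rk = 1
    ORDER BY pk
""").fetchall()
print(rows)  # [(2, 'a', 'b', 'new'), (3, 'x', 'y', 'only')]
```

Unlike the GROUP BY version, this keeps every column of the surviving row, not just the grouped columns and the aggregate.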
    
    Without knowing the table structure, the other columns you want and their types, the DBMS, and the hardware, it is a little hard to be more precise.

    [edit]
    Here is a link to an SQL Server-centric discussion and illustration:
    http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx
     
  6. iShater thread starter macrumors 604


    Joined:
    Aug 13, 2002
    Location:
    Chicagoland
    #6
    This is an Oracle 10g database, no information available to me about the HW. There are enough joins early on in the query that my head is spinning. :eek:
     
  7. AlmostThere macrumors 6502a

    #7
    It sounds like something to try then.

    I don't have Oracle 10, but this article suggests it supports the necessary functionality: http://www.oracle.com/technology/oramag/code/tips2005/062005.html

    Because the database knows what you are really asking for, it can optimise the execution path and, where possible, distribute the aggregation for each partition. Indexes and up-to-date table statistics can also lead to order-of-magnitude improvements, as can stored procedures.
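The effect of an index on this kind of grouping can be seen even in SQLite's query planner (a sketch with an invented table; the covering index is an assumption, not something from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, uniq1, uniq2)")

query = "SELECT max(pk) FROM t GROUP BY uniq1, uniq2"

# Without an index, the planner scans the table and sorts into a temp b-tree.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# A covering index on the grouped columns (plus pk) lets the planner
# read pre-sorted groups straight from the index instead.
conn.execute("CREATE INDEX t_uniq ON t (uniq1, uniq2, pk)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)
print(plan_after)  # the plan detail should now mention index t_uniq
```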

    For optimising your SQL, assuming you are on a Mac client, you could trial http://www.aquafold.com/ , which has a straightforward query analyser (it's a good general tool for multiple DBMSs), or use the Oracle optimisation tools. I don't know what you are using...
     
