SQL question

Discussion in 'Mac Programming' started by iShater, May 13, 2009.

  1. iShater macrumors 604


    Joined:
    Aug 13, 2002
    Location:
    Chicagoland
    #1
    Hopefully somebody has some thoughts or can point me to a location with information.

    I have a fairly complex query that ends up with results that are considered duplicates when four columns are the same. Each duplicate has a unique PK ID, and I need to get the greatest one.

    I need that result set to be compared against itself so that only one of the duplicate rows, the one with the max PK_ID, is kept.

    Any thoughts?
     
  2. lee1210 macrumors 68040


    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #2
    select max(id),uniq1,uniq2,uniq3,uniq4 from (...) group by uniq1,uniq2,uniq3,uniq4;
    where ... is your existing query. This should group the rows by the four columns that must be unique, then you can perform an aggregate function over each group.

    -Lee
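Lee's GROUP BY approach can be tried end to end with Python's built-in sqlite3; the table, column names, and data below are invented for illustration:

```python
import sqlite3

# In-memory table with invented data; rows 1 and 2 are "duplicates"
# because their four uniq columns match, and the greater id should win.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, uniq1, uniq2, uniq3, uniq4)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, "a", "b", "c", "d"),
    (2, "a", "b", "c", "d"),  # duplicate of row 1
    (3, "x", "y", "z", "w"),
])

# The subquery stands in for the existing complex query.
rows = conn.execute(
    "SELECT max(id), uniq1, uniq2, uniq3, uniq4"
    " FROM (SELECT * FROM t)"
    " GROUP BY uniq1, uniq2, uniq3, uniq4"
    " ORDER BY 1"
).fetchall()
print(rows)  # [(2, 'a', 'b', 'c', 'd'), (3, 'x', 'y', 'z', 'w')]
```

Note that GROUP BY only returns the grouped columns plus the aggregate; pulling any other column of the surviving row takes an extra join back on the max id.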
     
  3. iShater thread starter macrumors 604


    Joined:
    Aug 13, 2002
    Location:
    Chicagoland
    #3
    Hi Lee,

    Thanks for your reply. You just confirmed the route I ended up using. We had a crazy WHERE clause instead, and the performance was pretty bad. I'm hoping this will work better; we are testing right now.

    P.S. I knew you would be responding. :p
     
  4. lee1210 macrumors 68040


    Joined:
    Jan 10, 2005
    Location:
    Dallas, TX
    #4
    If you end up with some sort of self-join in a situation like this, you're probably not going to end up with an ideal solution even if it works. I have had to write things with self-joins, but if the table has more than a few hundred or thousand rows, the Cartesian product of the self-join is going to be huge, and doing any sort of aggregate against it will be very expensive. If you've already filtered the data down to what you need, there's no shame in doing a subquery instead of hunting for the end-all, be-all join and WHERE setup. At least not in my book.

    -Lee
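For contrast, here is a minimal sketch of the self-join formulation of the same dedup (using Python's sqlite3; table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, uniq1, uniq2)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "a", "b"), (2, "a", "b"), (3, "x", "y")])

# Keep a row only if no duplicate with a higher id exists. Every row is
# matched against every other row in its group, which is what makes this
# formulation expensive on large tables.
rows = conn.execute("""
    SELECT t.id, t.uniq1, t.uniq2
    FROM t
    LEFT JOIN t AS newer
           ON newer.uniq1 = t.uniq1
          AND newer.uniq2 = t.uniq2
          AND newer.id > t.id
    WHERE newer.id IS NULL
    ORDER BY t.id
""").fetchall()
print(rows)  # [(2, 'a', 'b'), (3, 'x', 'y')]
```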
     
  5. AlmostThere macrumors 6502a

    #5
    Depending on your database and hardware (i.e. something that supports, IIRC, SQL:1999 window functions, and multi-processor hardware), you may find you get better results using the RANK function partitioned over your uniq fields, selecting only the rows where the rank = 1.

    e.g.
    Code:
    select * from
    (
      select pk, uniq1, uniq2, uniq3, uniq4, blah, bleh, meh, 
      rank() over (partition by uniq1, uniq2, uniq3, uniq4 order by pk desc)  as rk
      from ...
    ) as subq
    where rk = 1
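The same window-function query can be exercised against SQLite 3.25 or newer via Python's sqlite3 (table and column names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, uniq1, uniq2, blah)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                 [(1, "a", "b", "old"), (2, "a", "b", "new"), (3, "x", "y", "only")])

# Rank the rows within each (uniq1, uniq2) partition, highest pk first,
# then keep only the top-ranked row of each partition.
rows = conn.execute("""
    SELECT pk, uniq1, uniq2, blah FROM (
        SELECT pk, uniq1, uniq2, blah,
               rank() OVER (PARTITION BY uniq1, uniq2 ORDER BY pk DESC) AS rk
        FROM t
    ) AS subq
    WHERE rk = 1
    ORDER BY pk
""").fetchall()
print(rows)  # [(2, 'a', 'b', 'new'), (3, 'x', 'y', 'only')]
```

Unlike the GROUP BY version, this keeps every column of the surviving row, not just the grouped columns and the aggregate.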
    
    Without knowing the table structure, the other columns you want and their types, the DBMS, and the hardware, it is a little hard to be more precise.

    [edit]
    Here is a link to an SQL Server-centric discussion and illustration:
    http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx
     
  6. iShater thread starter macrumors 604


    Joined:
    Aug 13, 2002
    Location:
    Chicagoland
    #6
    This is an Oracle 10g database, no information available to me about the HW. There are enough joins early on in the query that my head is spinning. :eek:
     
  7. AlmostThere macrumors 6502a

    #7
    It sounds like something to try then.

    I don't have Oracle 10, but this article suggests it supports the necessary functionality: http://www.oracle.com/technology/oramag/code/tips2005/062005.html

    Because the database knows what you are really asking for, it can optimise the execution path and, where possible, distribute the aggregation for each partition. Indexes and up-to-date table statistics can also lead to order-of-magnitude improvements, as can stored procedures.
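The effect of an index on this kind of grouping can be seen even in SQLite's query planner (a sketch with an invented table; the covering index is an assumption, not something from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, uniq1, uniq2)")

query = "SELECT max(pk) FROM t GROUP BY uniq1, uniq2"

# Without an index, the planner scans the table and sorts into a temp b-tree.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# A covering index on the grouped columns (plus pk) lets the planner
# read pre-sorted groups straight from the index instead.
conn.execute("CREATE INDEX t_uniq ON t (uniq1, uniq2, pk)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)
print(plan_after)  # the plan detail should now mention index t_uniq
```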

    For optimising your SQL, assuming you are on a Mac client, you could trial http://www.aquafold.com/ , which has a straightforward query analyser (it's a good general tool for multiple DBMSs), or use the Oracle optimisation tools. I don't know what you are using...
     
