SQL SELECT question

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
Hello, I'm using PostgreSQL, and have a table like this:

Code:
TABLE PartOrders
    partNumber : integer
    orderStatus : integer
    orderTime : timestamp
So, there can be many rows with the same partNumber, because over time there will be reorders for that product. And every row will not necessarily have a unique orderTime, because orders can be inputted in batches. Although, those same times will probably correspond to different parts.

I'm trying to come up with an SQL query that will give me a result set with one row for each partNumber, where that row has the latest orderTime for the part. So, the latest order for each part.

I tried GROUP BY partNumber, but it complains that I haven't used an aggregate on orderStatus. And all the examples I find are for returning a single column, but I need to return every column in the table. See, there are more columns than I mentionned, but I've simplified the example.

Right now I'm getting the distinct partNumbers in one query, and looping over that, doing separate queries per partNumber, and then just taking the row with the latest orderTime. This is a total hack!

Can someone please help me? Or at least point me to some complex examples that might help? I don't do much SQL so any help would be appreciated.
 

mwpeters8182

macrumors 6502
Apr 16, 2003
411
0
Boston, MA
MarkCollette said:
Hello, I'm using PostgreSQL, and have a table like this:

Code:
TABLE PartOrders
    partNumber : integer
    orderStatus : integer
    orderTime : timestamp
So, there can be many rows with the same partNumber, because over time there will be reorders for that product. And every row will not necessarily have a unique orderTime, because orders can be inputted in batches. Although, those same times will probably correspond to different parts.

I'm trying to come up with an SQL query that will give me a result set with one row for each partNumber, where that row has the latest orderTime for the part. So, the latest order for each part.

I tried GROUP BY partNumber, but it complains that I haven't used an aggregate on orderStatus. And all the examples I find are for returning a single column, but I need to return every column in the table. See, there are more columns than I mentionned, but I've simplified the example.

Right now I'm getting the distinct partNumbers in one query, and looping over that, doing separate queries per partNumber, and then just taking the row with the latest orderTime. This is a total hack!

Can someone please help me? Or at least point me to some complex examples that might help? I don't do much SQL so any help would be appreciated.
Try the MAX() function:

SELECT partNumber, orderStatus, MAX(orderTime) FROM PartOrders ORDER BY partNumber

That might work, but it's worth a shot.
 

whocares

macrumors 65816
Oct 9, 2002
1,496
0
:noitаɔo˩
mwpeters8182 said:
Try the MAX() function:

SELECT partNumber, orderStatus, MAX(orderTime) FROM PartOrders ORDER BY partNumber

That might work, but it's worth a shot.
That should list the whole table, with the newest order time in each row. I think you need to GROUP BY partNumber, so:

SELECT partNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber

This probably won't work, but it's also worth a try (if you haven't already) ;)
Also, I would be a good idea to add a column to the table calles something like "id" that auto-increments with each new row added. This will make every entry unique. :)


[Edit:] Just tried that query on a similarly structured MySQL table and it worked.
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
whocares said:
That should list the whole table, with the newest order time in each row. I think you need to GROUP BY partNumber, so:

SELECT partNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber

This probably won't work, but it's also worth a try (if you haven't already) ;)
Also, I would be a good idea to add a column to the table calles something like "id" that auto-increments with each new row added. This will make every entry unique. :)


[Edit:] Just tried that query on a similarly structured MySQL table and it worked.
But did you try getting orderStatus as well, particularly from a dataset where orderStatus has the same value a lot of the time? This is what makes PostgreSQL complain that I need to use an aggregate function (things like MAX) on orderStatus.

- Mark
 

whocares

macrumors 65816
Oct 9, 2002
1,496
0
:noitаɔo˩
MarkCollette said:
But did you try getting orderStatus as well, particularly from a dataset where orderStatus has the same value a lot of the time? This is what makes PostgreSQL complain that I need to use an aggregate function (things like MAX) on orderStatus.

- Mark
Sorry, I missed that point, try:

SELECT partNumber, MAX(orderTime), DISTINCT(orderStatus) FROM PartOrders GROUP BY partNumber

I don't use PostgreSQL, but I guess it works the same as MySQL...
 

firestarter

macrumors 603
Dec 31, 2002
5,496
108
Green and pleasant land
whocares said:
Sorry, I missed that point, try:

SELECT partNumber, MAX(orderTime), DISTINCT(orderStatus) FROM PartOrders GROUP BY partNumber

I don't use PostgreSQL, but I guess it works the same as MySQL...
Nope, that's not what you want. Distinct isn't an aggregate either.
 

firestarter

macrumors 603
Dec 31, 2002
5,496
108
Green and pleasant land
MarkCollette said:
But did you try getting orderStatus as well, particularly from a dataset where orderStatus has the same value a lot of the time? This is what makes PostgreSQL complain that I need to use an aggregate function (things like MAX) on orderStatus.

- Mark
You can't do that in a single select.

I've never used postgres - but I do Sybase for a living.

Firtly, you're on to a looser if you have no unique id on your table. You really need this. On Sybase you can call a column an identity, and have the database assign it unique successive values as you post to the table. Not sure what the same is in Postgres.

Let's assume then that you've modified your table, and now it's:
TABLE PartOrders
orderNumber: integer
partNumber : integer
orderStatus : integer
orderTime : timestamp

Whocares is right with this solution:
SELECT partNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber

And answers your original question - but now you want the order status too. The problem with this, is that the 'group by' and aggregate functions kind of decouple the orderTime row from the partNumber. It's possible to show the max orderStatus and max orderTime for a given partNumber, but in a single select it's not possible to show the exact orderStatus related to a given max orderTime.

With the modified table (given that orderTime is useless at selecting unique rows) we can get the orderNumber for each partNumber with maximum orderTime. (Actually max orderNumber for a given partNumber should correspond to partNumber with max orderTime, if orderNumber is assigned consecutively).

select max(orderNumber) from PartOrders group by partNumber

This should return a list of orderNumbers, the max for each partNumber. This can be then used as a subquery to drive the main query, and give you the partNumber, orderStatus and orderTime that you want:

select partNumber, orderStatus, orderTime
from PartOrders
where orderNumber in
(select max(orderNumber) from PartOrders group by partNumber)

I hope that:
1/ You follow
2/ Postgres can do this
3/ I didn't make any stupid goof :D

Not at work at the moment, so I'm afraid I haven't been able to check this.
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
firestarter said:
You can't do that in a single select.

I've never used postgres - but I do Sybase for a living.

<snip>
Thanks a lot. I'll try this tonight. In PostgreSQL there's an implied "oid" column that's globally unique, not just to that table. I'll try what you described with s/orderNumber/oid

Unfortunately the version of PostgreSQL I use does not do sequences easily, so I tend to use oid, even though it makes the code less portable.
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
Hmm, firestarter, I was thinking that maybe if the system is down, and old orders have to be manually inputted, that the sequencing of orderTime and orderNumber (or oid) might not be the same.

How about:

SELECT * FROM PartOrders WHERE orderNumber IN
( SELECT orderNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber );


This might sound dumb, but in the sub-select, would the orderNumber come from the same row as the maximum orderTime?

Thanks again.
 

firestarter

macrumors 603
Dec 31, 2002
5,496
108
Green and pleasant land
MarkCollette said:
Hmm, firestarter, I was thinking that maybe if the system is down, and old orders have to be manually inputted, that the sequencing of orderTime and orderNumber (or oid) might not be the same.
A valid concern.

Would it be more appropriate just to key on date (no time) and order entry number - with orders fulfilled in strict entry order?

Who provides the timestamp, the computer or the order entry clerk?

If orders get entered in a random order after the system was offline, and the computer provides timestamp - then really you're hosed both ways.

There's a fine line between paranoid engineering (good) and overengineering (bad).

How about:

SELECT * FROM PartOrders WHERE orderNumber IN
( SELECT orderNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber );


This might sound dumb, but in the sub-select, would the orderNumber come from the same row as the maximum orderTime?

Thanks again.
You'd need a group by on the orderNumber too (everything in the select of a statement using group by must be in the group, or an aggregate function - ignore this and you're in danger of getting big cartesian products).

Another problem here is that the nested select should just produce one column of values - 'IN' won't work with the two used here.

I think to do what you want to do in this more complex scenario it gets more complex still. Break it down, use stored procedures and temporary tables to store the interim result if necessary (and cleaner).
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
firestarter said:
A valid concern.

Would it be more appropriate just to key on date (no time) and order entry number - with orders fulfilled in strict entry order?

Who provides the timestamp, the computer or the order entry clerk?

If orders get entered in a random order after the system was offline, and the computer provides timestamp - then really you're hosed both ways.

There's a fine line between paranoid engineering (good) and overengineering (bad).
The system provides the time, so yeah, that's probably overengineering.

firestarter said:
Code:
SELECT * FROM PartOrders WHERE orderNumber IN
( SELECT orderNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber );
You'd need a group by on the orderNumber too (everything in the select of a statement using group by must be in the group, or an aggregate function - ignore this and you're in danger of getting big cartesian products).

Another problem here is that the nested select should just produce one column of values - 'IN' won't work with the two used here.

I think to do what you want to do in this more complex scenario it gets more complex still. Break it down, use stored procedures and temporary tables to store the interim result if necessary (and cleaner).
Thanks for setting me straight on this. I think now that I've accepted insertion order at my definition of latest, your SELECT suggestion a few posts back is quite a good fit. I lost sight that the timestamp was for system arrival time, and not some arbitrary timing.

Thanks. You've inspired me to go to the bookstore and try to get a better SQL book for my intro level.
 

ChrisA

macrumors G4
Jan 5, 2006
11,598
380
Redondo Beach, California
Join the table with itself

You will have to join the table with a subset of itself.

SELECT *
FROM partOrders,
(
SELECT partNumber as PN ,
max(orderTime) f
FROM PartOrders
GROUP BY (partNumber)
} as latest

WHERE partOrders.partNumber = latest.PN;

So you have two tables (1) partNumber and (2) a subset of partNumber that contains only the latest orders.

I don't know your application but I worry you may have a bug. Why do you want the lastest orders? Unless you are working inside a transaction the rows returned may not remain the "latest" as the table could be updated by another user while your querry is processing. If you are simply driving a screen display of the "hotest parts" then is OK but if you use this information to modify the database your in trouble




MarkCollette said:
Hello, I'm using PostgreSQL, and have a table like this:

Code:
TABLE PartOrders
    partNumber : integer
    orderStatus : integer
    orderTime : timestamp
So, there can be many rows with the same partNumber, because over time there will be reorders for that product. And every row will not necessarily have a unique orderTime, because orders can be inputted in batches. Although, those same times will probably correspond to different parts.

I'm trying to come up with an SQL query that will give me a result set with one row for each partNumber, where that row has the latest orderTime for the part. So, the latest order for each part.

I tried GROUP BY partNumber, but it complains that I haven't used an aggregate on orderStatus. And all the examples I find are for returning a single column, but I need to return every column in the table. See, there are more columns than I mentionned, but I've simplified the example.

Right now I'm getting the distinct partNumbers in one query, and looping over that, doing separate queries per partNumber, and then just taking the row with the latest orderTime. This is a total hack!

Can someone please help me? Or at least point me to some complex examples that might help? I don't do much SQL so any help would be appreciated.
 

zimv20

macrumors 601
Jul 18, 2002
4,388
7
toronto
i've never been the world's greatest DB designer or SQL programmer, but by the time we're doing either two calls to a table or a join to itself, shouldn't we be looking at the design of table and asking ourselves if perhaps the table wasn't properly designed to do what we need it to do?
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
zimv20 said:
i've never been the world's greatest DB designer or SQL programmer, but by the time we're doing either two calls to a table or a join to itself, shouldn't we be looking at the design of table and asking ourselves if perhaps the table wasn't properly designed to do what we need it to do?
I guess I could split the PartOrders table into HistoricalPartOrders and LatestPartOrders, and do INSERTs into HistoricalPartOrders and UPDATEs into LatestPartOrders. But that sounds like it's have a bunch of tradeoffs, and I'd have to do a query from both to get all the data out.

Anyways, I still have to try out everyone's suggestions (had to fix some other code first).
 

zimv20

macrumors 601
Jul 18, 2002
4,388
7
toronto
all right, granted i'm not involved in the project and have no idea the full requirements of the system, but from the little i've seen and applying my rusty and shoddy DB design skills, here's where i'd start:

i'd have tables to track Parts and Orders, then another table to track which parts go on which order, like this (using very generic data types):

Parts
------
part no: id
description: string
weight: double
manufacturer: id
num in stock: int
most recently ordered: date

Orders
--------
order no: id
customer: id
status: id
when placed: date
when fulfilled: date

OrderedParts
---------------
order no: id
part no: id

in practice, there would be one row for each unique part, one row for each unique order, and for each order containing n parts, n rows in OrderedParts.

further, when the entry to Order is made (and the entries to OrderedParts are made), each part for that order has its 'most recently ordered' field updated in Parts. a little kludgy, but at least that query is straightforward.

thoughts?
 

firestarter

macrumors 603
Dec 31, 2002
5,496
108
Green and pleasant land
firestarter said:
3/ I didn't make any stupid goof :D

Not at work at the moment, so I'm afraid I haven't been able to check this.
OK, I checked and my grouped subquery works great (in Sybase!)

Code:
1> select * from  ProductOrders
2> 
 orderId     orderStatus productCode 
 ----------- ----------- ----------- 
           1           0         100 
           2           0         100 
           3           1         100 
           4           1         100 
           5           1         200 
           6           0         200 

(6 rows affected)
1> select max(orderId) from ProductOrders group by productCode
2> 
             
 ----------- 
           4 
           6 

(2 rows affected)
1> select * from ProductOrders where orderId in (select max(orderId) from ProductOrders group by productCode)
2> 
 orderId     orderStatus productCode 
 ----------- ----------- ----------- 
           4           1         100 
           6           0         200 

(2 rows affected)
 

savar

macrumors 68000
Jun 6, 2003
1,954
0
District of Columbia
zimv20 said:
i've never been the world's greatest DB designer or SQL programmer, but by the time we're doing either two calls to a table or a join to itself, shouldn't we be looking at the design of table and asking ourselves if perhaps the table wasn't properly designed to do what we need it to do?
YES! This is exactly what I was about to say.

Sub-queries are horrible and should never, ever be used.*


*OF course, when you're just a low-level coder and the database design is already written in stone, then you have to resort to hacks to get things working.
 

firestarter

macrumors 603
Dec 31, 2002
5,496
108
Green and pleasant land
savar said:
YES! This is exactly what I was about to say.

Sub-queries are horrible and should never, ever be used.*

*OF course, when you're just a low-level coder and the database design is already written in stone, then you have to resort to hacks to get things working.
BS!

If you're finding the latest of anything on a table, you'll be using groups and aggregates, and you will possibly have to use a subquery (or a temp table or suchlike) to make use of the results.

Subqueries give you a load of powerful techniques (and a good speedup with 'not exists' statements). They're essential in non-toy applications (and in this instance, still needed on the improved schema cited).
 

devman

macrumors 65816
Apr 19, 2004
1,240
4
AU
firestarter said:
BS!

If you're finding the latest of anything on a table, you'll be using groups and aggregates, and you will possibly have to use a subquery (or a temp table or suchlike) to make use of the results.

Subqueries give you a load of powerful techniques (and a good speedup with 'not exists' statements). They're essential in non-toy applications
You are correct. And so are the others that said the schema should be fixed. That's the correct place to start.
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
ChrisA said:
You will have to join the table with a subset of itself.

SELECT *
FROM partOrders,
(
SELECT partNumber as PN ,
max(orderTime) f
FROM PartOrders
GROUP BY (partNumber)
} as latest

WHERE partOrders.partNumber = latest.PN;
Sorry, but that didn't give the right results.
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
zimv20 said:
Parts
------
part no: id
description: string
weight: double
manufacturer: id
num in stock: int
most recently ordered: date

Orders
--------
order no: id
customer: id
status: id
when placed: date
when fulfilled: date

OrderedParts
---------------
order no: id
part no: id
I was reading that kind of thing in my SQL book, about reducing redundancy in a given table, by splitting things into separate tables. So, where I had PartOrders, you have Orders + OrderedParts (I already had Parts). One drawback is that it looks like the SQL queries would be harder to come up with, since every query would involve some kind of joining.

Thank you for the suggestion, but I think that's outside of my capabilities right now.
 

zimv20

macrumors 601
Jul 18, 2002
4,388
7
toronto
MarkCollette said:
Thank you for the suggestion, but I think that's outside of my capabilities right now.
joins aren't that difficult. most of the ones i write a pretty much boilerplate, so i only had to figure out the first one.

here's an example from one of my sites, where given the id of an album, i display the songs from that album. note that the songs table lists all existing songs, and because a given song can exist on more than one album, there's no column 'album' in the songs table:

PHP:
    function displaySongsPerAlbum($album_id, $conn)
    {
        // grab the songs for the given album

        $row_query = 'SELECT a.song_name, a.version, a.song_id'
            . ' FROM songs a, albums b, bands c, band_album_assign d,'
            . ' album_song_assign e'
            . ' WHERE b.album_id = '. $album_id . ' AND d.band_id = c.band_id AND'
            . ' d.album_id = b.album_id AND e.album_id = d.album_id AND'
            . ' e.song_id = a.song_id'
            . ' ORDER BY e.play_order';

        if (!($row_result = @ mysql_query($row_query, $conn)))
            showerror();

        // display each song

        while ($row = mysql_fetch_array($row_result))
        {
            displaySong($row, $conn);
        }
    }
 

MarkCollette

macrumors 68000
Original poster
Mar 6, 2003
1,555
32
Toronto, Canada
Abbreviated quote:

zimv20 said:
joins aren't that difficult. most of the ones i write a pretty much boilerplate, so i only had to figure out the first one.

PHP:
        $row_query = 'SELECT a.song_name, a.version, a.song_id'
            . ' FROM songs a, albums b, bands c, band_album_assign d,'
            . ' album_song_assign e'
            . ' WHERE b.album_id = '. $album_id . ' AND d.band_id = c.band_id AND'
            . ' d.album_id = b.album_id AND e.album_id = d.album_id AND'
            . ' e.song_id = a.song_id'
            . ' ORDER BY e.play_order';
It looks like a consequence of your pulling the pairing out into a separate table, is that instead a 1-n correspondence, yours supports m-n. Like if multiple bands worked together on a single album, or a song in in multiple albums, like greatest hits collections, or best of the 80s collections.

In my case, it would allow me to make meta-orders that could include sub-orders for multiple parts.

Thanks, I'm going to save your SELECT example, in case that dynamism is needed.
 

zimv20

macrumors 601
Jul 18, 2002
4,388
7
toronto
MarkCollette said:
It looks like a consequence of your pulling the pairing out into a separate table, is that instead a 1-n correspondence, yours supports m-n. Like if multiple bands worked together on a single album, or a song in in multiple albums, like greatest hits collections, or best of the 80s collections.
yes, that's it exactly. as i said, i'm not really a DB designer, but that's the flexibility i put in the design. i've written some php code for it (bands member can submit songs, create albums, and put songs on that album, plus all the music pages are working), but i've come nowhere near writing all the code to take advantage of that flexibility.

maybe some day...