SQL SELECT question

Discussion in 'Mac Programming' started by MarkCollette, Jan 30, 2006.

  1. macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #1
    Hello, I'm using PostgreSQL, and have a table like this:

    Code:
    TABLE PartOrders
        partNumber : integer
        orderStatus : integer
        orderTime : timestamp
    
    So, there can be many rows with the same partNumber, because over time there will be reorders for that product. And every row will not necessarily have a unique orderTime, because orders can be inputted in batches. Although, those same times will probably correspond to different parts.

    I'm trying to come up with an SQL query that will give me a result set with one row for each partNumber, where that row has the latest orderTime for the part. So, the latest order for each part.

    I tried GROUP BY partNumber, but it complains that I haven't used an aggregate on orderStatus. And all the examples I find are for returning a single column, but I need to return every column in the table. See, there are more columns than I mentionned, but I've simplified the example.

    Right now I'm getting the distinct partNumbers in one query, and looping over that, doing separate queries per partNumber, and then just taking the row with the latest orderTime. This is a total hack!

    Can someone please help me? Or at least point me to some complex examples that might help? I don't do much SQL so any help would be appreciated.
     
  2. macrumors 6502

    Joined:
    Apr 16, 2003
    Location:
    Boston, MA
    #2
    Try the MAX() function:

    SELECT partNumber, orderStatus, MAX(orderTime) FROM PartOrders ORDER BY partNumber

    That might work, but it's worth a shot.
     
  3. macrumors 65816

    whocares

    Joined:
    Oct 9, 2002
    Location:
    :noitаɔo˩
    #3
    That should list the whole table, with the newest order time in each row. I think you need to GROUP BY partNumber, so:

    SELECT partNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber

    This probably won't work, but it's also worth a try (if you haven't already) ;)
    Also, I would be a good idea to add a column to the table calles something like "id" that auto-increments with each new row added. This will make every entry unique. :)


    [Edit:] Just tried that query on a similarly structured MySQL table and it worked.
     
  4. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #4
    But did you try getting orderStatus as well, particularly from a dataset where orderStatus has the same value a lot of the time? This is what makes PostgreSQL complain that I need to use an aggregate function (things like MAX) on orderStatus.

    - Mark
     
  5. macrumors 65816

    whocares

    Joined:
    Oct 9, 2002
    Location:
    :noitаɔo˩
    #5
    Sorry, I missed that point, try:

    SELECT partNumber, MAX(orderTime), DISTINCT(orderStatus) FROM PartOrders GROUP BY partNumber

    I don't use PostgreSQL, but I guess it works the same as MySQL...
     
  6. macrumors 603

    firestarter

    Joined:
    Dec 31, 2002
    Location:
    Green and pleasant land
    #6
    Nope, that's not what you want. Distinct isn't an aggregate either.
     
  7. macrumors 603

    firestarter

    Joined:
    Dec 31, 2002
    Location:
    Green and pleasant land
    #7
    You can't do that in a single select.

    I've never used postgres - but I do Sybase for a living.

    Firtly, you're on to a looser if you have no unique id on your table. You really need this. On Sybase you can call a column an identity, and have the database assign it unique successive values as you post to the table. Not sure what the same is in Postgres.

    Let's assume then that you've modified your table, and now it's:
    TABLE PartOrders
    orderNumber: integer
    partNumber : integer
    orderStatus : integer
    orderTime : timestamp

    Whocares is right with this solution:
    SELECT partNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber

    And answers your original question - but now you want the order status too. The problem with this, is that the 'group by' and aggregate functions kind of decouple the orderTime row from the partNumber. It's possible to show the max orderStatus and max orderTime for a given partNumber, but in a single select it's not possible to show the exact orderStatus related to a given max orderTime.

    With the modified table (given that orderTime is useless at selecting unique rows) we can get the orderNumber for each partNumber with maximum orderTime. (Actually max orderNumber for a given partNumber should correspond to partNumber with max orderTime, if orderNumber is assigned consecutively).

    select max(orderNumber) from PartOrders group by partNumber

    This should return a list of orderNumbers, the max for each partNumber. This can be then used as a subquery to drive the main query, and give you the partNumber, orderStatus and orderTime that you want:

    select partNumber, orderStatus, orderTime
    from PartOrders
    where orderNumber in
    (select max(orderNumber) from PartOrders group by partNumber)

    I hope that:
    1/ You follow
    2/ Postgres can do this
    3/ I didn't make any stupid goof :D

    Not at work at the moment, so I'm afraid I haven't been able to check this.
     
  8. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #8
    Thanks a lot. I'll try this tonight. In PostgreSQL there's an implied "oid" column that's globally unique, not just to that table. I'll try what you described with s/orderNumber/oid

    Unfortunately the version of PostgreSQL I use does not do sequences easily, so I tend to use oid, even though it makes the code less portable.
     
  9. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #9
    Hmm, firestarter, I was thinking that maybe if the system is down, and old orders have to be manually inputted, that the sequencing of orderTime and orderNumber (or oid) might not be the same.

    How about:

    SELECT * FROM PartOrders WHERE orderNumber IN
    ( SELECT orderNumber, MAX(orderTime) FROM PartOrders GROUP BY partNumber );


    This might sound dumb, but in the sub-select, would the orderNumber come from the same row as the maximum orderTime?

    Thanks again.
     
  10. macrumors 603

    firestarter

    Joined:
    Dec 31, 2002
    Location:
    Green and pleasant land
    #10
    A valid concern.

    Would it be more appropriate just to key on date (no time) and order entry number - with orders fulfilled in strict entry order?

    Who provides the timestamp, the computer or the order entry clerk?

    If orders get entered in a random order after the system was offline, and the computer provides timestamp - then really you're hosed both ways.

    There's a fine line between paranoid engineering (good) and overengineering (bad).

    You'd need a group by on the orderNumber too (everything in the select of a statement using group by must be in the group, or an aggregate function - ignore this and you're in danger of getting big cartesian products).

    Another problem here is that the nested select should just produce one column of values - 'IN' won't work with the two used here.

    I think to do what you want to do in this more complex scenario it gets more complex still. Break it down, use stored procedures and temporary tables to store the interim result if necessary (and cleaner).
     
  11. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #11
    The system provides the time, so yeah, that's probably overengineering.

    Thanks for setting me straight on this. I think now that I've accepted insertion order at my definition of latest, your SELECT suggestion a few posts back is quite a good fit. I lost sight that the timestamp was for system arrival time, and not some arbitrary timing.

    Thanks. You've inspired me to go to the bookstore and try to get a better SQL book for my intro level.
     
  12. macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #12
    Join the table with itself

    You will have to join the table with a subset of itself.

    SELECT *
    FROM partOrders,
    (
    SELECT partNumber as PN ,
    max(orderTime) f
    FROM PartOrders
    GROUP BY (partNumber)
    } as latest

    WHERE partOrders.partNumber = latest.PN;

    So you have two tables (1) partNumber and (2) a subset of partNumber that contains only the latest orders.

    I don't know your application but I worry you may have a bug. Why do you want the lastest orders? Unless you are working inside a transaction the rows returned may not remain the "latest" as the table could be updated by another user while your querry is processing. If you are simply driving a screen display of the "hotest parts" then is OK but if you use this information to modify the database your in trouble




     
  13. macrumors 601

    zimv20

    Joined:
    Jul 18, 2002
    Location:
    chicago
    #13
    i've never been the world's greatest DB designer or SQL programmer, but by the time we're doing either two calls to a table or a join to itself, shouldn't we be looking at the design of table and asking ourselves if perhaps the table wasn't properly designed to do what we need it to do?
     
  14. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #14
    I guess I could split the PartOrders table into HistoricalPartOrders and LatestPartOrders, and do INSERTs into HistoricalPartOrders and UPDATEs into LatestPartOrders. But that sounds like it's have a bunch of tradeoffs, and I'd have to do a query from both to get all the data out.

    Anyways, I still have to try out everyone's suggestions (had to fix some other code first).
     
  15. macrumors 601

    zimv20

    Joined:
    Jul 18, 2002
    Location:
    chicago
    #15
    all right, granted i'm not involved in the project and have no idea the full requirements of the system, but from the little i've seen and applying my rusty and shoddy DB design skills, here's where i'd start:

    i'd have tables to track Parts and Orders, then another table to track which parts go on which order, like this (using very generic data types):

    Parts
    ------
    part no: id
    description: string
    weight: double
    manufacturer: id
    num in stock: int
    most recently ordered: date

    Orders
    --------
    order no: id
    customer: id
    status: id
    when placed: date
    when fulfilled: date

    OrderedParts
    ---------------
    order no: id
    part no: id

    in practice, there would be one row for each unique part, one row for each unique order, and for each order containing n parts, n rows in OrderedParts.

    further, when the entry to Order is made (and the entries to OrderedParts are made), each part for that order has its 'most recently ordered' field updated in Parts. a little kludgy, but at least that query is straightforward.

    thoughts?
     
  16. macrumors 603

    firestarter

    Joined:
    Dec 31, 2002
    Location:
    Green and pleasant land
    #16
    OK, I checked and my grouped subquery works great (in Sybase!)

    Code:
    1> select * from  ProductOrders
    2> 
     orderId     orderStatus productCode 
     ----------- ----------- ----------- 
               1           0         100 
               2           0         100 
               3           1         100 
               4           1         100 
               5           1         200 
               6           0         200 
    
    (6 rows affected)
    1> select max(orderId) from ProductOrders group by productCode
    2> 
                 
     ----------- 
               4 
               6 
    
    (2 rows affected)
    1> select * from ProductOrders where orderId in (select max(orderId) from ProductOrders group by productCode)
    2> 
     orderId     orderStatus productCode 
     ----------- ----------- ----------- 
               4           1         100 
               6           0         200 
    
    (2 rows affected)
    
     
  17. macrumors 68000

    savar

    Joined:
    Jun 6, 2003
    Location:
    District of Columbia
    #17
    YES! This is exactly what I was about to say.

    Sub-queries are horrible and should never, ever be used.*


    *OF course, when you're just a low-level coder and the database design is already written in stone, then you have to resort to hacks to get things working.
     
  18. macrumors 603

    firestarter

    Joined:
    Dec 31, 2002
    Location:
    Green and pleasant land
    #18
    BS!

    If you're finding the latest of anything on a table, you'll be using groups and aggregates, and you will possibly have to use a subquery (or a temp table or suchlike) to make use of the results.

    Subqueries give you a load of powerful techniques (and a good speedup with 'not exists' statements). They're essential in non-toy applications (and in this instance, still needed on the improved schema cited).
     
  19. macrumors 65816

    devman

    Joined:
    Apr 19, 2004
    Location:
    AU
    #19
    You are correct. And so are the others that said the schema should be fixed. That's the correct place to start.
     
  20. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #20
    Sorry, but that didn't give the right results.
     
  21. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #21
    I was reading that kind of thing in my SQL book, about reducing redundancy in a given table, by splitting things into separate tables. So, where I had PartOrders, you have Orders + OrderedParts (I already had Parts). One drawback is that it looks like the SQL queries would be harder to come up with, since every query would involve some kind of joining.

    Thank you for the suggestion, but I think that's outside of my capabilities right now.
     
  22. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #22
    And it works great in PostgreSQL too :)
     
  23. macrumors 601

    zimv20

    Joined:
    Jul 18, 2002
    Location:
    chicago
    #23
    joins aren't that difficult. most of the ones i write a pretty much boilerplate, so i only had to figure out the first one.

    here's an example from one of my sites, where given the id of an album, i display the songs from that album. note that the songs table lists all existing songs, and because a given song can exist on more than one album, there's no column 'album' in the songs table:

    PHP:
        function displaySongsPerAlbum($album_id$conn)
        {
            
    // grab the songs for the given album

            
    $row_query 'SELECT a.song_name, a.version, a.song_id'
                
    ' FROM songs a, albums b, bands c, band_album_assign d,'
                
    ' album_song_assign e'
                
    ' WHERE b.album_id = '$album_id ' AND d.band_id = c.band_id AND'
                
    ' d.album_id = b.album_id AND e.album_id = d.album_id AND'
                
    ' e.song_id = a.song_id'
                
    ' ORDER BY e.play_order';

            if (!(
    $row_result = @ mysql_query($row_query$conn)))
                
    showerror();

            
    // display each song

            
    while ($row mysql_fetch_array($row_result))
            {
                
    displaySong($row$conn);
            }
        }
     
  24. thread starter macrumors 65816

    MarkCollette

    Joined:
    Mar 6, 2003
    Location:
    Calgary, Canada
    #24
    Abbreviated quote:

    It looks like a consequence of your pulling the pairing out into a separate table, is that instead a 1-n correspondence, yours supports m-n. Like if multiple bands worked together on a single album, or a song in in multiple albums, like greatest hits collections, or best of the 80s collections.

    In my case, it would allow me to make meta-orders that could include sub-orders for multiple parts.

    Thanks, I'm going to save your SELECT example, in case that dynamism is needed.
     
  25. macrumors 601

    zimv20

    Joined:
    Jul 18, 2002
    Location:
    chicago
    #25
    yes, that's it exactly. as i said, i'm not really a DB designer, but that's the flexibility i put in the design. i've written some php code for it (bands member can submit songs, create albums, and put songs on that album, plus all the music pages are working), but i've come nowhere near writing all the code to take advantage of that flexibility.

    maybe some day...
     

Share This Page