MySQL calculated value / group by advice

Discussion in 'Web Design and Development' started by big_malk, Jul 26, 2010.

  1. big_malk macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #1
    Hi,

    I'm trying to write a query that selects print products for a custom based on various dropdown selects. Most of the fields are optional, and I'd like to show them close matches if there is no exact match for what they've selected.
    I'm also trying to group the results by printer, so they get the single best match from each company to start with.

    Here's a sample of the SQL:
    Code:
    SELECT (CASE paper WHEN 'gloss' THEN 1 ELSE 0 END) as relevance, `print_products`.*, `printers`.`display_name` AS printer_display_name
    FROM (`print_products`)
    LEFT OUTER JOIN `printers` printers ON `printers`.`id` = `print_products`.`printer_id`
    WHERE `print_products`.`product` = 'brochures'
    AND `print_products`.`active` = 1
    GROUP BY print_products.printer_id
    HAVING relevance = (SELECT MAX((CASE paper WHEN 'gloss' THEN 1 ELSE 0 END)) FROM print_products pp WHERE pp.printer_id = print_products.printer_id and active=1 GROUP BY pp.printer_id)
    ORDER BY `relevance` desc
    LIMIT 10
    This works perfectly without the first GROUP BY (the one in the outer query), it selects the correct rows with the expected calculated relevance value and puts them in order. This includes two rows from the same company, each with relevance of 1. When I add the GROUP BY back in, BOTH of these rows disappear, so no result at all come from that company... obviously not ideal. I can't work out why GROUP BY could make an entire 'group' disappear from the results?

    Any suggestion would be most helpful :)
     
  2. JFreak macrumors 68040

    JFreak

    Joined:
    Jul 11, 2003
    Location:
    Tampere, Finland
    #2
    you're trying too much. make it simpler first.
     
  3. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #3
    All the smaller parts of the query work fine though, the combination just seems to not work together how I expect it to, and I can't work out what it is :confused:

    I an't understand how grouping results would make one of the values of the grouped field completely disappear?
     
  4. trule macrumors 6502

    Joined:
    Mar 16, 2007
    #4
    That's a pretty complicated query.

    Try the LIKE keyword, it works with wildcards and might be closer to what you want.
     
  5. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #5
    I'd considered that, but the data needs to be stored in several different fields (I'm only working with one, paper, at the moment for simplicity), so LIKE wouldn't really work for this, I also don't think you can order LIKE by the number of matches.
     
  6. trule macrumors 6502

    Joined:
    Mar 16, 2007
    #6
    In that case try sub queries, they also help break complex queries into smaller and more manageable parts.
     
  7. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #7
    I am using a subquery?
    The subquery in the HAVING clause pulls the correct maximum relevance, and is used to limit which results are shown.
    It's just adding the GROUP BY clause that, which limiting the results to one per printer, also removes one printer entirely.

    If you meant there's a better way of doing the subqueries I'd be very grateful for your advice :)
     
  8. trule macrumors 6502

    Joined:
    Mar 16, 2007
    #8
    You mean this group by

    GROUP BY pp.printer_id

    if so this makes no sense too me for a Having with in = condition, its equal or its not QED
     
  9. trule macrumors 6502

    Joined:
    Mar 16, 2007
    #9
    I think I would get a list of all printers, products, search keywords so that there would be duplicates for each keyword match. Then use SUM or COUNT on printers AND products to get the number of keyword matches and a sort on that count to get them in the desired order

    Along these lines
    http://www.tizag.com/mysqlTutorial/mysqlcount.php

    It should be a much simpler query.
     
  10. WraithAkaMrak macrumors newbie

    Joined:
    Aug 18, 2006
    #10
    Try running the query, without the Group By AND Having clauses. I'm betting your first result will be a relevance = 0. (Or run it WITH the Group By and WITHOUT the Having.)

    If it's like MS SQL Server, the HAVING clause is run against the results of your Group By. So if your Group By first returns something with relevance = 0, your Having clause filters it out. Giving you no results for a given printer_id.

    Maybe not the most elegant solution, but you could remove Group By from your current query, then wrap what you have in another query, with the Group By in this new outermost query. There are probably more efficient ways to totally rewrite the query, but this should at least get you the rows you're looking for.
     
  11. JFreak macrumors 68040

    JFreak

    Joined:
    Jul 11, 2003
    Location:
    Tampere, Finland
    #11
    Sorry about that, it seems I was a bit arrogant with my post. Didn't mean to be.

    Usually when you CASE something you ask for trouble if you don't understand your data very well. Try building a temporary table first (relevance), and then link the real data to your temp collection. This way your query becomes way simpler and you can better understand what it really is you want to get done. Simple queries work the best, and are easier to optimize in the SQL engine...

    Anyway, the real reason why your query does not work is that you have illegal SELECT & GROUP BY column combination. Your column list has "relevance" which is okay because it is an aggregate, pp.star (which may or may not be okay) and printers.display_name which is at least two columns plus an aggregate; however, your grouping column list only has a pp.printer_id which is one column. These two need to match before your grouping can work.

    EDIT: real-world help here ;)

    You have to:
    1) remove the printers.display_name from your SELECT column list or add it to the grouping column list
    2) remove the pp.star from your SELECT column list or modify the pp.star to only include the column you want to use in the grouping (pp.printer_id)

    It all comes down to what you want to group and how. This is how the grouping is done in theory:

    SELECT col1, col2, ..., colN, aggregate1, aggregate2, ..., aggregateN
    FROM table
    GROUP BY col1, col2, ..., colN
    ;
     
  12. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #12
    Thanks for all your advice guys, but for other reasons the table structure has had to completely change.
    I didn't actually get that query working, but @JFreak that sounds like good advice for the future!

    I'm using CodeIgniter and DMZ to build my queries (as I was before, with just the subquery written manually).
    If you though that was a big query, see what I'm working with now (still quite a bit more to be added!)

    Code:
    SELECT DISTINCT `printers`.*
    FROM (`printers`)
    LEFT OUTER JOIN `products` products ON
    	`printers`.`id` = `products`.`printer_id`
    LEFT OUTER JOIN `product_types` product_product_types ON
    	`product_product_types`.`id` = `products`.`product_type_id`
    LEFT OUTER JOIN `product_attributes`
    	product_product_type_paper_type_product_attributes ON `product_product_types`.`id` =
    	`product_product_type_paper_type_product_attributes`.`product_type_id`
    LEFT OUTER JOIN
    	`product_attribute_values` product_product_type_paper_type_product_attribute_values ON
    	`product_product_type_paper_type_product_attributes`.`id` =
    	`product_product_type_paper_type_product_attribute_values`.`product_attribute_id`
    LEFT OUTER JOIN
    	`product_attribute_value_prices`
    	product_product_type_paper_type_product_attribute_value_product_attribute_value_prices ON
    	`product_product_type_paper_type_product_attribute_values`.`id` =
    	`product_product_type_paper_type_product_attribute_value_product_attribute_value_prices`.`product_attribute_value_id`
    LEFT
    	OUTER JOIN `product_attributes` product_product_type_paper_weight_product_attributes ON
    	`product_product_types`.`id` =
    	`product_product_type_paper_weight_product_attributes`.`product_type_id`
    LEFT OUTER JOIN
    	`product_attribute_values` product_product_type_paper_weight_product_attribute_values ON
    	`product_product_type_paper_weight_product_attributes`.`id` =
    	`product_product_type_paper_weight_product_attribute_values`.`product_attribute_id`
    LEFT OUTER JOIN
    	`product_attribute_value_prices`
    	product_product_type_paper_weight_product_attribute_value_product_attribute_value_prices ON
    	`product_product_type_paper_weight_product_attribute_values`.`id` =
    	`product_product_type_paper_weight_product_attribute_value_product_attribute_value_prices`.`product_attribute_value_id`
    WHERE
    	`product_product_types`.`str_id` = 'brochure'
    AND
    	`product_product_type_paper_type_product_attribute_values`.`str_id` = 'matte'
    AND
    	`product_product_type_paper_type_product_attribute_value_product_attribute_value_prices`.`value` >
    	0
    AND
    	product_product_type_paper_type_product_attribute_value_product_attribute_value_prices.product_id
    	=products.id
    AND `product_product_type_paper_weight_product_attribute_values`.`str_id` =
    	'100gms'
    AND
    	`product_product_type_paper_weight_product_attribute_value_product_attribute_value_prices`.`value` >
    	0
     
  13. trule macrumors 6502

    Joined:
    Mar 16, 2007
    #13
    That's not a complicated query, its just difficult to read. If you can I would use different table names at the very least:

    product_product_types
    -> product_type
    -> type

    product_product_type_paper_weight_product_attribute_value_product_attribute_value_prices
    -> refactor, attribute_name(id,name) attribute_value(id, name_id, product_id, integer_value, string_value)
     
  14. big_malk thread starter macrumors 6502a

    Joined:
    Aug 7, 2005
    Location:
    Scotland
    #14
    The huge table names are made automatically, the biggest 'actual' table name I have is product_attribute_value_price, where various prices can be set for the same product attribute value by different printers.
    I'll also possibly need similar but different tables for 'add-ons', that will have attributes and prices etc, so I'm keeping things unambiguous, but hopefully I won't have to deal with any huge queries like that directly!!
     

Share This Page