|Aug 12, 2013, 11:21 PM||#1|
MySQL Question Re: Complicated Comparisons
Thanks for taking the time to help:
I have a table of media each with a unique ID.
********************** * ID * Filename * ********************** * 56 * Cat.jpg * * 57 * Dog.jpg * * 58 * Hitler.gif * **********************
********************** * ID * Attribute * ********************** * 20 * Funny * * 21 * Sad * * 22 * Confusing * **********************
********************* * Media ID * Tag ID * ********************* * 58 * 22 * * 58 * 21 * * 56 * 20 * *********************
This table that associates tags with media will end up being HUGE. I don't mind it being huge, but I do need to be extra efficient when I work with it to keep CPU load down. I need an efficient way to find photos with very specific attributes. For example, I might need to find all photos that are both Funny (tag 20) AND Confusing (tag 22) but that are not also tagged as offensive (tag 25) OR NSFW (tag 30).
I'm an SQL newbie, so I'm used to "SELECT x from y where Tag = 22 as a generic simple call to find specific data. But I can't just say where tag = 22 AND tag = 24. No ands will ever return a result because each association between tag and image is a separate row. No row tests true for two attributes. So, how do I do comparisons on multi-row data? Specifically, how do I do comparisons on multi-row data that combines looking for media with certain tags, specifically without certain tags, potentially with "ands", and potentially with "ors". Hopefully that makes sense. I'm trying to avoid doing this logic part with PHP, if possible.
Sorry if the answer is obvious. I tried to google for this, and couldn't find anything because I wasn't quite sure what keywords to search for.
Thank you, thank you, thank you for the help.
|Aug 12, 2013, 11:55 PM||#2|
select a.media_id from media_tag a
left outer join media_tag b on a.media_id = b.media_id and b.tag_id = 21
left outer join media_tag c on a.media_id = c.media_id and c.tag_id = 24
left outer join media_tag d on a.media_id = d.media_id and d.tag_id = 30
where (a.tag_id = 20 OR b.tag_id is not null) and c.tag_id is null and d.tag_id is null;
It's been a long time, so this is likely wrong, and definitely inefficient. What I'd hope this would yield is all media_ids where it is tagged with 20 or 21, but is not tagged with 24 or 30.
I'm confident a DBA or SQL wunderkind will point out a clause or function that will turn this into a tiny, super efficient query. I thought I'd try to stretch my brain.
|Aug 13, 2013, 01:47 AM||#3|
Very enlightening, Lee. I'm the last person to be able to judge the efficiency of your code, so if some one else wants to weigh in, please do so. But I'm very thankful to have a direction, at least. So, thank you!
|Aug 13, 2013, 05:41 AM||#4|
You seem to have constructed a many to many table structure when I think you want a many to one scenario.
As a media column can have many tags but an individual tag column can only have one media associated with it.
How about if you structure the data like this:
Keep the media table
Drop the media id / tag id join table
create a new table for each of:
each of these tables will have a column called 'media_id'
representing each media objects id that is tagged for that particular category
Then your queries will be much simpler.
|Aug 13, 2013, 08:25 AM||#5|
|Aug 13, 2013, 08:29 AM||#6|
May I add that you can prefix each SQL statement with EXPLAIN and MySQL will tell you what it had to do to obtain the result.
EXPLAIN SELECT data FROM table WHERE this > that;
It will tell you about use of indexes, etc.
Just a comment.
|Aug 13, 2013, 12:26 PM||#7|
Thanks pmau. I'll definitely try this out.
|Aug 13, 2013, 02:19 PM||#8|
Another way of doing this is with self-joins and subquery expressions (lets assume that 'media_tag' is the table of relations between media_id and tag_id):
SELECT DISTINCT A.media_id FROM (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 24) as A JOIN (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 22) as B ON A.media_id = B.media_id LEFT OUTER JOIN (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 25 OR tag_id = 30) as C ON A.media_id = C.media_id WHERE C.media_id IS NULL;
The first inner-join enforces the AND part of the desired query (has both tags 24 and 22), while the LEFT OUTER JOIN ... WHERE ... IS NULL is a sort of anti-join that excludes the OR part of the query (does not have tags 25 or 30).
15'' Early 2011 MBP | iPhone 5 | iPad (2012) | MacOS X 10.8.3
|Aug 14, 2013, 07:53 AM||#9|
I dislike complex SQL statements. I use multiple simple SQL statements.
Far easier to debug and modify over time, at least for me.
You've created the most efficient table structure for a many to many ( well if you have unique ids for the media_tag table ) relationship.
HUGE is not a problem for indexed SQL. A very loose definition for huge is a little over 2 Billion elements in an SQL table.
TI-99/4A, tape cassette, 12" B&W Zenith
Last edited by 960design; Aug 14, 2013 at 08:00 AM. Reason: fixed mizspleing
|Aug 14, 2013, 10:46 AM||#10|
Well, not sure if you can do something like this and if I understand your requirement correctly.
my tag table
my match table
SELECT * FROM (SELECT media_id, tag_id, ATTRIBUTE, filename FROM match a, media b, tags c WHERE a.media_id = b.ID AND a.tag_id = c.ID) a WHERE a.tag_id IN (20, 22) -- get media_id that has these tags. -- AND a.tag_id NOT IN (25); -- don't do this because we need to look as a 'group'. Use below subquery. AND NOT EXISTS -- do not include media_id that has these tags ( SELECT 'x' FROM match a1 WHERE a1.tag_id IN (25,30) AND a.media_id = a1.media_id GROUP BY media_id);
media_id tag_id attribute filename
56 20 funny cat.jpg
PS. I tried this on Oracle, and never worked on MySQL or mobile database.
|Aug 14, 2013, 12:24 PM||#12|
I came up against a similar type of problem a while back. I ended up with a table with millions of rows and while this is of course fine, I found that using SELECT statements and trying to join tables lead to significant waits after a query was sent. My solution was to break the table up in to multiple tables which is similar to the suggestion above to have a table per tag.
Alternatively have you thought of creating a table with multiple columns where each column is a different tag and each row corresponds to a Media ID? Then it is very simple to return all tags of one Media ID or return all Media IDs meeting a particular tag or set of tags. Managing the adding of tags is a case of adding a column to this table. Each of the tag columns can also be a simple bit type.
I guess this method depends on the column limitations of a table? Which I don't know.
|Aug 14, 2013, 07:53 PM||#13|
|Aug 14, 2013, 08:38 PM||#14|
I don't have much experience in MySQL but I have done some complex SQL statements in DB2, SQL Server and MS Access.
Here are 2 examples in DB2 format that would acheive what you want.
I would expect that some variation of these would work in MySQL.
If you are only dealing with a few thousand records, I would expect it to be quick, but indexing the tables would improve the speed.
1. Join to a summarised sub-select:
select m.ID, m.Filename
from media as m
join ( select tm.MediaID,
sum(if tm.TagID=20 then 1 else 0 end) as s20,
sum(if tm.TagID=22 then 1 else 0 end) as s22,
sum(if tm.TagID=25 then 1 else 0 end) as s25,
sum(if tm.TagID=30 then 1 else 0 end) as s30
from tag_media as tm
group by tm.MediaID ) as s on s.MediaID = m.ID
where s.s20>0 and s.s22>0 and s.s25=0 and s.s30=0;
2. Check for presence of media ID in 4 lists of media IDs, 1 for each tag selection:
select m.ID, m.Filename
from media as m
m.ID in ( select tm1.MediaID from tag_media as tm1 where tm1.TagID=20 ) and
m.ID in ( select tm2.MediaID from tag_media as tm2 where tm1.TagID=22 ) and
m.ID not in ( select tm3.MediaID from tag_media as tm3 where tm3.TagID=25 ) and
m.ID not in ( select tm4.MediaID from tag_media as tm4 where tm4.TagID=30 );
2010 MacBook Air 13.3", iPad 4G/LTE, iPhone 3GS, iPod Touch (1st Gen), Time Capsule, TV2, TV3
|Aug 14, 2013, 09:27 PM||#15|
Some really interesting approaches in this thread.
$result1 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=22');
$result2 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=25');
$result3 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=30');
$query = 'select m.ID, m.Filename
from media m
m.ID in ( implode(',', $result1 )
and m.ID in ( implode(',', $result2 )
and m.ID not in ( implode(',', $result3 )'
|Aug 14, 2013, 10:13 PM||#16|
GROUP_CONCAT is a fun solution
In the following example we search for pics with tags 1 and 3 but not 2 or 4:
SELECT picid, GROUP_CONCAT(tagid ORDER BY tagid) AS thispicstagids FROM matches GROUP BY `picid` HAVING thispicstagids REGEXP '^([0-9]*,)*1(,[0-9]*)*,3(,[0-9]*)*$' AND thispicstagids NOT REGEXP '(^|,)2(,|$)' AND thispicstagids NOT REGEXP '(^|,)4(,|$)'
Then we use a regular expression to search that list. Here's a breakdown of that expression:
'^' starts us at the beginning of the string. If this wasn't here, due to the nature of the next part, we might match numbers that ended with the figures (like allowing '24' when you only wanted '4').
'([0-9]*,)*' searches for any amount (the last asterisk) of the group in parenthesis, being any amount of digits (the class declared by the square brackets) followed by a comma.
'1' searches for our first search id. Replace with the first figure in your array.
'(,[0-9]*)*' searches for any amount of the group of a comma and any number of digits. This is where the search should end if you are searching for only one tag.
',' then there should be at least one comma between the first tag and the second. Only insert this comma if you're looking for more than one tag.
'3' searches for our second search id. Replace with the second and following figures in your array.
'(,[0-9]*)*' searches for any amount of the group of a comma and any number of digits, allowing for tags to appear after your last search figure. If this wasn't here, we'd match 3 when 33 was present.
'$' matches the end of the string, for the same reason we matched the beginning.
The negative searches use NOT REGEXP, and search for our figures 2 and 4 preceded exclusively either by the beginning of the string or a comma, and followed exclusively either by a comma or the end of the string
I haven't tested this as thoroughly as one might, but you might enjoy using GROUP_CONCAT for this.
Note I am only working with the matches table, because that's where all this magic happens.
I should also post this on my wiki at http://shawnreeves.net/wiki/index.php?title=SQL_recipes
I just realized that what precedes the first tag could simply be the beginning of the string or a comma, so I shortened the REGEX. Likewise, final search term could also be followed by a search for a comma or the end of the string, shown in this example of searching for three tag ids (1,3,72) instead of two:
SELECT picid, GROUP_CONCAT(tagid ORDER BY tagid) AS thispicstagids FROM matches GROUP BY `picid` HAVING thispicstagids REGEXP '(^|,)1(,[0-9]*)*,3(,[0-9]*)*,73(,|$)' AND thispicstagids NOT REGEXP '(^|,)2(,|$)' AND thispicstagids NOT REGEXP '(^|,)4(,|$)'
Last edited by sreeves; Aug 14, 2013 at 11:07 PM.
|Aug 21, 2013, 11:48 PM||#17|
Try IN operator
Suppose you want to retrieve all media which have tags funny(20), confusing (22), but not offensive (25) or NSFW (30). You can try to use the 'IN' operator:
SELECT media_id FROM tags_to_media WHERE tag_id IN (20, 22) AND tag_id NOT IN (25,30)
Sorry bad english here.
Last edited by Irfin; Aug 21, 2013 at 11:55 PM.
|mysql, php, sql, tables|
|Thread Tools||Search this Thread|
All times are GMT -5. The time now is 07:21 AM.