Become a MacRumors Supporter for $50/year with no ads, ability to filter front page stories, and private forums.

scem0

macrumors 604
Original poster
Jul 16, 2002
7,028
1
back in NYC!
Thanks for taking the time to help:

I have a table of media each with a unique ID.

Code:
**********************
*  ID  *  Filename   *
**********************
*  56  *  Cat.jpg    *
*  57  *  Dog.jpg    *
*  58  *  Hitler.gif *
**********************

I have a table of tags, each with a unique ID.

Code:
**********************
*  ID  *  Attribute  *
**********************
*  20  *  Funny      *
*  21  *  Sad        *
*  22  *  Confusing  *
**********************

I have a table that matches the tags to the media:

Code:
*********************
* Media ID * Tag ID *
*********************
*  58      *  22    *
*  58      *  21    *
*  56      *  20    *
*********************

So, in this example, hitler.gif is confusing and sad. Cat.jpg is just funny.

This table that associates tags with media will end up being HUGE. I don't mind it being huge, but I do need to be extra efficient when I work with it to keep CPU load down. I need an efficient way to find photos with very specific attributes. For example, I might need to find all photos that are both Funny (tag 20) AND Confusing (tag 22) but that are not also tagged as offensive (tag 25) OR NSFW (tag 30).

I'm an SQL newbie, so I'm used to "SELECT x from y where Tag = 22 as a generic simple call to find specific data. But I can't just say where tag = 22 AND tag = 24. No ands will ever return a result because each association between tag and image is a separate row. No row tests true for two attributes. So, how do I do comparisons on multi-row data? Specifically, how do I do comparisons on multi-row data that combines looking for media with certain tags, specifically without certain tags, potentially with "ands", and potentially with "ors". Hopefully that makes sense. I'm trying to avoid doing this logic part with PHP, if possible.

Sorry if the answer is obvious. I tried to google for this, and couldn't find anything because I wasn't quite sure what keywords to search for. :p

Thank you, thank you, thank you for the help.
 

lee1210

macrumors 68040
Jan 10, 2005
3,182
3
Dallas, TX
select a.media_id from media_tag a
left outer join media_tag b on a.media_id = b.media_id and b.tag_id = 21
left outer join media_tag c on a.media_id = c.media_id and c.tag_id = 24
left outer join media_tag d on a.media_id = d.media_id and d.tag_id = 30
where (a.tag_id = 20 OR b.tag_id is not null) and c.tag_id is null and d.tag_id is null;

It's been a long time, so this is likely wrong, and definitely inefficient. What I'd hope this would yield is all media_ids where it is tagged with 20 or 21, but is not tagged with 24 or 30.

I'm confident a DBA or SQL wunderkind will point out a clause or function that will turn this into a tiny, super efficient query. I thought I'd try to stretch my brain.

-Lee
 

scem0

macrumors 604
Original poster
Jul 16, 2002
7,028
1
back in NYC!
Very enlightening, Lee. I'm the last person to be able to judge the efficiency of your code, so if some one else wants to weigh in, please do so. But I'm very thankful to have a direction, at least. So, thank you!

– Emerson
 

mattofsmeg

macrumors newbie
Aug 13, 2013
24
6
Hi,

You seem to have constructed a many to many table structure when I think you want a many to one scenario.

As a media column can have many tags but an individual tag column can only have one media associated with it.

How about if you structure the data like this:

Keep the media table
Drop the media id / tag id join table

create a new table for each of:
Funny Tag
Sad Tag
Confusing Tag

each of these tables will have a column called 'media_id'

representing each media objects id that is tagged for that particular category

Then your queries will be much simpler.
 

pmau

macrumors 68000
Nov 9, 2010
1,569
854
May I add that you can prefix each SQL statement with EXPLAIN and MySQL will tell you what it had to do to obtain the result.

EXPLAIN SELECT data FROM table WHERE this > that;

It will tell you about use of indexes, etc.

Just a comment.
 

scem0

macrumors 604
Original poster
Jul 16, 2002
7,028
1
back in NYC!
Thanks pmau. I'll definitely try this out.

Hi,

You seem to have constructed a many to many table structure when I think you want a many to one scenario.

As a media column can have many tags but an individual tag column can only have one media associated with it.

How about if you structure the data like this:

Keep the media table
Drop the media id / tag id join table

create a new table for each of:
Funny Tag
Sad Tag
Confusing Tag

each of these tables will have a column called 'media_id'

representing each media objects id that is tagged for that particular category

Then your queries will be much simpler.

Mattofsmeg –*thanks for the response. I don't think this will be practical for me because, in reality, I could have thousands of tags. For the example I only have three, but chances are I'll have a couple hundred to 1000+.
 

ElectricSheep

macrumors 6502
Feb 18, 2004
498
4
Wilmington, DE
Another way of doing this is with self-joins and subquery expressions (lets assume that 'media_tag' is the table of relations between media_id and tag_id):

Code:
SELECT DISTINCT A.media_id FROM
    (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 24) as A JOIN
    (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 22) as B ON A.media_id = B.media_id
    LEFT OUTER JOIN (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 25 OR tag_id = 30) as C ON A.media_id = C.media_id
    WHERE C.media_id IS NULL;

This should result in a set of all media_ids which have a relation to tag_id 24 and 22, but do not have a relation to tag_id 25 or 30.

The first inner-join enforces the AND part of the desired query (has both tags 24 and 22), while the LEFT OUTER JOIN ... WHERE ... IS NULL is a sort of anti-join that excludes the OR part of the query (does not have tags 25 or 30).
 

960design

macrumors 68040
Apr 17, 2012
3,700
1,569
Destin, FL
I dislike complex SQL statements. I use multiple simple SQL statements.
Far easier to debug and modify over time, at least for me.

You've created the most efficient table structure for a many to many ( well if you have unique ids for the media_tag table ) relationship.

HUGE is not a problem for indexed SQL. A very loose definition for huge is a little over 2 Billion elements in an SQL table.
 
Last edited:

Bach2011

macrumors member
Aug 2, 2011
50
1
Well, not sure if you can do something like this and if I understand your requirement correctly. :rolleyes:

my tag table
20 funny
21 sad
22 confusing
25 offensive
30 NSFW

my match table
58 22
58 21
56 20
58 25
Code:
SELECT *
  FROM (SELECT media_id,
               tag_id,
               ATTRIBUTE,
               filename
          FROM match a,
               media b,
               tags c
         WHERE a.media_id = b.ID
           AND a.tag_id = c.ID) a
 WHERE a.tag_id IN (20, 22) -- get media_id that has these tags.    
  -- AND a.tag_id NOT IN (25); -- don't do this because we need to look as a 'group'. Use below subquery. 
   AND NOT EXISTS -- do not include media_id that has these tags
          (  SELECT 'x'
               FROM match a1
              WHERE a1.tag_id IN (25,30)
                AND a.media_id = a1.media_id
           GROUP BY media_id);

result
media_id tag_id attribute filename
56 20 funny cat.jpg

PS. I tried this on Oracle, and never worked on MySQL or mobile database.
 

aly

macrumors member
Jul 3, 2006
88
0
Scotland
I came up against a similar type of problem a while back. I ended up with a table with millions of rows and while this is of course fine, I found that using SELECT statements and trying to join tables lead to significant waits after a query was sent. My solution was to break the table up in to multiple tables which is similar to the suggestion above to have a table per tag.

Alternatively have you thought of creating a table with multiple columns where each column is a different tag and each row corresponds to a Media ID? Then it is very simple to return all tags of one Media ID or return all Media IDs meeting a particular tag or set of tags. Managing the adding of tags is a case of adding a column to this table. Each of the tag columns can also be a simple bit type.

I guess this method depends on the column limitations of a table? Which I don't know.
 

mattofsmeg

macrumors newbie
Aug 13, 2013
24
6
Thanks pmau. I'll definitely try this out.



Mattofsmeg –*thanks for the response. I don't think this will be practical for me because, in reality, I could have thousands of tags. For the example I only have three, but chances are I'll have a couple hundred to 1000+.

Ok sweet

----------

This seems like it would be bad having to create a table for each tag. Wouldn't the other way work just fine, especially if it was indexed well?

Ok, do you have any of your own ideas?
 

LAnMarc

macrumors member
May 11, 2010
39
0
Adelaide South Australia
I don't have much experience in MySQL but I have done some complex SQL statements in DB2, SQL Server and MS Access.

Here are 2 examples in DB2 format that would acheive what you want.
I would expect that some variation of these would work in MySQL.
If you are only dealing with a few thousand records, I would expect it to be quick, but indexing the tables would improve the speed.

1. Join to a summarised sub-select:
select m.ID, m.Filename
from media as m
join ( select tm.MediaID,
sum(if tm.TagID=20 then 1 else 0 end) as s20,
sum(if tm.TagID=22 then 1 else 0 end) as s22,
sum(if tm.TagID=25 then 1 else 0 end) as s25,
sum(if tm.TagID=30 then 1 else 0 end) as s30
from tag_media as tm
group by tm.MediaID ) as s on s.MediaID = m.ID
where s.s20>0 and s.s22>0 and s.s25=0 and s.s30=0;

2. Check for presence of media ID in 4 lists of media IDs, 1 for each tag selection:
select m.ID, m.Filename
from media as m
where
m.ID in ( select tm1.MediaID from tag_media as tm1 where tm1.TagID=20 ) and
m.ID in ( select tm2.MediaID from tag_media as tm2 where tm1.TagID=22 ) and
m.ID not in ( select tm3.MediaID from tag_media as tm3 where tm3.TagID=25 ) and
m.ID not in ( select tm4.MediaID from tag_media as tm4 where tm4.TagID=30 );
 

ohbrilliance

macrumors 65816
May 15, 2007
1,010
355
Melbourne, Australia
Some really interesting approaches in this thread.

2. Check for presence of media ID in 4 lists of media IDs, 1 for each tag selection:
select m.ID, m.Filename
from media as m
where
m.ID in ( select tm1.MediaID from tag_media as tm1 where tm1.TagID=20 ) and
m.ID in ( select tm2.MediaID from tag_media as tm2 where tm1.TagID=22 ) and
m.ID not in ( select tm3.MediaID from tag_media as tm3 where tm3.TagID=25 ) and
m.ID not in ( select tm4.MediaID from tag_media as tm4 where tm4.TagID=30 );

In MySQL, 'select in' queries can have terrible performance. It has something to do with the (select ... ) statement being executed once for every row of the main select query. The approach I take is to run the ( select ... ) queries individually, and then build the sql with implodes, e.g. (not working code!):

$result1 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=22');
$result2 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=25');
$result3 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=30');

$query = 'select m.ID, m.Filename
from media m
where
m.ID in ( implode(',', $result1 )
and m.ID in ( implode(',', $result2 )
and m.ID not in ( implode(',', $result3 )'
 

sreeves

macrumors newbie
Aug 14, 2013
1
0
GROUP_CONCAT is a fun solution

In the following example we search for pics with tags 1 and 3 but not 2 or 4:
Code:
SELECT picid, GROUP_CONCAT(tagid ORDER BY tagid) AS thispicstagids FROM matches
GROUP BY `picid`
HAVING thispicstagids REGEXP '^([0-9]*,)*1(,[0-9]*)*,3(,[0-9]*)*$'
AND thispicstagids NOT REGEXP '(^|,)2(,|$)'
AND thispicstagids NOT REGEXP '(^|,)4(,|$)'
We create an ordered list of all the tags of each pic with the GROUP_CONCAT function.
Then we use a regular expression to search that list. Here's a breakdown of that expression:
'^' starts us at the beginning of the string. If this wasn't here, due to the nature of the next part, we might match numbers that ended with the figures (like allowing '24' when you only wanted '4').
'([0-9]*,)*' searches for any amount (the last asterisk) of the group in parenthesis, being any amount of digits (the class declared by the square brackets) followed by a comma.
'1' searches for our first search id. Replace with the first figure in your array.
'(,[0-9]*)*' searches for any amount of the group of a comma and any number of digits. This is where the search should end if you are searching for only one tag.
',' then there should be at least one comma between the first tag and the second. Only insert this comma if you're looking for more than one tag.
'3' searches for our second search id. Replace with the second and following figures in your array.
'(,[0-9]*)*' searches for any amount of the group of a comma and any number of digits, allowing for tags to appear after your last search figure. If this wasn't here, we'd match 3 when 33 was present.
'$' matches the end of the string, for the same reason we matched the beginning.
The negative searches use NOT REGEXP, and search for our figures 2 and 4 preceded exclusively either by the beginning of the string or a comma, and followed exclusively either by a comma or the end of the string

I haven't tested this as thoroughly as one might, but you might enjoy using GROUP_CONCAT for this.

Note I am only working with the matches table, because that's where all this magic happens.

I should also post this on my wiki at http://shawnreeves.net/wiki/index.php?title=SQL_recipes

----------

I just realized that what precedes the first tag could simply be the beginning of the string or a comma, so I shortened the REGEX. Likewise, final search term could also be followed by a search for a comma or the end of the string, shown in this example of searching for three tag ids (1,3,72) instead of two:
Code:
SELECT picid, GROUP_CONCAT(tagid ORDER BY tagid) AS thispicstagids FROM matches
GROUP BY `picid`
HAVING thispicstagids REGEXP '(^|,)1(,[0-9]*)*,3(,[0-9]*)*,73(,|$)'
AND thispicstagids NOT REGEXP '(^|,)2(,|$)'
AND thispicstagids NOT REGEXP '(^|,)4(,|$)'

Also, note you need to order the tags in the regexp in the same order they'd appear in the concatenation.
 
Last edited:

Irfin

macrumors newbie
Nov 25, 2011
9
0
Try IN operator

Suppose you want to retrieve all media which have tags funny(20), confusing (22), but not offensive (25) or NSFW (30). You can try to use the 'IN' operator:

Code:
SELECT media_id FROM tags_to_media
WHERE tag_id IN (20, 22)
  AND tag_id NOT IN (25,30)

Of course you need to first make list for the tags you want to include and excluded. Perhaps querying them first and then construct the SQL in string form, and then execute it.

Sorry bad english here.
 
Last edited:
Register on MacRumors! This sidebar will go away, and you'll see fewer ads.