Register FAQ / Rules Forum Spy Search Today's Posts Mark Forums Read
Go Back   MacRumors Forums > Apple Systems and Services > Programming > Mac Programming

Reply
 
Thread Tools Search this Thread Display Modes
Old Aug 12, 2013, 10:21 PM   #1
scem0
macrumors 604
 
scem0's Avatar
 
Join Date: Jul 2002
Location: back in NYC!
MySQL Question Re: Complicated Comparisons

Thanks for taking the time to help:

I have a table of media each with a unique ID.

Code:
**********************
*  ID  *  Filename   *
**********************
*  56  *  Cat.jpg    *
*  57  *  Dog.jpg    *
*  58  *  Hitler.gif *
**********************
I have a table of tags, each with a unique ID.

Code:
**********************
*  ID  *  Attribute  *
**********************
*  20  *  Funny      *
*  21  *  Sad        *
*  22  *  Confusing  *
**********************
I have a table that matches the tags to the media:

Code:
*********************
* Media ID * Tag ID *
*********************
*  58      *  22    *
*  58      *  21    *
*  56      *  20    *
*********************
So, in this example, hitler.gif is confusing and sad. Cat.jpg is just funny.

This table that associates tags with media will end up being HUGE. I don't mind it being huge, but I do need to be extra efficient when I work with it to keep CPU load down. I need an efficient way to find photos with very specific attributes. For example, I might need to find all photos that are both Funny (tag 20) AND Confusing (tag 22) but that are not also tagged as offensive (tag 25) OR NSFW (tag 30).

I'm an SQL newbie, so I'm used to "SELECT x from y where Tag = 22 as a generic simple call to find specific data. But I can't just say where tag = 22 AND tag = 24. No ands will ever return a result because each association between tag and image is a separate row. No row tests true for two attributes. So, how do I do comparisons on multi-row data? Specifically, how do I do comparisons on multi-row data that combines looking for media with certain tags, specifically without certain tags, potentially with "ands", and potentially with "ors". Hopefully that makes sense. I'm trying to avoid doing this logic part with PHP, if possible.

Sorry if the answer is obvious. I tried to google for this, and couldn't find anything because I wasn't quite sure what keywords to search for.

Thank you, thank you, thank you for the help.
scem0 is offline   1 Reply With Quote
Old Aug 12, 2013, 10:55 PM   #2
lee1210
macrumors 68040
 
lee1210's Avatar
 
Join Date: Jan 2005
Location: Dallas, TX
select a.media_id from media_tag a
left outer join media_tag b on a.media_id = b.media_id and b.tag_id = 21
left outer join media_tag c on a.media_id = c.media_id and c.tag_id = 24
left outer join media_tag d on a.media_id = d.media_id and d.tag_id = 30
where (a.tag_id = 20 OR b.tag_id is not null) and c.tag_id is null and d.tag_id is null;

It's been a long time, so this is likely wrong, and definitely inefficient. What I'd hope this would yield is all media_ids where it is tagged with 20 or 21, but is not tagged with 24 or 30.

I'm confident a DBA or SQL wunderkind will point out a clause or function that will turn this into a tiny, super efficient query. I thought I'd try to stretch my brain.

-Lee
lee1210 is online now   1 Reply With Quote
Old Aug 13, 2013, 12:47 AM   #3
scem0
Thread Starter
macrumors 604
 
scem0's Avatar
 
Join Date: Jul 2002
Location: back in NYC!
Very enlightening, Lee. I'm the last person to be able to judge the efficiency of your code, so if some one else wants to weigh in, please do so. But I'm very thankful to have a direction, at least. So, thank you!

Emerson
scem0 is offline   0 Reply With Quote
Old Aug 13, 2013, 04:41 AM   #4
mattofsmeg
macrumors newbie
 
Join Date: Aug 2013
Hi,

You seem to have constructed a many to many table structure when I think you want a many to one scenario.

As a media column can have many tags but an individual tag column can only have one media associated with it.

How about if you structure the data like this:

Keep the media table
Drop the media id / tag id join table

create a new table for each of:
Funny Tag
Sad Tag
Confusing Tag

each of these tables will have a column called 'media_id'

representing each media objects id that is tagged for that particular category

Then your queries will be much simpler.
mattofsmeg is offline   0 Reply With Quote
Old Aug 13, 2013, 07:25 AM   #5
robvas
macrumors 68000
 
Join Date: Mar 2009
Location: USA
Quote:
Originally Posted by mattofsmeg View Post
Funny Tag
Sad Tag
Confusing Tag
This seems like it would be bad having to create a table for each tag. Wouldn't the other way work just fine, especially if it was indexed well?
robvas is offline   0 Reply With Quote
Old Aug 13, 2013, 07:29 AM   #6
pmau
macrumors 6502
 
Join Date: Nov 2010
May I add that you can prefix each SQL statement with EXPLAIN and MySQL will tell you what it had to do to obtain the result.

EXPLAIN SELECT data FROM table WHERE this > that;

It will tell you about use of indexes, etc.

Just a comment.
pmau is offline   0 Reply With Quote
Old Aug 13, 2013, 11:26 AM   #7
scem0
Thread Starter
macrumors 604
 
scem0's Avatar
 
Join Date: Jul 2002
Location: back in NYC!
Thanks pmau. I'll definitely try this out.

Quote:
Originally Posted by mattofsmeg View Post
Hi,

You seem to have constructed a many to many table structure when I think you want a many to one scenario.

As a media column can have many tags but an individual tag column can only have one media associated with it.

How about if you structure the data like this:

Keep the media table
Drop the media id / tag id join table

create a new table for each of:
Funny Tag
Sad Tag
Confusing Tag

each of these tables will have a column called 'media_id'

representing each media objects id that is tagged for that particular category

Then your queries will be much simpler.
Mattofsmeg –*thanks for the response. I don't think this will be practical for me because, in reality, I could have thousands of tags. For the example I only have three, but chances are I'll have a couple hundred to 1000+.
scem0 is offline   0 Reply With Quote
Old Aug 13, 2013, 01:19 PM   #8
ElectricSheep
macrumors 6502
 
Join Date: Feb 2004
Location: Wilmington, DE
Send a message via AIM to ElectricSheep
Another way of doing this is with self-joins and subquery expressions (lets assume that 'media_tag' is the table of relations between media_id and tag_id):

Code:
SELECT DISTINCT A.media_id FROM
    (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 24) as A JOIN
    (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 22) as B ON A.media_id = B.media_id
    LEFT OUTER JOIN (SELECT DISTINCT media_id FROM media_tag WHERE tag_id = 25 OR tag_id = 30) as C ON A.media_id = C.media_id
    WHERE C.media_id IS NULL;
This should result in a set of all media_ids which have a relation to tag_id 24 and 22, but do not have a relation to tag_id 25 or 30.

The first inner-join enforces the AND part of the desired query (has both tags 24 and 22), while the LEFT OUTER JOIN ... WHERE ... IS NULL is a sort of anti-join that excludes the OR part of the query (does not have tags 25 or 30).
__________________
15'' MBP (early 2011) | i7 3770k Hackintosh | i7 Mac Mini (late 2012) | iPhone 5 | iPad 3 (2012) | iPad mini | MacOS X 10.9.2
ElectricSheep is offline   0 Reply With Quote
Old Aug 14, 2013, 06:53 AM   #9
960design
macrumors 6502
 
Join Date: Apr 2012
Location: Destin, FL
I dislike complex SQL statements. I use multiple simple SQL statements.
Far easier to debug and modify over time, at least for me.

You've created the most efficient table structure for a many to many ( well if you have unique ids for the media_tag table ) relationship.

HUGE is not a problem for indexed SQL. A very loose definition for huge is a little over 2 Billion elements in an SQL table.
__________________
TI-99/4A, tape cassette, 12" B&W Zenith

Last edited by 960design; Aug 14, 2013 at 07:00 AM. Reason: fixed mizspleing
960design is offline   0 Reply With Quote
Old Aug 14, 2013, 09:46 AM   #10
Bach2011
macrumors member
 
Join Date: Aug 2011
Well, not sure if you can do something like this and if I understand your requirement correctly.

my tag table
20 funny
21 sad
22 confusing
25 offensive
30 NSFW

my match table
58 22
58 21
56 20
58 25
Code:
SELECT *
  FROM (SELECT media_id,
               tag_id,
               ATTRIBUTE,
               filename
          FROM match a,
               media b,
               tags c
         WHERE a.media_id = b.ID
           AND a.tag_id = c.ID) a
 WHERE a.tag_id IN (20, 22) -- get media_id that has these tags.    
  -- AND a.tag_id NOT IN (25); -- don't do this because we need to look as a 'group'. Use below subquery. 
   AND NOT EXISTS -- do not include media_id that has these tags
          (  SELECT 'x'
               FROM match a1
              WHERE a1.tag_id IN (25,30)
                AND a.media_id = a1.media_id
           GROUP BY media_id);
result
media_id tag_id attribute filename
56 20 funny cat.jpg

PS. I tried this on Oracle, and never worked on MySQL or mobile database.
Bach2011 is offline   0 Reply With Quote
Old Aug 14, 2013, 09:50 AM   #11
robbieduncan
Moderator
 
robbieduncan's Avatar
 
Join Date: Jul 2002
Location: London
2 responses mention indexes. The OP does not.

Are these tables indexed?
robbieduncan is offline   0 Reply With Quote
Old Aug 14, 2013, 11:24 AM   #12
aly
macrumors member
 
Join Date: Jul 2006
Location: Scotland
I came up against a similar type of problem a while back. I ended up with a table with millions of rows and while this is of course fine, I found that using SELECT statements and trying to join tables lead to significant waits after a query was sent. My solution was to break the table up in to multiple tables which is similar to the suggestion above to have a table per tag.

Alternatively have you thought of creating a table with multiple columns where each column is a different tag and each row corresponds to a Media ID? Then it is very simple to return all tags of one Media ID or return all Media IDs meeting a particular tag or set of tags. Managing the adding of tags is a case of adding a column to this table. Each of the tag columns can also be a simple bit type.

I guess this method depends on the column limitations of a table? Which I don't know.
__________________
ARGH!
aly is offline   0 Reply With Quote
Old Aug 14, 2013, 06:53 PM   #13
mattofsmeg
macrumors newbie
 
Join Date: Aug 2013
Quote:
Originally Posted by scem0 View Post
Thanks pmau. I'll definitely try this out.



Mattofsmeg *thanks for the response. I don't think this will be practical for me because, in reality, I could have thousands of tags. For the example I only have three, but chances are I'll have a couple hundred to 1000+.
Ok sweet

----------

Quote:
Originally Posted by robvas View Post
This seems like it would be bad having to create a table for each tag. Wouldn't the other way work just fine, especially if it was indexed well?
Ok, do you have any of your own ideas?
mattofsmeg is offline   0 Reply With Quote
Old Aug 14, 2013, 07:38 PM   #14
LAnMarc
macrumors member
 
Join Date: May 2010
Location: Adelaide South Australia
I don't have much experience in MySQL but I have done some complex SQL statements in DB2, SQL Server and MS Access.

Here are 2 examples in DB2 format that would acheive what you want.
I would expect that some variation of these would work in MySQL.
If you are only dealing with a few thousand records, I would expect it to be quick, but indexing the tables would improve the speed.

1. Join to a summarised sub-select:
select m.ID, m.Filename
from media as m
join ( select tm.MediaID,
sum(if tm.TagID=20 then 1 else 0 end) as s20,
sum(if tm.TagID=22 then 1 else 0 end) as s22,
sum(if tm.TagID=25 then 1 else 0 end) as s25,
sum(if tm.TagID=30 then 1 else 0 end) as s30
from tag_media as tm
group by tm.MediaID ) as s on s.MediaID = m.ID
where s.s20>0 and s.s22>0 and s.s25=0 and s.s30=0;

2. Check for presence of media ID in 4 lists of media IDs, 1 for each tag selection:
select m.ID, m.Filename
from media as m
where
m.ID in ( select tm1.MediaID from tag_media as tm1 where tm1.TagID=20 ) and
m.ID in ( select tm2.MediaID from tag_media as tm2 where tm1.TagID=22 ) and
m.ID not in ( select tm3.MediaID from tag_media as tm3 where tm3.TagID=25 ) and
m.ID not in ( select tm4.MediaID from tag_media as tm4 where tm4.TagID=30 );
__________________
2010 MacBook Air 13.3", iPad 4G/LTE, iPhone 3GS, iPod Touch (1st Gen), Time Capsule, TV2, TV3
LAnMarc is offline   0 Reply With Quote
Old Aug 14, 2013, 08:27 PM   #15
ohbrilliance
macrumors 6502a
 
Join Date: May 2007
Location: Melbourne, Australia
Some really interesting approaches in this thread.

Quote:
Originally Posted by LAnMarc View Post
2. Check for presence of media ID in 4 lists of media IDs, 1 for each tag selection:
select m.ID, m.Filename
from media as m
where
m.ID in ( select tm1.MediaID from tag_media as tm1 where tm1.TagID=20 ) and
m.ID in ( select tm2.MediaID from tag_media as tm2 where tm1.TagID=22 ) and
m.ID not in ( select tm3.MediaID from tag_media as tm3 where tm3.TagID=25 ) and
m.ID not in ( select tm4.MediaID from tag_media as tm4 where tm4.TagID=30 );
In MySQL, 'select in' queries can have terrible performance. It has something to do with the (select ... ) statement being executed once for every row of the main select query. The approach I take is to run the ( select ... ) queries individually, and then build the sql with implodes, e.g. (not working code!):

$result1 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=22');
$result2 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=25');
$result3 = execute('select tm2.MediaID from tag_media as tm2 where tm1.TagID=30');

$query = 'select m.ID, m.Filename
from media m
where
m.ID in ( implode(',', $result1 )
and m.ID in ( implode(',', $result2 )
and m.ID not in ( implode(',', $result3 )'
ohbrilliance is offline   0 Reply With Quote
Old Aug 14, 2013, 09:13 PM   #16
sreeves
macrumors newbie
 
Join Date: Aug 2013
GROUP_CONCAT is a fun solution

In the following example we search for pics with tags 1 and 3 but not 2 or 4:
Code:
SELECT picid, GROUP_CONCAT(tagid ORDER BY tagid) AS thispicstagids FROM matches
GROUP BY `picid`
HAVING thispicstagids REGEXP '^([0-9]*,)*1(,[0-9]*)*,3(,[0-9]*)*$'
AND thispicstagids NOT REGEXP '(^|,)2(,|$)'
AND thispicstagids NOT REGEXP '(^|,)4(,|$)'
We create an ordered list of all the tags of each pic with the GROUP_CONCAT function.
Then we use a regular expression to search that list. Here's a breakdown of that expression:
'^' starts us at the beginning of the string. If this wasn't here, due to the nature of the next part, we might match numbers that ended with the figures (like allowing '24' when you only wanted '4').
'([0-9]*,)*' searches for any amount (the last asterisk) of the group in parenthesis, being any amount of digits (the class declared by the square brackets) followed by a comma.
'1' searches for our first search id. Replace with the first figure in your array.
'(,[0-9]*)*' searches for any amount of the group of a comma and any number of digits. This is where the search should end if you are searching for only one tag.
',' then there should be at least one comma between the first tag and the second. Only insert this comma if you're looking for more than one tag.
'3' searches for our second search id. Replace with the second and following figures in your array.
'(,[0-9]*)*' searches for any amount of the group of a comma and any number of digits, allowing for tags to appear after your last search figure. If this wasn't here, we'd match 3 when 33 was present.
'$' matches the end of the string, for the same reason we matched the beginning.
The negative searches use NOT REGEXP, and search for our figures 2 and 4 preceded exclusively either by the beginning of the string or a comma, and followed exclusively either by a comma or the end of the string

I haven't tested this as thoroughly as one might, but you might enjoy using GROUP_CONCAT for this.

Note I am only working with the matches table, because that's where all this magic happens.

I should also post this on my wiki at http://shawnreeves.net/wiki/index.php?title=SQL_recipes

----------

I just realized that what precedes the first tag could simply be the beginning of the string or a comma, so I shortened the REGEX. Likewise, final search term could also be followed by a search for a comma or the end of the string, shown in this example of searching for three tag ids (1,3,72) instead of two:
Code:
SELECT picid, GROUP_CONCAT(tagid ORDER BY tagid) AS thispicstagids FROM matches
GROUP BY `picid`
HAVING thispicstagids REGEXP '(^|,)1(,[0-9]*)*,3(,[0-9]*)*,73(,|$)'
AND thispicstagids NOT REGEXP '(^|,)2(,|$)'
AND thispicstagids NOT REGEXP '(^|,)4(,|$)'
Also, note you need to order the tags in the regexp in the same order they'd appear in the concatenation.

Last edited by sreeves; Aug 14, 2013 at 10:07 PM.
sreeves is offline   0 Reply With Quote
Old Aug 21, 2013, 10:48 PM   #17
Irfin
macrumors newbie
 
Join Date: Nov 2011
Try IN operator

Suppose you want to retrieve all media which have tags funny(20), confusing (22), but not offensive (25) or NSFW (30). You can try to use the 'IN' operator:

Code:
SELECT media_id FROM tags_to_media
WHERE tag_id IN (20, 22)
  AND tag_id NOT IN (25,30)
Of course you need to first make list for the tags you want to include and excluded. Perhaps querying them first and then construct the SQL in string form, and then execute it.

Sorry bad english here.

Last edited by Irfin; Aug 21, 2013 at 10:55 PM.
Irfin is offline   0 Reply With Quote

Reply
MacRumors Forums > Apple Systems and Services > Programming > Mac Programming

Tags
mysql, php, sql, tables

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT -5. The time now is 04:11 PM.

Mac Rumors | Mac | iPhone | iPhone Game Reviews | iPhone Apps

Mobile Version | Fixed | Fluid | Fluid HD
Copyright 2002-2013, MacRumors.com, LLC