Database normalization question

Discussion in 'Mac Programming' started by Super Macho Man, Apr 19, 2007.

  1. Super Macho Man macrumors 6502a

    Super Macho Man

    Joined:
    Jul 24, 2006
    Location:
    Hollywood, CA
    #1
    So if I've got a table of books, with columns "id," "title," and "language." I'm using the 2-letter ISO language codes, like en, fr, de, jp, etc. This works well (although it's not normalized) for books that are written in a single language. Currently, book.language is a varchar(10) which is long enough to hold up to 4 languages separated by spaces. But this is bad design.

    So what do I do if a book is written in more than one language? I can create a "language" table with all languages in it, and set up book.language as a foreign key to language.id or whatever. But how can I assign multiple languages to a book, without creating multiple book.language columns, unless that's what I have to do?
     
  2. LtRammstein macrumors 6502a

    LtRammstein

    Joined:
    Jun 20, 2006
    Location:
    Denver, CO
    #2
    Well... There's two ways to go about this. Mind you when I work on a database, I try to find a common table to use as a key, that way I can easily search for desired data.

    I would personally make a Language table, and have the book's ID as a key to that. From there you can specifiy how many columns you want. As a cell for those columns, I would make it a Yes/No data input.

    I know in MS Access you can have a memo field, so you could just put the 2-letter ISO language code in there seperated by spaces or commas.

    I'd rather have a lot of tables with each table taking up very little space than a few tables taking up a lot of space. Plus, I find that search in many tables with very little space to be faster.

    Steve
     
  3. toddburch macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #3
    Three tables

    You could have your book table (with ID as primary key), with no language info.

    Then, you could have a language table, with the ISO code column as a primary key, with no book info.

    Finally, you could have a book/language table, that ties the book to an available language, with a book ID and Language code both as foreign keys. Insert as many rows as the book is available in languages.

    The database manager will use the language table to make sure you are inserting a valid langauge code in the book/language table, just as it will use the book table to make sure you are inserting a valid book in the book/language table.

    Ain't referential integrity great?! You just took tons of logic and condition checking out of your program and dumped it on the database manager.

    Todd
     
  4. iSee macrumors 68040

    iSee

    Joined:
    Oct 25, 2004
    #4
    I'd set it up like this:
    Code:
    Book Table
    ----------
    bookid
    title
    
    BookLanguage Table
    -------------------
    bookid
    langid
    
    The key in BookLanguage is the combination of bookid and langid.

    You have multiple entries in BookLanguage for a particular book to indicate more than one language (that is, multiple tuples with the same bookid but different langids).

    And you'd have one entry in BookLanguage for a book with one language.

    (Also, you could extend this with a Language table with langid as the key if you wanted to track additional fields relating to an entire language.)

    Edit: ah, I see toddburch beat me to it.
     
  5. Super Macho Man thread starter macrumors 6502a

    Super Macho Man

    Joined:
    Jul 24, 2006
    Location:
    Hollywood, CA
    #5
    OK, I'm reading more about this and it sounds like the join table is the way to go for a many-to-many relationship like this. So I've got the book/language join table set up (book_lang_join) with the foreign keys, but it's empty. When I insert a new book for example, do I update book.lang or book_lang_join.bookid? Will the values in the foreign key columns of book_lang_join automatically propagate to the book and language tables?

    What about the data I already have in the book.language and language.name columns? Is there a way I can populate book_lang_join automatically with an insert statement? A way to somewhat automate this...

    (This is with PostgreSQL by the way :) )
     
  6. savar macrumors 68000

    savar

    Joined:
    Jun 6, 2003
    Location:
    District of Columbia
    #6
    Bingo...this is called a junction or join table. It's not terribly space efficient, but its the best design solution for speed. (Assuming you create the correct indices on each of the three tables.)

    You have to update each table separately. book_lang_join should be the only table with RI constraints, so you insert into the book table first (presumably, the language table is pre-filled with correct metadata), then insert into join table. If you did it the other way around you'd get an error.

    I don't know about Postgre, but in Oracle you can write a stored procedure to do this process in 1 step:
    1) Get next in sequence ID for the book table.
    2) Insert a row into the book table using unique ID.
    3) Insert unique ID & language pairs into join table.
     
  7. iSee macrumors 68040

    iSee

    Joined:
    Oct 25, 2004
    #7
    It sounds like you still have a language column in book (book.lang in the quote). You shouldn't. Everything relating languages to books is in your book_lan_join table.
     
  8. Super Macho Man thread starter macrumors 6502a

    Super Macho Man

    Joined:
    Jul 24, 2006
    Location:
    Hollywood, CA
    #8
    Yeah... but is there a way I can "move" the data in book.lang to the book_lang_join table?
     
  9. toddburch macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #9
    If PostgreSQL allows a select on an insert clause (see blue below), you can populate the book/lang table with a single insert statement. Like this (as a complete example)

    Code:
    CREATE TABLE BOOK_TABLE                  
    (BOOKID   INTEGER NOT NULL,              
    AUTHOR    CHAR(20) NOT NULL ,            
    PAGES     INTEGER NOT NULL ,             
    LANG_CODE CHAR(2) NOT NULL)              
                                              
    IN DATABASE TBURCH ;                     
                                             
    CREATE TABLE BOOK_LANG                   
                                             
    (BOOKID   INTEGER NOT NULL,              
    LANG_CODE CHAR(2) NOT NULL)              
    IN DATABASE TBURCH ;                     
                                             
    INSERT INTO BOOK_TABLE VALUES            
    ( 001 , 'DR. SEUSS',  40,  'EN') ;       
                                             
    INSERT INTO BOOK_TABLE VALUES            
    ( 002 , 'DR. PHIL' ,  3 ,  'EN') ;       
                                             
    INSERT INTO BOOK_TABLE VALUES            
    ( 003 , 'DR. SPOCK', 20 ,  'FR') ;        
                                             
    INSERT INTO BOOK_TABLE VALUES            
    ( 003 , 'DR. SPOCK', 20 ,  'EN') ;       
                                             
    INSERT INTO BOOK_TABLE VALUES            
    ( 003 , 'DR. SPOCK', 20 ,  'DE') ;        
                                             
    COMMIT ;                                 
                                             
    [color=blue]INSERT INTO BOOK_LANG (BOOKID, LANG_CODE)
    SELECT BOOKID, LANG_CODE FROM BOOK_TABLE ;[/color]
                                              
    SELECT *  FROM BOOK_LANG ;   
    
    Then, perhaps with PostgreSQL, you can ALTER your table to remove the column you don't want. In DB2, you would have to drop the table and recreate (along with other hoops and loops to carry the data across).
     
  10. Super Macho Man thread starter macrumors 6502a

    Super Macho Man

    Joined:
    Jul 24, 2006
    Location:
    Hollywood, CA
    #10
    Todd - thank you very much. I finally accomplished what I was trying to do. I am getting this SQL thing... very slowly. :)
     
  11. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #11
    [classically you would have a "book" table and a "language" table and the third table relating the two. But your language" table wouod have only one column and be usless. So don't bother with it. Now you have just two. One table "book" where you list each book once and one table called book_language that contains just the book_ID and the language code. If a book is in more than one language it applers in book_languages table once per language.

    Now it is easy to querry for say, all books in "EN" and the set of languages for a given title.
     
  12. toddburch macrumors 6502a

    Joined:
    Dec 4, 2006
    Location:
    Katy, Texas
    #12
    Well, perhaps not useless. With RI definitions, it could be used by the database manager to validate language codes, ala a domain of valid values.

    It's use will be mandated by the needs of the application.

    Todd
     
  13. ChrisA macrumors G4

    Joined:
    Jan 5, 2006
    Location:
    Redondo Beach, California
    #13
    Correct. validate is useful.

    Also, One could add columns to the language table, for example to point to a national flag that could be used as an icon.

    When I design something I normally go for the "classic" implementation. So I would have used three tables. The reason for going classic is so I can understand it later or someone after me could.

    PostgreSQL has some feaures that would make this problem easy but are non-standard. For example one could use just one table for books and define one column to contain an ARRAY or language codes. Yes I know not standard not "real relational DBMS" but it would work and be a bit faster.

    But with a small database, ("small means under a million rows) three tables is fast enough
     

Share This Page