1) Any insight as to why this is the case?
Why what is the case?
2) Is this a problem for you in practice?
Not at all.
Either your device's capacity exceeds your entire library (so you can take it all at once regardless) OR you're only taking a portion of your library at once, in which case you won't be taking 21GB of songs AND exactly those albums which make up 21GB of duplicates.
My library is split between my internal drive and an external drive. I have about 12GB of duplicates on the internal drive.
3) Do you think your scenario is common enough to warrant restructuring iTunes the way LittleReg1 wants?
No, LittleReg1's proposal is not practical, for the many reasons already stated. Like dXTC, I've spent many years designing, building, maintaining and fixing databases, and the proposal isn't practical. It's much cleaner to have the metadata stored in the files.

I'm fine with the fact that I have duplicates. You have to remember that if the Artist and song Name match, iTunes considers it a duplicate. Not all of those are true duplicates, for the reasons already stated.
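
To illustrate the matching rule described above, here is a minimal sketch in Python. It is not how iTunes is actually implemented; the function name, the sample records, and the choice to ignore case and surrounding spaces are all assumptions made for illustration.

```python
from collections import defaultdict

def find_name_artist_duplicates(tracks):
    """Group tracks whose (artist, name) pair matches.

    Assumption: matching ignores case and surrounding spaces.
    The album field is deliberately not part of the key.
    """
    groups = defaultdict(list)
    for track in tracks:
        key = (track["artist"].strip().lower(), track["name"].strip().lower())
        groups[key].append(track)
    # Keep only the keys that occur more than once.
    return {key: items for key, items in groups.items() if len(items) > 1}

# Hypothetical sample library; the album labels are placeholders.
library = [
    {"artist": "Eric Clapton", "name": "Cocaine", "album": "Studio album"},
    {"artist": "Eric Clapton", "name": "Cocaine", "album": "Live album"},
    {"artist": "The Beatles", "name": "All You Need Is Love", "album": "Album A"},
]
duplicates = find_name_artist_duplicates(library)
```

Because the album is not part of the key, a studio cut and a live cut of the same song are both flagged, which is why not every reported duplicate is a true duplicate.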
 
Why what is the case?

Why 12% of your library is duplicates. It seems high to me, so I'm curious.

No, LittleReg1's proposal is not practical, for the many reasons already stated. Like dXTC, I've spent many years designing, building, maintaining and fixing databases, and the proposal isn't practical. It's much cleaner to have the metadata stored in the files.

:confused: yeah, I've been agreeing with you? When you posted the size of your duplicates I assumed it was to contradict my point that the size of duplicate files is inconsequential under the circumstances of the majority of iTunes users.
 
Why 12% of your library is duplicates. It seems high to me, so I'm curious.
While most duplicates are a single song appearing on two albums, many times the same song appears on multiple albums. For example:
  • "Baby, What You Want Me To Do" by Elvis appears 5 times on the same album.
  • "All You Need Is Love" by The Beatles appears on 4 albums.
  • "Can't Fight The Moonlight" by LeAnn Rimes appears on 5 albums.
  • "Change It" by Stevie Ray Vaughan & Double Trouble appears on 4 albums.
  • "Couldn't Stand The Weather" by SRV appears on 5 albums.
  • "Cocaine" by Eric Clapton appears on 4 albums.
  • "Put A Lid On It" by Squirrel Nut Zippers appears on 3 albums.
When you posted the size of your duplicates I assumed it was to contradict my point that the size of duplicate files is inconsequential under the circumstances of the majority of iTunes users.
No, it wasn't to contradict; only to give some real numbers.
 
I'm not sure if we're talking about the same thing (and I don't know how to post images to this forum), but I can provide one example. I have a tune in my database called "If Everyone Cared" by Nickelback. The metadata field for the artist as it shows in my Windows Explorer is spelled 'Nickleback,' but in iTunes I've corrected it to 'Nickelback.' In the past, when I've somehow lost my iTunes database or it has become corrupted, when I've had to reimport the songs from my storage device, most of my original edits (corrections, additions, etc.) were lost and I had to re-enter them.

As far as the space that duplicates take up being 'trivial,' Apple fanatics should be the last people to dismiss this issue. Apple won't provide any reasonable storage capacity for its gadgets (with the exception of the soon-to-be-extinct iPod Classic), nor the ability to even hook up an external storage device, so each gigabyte of data is precious. I noted a previous poster (GGJ...) here breaking down the amount of space and duplicates, and my case is similar. I have nearly 20GB of duplicates - that's more space required than the entire flash memory of most of Apple's handhelds.

As far as metadata being 'God,' this is quite a stretch of faith. Metadata is far from a perfect system, especially across platforms. File metadata developers had high hopes for this technology in the beginning (especially for professional photographers), but it has dissipated into a sea of non-standardization and lack of attention to the requirements of each medium.

The field labels are not standardized, resulting in wild variances of categorization. For example, I imported a 3-volume CD into iTunes of a Nitty Gritty Dirt Band set, and because of the differences, iTunes created a different 'artist' for each and every cut! Do you have any idea what it was like to sort all that out?? 'Compilations,' 'Collections,' 'Various Artists,' 'Contributing Artists,' 'Songwriter vs. Composer'...it's a huge mess, and any product that would build its interface upon it is woefully idealistic.

Then there's the lack of support from iTunes even if we DO rely upon metadata. For example, I imported my entire catalog of Jethro Tull albums (21 in all) from a single artist folder (instead of one album at a time). Well, iTunes simply chose to ignore over 50 songs because (as I read in another blog) it "couldn't figure out the metadata" even though when I went into Windows and looked, at least the Song Title, Album, Year, and Artist were there for each and every one. Now, I have to go back through each import (this happened on other collections, as well) and manually enter the cuts and map them to the files on my hard drive.

Computer technology is supposed to serve humans, not the other way around. The way we have created, stored, and accessed record albums for decades is crucial to the music experience, especially from a historical perspective. I don't have any of the problems listed by some of the posters here figuring out which files are indeed real duplicates and which are variations of a particular tune. This is actually part of the point: it is important to know which are which and believe that our computer databases will help us in this endeavor. That programmers can't simply allow multiple reference pointers in a database interface to a single file source makes no sense.
 
I have a tune in my database called "If Everyone Cared" by Nickelback. The metadata field for the artist as it shows in my Windows Explorer is spelled 'Nickleback,' but in iTunes I've corrected it to 'Nickelback.' In the past, when I've somehow lost my iTunes database or it has become corrupted, when I've had to reimport the songs from my storage device, most of my original edits (corrections, additions, etc.) were lost and I had to re-enter them.

If what records/contains your edits becomes corrupted, then your edits will be corrupted - this sounds like what you are saying. It's an insurmountable problem, because any solution will still have some place that records those edits. When that place is corrupted, you're out of luck. There are ways to back up your iTunes library data, AFAIK.

As far as the space that duplicates take up being 'trivial,' Apple fanatics should be the last people to dismiss this issue. Apple won't provide any reasonable storage capacity for its gadgets (with the exception of the soon-to-be-extinct iPod Classic), nor the ability to even hook up an external storage device, so each gigabyte of data is precious. I noted a previous poster (GGJ...) here breaking down the amount of space and duplicates, and my case is similar. I have nearly 20GB of duplicates - that's more space required than the entire flash memory of most of Apple's handhelds.

The definition of what counts as 'reasonable' when it comes to storage capacity is wildly variable so it's silly to fault Apple for that.

If you have 20GB of duplicates, what percentage of your library is this? If it's 10%, then you have a 200GB library, so even if you got rid of those duplicates you wouldn't be able to put your whole library on a device.

If it's 50% (highly doubtful) then you have a 40GB music library and you can still get your whole library on an iPhone or an iPod Touch. If you got rid of those duplicates you could put your whole library on cheaper devices, but still not get it on a Nano or Shuffle.

The only time it would be worth it to reduce storage through duplicates is if all of the following are satisfied: (1) someone has to take all their music with them all the time (or all their duplicates, for some reason); (2) their iTunes library is too big for the device they want to use; (3) the removal of duplicates would make their library small enough to put on their device; and (4) they are reasonably sure their library won't get any bigger.

So this is like someone who has a 9GB library, 2GB of duplicates (22% of a relatively small library - what kind of library is this?), and has to get it on an 8GB Nano.

Or someone who has a 35GB library, has over 10% of duplicates, has to take it everywhere, doesn't want to pay for a 64GB iPod Touch or a Classic, and will never need any more storage at all.
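
The four conditions above can be written down as a quick check. This is only a sketch of the reasoning in this post: the function and parameter names are made up, and conditions (1) and (4) are judgment calls that the caller has to supply as plain yes/no values.

```python
def dedup_worth_it(library_gb, duplicates_gb, device_gb,
                   needs_whole_library, library_will_grow):
    """Return True only when all four conditions hold."""
    # (2) the library is too big for the device as it stands
    too_big_now = library_gb > device_gb
    # (3) removing the duplicates would make it fit
    fits_after = (library_gb - duplicates_gb) <= device_gb
    # (1) and (4) are supplied by the caller
    return (needs_whole_library and too_big_now
            and fits_after and not library_will_grow)

# The 9GB library with 2GB of duplicates and an 8GB Nano:
nano_case = dedup_worth_it(9, 2, 8, True, False)

# A 200GB library with 20GB of duplicates on a 64GB device:
big_library_case = dedup_worth_it(200, 20, 64, True, False)
```

The first case passes only because the library sits just above the device's capacity; the second fails on condition (3), since 180GB still doesn't fit.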

I wager the population with these needs is so small that there is no incentive for Apple to restructure how iTunes works as a database.

The field labels are not standardized, resulting in wild variances of categorization. For example, I imported a 3-volume CD into iTunes of a Nitty Gritty Dirt Band set, and because of the differences, iTunes created a different 'artist' for each and every cut! Do you have any idea what it was like to sort all that out?? 'Compilations,' 'Collections,' 'Various Artists,' 'Contributing Artists,' 'Songwriter vs. Composer'...it's a huge mess, and any product that would build its interface upon it is woefully idealistic.

It's easy to edit fields for multiple tracks at once in iTunes. I do it all the time. A 3-volume CD of one artist would take maybe 10 minutes to label and be done for all time. I don't understand the problem, unless you are using multiple music management programs with all kinds of different fields. In this case, it is possible that iTunes is not the right product for you.

Then there's the lack of support from iTunes even if we DO rely upon metadata. For example, I imported my entire catalog of Jethro Tull albums (21 in all) from a single artist folder (instead of one album at a time). Well, iTunes simply chose to ignore over 50 songs because (as I read in another blog) it "couldn't figure out the metadata" even though when I went into Windows and looked, at least the Song Title, Album, Year, and Artist were there for each and every one. Now, I have to go back through each import (this happened on other collections, as well) and manually enter the cuts and map them to the files on my hard drive.

What? I'm curious how this happened, how you tried to import this music, etc. I just dump music into a folder that automatically imports it into iTunes, and then I can edit the fields within iTunes en masse.

The way we have created, stored, and accessed record albums for decades is crucial to the music experience, especially from a historical perspective. I don't have any of the problems listed by some of the posters here figuring out which files are indeed real duplicates and which are variations of a particular tune. This is actually part of the point: it is important to know which are which and believe that our computer databases will help us in this endeavor. That programmers can't simply allow multiple reference pointers in a database interface to a single file source makes no sense.

If I have a track on two vinyl records, and this is 'crucial to the music experience', then it's equally crucial to have the track twice in iTunes. I still don't see how altering the database design reveals this deep commitment to past practices of music consumption. If your covering principles are historical fidelity and mimicking how LPs have been created, stored, and accessed, then trying to reduce a library's duplicates through pointers should be a pretty low priority.

'Variations' of a tune are not immediately obvious nor even on extended listening. Unless you go talk to the people responsible for recording and producing specific albums you have absolutely no idea which tracks are 'real' duplicates.

'Multiple reference pointers to a single file source' in this context has too many compromises and the problem it solves is not worth solving. Actual cases in the world experiencing this problem are a bizarre and isolated confluence of conditions.
 
The metadata field for the artist as it shows in my Windows Explorer is spelled 'Nickleback,' but in iTunes I've corrected it to 'Nickelback.'
That's not the case on the Mac, and metadata that I change in iTunes on the Mac follows the file if I copy it to a Windows system. Perhaps it's Windows not permitting the iTunes changes to be reflected in the file, but I'll test that later with a Windows system.
Apple won't provide any reasonable storage capacity for its gadgets (with the exception of the soon-to-be-extinct iPod Classic), nor the ability to even hook up an external storage device, so each gigabyte of data is precious.
Where are you getting this? Of course you can always connect external storage. And if you're referring to iPods or iPhones, you don't have to sync your entire library to those devices. You can sync selected portions to them. There is not a requirement that your entire iTunes library fits on one device.
I have nearly 20GB of duplicates - that's more space required than the entire flash memory of most of Apple's handhelds.
That's absolutely false. The iPod Shuffle and Nano, yes, but the iPod Classic, iPod Touch, 2 of the 3 iPhone 4S models, 2 of the 3 iPad models, and of course all Mac notebooks have greater capacity than 20GB.
The field labels are not standardized, resulting in wild variances of categorization.
All the fields that are editable in iTunes are standard fields, with no variation.
For example, I imported a 3-volume CD into iTunes of a Nitty Gritty Dirt Band set, and because of the differences, iTunes created a different 'artist' for each and every cut!
What exactly were the differences?
Well, iTunes simply chose to ignore over 50 songs because (as I read in another blog) it "couldn't figure out the metadata" even though when I went into Windows and looked, at least the Song Title, Album, Year, and Artist were there for each and every one. Now, I have to go back through each import (this happened on other collections, as well) and manually enter the cuts and map them to the files on my hard drive.
If the metadata is corrupt or unreadable or missing, of course iTunes can't guess that. However, when that's happened to me when ripping a CD, I use several scripts that can automatically populate metadata in iTunes from information in file names, and vice versa. I can also make mass changes in seconds. Doug's AppleScripts for iTunes is a great resource for managing your iTunes tags and files.
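
For anyone wondering what "populating metadata from file names" amounts to, here is a rough sketch in Python. It is not one of Doug's scripts (those are AppleScripts that talk to iTunes directly), and it does not write anything to iTunes or to the file. The "NN - Artist - Title" naming pattern, the function name, and the sample file name are all assumptions for illustration.

```python
import re
from pathlib import PurePath

# Assumed naming pattern: "03 - Example Artist - Example Song.mp3"
PATTERN = re.compile(
    r"^(?P<track>\d+)\s*-\s*(?P<artist>.+?)\s*-\s*(?P<name>.+)$"
)

def tags_from_filename(filename):
    """Return a dict of tags parsed from the file name, or None if it doesn't match."""
    stem = PurePath(filename).stem  # drop the extension
    match = PATTERN.match(stem)
    if match is None:
        return None
    tags = match.groupdict()
    tags["track"] = int(tags["track"])  # "03" becomes 3
    return tags

tags = tags_from_filename("03 - Example Artist - Example Song.mp3")
unmatched = tags_from_filename("untitled.mp3")
```

File names that don't follow the assumed pattern come back as None, so those tracks would still need to be tagged by hand.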
The way we have created, stored, and accessed record albums for decades is crucial to the music experience, especially from a historical perspective.

That programmers can't simply allow multiple reference pointers in a database interface to a single file source makes no sense.
You contradict yourself in these two statements. If you're concerned about "historical perspective", you'll want each album to be complete, with all the songs exactly as they were recorded or remastered for that particular album. The concept of pointing multiple albums to a single recording of a song diminishes that historical integrity. As already stated, it's absolutely not practical. You would need 5 Track Number fields if a song appears on 5 different albums. But what if it appears on 6? It's simply not a good idea.

If you're concerned with the storage taken by duplicates, store the dupes on a different drive and only take one copy of each song with you. Or, like I do, store your most-listened-to music on your internal drive or iPod/iPhone, and keep the music you listen to less frequently on another storage device. It's a very workable solution that doesn't require reinventing the wheel and turning it into a hexagon.
 