
G5Unit (macrumors 68020), Original poster, joined Apr 3, 2005
TLDR:

Current main goal: a photo library that contains only the highest-quality version of each photo I have, along with the correct metadata (capture time, GPS, and camera if possible), with all non-photos (art, memes, etc.) hidden or tagged. I have already done a lot of manual work, but going forward I need some advice.

I have been consolidating over 100,000 images from the past 25 years into my Photos.app library, and it has been a trip. The combination of my convoluted requirements and how abstracted Photos.app can be has led me to a collection of questions for the community.

Current situation:
  • Down to ~75,000 photos (after de-duping with PhotoSweeper X and Apple's new Ventura duplicate detector/metadata merger)
  • Images consist of:
    • Original RAW and JPEG files from the camera
    • Facebook downloads from various eras (some subtly more compressed than others due to Facebook’s different algorithms throughout the years)
    • Memes/image macros
    • Art
    • Videos
De-duping/Metadata/Capture Date:

While I believe I have mostly de-duped down to the highest-quality image in Photos, some of the highest-quality versions did not contain the correct capture date. My current plan is:
  1. Take all the photos from my original sources (dozens of folders outside the current Photos library)
  2. Batch resize them down to 128 (or 256) pixels on the long edge
  3. Import them into the Photos library into a specific album and tag them
  4. De-dupe/merge metadata using Apple’s new de-dupe/metadata merge feature in Ventura/iOS 16
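Step 2 can be done with the `sips` tool that ships with macOS; its `-Z` option scales an image so that neither side exceeds the given number of pixels, keeping the aspect ratio. Below is a minimal Python sketch that builds one `sips` command per image and mirrors the folder tree into a separate output folder, so the originals are never touched. The function names, the suffix list, and the folder layout are my own placeholders. I have not confirmed that `sips` always carries the EXIF capture date over to the small copy, so I'd check a few outputs with exiftool before importing thousands.

```python
import subprocess
from pathlib import Path
from typing import List

# Extensions to shrink; RAW files are deliberately left out of this sketch.
RESIZABLE = {".jpg", ".jpeg", ".png", ".heic", ".tif", ".tiff"}

def sips_resize_commands(src_dir, out_dir, long_edge: int = 256) -> List[List[str]]:
    """Build (but do not run) one `sips -Z` command per image under src_dir.

    `sips -Z N` limits both sides to N pixels while keeping the aspect ratio.
    `--out` points at a mirrored path under out_dir, so originals are untouched.
    """
    src, out = Path(src_dir), Path(out_dir)
    commands = []
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix.lower() in RESIZABLE:
            dest = out / path.relative_to(src)
            commands.append(["sips", "-Z", str(long_edge), str(path), "--out", str(dest)])
    return commands

def run_commands(commands: List[List[str]]) -> None:
    """Create each output folder first, then run sips (macOS only)."""
    for cmd in commands:
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Building the command list first lets you inspect or count it (`len(...)`) before calling `run_commands` on real folders.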
It is not very clear how the new de-dupe/merge function works, and I have often seen it pick the lower-quality image as the master. What it does seem to do consistently is keep the earliest capture date when merging metadata. So I figured that if I batch-resized the originals down to a very low resolution, it would always:
  1. Use the correct, high-quality image
  2. If the low-resolution image has an earlier capture date, use that date
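The outcome this plan is trying to force can be written down as a small rule: keep the pixels of the largest copy in a duplicate group, and take the capture date from the earliest copy. The sketch below states that desired result; it is not a description of what Photos actually does internally, and `Copy` and `merge_group` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Copy:
    """One copy of a photo within a duplicate group (hypothetical record)."""
    name: str
    width: int
    height: int
    captured: datetime

def merge_group(copies: List[Copy]) -> Copy:
    """Keep the largest copy's pixels, re-dated to the group's earliest capture date."""
    best = max(copies, key=lambda c: c.width * c.height)
    earliest = min(c.captured for c in copies)
    return Copy(best.name, best.width, best.height, earliest)
```

Writing the rule out this way makes it easy to check a merged result against what you expected for a given group.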
Apple’s duplicate finder is likely using 64x64 or 128x128 bitmaps (or histograms) to compare images, but I cannot be certain.
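To illustrate why a 128 px copy can still match a full-size original under this kind of comparison, here is a minimal average-hash ("aHash") sketch, a common perceptual-hash technique. This is not Apple's algorithm, which isn't documented. It works on a plain list of grayscale pixel rows so it needs no imaging library: the image is shrunk to an 8x8 grid by block-averaging, each cell becomes one bit (brighter than the grid mean or not), and two images are compared by counting differing bits.

```python
from typing import Sequence

def average_hash(gray: Sequence[Sequence[float]], size: int = 8) -> int:
    """Average hash of a grayscale image given as rows of pixel values."""
    h, w = len(gray), len(gray[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for by in range(size):
        for bx in range(size):
            # Average the block of source pixels that maps onto this grid cell.
            y0, y1 = by * h // size, (by + 1) * h // size
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits; 0 means the hashes are identical."""
    return bin(a ^ b).count("1")
```

Because every copy is reduced to the same tiny grid before comparing, resolution mostly drops out, which is also why visually similar but different photos can collide (the false positives mentioned further down).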

Can anyone think of a better method here?



Cleaning out the memes:

  • Detect text-based images such as screenshots/memes/documents.
    • So far, I have searched for text commonly found in non-photo images, such as “likes”, “retweet”, and “starter pack”. This has worked well for a large majority of memes, but it is limited. I have also used PhotoSweeper to find images that look like typical tweet screenshots, but this is time-consuming, clunky, and not definitive.
    • I have also used Photos.app search for finding images that:
      • Contain text
      • Don’t have camera model or lens metadata
    • The issue is that Photos.app doesn’t appear to have a way of distinguishing a real photo from a screen cap (non iPhone screenshot)/art.
  • Auto-save non-photos to a specific folder based on which app was open when the screenshot was taken.
    • Certain apps already do this (like Apollo), but this is not thorough across all of iOS/MacOS. Is there a system extension that can somehow tag which URL or application an image was taken from?
  • Find NSFW photos
  • Is there a way to see/expose which categories can be searched for via visual search?
    • I’m surprised I have not seen a master list anywhere. /Users/redacted/Dropbox/Pictures/Photos Library.photoslibrary/database/search/searchMetadata.plist doesn’t seem definitive.
    • Can we see which machine learning tags have been applied to a photo? This has not worked: https://brattoo.com/propaganda/#photostagger
    • Is there a similar way to explore photo library databases like there is for iPhone backups (ie iExplorer/iMazing)?
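The two signals described above (keyword hits in recognized text, and text present with no camera model) can be combined into one rough filter. This is a minimal sketch: `PhotoRecord` is a hypothetical per-photo summary, and how you obtain the recognized text and camera model (an export, osxphotos, or something else) is outside its scope. Plain substring matching will also flag real photos of signs that happen to contain a keyword, so treat the result as "review this", not "delete this".

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Keywords taken from the post; extend as needed.
MEME_KEYWORDS = ("likes", "retweet", "starter pack")

@dataclass
class PhotoRecord:
    """Hypothetical summary of one library item."""
    name: str
    detected_text: str = ""
    camera_model: Optional[str] = None

def likely_non_photo(p: PhotoRecord, keywords: Iterable[str] = MEME_KEYWORDS) -> bool:
    """True if the image has a meme keyword, or has text but no camera model."""
    text = p.detected_text.lower()
    has_keyword = any(k in text for k in keywords)
    text_without_camera = bool(text.strip()) and p.camera_model is None
    return has_keyword or text_without_camera
```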
Other:
  1. Detect how compressed/artifact-laden an image is, to tell whether it is ‘better’ than the duplicate I am comparing it to?
    • As mentioned above, I have many copies of photos with subtle differences in compression. It is visible when looking side by side, and I would like to keep the one with less compression. A smaller file size does not necessarily mean lower quality, though, as it really depends on how the JPEG was saved.
  2. Losslessly rotate JPEG images
    • I have only done small tests here, but it does look like Photos.app re-encodes the file on rotate, which I don’t want as this would be a slight loss in quality.
  3. AI upscale/remove artifacts of older Facebook images
    • I am experimenting with the Topaz Gigapixel AI trial to clean up artifacts on old, low-res Facebook images, but I was wondering if there is another option here?
  4. What are some of the best/most powerful photos plugins?
    • The App Store is limited in its search for these. I am already using PhotoSweeper X and Duplicate File Finder.
    • All de-dupe apps appear to be limited to 128x128 bitmaps for comparing. Are there any that go beyond this? I am getting a lot of visually similar images giving false positives.
  5. I have a few thousand Facebook photos (this was a challenge on its own; I used Album Downloader and Tagged Photo Exporter for Facebook) that unfortunately don't have their correct capture dates (they are stamped with the day I downloaded them). Thankfully, they all have their original name (formatted as 12539_159253124087251_2115393_n), so I am attempting to script a way of calling https://www.facebook.com/photo/?fbid=, downloading the page, scraping it for date info, and then adding that metadata to the photos using exif.sh.
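The non-scraping parts of item 5 can be sketched in Python: pull an ID out of the file name, build the photo URL, and build a command that writes a date into the file. Two assumptions to check first: that the middle field of names like `12539_159253124087251_2115393_n` is the `fbid` (verify against a photo you can open), and that exiftool is an acceptable stand-in for exif.sh, whose interface I don't know. exiftool's `-AllDates` sets DateTimeOriginal, CreateDate and ModifyDate together. Scraping the page itself is left out, since Facebook's markup changes and may require a login.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Assumed pattern: three numeric fields, then a one-letter size suffix.
FB_NAME = re.compile(r"^(\d+)_(\d+)_(\d+)_[a-z]$")

def fbid_from_filename(filename: str) -> Optional[str]:
    """Return the middle numeric field (assumed to be the fbid), or None."""
    m = FB_NAME.match(Path(filename).stem)
    return m.group(2) if m else None

def photo_url(fbid: str) -> str:
    return f"https://www.facebook.com/photo/?fbid={fbid}"

def exiftool_set_dates(path: str, when: datetime) -> List[str]:
    """Build (but do not run) an exiftool command that rewrites the EXIF dates."""
    stamp = when.strftime("%Y:%m:%d %H:%M:%S")
    return ["exiftool", "-overwrite_original", f"-AllDates={stamp}", path]
```

Passing the command as a list (e.g. to `subprocess.run`) avoids quoting problems with the space inside the date.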
Thank you!
 
No one is going to read this "wall of text".
You'd do better by breaking down your questions into separate posts.
 
Was considering breaking this down into multiple questions. I'll do so/may delete this original post.

Someone on another site suggested osxphotos (GitHub, documentation), which may be the solution to a lot of my questions above, albeit with a lot of scripting required. Can't believe I had never heard of it before.
 
Keeping all my photos in a library where you can't access them as files (unless importing via reference) makes me nervous. The larger it is the more problematic it might be in doing a restore given the complex Photos library structures.

If you import via reference you will always be using the original files. You can just leave them in their original folders and then create albums in Photos as needed. The metadata in the files therefore is not touched. The downside is that you lose iCloud photo services.

I use Lightroom since iCloud wouldn't work, as the library is too large (~200K photos, ~4.5 TB). All management is done via the filesystem, so you have access to the files and can rename them, move them, etc. Unfortunately, you have to do this inside Lightroom; otherwise, the pointers get messed up. You could always delete the photos from the library (not from disk) and re-import them, but you would then lose all of the metadata not stored in the original file.

No one is going to read this "wall of text".

Read it. You can compare files in Lightroom and check their file sizes, dimensions, etc. to see which is the better image, but only on a file-by-file basis. Otherwise, there are too many issues in your post to address. I can't say how many of them Lightroom could handle, but Lightroom has tons of add-ons for special tasks, many more than Photos, I would expect. It's much more flexible.
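One signal beyond file size and dimensions is the JPEG quantization tables stored in each file's header (the DQT segments): larger values mean coarser quantization, i.e. more aggressive compression when that file was last saved. Below is a minimal stdlib-only sketch that walks the JPEG markers up to the start of the image data and averages the table values; `jpeg_quant_tables` and `mean_quant` are my own names. Caveat: this measures the settings of the last save, not accumulated damage, so a heavily compressed old Facebook image re-saved at high quality will look "better" by this measure than it really is. Treat it as one input next to dimensions and a side-by-side look.

```python
from typing import List

def jpeg_quant_tables(data: bytes) -> List[List[int]]:
    """Return the quantization tables (DQT segments) found before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables, i = [], 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers with no length field
            i += 2
            continue
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        length = int.from_bytes(data[i + 2:i + 4], "big")  # includes its own 2 bytes
        if marker == 0xDB:
            j, end = i + 4, i + 2 + length
            while j < end:
                size = 2 if data[j] >> 4 else 1  # high nibble: 0 = 8-bit, 1 = 16-bit
                j += 1
                tables.append([int.from_bytes(data[j + k * size:j + (k + 1) * size], "big")
                               for k in range(64)])
                j += 64 * size
        i += 2 + length
    return tables

def mean_quant(data: bytes) -> float:
    """Average quantization value; lower suggests lighter compression."""
    values = [v for table in jpeg_quant_tables(data) for v in table]
    if not values:
        raise ValueError("no quantization tables found")
    return sum(values) / len(values)
```

Usage would be `mean_quant(Path("copy_a.jpg").read_bytes())` for each copy of a duplicate pair.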
 
You could try using something like Power Photos. I haven’t personally tried it yet but I’ve heard a lot of good things, and I think it’s more powerful and gives you more control over how to dedupe. It’s expensive but worth it for a monumental task like this.

I have a similar problem OP. Back in the days of pre-iCloud when I was on PC I sloppily “backed up” all of my photos to the computer using the windows import tool which doesn’t recognize duplicate files so I have tons of duplicate photo backups going back a decade, plus memes, album artwork, screenshots and low-res stuff from Facebook and instagram. It’s been a nightmare to organize.

I’m disappointed to learn that the AI in Ventura/iOS 16 doesn’t always grab the highest quality version of a photo. But that Topaz App looks intriguing.

I’m probably going to do a combo of Power Photos and the Ventura AI to dedupe my own library when I finally work up the courage to take it on. 😅

Edit: added link to the Fatcat Software website for PP
 
So in my experience so far with iOS 16 (I reluctantly upgraded to take advantage of this feature; so far so good, no bugs), the duplicates album always uses the highest-resolution copy as the master when merging, but sometimes uses the newest metadata. Specifically, it seems to want to keep the copy with the most recent date. So it works very well if your duplicate photos have similar metadata, but you’ll have to be careful if you’re trying to keep the oldest copy of an image. YMMV of course; that’s just how it’s worked for me so far.
 
Ventura Photos.app Duplicates Finder:

Somewhat related, but I believe I have found a solution for triggering the duplicate photo analysis:

1. Close the Photos app
2. Shut down your computer (make sure it won't reopen apps on startup)
3. Turn the computer back on
4. Log in, but don't open any apps
5. Let the computer sit overnight (make sure it doesn't go to sleep)

Doing this, I went from 275 duplicates found to 12,214 found.

You can also see that photolibraryd, photoanalysisd, and cloudphotod have used a considerable amount of CPU during this time.
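If you want a record of that background work rather than checking Activity Monitor, here is a minimal sketch that reads `ps` output and totals %CPU for the three daemons named above. `daemon_cpu` and `sample` are hypothetical names; how you schedule `sample()` (for example over SSH from another machine, so nothing is opened locally) is an assumption left to you. The two `-o` options are passed separately so that each column header is suppressed with `=`.

```python
import subprocess
from typing import Dict, Iterable

DAEMONS = ("photolibraryd", "photoanalysisd", "cloudphotod")

def daemon_cpu(ps_output: str, names: Iterable[str] = DAEMONS) -> Dict[str, float]:
    """Sum %CPU per daemon from `ps -A -o %cpu= -o comm=` output."""
    wanted = set(names)
    totals = {n: 0.0 for n in wanted}
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        cpu, command = parts
        base = command.rsplit("/", 1)[-1]  # comm is a full path on macOS
        if base in wanted:
            totals[base] += float(cpu)
    return totals

def sample() -> Dict[str, float]:
    """Take one snapshot of the daemons' CPU usage."""
    out = subprocess.run(["ps", "-A", "-o", "%cpu=", "-o", "comm="],
                         capture_output=True, text=True, check=True).stdout
    return daemon_cpu(out)
```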



Not sure if this will work for everyone, but I tested this less rigorously before, and it yielded similar results.
 
I have seen the same results in regards to it wanting to keep the newest metadata. Here is what I plan on testing today:

  1. If a photo is set as a favorite, will it be selected as the actual photo that will be retained? (somewhat confirmed)
  2. If a photo is set as a favorite, will all of its metadata be considered better, and all of that will be kept?
  3. We can already see that newer versions of photos are prioritized. What happens if a photo is imported later, then backdated to an earlier date? Which date does it go by:
    1. Import Date?
    2. Captured Date?
    3. Original Date?
    4. Modified/Adjusted Date?
    5. Any of the various metadata fields that a program like exiftool can extract?
  4. If there is face data in the non-selected photo, and no face data in the selected photo, will Photos transfer the face data from one photo to the other?
Example of fields for a recently-taken HEIF file (found using exiftool):

Code:
File Modification Date/Time     : 2022:09:15 08:50:12-04:00
File Access Date/Time           : 2022:09:18 12:35:21-04:00
File Inode Change Date/Time     : 2022:09:18 12:35:20-04:00
Modify Date                     : 2022:09:14 17:56:40
Date/Time Original              : 2022:09:14 17:56:40
Create Date                     : 2022:09:14 17:56:40
Create Date                     : 2022:09:14 17:56:40.826-04:00
Date/Time Original              : 2022:09:14 17:56:40.826-04:00
Modify Date                     : 2022:09:14 17:56:40-04:00
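For question 3, one way to see which of these fields a merge kept is to dump them with exiftool before and after, and compare. A minimal sketch that parses the plain `exiftool <file>` output shown above, skips the filesystem (`File ...`) timestamps, and finds the earliest embedded date. Values without an offset (like the first `Modify Date`) are given `default_tz`, which is an assumption you supply (normally the camera's local time zone); the function names are my own.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

DATE_RE = re.compile(
    r"^(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})(\.\d+)?([+-]\d{2}:\d{2}|Z)?$")

def parse_exiftool_dates(text: str, default_tz: timezone,
                         skip_prefix: str = "File ") -> List[Tuple[str, datetime]]:
    """Return (tag, aware datetime) pairs for every date-like value."""
    results = []
    for line in text.splitlines():
        tag, sep, value = line.partition(":")  # tag names contain no colons
        tag = tag.strip()
        if not sep or tag.startswith(skip_prefix):
            continue
        m = DATE_RE.match(value.strip())
        if not m:
            continue
        y, mo, d, hh, mi, ss = (int(g) for g in m.groups()[:6])
        frac, offset = m.group(7), m.group(8)
        micro = int((frac[1:] + "000000")[:6]) if frac else 0
        if offset == "Z":
            tz = timezone.utc
        elif offset:
            sign = -1 if offset[0] == "-" else 1
            tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
        else:
            tz = default_tz
        results.append((tag, datetime(y, mo, d, hh, mi, ss, micro, tzinfo=tz)))
    return results

def earliest(text: str, default_tz: timezone) -> Optional[Tuple[str, datetime]]:
    dates = parse_exiftool_dates(text, default_tz)
    return min(dates, key=lambda item: item[1]) if dates else None
```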
 
Power Photos seems to have advantages and disadvantages. It gives a lot more control over what to decide is a duplicate or not, but I discovered an issue with HDR photos I posted about here.
 
Replying to G5Unit’s four test questions above:
1. Seems to be my experience too.
2. I’m not sure; I think Power Photos handles this much more reliably.
3. I think it’s using the capture date, but I’ll have to test this.
4. I’m pretty certain it would.
 
Importing by reference or using Lightroom is probably a good idea if you’re extra paranoid, but I can’t get past the convenience of iCloud Photos. It’s just so nice having everything synced in nearly real time, and if you’re using set-and-forget systems like Time Machine or CCC, it’s all backed up as well.

I suppose another alternative I’ve considered (because I understand the apprehension of trusting all of my pictures to a container library like Apple Photos) is to periodically export your library to a large external drive. You can keep things organized by date, and as long as you choose the option to export IPTC as XMP (which writes .xmp sidecar files), most metadata should be retained.
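After such an export, a quick sanity check is to confirm that every exported image actually got its sidecar. This minimal sketch assumes the sidecar shares the image's base name (e.g. `IMG_0001.HEIC` next to `IMG_0001.xmp`), which is worth confirming on a small test export first; `missing_sidecars` and the suffix list are my own placeholders.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".heic", ".png", ".tif", ".tiff",
                  ".dng", ".cr2", ".nef", ".arw"}

def missing_sidecars(export_dir) -> List[Path]:
    """List exported images that have no same-named .xmp file next to them."""
    missing = []
    for path in sorted(Path(export_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            if not (path.with_suffix(".xmp").exists() or path.with_suffix(".XMP").exists()):
                missing.append(path)
    return missing
```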
 
How’s the thread starter getting on? I’ve just moved to iOS 16, but my Mac is on Big Sur… I’m not sure if it’s related to the upgrade, but I’ve just noticed my library is a mess: dates are all wrong, and there are lots of duplicates plus missing photos.

I keep another library in Lightroom CC which is still in perfect order, so I can’t help but wonder if my iCloud Photos library has lost its way.

I have 40k+ photos, so it’s impossible to find out what’s gone missing. I’m half tempted to delete them all and start adding them again into Photos on the Mac…

Only problem with that is it takes forever to upload so it will be a very slow process 😭
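Before deleting anything, a rough first pass at "what's gone missing" is to compare file names between the library that's in order (Lightroom CC) and a listing or export of the iCloud library. This minimal sketch compares lower-cased names without extensions, so `IMG_1234.HEIC` matches `img_1234.jpeg`. Camera file names repeat (IMG_0001 comes around again every 10,000 shots), so matches are only a hint; pairing names with capture dates would be stricter. The function names are hypothetical.

```python
from pathlib import PurePath
from typing import Iterable, Set

def normalized(names: Iterable[str]) -> Set[str]:
    """Lower-cased file names without extension."""
    return {PurePath(n).stem.lower() for n in names}

def missing_from(reference: Iterable[str], candidate: Iterable[str]) -> Set[str]:
    """Names present in the reference library but absent from the candidate."""
    return normalized(reference) - normalized(candidate)
```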
 
I'm still here and have a lot to report on (more on using different tools and how to download and tag Facebook photos). Unfortunately, due to time constraints, I don't yet have a conclusive answer on how the duplicate processing/merging works.

Chances are that RhetTbull (who created osxphotos) will reverse engineer this duplicate processor before I do.

Expect more info on my experience/suggestions sometime this week.
 