Hey SIRI: haven’t I seen that image before?
The way the AI engineers prevent that problem is actually very simple. They call it "regularization," which is a terribly non-descriptive term.
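One common regularization trick is "weight decay": add a penalty for large parameter values so the model cannot afford to memorize fine detail. Here is a minimal sketch in Python; the function name and numbers are my own illustration, not anyone's actual training code:

```python
import numpy as np

def loss_with_weight_decay(params, data_loss, lam=1e-4):
    """Total loss = fit-to-the-data term + penalty on parameter size.

    The L2 penalty ("weight decay") pushes every parameter toward zero,
    so the model keeps broad trends instead of memorizing individual
    training examples.
    """
    return data_loss + lam * np.sum(params ** 2)

# Same data loss, but bigger parameters cost more:
print(loss_with_weight_decay(np.array([0.1, -0.2]), data_loss=1.0))
print(loss_with_weight_decay(np.array([10.0, -20.0]), data_loss=1.0))
```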
What they really do is use a VERY "lossy" kind of compression. They collect a trillion "tons" of data and store the parameters in a billion-ton box. At first you think, "Well, that means they are saving space." But no. More importantly, the huge database of input data is GONE, GONE, GONE, so there is no chance of the AI spitting out exact copies of the input data. The input has been compressed away.
Compression always removes redundancy and saves the "essence" of the input, not an exact copy. JPEG photos and MP3 audio are compressed only about 10:1, so the stored data still looks or sounds like the input data. But the LLM is doing thousands-to-one compression, so what gets saved are general rules, trends, and ideas.
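To put a rough number on "thousands to one," here is a back-of-envelope calculation in Python. The figures are illustrative assumptions about the scale of a modern LLM, not stats from any particular model:

```python
# Illustrative scale of a modern LLM training run (assumed numbers).
training_tokens = 15e12      # ~15 trillion tokens of text
bytes_per_token = 4          # ~4 bytes of raw text per token
parameters = 8e9             # ~8 billion parameters
bytes_per_parameter = 2      # 16-bit weights

input_bytes = training_tokens * bytes_per_token   # ~60 TB of text
stored_bytes = parameters * bytes_per_parameter   # ~16 GB of weights

print(f"compression ratio ~ {input_bytes / stored_bytes:,.0f}:1")  # ~3,750:1
```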
AI engineers don't really think in terms of compression; they are just descending a gradient. But what is really happening is a search for a way to keep the most relevant 0.01 percent of the input data.
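For the curious, here is "descending a gradient" in miniature: a toy Python sketch that fits a single parameter to 1,000 noisy points. Everything here is made up for illustration, but it shows how the search keeps the shared rule and discards the individual points:

```python
import numpy as np

# Toy data: 1,000 noisy points that all follow one rule, y = 3x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
y = 3 * x + rng.normal(0, 0.1, 1000)

# One parameter "compresses" the 1,000 points down to the trend they share.
w, lr = 0.0, 0.1
for _ in range(100):
    grad = np.mean(2 * (w * x - y) * x)  # gradient of mean squared error
    w -= lr * grad                       # step downhill

print(w)  # ~3.0: the rule survives; the individual points do not
```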
In fact, the AI engineers have a word for it when the parameters are enough to capture all of the input. They call it "overfitting," and the textbooks are full of ways to prevent it. One popular trick, "dropout," involves randomly throwing away pieces of the network while it trains.
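Here is a minimal sketch of dropout, one of those textbook tricks, in plain Python; the rate and array sizes are arbitrary:

```python
import numpy as np

def dropout(activations, rate=0.5, rng=np.random.default_rng()):
    """Randomly zero out a fraction of the units during training.

    Because any unit can vanish at any moment, no single unit can be
    trusted to memorize one training example; knowledge gets spread
    across many units as general rules.
    """
    mask = rng.random(activations.shape) >= rate
    # Scale the survivors so the expected total activation is unchanged.
    return activations * mask / (1.0 - rate)

print(dropout(np.ones(10)))  # about half zeroed, the rest scaled up to 2.0
```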
So you should expect the images to look very much like the ones it was trained on, but only because each image is a composite of many of them. I'd expect a strong semantic and stylistic similarity, but never a copy. Copies should be impossible.
Think of an artist who studied in Europe. His work might be in a style a little like ones you have seen before, his subjects might be things you have seen before, and his colors might be ones you have seen before, but each work would be new. This is what AIs will do. Don't expect radically new innovations, but also don't expect copies.