I realize this is speculative as we don't know for sure. But let's assume the 3.5x optical zoom (alongside 48mp) is true. To a camera layman, would that be better or worse than the straight 5x on the 16 Pro at 12mp?
Short answer: similar, assuming the new phone gets a moderately bigger sensor (which seems likely to me).
Longer answer:
It’s not trivial to compare, because sensor size matters more than pixel count, and we don’t yet know what’s going on with the sensor size. The new camera bar may be a response to using a larger tele sensor (requiring a larger tetraprism lens that would no longer fit in the old camera bump).
A larger sensor produces a cleaner (less noisy, more detailed, more croppable) image.
But just taking the 3.5× zoom with 48 megapixels: if you crop the image to 5× equivalent like the 16 Pro, you throw away approximately half the pixels, which leaves you at around 24 megapixels. Clearly that’s still higher than the 16 Pro’s 12 megapixels, but the 48 million pixels are in a different layout (quad-Bayer) than on the old phone, which compromises their detail rendition. A 24-megapixel quad-Bayer sensor should still produce slightly more detailed images than a 12-megapixel sensor, if similarly sized. But of course when you crop from 3.5× to 5× you throw away not only half the pixels but also half the sensor area (light captured).
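For anyone who wants to check the crop arithmetic, here is a rough sketch in Python. The function name is just something I made up for illustration. It only assumes that cropping from one zoom level to another keeps a fraction of the frame equal to the square of the zoom ratio, and that the rumoured 3.5× and 48 MP figures are correct.

```python
def crop_fraction(native_zoom: float, target_zoom: float) -> float:
    """Fraction of the frame (pixels and sensor area) kept after cropping.

    Cropping scales the frame's width and height by native/target,
    so the kept area is that ratio squared.
    """
    if target_zoom < native_zoom:
        raise ValueError("target zoom must be at least the native zoom")
    return (native_zoom / target_zoom) ** 2


# Rumoured tele camera: 3.5x optical, 48 MP, cropped to the 16 Pro's 5x.
kept = crop_fraction(3.5, 5.0)
cropped_megapixels = 48 * kept

print(f"fraction of frame kept: {kept:.2f}")
print(f"megapixels after crop: {cropped_megapixels:.1f}")
```

That works out to about 49% of the frame, or roughly 23.5 megapixels, which is where the "approximately half" and "around 24 megapixels" figures come from.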
So if the new sensor is about twice the area of the old one, and the lens f-numbers are similar, the new tele camera will do better in every way at a digitally cropped 5× than the 16 Pro with its optical 5×. Cleaner, more detailed, more background blur.
If the sensor is only about 50% bigger, image detail would be a wash: sometimes the 16 Pro would win, sometimes the 17 Pro.
If the sensor doesn’t get much larger at all, the old 16 Pro will show more detail in most cases despite a pixel-count advantage to the 17 Pro – the only exception being high-contrast detail in very bright light, e.g. a printed page in sunlight.
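To put rough numbers on those three scenarios, here is a small Python sketch. It assumes the same lens f-number on both phones and treats "2×", "1.5×" and "1×" the old sensor area as purely hypothetical cases, since the real sensor size isn't known. The function name is my own invention.

```python
def used_area_ratio(sensor_area_ratio: float,
                    native_zoom: float = 3.5,
                    target_zoom: float = 5.0) -> float:
    """Sensor area actually used at the cropped zoom, relative to the
    old phone's full tele sensor (1.0 means the same area)."""
    kept = (native_zoom / target_zoom) ** 2  # fraction of frame left after the crop
    return sensor_area_ratio * kept


# Hypothetical new-sensor sizes, as multiples of the old sensor's area.
for label, ratio in [("twice the area", 2.0),
                     ("50% bigger", 1.5),
                     ("same size", 1.0)]:
    print(f"{label}: {used_area_ratio(ratio):.1%} of the old sensor area used at 5x")
```

Under those assumptions, a sensor with twice the area ends up using about 98% of the old sensor's area at a cropped 5×, a 50% bigger one about 73.5%, and a same-size one about 49%.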
My own guess is that Apple will have targeted at least equal 5× (and beyond) performance, to satisfy customers who use the tele for max ‘reach’, e.g. to capture a plane at an airshow, your dog in the garden, or your kids at 20–50 yards. So I see regression as very unlikely.