That means during the capture window, less information is being captured per pixel, resulting in a very coarse discretisation of data which can only be resolved through software afterwards. That means the higher the pixel count, the more we are at the mercy of computational photography and its perception of what was seen, rather than the actual data captured.
Nah that's not how it works. More pixels in the same sensor area result in *finer* discretisation of the captured data. If you have a 48MP shot, you can reproduce the image you'd have gotten with a 12MP sensor exactly, just by adding the intensity values of the subpixels in groups of 4 (which is what Apple has called 'Quad Pixels').
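If you want to see what that binning amounts to, here's a rough numpy sketch (the array contents are made up; the point is just the 2x2 sum):

```
import numpy as np

# Toy 'high-res' frame: an 8x8 grid of subpixel photon counts (made-up values).
rng = np.random.default_rng(0)
hi_res = rng.poisson(lam=30, size=(8, 8))

# 2x2 binning: sum each group of 4 subpixels. The result is exactly the image
# a sensor with 4x bigger pixels would have recorded over the same area.
h, w = hi_res.shape
binned = hi_res.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

print(hi_res.shape, "->", binned.shape)  # (8, 8) -> (4, 4)
```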
The photodiodes of the CMOS sensor in cameras work by 'translating' the number of photons received during a given time frame into an electrical signal. If you have a single 'big' pixel in a given area of the sensor, let's say a 2.44x2.44µm region, all the info you'd get would be an electrical signal equivalent to saying something like
'136 photons were detected in this pixel'. If you had 4 pixels in the same 2.44x2.44µm region instead, you'd get four electrical signals equivalent to
'30, 45, 27 and 34 photons were detected'. You can recover the 'big pixel' number if you want to, simply by adding those numbers -> 30 + 45 + 27 + 34 = 136 photons in the 2.44x2.44µm region. But instead of having a single data point, you have four, and you can infer things that you couldn't before (like the standard deviation of the distribution, for example).
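In code, that extra information is trivial to get at. A minimal sketch using the same four numbers from above:

```
import numpy as np

# The four subpixel counts from the example above, inside one 2.44x2.44 µm region.
block = np.array([30, 45, 27, 34])

big_pixel = block.sum()      # 136 -> what a single large pixel would have reported
spread = block.std(ddof=1)   # ~7.87 -> extra info a single large pixel can't give you

print(big_pixel, round(spread, 2))
```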
That's SUPER useful for a lot of things that are done (and have always been done) in the built-in post-processing of digital images. It doesn't mean that the post-processing will be heavier, it means that the post-processing will be
better informed. In fact, the effect can be the opposite: a better-informed noise reduction algorithm can preserve more of the fine detail of the image, because it'll be better equipped to tell it apart from background noise.
Outlier pixel values caused by undesired effects (cosmic rays, ionising particles, thermal noise…) can be detected more accurately when the pixels are smaller (because many of those effects hit single pixels) than when they're averaged together with other, valid image data.
A very simple example of how you'd do that: you can median-average pixel values instead of mean-averaging them (which is what 'bigger pixels' essentially do). The median is a much more robust estimator of the center of a distribution than the mean when outliers are present, so you'd likely get less noise just by doing that.
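A toy illustration of that (imagine one of the four subpixels got hit by a cosmic ray; the numbers are invented):

```
import numpy as np

# One 2x2 block where a single subpixel got a spurious hit (e.g. a cosmic ray).
block = np.array([30, 45, 27, 900])   # 900 is the outlier

# A 'big pixel' (or plain mean/sum binning) mixes the outlier into the result...
mean_estimate = block.mean()          # 250.5 -- badly skewed
# ...while a median-based combination barely notices it.
median_estimate = np.median(block)    # 37.5

print(mean_estimate, median_estimate)
```

Real pipelines are fancier than a plain median, but the principle is the same: with four separate values you can reject the bad one instead of baking it into the image.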
The actual reason the pixels in CMOS sensors are not made as small as possible, and bigger pixels are sometimes preferred, is that there are parts of the photodiode surface (the edges) that can't capture photons (this is mitigated by using microlensing to redirect photons to the center of each photodiode). So if you cram too many pixels into the same area, more photons are lost at the edges of the photodiodes (because there are more edges), and those photons go undetected. It's a balance between how much data you want and how sensitive you want your sensor to be. Technology has now advanced to the point that you can recover more data while having minimal losses in light sensitivity.
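To put some purely illustrative numbers on that edge-loss tradeoff (the 0.1 µm 'dead edge' below is an invented figure, not a real sensor spec):

```
# Back-of-the-envelope model of the fill-factor tradeoff described above.
def fill_factor(pitch_um, dead_edge_um=0.1):
    """Fraction of a pixel's area that can actually collect photons,
    assuming a non-sensitive border of dead_edge_um on every side."""
    active = max(pitch_um - 2 * dead_edge_um, 0.0)
    return (active / pitch_um) ** 2

for pitch in (2.44, 1.22):  # one big pixel vs. the 2x2 split of the same region
    print(f"{pitch:.2f} um pitch -> fill factor {fill_factor(pitch):.2f}")
```

Same area, four times the pixels, noticeably more area lost to edges (roughly 0.84 vs 0.70 in this made-up model), which is exactly the compromise described above and what microlensing helps claw back.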