Actually, this is the very problem.
Light doesn't 'miniaturize'.
Yes, you can shrink the sensors, lenses, and focal length down, but as these decrease, so does the amount of light that passes through the optics.
This is the very argument that precludes smartphone sensors from achieving SLR-like results.
Photography is the art of light; its very nature depends on light. The more light that hits the sensor, the better it can be at understanding what that light represents, the colours it carries, and all that information.
When you miniaturize that, those sensors have to become smaller, and as the megapixel count has gone up, each 'dot' has to become smaller too. That light-catching bucket has less chance of catching light.
The counter to this is driving up the ISO on each bucket, but when there's just not enough light getting to the sensor at all, it picks up nothing. That's essentially what noise is: nothing hitting the sensor in the picture.
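To put rough numbers on the "smaller bucket" point, here's a quick Python sketch. The figures are purely illustrative assumptions (roughly phone-class vs. 35mm full-frame dimensions, both pegged at 12 megapixels), not measurements of any specific camera:

```python
# Back-of-the-envelope comparison (illustrative numbers only): how much
# light-gathering area a single photosite gets on a phone-sized sensor
# versus a 35mm full-frame sensor, at the same megapixel count.

def photosite_area_um2(sensor_width_mm, sensor_height_mm, megapixels):
    """Approximate area of one photosite in square micrometres."""
    sensor_area_um2 = (sensor_width_mm * 1000) * (sensor_height_mm * 1000)
    return sensor_area_um2 / (megapixels * 1e6)

# Assumed, typical-ish dimensions:
phone = photosite_area_um2(6.2, 4.6, 12)     # small phone-class sensor
slr   = photosite_area_um2(36.0, 24.0, 12)   # 35mm full-frame sensor

print(f"Phone photosite: {phone:.2f} um^2")
print(f"SLR photosite:   {slr:.2f} um^2")
print(f"At the same exposure, the SLR photosite collects roughly "
      f"{slr / phone:.0f}x more photons per pixel.")
```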
So far, physics is a big limitation on how much miniaturization can be done to cell phone cameras. Nobody has yet discovered a way of multiplying the light inside a camera to increase it beyond what the glass and its opening allow.
And there is some very complex math out there (well beyond my head) that proves this, if you'd like to find it:
http://www.cambridgeincolour.com/tutorials.htm
An awesome site that has tonnes of relevant info.
The "there ain't enough photons" argument is based on several assumptions - most of which have to do with whatever the current state of the art happens to be. As far as I know, we are not yet at the quantum mechanical limits.
I happily concede that bigger is indeed better - all other things being equal, larger photo sites collect more photons. In some circles, that's called "brute force engineering." If you took two multi-lens/multi-sensor cameras, one using phone-sized sensors, the other 35mm-sized, the larger sensor would undoubtedly win, for the reasons you present.
But the complete statement is, "All things being equal (which they never are)..."
The point here is that the multi-lens/multi-sensor approach allows signal processing techniques that are unavailable in a single-lens/single-sensor configuration. The only question we can really ask is, "Are those techniques sufficient to overcome the normal shortcomings of that small sensor?"
Now, imaging noise is not the "absence of light" (see Wikipedia: http://en.wikipedia.org/wiki/Image_noise ). A sensor doesn't start generating noise in a zero-photon environment. It's more accurate to say that noise is more apparent in the absence of a masking signal.
Noise in an image sensor has a variety of causes. "Shot noise" is related to a shortage of photons, but it's more specifically due to (per Wikipedia) "statistical quantum fluctuations, that is, variation in the number of photons sensed at a given exposure level." And yes, the fewer photons there are, the greater the relative variation. Regardless, it's a random fluctuation - multiple sensors focused on the same scene will generate different noise patterns/placements. Compare the images, and the noise component becomes easier to identify and remove. The other noise components are not related to the number of photons hitting the sensor - they're part of the noise floor, artifacts that are present regardless of signal level.
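As a toy illustration of that averaging effect, here's a small Python/NumPy sketch of my own (the photon count, pixel count, and number of sensors are all assumed numbers): each simulated sensor sees its own Poisson-distributed photon counts for the same dim scene, and averaging the frames improves the signal-to-noise ratio by roughly the square root of the number of sensors.

```python
import numpy as np

# Shot noise is a random fluctuation: independent sensors imaging the same
# scene produce different noise, so stacking their frames suppresses it.
rng = np.random.default_rng(0)

true_signal = 20.0     # mean photons per pixel (a dim scene, assumed)
pixels = 100_000       # number of pixels simulated
n_sensors = 4          # e.g. a multi-lens/multi-sensor module

# Each sensor records an independent Poisson realisation of the same scene.
frames = rng.poisson(true_signal, size=(n_sensors, pixels)).astype(float)

single = frames[0]
stacked = frames.mean(axis=0)

def snr(frame):
    """Mean signal divided by its standard deviation."""
    return frame.mean() / frame.std()

print(f"SNR of one sensor:              {snr(single):.1f}")
print(f"SNR of {n_sensors} sensors averaged:       {snr(stacked):.1f} "
      f"(~sqrt({n_sensors}) = {np.sqrt(n_sensors):.1f}x improvement)")
```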
It's the classic analog signal to noise ratio problem (and an imaging sensor is an analog device). The electronics generate constant, low-level noise as a matter of course. Provide enough signal, and that noise is obscured. In low-signal situations, the noise becomes apparent. Push the noise floor lower (such as cooling the sensor to reduce thermal noise), or make it possible to separate the noise component from the signal, and the game changes.
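Here's a similarly rough sketch of that trade-off, again with assumed, textbook-style numbers: per-pixel SNR when Poisson shot noise combines with a constant read-noise floor. At low signal levels the quality of the electronics dominates; pour in enough photons and the floor is effectively obscured.

```python
import math

# SNR = signal / sqrt(shot_noise^2 + read_noise^2),
# with shot noise = sqrt(signal), everything in photoelectrons (e-).

def snr(signal_e, read_noise_e):
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)

for signal in (10, 100, 1_000, 10_000):      # photoelectrons captured
    for read_noise in (2.0, 8.0):            # "quiet" vs "noisy" electronics (e- RMS)
        print(f"signal={signal:>6} e-, read noise={read_noise} e-: "
              f"SNR = {snr(signal, read_noise):.1f}")
```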
An imaging sensor cannot better "understand what that light represents." Leaving aside the anthropomorphism and romanticism, the sensor simply converts incoming photons to electrons. The light is either detectable with a reasonable amount of accuracy, or not. If you want the scene to be "intelligently analyzed" you run the output of the sensor through a computer.
The notion of "SLR quality" is not some sort of cosmological constant. It is a qualitative judgement, based on an ever-changing baseline. Yesterday's SLR quality is tomorrow's crap. We compare test scores and 100x enlargements and conveniently forget that, in many circumstances, the differences would not be discernible to the naked eye in a double-blind (excuse the term) test.
The practical test of photographic quality has always been, "To what degree can it be enlarged before the defects become perceptible?" Back when a 10x enlargement was the practical upper limit for an exhibition-quality print from a 35mm negative, "SLR quality" was nothing to be proud of. It was (and still is) just one point on a continuum.
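For a sense of the arithmetic (my own assumed numbers, including the hypothetical 12 MP sensor), a 10x enlargement of a 35mm frame is roughly a 14 x 9.5 inch print, and a 12-megapixel image printed at that size lands near 300 PPI:

```python
# Quick arithmetic sketch: what a "10x enlargement" of a 35mm frame means,
# and the pixel density a 12 MP digital image would deliver at that size.
FRAME_W_MM, FRAME_H_MM = 36.0, 24.0   # 35mm full frame
ENLARGEMENT = 10

print_w_in = FRAME_W_MM * ENLARGEMENT / 25.4
print_h_in = FRAME_H_MM * ENLARGEMENT / 25.4
print(f"10x print: {print_w_in:.1f} x {print_h_in:.1f} inches")

# Assumed 12 MP sensor in 3:2 aspect ratio (~4243 x 2828 pixels):
px_wide = 4243
ppi = px_wide / print_w_in
print(f"A 12 MP image at that size works out to ~{ppi:.0f} PPI")
```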
So again, it's not about two sensors, one small, one large, going mano a mano. It's not about violating the laws of physics. It's about what comes out the far end, after signal processing.
A rough analogy is what happens with our own eyes. Unless blessed with perfect vision in both eyes (a situation I certainly don't enjoy), our ability to see improves when both eyes are open - the strengths of one eye compensate for the weaknesses of the other, thanks to the power of our brains.