FaceTime would seem like an ideal app here. The other rumored feature sounds like it would be awkward to use, or is it just me?
I don't know that that would be practical.
Visual Intelligence just requires a sliver of visual data. The image would be quite distorted, but the Watch, thanks to its compass, accelerometer, and gyroscope, can figure out its orientation and recreate a low-resolution version of the original image (as a human would perceive it), which should be enough for a "Visual Intelligence" to find some information. For example, add location data (which the Watch also has), and the system should be able to ask itself: "I'm at this place, and the user is looking in this direction; oh, that seems to be a hair salon; what's the name, when is it open, what are the reviews, has the user been here before?"
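The location half of that guess is easy to sketch: project a point a short distance ahead of the user along the compass heading, then fire off the questions. A toy version in plain Python (the function names, the 30 m look-ahead, and the flat-earth approximation are all my own assumptions, not anything Apple has described):

```python
import math

def looked_at_point(lat, lon, heading_deg, distance_m=30.0):
    """Dead-reckon a point a short distance ahead of the user,
    along the compass heading (0 = north, 90 = east).
    Uses a small-distance flat-earth approximation."""
    h = math.radians(heading_deg)
    dlat = distance_m * math.cos(h) / 111_320.0  # metres per degree of latitude
    dlon = distance_m * math.sin(h) / (111_320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

def follow_up_questions(place):
    """The follow-ups from the comment, phrased as queries."""
    return [
        f"What's the name of {place}?",
        f"When is {place} open?",
        f"What are the reviews for {place}?",
        f"Has the user been to {place} before?",
    ]

# Facing due north from somewhere in Cupertino: latitude goes up,
# longitude stays put.
target = looked_at_point(37.3349, -122.0090, heading_deg=0.0)
print(target)
print(follow_up_questions("that hair salon"))
```

A real system would hand `target` to a places database; the point is just that heading plus location already narrows "what is the user looking at" to a very small search space.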
Whereas for FaceTime, you'd still need to un-distort the image, and even then the result would be low-res. I'm guessing that for a human face, it just isn't good enough. Think about how you hold a watch: right now, as I'm typing this, the display doesn't even fully face me. On top of that, it wiggles around a little with finger movement. You'd have to correct the perspective, and you'd end up working with a low amount of image data. And probably not flattering image data.
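To be fair, the "correct the perspective" step is well-understood math: if the gyroscope tells you the camera's tilt, one homography rotates the view back to head-on. A toy sketch in plain Python (the pinhole focal length and all the names are my assumptions for illustration; the low-resolution problem it leaves behind is exactly the one described above):

```python
import math

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(pitch, roll):
    """Camera tilt away from fronto-parallel (radians), as the
    gyroscope/accelerometer would report it."""
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    Rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    Ry = [[cr, 0, sr], [0, 1, 0], [-sr, 0, cr]]
    return matmul(Ry, Rx)

def correct_point(x, y, pitch, roll, f=300.0, cx=160.0, cy=160.0):
    """Map one pixel through the homography H = K * R^T * K^-1,
    i.e. rotate the tilted view back so it faces the scene head-on."""
    R = rotation(pitch, roll)
    # K^-1: pixel -> camera ray
    X = (x - cx) / f
    Y = (y - cy) / f
    # apply R transposed (the inverse rotation) to the ray (X, Y, 1)
    rx = R[0][0] * X + R[1][0] * Y + R[2][0]
    ry = R[0][1] * X + R[1][1] * Y + R[2][1]
    rz = R[0][2] * X + R[1][2] * Y + R[2][2]
    # K: ray -> pixel
    return (f * rx / rz + cx, f * ry / rz + cy)

# With no tilt the correction is the identity:
print(correct_point(100.0, 100.0, 0.0, 0.0))  # -> (100.0, 100.0)
```

The catch is visible right in the math: the correction only resamples pixels that were already captured at a steep angle, so a face seen edge-on comes out straightened but smeared, which is the "low amount of image data" problem.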