I bet the patent is longer than that one paragraph
Apple won on two claims, 1 and 8. Claims 1 and 8 of '647 read as follows:
1) A computer-based system for detecting structures in data and performing actions on detected structures, comprising: an input device for receiving data; an output device for presenting the data; a memory storing information including program routines including an analyzer server for detecting structures in the data, and for linking actions to the detected structures; a user interface enabling the selection of a detected structure and a linked action; and an action processor for performing the selected action linked to the selected structure; and a processing unit coupled to the input device, the output device, and the memory for controlling the execution of the program routines.
and,
8) The system recited in claim 1, wherein the user interface highlights detected structures.
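Stripped of legalese, claim 1 is just a functional pipeline: detect a structure, link actions to it, let the user select one, perform it. A minimal sketch in Python makes plain how little the claim actually constrains; all names and patterns here are my own illustration, not anything from the patent:

```python
import re

# Hypothetical "grammar": pattern -> structure type (my naming, not the patent's)
PATTERNS = {
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}

# "Linked actions" per structure type; any callable will do
ACTIONS = {
    "phone": {
        "call": lambda s: f"dialing {s}",
        "store": lambda s: f"storing {s} in phone book",
    },
}

def detect_structures(text):
    """The 'analyzer server': scan the data for recognizable patterns."""
    found = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((kind, m.group()))
    return found

def perform(kind, value, action):
    """The 'action processor': run the selected action linked to the structure."""
    return ACTIONS[kind][action](value)

doc = "This is my new phone number: (415) 555-1234"
structures = detect_structures(doc)     # [('phone', '(415) 555-1234')]
print(perform(*structures[0], "call"))  # dialing (415) 555-1234
```

Thirty-odd lines of entirely unremarkable code arguably "read on" the claim's decomposition, which is rather the point.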
As for specificity, the claims are really only specific about the concept, the idea:
The background for the patent is the need for a system "…that identifies structures, associate candidate action to the structures, enables selection of an action and automatically performs the selected action on the structure". They point at prior art and argue that while there are systems that make e.g. phone numbers actionable, those systems are limited in that they do not recognize the detected data as a telephone number per se, and in that they don't allow for multiple actions.
Against this backdrop, Apple describe their invention as follows: "The present invention overcomes the limitations and deficiencies of previous systems with a system that identifies structures in computer data, associates candidate actions with each detected structure, enables the selection of an action, and automatically performs the selected action on the identified structure."
Then (absurdly) they move on to state that:
"it will be appreciated that the system may operate on recognizable patterns for text, pictures, tablets, graphs, voice, etc. So long as a pattern is recognizable the system will operate on it."
Meaning Apple basically filed a patent on pretty much anything and everything "Siri-esque" (in the broad sense). The patent does not, however, go into specifics on HOW this will be achieved (the narrow sense); that is, how to solve the semantic interpretation of text, pictures, tablets, graphs, voice, etc.
Basically, what they're proposing is that text is parsed for "structures" which are then mapped to "structure-specific actions". A great idea (while hardly novel) without a great implementation. (Had they cracked the implementation, I would've accepted the patent on the spot; that would've been the invention of the century. However, we aren't even close to cracking that, even decades later.)
All they really say is that they have a CPU, I/O, memory, and "a program to identify structures in a document and perform selected computer-based actions on the identified structures". Further, the program is said to include subroutines comprising an "analyzer server, an application program interface, a user interface and an action processor". The analyzer server (AS) uses patterns to discover structures; how is left out (and the idea itself is not novel). When the AS finds a pattern, it links a set of candidate actions to it (any action, really). Granted, the patent is said to include methods through which "the document is analyzed using a pattern to identify corresponding structures". This method is not, however, specified (it's an idea, not an implementation).
This, in essence, is the problem with the patent, and with software patents in general. They tend to describe ideas, not implementations. Ideas should not, for obvious reasons, be patentable; the hard part is always implementing ideas, making them work. Abstract descriptions do not work; they're just functional decompositions of ideas, no more, no less. For example, I could easily write a patent mimicking Apple's that would amount to a patent on the semantic web. I am not, however, anywhere near able to create the semantic web (nor is anyone else at this point, really, but we're getting there one step at a time).
One thing I do not get: Apple claim that their implementation works by the program itself having a UI*; that is, the program runs concurrently with the application in which the document is displayed, with its own UI. As such, any program that instead relies on the native UI of the application (e.g. Mail) should be home free; i.e. any solution that does not run as an overlay on the document application in question. However, as I can't imagine that HTC is actually doing the UI thing, and the patent evidently held up in court, I could be misreading things. Like I said, I don't get it… if it weren't for the following "cover all bases" lines:
"Application program interface then transmits this location information to user interface, which highlights the detected structures although other presentation mechanisms can be used"
Further showing how broad, vague and non-implementation-based the patent is (in the broad sense).
Apple then go on to show how the patent can be used for audio etc. (without explaining specifically how, besides pointing out that patterns will be used). They also cover the base of non-visual interfaces, in terms of the UI speaking actions back to the user.
As far as the analyzer server goes, it is said to include a) a parser and b) a grammar file. Alternatively or additionally, a fast string search function or other function could be used (basically covering all bases once more; the opposite of being specific). The parser uses the grammar to parse the file (i.e. to find structures). When structures are spotted, actions are associated (further describing the functional decomposition of the idea rather than the implementation of said idea; i.e. how do you go from grammar to structure?).
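To underline how little "a parser and a grammar file" pins down: the grammar file could be as trivial as a few lines of external data, and "parsing" just pattern matching over it. A hypothetical sketch, where the rule format and names are entirely my own invention:

```python
import re

# A hypothetical external "grammar file": one rule per line, "name<TAB>regex".
# (Nothing in the patent specifies what a grammar file actually contains.)
GRAMMAR_FILE = """\
phone\t\\(\\d{3}\\) \\d{3}-\\d{4}
zip\t\\b\\d{5}\\b
"""

def load_grammar(text):
    """Compile each grammar rule into a (structure name, regex) pair."""
    rules = []
    for line in text.strip().splitlines():
        name, pattern = line.split("\t")
        rules.append((name, re.compile(pattern)))
    return rules

def parse(document, rules):
    """The 'parser': apply every grammar rule, return detected structures."""
    return [(name, m.group(), m.start())
            for name, rx in rules
            for m in rx.finditer(document)]

rules = load_grammar(GRAMMAR_FILE)
print(parse("Call (415) 555-1234, zip 94103", rules))
# [('phone', '(415) 555-1234', 5), ('zip', '94103', 25)]
```

How you go from grammar to structure is, in this reading, nothing deeper than running the rules over the text; the patent never says otherwise.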
Then, something confusing shows up again (as Apple becomes specific for a brief while). The patent states that "the parser retrieves from grammar file pointers attached to the grammar and attaches the same pointers to the identified structure". As such, a solution that implements a relational table between grammar and action, instead of coupling pointers within the grammar file, should be fine. Maybe this is the case; if so, the patent is essentially worthless.
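The design-around is easy to picture: keep the grammar ignorant of actions and look actions up afterwards in a separate table keyed by structure type, rather than storing action pointers inside the grammar file itself. A hypothetical sketch of that decoupled variant (names and patterns are mine):

```python
import re

# The grammar knows nothing about actions: it only names the structure type.
GRAMMAR = {"phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}")}

# Actions live in a separate relational table keyed by structure type,
# not as pointers attached to entries in the grammar file.
ACTION_TABLE = {
    "phone": ["Call #", "Put in electronic telephone book"],
}

def detect(document):
    """Find structures; only afterwards look up their actions by type."""
    results = []
    for kind, rx in GRAMMAR.items():
        for m in rx.finditer(document):
            results.append((m.group(), ACTION_TABLE.get(kind, [])))
    return results

print(detect("phone number: (415) 555-1234"))
# [('(415) 555-1234', ['Call #', 'Put in electronic telephone book'])]
```

Functionally identical to the user, yet it never "retrieves from grammar file pointers attached to the grammar".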
Apple then shows some honest examples (Fig. 4). Here, for example, "Phone number:" is seen as the grammar for "phone number". The associated actions are "Call #" and "Put in electronic telephone book". This shows the true nature of the patent; that is, what Apple really had implemented (think classic voice command rather than Siri). This is also clear from Fig. 5 in the patent, showing a heavily marked-up document in the form of:
This is my new
phone number: (415) 555-1234
address: 1 Hilly Street… and so on.
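The level of sophistication Fig. 4 and Fig. 5 describe amounts to label-driven parsing: the literal string "phone number:" IS the grammar, and whatever follows it is taken to be the number. A hypothetical sketch of that, with the label set and behaviour being my guesses rather than the patent's text:

```python
import re

# The "grammar" of Fig. 4 is little more than literal labels: the text
# after "phone number:" is a phone number, the text after "address:" an
# address. (Labels and structure names here are illustrative only.)
LABELS = {
    "phone number": "phone",
    "address": "address",
}

def label_parse(document):
    """Scan for 'label: value' lines and tag each value by its label."""
    found = {}
    for label, kind in LABELS.items():
        m = re.search(re.escape(label) + r":\s*(.+)", document, re.IGNORECASE)
        if m:
            found[kind] = m.group(1).strip()
    return found

doc = "This is my new\nphone number: (415) 555-1234\naddress: 1 Hilly Street"
print(label_parse(doc))
# {'phone': '(415) 555-1234', 'address': '1 Hilly Street'}
```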
Clearly, it's nowhere near as refined as what we have today (meaning current implementations constitute substantial improvements on the original patent, and thus should not be seen as infringing). Thus, in my opinion it stands clear that Apple's current claims go well beyond the original point (i.e., what they actually sought to patent). Further, the only way to make the patent apply is to shred every single specific element, relying on broad, vague, all-encompassing visionary ideas; things that should not be patentable.
* Maybe I'm tired here.
TL;DR: In general, the patent is vague, non-implementational in nature, and idea-driven. The few specifics included in the patent show a technology far from today's smart parsers. Further, if not for the "cover all bases" carte blanches in the filing, the patent wouldn't hold at all. Basically, the invention patented is on the level of recognizing that the text after "telephone number" should be parsed as a telephone number and associated with "telephone number actions". That's it. No more, no less.