In January, Microsoft introduced a new build of Windows 10, which it will be giving away for free for non-enterprise users running Windows 7 and Windows 8.1. One of the features on this update is a personal digital assistant that goes by the name Cortana.
It’s one of the most anticipated features of the new Windows 10, and I’ve started digging through patents at the USPTO to get some hints of what this might mean for us. An article published recently got me started, with the name, HereΓ’β¬β’s how to make the most of Cortana, the Windows 10 digital assistant.
You’ve likely seen Apple’s Personal Assistant Siri, which was featured on a number of celebrity enhanced advertisements, and you may have seen people writing about Google Now, which feeds you cards to give you information that it predicts you might need or want when that information becomes available. Cortana is Microsoft’s entry into the Personal Assistant field.
Cortana is supposedly “powered by Bing” and “developed for Windows Phone 8.1”, and it looks like an important feature in Windows 10. I’ve been having difficulties defining what “powered by Bing” actually means, except that it seems to imply that all of the questions asked to Cortana are answered by the Bing search engine.
From an April 17th article on the Microsoft research blog, Anticipating More from Cortana we learned more about the people who have been working on Cortana, including Larry Heck who started the “conversational-understanding (CU) personal-assistant” project in 2009.
When I read that, I went over the the United States Patent and Trademark Office site and started looking for patents that were co-invented by Larry Heck, filed in 2009 or later, and about personal assitants, to get an idea of some of the things that Cortana might be capable of. There were more than a couple.
Digital Assistants Learning from Each Other
The first one I ran across was a patent application by the name of Collaborative Learning through User Generated Knowledge (20140201629)
Its focus is upon having personal assistants helping to teach each other. It tells us that in its abstract:
A feedback loop is used by a central knowledge manager to obtain information from different users and deliver learned information to other users. Each user utilizes a personal assistant that learns from the user over time. The user may teach their personal assistant new knowledge through a natural user interface (NUI) and/or some other interface.
For example, a combination of a natural language dialog and other non-verbal modalities of expressing intent (gestures, touch, gaze, images/videos, spoken prosody, . . . ) may be used to interact with the personal assistant. As knowledge is learned, each personal assistant sends the newly learned knowledge back to the knowledge manager.
The knowledge obtained from the personal assistants is combined to form a collective intelligence. This collective intelligence is then transferred back to each of the individual personal assistants. In this way, the knowledge of one personal assistant benefits the other personal assistants through the feedback loop.
So, having phones that help each other learn, even though they are owned by different people, is kind of interesting. But, what kinds of things might they be teaching each other?
Request and Command Capabilities
Another patent that looks like it could involve Cortana combining what is being called “a request and a command.”
Here’s a description from that patent of this type of behavior:
For instance, a user may hold a juice can to a camera of a processing system and say “Add calories for this” to update a calorie log; the user may point the camera at a scene and say “Send this to Ken” to email an electronic image of the scene to Ken; the user may press or point at a portion of a display of the processing system in which a movie trailer is playing and say “What times is this showing tonight” to obtain a list of showtimes at cinemas near the user’s home, etc.
The granted patent is:
Satisfying specified intent(s) based on multimodal request(s) (8,788,269)
The abstract for the patent tells us:
Abstract
Techniques are described herein that are capable of satisfying specified intent(s) based on multimodal request(s).
A multimodal request is a request that includes at least one request of a first type and at least one request of a second type that is different from the first type.
Example types of request include but are not limited to a speech request, a text command, a tactile command, and a visual command. A determination is made that one or more entities in visual content are selected in accordance with an explicit scoping command from a user. In response, speech understanding functionality is automatically activated, and audio signals are automatically monitored for speech requests from the user to be processed using the speech understanding functionality.
So the examples above, for calculating calories for a can of juice and possibly adding those to a total count for a meal, or to email an image to someone else, are the kinds of “requests and commands” anticipated by this patent.
That sounds useful.
Maybe even more interesting might be how well Cortana might respond to natural language queries.
Translating Natural Language Utterances into Queries
Imagine that you ask your phone or your computer to find a restaurant of a certain type for you in a certain location which might have reservations available, using a phrase like “find me an Italian restaurant nearby with a table for two.”
That type of request for information isn’t the kind of thing that you ask a search engine these days to answer for you, at least using language like that. Your personal assistant could potentially perform a number of queries and tasks (such as locating a certain type of restaurant nearby, and checking for availability of seating).
This is another patent likely involving Cortana, and it tells us that this kind of query could also be covered:
[0014] Spoken language understanding (SLU) in human/machine spoken dialog systems is a process that aims to automatically identify a user’s goal-driven intents for a given domain. For example, the user’s intent may be to make a dinner reservation, with goals of: (a) locating a restaurant (b) in a particular area (c) with available reservations and (d) for a particular time and date. A query may be expressed in natural language, such as “find me an Italian restaurant nearby with a table for two,” and the SLU system may detect a top level domain as an initial classification. In the example query above, the domain may comprise “restaurants.”
The patent application is:
Translating Natural Language Utterances to Keyword Search Queries (20140059030)
The abstract tells us:
Natural language query translation may be provided. A statistical model may be trained to detect domains according to a plurality of query click log data. Upon receiving a natural language query, the statistical model may be used to translate the natural language query into an action. The action may then be performed and at least one result associated with performing the action may be provided.
Take-Aways
There are 21 pending and one granted Microsoft patents that have Larry Heck’s name on them as an inventor for Microsoft. Most of them look like they could potentially be used for the personal assistant Cortana, and provide some features that would make Cortana pretty interesting.
It’s difficult telling how much any of these patents have been developed, and put into use for Cortana. But they look like they might provide a challenge for both Apple’s Siri, and Google Now when it comes to which Personal Assistant might be the best of the bunch.
Nice update Bill very interesting stuff on where we are headed in regards to technology on the web, have you come across AlchemyApi, just curious I like some of it’s functions in assessing webpages and it’s content.
Apologies for forgetting to link to that post I shared via Gofish on twitter π
Hi Gerry,
Thank you. There is a lot of interesting stuff out there these days. I do like the alchemy API to use to learn more about web pages, too.
I just didn’t know what you were doing with your tweet, but thanks for bringing it up and apologizing – that’s much appreciated.
No problem it was a complete stupid mistake on my part I was trying out a new tool for the first time that takes images from web pages to share “with the link” tho I forgot the link π
All the best
Hi Gerry
I think we’ve all made mistakes like that one. Thanks for filling me in on the details, though. π
I’m the prototypical apple fanboy but darn, Cortana is definitely better at this point from what I’ve tested. (IMO)
Interesting to see where technology is headed. Thanks for the research Bill. Always a treat learning from you.
Hi Dennis,
Thanks. I haven’t had a chance yet to use Cortana, but I’m looking forward to getting it with the Windows upgrade. It does look very interesting.
Hey Bill very nice post, I was aware about the fact that Microsoft has launched a new build of Windows 10 but was not aware about something like Cortana I read about it first from your this post only. I am looking forward to use it.
Thanks, Ady,
I’m looking forward to having a chance to try out Cortana.
Interesting to determine where engineering is headed. Cheers for that study Statement. Usually a goody learning from you.
I feel like this is less novel than they think it is. I mean, doesn’t google’s direct answers also amalgamate information from multiple users? Doesn’t Siri (ugh)?
Eagerly waiting for Windows 10 release. Cortana is something everyone is waiting for. I will see how it would be once I upgrade windows on my PC.
Thank You
Hi Narendra,
I think a lot of people are excited about the upgrade to Windows 10. I hope it’s a good one.
Hi,
I have played with siri and got funny answers. But I would say cortana is cool
I recently upgraded windows 10 from windows 7 on my PC. The OS is really quick and tried Cortana as well. That’s amazing. I just loved it and for me Cortana is better than Siri. Thank You
Hi Arda,
Glad you liked the new operating system and how Cortana works.