Microsoft Bing, with Powerset Inside

Microsoft’s new search engine Bing has launched this week with some fanfare and excitement. One thing to watch for is how natural language search elements from Microsoft’s acquisition of Powerset will appear in this new version of Microsoft’s search.

Right now, you can get an idea of how Powerset works by itself in searching through Wikipedia articles, like this result for Albert Einstein.

Powerset’s blog was moved to a new address this week within the Bing community of blogs. A recent post on the blog hints at more information on how Powerset’s technology has been used in Bing.

Chances are good that even more technology from Powerset will find its way into Microsoft’s search services. To get a flavor of what that technology has to offer, here are a number of the latest patent filings assigned to Powerset, each with either some commentary from me or the abstract or quotes from the document:

Semi-Automatic Example-Based Induction of Semantic Translation Rules to Support Natural Language Search
Invented by Emmanuel Rayner, Richard Crouch, Hannah Copperman, Giovanni Lorenzo Thione, and Martin Henk Van den Berg
Assigned to Powerset, Inc.
US Patent Application 20090138454
Published May 28, 2009
Filed August 29, 2008

A natural language search engine attempts to answer questions such as “is a butterfly as small as a mouse?” A conventional search engine that looks for pages containing keywords would return a set of results that include words such as “butterfly,” “small,” and “mouse,” and wouldn’t answer the question.

The answer to the question has to do with the relative sizes of mice and butterflies. This patent application explores a method of understanding the question asked and providing a meaningful answer to it.
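
To make the idea concrete, here is a toy Python sketch of translating that question into a structured comparison instead of a keyword lookup. It uses a single hand-written rule and a hypothetical ordinal size table (SIZE_RANK, answer, and the ranks are all my own illustrative assumptions). It doesn't attempt the semi-automatic, example-based induction of rules that the application describes.

```python
import re
from typing import Optional

# Hypothetical ordinal size scale (a larger number means a bigger animal).
# A real system would draw on a knowledge base, not a hand-written table.
SIZE_RANK = {"ant": 1, "butterfly": 2, "mouse": 3, "cat": 4, "horse": 5}

# One hand-written translation rule: "is a X as ADJ as a Y?" -> size comparison.
RULE = re.compile(r"is an? (\w+) as (small|big|large) as an? (\w+)\??$", re.I)

def answer(question: str) -> Optional[bool]:
    """Translate a comparison question into a structured query and evaluate it."""
    match = RULE.search(question.strip())
    if match is None:
        return None  # no rule applies; a keyword search would be the fallback
    x, adjective, y = (g.lower() for g in match.groups())
    if x not in SIZE_RANK or y not in SIZE_RANK:
        return None
    if adjective == "small":
        # "X is as small as Y" holds when X is no bigger than Y.
        return SIZE_RANK[x] <= SIZE_RANK[y]
    return SIZE_RANK[x] >= SIZE_RANK[y]

print(answer("Is a butterfly as small as a mouse?"))  # True
```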

Efficiently Representing Word Sense Probabilities
Invented by Rion Snow, Giovanni Lorenzo Thione, Scott A. Waterman, Chad Walters, and Timothy Converse
Assigned to Powerset, Inc.
US Patent Application 20090094019
Published April 9, 2009
Filed August 29, 2008

Words can have different meanings or senses based upon how they are used. As an example, the word “print” can mean text that appears in a document, a picture made from an engraving, a copy of a movie on film, or the action of creating a document by printing it, amongst other meanings.

Word sense disambiguation is the process of identifying which sense of a word is being used when the word appears in a passage of text. In a semantically based search engine, returning results that use a query word in the sense the searcher intended would yield better search results. Indexing the many possible senses of a term may require an immense amount of storage. This patent application addresses ways to reduce that requirement by scoring the different senses of a word according to the probability that each is the sense meant in a query.
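
As a rough illustration of why probabilities can save space, here is a minimal Python sketch, assuming a per-term table of sense probabilities: drop unlikely senses and store each remaining probability in a single byte. The function names, the 0.05 cutoff, and the probabilities for “print” are hypothetical; the application's own encoding may be quite different.

```python
from typing import Dict

def quantize_senses(sense_probs: Dict[str, float], min_prob: float = 0.05,
                    levels: int = 255) -> Dict[str, int]:
    """Drop unlikely senses and store each remaining probability in one byte."""
    kept = {s: p for s, p in sense_probs.items() if p >= min_prob}
    if not kept:
        return {}
    total = sum(kept.values())
    # Renormalize the surviving senses, then map [0, 1] onto the integers 0..levels.
    return {s: round(p / total * levels) for s, p in kept.items()}

def dequantize(q: Dict[str, int], levels: int = 255) -> Dict[str, float]:
    """Recover approximate probabilities from the one-byte scores."""
    return {s: v / levels for s, v in q.items()}

# Hypothetical sense probabilities for the word "print".
PRINT_SENSES = {"text": 0.5, "engraving": 0.03, "film_copy": 0.1, "to_print": 0.37}
print(quantize_senses(PRINT_SENSES))  # {'text': 131, 'film_copy': 26, 'to_print': 97}
```

The rare “engraving” sense is dropped entirely, and each surviving sense costs one byte instead of a full floating-point number.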

Natural Language Hypernym Weighting For Word Sense Disambiguation
Invented by Barney Pell, Rion Snow, and Scott A. Waterman
Assigned to Powerset, Inc.
US Patent Application 20090089047
Published April 2, 2009
Filed August 29, 2008

Words and phrases may have more than one meaning, which can be referred to as different senses of a word (or phrase). Word senses can have one or more hypernyms, which are broader, or more generic meanings. “Blue” is a hypernym for senses of the words “navy,” “aqua,” and “cyan.” Also, “color” is a hypernym for a sense of the word “blue.”

Understanding these hypernyms can lead to better search results. Indexing every hypernym of a word could also take up an immense amount of storage in a search index. Scoring the possible hypernyms by the probability that each is the one indicated in a query can reduce that requirement as well.
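
A minimal sketch of hypernym weighting, assuming a hypothetical hierarchy fragment and made-up sense probabilities for “navy”: each broader term is scored by the total probability of the senses beneath it. This is my own illustration of the general idea, not the method claimed in the filing.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical fragment of a hypernym hierarchy (child -> broader parent).
HYPERNYM = {
    "navy/color": "blue",
    "blue": "color",
    "navy/fleet": "military force",
    "military force": "organization",
}

def hypernym_weights(sense_probs: Dict[str, float]) -> Dict[str, float]:
    """Score each ancestor by the summed probability of the senses beneath it."""
    weights: Dict[str, float] = defaultdict(float)
    for sense, prob in sense_probs.items():
        node = HYPERNYM.get(sense)
        while node is not None:  # walk up the chain of broader terms
            weights[node] += prob
            node = HYPERNYM.get(node)
    return dict(weights)

# Hypothetical sense probabilities for the word "navy".
print(hypernym_weights({"navy/color": 0.3, "navy/fleet": 0.7}))
# {'blue': 0.3, 'color': 0.3, 'military force': 0.7, 'organization': 0.7}
```

An index could then store only the hypernyms whose weights clear some threshold, which is one way the storage savings might come about.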

Coreference Resolution In An Ambiguity-Sensitive Natural Language Processing System
Invented by Richard Crouch, Martin Henk Van den Berg, Franco Salvetti, Giovanni Lorenzo Thione, and David Ahn
Assigned to Powerset, Inc.
US Patent Application 20090076799
Published March 19, 2009
Filed August 29, 2008

There’s a nice example in this patent filing that illustrates how it works to understand different words that refer to the same person, place, or thing (“coreferences”), and why that understanding might be important.

Someone searches for the term “Picasso painted.” One document that might be returned contains the text, “Picasso was born in Malaga. He painted Guernica.” Another document has the sentence, “Picasso’s friend Matisse painted prolifically.” All other things being equal, a conventional search engine might rank the second page higher than the first because the words “Picasso” and “painted” are closer together. If the search engine can understand that the “he” in the first document refers to Picasso, it may rank that first document higher, as the more relevant result.
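
Here is a deliberately naive Python sketch of the Picasso example, assuming a hypothetical list of known person names and a rule that maps “he” or “she” to the most recently mentioned person. Real coreference resolution, especially the ambiguity-sensitive kind the filing describes, is far more involved; the point is only how resolving the pronoun changes a simple proximity measure.

```python
import re
from typing import List, Optional

PRONOUNS = {"he", "she"}
# Hypothetical list of person names the system already recognizes.
PEOPLE = {"picasso", "matisse"}

def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def resolve(words: List[str]) -> List[str]:
    """Naive coreference: replace he/she with the most recently mentioned person."""
    last_person, out = None, []
    for w in words:
        if w in PEOPLE:
            last_person = w
        out.append(last_person if w in PRONOUNS and last_person else w)
    return out

def min_distance(words: List[str], a: str, b: str) -> Optional[int]:
    """Smallest gap, in words, between any occurrence of a and any occurrence of b."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)

doc1 = "Picasso was born in Malaga. He painted Guernica."
doc2 = "Picasso's friend Matisse painted prolifically."
for doc in (doc1, doc2):
    raw = min_distance(tokens(doc), "picasso", "painted")
    resolved = min_distance(resolve(tokens(doc)), "picasso", "painted")
    print(raw, resolved)  # prints "6 1" for doc1, then "4 4" for doc2
```

Without resolution, the first document looks like the weaker match (six words apart versus four); once “he” is resolved, it becomes the stronger one.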

Emphasizing Search Results According to Conceptual Meaning
Invented by Barney Pell, Scott Prevost, Giovanni Lorenzo Thione, Brendan O’Connor, and Lukas Biewald
Assigned to Powerset, Inc.
US Patent Application 20090063472
Published March 5, 2009
Filed August 29, 2008

The process described in this patent application is one that attempts to find the use of a query term or phrase within documents in the meaning intended by a searcher, show snippets from those documents, and highlight the use of those query terms, or terms that are semantically related to the query terms.
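
A minimal Python sketch of the highlighting step, assuming a hypothetical hand-built map of related terms (RELATED): query terms and their related forms are wrapped in <b> tags within a snippet. Deciding which uses of a term match the intended meaning, which is the heart of the application, is left out here.

```python
import re
from typing import Dict, List, Set

# Hypothetical map from a query term to the terms treated as semantically related.
RELATED: Dict[str, Set[str]] = {"paint": {"paint", "paints", "painted", "painting"}}

def highlight(snippet: str, query_terms: List[str]) -> str:
    """Wrap query terms, and terms related to them, in <b>...</b> tags."""
    targets: Set[str] = set()
    for term in query_terms:
        targets |= RELATED.get(term, {term})

    def mark(m: re.Match) -> str:
        word = m.group(0)
        return f"<b>{word}</b>" if word.lower() in targets else word

    return re.sub(r"[A-Za-z]+", mark, snippet)

print(highlight("Picasso was born in Malaga. He painted Guernica.", ["picasso", "paint"]))
# <b>Picasso</b> was born in Malaga. He <b>painted</b> Guernica.
```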

Identification of Semantic Relationships within Reported Speech
Invented by Richard S. Crouch, Martin Henk Van Den Berg, David Ahn, Olga Gurevich, Barney D. Pell, Livia Polanyi, Scott A. Prevost, and Lorenzo Thione
Assigned to Powerset, Inc.
US Patent Application 20090063426
Published March 5, 2009
Filed August 29, 2008

Terms found in content may share a semantic relationship, based upon things such as their location or topic. That relationship may be determined from the meanings of those words and how they are used grammatically within the text of a document. Understanding those relationships can be helpful in responding to queries, as in the patent application listed above this one.

Indexing Role Hierarchies for Words in a Search Index
Invented by Martin Henk Van Den Berg, Richard S. Crouch, Giovanni L. Thione, and Chad P. Walters
Assigned to Powerset, Inc.
US Patent Application 20090063473
Published March 5, 2009
Filed August 29, 2008

Understanding how words relate to each other in a document can mean that better search results are returned to searchers.

A conventional search engine faced with the query “who bought PeopleSoft” might return a document that contains the sentence, “J. Williams was an officer, who founded Vantive in the late 1990s, which was bought by PeopleSoft in 1999,” because it contains the keywords “who,” “bought,” and “PeopleSoft.” But that sentence doesn’t answer the question. A semantic analysis of the query would show that the searcher wanted to know who purchased PeopleSoft, and a semantic analysis of the example document would show that it wasn’t a good match for the query, since the sentence is about the acquisition of Vantive by PeopleSoft.
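
Here is a minimal Python sketch of the difference, with the facts for the example sentence built by hand (a real system would get them from a parser). The question word becomes a wildcard in a predicate-agent-theme pattern. The Fact structure and its role names are my own simplification, and the sketch doesn't model the role hierarchies the application is actually about.

```python
import re
from typing import Iterable, NamedTuple, Optional

class Fact(NamedTuple):
    predicate: str          # lemma of the verb, e.g. "buy" for "bought"
    agent: Optional[str]    # who did it
    theme: Optional[str]    # what it was done to

SENTENCE = ("J. Williams was an officer, who founded Vantive in the late 1990s, "
            "which was bought by PeopleSoft in 1999")

# Facts for SENTENCE, written by hand here; a parser would normally produce them.
DOC_FACTS = [
    Fact("found", "J. Williams", "Vantive"),
    Fact("buy", "PeopleSoft", "Vantive"),
]

def keyword_match(text: str, words: Iterable[str]) -> bool:
    """Plain keyword matching: every query word appears somewhere in the text."""
    found = set(re.findall(r"[a-z]+", text.lower()))
    return all(w in found for w in words)

def fact_match(query: Fact, fact: Fact) -> bool:
    """A None slot in the query is the unknown that the question word asks about."""
    return all(q is None or q == f for q, f in zip(query, fact))

query = Fact("buy", None, "PeopleSoft")  # "who bought PeopleSoft?"
print(keyword_match(SENTENCE, ["who", "bought", "peoplesoft"]))  # True
print([f for f in DOC_FACTS if fact_match(query, f)])            # []
```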

Fact-Based Indexing for Natural Language Search
Invented by Martin Henk Van Den Berg, Daniel Babrow, Robert D. Cheslow, Barney D. Pell, Giovanni Lorenzo Thione, and Chad Walters
Assigned to Powerset, Inc.
US Patent Application 20090063550
Published March 5, 2009
Filed August 29, 2008

Facts are extracted from text so that they can be used to index that text. For example, in the sentence “Mary washes a red tabby cat,” the following factual relationships may be identified:

  • agent (wash, Mary)
  • theme (wash, cat)
  • mod (cat, red)
  • mod (cat, tabby)

The agent is the person acting, or “washing.” The theme is the thing acted upon, the cat being washed. Modifiers about the cat indicate that it is “red” and that it is a “tabby.”

If the sentence were changed slightly to “Mary washes her red tabby cat,” we would also be able to associate “Mary” with “her.”
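
A minimal sketch of a fact-based index in Python, assuming facts arrive as (relation, head, dependent) tuples like the ones listed above. The second document (“The cat washes Mary.”) and its facts are hypothetical, added to show that a fact query can tell the two sentences apart even though both contain the same keywords.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, head, dependent)

DOC_FACTS: Dict[int, List[Fact]] = {
    1: [("agent", "wash", "Mary"), ("theme", "wash", "cat"),
        ("mod", "cat", "red"), ("mod", "cat", "tabby")],      # "Mary washes a red tabby cat."
    2: [("agent", "wash", "cat"), ("theme", "wash", "Mary")],  # "The cat washes Mary." (hypothetical)
}

def build_fact_index(docs: Dict[int, List[Fact]]) -> Dict[Fact, Set[int]]:
    """Map each fact tuple to the set of documents that contain it."""
    index: Dict[Fact, Set[int]] = defaultdict(set)
    for doc_id, facts in docs.items():
        for fact in facts:
            index[fact].add(doc_id)
    return index

def search(index: Dict[Fact, Set[int]], *facts: Fact) -> Set[int]:
    """Return the documents that contain every requested fact."""
    if not facts:
        return set()
    return set.intersection(*(index.get(f, set()) for f in facts))

index = build_fact_index(DOC_FACTS)
print(search(index, ("agent", "wash", "Mary"), ("theme", "wash", "cat")))  # {1}
```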

Calculating Valence Of Expressions Within Documents For Searching A Document Index
Invented by Livia Polanyi, Martin Henk Van den Berg, and Barney Pell
Assigned to Powerset, Inc.
US Patent Application 20090077069
Published March 19, 2009
Filed August 29, 2008

Some phrases and sentences can be interpreted as positive, negative, or neutral toward a topic, person, object, or event. Being able to identify and distinguish between different sentiments within documents means that a search engine can return a mix of pages expressing different sentiments in response to a question such as “What do doctors think about Medicare reform?”
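
A minimal lexicon-based Python sketch of valence, assuming a tiny hypothetical word list and a simple rule that a negator flips the polarity of the next word. Production sentiment analysis, and whatever the application actually does, goes well beyond this.

```python
import re

# Hypothetical polarity lexicon; real systems rely on much larger, curated resources.
LEXICON = {"support": 1, "welcome": 1, "praise": 1,
           "oppose": -1, "criticize": -1, "fear": -1}
NEGATORS = {"not", "never", "no"}

def valence(sentence: str) -> str:
    """Sum word polarities, flipping the sign of the word right after a negator."""
    score, negate = 0, False
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(word, 0)
        score += -polarity if negate else polarity
        negate = False  # a negator only reaches the next word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for s in ["Many doctors support the reform.",
          "Some doctors do not support the reform.",
          "Doctors met to discuss the reform."]:
    print(s, "->", valence(s))
```

Tagging each result page with a label like this would let a search engine deliberately mix positive, negative, and neutral pages in its results.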

Browsing Knowledge on the Basis of Semantic Relations
Invented by Franco Salvetti, Giovanni Lorenzo Thione, Richard S. Crouch, David Ahn, Lukas Biewald, Brendan O’Connor, and Barney D. Pell
US Patent Application 20090070322
Published March 12, 2009
Filed August 29, 2008

Abstract

Computer-readable media and computer systems for conducting semantic processes to facilitate navigation of search results that include sets of tuples representing facts associated with content of documents in response to queries for information. Content of documents is accessed and semantic structures are derived by distilling linguistic representations from the content. Groups of two or more related words, called tuples, are extracted from the documents or the semantic structures. Tuples can be stored at a tuple index. Representations of the relational tuples are displayed in addition to documents retrieved in response to a query.

Efficient Storage and Retrieval of Posting Lists
Invented by Chad Walters, Giovanni Lorenzo Thione, Barney Pell, Lukas Biewald, and Brendan O’Connor
Assigned to Powerset, Inc.
US Patent Application 20090132521
Published May 21, 2009
Filed August 29, 2008

Many types of search engine indexing algorithms utilize inverted indexes. An inverted index is a data structure that is utilized to store a mapping between terms and the location of the terms within a database, document, or set of documents. For instance, an inverted index may be utilized to store a mapping between words and World Wide Web (“Web”) pages in which the words are utilized. Data identifying the particular location at which each term appears within a document might also be stored in an inverted index. The list of documents in which a particular term appears is commonly referred to as a posting list.

Some types of indexing algorithms generate a separate entry in the inverted index for each semantic role that a term occurs in. This results in a separate posting list and a separate entry in the index to the posting lists, called the lexicon, for each term-role pair. For instance, one posting list may be created in the index for the word “dog” and the role “subject.” Another posting list may be created for the word “cake” and the role “object.” In order to identify documents where a dog is the subject and a cake is the object, such as for example where a dog is described as eating a cake, an intersection operation is performed between the two posting lists. Semantically based search engines may utilize this type of indexing and document retrieval.
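
Here is a minimal Python sketch of that term-role indexing, assuming hypothetical pre-parsed documents: each (term, role) pair gets its own sorted posting list, and documents where a dog is the subject and a cake is the object are found by intersecting two of those lists.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (term, semantic role)

def build_index(docs: Dict[int, List[Key]]) -> Dict[Key, List[int]]:
    """Map each (term, role) pair to a sorted posting list of document IDs."""
    postings: Dict[Key, Set[int]] = defaultdict(set)
    for doc_id, pairs in docs.items():
        for term, role in pairs:
            postings[(term, role)].add(doc_id)
    return {key: sorted(ids) for key, ids in postings.items()}

def intersect(a: List[int], b: List[int]) -> List[int]:
    """Linear-time merge of two sorted posting lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical parsed documents: document ID -> (term, role) pairs.
DOCS = {
    1: [("dog", "subject"), ("eat", "verb"), ("cake", "object")],    # "A dog eats a cake."
    2: [("cake", "subject"), ("fall", "verb")],                      # "A cake falls."
    3: [("boy", "subject"), ("feed", "verb"),
        ("dog", "object"), ("cake", "object")],                      # "A boy feeds the dog cake."
    4: [("dog", "subject"), ("steal", "verb"), ("cake", "object")],  # "The dog steals a cake."
}

index = build_index(DOCS)
print(intersect(index[("dog", "subject")], index[("cake", "object")]))  # [1, 4]
```

Document 3 mentions both a dog and a cake, but the dog isn't the subject there, so the role-aware lists leave it out.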

The patent filing describes an approach for making the use of posting lists more efficient.
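
The summary above doesn't say what that approach is. As general background only, one widely used way to shrink posting lists (not necessarily what this filing claims) is to store the gaps between sorted document IDs as variable-byte integers, since small gaps fit in a single byte. A minimal Python sketch:

```python
from typing import Iterable, List

def encode(doc_ids: Iterable[int]) -> bytes:
    """Delta-encode ascending document IDs, writing each gap as a variable-byte integer."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        # Seven bits per byte, low-order bits first; a set high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

def decode(data: bytes) -> List[int]:
    """Reverse encode(): rebuild each gap, then add it to the previous ID."""
    doc_ids: List[int] = []
    value = shift = prev = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            doc_ids.append(prev)
            value = shift = 0
    return doc_ids

ids = [3, 7, 150, 100000]
packed = encode(ids)
print(len(packed), decode(packed) == ids)  # 7 True
```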

Iterators for Applying Term Occurrence-Level Constraints in Natural Language Searching
Invented by Giovanni Lorenzo Thione, Barney Pell, Chad Walters, and Richard Crouch
Assigned to Powerset, Inc.
US Patent Application 20090070298
Published March 12, 2009
Filed August 29, 2008

An index can support representing a large corpus of information so that the locations of words and phrases can be rapidly identified within the index. A traditional search engine may use keywords as search terms such that the index maps from keywords specified by a user to articles or documents where those keywords appear. The semantic index can represent the semantic meanings of words in addition to the words themselves.

Semantic relationships can be assigned to words during both content acquisition and user search. Queries against the semantic index can be based on not only words, but words in specific roles. The roles are those played by the word in the sentence or phrase as stored in the semantic index.

The semantic index can be considered an inverted index that is a rapidly searchable database whose entries are semantic words (i.e., a word in a given role) with pointers to the documents, or web pages, on which those words occur. The semantic index can support hybrid indexing. Such hybrid indexing can combine features and functions of both keyword indexing and semantic indexing.
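
To illustrate occurrence-level constraints, here is a minimal Python sketch assuming a hypothetical hybrid index in which each posting records a document, a sentence number, and the word's role. An iterator with role=None behaves like a plain keyword match, and two iterators are advanced together so that only matches in the same sentence are kept. The data structures and names are my own assumptions, not the application's.

```python
from typing import Iterator, NamedTuple, Optional, Tuple

class Occurrence(NamedTuple):
    doc: int
    sentence: int
    role: str

# Hypothetical hybrid index: word -> occurrences sorted by (doc, sentence),
# each tagged with the role the word plays in that sentence.
INDEX = {
    "dog": [Occurrence(1, 0, "subject"), Occurrence(2, 0, "object"),
            Occurrence(2, 3, "subject")],
    "cake": [Occurrence(1, 0, "object"), Occurrence(2, 1, "object")],
}

def occurrences(word: str, role: Optional[str] = None) -> Iterator[Occurrence]:
    """Iterate a word's postings; role=None acts as a plain keyword match."""
    for occ in INDEX.get(word, []):
        if role is None or occ.role == role:
            yield occ

def same_sentence(left: Iterator[Occurrence],
                  right: Iterator[Occurrence]) -> Iterator[Tuple[int, int]]:
    """Advance two sorted iterators in step, yielding (doc, sentence) pairs both share."""
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        key_a, key_b = (a.doc, a.sentence), (b.doc, b.sentence)
        if key_a == key_b:
            yield key_a
            a, b = next(left, None), next(right, None)
        elif key_a < key_b:
            a = next(left, None)
        else:
            b = next(right, None)

# "dog" as subject and "cake" in any role, within the same sentence.
print(list(same_sentence(occurrences("dog", "subject"), occurrences("cake"))))  # [(1, 0)]
```

Document 2 contains both words, but in different sentences, so the occurrence-level constraint keeps it out even though a document-level keyword match would include it.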

Checkpointing Iterators During Search
Invented by Chad Walters, Lukas Biewald, Nitay Joffe, and Andrew Alan James
Assigned to Powerset, Inc.
US Patent Application 20090070308
Published March 12, 2009
Filed August 29, 2008

Abstract

Tools and techniques are described herein for checkpointing iterators during search. These tools may provide methods that include instantiating iterators in response to a search request. The iterators include fixed state information that remains constant over a life of the iterator, and further include dynamic state information that is updated over the life of the iterator. The iterators traverse through postings lists in connection with performing the search request.

As the iterators traverse the posting lists, the iterators may update their dynamic state information. The iterators may then evaluate whether to create checkpoints, with the checkpoints including representations of the dynamic state information.
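
A minimal Python sketch of the fixed/dynamic split the abstract describes, with a hypothetical policy of checkpointing every second posting (checkpoint_every) and a hypothetical restore method. The abstract doesn't say when checkpoints are taken or how they're used, so both are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PostingIterator:
    # Fixed state: set when the iterator is created and constant over its life.
    term: str
    postings: List[int]
    checkpoint_every: int = 2  # hypothetical checkpointing policy
    # Dynamic state: updated as the iterator traverses the posting list.
    position: int = 0
    checkpoints: List[int] = field(default_factory=list)

    def advance(self) -> Optional[int]:
        """Return the next document ID, or None when the list is exhausted."""
        if self.position >= len(self.postings):
            return None
        doc_id = self.postings[self.position]
        self.position += 1
        # Evaluate whether to record a checkpoint of the dynamic state.
        if self.position % self.checkpoint_every == 0:
            self.checkpoints.append(self.position)
        return doc_id

    def restore(self) -> None:
        """Rewind the dynamic state to the most recent checkpoint (or the start)."""
        self.position = self.checkpoints[-1] if self.checkpoints else 0

it = PostingIterator("dog", [2, 5, 9, 14, 20])
print([it.advance() for _ in range(3)])  # [2, 5, 9]
it.restore()                             # back to the checkpoint taken after 5
print(it.advance())                      # 9
```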


21 thoughts on “Microsoft Bing, with Powerset Inside”

  1. Without commenting on the details of Powerset’s various technologies and abilities, I can say that several test queries of Microsoft’s Bing search engine did not impress me.

  2. I tried out the new Bing search the other day. It was kinda weird and I didn't really like it. I liked the landing page; it was cool and web 2.0-ish, but the results page sucked.

    Although I did like how on bing, I am on the first page for my primary keyword.

  3. Hi Portland Web Design,

    It’s funny, but I know a few people who really like the layout of Bing and the results that it’s been returning. It will be interesting to see how the public responds to Bing.

    I do think Bing is an improvement on Microsoft’s previous search engines, and I do suspect that it will evolve and improve over time. I think it’s a good thing for the other search engines to have some competition – it means that we all may benefit by seeing better search results from all of the search engines.

  4. Hi People Finder

    I’m not sure that the Dogpile analogy is really a fair one, considering that Dogpile is a mashup of search results from other search engines rather than its own search engine with its own ranking system, and may not account for many of the ranking and reranking approaches that a single search engine may develop on its own.

    One of the reasons why I spent some time with the Powerset patent filings was to get an idea of how Microsoft might be trying to do things differently. If Microsoft is using a machine learning approach that incorporates user behavior data into how results are ranked, how query refinements and suggestions are provided, and in other possible ways, then it’s possible that results may improve over time as the search engine is used by more searchers.

    During an interview last year, Udi Manber, a Google Vice President of Engineering, told us that Google made at least 450 changes to their ranking algorithms in 2007. Chances are that they’ve made changes at a similar frequency since then. It wouldn’t be a surprise to have search engineers at Microsoft/Bing studying the quality of their results and trying to find ways to make those better on a similar scale.

    I think we need to wait and see how Bing evolves rather than writing it off so quickly.

  5. About PowerSet…seems to be great engine for extracting data about people. Thanks

  6. By the way, I like Bing too :D I started using it for reviews; its system is very powerful for that purpose.

  7. Search results are very different than Google's. A good thing I have noticed is that it shows related searches at the top of the left sidebar, which makes them more visible, while Google shows them at the bottom.

  8. Bing has taken many good features from other search engines like the categorization of Hakia, and so I am finding that it is bringing back a sense of wonder into my searches.

    I have been experimenting with various keywords in Bing & other engines, and I am finding that I am impressed about half of the time. Natural Language Search can be a great feature. Using semantic studies to improve search is a wonderful development (I like the use of lexemes as suggestions), but I am finding that results typing in a keyword alone, instead of a natural phrase is producing similar results from different engines. I think once users begin to use their own voice when searching, instead of attempting to find the perfect keyword, we may see Bing learning to produce better results across the board.

  9. Hi Finder Mind

    The Powerset search page is a nice way of finding information in Wikipedia about the many subjects that they cover. Will its use in Bing mean that more of the Web will use Powerset technology? I think it might.

    The reviews, and the many different ways to sort them are pretty interesting. :)

  10. Hi Agra Indian,

    A researcher at Microsoft came up with an application that lets you compare search results from Google, Yahoo, and Bing without knowing which results came from which search engine. It’s an interesting way to compare some of the results, though it doesn’t include some of the different features like related searches and query refinements and blended search results. It’s at: http://blindsearch.fejus.com/

    I like having the related searches to the side rather than at the bottom of the results as well.

  11. Hi Frank,

    A very interesting point. I think you might be right – the way that people phrase their queries may have a significant impact upon the search results that they see in Bing, compared to the other search engines.

  12. Microsoft has been advertising Bing like no other! I’ve literally seen like 4 Bing commercials just while typing this haha. Overall I have to say I have been pretty impressed with the search engine in itself. The pictures and videos searching is extremely in depth.

    Hope to see good things from them in the near future.

    Ryan v.

  13. Hi Ryan,

    I’ve been seeing a lot of Bing commercials as well. They have a pretty substantial campaign going on to back the launch of their new search engine – I’ve heard that it was somewhere between 80-100 million dollars. If only more of that was spent on website development instead of television advertising, to beef up areas like their community pages. :(

    I’m hoping that they do well – the more choices we have for search, the better off everyone is.

  14. I’ve been pleasantly surprised by the popularity and initial penetration of Bing, it really has been leaping out at me in analytics. Here in South Africa Google rules supreme, and MSN outperformed YAHOO! because of the localised search option that YAHOO! hasn’t got, but Bing way outperforms MSN already.

    With the relationship between Microsoft and YAHOO! finally being formalised, I am really interested to see where this is heading. Along with Wolfram Alpha, I feel that the seemingly unassailable position Google held mere months ago is not so unassailable any more.

    When you factor in the IBM patent that was being discussed in another post I visited today, then I’d say that the Search Engine Wars are going to start hotting up!

  15. Hi Jacques,

    I’ve been seeing some growing numbers of visits from Bing in Analytics as well.

    I’ve seen a lot of press about Bing and Yahoo, and I’m wondering about how well they might (or might not) be able to complement each other. Both have some very interesting technology. Will they be able to bring out the best in each other, or the worst? I guess we wait and see.

  16. It looks like Bing has scared Google into a corner. They’ve already announced the “Caffeine” update, which many seem to believe is in direct correlation with the release of Bing. In all honesty, I doubt that Bing will make so much as a temporary dent in Google.

  17. Hi Blake,

    Matt Cutts seemed instrumental in getting news out about the Caffeine update. He did mention on his blog that the update wasn’t a reaction to something that any other company is doing, but rather is part of a natural evolution of moving forward at Google. As evidence for this, he claims that Caffeine has been something Google has been working on for a good number of months – starting well before Bing was released to the public.

    I think that’s probably true. Google has been pushing to improve on an ongoing basis. That’s part of what has made them successful in the past.

  18. I am really not a fan of Bing as a user, but the SEO I have applied to many sites puts me in position one for the search term I target, versus being 30+ in G’s search engine. Bing is making a run to get their name out there with commercials and stuff … but even with Yahoo’s help they will fail. I have also come to the conclusion that Bing ignores the “nofollow” tag most of the time, if not completely.

    Dave

  19. Hi Dave,

    It doesn’t hurt paying attention to what the different search engines are doing, and whether they are delivering visitors to your pages who are interested in what you have to offer. I’m not convinced that spending $100 million on TV ads is necessarily the best way to get people to visit your search engines, and return over and over and over.

    As for the “nofollow” tag, I think some people had been getting too caught up in using that to try to funnel link equity (or PageRank) to different pages of a site instead of trying to make the best pages they can. Why not build the best “About” page you can, that makes it more likely that people interested in learning about your site/business might come to your site, for instance, instead of using a rel=”nofollow” in links to that page, in some attempt to get other pages of a site to rank higher.

Comments are closed.