When someone types “George Washington” into a search box, they are probably more interested in the Revolutionary War general and President than in some random George in Washington. Someone searching for “Washington Hotels” is more likely looking for lodging in Washington than for hotels named Washington. Searches for places with signs that say “Washington Slept Here” are probably not about hotels (and those searchers probably have too much time on their hands).
When words used in search queries can have more than one meaning, a search engine may provide better results if it can calculate the probability of each possible meaning and identify the most likely one. That’s the focus of a patent granted to Yahoo this past week:
System for determining probable meanings of inputted words
Invented by David Richardson-Bunbury, Soren Riise, Devesh Patel, Eugene H. Stipp, Paul J. Grealish
Assigned to Yahoo!
US Patent 7,681,147
Granted March 16, 2010
Filed December 13, 2005
Abstract
A system is disclosed for determining probable meanings of words. An input of a word is obtained. Probable meanings of the word may be determined in accordance with a prior probability of probable meanings of the word and a context frequency probability of probable meanings of the word.
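The abstract names two ingredients, a prior probability for each meaning and a context frequency probability, but it doesn't say how they are combined. Here is a minimal sketch, assuming (our assumption, not the patent's stated formula) that the two scores are multiplied and renormalized in a Bayes-style way. The function name, the meaning labels, and all of the numbers are hypothetical:

```python
def probable_meanings(prior, context_freq):
    """Combine a prior P(meaning) with a context score for each meaning.

    Assumption: the two are multiplied and renormalized so they sum to 1.
    Meanings missing from context_freq get a context score of 0.
    """
    scores = {m: p * context_freq.get(m, 0.0) for m, p in prior.items()}
    total = sum(scores.values())
    if total == 0:
        # No usable context for any meaning: fall back to the prior alone.
        return dict(prior)
    return {m: s / total for m, s in scores.items()}


# Hypothetical priors for the word "jaguar".
prior = {"animal": 0.4, "car": 0.5, "football team": 0.1}
# Hypothetical context scores: how often the surrounding words occur
# with each meaning.
context = {"animal": 0.01, "car": 0.02, "football team": 0.30}

ranked = sorted(probable_meanings(prior, context).items(),
                key=lambda kv: kv[1], reverse=True)
for meaning, p in ranked:
    print(f"{meaning}: {p:.2f}")
```

In this made-up case, a meaning with a low prior (the football team) can still come out on top when the context strongly favors it. That is the kind of shift the patent describes when other words appear alongside the ambiguous one.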
Examples in the patent focus primarily upon place names, but the inventors tell us that the processes described could be used for other terms that can be interpreted more than one way. So, a jaguar could be a kind of animal, a car, or an NFL football player from Jacksonville.
A search engine may attempt to calculate the probability that a search for “jaguar” is intended for each of those meanings. If another term is added, those probabilities may be recalculated based upon that context. A search for “Jacksonville Jaguar” is more likely about someone playing football, while the odds are that a search for “Jaguar carburetor” isn’t.
A web search at Google for Jaguar brings back pictures of cars and cats. The same search at Yahoo shows a couple of images alongside snippets for pages, one of a feline in the wild, and one of a stylized feline in the logo for the automobile.
How might a search engine such as Yahoo (and possibly Bing, if it acquires the rights to this patent) use statistical probabilities of the meanings of words? The patent’s authors give us the following list of ways that the best estimate of a word’s meaning might be used:
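One way to picture how an added word shifts the odds is a naive Bayes-style sketch. It counts how often a context word appears alongside each meaning of “jaguar” and rescales the priors by those counts. The co-occurrence counts, the priors, the add-one smoothing, and the function name are all hypothetical choices made for illustration; the patent does not say that it uses this exact method:

```python
from collections import Counter

# Hypothetical counts of words seen near each meaning of "jaguar"
# (for example, on pages already known to be about that meaning).
COOCCURRENCE = {
    "animal": Counter({"rainforest": 40, "cat": 55, "zoo": 30}),
    "car": Counter({"carburetor": 25, "engine": 60, "dealer": 35}),
    "football team": Counter({"jacksonville": 70, "nfl": 50, "quarterback": 20}),
}
PRIOR = {"animal": 0.4, "car": 0.45, "football team": 0.15}  # made-up priors


def meaning_given_context(context_words):
    """Rescale each meaning's prior by how well the context words fit it."""
    vocab = set()
    for counts in COOCCURRENCE.values():
        vocab.update(counts)
    scores = {}
    for meaning, counts in COOCCURRENCE.items():
        total = sum(counts.values())
        score = PRIOR[meaning]
        for word in context_words:
            # Add-one smoothing so an unseen word doesn't zero out a meaning.
            score *= (counts[word.lower()] + 1) / (total + len(vocab))
        scores[meaning] = score
    norm = sum(scores.values())
    return {m: s / norm for m, s in scores.items()}


print(meaning_given_context([]))                # no context: the priors
print(meaning_given_context(["Jacksonville"]))  # football team dominates
print(meaning_given_context(["carburetor"]))    # car dominates
```

The same ambiguous word gets a different ranking of meanings depending on which word accompanies it. This matches the “Jacksonville Jaguar” versus “Jaguar carburetor” contrast above.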
- Web pages may be indexed to a search.
- The locations of news stories may be plotted on a map.
- Geographically relevant advertisements may be placed on a web page.
- Enhanced statistics may be calculated for use in query analysis.
- Search result listings may be presented to the user in accordance with the probabilities.
- Ads may focus upon that meaning for pay-for-placement, cost-per-click, pay-per-call and pay-per-act type services.
Instead of simply matching queries with pages where the query terms appear on those pages or in links pointing to them, the search engine may rerank search results based upon the probability that a searcher intended one meaning rather than another.
So, someone with the last name “Ind” and the first name “Gary” might have a personal web page that could rank highest on a search for “Gary Ind.” But the search engine may calculate a higher probability that someone searching for “Gary Ind.” wants to see information about the city of Gary in the state of Indiana than the home page of Gary Ind. Based upon those probabilities, it might rerank search results for “Gary, Ind.” to show pages about the city first.
If you live in the City of Bath in the UK and need a plumber, you may still have problems finding what you’re looking for when you search for “Bath plumber” (good luck to you). The patent also gives an example involving places named Springfield:
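A rough sketch of that kind of reranking follows. It assumes each result has already been tagged with the meaning it is about and given a plain keyword-relevance score. The final score here is simply relevance multiplied by the probability of that meaning, which is our own simplification rather than anything the patent spells out. The `Result` type, the URLs, and all of the numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float  # hypothetical keyword-based relevance score
    meaning: str      # the meaning this page is about


def rerank(results, meaning_probs):
    """Order results by relevance weighted by P(meaning) for the query."""
    return sorted(
        results,
        key=lambda r: r.relevance * meaning_probs.get(r.meaning, 0.0),
        reverse=True,
    )


# Query "Gary, Ind.": hypothetical probabilities for each meaning.
probs = {"city of Gary, Indiana": 0.9, "person named Gary Ind": 0.1}
results = [
    Result("example.com/gary-ind-homepage", 0.95, "person named Gary Ind"),
    Result("example.org/gary-indiana-history", 0.70, "city of Gary, Indiana"),
    Result("example.net/visit-gary-in", 0.60, "city of Gary, Indiana"),
]

for r in rerank(results, probs):
    print(r.url)
```

The personal home page has the best keyword match. It still drops to the bottom because the meaning it serves is judged much less likely for this query.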
For example if there are thirty different places called “Springfield”, then thirty-one prior probabilities may be generated, one for each place plus one for the possibility that it is not a place at all.
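The patent excerpt says there is one prior per place plus one for “not a place.” It doesn't say, in the passage quoted here, where those priors come from. This sketch assumes each candidate has a hypothetical weight, such as how often pages mention it, and that the weights are normalized into probabilities. The place names and counts are invented:

```python
def place_priors(place_weights, not_a_place_weight):
    """Return one prior per candidate place plus one for 'not a place'.

    Assumption: each weight (for example, a mention count) is turned into
    a probability by dividing it by the total of all weights.
    """
    weights = dict(place_weights)
    weights["<not a place>"] = not_a_place_weight
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Thirty hypothetical Springfields with made-up mention counts.
springfields = {f"Springfield #{i}": 100 - 3 * i for i in range(1, 31)}
priors = place_priors(springfields, not_a_place_weight=500)

print(len(priors))  # 31: thirty places plus "not a place"
print(max(priors, key=priors.get))
```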
The patent provides a number of examples, as well as some details on how probabilities might be calculated for words used both alone and within the context of other words. If you’re interested in how probabilities might be used to rerank search results, you may want to spend some time with this patent.
When someone searches for “Washington,” do they mean the State of Washington, the District of Columbia, a city named Washington, George Washington, or something else completely? Probabilities, in addition to ranking signals based upon relevance, quality, and link analysis, may play a role in which pages show up where in search results.
This is a really great patent. For words that can mean a lot of things, maybe it’s up to searchers to include a clue in their search terms to make them more specific. When you type a search term into Google, a dropdown appears that lists probable things you might want to search for. I think that can also help in refining searches.
Interesting to consider. I wonder if/how they may combine geographical factors into the probabilities. Does someone in Florida typing in “Jaguar” obtain the football results more often? My service provider routes me through Corpus Christi, so I show up as being in that city for some of these tracking programs, yet when I am testing keywords in searches, I frequently get local results for Pennsylvania. The idea of probabilities is great to improve results, but it would be nice to see how they can combine other factors to produce more relevant results.
Thanks for digging up and analyzing the patent, Bill.
This patent definitely helps, given that the way searchers search is getting more complicated nowadays with social and live status streams.
I came across the paper from Aardvark, “The Anatomy of a Large-Scale Social Search Engine,” which states that people are searching in a more conversational or question-based way compared to the conventional keyword searches of a couple of years ago.
Would be interesting to see how the search engines present the data other than the conventional blue hyperlinks.
I am continually impressed with the advances that are coming out for the web. This is really going to impact search engine optimization. I am from a small town, and I think something like this will really help little businesses get their message across. With the geographical factors, it seems like local businesses have a better chance of being seen. I am excited to see how it comes about.
Hi Andrew,
I wonder if one of the best things that search engines could do to make it easier for searchers to find what they are looking for is to make search boxes much wider.
Chances are that if they do, people would type in longer queries that might make it easier for the search engines to get a better idea of the intent behind a search, and the context of words within queries.
The predictive suggestions that we see when that dropdown appears are often based upon histories of queries and query sessions, but I’m not quite sure that they completely follow the kind of logic that this patent contains. I do think there is some calculation of probabilities going on in those predictive search suggestions.
Hi Frank,
Let’s take your example and make the geographic regions a little larger. Imagine a search for “football,” done by someone in the US, someone in the UK, and someone in Australia. Chances are that the US searcher will get NFL results that the UK and the Australian searchers won’t. The Australians will likely see Australian Rules Football results, and the UK will see results for what people in the US refer to as soccer.
This patent really doesn’t address where someone is performing their search from, but there have been patents and whitepapers from the search engines where they describe how they might consider those kinds of factors. I’d guess that there’s some calculating of probabilities based upon whether or not someone in Florida searching for “jaguar” is more likely concerned about the Jacksonville team. I’m not sure though that this patent is trying to cover that situation. It is something that I did ask myself though as I was reading through it.
Hi Deric,
Real-time search that brings content from microblogging tools like Twitter into a search engine’s index does present some serious challenges, such as indexing short content without much in the way of links to and from it. We have seen how Google is attempting to display tweets and other microblogging content in sections of a page that automatically scroll and can be paused. That’s a serious departure from the old ten blue links. I’d love to see a study on how often people select something from that feature when it’s displayed.
Hi Ryan,
I agree with you that if a search engine can be smarter about which place is which when a geographic reference is made in a query, that it can benefit small businesses and small towns. I’m excited about it, too.
A clever patent that would likely be useful in most cases, although it may be frustrating at other times and may actually have an adverse effect on the results. I may want results for the less likely meaning and have to type more to get the results I need. I suppose the search market share figures will show whether people get better results. Interesting…
Hi Lee,
It seems like a good approach to me as well, though I agree with you that it might sometimes present problems when people type in queries that could be interpreted as meaning something other than what they intended. If the worst that happens is that you have to revise your query and possibly make it a little more specific, that might not be a bad thing.
One thing that bothers me with Google’s local search is that if I search for one place in one location, and then try to search for another place in a different location, Google will sometimes continue to show me information about the first location, likely based upon a probability that my second query is related to my first one. Search engines basing the results they show upon probabilities can be useful, but sometimes they can create frustrating results as well.
I’ve noticed some changes in the search engines recently. Local search actually gives me completely different results at home vs. at my office, which is only about 20 minutes away.
Hi Jason,
About five years ago, when I lived in Delaware and worked about 30 minutes away in Maryland, I would see some very different results as well. Some of them probably had something to do with the different locations I was searching from, but others may have had to do with my accessing a different data center at each location. I don’t know if either is the cause of the differences that you’re seeing, but both are possibilities.
I wonder how this probability search will interact with personalized results. It seems to me that if they are using probabilities it makes sense to calculate based on what they know of your search behavior.
Hi David,
Interesting points.
Personalized search probably is better if it uses a probability-based approach, too. It attempts to make a best guess about what you might want to see based upon your past searching and browsing history and other information that it has collected about you.
The probabilities described in this patent filing are also aimed at making it more likely that someone searching for something finds what they want to locate, but the patent tries to do so based upon both information about how language is used on the Web and an analysis of aggregated query information and searching and browsing behavior.
Some combination of those approaches might work out well – as you called it, an “interaction” between the two.
A programmer who tends to look for information about Java programming should probably be shown pages about the programming language. But since they may have a present interest in the island or the drink, they might also be shown a diverse set of search results that include those meanings, based upon some probability that they may be interested in something other than programming at the moment.
@jason: You don’t use Chrome at home, by any chance? And perhaps Firefox or Internet Explorer at work? I’ve seen something strange when using Chrome: the site I click the most after searching for any given phrase actually ranks better and better the more I search for it. Am I just paranoid, or is this actually the case? A while back I worked on ranking a site, checking my positions in Chrome and sometimes clicking on my own site. I finally got to the first search results page, and after a while I took the top position, or at least I thought so. When I checked my ranking in FF and IE, I wasn’t on the first page at all; in Chrome I was in the very first place.
There is pure logic behind all that. It must have been difficult to program, as it is with every invention. Yet in the end, nobody truly knows what the human brain is apt to do 🙂
Is that why personalized search results evolved at Google? So, how do we de-personalize our searching?
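The Java example suggests mixing meanings into the results in rough proportion to their probabilities. Here is a minimal sketch of one possible diversification scheme. It is not something described in the patent. It assumes the engine already has a ranked list of pages for each meaning, hands out the result slots in proportion to each meaning's probability (using largest-remainder rounding), and then interleaves the meanings. The function names, page names, and probabilities are all hypothetical:

```python
import math


def allocate_slots(meaning_probs, slots):
    """Split `slots` result positions across meanings in proportion to
    their probabilities, using largest-remainder rounding."""
    raw = {m: p * slots for m, p in meaning_probs.items()}
    counts = {m: math.floor(v) for m, v in raw.items()}
    leftover = slots - sum(counts.values())
    # Give any remaining slots to the meanings with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda m: raw[m] - counts[m], reverse=True)
    for m in by_remainder[:leftover]:
        counts[m] += 1
    return counts


def diversify(ranked_by_meaning, meaning_probs, slots=10):
    """Interleave per-meaning result lists, most probable meaning first."""
    quota = allocate_slots(meaning_probs, slots)
    order = sorted(meaning_probs, key=meaning_probs.get, reverse=True)
    pools = {m: list(ranked_by_meaning.get(m, []))[: quota[m]] for m in order}
    page = []
    while any(pools.values()):
        for m in order:
            if pools[m]:
                page.append(pools[m].pop(0))
    return page


java_pages = {
    "programming language": [f"java-dev-{i}" for i in range(1, 11)],
    "island": ["java-island-travel", "java-island-history", "java-volcanoes"],
    "coffee": ["java-coffee-beans", "java-coffee-brewing"],
}
probs = {"programming language": 0.7, "island": 0.2, "coffee": 0.1}
print(diversify(java_pages, probs))
```

A programmer would still get mostly programming pages. The island and the coffee each get a slot near the top, in case that is what the searcher wants at the moment.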
@wczasy – I agree, it uses great mind power and skills! lol.
Hi Orville,
I’m not sure that we ever can, now that the search engines have been incorporating personalization and customizations of search results. They want to show us what they think we mean with our searches rather than just providing a list of pages that include the keywords that we enter in our queries.
Actually, you can easily de-personalize your Google search results!
First of all, make sure you haven’t logged in. Make sure you signed out of your Google account.
Visit the Google search engine and type in some search term. You’ll see a list appear as usual. You can see the View Customizations link on the right side above the search results. Click that link. Now you have all the personalization AND de-personalization options right in front of you!
Hi Martijn,
You can reduce the impact of personalization by taking the steps that you outlined, but there are still likely some customizations that you have no control over. For instance, Google will still likely bias the search results you see based upon which country and which language and which location it thinks you prefer. It will still likely expand queries in some ways based upon aggregated user data, to do things like showing spelling corrections and synonyms for results. It may still show you different results based upon your location, the kind of device you’re using to connect to the Web, and in other ways as well.
This post is sooo deep. Context is everything. Sometimes I use Google Wheel, Quintura, or keyword density analyzers to help me determine which terms to add to support my target keywords. VERY GOOD post. (As always)
Hi James,
Thank you. Google Wheel and Quintura provide some interesting visualizations of terms that might be related. I’m less of a fan of keyword density analyzers; they’ve always seemed to me to be tools around which software makers created some folklore about how pages are ranked by search engines.
Google’s algorithm improves from time to time, and for me the way they rank pages is becoming more accurate and more reliable.
Hi Mikaela,
The search engines definitely are aiming to improve the results that they show to searchers, but face some interesting challenges.
For instance, many of the words or phrases that you search for could potentially have more than one meaning, and it may be difficult to decide which results to show at the top for the different meanings. Case in point, someone searching for “java” may mean the software, the island, or the coffee. As a search engine, which pages do you show searchers first? 🙂
Another problem, which the search engines are taking some interesting approaches to, is that sometimes a synonym for a query term might provide better search results than the term a searcher actually used. Google looks like it is trying to address this problem. It is challenging, though.
This type of patent would help tremendously, in my opinion, not only from a searcher’s standpoint but also from a webmaster’s, since it would help them target their market better. Plus Google would like it, since they can make more ad dollars off of more targeted queries. 🙂
Thanks for the info.
Kind regards,
Jason
Hi Jason,
I think it would be helpful as well. Chances are it would enable searchers to find pages that were more likely what they were interested in, and would enable the search engine to show more relevant advertising as well.
As to the discussion about Google’s personalization, I can add this:
I have found that searching with the same phrases from my home computer and from the one at work gives slightly different results. This is after clicking to deactivate personalization, not being logged in to any Google accounts, and with cookies cleared.
I expect the difference must be based on IP address, but it’s a bit strange, as the distance between these two computers is no more than 5 miles.
That is really interesting about what “Per H.” commented. I tried the same thing myself, but the results were no different, although my work is more than 5 miles away. Hmmm…..
Thanks Bill!
Hi Per and Jason,
I’ve experienced something similar between work and home computers, with a distance of about 14 miles at one job, and a 22 mile commute at another job. One of those jobs was in another state, and it definitely was giving me results from a different data center, but I would see different results at work 14 miles away as well, and that one was likely using the same data center.
Isn’t this already the nature of Google search?
Hi Ryan,
The patent that I wrote about above is from Yahoo rather than Google, but the problem is the same for both search engines. Both of them want to provide the best answers possible, and face the challenge that a searcher often supplies only a handful of words or fewer in a query.
It’s much easier to just show searchers documents where the words from a search query appear in the documents, in links to the documents, or both. It’s harder when those words may have more than one meaning, and the person searching is possibly more interested in one meaning of that word than another.
It’s a problem that Google and the other search engines are trying to solve.
Hi Bill,
Thanks for the awesome article as usual.
I think it would make things easier, and hopefully we’ll get better results than before. I just wonder if Bing is going to adopt this patent, since it’s going to power Yahoo Search.
Will Yahoo continue to come up with useful patents like this after being powered by Bing, or will it leave it to Microsoft to do all the work?
I wonder if Google uses this system already, to me it seems they simply determine the most popular and reputable meaning without any other intelligence.
Hi Max,
Some interesting questions.
It’s hard to tell what Microsoft is actually getting in the deal to power Yahoo search. Will they gain access to technologies patented by Yahoo? I don’t know.
Will Yahoo continue to come up with useful patents? It can take a few years for patents to go from being filed to being granted, so there are likely a good number still in the pipeline, and we should continue to see some from them. Yahoo also files patents covering a wide range of applications, including paid search and technologies associated with running a range of portal services. It’s likely that Yahoo will continue to file patents for those types of services.
Hi Clinton,
I’ve written a very large number of posts here about different methods Google might use to look beyond whether or not keywords appear on pages when retrieving those pages, in an attempt to understand the intent behind a search. If you look at some of my posts on how search engines might rerank search results, you’ll see a few examples.
I still don’t understand how they determine the probability of what I want to look at, at the time. Is this based on the history of my previous searches and browsing?
Hi Comor,
There’s a mix of different information that a search engine might consider when trying to decide upon the intent behind a search, when one of the terms in that search could have more than one meaning.
Your search and browsing history might play a role in what you see in search results, but there’s other information that a search engine might look at as well.
For instance, if you live in Florida, and there’s a very recent and popular news story on the Jacksonville Jaguars, you might be more likely to see some pages in your search results on the NFL team. If a jaguar recently escaped from a Florida zoo, you may see more results about the animal.
Rather than just looking at your past searching or browsing history, the search engine may look at the search and browsing history of many other searchers as well.
I think the search engines should just show them a random page for not being able to construct a proper query 🙂
It could work, but I can see there being so many problems with it guessing what you are trying to find. Surely the SERPs represent the pages the SEs think are the most relevant, based on SEO, and news articles appear at the top of Google as well; shopping results appear if you search for a purchasable item (e.g. Sony Laptop).
Hi Shane,
I’ve been hearing from people about how hard it can be sometimes to come up with the right words to use to search for something.
When you don’t know too much about a topic and you’re trying to find information on it, there are times when a directory structure, rather than a search engine, can be invaluable.
Search engines can attempt to guess at the meanings behind a query, provide query refinement suggestions, and rerank results based upon what they perceive the intention behind a search might be, but it’s possible that giving searchers more interactive ways to refine their queries could be helpful. For instance, letting searchers see categories that the search engine might associate with a query could add a helpful layer for locating what they are looking for.
I think some emphasis has to go to the user. I make a point of including extra words if my search terms have more than one meaning or will probably bring back results I am not looking for.
Hi Andy,
You would usually expect people to refine their search queries to use terms that may make it more clear what they were looking for. That usually works fine when people want information on a topic that they know something about.
But when they don’t know much about that topic, it might be a lot harder to include extra words in their queries that help return more meaningful results.
By the way, this posting is very interesting. I think the search engine should show options when displaying results, for example, “Did you mean George Washington the American leader?” and so on. The sponsored results are even worse than the organic ones. If a keyword is set to broad match, the ad will show for any match to it. I tried searching for “laser hair removal sex”, and the Google ad was still showing, even though I only wanted to advertise “laser hair removal services”, because I had selected broad match. It showed for so many combinations.
Hi Sargon,
I believe that I’m seeing more query suggestions now than ever before when performing searches. There are the predictive query dropdowns that are shown as you type a query into a search box, as well as suggestions that sometimes show above search results as well as below them.
As for broad match paid search, I believe that advertisers need to be very careful about how they choose to use paid search, and should carefully monitor their campaigns to make sure that they don’t show up for terms that they don’t want to appear for.
You can also better define your search query: ‘president washington’ or ‘jaguar cars’?
Hi Thomas,
It is possible for people to better define the search queries that they use, but that tends to be most helpful for people who may know something about the topic they are searching for already. When someone is searching for something that they might not know a lot about, they might not know the best way to refine their queries to get at the information that they are searching for.
For example, someone who doesn’t know much about football, but knows that there’s a team named the Jaguars, might not know that the team is from Jacksonville. They might type “jaguars” into Google and see a reference to Jacksonville in the results, because the search engine may have decided that not everyone searching for just “jaguar” wants to know about the feline or the model of car, and may have determined a probability that some searchers were interested in the football team.
I think Google’s new predictive search feature will make it a lot easier for general internet users to find what they are looking for. On the downside, as a website owner you’re less likely to catch traffic that was never intended for you. The battle continues!
Hi Paul,
The predictive query suggestions have actually been live on Google’s home page for more than a couple of years now – Google Instant really only added search results that update as you type, so I don’t know if we are going to see a tremendous amount of impact with the addition of Instant, possibly aside from some changes that Google may have made to reduce bandwidth by changing some of the results that we may see.
Bill, I just discovered your blog and it took me only a few seconds to bookmark it.
It’s great to see someone analyzing patents to gain insights into the algorithms. I have always believed that putting yourself in the search engines’ shoes and trying to think how you would solve a problem in search, such as retrieving relevant results for your users, is a great way of understanding why the search engines behave the way they do. Looking at their patents takes it one step further.
I’ll be following your writings.
Cheers,
Antonio
Hi Antonio,
Definitely, one of the best things about looking at patents from the search engines is that you gain a different perspective than you might have if you only viewed search engines from the eyes of a marketer or developer or designer.
Thanks. Looking forward to seeing you around.
I think Google has all sorts of fun things in store for us. And I’m sure they won’t tell us about any of them and will spring them on us without any notification. But the patents do give us at least some insights into what they are looking for and want.
It’s interesting to suddenly see some of their changes appear and disappear when I use Google.
Hi Dan,
I definitely agree with you. There are a lot of hints of possible new services and new approaches from Google in their white papers and patents, but likely some real surprises as well.
And some of those changes that you see appearing and disappearing are possibly from live testing that the search engine is performing to identify how well some of those changes might be received, or how they might impact searching behavior.
Very interesting.
I wonder how personalization factors into all this. Does the algorithm lose its value if someone consistently visits car sites instead of cat sites, if they search for the keyword “Jaguar”? I guess this is what Google means when they say they update their algorithm nearly every day. Things like this!
Nice work digging up the patent, too.
Brandon
Hi Brandon,
Thanks. Systems like this should ideally work independently of human intervention, except for possibly some checks on how well the system might be working in terms of providing relevant results. When Google says that they update their algorithm nearly every day, I believe they mean that they are making actual manual changes to algorithms after testing and analysis of those tests. But many of the algorithms already in place do some kind of machine learning where they update results without human guidance or decision making.
Good find!
How do you dig up this stuff? Anyway, I think Google’s been employing something like this for some time now. I just don’t understand why everyone thinks that Google results are so BS; I find that they are usually exactly what I am looking for. I never have to leave the first page to find the right site.
In my opinion, Yahoo! is just a bit behind the eight ball compared to Google.
Thanks,
Randolf
Hi Randolph,
I spend a lot of time digging through patents, RSS feeds from a large number of blogs, and whitepapers from the search engines.
With Bing powering the database behind Yahoo’s results, I’m not sure what to expect from them in the future.
Interesting find, Bill. You seem to have the patience of a saint in analyzing the details of search patents and related documentation. Thanks for sharing your analysis on this somewhat elusive subject.
Generally, the more descriptive the keyword phrase I use for what I’m looking for, the better the results I tend to get. But, not always. And, if I get too specific…especially when using quotes or brackets, sometimes the search engines return no results.
Hi Don,
The challenge of taking a patent, and trying to get to the essence of it as much as I can helps me learn, and I think helps keep me sharp.
There definitely is a fine line between being too general in a search and too specific. One problem that often comes along with many searches is that if the topic is one you don’t know much about, it can be hard finding the right words that give you the results you are looking for. In those situations, I’ll often start with a query that tends to be a little more general, and look at the results and query refinement suggestions to get an idea for one or more additional searches that are more specific.