Can patents be said to have family histories? If so, this post is going to introduce a barely known ancestor of one of the most written-about search-related patents on the Web, as well as a brand new grandchild of the patent.
The patent is Google’s Information retrieval based on historical data, which was filed in 2003, and granted in 2008. When it was published as a pending patent application in 2005, it created a pretty big stir amongst the forums and blogs of the search community.
The patent has two focuses, both of which take advantage of recording changes to a site over time. One is to help identify web spam, and the other is to help avoid stale documents being returned in response to a query. It raised questions among SEOs, such as how important the ages of domains and of links are, as well as:
- Does Google favor fresher sites over older sites, or older sites over fresher sites?
- Even more, how does Google weigh the age of a website?
- Are the search engines looking at whois data to see who owns websites, and if there has been a change of ownership?
- If the content of a site changes, and the anchor text pointing to it remains the same even though it’s no longer relevant, will it still rank for the terms in the anchor text?
- If you buy a website and make changes to it, will the PageRank for that site start to evaporate or expire?
Continue reading Revisiting Google’s Information Retrieval Based Upon Historical Data
A new patent filing from Yahoo raises the question, “How much has social media influenced the expectations of searchers, and forced search engines to change?”
Before I can even begin to think about that, I have to ask whether looking at Yahoo patents is even a good idea after their 2009 deal with Microsoft to have Bing power their search results.
The Yahoo patent application was filed after the agreement between Yahoo and Microsoft, and was published last week. Are Yahoo patents still worth spending time with? After reading through the Yahoo patent application about how the search engine might use information from social media platforms to discover recently hot topics and webpages that are relevant to those topics, I would say that they are. The terms of the agreement between Yahoo and Microsoft include a 10-year exclusive right for Microsoft to use search technologies developed by Yahoo, and don’t stop Yahoo from applying those technologies itself.
The patent filing explores “recency-sensitive” queries, where searchers are looking for resources that are both topically relevant and fresh, such as novel information about an earthquake. If you’ve been watching Twitter streams, Facebook updates, and other social media, you’ve seen that sometimes these sources are the best and fastest places on the Web to find that kind of information.
It’s possible that a search engine that ignores sources like those isn’t going to be able to return any relevant results for those types of queries – what the patent’s inventors call a “zero recall” problem.
Continue reading Do Search Engines Use Social Media to Discover New Topics?
One of the things that’s clear about how search engines work is that when they find a link pointing to a page using certain anchor text, that page might be seen to be a little more relevant for the text found in that link. Google pointed that out in one of the earliest white papers about how the search engine works:
This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents. We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.
- The Anatomy of a Large-Scale Hypertextual Web Search Engine
But one assumption that many make is that each link, with its anchor text, is as important as any other link, and that if a page has lots of links pointing to it with certain anchor text, it will rank more highly for the terms found in that text than it would without those links.
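To make that equal-weight assumption concrete, here is a minimal sketch of anchor text propagation in which every link counts exactly the same. The function name, the tuple layout, and the example URLs are all made up for illustration; this is not how Google or any other search engine is known to implement it.

```python
from collections import Counter, defaultdict


def propagate_anchor_text(links):
    """Credit each link's anchor-text terms to the page the link points to.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples.
    This hypothetical layout is an assumption made for the example.
    """
    anchor_index = defaultdict(Counter)
    for _source_url, target_url, anchor_text in links:
        for term in anchor_text.lower().split():
            # The naive assumption: every link adds the same weight (1).
            anchor_index[target_url][term] += 1
    return anchor_index


# Made-up example links.
links = [
    ("a.example/post", "shop.example/", "blue widgets"),
    ("b.example/list", "shop.example/", "cheap blue widgets"),
    ("c.example/home", "news.example/", "widget news"),
]
index = propagate_anchor_text(links)
print(index["shop.example/"].most_common())
```

Under this sketch, a page's count for a term grows by one with every link that uses it, no matter where the link comes from. Weighing links differently would mean replacing that constant `1` with a per-link weight.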
Continue reading How a Search Engine might Weigh the Relevance of Anchor Text Differently
I read a novel not long ago, Rainbows End by Vernor Vinge, that suggested that in the future one of the most popular technology positions would be that of software archeologist, with development and programming skills capable of digging through many lines of code to decipher where they originated and how they might work with other kludge within a program to interact in meaningful ways. It made me wonder how important it would be to have a sense of the history of the growth and development of the Web.
A trip to the US Library of Congress Photographs website showed me a little of the local history of my region that I didn’t know much about, including the existence of a resort I hadn’t heard of before about five miles from where I live that could house more than a thousand people, and which had been the vacation spot of Presidents, Senators, Supreme Court Justices, and more.
Continue reading Search Engine Archeology
I may have been a little unusual as an English major in my college days. I remember one professor asking me what I found interesting about a particular author we were studying, and my answer was about patterns involving the language that he used, and how he tended to frequently use certain words that were no longer much in fashion these days. He asked for an example, and I pointed out the use of the word “singular.” I could tell that he found my point a little odd, and I wish that the Google Books N-Gram Viewer had been around back in those days to back up my statement. As a side note, I wish I could have taken a class or two with HITS algorithm inventor Jon Kleinberg, who probably would have appreciated my response.
I point that out because I recall some unusual phrasings by search engineers at a large search conference I attended a few years back where most of the search marketers were using the term “ranking factors,” and all of the search engineers who gave presentations and participated in question and answer sessions instead used the term “signals.” I wasn’t the only one who noticed the phrasing, and someone called one of the search engine representatives on his use of the term, upon which a Google representative responded, and was seconded by the Yahoo and Microsoft reps, that they preferred to use the term “signal” instead of “factor.”
Much like in my college days, I find myself a little obsessed with the language used in the search patents I read. If Google would point their N-Gram Viewer at the USPTO’s database of patents, that would be a great thing. There are a few terms that I keep on seeing spring up in some Google patents that I’ve been finding pretty interesting lately.
Continue reading How a Search Engine Might Use Statistics to Identify New Ranking Features
I recently participated in the creation of a book about SEO named Critical Thinking for the Discerning SEO, where Sheldon Campbell, also known as Doc Sheldon, asked 31 internet marketers a series of questions about internet work focusing upon how critical thinking plays a role in what they do. One of the questions that Doc asked was:
If you could mandate just one change to the dynamics of search ranking, what would that change be?
In my answer, I described how Google might make search results more interactive by allowing searchers to decide which algorithm they might use to search with, describing different search modes that a searcher could use when they were looking for results that might be relevant to them.
Continue reading How Google Might Introduce Job, Recipe, and Other Search Modes into Web Search Results
Google’s web search results have gone through a number of transformations over the years, from the additions of images and maps and videos and other kinds of results from Google’s vertical search repositories, to an autocomplete dropdown of query refinement suggestions and automatically updating results based upon those suggestions in Google Instant. Google has shown a link to a cached copy of many pages for years, in case a page you’re trying to visit isn’t presently available. Google introduced thumbnail previews that you can start seeing if you click upon a magnifying glass next to one of the results.
If you’re logged into your Google Account, you can see other information in Google search results as well, such as a +1 button that you can click upon to vote for a page, and a display of other people whom you are connected to somehow who have clicked upon that plus button. It’s quite likely that Google will continue to experiment with other information that you might be able to see in search results as well.
Continue reading Possible New Google Search Result Annotations
Might Google lower the rankings of a page in search results if it detects unusual patterns related to clicks on advertisements on that page, or might Google use a ranking algorithm that can be tested against such unusual click patterns to lower the rankings of pages in search results? A Google patent granted today is the first that I can recall seeing that suggests that information about clicks on ads might cause pages to be lowered in web search rankings or removed from search results altogether:
Once the document engine 146 determines the likelihood that an article is a manipulated article, the method 400 ends. The likelihood that an article is a manipulated article can be used in a variety of ways. For example, the information that an article is likely a manipulated article can be used to lower a ranking associated with that article such that the article will be displayed lower in a listing of search results or not displayed at all.
Alternatively, the information that an article is likely a manipulated article can be used to test ranking algorithms.
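As a rough illustration of the first use the patent describes (showing an article lower in the results, or not at all), here is a small sketch. The thresholds, the way the score is scaled down, and the function and field names are all assumptions made for the example; the quoted passage doesn't specify any of them.

```python
def apply_manipulation_signal(results, demote_at=0.5, remove_at=0.9):
    """Demote or drop results based on a manipulation likelihood.

    `results` is a list of (url, score, likelihood) tuples, where
    `likelihood` is a 0..1 estimate that the article is manipulated.
    Both thresholds are hypothetical values, not taken from the patent.
    """
    adjusted = []
    for url, score, likelihood in results:
        if likelihood >= remove_at:
            # "not displayed at all"
            continue
        if likelihood >= demote_at:
            # "displayed lower": scale the score down (an assumed formula)
            score *= 1.0 - likelihood
        adjusted.append((url, score))
    # Highest adjusted score first.
    return sorted(adjusted, key=lambda item: item[1], reverse=True)


# Made-up example data.
results = [
    ("a.example", 0.90, 0.10),  # looks fine, left alone
    ("b.example", 0.95, 0.60),  # suspicious, demoted
    ("c.example", 0.80, 0.95),  # very likely manipulated, removed
]
ranked = apply_manipulation_signal(results)
print([url for url, _score in ranked])
```

With this made-up data, b.example starts with the highest score but ends up below a.example, and c.example drops out of the listing entirely.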
Continue reading Early Google Panda Patents