When a judge writes an opinion in a case, it often includes more than just the ruling itself. It usually contains an analysis of the current law, the legal context, and how the court arrived at its holding. Those written opinions can also include observations on legal issues that don’t play an essential role in the outcome of the case at hand, and those are often referred to as “dicta.”
When you read a patent, you’ll see that it’s broken into a number of parts. The most important of those is the claims section, which is what a patent examiner focuses on when examining an application and deciding whether or not a patent should be granted. Patents also include description sections that give a richer and more detailed look at how the technology behind a patent might be implemented (with emphasis on the “might”). Those descriptions often include material that isn’t reflected in the claims, and in many ways they could be considered similar to the dicta that sometimes appear in judicial opinions.
Stanford University was granted two new patents today under the title “Scoring documents in a database,” both of which were filed at the United States Patent and Trademark Office on January 19, 2010. These two patents, assigned to Stanford and listing Lawrence Page as inventor, are described as continuations of the following Stanford patents, which focus on PageRank:
Link evaluation. We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular, we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable.
A lot of people were guessing which “method of link analysis” might have been changed, with guesses ranging from PageRank being turned off, to anchor text being devalued, to Google ignoring rel="nofollow" attributes on links, and more. A few people asked my opinion, and I mentioned that there were a number of potential approaches that Google might have changed.
According to Peter Norvig, Google’s Director of Research, if you look at Google Trends for “full moon” or “ice cream,” you’ll see that searches for those terms mirror actual physical trends in the world. Because so many people search for those terms, searches for “full moon” peak about once a month, following the lunar cycle, and searches for “ice cream” peak every summer, roughly 365 days apart. Large amounts of data make interesting things possible.
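To make the idea concrete, here is a minimal sketch of how a repeating cycle could be recovered from daily query volume using autocorrelation: shift the series against itself and find the lag at which it lines up best. The data is synthetic (a clean 30-day cycle standing in for the roughly 29.5-day lunar month), and the function names are hypothetical; this is not how Google Trends actually works.

```python
import math

def autocorrelation(series, lag):
    """Pearson correlation between a series and a copy of itself shifted by `lag` days."""
    n = len(series) - lag
    a, b = series[:n], series[lag:]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def dominant_period(series, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] at which the series best matches itself."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

# Synthetic daily query volume with a 30-day cycle (hypothetical, not Google Trends data).
volume = [100 + 50 * math.cos(2 * math.pi * day / 30) for day in range(365)]
print(dominant_period(volume, min_lag=10, max_lag=45))  # → 30
```

Keeping `max_lag` below twice the expected period matters, since a series also lines up with itself at every multiple of its cycle (60 days, 90 days, and so on).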
If you’re interested in how search engines work, and how large amounts of data can help them do what they do more effectively, I highly recommend reading the paper The Unreasonable Effectiveness of Data (pdf), written by Alon Halevy, Peter Norvig, and Fernando Pereira of Google. Even more highly recommended is Peter Norvig’s presentation of the same name, given as part of a Distinguished Lecture Series at the University of British Columbia last fall, which sadly has fewer than 1,000 views on YouTube at the time of writing.
In the early days of Google, when you performed a search, the results you received were just links to pages found on the Web, showing page titles, snippets, and URLs. Google started adding other types of searches to its Web search, such as:
These launched as separate search repositories, but they weren’t going to stay that way, and they may never have been planned as purely standalone collections. At a Google event called Searchology in May 2007, Google announced Universal Search, which incorporated video, news, book, image, and local results into Web search results. According to the Official Google Blog post, the roots of Universal Search can be traced back to 2001, with a lot of effort leading up to its launch:
Over several years, with the help of more than 100 people, we’ve built the infrastructure, search algorithms, and presentation mechanisms to provide what we see as just the first step in the evolution toward universal search. Today, we’re making that first step available on google.com by launching the new architecture and using it to blend content from Images, Maps, Books, Video, and News into our web results.
How much might one page on a website influence the rankings of other pages? When I joined an agency in 2005, our focus was on rankings for individual pages: optimizing their content for specific terms and phrases, and making sure that they had links from other pages, both onsite and off. I found myself unable to color just within those lines. It was impossible to ignore the impact of site-wide issues when trying to optimize individual pages for terms, since every page on a site can affect how other pages on it are crawled, indexed, and displayed by search engines.
For example, if the home page of a site was accessible at multiple URLs, there was the very real risk that PageRank for that page could be split multiple ways, such as amongst:
Could improvements in how Google understands page layouts, and in how it classifies the different elements found on a page, be affecting the titles and snippets that we see in search results? Google may classify queries to decide which titles and snippets to show in search results, but it’s possible that it might also classify the contents of “original titles and snippets and URLs” when deciding whether to show different titles and expanded snippets. Might Google combine that with a classification of page elements (portions of HTML containing some text) on those pages to determine the best representation of a search result for a particular query?
Google May Choose Titles and Snippets for Pages
When you search at Google, each web page listed in the search results is shown with a title, a URL, and a snippet. The query terms you used, or sometimes synonyms for them, may appear in the title and snippet, and Google will highlight them. As a site owner, you should write unique and engaging titles and meta descriptions for each page you want indexed by search engines. Not only can that make it more likely that search engines will index and display those pages as distinct results, but if you use the keywords you’re optimizing those pages for within those titles and descriptions, Google may be more likely to show your chosen title and meta description in search results.
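As a rough illustration of the highlighting described above, the sketch below wraps query terms, plus any synonyms supplied for them, in bold tags within a snippet. The synonym table, the `highlight` function, and the use of `<b>` tags are assumptions made for this example; they are not Google’s implementation.

```python
import re

def highlight(text, query, synonyms=None):
    """Wrap query terms (and any listed synonyms) found in `text` in <b> tags.

    `synonyms` is a hypothetical mapping from a query word to alternative words.
    """
    synonyms = synonyms or {}
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms.update(s.lower() for s in synonyms.get(word, []))
    if not terms:
        return text
    # Try longer terms first so a multi-word synonym matches before its shorter parts.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = r"\b(" + alternatives + r")\b"
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)

snippet = "Find cheap flights and inexpensive hotels for your next trip."
print(highlight(snippet, "cheap hotels", synonyms={"cheap": ["inexpensive"]}))
# → Find <b>cheap</b> flights and <b>inexpensive</b> <b>hotels</b> for your next trip.
```

Matching on word boundaries keeps a term like “cheap” from lighting up inside an unrelated word such as “cheapskate.”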
Somewhere in an alternate universe, one of the most feared hitters in baseball might instead have been remembered as one of its greatest pitchers. Babe Ruth started out as a pitcher for the Boston Red Sox in 1914, and when his manager Ed Barrow was approached in 1918 about getting Ruth’s bat into the lineup every day, he responded, “I’d be the laughingstock of baseball if I took the best lefthander in the league and put him in the outfield.” Not long afterward, Ruth was sold to the New York Yankees for an unprecedented $125,000, and in 1920 he hit 54 home runs for them, beginning a pretty good career hitting a baseball instead of throwing it at people.
In 1920, anyone looking for information about the Babe probably wasn’t too interested in his pitching career. Likewise, when someone searches today for [world series champion], they are likely looking for fresh results. How does a search engine like Google determine when searchers might prefer fresh results, and when they might prefer older results?
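One plausible signal, offered here only as an assumption and not as Google’s documented method, is a burst in query volume: if far more people than usual are suddenly searching for something, they probably want recent results. The sketch below flags a query as time-sensitive when its average volume over the last few days sits well above its earlier baseline. The function name, thresholds, and counts are all hypothetical.

```python
from statistics import mean, pstdev

def is_time_sensitive(daily_counts, recent_days=3, threshold=3.0):
    """Flag a query as likely fresh-seeking when its recent volume bursts above its baseline.

    This burst heuristic is an assumption for illustration, not Google's documented method.
    daily_counts: query counts per day, oldest first; needs more than recent_days entries.
    threshold: how many standard deviations above the baseline mean counts as a burst.
    """
    baseline = daily_counts[:-recent_days]
    recent = daily_counts[-recent_days:]
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any rise at all counts as a burst.
        return mean(recent) > mu
    return (mean(recent) - mu) / sigma > threshold

# Hypothetical daily counts for a query like [world series champion].
quiet_weeks = [100, 102, 98, 101, 99, 100, 103, 97, 100]
print(is_time_sensitive(quiet_weeks + [101, 99, 100]))   # → False (steady interest)
print(is_time_sensitive(quiet_weeks + [400, 450, 420]))  # → True (spike after the final game)
```

Comparing against the query’s own history, rather than a fixed cutoff, lets the same rule work for both rare and very popular queries.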
Yesterday, I wrote about how Google may look at the semantics associated with HTML heading elements and the content they introduce, and how the search engine might compare such content with similar headings across the Web to determine how much weight to give words and phrases within those headings.
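A minimal sketch of that idea, under the assumption that a heading word deserves more weight when the content beneath it, across many pages, actually uses that word. The scoring rule, the `heading_word_weights` function, and the sample sections are hypothetical, and the whitespace tokenization is deliberately naive.

```python
from collections import defaultdict

def heading_word_weights(sections):
    """Score each heading word by how often the content under it actually uses that word.

    sections: (heading, body) pairs collected from many pages (hypothetical input).
    A word that usually reappears in the text beneath headings containing it is treated
    as descriptive and scores near 1.0; a word like "more" that rarely does scores near 0.0.
    """
    seen = defaultdict(int)       # sections whose heading contains the word
    supported = defaultdict(int)  # ...whose body text also contains the word
    for heading, body in sections:
        body_words = set(body.lower().split())
        for word in set(heading.lower().split()):
            seen[word] += 1
            if word in body_words:
                supported[word] += 1
    return {word: supported[word] / seen[word] for word in seen}

sections = [
    ("Tomato Care", "water each tomato plant deeply"),
    ("Tomato Varieties", "popular tomato varieties include brandywine"),
    ("More Info", "call our office for details"),
    ("More Resources", "links to other gardening sites"),
]
weights = heading_word_weights(sections)
print(weights["tomato"], weights["more"])  # → 1.0 0.0
```

A real system would need stemming, synonyms, and far more data; “care” scores 0.0 here only because the single body text under it happens not to repeat the word.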
That post was originally part of the introduction to this post, but it developed a life of its own, and I ran with it. Here, we’re going to look at semantics related to other HTML structures, primarily lists and tables.
I’m going to bundle a handful of patents together for this entry in my list of the 10 most important SEO patents, since I think they work together to illustrate how a search engine might use semantic structures to learn how words and concepts are related to each other on the Web. Some of these patents are older, and one is a pending patent application published this week. I’m also going to include a number of white papers that help define a process that seems to take place very much behind the scenes at Google. I’m going to focus on Google in this post, though I expect that similar things may be happening at other search engines as well.
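To show the general shape of learning relationships from semantic structures, here is a minimal sketch that pulls items out of HTML lists and counts how often two items appear in the same list across pages. It assumes that items sharing a list are related, which is a simplification rather than the process the patents describe. `ListItemParser` and `related_pairs` are hypothetical names, and tables are left out for brevity; the HTML parsing uses Python’s standard `html.parser` module.

```python
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations

class ListItemParser(HTMLParser):
    """Collect the text of <li> items, grouped by their enclosing <ul> or <ol>."""

    def __init__(self):
        super().__init__()
        self.lists = []        # finished lists, each a list of item strings
        self._stack = []       # lists currently open (handles nesting)
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
            self._in_item = False  # text of a nested list belongs to its own items
        elif tag == "li" and self._stack:
            self._in_item = True
            self._stack[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._stack:
            self.lists.append(self._stack.pop())
        elif tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item and self._stack and self._stack[-1]:
            self._stack[-1][-1] += data

def related_pairs(pages):
    """Count how often two items appear together in the same list across pages."""
    counts = Counter()
    for html in pages:
        parser = ListItemParser()
        parser.feed(html)
        parser.close()
        for items in parser.lists:
            cleaned = sorted({item.strip().lower() for item in items if item.strip()})
            counts.update(combinations(cleaned, 2))
    return counts

pages = [
    "<ul><li>Apple</li><li>Banana</li><li>Cherry</li></ul>",
    "<ol><li>banana</li><li>apple</li></ol><ul><li>Ford</li><li>Toyota</li></ul>",
]
counts = related_pairs(pages)
print(counts[("apple", "banana")])  # → 2
```

Pairs that keep turning up together across many pages, like the fruits here, would accumulate high counts, while items that never share a list, like “apple” and “ford,” never get counted at all.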