PageRank is a measure of the probability that someone who starts out on any page on the Web, randomly clicks the links they find on pages, and every so often gets bored and teleports (yes, that is official technical search engineer jargon) to a random page will eventually end up at a specific page.
Larry Page referred to this person clicking on links as a “random surfer.” Thing is, most people aren’t so random. It’s not as if we stand at some street corner somewhere and just randomly set off in some direction. (OK, I confess that I do sometimes do just that, especially when faced with a sign like the one below.)
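The random surfer model described above is simple enough to simulate directly. Here’s a minimal sketch, in Python, of a surfer who follows a random outgoing link on each step, or teleports to a random page with some small probability (the page names and link graph are made up for illustration; the 0.15 teleport probability corresponds to the commonly cited 0.85 damping factor, not to any value confirmed here):

```python
import random

random.seed(0)  # make the simulation repeatable

def random_surfer(links, start, steps=100_000, teleport=0.15):
    """Simulate the random surfer: on each step, either teleport to a
    random page (with probability `teleport`, or when stuck on a page
    with no outgoing links) or click a random link on the current page.
    Returns the fraction of steps spent on each page -- an estimate of
    that page's PageRank."""
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = start
    for _ in range(steps):
        visits[current] += 1
        if random.random() < teleport or not links[current]:
            current = random.choice(pages)            # bored: teleport anywhere
        else:
            current = random.choice(links[current])   # click a random link
    return {page: count / steps for page, count in visits.items()}

# A tiny hypothetical web: each page mapped to its outgoing links.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
ranks = random_surfer(web, start="a")
```

Run long enough, the visit fractions settle toward each page’s PageRank: in this toy graph, page “b” ends up with the lowest score because only one page links to it, while “a” and “c” reinforce each other.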
Imagine someone from Google waking up in the middle of the night, with the thought, “Hmmmm. Maybe we’re not doing PageRank quite right. Maybe we should be doing things like paying attention to where links appear on a page, and other things as well.”
In the earlier days of SEO, many search engine optimization consultants stressed placing important and valuable content toward the top of the HTML code on a page, based upon the idea that search engines would weigh prominent content more heavily if it appeared early in a document. There are still very well-known SEO consultants whose sites describe a “table trick” for using tables to move the main body content of a page above the sidebar navigation within the HTML. I’ve also seen a similar trick done with CSS absolute positioning, where less important content appears higher on the page that visitors actually see, but lower in the HTML code for that page.
Back in 2003, the folks at Microsoft Research Asia published a paper titled VIPS: a Vision-based Page Segmentation Algorithm. The abstract for the paper describes the approach, telling us that:
A new web content structure analysis based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on his visual perception. Comparing to other existing techniques, our approach is independent to underlying documentation representation such as HTML and works well even when the HTML structure is far different from layout structure.
Imagine gathering together 10 extremely knowledgeable search engineers, locking them into a room for a couple of days with walls filled with whiteboards, with the intent of having them brainstorm ways to limit stale content and web spam from ranking highly in search results. Add to their challenge that the methods they come up with should focus upon “the nature and extent of changes over time” to web sites. Once they’ve finished, then imagine taking what appears on those whiteboards and condensing it into a patent.
The end result would likely look like Google’s patent Information Retrieval based on Historical Data. When this patent was originally published as a pending patent application awaiting prosecution and approval back on March 31, 2005, it caused quite a stir in the SEO community. Here are a few of the many reactions in forums and blog posts as a result:
I like looking at patents, whitepapers, and other primary sources from search engines to help me in my practice of SEO. I’ve been writing about them for more than 5 years now, and am putting together this series on the 10 most important SEO patents to share some of what I’ve learned during that time. These aren’t patents about SEO, but rather ones that I would recommend to anyone interested in learning more about SEO by looking at patents from sources like Google, Microsoft, or Yahoo.
The first PageRank patent application was never published by the United States Patent and Trademark Office (USPTO), was never assigned to a particular company or organization, and was never granted. It avoids the dense legal language and mathematics that can make reading patents difficult, and it captures the excitement of a Ph.D. candidate, Larry Page, who had just come up with a breakthrough in indexing webpages that had the potential to be a vast improvement over the other search engines of its time.
The decision process that you go through when deciding to make changes to your site can be tough. Even if those changes are necessary, determining the best way to implement them can make you pause and spend a lot of time considering all the potential alternatives you might have. You can do a cost/benefit analysis, where you consider how much change you might make to your site, what the benefits of making that change might be, and what the costs might be both in making the change and in deciding not to.
It shouldn’t require much thought to do things like make your website more usable, but it can, especially if the changes you make alter the look and feel of your pages and the way that people interact with them. A good example is the set of changes taking place at Google, where the search engine has implemented a number of new design elements over the past year or so: new colors and formatting on its search results pages, a different look for local search results presented within Web search results, URLs now appearing under page titles and above snippets for pages, and Instant Previews, which show a thumbnail of a page and call out boxes of text showing where query terms appear within those thumbnails.
On the subject of those Instant Previews, one of the challenges that search engines face is presenting web pages returned from a search in a way that helps searchers locate the information they want to find. A typical search result for a web page includes a page title, a URL for the page, and a short snippet that might be taken from a meta description or from text found on the page itself. A searcher is shown a page filled with these document representations to choose from, but sometimes that’s not enough to decide which page to click through to.
If you’ve never used Twitter before, it can be a little intimidating when you’re first starting out. You’re faced with a message on the front page of the site telling you to “Follow your interests,” and promising “instant updates from your friends, industry experts, favorite celebrities, and what’s happening around the world.”
Then you sign up, and you’re faced with an empty text box with a question above it asking you “What’s Happening?” You have no friends added yet, you’re not following any industry experts or favorite celebrities, and there’s no news about what’s happening around the world. But you might see tweets in more languages than just English, according to a whitepaper presented last month.
The site does have ways to help you search for and find people to follow and interact with, and will recommend people to follow in a few places, but trying to figure out exactly what to say in that box that asks “what’s happening” isn’t that easy. I remember spending more than a couple of days trying to figure that out myself.
Are you a robot? A spammer? A sock puppet? A trusted author and content developer? A trusted agent in the eyes of Google? (More on trusted agents below.)
When you interact on a social network, or write a review online or update information to an internet mapping service, how much does the service you are using trust the content that you add, or the changes that you might make?
These aren’t rhetorical questions, but rather ones at the heart of approaches from services like Google Web search and Google Maps, which are focusing more and more upon social signals and social collaboration to provide the information that they do to the public.
If you’ve seen a +1 button within Google’s search results or on a site, and you’ve clicked upon it, or shared a page or post or site in Google Plus with others, you’ve engaged in endorsing the work of the author who created that site. How much weight does Google give that endorsement?
If you find an error on a Google Place page, such as an incorrect phone number or bad street address, and you take the time to try to correct that, what process might Google go through to decide if you’re telling the truth?
A number of years back, I remember being humbled by a crayon drawing, done as a homework assignment by a friend’s son, that listed what he was thankful for that Thanksgiving: his parents, his sister, and his shoes. We take so much for granted that we should be thankful to have. A few friends and I had gathered at my friend’s house, and we were all knocked somewhat silent by the picture when he proudly showed it off to his father. Thank you to everyone who stops by here to read, to learn, to share, and to add to the discussion. Thank you, too, for the chance to share the things I find and the things that I learn from you all.
On Monday, I wrote about a recently granted patent from Google that described How Human Evaluators Might Help Decide Upon Rankings for Search Results at Google. Interestingly, this week Google was granted a patent that describes an automated method they might use to check the quality of specific sets of search results.
When Google responds to a searcher’s query, it presents a list of pages and other kinds of documents, such as images, news, or videos. The patent was filed before Google launched universal search, but it probably does a good job of describing something Google might do with web-page-based search results.