Google Patent Granted on PageRank Sculpting and Opinion Passing Links

In 2005, Google filed for a patent that could have transformed how we think about and use links. It describes letting webmasters decide how much PageRank a link passes along, and applying machine-readable labels to links to indicate that they might lead to “offensive” content (“offensive=very”) or “funny” information (“funny=somewhat”), or to note where on a page the destination of a link might appear, such as in a footer or main content area. The patent also includes a method to encrypt the content of some links, so that only certain people can access the information those links lead to. The patent was granted this week.

When Tim Berners-Lee wrote Links and Law back in 1997, as a commentary on the architecture of the Web, one of the statements he included was that “The intention in the design of the web was that normal links should simply be references, with no implied meaning.” Before 2005, if you surveyed the links you came across on the Web, you’d often see a combination of anchor text describing the destination of those links and the actual URLs of the links in question, but not much in the way of “opinion” about those destinations. At least, nothing within the links that a computer program or a search engine could easily pick up on.

Since 2005, we’ve seen additions to the way that links can be written that do express opinions that search engines can act upon. In an effort to help stop comment spam on blogs, Google, Yahoo, and Microsoft all agreed not to pass along PageRank or link value to the destination of any link that includes a rel="nofollow" attribute.
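As a rough illustration of how a program might read that kind of machine-readable hint, here is a minimal Python sketch that pulls the links out of a snippet of HTML and notes which ones carry a nofollow value. It is not how any search engine actually processes links; the class name, function name, sample comment, and example URLs are all made up for this sketch.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel holds space-separated tokens, e.g. rel="external nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def find_links(html):
    """Return a list of (href, is_nofollow) tuples found in the HTML string."""
    parser = LinkRelParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical blog comment: one nofollow link, one ordinary link.
SAMPLE_COMMENT = (
    '<p>Nice post! <a href="http://example.com/" rel="nofollow">my site</a> '
    'and <a href="http://example.org/">a trusted source</a></p>'
)

links = find_links(SAMPLE_COMMENT)
for href, is_nofollow in links:
    print(href, "nofollow" if is_nofollow else "followed")
```

The rel value is split into tokens because the attribute can hold more than one value, so a simple string comparison against "nofollow" would miss links written as rel="external nofollow".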

How Google Might Rank User Generated Web Content in Google + and Other Social Networks

One of the challenges that search engines face is how to rank content found on sites that rely upon users to create that content, often referred to as User Generated Content or UGC. Towards the end of 2009, I wrote about a Yahoo patent that described some of the things they might consider when ranking UGC, in the post How Search Engines May Rank User Generated Content.

With Google’s recent launch of Google Plus, I’m anticipating posts and comments from their new social network system to start appearing in Google Web search results sometime soon.

A Google patent application published this past May at the World Intellectual Property Organization (WIPO) describes possible signals that Google might consider in its Web search results when it displays and ranks images and videos on photo and video sharing sites, questions and answers on Q&A sites, forum posts and responses, blog posts and comments, and social network posts, status updates, and comments. It was originally filed on October 29, 2009, but it looks like a system that could be used with Google + without too many modifications. The patent filing hasn’t been published yet at the US Patent and Trademark Office.


Google’s Second Most Important Algorithm? Before Google’s Panda, there was Phil

They named the project Phil, because it sounded friendly. (For those who required an acronym, they had one handy: Probabilistic Hierarchical Inferential Learner.) That was bad news for a Google Engineer named Phil who kept getting emails about the system. He begged Harik to change the name, but Phil it was.

Steven Levy, In The Plex: How Google Thinks, Works, and Shapes Our Lives.

How does Google decide which AdSense advertisements to show on which Web pages? How do they avoid showing inappropriate advertisements on those content pages? How does the document classification system they use to power those decisions work, and has its use been expanded beyond Google’s advertising system?

A screenshot of an interface from the patent Categorizing objects, such as documents and/or clusters, with respect to a taxonomy and data structures derived from such categorization, that shows how someone might discover which categories a website might be included within.


How A Search Engine Might Classify Web Pages as Sensitive

Given the Panda updates from Google, I’ve been spending a fair amount of time looking at how search engines might use automated programs to classify webpages, and how they use those classifications. If you’re a web publisher, it’s the kind of thing that you might be interested in as well. If you display ads, what does Google think of where and how you present them? How do your choice of colors, font styles and sizes, number of columns, size of headings and footers, inclusion of about pages and privacy policies, and other features on your site influence how Google might perceive, classify, and score your pages?

One example of a problem where classification of pages might be helpful to a search engine is described in Steven Levy’s book about Google, In The Plex. The author tells us about some Google AdSense gaffes that show the challenges in automating the matching of advertisements with the pages they are displayed on. One particularly offensive match was a Google ad for plastic bags showing on a news page about a grisly murder where the victim’s body was disposed of in plastic trash bags. Ads for air travel tickets might be placed on a page about plane crashes. A coupon offering a free dinner for two at a particular chain restaurant appeared on the same page as an article about a number of people who had dined at a restaurant in that chain and suffered from food poisoning. The author notes:

Google Engineers started working on ways to mitigate this problem, but it would never be eliminated. It was just too hard for an algorithm trained to discover matches between articles and ads to exercise human good taste.


Google As a Social Search Engine: Aardvark Answers & Circle Posts in Google Search Results?

A Google patent application published in early May explains why Google might start showing social answers in Google search results. The basic premise is that some types of questions are best answered by library-type results, and others by a village-paradigm approach to information retrieval. In a village, people disseminate knowledge socially, with information passed from person to person, and retrieving information involves finding the right person, as opposed to the right document.

Here’s how a social answer to a query might appear in Google’s search results:

A screenshot from the patent that shows a social answer to a query about [san francisco hotels pets] in Google search results.
