What is the most important part of a page? If a page has images on it, what images are the most important ones?
If a search engine were to try to understand whether or not any images on the pages of a site were related to each other, how would it go about figuring that out?
The first two questions are easy to answer – the most important part of a page is the part that visitors focus upon when they look at it. The most important images are the ones that people look at and pay attention to when they are on that page.
A newly granted patent from Microsoft tries to answer all three questions in an automated manner. It breaks a page down into blocks and assigns each block a level of importance relative to the others, based on the probability that a user will focus upon that block (or upon images within it) when looking at the page.
It might consider the importance of one block relative to another on the same page, and on other pages within the same site, by looking at links between the blocks on those pages. It might check whether images are within the same blocks or related blocks, and also look for links to images from different blocks to see if and how images might be related.
Continue reading Microsoft Playing with Blocks to Understand How Images Might be Related
If you own a web site, how do you measure the way that people interact with your site? What data do you look at, how do you analyze it, and what do you do with that analysis?
The topic is becoming a popular one on the Web, and I have links below to some articles on the subject that I thought were pretty interesting. I was inspired to collect those links after looking at a patent filing from Yahoo that describes some of the methods that they might use to try to understand how engaged people are with their web properties.
The patent application is Techniques for measuring user engagement, and the listed inventors are Francesca M. Soito and Nitin Sharma (who appears to have now moved to Google).
User Engagement Variables
Continue reading Measuring User Engagement, with Examples from Yahoo
When a search at a search engine includes a person’s name, or the name of a particular place, or a book, or a band, or an album, there might be some confusion as to which person (or place or thing) is being searched for.
Case in point: there’s a well-known race car driver by the name of Danny Sullivan. There’s also a well-known journalist who writes about the search industry by the name of Danny Sullivan.
Continue reading Google on Using a Knowledge Base of Articles to Make Searches Smarter
Google has unveiled an approach to determining authority pages for query terms and business locations and categories on a site, and making other pages on the same site more relevant for that information, even if it isn’t mentioned on those other pages.
Are there authority pages on the Web for some search terms or business locations or categories?
Can it be helpful for the content and categories of some pages on a website to be imputed to other pages of that site, so that those pages rank higher in search results?
Continue reading Google Determining Search Authority Pages and Propagating Authority to Related Pages
If you have a web site that classifies products, services, or pages into different areas, and your offerings might appear in a shopping search engine or another service that draws information from multiple web sites, how you classify what you offer may play a role in how that service classifies your products, services, or pages, or creates new classifications for them, when it displays them.
A Yahoo patent application describes an automated process in which items assigned to different sets of categories can be mapped into other, broader categorization schemes.
These broader category schemes could be for product search, for advertisements, for user-tagged items such as photos, for services such as job listings, as well as other areas where there are many web sites that have their own unique categorization systems.
The Value of Categories
Continue reading Search Engines, Classifications, and Assignment of Categories
A newly published Yahoo patent application describes a couple of ways to filter out some of the URLs that it might crawl, to keep those pages from being indexed and presented to searchers.
Those URLs are referred to in the patent filing as “transient” links because they change from visit to visit, often because they are advertisements that have URLs with tracking codes included within them, or contain session IDs to track visitors.
An approach is provided for identifying transient links on a Web page. The approach ensures that transient links are not crawled and archived, thereby saving resources for crawling valid links leading to useful information.
Outgoing links on a web page are identified, and after a period of time, a new copy of the web page is obtained and the outgoing links identified. The respective sets of links are compared and links which do not appear in both sets of links are identified as transient.
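The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method from the patent filing: the names `LinkCollector`, `outgoing_links`, and `split_transient` are hypothetical, the two HTML strings stand in for copies of a page fetched at different times, and links are compared as raw `href` strings without any URL normalization.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def outgoing_links(html):
    """Return the set of link targets found in one copy of a page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def split_transient(first_html, second_html):
    """Compare two copies of a page fetched at different times.

    Links present in both copies are treated as stable; links that
    appear in only one copy are treated as transient.
    """
    first = outgoing_links(first_html)
    second = outgoing_links(second_html)
    stable = first & second
    transient = first ^ second  # in one set but not in both
    return stable, transient
```

Given two copies of a page where an advertisement link carries a different session ID on each visit, only the link that stays the same would end up in the stable set, and both versions of the ad link would be flagged as transient.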
Continue reading A Yahoo Approach to Avoid Crawling Advertisement and Session Tracking Links
When a judge looks at evidence entered into court, he weighs a number of factors. One of them is whether the evidence offered is relevant to the case at hand.
Another is how important that evidence might be.
Now, a piece of evidence by itself doesn’t have to be groundbreaking to be important. But testimony about the character of a 40-year-old defendant in a criminal proceeding, given by his third grade teacher, may be somewhat relevant but is probably not all that important.
Importance Scores in Search Engines
When a search engine ranks pages for a set of search results, it also usually looks at two distinct types of calculations, which it combines to serve pages to searchers. Those scores likewise focus upon how relevant a result might be to the query entered, and how important that page or picture or video might be.
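As a rough illustration of combining the two kinds of scores, here is a small Python sketch. The weighted sum, the `weight` parameter, and the function names `combined_score` and `rank` are all assumptions made for the example; the excerpt does not say how any search engine actually blends relevance and importance.

```python
def combined_score(relevance, importance, weight=0.5):
    """Blend a query-dependent relevance score with a query-independent
    importance score. The linear blend is an assumption for illustration."""
    return weight * relevance + (1 - weight) * importance


def rank(pages, weight=0.5):
    """Sort pages from highest to lowest combined score.

    Each page is a dict with hypothetical 'url', 'relevance', and
    'importance' keys, with both scores on the same 0 to 1 scale.
    """
    return sorted(
        pages,
        key=lambda page: combined_score(
            page["relevance"], page["importance"], weight
        ),
        reverse=True,
    )
```

With equal weighting, a moderately relevant but highly important page can outrank a highly relevant but obscure one; setting `weight` to 1.0 ranks on relevance alone.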
Continue reading An Expansion of Importance Scores for Web Page Rankings?
I discussed one of the more interesting patent applications from Google last year in Google looks at multi-stage query processing. What made it so intriguing was that it described different stages and aspects of ranking results by the search engine.
A related patent application published this week, Document compression system and method for use with tokenspace repository, goes back to that multi-staged query processing system, and makes claims for some of the more technical aspects of how information is contained within the indexes used during that process.
The abstract for the patent filing provides a high level look at some of the techniques used:
The disclosed embodiments enable multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme.
Continue reading Google on Multi-Tiered Indexing and Multi-Staged Query Processing