Google was granted a patent today from the USPTO on Universal Search, which provides searchers with a mix of search results from different categories, such as news, images, advertisements, and web pages, when they type in a search query.
The original patent application was filed on December 31, 2003, and Google announced the introduction of Universal Search in May of 2007. The patent describes several categories of documents that may be shown in search results, such as:
- Sponsored links,
- News documents,
- Product documents,
- Documents summarizing discussion groups,
- General web documents, and
- Other document classifications.
The Official Google Blog described a few more categories that could be shown to searchers in their announcement, Universal search: The best answer is still the best answer, including Maps, Books, Video, as well as additional contextual links to other categories of documents such as “blogs,” “books,” “groups,” and “code.”
Continue reading Google Universal Search Patent Granted
If you look at a typical page that shows up after you perform a search at one of the major commercial search engines, you’ll see that those search result pages don’t differ too much from each other.
Some sets of search results do include news, images, maps, and other results that go beyond just a list of web pages that may contain the keywords used in a search.
But, how interested would you be in entering the address of a web page and seeing related search queries for that page, or related people or places or other pages?
Inversion Searches Showing Related Queries
This kind of search, referred to as an “inversion search” by some Microsoft inventors, is the topic of a new patent application from the Washington-based search provider.
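One plausible way to picture an inversion search is to start from ordinary query logs, which map queries to the pages searchers clicked for them, and turn that mapping inside out. The sketch below is only an illustration of that inversion step; the log data and its structure are hypothetical, not taken from the patent filing.

```python
from collections import defaultdict

# Hypothetical query log: each search query maps to the URLs that
# searchers clicked for it. The patent's actual data sources may differ;
# click logs are simply one plausible input.
query_log = {
    "seattle coffee shops": ["example.com/cafes", "example.com/seattle"],
    "best espresso seattle": ["example.com/cafes"],
    "seattle weather": ["example.com/seattle"],
}

def invert_log(log):
    """Invert a query -> URLs mapping into a URL -> related queries mapping."""
    related = defaultdict(list)
    for query, urls in log.items():
        for url in urls:
            related[url].append(query)
    return dict(related)

related_queries = invert_log(query_log)
print(related_queries["example.com/cafes"])
# → ['seattle coffee shops', 'best espresso seattle']
```

Entering a URL rather than a keyword would then surface the queries (and, by extension, the people, places, and pages) associated with that address.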
Continue reading How a Search Engine Might Provide Searchers with Related Search Queries For Web Pages
Web pages can be messy: they can cover more than one topic, and use templates that surround those topics with links and labels, advertising and boilerplate, copyright and other notices, adding little meaning to the meat of the content.
With a diversity of topics, those pages may not be easily crawled and recorded and indexed and found by search engines and searchers.
When we think of search engines and how they work, we often break what they do down into three main parts – discovering new pages and new content on old pages, indexing content on those pages following rules that show a preference for important pages and unique content, and presenting relevant and meaningful information to searchers and their intents (or at least matching their keywords) in response to queries that they enter into a search box.
We usually don’t think of search engines as indexing parts of pages, chunks of information that might exist side-by-side with very different topics, and yet many pages are messy like that.
But we’ve had signs from the white papers and patent filings that we see from search engineers, that they might try to segment and capture information about different topics found on the same page.
Continue reading Microsoft Granted Patent on Vision-Based Document Segmentation (VIPS)
Come gather ’round people
Wherever you roam
And admit that the waters
Around you have grown
And accept it that soon
You’ll be drenched to the bone.
If your time to you
Is worth savin’
Then you better start swimmin’
Or you’ll sink like a stone
For the times they are a-changin’.
– Bob Dylan
Can the rate of change upon web pages influence how Google might rank pages of a site?
In part one of this series, I looked at how Google’s patent on Information retrieval based on historical data focused upon Freshness.
This second part of the series explores how Google might look at content changes on Web pages, and how the frequency of those changes might influence how those pages may rank at the search engine. Keep in mind that we don’t know for certain whether Google is even using the processes described in this patent. But it is a possibility.
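As a rough illustration of what a change-frequency signal might look like, the snippet below computes the average interval between dates at which a crawler observed a page's content change. The patent describes many possible signals; this particular formula is my own simplification, not one taken from the filing.

```python
from datetime import date

def avg_change_interval(change_dates):
    """Average number of days between observed content changes.

    Shorter intervals suggest frequently updated content; a ranking
    system might treat that differently for news sites than for
    reference pages.
    """
    ordered = sorted(change_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

dates = [date(2008, 1, 1), date(2008, 1, 11), date(2008, 1, 31)]
print(avg_change_interval(dates))
# → 15.0
```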
Continue reading Updating Google’s Historical Data Patent, Part 2 – Changing Content
Before becoming a co-founder of the new search engine Cuil, Anna Lynn Patterson worked at Google on a way of looking at how often different phrases appeared together on pages on the Web, described in a series of patent applications which share a common description, with different claims sections that itemize different parts of that description.
I summarized the description from one of the patent filings in my post from December 29, 2006, in Phrase Based Information Retrieval and Spam Detection.
One of the patent applications from that series, Automatic taxonomy generation in search results using phrases, which I hadn’t originally come across back in 2006, was granted today, and covers the idea of taking documents that share related phrases, and clustering them with the related phrases to provide search results that might cover a range of categories related to search queries.
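The clustering idea can be sketched in a few lines. Assume each document has already been annotated with the "good phrases" it contains (the hard part of the patent filings); grouping documents under each phrase they share then lets one query return results spanning several phrase clusters. The documents and phrases below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical documents, each tagged with the related phrases
# identified in it during indexing.
doc_phrases = {
    "doc1": {"president of the united states", "white house"},
    "doc2": {"president of the united states"},
    "doc3": {"president of bolivia"},
}

def cluster_by_phrase(doc_phrases):
    """Group documents under every phrase they contain."""
    clusters = defaultdict(set)
    for doc, phrases in doc_phrases.items():
        for phrase in phrases:
            clusters[phrase].add(doc)
    return dict(clusters)

clusters = cluster_by_phrase(doc_phrases)
print(sorted(clusters["president of the united states"]))
# → ['doc1', 'doc2']
```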
Continue reading Google Phrase Based Indexing Patent Granted
Images on a web page can provide a chance to express ideas in a visual way that can convey a considerable amount of information, and may also add to the attractiveness and perceived quality of a site.
When search engines rank pages in search results, images may have some impact in those rankings.
A search engine might look at the captions associated with pictures, or at alt text provided as an alternative for people who browse the Web with images turned off or who use screen reading software.
Search engines might also look at text surrounding an image, especially within the same HTML container, or block or segment.
Those indexing services could also associate other content on a page with an image, including the page’s title.
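A simple way to picture how those signals come together is a function that gathers the textual context for an image into one indexable record. The field names below are illustrative, not drawn from any patent; a real indexer would likely also weight each field differently (alt text and captions more heavily than the page title, for instance).

```python
def image_index_text(alt_text, caption, surrounding_text, page_title):
    """Collect the textual signals associated with an image.

    Returns a dict of non-empty, lowercased fields that an indexer
    could score against image-related queries.
    """
    signals = {
        "alt": alt_text,
        "caption": caption,
        "nearby": surrounding_text,
        "title": page_title,
    }
    return {field: text.lower() for field, text in signals.items() if text}

terms = image_index_text(
    alt_text="Golden retriever puppy",
    caption="Our new puppy at the park",
    surrounding_text="We adopted a golden retriever last spring.",
    page_title="Family Photos 2008",
)
print(terms["alt"])
# → golden retriever puppy
```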
Continue reading How Search Engines May Use Images to Rank Web Pages
In March, one of the more interesting patent filings from Google was granted, Information retrieval based on historical data.
I had discussed it on forums when the original patent application came out in March of 2005, but didn’t provide a write-up of the document here. I realized a few weeks ago that I probably should.
The historical data patent is important because it discusses a large number of techniques that a search engine might use in fighting “spamming techniques” that might artificially “inflate” the rankings of web sites, along with ways to identify “stale” sites that may be ranked higher than fresher sites containing more recently updated information.
I’ll be writing a few posts over the next few weeks about the patent, and try to include some updates that have happened since it was first published. This first post looks at how the “freshness” of a page or document might influence its rankings in search results.
Continue reading Updating Google’s Historical Data Patent, Part 1 – Freshness
The Web is filled with page after page after page of data. That data is usually organized differently from one site and one page to another, and contained in text, in pictures, in videos, in audio, in columns, in rows, in frames, and many other formats.
When a search engine spider comes to a page on the Web, it will try to go through all of the text it finds, make note of links to other pages, consider alt text for images, and view meta data tags.
Search engines spiders will decide whether or not the content of pages should be indexed by the search engine, and determine which links to follow next.
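As a minimal sketch of the table-mining side of this, the parser below pulls rows out of an HTML table, one of the structured formats a spider might encounter. It is deliberately simplified: the sample markup is invented, and a real extraction system would also handle header detection, row and column spans, and nested tables.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect an HTML table's cell text as a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # start a new row
        elif tag in ("td", "th"):
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1].append(data.strip())

markup = ("<table><tr><th>City</th><th>Population</th></tr>"
          "<tr><td>Seattle</td><td>563,374</td></tr></table>")
extractor = TableExtractor()
extractor.feed(markup)
print(extractor.rows)
# → [['City', 'Population'], ['Seattle', '563,374']]
```

Once rows and headers are recovered like this, the data can be indexed as structured facts rather than undifferentiated page text.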
Continue reading Search Engines Extracting Table Data on the Web