In March, one of Google’s more interesting patent filings was granted: Information retrieval based on historical data.
I had discussed it on forums when the original patent application came out in March of 2005, but didn’t provide a write-up of the document here. I realized a few weeks ago that I probably should.
The historical data patent is important because it discusses a large number of techniques that a search engine might use to fight “spamming techniques” that might artificially “inflate” the rankings of web sites. It also works to identify “stale” sites that may be ranked higher than fresher sites containing more recently updated information.
I’ll be writing a few posts over the next few weeks about the patent, and will try to include some updates that have happened since it was first published. This first post looks at how the “freshness” of a page or document might influence its rankings in search results.
Continue reading “Updating Google’s Historical Data Patent, Part 1 – Freshness”
Googlebot Extracting Table Data
The Web is filled with page after page after page of data. That data is usually organized differently from one site and one page to another, and contained in text, in pictures, in videos, in audio, in columns, in rows, in frames, and many other formats.
When a search engine spider comes to a page on the Web, it will try to go through all of the text it finds, make note of links to other pages, consider alt text for images, and view meta data tags.
Search engine spiders will decide whether or not the content of pages should be indexed by the search engine, and determine which links to follow next.
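To make that a little more concrete, here is a minimal sketch of the kind of pass a spider might make over a single page, using Python's standard `html.parser` module. The `PageScanner` class and `scan` function are names made up for this illustration, and the sketch is not a description of how Googlebot or any other real crawler works. It only collects the four things mentioned above: text, links, image alt text, and meta tags.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects what a simple spider might note on one page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.text = []      # text fragments found between tags
        self.links = []     # href values of <a> tags
        self.alt_text = []  # alt attributes of <img> tags
        self.meta = {}      # name -> content of <meta> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            self.alt_text.append(attrs["alt"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def scan(html):
    """Hypothetical helper: parse one HTML string and return the filled scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner
```

A real spider would then have to decide which of the collected links to queue up next and whether the page is worth indexing at all, and this sketch leaves both of those decisions out.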
Continue reading “Search Engines Extracting Table Data on the Web”
The ability to annotate pictures, songs, video, and web pages is growing as site owners attempt to find new ways to build communities and attract visitors to their sites.
Enabling people to add their thoughts, opinions, and words to a page means that a site can grow and evolve with the community it was built for, and become more interesting and valuable to other visitors.
Some very popular sites that allow for annotations include Flickr, Delicious, StumbleUpon, and YouTube, and the original intent behind Google may have been to build an annotation system. Knowing what other people are saying about a site or a photo or a video can be pretty interesting.
A number of questions can be raised about annotations:
- How do you encourage people to leave an annotation of their own?
- Will offering annotation suggestions help?
- If so, how do you rank the suggestions that you provide?
- Does the kind of device that people are using make a difference, such as a desktop computer or a handheld camera phone?
- Can personal history of past annotations help with future annotation suggestions?
- Can you build community with a site offering annotations, and can input from that community and from all members of a site be helpful in offering annotation suggestions?
- What role might annotations play in which results search engines return to searchers?
Continue reading “Creation and Ranking of Image Annotation Suggestions”
One of the challenges that a search engine like Google faces is that in order for it to be useful globally, it needs to provide search for an audience that speaks many different languages.
It’s not surprising, then, that the search engine has delved into learning as much as it can about many languages, and even provides a translation service (Google Translate) and a service that allows you to search for translated words and phrases in other languages (Google Language Tools).
Google also offers a Google Translate gadget that you can put on your pages to allow visitors to translate your page into their language.
Google has also worked to make its Machine Translation service mobile by making it available on iPhones.
Continue reading “Machine Translation at Google”
When you’re searching for something on the Web, does it matter whether you use the singular or plural version of a word in your search?
For example, let’s say that you are looking for a new pair of sneakers to go jogging in, and you want to find the right combination of comfort and support, so you decide to look into the best sneakers for running. Does it make a difference in search results when you type in running shoes or running shoe in a search box?
If a search engine just returned results to you based upon your choice of a singular or plural query, would you get the best results? Should a search engine explore both versions, and try to provide you with a mix of results based upon what it believes are the best results, after looking at results from the singular and plural queries?
A quick look at the top ten results at Yahoo and at Google for both “running shoe” and “running shoes” (both searches without the quotation marks) showed some overlap in pages returned for singular and plural versions at each search engine, but the vast majority of search results seemed to favor the plural version of the word over the singular version.
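The idea of exploring both versions of a query and mixing the results can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: `number_variants` uses a naive "toggle the trailing s" rule that would fail on many English words, `blend` simply interleaves two ranked lists, and neither function name nor approach comes from any search engine. A real system would presumably weigh the two result sets rather than alternate between them.

```python
from itertools import zip_longest


def number_variants(query):
    """Return the query plus a naive singular/plural variant of its last word."""
    words = query.split()
    if not words:
        return []
    last = words[-1]
    # Naive English rule, for illustration only: toggle a trailing "s".
    if last.endswith("s") and len(last) > 1:
        other = last[:-1]
    else:
        other = last + "s"
    return [query, " ".join(words[:-1] + [other])]


def blend(results_a, results_b):
    """Interleave two ranked lists of URLs, keeping the first occurrence of each."""
    merged = []
    seen = set()
    for pair in zip_longest(results_a, results_b):
        for url in pair:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

With this sketch, `number_variants("running shoes")` gives both "running shoes" and "running shoe", and the ranked results for each could be passed to `blend` to produce a single list with duplicates removed.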
Continue reading “How a Search Engine Might Handle Singular and Plural Queries”
What are Stopwords?
When someone searches at one of the major search engines, they often type in keyword phrases, composed as if they were written for human readers. Those phrases may contain words or phrases that show up very frequently in pages on the web, and may have little to do with the information being sought by the searcher.
Search engines that focus upon retrieving search results based upon keywords found in queries have often ignored those frequently appearing and irrelevant words contained within search query terms.
Those words have been referred to in the past by Google as “stopwords,” and could be words like: a, and, is, on, of, or, the, was, with. Similar groups of words that appear very commonly on web pages, and are also unconnected to an actual search could be referred to as “stop-phrases.”
The word “a” in the query “a London hotel” is a stopword.
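As a rough sketch of what ignoring stopwords could look like, the Python below filters the example words listed above out of a query. The `STOPWORDS` set holds just those nine example words, not a list used by any search engine, and `strip_stopwords` is a hypothetical helper. The fallback for queries made up entirely of stopwords is my own assumption, added so that such a query is not reduced to nothing.

```python
# Example stopwords taken from the list above; not an actual search engine list.
STOPWORDS = {"a", "and", "is", "on", "of", "or", "the", "was", "with"}


def strip_stopwords(query):
    """Return the words of a query with the example stopwords removed."""
    words = query.split()
    kept = [word for word in words if word.lower() not in STOPWORDS]
    # Assumption: if every word is a stopword, keep the query as typed.
    return kept if kept else words
```

Run on the query “a London hotel”, this keeps only “London” and “hotel”.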
Continue reading “Google Stopwords and Stop-Phrases”