Images on a web page offer a chance to express ideas visually, can convey a considerable amount of information, and may also add to the attractiveness and perceived quality of a site.
When search engines rank pages in search results, images may have some impact on those rankings.
A search engine might look at the captions associated with pictures, or at alt text provided as an alternative for people who browse the Web with images turned off or who use screen reading software.
Search engines might also look at text surrounding an image, especially within the same HTML container, or block or segment.
Those indexing services could also associate other content on a page with an image, including the page’s title.
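As a rough illustration of the signals described above, the following Python sketch gathers the alt text of each image along with the page’s title. It is only an assumption about how such text might be collected, not a description of any search engine’s actual method; the names `ImageSignalParser` and `image_signals` are hypothetical, and captions and surrounding text are left out to keep it short.

```python
from html.parser import HTMLParser


class ImageSignalParser(HTMLParser):
    """Collects the page title and, for each image, its src and alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []  # one {"src": ..., "alt": ...} dict per <img>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            # A missing or valueless attribute is stored as an empty string.
            self.images.append({
                "src": attrs.get("src") or "",
                "alt": attrs.get("alt") or "",
            })

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def image_signals(html):
    """Return one record per image, pairing its alt text with the page title."""
    parser = ImageSignalParser()
    parser.feed(html)
    parser.close()
    return [{"title": parser.title.strip(), **img} for img in parser.images]
```

Calling `image_signals` on a page’s HTML returns a list of small dictionaries, one per image, that could then be handed to whatever hypothetical ranking step comes next.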
Continue reading How Search Engines May Use Images to Rank Web Pages
In March, Google was granted one of its more interesting patent filings, Information retrieval based on historical data.
I had discussed it on forums when the original patent application came out in March of 2005, but didn’t provide a write-up of the document here. I realized a few weeks ago that I probably should.
The historical data patent is important because it discusses a large number of techniques that a search engine might use in fighting “spamming techniques” that might artificially “inflate” the rankings of web sites, and it works to identify “stale” sites that may be ranked higher than fresher sites that might contain more recently updated information.
I’ll be writing a few posts over the next few weeks about the patent, and will try to include some updates on what has changed since it was first published. This first post looks at how the “freshness” of a page or document might influence its rankings in search results.
Continue reading Updating Google’s Historical Data Patent, Part 1 – Freshness
The Web is filled with page after page after page of data. That data is usually organized differently from one site and one page to another, and contained in text, in pictures, in videos, in audio, in columns, in rows, in frames, and many other formats.
When a search engine spider comes to a page on the Web, it will try to go through all of the text it finds, make note of links to other pages, consider alt text for images, and view meta data tags.
Search engine spiders will decide whether or not the content of pages should be indexed by the search engine, and determine which links to follow next.
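A minimal sketch of that kind of pass over a page might look like the Python below. It assumes a very simple crawler that reads visible text, image alt text, links, and meta tags, and that decides about indexing and link following only from a robots meta tag. The names `PageScanner` and `scan_page` are hypothetical, and real spiders weigh many more signals than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Gathers text, alt text, links, and named meta tags from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.meta = {}
        self.text = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own address.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.text.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def scan_page(html, base_url):
    """Return what a simple spider might record about a page."""
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    # Assumption: the only indexing rule applied is the robots meta tag.
    robots = scanner.meta.get("robots", "").lower()
    return {
        "index": "noindex" not in robots,
        "follow": [] if "nofollow" in robots else scanner.links,
        "text": " ".join(scanner.text),
        "meta": scanner.meta,
    }
```

The returned dictionary separates the two decisions mentioned above: `index` says whether the content would be kept, and `follow` lists the links that would be queued next.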
Continue reading Search Engines Extracting Table Data on the Web
When you’re searching for something on the Web, does it matter whether you use the singular or plural version of a word in your search?
For example, let’s say that you are looking for a new pair of sneakers to go jogging in, and you want to find the right combination of comfort and support, so you decide to look into the best sneakers for running. Does it make a difference in search results when you type in running shoes or running shoe in a search box?
If a search engine just returned results to you based upon your choice of a singular or plural search term, would you get the best results? Should a search engine explore both versions, and try to provide you with a mix of results based upon what it believes are the best results, after looking at results from the singular and plural versions?
A quick look at the top ten results at Yahoo and at Google for both “running shoe” and “running shoes” (both searches without the quotation marks) showed some overlap in pages returned for singular and plural versions at each search engine, but the vast majority of search results seemed to focus upon returning results for the plural version of the word, instead of the singular version.
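One way to picture a search engine exploring both versions is the Python sketch below. It is an assumption made for illustration, not how Yahoo or Google actually handle these queries: `number_variants` uses a deliberately naive rule (add or drop a trailing “s” on the last word), and `merge_results` simply interleaves two ranked lists while dropping duplicates. Both function names are hypothetical.

```python
from itertools import zip_longest


def number_variants(query):
    """Return the query plus a naive singular/plural variant of its last word."""
    words = query.split()
    if not words:
        return []
    last = words[-1]
    # Illustration only: real English plurals need far more than this rule.
    if last.endswith("s") and len(last) > 1:
        other = last[:-1]
    else:
        other = last + "s"
    return [" ".join(words), " ".join(words[:-1] + [other])]


def merge_results(*ranked_lists):
    """Interleave ranked result lists, keeping the first copy of each URL."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

A caller would run each variant from `number_variants("running shoes")` through its own search function and pass the resulting lists to `merge_results`. Interleaving treats both versions equally, which is a design choice; a search engine that favors the plural form could weight one list more heavily instead.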
Continue reading How a Search Engine Might Handle Singular and Plural Queries
When someone searches at one of the major search engines, they often type in keyword phrases, composed as if they were written for human readers. Those phrases may contain words or phrases that show up very frequently in pages on the web, and have little to do with the information being sought by the searcher.
Search engines that focus upon retrieving search results based upon keywords found in queries have often ignored those frequently appearing and irrelevant words contained within search query terms.
Those words have been referred to in the past by Google as “stopwords,” and could be words like: a, and, is, on, of, or, the, was, with. Similar groups of words that appear very commonly on web pages, and are also unconnected to an actual search, could be referred to as “stop-phrases.”
The word “a” in the query “a London hotel” is a stopword.
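A minimal sketch of that kind of filtering, assuming only the example stopword list given above, could look like this in Python. The function name `strip_stopwords` is hypothetical, and the fallback for queries made up entirely of stopwords is an assumption added so that such a query is never emptied out.

```python
# Example stopwords taken from the list above; a real list would be longer.
STOPWORDS = {"a", "and", "is", "on", "of", "or", "the", "was", "with"}


def strip_stopwords(query):
    """Return the query's terms with stopwords removed."""
    terms = query.split()
    kept = [term for term in terms if term.lower() not in STOPWORDS]
    # Assumption: if every term is a stopword, keep the query unchanged.
    return kept or terms
```

With this sketch, `strip_stopwords("a London hotel")` keeps only `London` and `hotel`, which matches the example of “a” being treated as a stopword.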
Continue reading Google Stopwords Patent
You search for “Foo Fighters,” and the search engine takes your query and starts searching its databases to identify results. It might look through a video database, to see if there are any good videos to show you. It may dig through a News database to see if there was any recent news tied to the phrase, or an Image database to see if there were any popular pictures of the band. The search engine may see if any advertisers were running campaigns using the band’s name.
Some of that searching is done by trying to take the exact phrase that you used in your search, “Foo Fighters,” to find a set of results that you might be satisfied in seeing. But there are steps that a search engine could take that might give you even better results.
Associating Search Terms with Categories
Continue reading How Using Categories for Queries Can Help Searchers, Writers, and Search Engines
When stockbrokers who spend their day searching for financial information about different businesses type the word “Starbucks” into Google’s search box, chances are that they are more likely to be looking for stock price information than the closest place that they can get a mint mocha chip frappuccino.
When a city-dwelling college student, who likes to meet up with friends at new places all the time and uses his cell phone to find and map out those places, types the word “Starbucks” into his phone’s browser, the first thing he wants to see is probably a map to the nearest Starbucks.
Can a search engine be smart enough to serve a stock price quote at the top of search results to the stockbroker, and a map to the college student, even if both are using handheld devices to connect to the Web?
Relevance and Informational Needs
Continue reading Google on One Boxes and Grouping Result Types to Fill Informational Needs
As a webmaster, when you put a page up on the web, there may be parts of that page that you don’t want indexed by a search engine.
Many web pages contain information that isn’t unique to each page, such as the navigation for a site, copyright notices, advertising, links to other sites such as blog rolls, and other sections that may not contain information about the main topic of the page itself.
Yahoo’s Robots-Noindex Classes
In May of 2007, Yahoo published a post on the Yahoo Search Blog, titled Introducing Robots-Nocontent for Page Sections, about how webmasters could let the search engine know that content in certain sections of pages shouldn’t be returned in search results to searchers.
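Yahoo’s approach relied on marking sections with a `robots-nocontent` class attribute. The Python sketch below shows one way an indexer might skip text inside elements carrying that class. It is an illustration under stated assumptions, not Yahoo’s implementation: the names `NoContentFilter` and `indexable_text` are hypothetical, and the code assumes well-formed HTML in which every non-void element has a closing tag.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Keeps text that sits outside class="robots-nocontent" sections."""

    def __init__(self):
        super().__init__()
        self.kept = []
        self._skip_depth = 0  # > 0 while inside a robots-nocontent element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a marked section, every nested element adds a level.
        if self._skip_depth or "robots-nocontent" in classes:
            self._skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> never open a section

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.kept.append(data.strip())


def indexable_text(html):
    """Return the page text with robots-nocontent sections left out."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.kept)
```

A webmaster could wrap navigation or advertising in something like `<div class="robots-nocontent">`, and a filter along these lines would leave that block’s text out of what gets passed along for indexing.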
Continue reading Which Sections of Your Web Pages Might Search Engines Ignore?