The Web is filled with page after page after page of data. That data is usually organized differently from one site and one page to another, and contained in text, in pictures, in videos, in audio, in columns, in rows, in frames, and many other formats.
When a search engine spider comes to a page on the Web, it will try to go through all of the text it finds, make note of links to other pages, consider alt text for images, and look at meta tags.
Search engine spiders then decide whether or not the content of a page should be indexed by the search engine, and determine which links to follow next.
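To make those steps concrete, here is a minimal sketch of what a spider might note on a single page, using Python's standard html.parser module. The PageScanner class and scan_page function are hypothetical names for illustration, and the sketch is not a description of how any real search engine crawls.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects the items a spider might note on a page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.text = []       # visible text fragments
        self.links = []      # href values of <a> tags
        self.alt_text = []   # alt attributes of <img> tags
        self.meta = {}       # name -> content for <meta> tags
        self._skip = 0       # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("alt"):
            self.alt_text.append(attrs["alt"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only fragments.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def scan_page(html):
    """Parse one HTML document and return the populated scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner
```

The links list is what a crawler would draw on when deciding which pages to visit next. The decision about whether to index the page at all is left out, since the text above doesn't say how that decision is made.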
Continue reading Search Engines Extracting Table Data on the Web
When you’re searching for something on the Web, does it matter whether you use the singular or plural version of a word in your search?
For example, let’s say that you are looking for a new pair of sneakers to go jogging in, and you want to find the right combination of comfort and support, so you decide to look into the best sneakers for running. Does it make a difference in search results when you type in running shoes or running shoe in a search box?
If a search engine just returned results to you based upon your choice of a singular or plural search term, would you get the best results? Should a search engine explore both versions, and try to provide you with a mix of results based upon what it believes are the best results, after looking at results from the singular and plural versions?
A quick look at the top ten results at Yahoo and at Google for both “running shoe” and “running shoes” (both searches without the quotation marks) showed some overlap between the pages returned for the singular and plural versions at each search engine, but the vast majority of results at both favored the plural version of the word over the singular.
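One way to picture a search engine exploring both versions of a query is the sketch below. It is only an illustration under loud assumptions: number_variants handles nothing but regular "-s" plurals, the search argument is a hypothetical function that returns a ranked list of pages for a query string, and interleaving by rank is my own choice rather than anything Yahoo or Google has described.

```python
def number_variants(word):
    """Return the word plus a naive singular or plural counterpart.

    Assumption: only regular "-s" plurals are handled.
    """
    word = word.lower()
    if word.endswith("s") and not word.endswith("ss"):
        return {word, word[:-1]}
    return {word, word + "s"}


def expand_query(query):
    """Build the query variants that differ in the last word's number."""
    words = query.lower().split()
    if not words:
        return []
    head, last = words[:-1], words[-1]
    return sorted(" ".join(head + [variant]) for variant in number_variants(last))


def merged_results(query, search):
    """Interleave results from each variant by rank, dropping duplicates."""
    result_lists = [search(variant) for variant in expand_query(query)]
    longest = max((len(results) for results in result_lists), default=0)
    merged, seen = [], set()
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged
```

With this sketch, expand_query("running shoes") produces both "running shoe" and "running shoes", and merged_results blends whatever the supplied search function returns for each.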
Continue reading How a Search Engine Might Handle Singular and Plural Queries
When someone searches at one of the major search engines, they often type in keyword phrases, composed as if they were written for human readers. Those phrases may contain words or phrases that show up very frequently in pages on the web, and have little to do with the information being sought by the searcher.
Search engines that retrieve results based upon the keywords found in queries have often ignored those frequently appearing and irrelevant words when they show up in a query.
Those words have been referred to in the past by Google as “stopwords,” and could be words like: a, and, is, on, of, or, the, was, with. Similar groups of words that appear very commonly on web pages, and are also unconnected to an actual search, could be referred to as “stop-phrases.”
The word “a” in the query “a London hotel” is a stopword.
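A minimal sketch of ignoring stopwords in a query follows, using the example words listed above. The strip_stopwords name is hypothetical, and the fallback of keeping a query that consists entirely of stopwords is my own assumption, not something the patent is known to describe.

```python
# Example stopwords taken from the list in the text above.
STOPWORDS = {"a", "and", "is", "on", "of", "or", "the", "was", "with"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Drop stopwords from a query, keeping the remaining word order."""
    kept = [word for word in query.split() if word.lower() not in stopwords]
    # Assumption: if every word is a stopword, keep the query unchanged
    # rather than searching for nothing.
    return " ".join(kept) if kept else query
```

For the example above, strip_stopwords("a London hotel") leaves "London hotel".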
Continue reading Google Stopwords Patent
You search for “Foo Fighters,” and the search engine takes your query and starts searching its databases to identify results. It might look through a video database, to see if there are any good videos to show you. It may dig through a News database to see if there was any recent news tied to the phrase, or an Image database to see if there were any popular pictures of the band. The search engine may see if any advertisers were running campaigns using the band’s name.
Some of that searching is done by taking the exact phrase that you used in your search, “Foo Fighters,” to find a set of results that you might be satisfied to see. But there are steps that a search engine could take that might give you even better results.
Associating Search Terms with Categories
Continue reading How Using Categories for Queries Can Help Searchers, Writers, and Search Engines
When stockbrokers who spend their day searching for financial information about different businesses type the word “Starbucks” into Google’s search box, chances are that they are more likely to be looking for stock price information than the closest place that they can get a mint mocha chip frappuccino.
When a city-dwelling college student, who likes to meet up with friends at new places all the time and uses his cell phone to find and map out those places, types the word “Starbucks” into his phone’s browser, the first thing he wants to see is probably a map to the nearest Starbucks.
Can a search engine be smart enough to serve a stock price quote at the top of search results to the stockbroker, and a map to the college student, even if both are using handheld devices to connect to the Web?
Relevance and Informational Needs
Continue reading Google on One Boxes and Grouping Result Types to Fill Informational Needs
As a webmaster, when you put a page up on the web, there may be parts of that page that you don’t want a search engine to index.
Many web pages contain information that isn’t unique to each page, such as the navigation for a site, copyright notices, advertising, links to other sites such as blog rolls, and other sections that may not contain information about the main topic of the page itself.
Yahoo’s Robots-Nocontent Classes
In May of 2007, Yahoo made a post on the Yahoo Search Blog about how webmasters could let the search engine know that content in certain sections of pages shouldn’t be returned in search results to searchers, titled Introducing Robots-Nocontent for Page Sections.
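Yahoo's approach relied on webmasters adding a robots-nocontent value to the class attribute of an HTML element. As a rough sketch of how an indexer might honor that marker, the code below collects page text while skipping anything nested inside a marked element. The NoContentFilter and indexable_text names are hypothetical, the code assumes reasonably well-formed HTML, and it is not Yahoo's implementation.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Keeps text that lies outside elements marked robots-nocontent."""

    def __init__(self):
        super().__init__()
        self.kept = []
        self._depth = 0  # > 0 while inside a robots-nocontent element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or "robots-nocontent" in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if not self._depth and data.strip():
            self.kept.append(data.strip())


def indexable_text(html):
    """Return the text fragments left after dropping marked sections."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return parser.kept
```

A navigation block or copyright notice wrapped in a marked element would be left out of the returned text, while the main content of the page passes through.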
Continue reading Which Sections of Your Web Pages Might Search Engines Ignore?
Yahoo was granted a patent this week which describes how anchor text in links may be used to increase the relevancy ranking of a page pointed to by that anchor text. The patent was originally filed in 2002, and it discusses how anchor text might work while naming the Altavista search engine as a possible place where the methods it describes might be implemented. Yahoo later acquired the company that owned Altavista, so the technology now belongs to Yahoo.
While the patent is fairly old, it provides some details about how anchor text might be used by a search engine in a search index that may not be widely known.
It’s fairly common knowledge that the major commercial search engines pay attention to the anchor text in links pointing to pages, and may consider a page to be even more relevant for a query term if the term not only appears on a page, but also appears in the linked anchor text pointing to a page. Some pages may even be determined to be relevant for words that they don’t contain, but which show up in links to those pages.
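The idea that a page can be relevant for words found only in links pointing to it can be sketched with a toy index. Everything here is an assumption for illustration: the build_index and rank names, the simple term counting, and the ANCHOR_WEIGHT value are hypothetical, and none of it is taken from the patent itself.

```python
from collections import defaultdict

# Hypothetical weight: how much one anchor-text occurrence counts
# compared with one on-page occurrence.
ANCHOR_WEIGHT = 2


def build_index(pages, links):
    """Build a term -> {url: score} index.

    pages: dict mapping url -> page text
    links: iterable of (source_url, target_url, anchor_text) tuples
    """
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] += 1  # term appears on the page itself
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            # Anchor text is credited to the page the link points to.
            index[term][target] += ANCHOR_WEIGHT
    return index


def rank(index, term):
    """Return urls for a term, highest score first (ties broken by url)."""
    scores = index.get(term.lower(), {})
    return sorted(scores, key=lambda url: (-scores[url], url))
```

In this sketch, a page that never uses the word "shoes" can still be returned for it, as long as other pages link to it with that word in their anchor text.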
Continue reading Yahoo Patents Anchor Text Relevance in Search Indexing
Search engine optimization is an ever-growing and ever-changing field, and as search engines and the Web change, so does SEO.
There is no classroom or college course, and no single site, conference series, or book, that can help you keep up with those changes.
Paying attention to a lot of blogs, news reports, press releases, and other sources of information can help provide some insights about changes in SEO, and discussions at forums and conferences and social sites can present a lot of signals and noise about what might be new in search. It’s not always easy, and sometimes not even possible, to distinguish between the signals and the noise.
I look at a lot of patent filings and papers from the search engines here because they can provide views of how search engines may work from the perspective of the search engines. I consider them primary sources because they come directly from the search engines, but even those sources often only provide glimpses of possibilities rather than actual insights into how search engines function.
Perhaps the best value that may be taken from search engine patent filings isn’t so much the processes that they describe, but rather the hints of assumptions behind some of the methods and systems that they present.
Continue reading How a Search Engine Might Rank Bookmark Sets, Playlists, Directory Pages, and other Collection Items