How a Search Engine Might Analyze the Linking Structure of a Web Site

How well do search engines understand the linking structure of a web site? Do they have ways to organize and classify individual links and blocks of links that they see on the pages of a site?

Do they treat links and collections of links that appear on more than one page of a site differently from those that appear on only one page? If they find more than one group of links on a page, with the groups sharing many of the same links but sitting at the top and bottom of the page, how might they treat those links?

I came across a Microsoft patent filing from last summer that explored many of these topics, as well as others. It hadn’t drawn much attention, so I decided to take a closer look at it here.

Segmentation and Link Blocks

Continue reading “How a Search Engine Might Analyze the Linking Structure of a Web Site”

Creating an SEO Content Inventory

One of the things that I like to do for sites that I work on is to create an SEO content inventory.

I find it helpful to have information all in one place about the content that might appear on different pages of a site, and it can be very useful as a planning tool. The idea isn’t new, and usability.gov has a nice description of why it can be helpful to conduct a content inventory on their pages from a design stance.

Jeffrey Veen also published a post a number of years ago about using a tool like this when he works on information architecture and design issues for clients, in Doing a Content Inventory (Or, A Mind-Numbingly Detailed Odyssey Through Your Web Site).

One of the differences between the approach that usability.gov and Jeffrey Veen use and the one that I like to use is that I include more details involving search engine optimization. For instance, in my inventory, there’s a space for the “present” page title, meta description, and meta keywords, and for the “future” title, meta description, and meta keywords.
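As a rough illustration of what one row of such an inventory might look like, here is a minimal Python sketch. The “present” and “future” title, meta description, and meta keywords fields come from the description above. The class names, the `url` column, and the CSV export are assumptions added for the example, not part of any particular inventory template.

```python
import csv
import io
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SeoFields:
    """Title and meta fields for one page (the fields named in the post)."""
    title: str = ""
    meta_description: str = ""
    meta_keywords: str = ""


@dataclass
class InventoryRow:
    """One page in the inventory. The layout is hypothetical."""
    url: str  # assumed identifier column, not mentioned in the post
    present: SeoFields = field(default_factory=SeoFields)
    future: SeoFields = field(default_factory=SeoFields)

    def to_flat_dict(self) -> Dict[str, str]:
        """Flatten to spreadsheet-style columns such as 'present_title'."""
        row = {"url": self.url}
        for prefix, fields in (("present", self.present), ("future", self.future)):
            for name, value in asdict(fields).items():
                row[f"{prefix}_{name}"] = value
        return row


def inventory_to_csv(rows: List[InventoryRow]) -> str:
    """Serialize the inventory to CSV text, one line per page."""
    columns = list(InventoryRow(url="").to_flat_dict().keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row.to_flat_dict())
    return buffer.getvalue()
```

Keeping the present and future values side by side in one row makes it easy to open the export in a spreadsheet and see at a glance which pages still need a planned title or description.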

Continue reading “Creating an SEO Content Inventory”

Using Rare Words to Estimate Search Engine Index Sizes

Can looking at how many times rare words appear in a search engine’s index give us an idea of the size of the database for that search engine?

About a week ago, I wrote about some of the most common English words in the indexes for Google, Yahoo, Bing, Ask, and Google Caffeine. I took a look at 50 words that are among the most frequently appearing words in English, and at estimates from those search engines of the number of times that those words showed up.

Comparing the number of results between the different search engines for those common words really didn’t tell us anything about the relative sizes of the indexes for those search engines for a number of reasons.

One is that the numbers of results shown are rough estimates only. It’s also possible that the way estimates are calculated differs greatly from one search engine to another. Some of the pages listed among those results are likely duplicate pages at different URLs, or may have contained misspellings of the words. Some of the words may be abbreviations or acronyms, as well (such as “it” being an abbreviation for information technology).
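To make the rare-word question concrete, here is a minimal Python sketch of one way such a comparison could be framed. It is an assumption for illustration, not a method taken from any search engine or from the earlier post: for each rare word, divide one engine’s reported result count by the other’s, then take the median of those ratios so that a single odd estimate doesn’t dominate. The function name and the dictionary inputs are hypothetical, and the caveats above about rough estimates and duplicate pages still apply to whatever counts are fed in.

```python
from statistics import median
from typing import Dict


def relative_index_size(counts_a: Dict[str, int], counts_b: Dict[str, int]) -> float:
    """Estimate how large engine A's index is relative to engine B's.

    Each argument maps a rare word to the result count reported by one
    engine. The estimate is the median of the per-word ratios A/B,
    computed only over words with a nonzero count in both engines.
    """
    shared = [
        word
        for word in counts_a
        if counts_a[word] > 0 and counts_b.get(word, 0) > 0
    ]
    if not shared:
        raise ValueError("no rare words with nonzero counts in both engines")
    return median(counts_a[word] / counts_b[word] for word in shared)
```

A result of 2.0 would suggest, very roughly, that the first engine reports about twice as many pages for these words as the second. It says nothing about absolute index size.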

Continue reading “Using Rare Words to Estimate Search Engine Index Sizes”

Should Webmasters Pick Their Own Quicklinks in Search Results?

Sometimes Google, Yahoo, and Bing will show additional links for a search result under the description for that result. These are often referred to as Site Links, Sitelinks, or Quick Links by the search engines. An example is the sitelinks that Google shows in a search result for the WordPress site when someone searches at Google for wordpress:

Google search result showing sitelinks for wordpress.com

None of the search engines presently allow site owners or webmasters to choose the links that show up as sitelinks or quicklinks in those search results. Google provided some hints as to how sitelinks might be chosen in the description of the patent on Google Sitelinks. Yahoo also gave us some information about the sources of information that they use when they include quicklinks for a search result in a white paper on Yahoo quicklinks. Microsoft also released a white paper on how they might include links such as sitelinks to give searchers a chance to find what they call final destination pages.

Some of the sitelinks that Google chooses, and the text that it uses for those links, aren’t really helpful for searchers or ideal for webmasters, such as the site link at the bottom right in an “SEO by the Sea” search result as seen in the next image:

Continue reading “Should Webmasters Pick Their Own Quicklinks in Search Results?”

Google News Rankings and Quality Scores for News Sources

Are large news agencies, with a wide scope of international coverage on multiple topics, large numbers of reporters, and finely edited articles, better sources of news than smaller and more local papers, or narrow niche blogs?

A patent on ranking articles in Google News, originally filed in 2003, was granted this week. It discusses a number of ranking factors that Google might use to present news articles based upon the “quality” of the news sources involved.

What is very interesting about it is that it provides some insight into the assumptions behind those ranking factors. I suspect that Google may have changed their stance on some of the assumptions behind those factors since then.

The patent doesn’t include a full range of signals that Google probably considers in ranking news stories, such as the freshness of the news (as noted in Google’s patent filing on Universal Search), or whether or not a certain source is the original.

Continue reading “Google News Rankings and Quality Scores for News Sources”

Most Common Words in Google, Yahoo, Bing, and Ask, with Google Caffeine

Just which words show up most frequently on the Web? I’m not sure that question can be answered, but it’s something I’ve wondered for a while.

With a beta version of Google’s future update, code-named Caffeine, recently released for people to experiment with, I thought I would do a few comparisons.

I found a few lists of the most common words in the English language, and came up with a top 50 to see how frequently those were estimated to show up in Google, Yahoo, Bing, Ask, and Google Caffeine. Those are shown in a table and a chart below.

I’m not sure how informative this might be, even after looking at it. It’s not a very scientific test, either. There are a few reasons for that:

Continue reading “Most Common Words in Google, Yahoo, Bing, and Ask, with Google Caffeine”