100 best SEO documents of all time, part 2

More all time best SEO documents:

From Resource Discovery to Knowledge Discovery on the Internet

I ended part one of this series with a document that is 50 years old. Part 2 starts with a white paper that begins by talking about the memex of Vannevar Bush, and then looks at search engines in the days before Google and the methods they used to index pages on the web. A great look at search engine technology circa 1998.

Efficient Crawling Through URL Ordering (pdf)

One of the challenges of SEO is making sure that web pages get indexed. If we consider the three major aspects of a search engine to be the crawling of pages, their indexing, and then how they are served, it all begins with that initial crawl.

So, how does a spider decide which pages to visit, and which URLs to go to after it has collected information about a page? The paper’s ideas of “importance metrics” and “ordering metrics,” which determine the pages those spiders collect and send on to be indexed, can help in understanding how a search engine works.
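To make the idea of an ordering metric a little more concrete, here is a minimal sketch in Python. It assumes one of the ordering metrics the paper considers, as I read it: a count of backlinks from the pages the crawler has already fetched. The crawler always visits the uncrawled URL with the highest count. The link graph, the page names, the `crawl_order` function, and the alphabetical tie-breaking are all hypothetical, chosen only for illustration.

```python
def crawl_order(links, seed, limit):
    """Return the order in which a crawler visits pages when it always picks
    the uncrawled URL with the most backlinks seen so far."""
    frontier = {seed: 0}  # uncrawled URL -> backlinks from pages already crawled
    crawled = []
    done = set()
    while frontier and len(crawled) < limit:
        # Highest backlink count first; ties broken alphabetically for determinism.
        url = min(frontier, key=lambda u: (-frontier[u], u))
        del frontier[url]
        crawled.append(url)
        done.add(url)
        for target in set(links.get(url, [])):  # count each linking page once
            if target not in done:
                frontier[target] = frontier.get(target, 0) + 1
    return crawled


# A hypothetical site: "popular" is linked from two pages, "team" from one.
site = {
    "home": ["about", "blog", "news"],
    "about": ["team"],
    "blog": ["popular"],
    "news": ["popular"],
    "popular": ["home"],
}
print(crawl_order(site, "home", limit=10))
# ['home', 'about', 'blog', 'news', 'popular', 'team']
```

A plain breadth-first crawler would fetch "team" before "popular", since "team" was discovered first; the backlink ordering moves "popular" ahead because two crawled pages point to it.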

Crawling the Web: Discovery and Maintenance of Large Scale Web Data (pdf)

Take the previous document, add a couple of years of research and the mentorship of some bright people in the field of search, and you might end up with a thesis like Crawling the Web…

Sure, it is 188 pages long. But it introduces concepts of freshness, of determining “importance” early in a crawl, and of parallelizing a crawl to speed up the indexing of web pages. The conclusion, with its look at future work, has some interesting things to say about the increasing separation of style and content on web pages, and the flourishing of dynamic web pages.
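Freshness is easier to grasp with a few numbers. The sketch below assumes the measures roughly as they appear in Cho's work: a local copy has freshness 1 if it still matches the live page and 0 otherwise, and its age is 0 while it is fresh and otherwise the time elapsed since the live page first changed. The scores for the whole collection are averages over all copies. The function names, the use of `None` for "unchanged", and the sample times are hypothetical.

```python
def freshness(changed_at, t):
    """Fraction of local copies still up to date at time t.

    changed_at[i] is the time the live page i first changed after we last
    fetched it, or None if it has not changed since then.
    """
    fresh = sum(1 for c in changed_at if c is None or c > t)
    return fresh / len(changed_at)


def average_age(changed_at, t):
    """Mean age of the local copies: 0 while a copy is fresh, otherwise the
    time elapsed since the live page first changed."""
    ages = [t - c if c is not None and c <= t else 0.0 for c in changed_at]
    return sum(ages) / len(ages)


# Four pages, evaluated at t = 10 (hypothetical times, in days).
changes = [None, 2.0, 5.0, 12.0]
print(freshness(changes, 10))    # 0.5
print(average_age(changes, 10))  # 3.25
```

The two measures answer different questions: freshness only says how many copies are stale, while age also says how badly, which matters when deciding which pages to revisit first.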

Searching With Invisible Tabs

Kind of whimsical and fun, and yet serious, too. Danny Sullivan’s idea of search engines that are smart enough to serve information appropriate to the query used is something that we are likely moving towards.

I mentioned above that search engines have at least three major functions: crawling, indexing, and serving. Just what does the idea of invisible tabs mean to the serving of results and the practice of SEO? Something to think about.

Microcontent: How to Write Headlines, Page Titles, and Subject Lines

It’s funny how often I refer to this document when talking about how to come up with text for titles of pages. I’m sure that most folks consider it an article that comes under the umbrella of usability rather than SEO, but it’s filled with some good sense.

I remember thinking of it in the late 90s when submitting pages to regional and topical directories, and having to come up with titles that could be only so many characters long, along with short descriptions, longer descriptions, and lists of keywords.

Getting Confidence From Lincoln

After seeing this document and the one before it, you may be asking yourself whether this is really a list of the best SEO documents, or whether I’ve taken a turn towards usability. I have, but there’s a reason. Deciding which words to optimize for, and to put on your pages, isn’t just about finding words that roughly fit the topic of the site and get a decent amount of search traffic.

It helps to understand the objectives of the site owner, and to be familiar enough with a site’s targeted audience that you know which words people expect to see on the site, and which words will give them confidence that they have arrived at the right place.

I’ve brainwashed myself into believing that good keyword phrases are often good trigger word phrases too. It’s the scent of information mentioned in this article that gets people to click on a link in search results, and that link is most often the title chosen for the page.

In SEO, you’re building pages that rank highly in search engines, but you are also building them for the people who use those search engines.

The Keyword Density of Non-Sense

I know a funny look comes across my face when someone in the office uses the expression “Keyword Density.” Sure, it’s a term that is easy for a client to understand, but it’s a little misleading, and Dr. Garcia’s article explains why. The last two articles described how we can use words on a page; this one looks at how search engines may be indexing those words. A good thing to consider when creating the content of a page.
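One way to see why keyword density can mislead is to set it next to a term weight of the kind used in information retrieval. This sketch uses a textbook tf-idf weight (raw count times log(N/df)) as a stand-in. It is a simplification for illustration, not the model Dr. Garcia describes and not what any search engine actually computes, and the three-document collection is made up.

```python
import math


def keyword_density(page, keyword):
    """Occurrences of the keyword divided by the number of words on the page."""
    words = page.lower().split()
    return words.count(keyword) / len(words)


def tf_idf(page, keyword, collection):
    """Raw term frequency times log(N / df), a textbook tf-idf weight."""
    tf = page.lower().split().count(keyword)
    df = sum(1 for doc in collection if keyword in doc.lower().split())
    if df == 0:
        return 0.0
    return tf * math.log(len(collection) / df)


collection = [
    "the crawler fetches the page",
    "the index stores the page",
    "the ranker orders results",
]
page = collection[0]
print(keyword_density(page, "the"), keyword_density(page, "crawler"))  # 0.4 0.2
print(tf_idf(page, "the", collection))  # 0.0
print(round(tf_idf(page, "crawler", collection), 4))  # 1.0986
```

On this page “the” has twice the density of “crawler,” yet its tf-idf weight is zero, because every document in the collection contains it. Density only looks at one page; a weight like tf-idf also looks at the collection.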

Keywords Co-occurrence and Semantic Connectivity

This is possibly one of the most viewed threads in a forum that focuses on search engine optimization. Just as it helps to have some idea of how usability can affect how well pages do in search engines, some knowledge of information retrieval (IR) can also make a difference.

The thread starter is Dr. Garcia, who often posts in forums under the name Orion. He has made an impact by introducing IR concepts to the world of SEO, right in the heart of an SEO forum.
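As a small taste of what co-occurrence means in IR terms, the sketch below counts how many documents contain each of two terms and how many contain both, then takes the ratio n_ab / (n_a + n_b - n_ab), a Jaccard-style measure. That formula is an assumption chosen for illustration; the thread may define its measures differently, and the function name and sample documents are hypothetical.

```python
def co_occurrence(docs, term_a, term_b):
    """n_ab / (n_a + n_b - n_ab), where each n counts documents containing
    the term (or, for n_ab, both terms)."""
    word_sets = [set(doc.lower().split()) for doc in docs]
    n_a = sum(term_a in words for words in word_sets)
    n_b = sum(term_b in words for words in word_sets)
    n_ab = sum(term_a in words and term_b in words for words in word_sets)
    union = n_a + n_b - n_ab
    return n_ab / union if union else 0.0


docs = [
    "search engine ranking",
    "search engine crawler",
    "engine oil change",
    "search party",
]
print(co_occurrence(docs, "search", "engine"))  # 0.5
print(co_occurrence(docs, "search", "oil"))     # 0.0
```

The score is 1 when two terms always appear together and 0 when they never do, which gives a rough sense of how semantically connected two keywords are within a given collection.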

Patterns in Unstructured Data

Another information retrieval concept that has been mentioned in SEO forums is Latent Semantic Indexing (LSI). This paper does a nice job of avoiding math, and explaining the notions behind LSI in a manner that is fairly easy to understand.

Are search engines using LSI? Maybe. Is it worth learning about? Definitely.
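The paper avoids the math, but for anyone who wants to see the mechanics, here is a toy sketch in pure Python. It is my own illustration, not an example from the paper: it builds a term-document matrix A, finds the top k eigenvectors of A^T A (whose eigenvalues are the squared singular values of A) by power iteration, and compares documents in that reduced space. Power iteration is fine for a five-document toy but is not how a real SVD library would do it, and the documents and function names are made up.

```python
import math


def term_doc_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    terms = sorted({word for doc in docs for word in doc.split()})
    return terms, [[doc.split().count(term) for doc in docs] for term in terms]


def doc_gram(matrix):
    """A^T A: the dot product of every pair of document columns."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]


def top_eigenpairs(m, k, iterations=500):
    """Top-k eigenpairs of a small symmetric matrix via power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi_doc_vectors(docs, k):
    """Each document's coordinates in the k-dimensional space: sigma_i * v_i[doc]."""
    _, matrix = term_doc_matrix(docs)
    pairs = top_eigenpairs(doc_gram(matrix), k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(len(docs))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


docs = ["car", "automobile", "car automobile engine", "flower garden", "flower petal"]
_, matrix = term_doc_matrix(docs)
raw = [list(col) for col in zip(*matrix)]
print(cosine(raw[0], raw[1]))  # 0.0: "car" and "automobile" share no terms
reduced = lsi_doc_vectors(docs, k=2)
print(round(cosine(reduced[0], reduced[1]), 3))  # close to 1.0 in the reduced space
```

The raw cosine similarity between the "car" and "automobile" documents is zero because they share no words. In the two-dimensional space they line up, because both words appear together in a third document, while the flower documents end up on a separate axis. That is the intuition behind LSI in miniature.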

Google: TrustRank, much ado about nothing?

I also don’t know if TrustRank is being used by search engines. I don’t know if the criticism in this article is well-founded, or is misleading. Sure, there’s a white paper on TrustRank, and a patent was applied for to go with it.

What I like so much about this blog post is that it shows the type of critical thinking that I believe is important to bring with you when reading a white paper from the search engines, or a patent application, or a forum thread or SEO article. I think that might be important to keep in mind when reading any of these 100 best SEO documents of all time.
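For anyone curious about what the TrustRank white paper proposes, its core idea is a biased PageRank: trust starts at a hand-picked seed set of good pages and flows out along links, shrinking with each hop, so pages the seeds cannot reach get no trust at all. The sketch below is a minimal illustration under that reading. The link graph, the page names, the `trust_rank` function, the damping factor of 0.85, and the 20 iterations are assumptions for the example, and it simply lets trust leak away at pages with no outlinks.

```python
def trust_rank(links, seeds, alpha=0.85, iterations=20):
    """Propagate trust from seed pages along links (a biased PageRank iteration)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Static distribution: all of the "teleport" mass goes to the trusted seeds.
    base = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(base)
    for _ in range(iterations):
        nxt = {p: (1 - alpha) * base[p] for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its damped trust evenly among its outlinks.
                nxt[target] += alpha * trust[page] / len(targets)
        trust = nxt
    return trust


# Hypothetical graph: "good" is the only seed; the spam pages link among themselves.
graph = {
    "good": ["a"],
    "a": ["b"],
    "b": [],
    "spam": ["b", "spam2"],
    "spam2": ["spam"],
}
trust = trust_rank(graph, {"good"})
for page in sorted(trust):
    print(page, round(trust[page], 4))
```

The "spam" pages link to "b" and to each other, but since no trusted page links to them they end up with zero trust, and the trust that reaches "b" comes only through the seed's chain of links.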


6 thoughts on “100 best SEO documents of all time, part 2”

  1. Hi,
    Interesting list, especially since in SEO/SEM, which changes so fast, it is rather difficult to choose the best documents and publications. The post is from 2005 and still valid. Of course, I have read only a few of them :)

  2. Hi Adam,

    These were some hard choices, and I never quite got past the first 30 or so. The problem wasn’t so much that there weren’t good choices left, but rather that there were possibly too many. Since that time, I’ve spent a lot of posts on patents from the search engines, and academic and industrial white papers. I’m not sure that I could pick a top 10 or top 100 from those. Maybe I’ll try sometime soon.

  3. There’s no need to pick out the best ones, just read a few and you’ll start seeing the same patterns. I still refer to the original Stanford paper that Larry Page and Sergey Brin wrote. Up until Panda and even now, those analogies still apply.
