More all-time best SEO documents:
I ended part one of this series with a document that is 50 years old. Part 2 starts with a white paper that begins by talking about the memex of Vannevar Bush, and takes a look at search engines in the days before Google, and the methods those engines used to index pages on the web. A great look at search engine technology circa 1998.
One of the challenges of SEO is making sure that web pages get indexed. If we consider the three major aspects of a search engine to be the crawling of pages, their indexing, and then how they are served, it all begins with that initial crawl.
So, how does a spider decide which pages to visit, and which URLs to follow after it has collected information about a page? The ideas of “importance metrics” and “ordering metrics” in determining which pages are collected by those spiders and sent on to be indexed can be helpful in understanding how a search engine works.
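To make the idea concrete, here is a minimal sketch of an “ordering metric” at work in a crawl frontier. The `score` and `fetch_links` functions are hypothetical stand-ins (not anything from the paper): `score` could be any importance estimate, such as an in-link count, and `fetch_links` returns the URLs found on a page.

```python
import heapq

def crawl(seed_urls, score, fetch_links, limit=10):
    """Visit up to `limit` pages, always taking the highest-scored URL next."""
    # heapq is a min-heap, so store negative scores to pop the best URL first.
    frontier = [(-score(u), u) for u in seed_urls]
    heapq.heapify(frontier)
    visited = []
    seen = set(seed_urls)
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in fetch_links(url):  # URLs discovered on the fetched page
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited
```

Swap the priority function and you change the crawl order entirely, which is the whole point of an ordering metric: the spider spends its limited visits on the pages it estimates matter most.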
Take the previous document, add a couple of years of research and the mentorship of some bright people in the field of search, and you might end up with a thesis like Crawling the Web…
Sure, it is 188 pages long. But it introduces concepts of freshness, of determining “importance” early in a crawl, and how to parallelize a crawl process to speed up the indexing of web pages. The conclusion, with its look at future work, has some interesting things to say about the increasing separation of style and content on web pages, and the flourishing of dynamic web pages.
Kind of whimsical and fun, and yet serious, too. Danny Sullivan’s idea of search engines that are smart enough to serve information appropriate to the query used is something that we are likely moving towards.
I mentioned above that search engines have at least three major functions: crawling, indexing, and serving. Just what does the idea of invisible tabs mean to the serving of results and the practice of SEO? Something to think about.
It’s funny how often I refer to this document when talking about how to come up with text for titles of pages. I’m sure that most folks consider it an article that comes under the umbrella of usability rather than SEO, but it’s filled with some good sense.
I remember thinking of it in the late 90s when submitting pages to regional and topical directories, and having to come up with titles that were only so many characters long, and short descriptions, longer descriptions, lists of keywords.
You may be asking yourself, after seeing this document and the one before it, if this is really a list of the best SEO documents, or if I’ve taken a turn towards usability. I have, but there’s a reason. Deciding which words to optimize for, and to put on your pages, isn’t just about finding words that fit the topic of the site and get a decent amount of search traffic.
It helps to understand the objectives of the site owner, and to have enough awareness of a site’s targeted audience to know which words people expect to see on the site, and which words will build confidence in them that they have arrived at the right place.
I’ve brainwashed myself into believing that good keyword phrases are often good trigger word phrases too. It’s the scent of information mentioned in this article that gets people to click on a link in search results. That link is most often the page title selected for the page.
In SEO, you’re building pages that rank highly in search engines, but you are also building them for the people who use those search engines.
I know a funny look comes across my face when someone in the office uses the expression “Keyword Density.” Sure, it’s a term that is easy for a client to understand. But, it’s a little misleading. Dr. Garcia’s article explains why. The last two articles described how we can use words on a page. This one looks at how search engines may be indexing those words. A good thing to consider when creating the content of a page.
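For illustration only, here is the naive calculation clients usually mean by “keyword density,” next to a toy tf-idf weight. This is a sketch of why the density figure is misleading, not a claim about how any engine scores terms: density is a purely local ratio, while a collection-wide weight like tf-idf can tell a distinctive term from a common one even when their densities on the page are identical.

```python
import math

def keyword_density(text, term):
    """The classic 'density' figure: occurrences of term / total words on the page."""
    words = text.lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0

def tf_idf(term, doc, corpus):
    """A toy tf-idf weight: raw count in the doc, scaled by rarity across the corpus."""
    tf = doc.lower().split().count(term.lower())
    df = sum(1 for d in corpus if term.lower() in d.lower().split())
    return tf * math.log(len(corpus) / df) if df else 0.0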
This is possibly one of the most viewed threads in a forum that focuses on search engine optimization. Just as it helps to have some idea of how usability can affect how well pages do in search engines, some knowledge of information retrieval (IR) can also make a difference.
The thread starter is Dr. Garcia, who often posts in forums under the name Orion. He has made an impact with his introduction of IR concepts into the world of SEO, in the heart of an SEO forum.
Another information retrieval concept that has been mentioned in SEO forums is Latent Semantic Indexing (LSI). This paper does a nice job of avoiding math, and explaining the notions behind LSI in a manner that is fairly easy to understand.
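The core trick behind LSI can be shown in a few lines. This is a toy sketch (assuming NumPy is available), not a description of any search engine’s implementation: documents d1 and d2 share no terms at all, but because d3 uses both “car” and “auto,” the truncated SVD places d1 and d2 close together in the reduced “latent” space.

```python
import numpy as np

# Hand-built term-document matrix (rows are terms, columns are documents).
terms = ["car", "auto", "flower"]
A = np.array([
    [1, 0, 1, 0],   # "car"    appears in d1 and d3
    [0, 1, 1, 0],   # "auto"   appears in d2 and d3
    [0, 0, 0, 2],   # "flower" appears (twice) in d4
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                        # keep the top 2 latent dimensions
docs_latent = (np.diag(s[:k]) @ Vt[:k]).T    # one row of coordinates per document

def cos(a, b):
    """Cosine similarity between two document vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# In the raw term space, d1 and d2 are orthogonal (no shared words);
# in the latent space, they end up nearly parallel.
```

The dimensionality reduction is doing the work here: discarding the smallest singular values collapses terms that tend to co-occur into shared dimensions, which is how LSI associates synonyms without a thesaurus.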
Are search engines using LSI? Maybe. Is it worth learning about? Definitely.
I also don’t know if TrustRank is being used by search engines. I don’t know if the criticism in this article is well-founded, or is misleading. Sure, there’s a white paper on TrustRank, and a patent application to go with it.
What I like so much about this blog post is that it shows the type of critical thinking that I believe is important to bring with you when reading a white paper from the search engines, or a patent application, or a forum thread or SEO article. I think that might be important to keep in mind when reading any of these 100 best SEO documents of all time.