100 Best SEO Documents of All Time, part 4

Almost seven years ago, I started thinking about what documents I would recommend that people read if they wanted to learn as much about SEO as possible. SEO by the Sea was a little more than a couple of months old, and I started a series of posts that I called the “100 best SEO documents of all time.” I started the series knowing the first 30 papers, blog posts, and patents that I wanted to include in the series, and somehow never got past those first thirty.

Those three posts were the ones immediately before the gathering that originally gave SEO by the Sea its name. I went from blogger to event organizer, and never quite returned to the series that I started. In the past couple of days, the first post got some attention on Twitter, and I promised to update the series.

The next ten documents are ones that I’ve thought about quite a bit since reading them, along with what they might mean for the future of search.

At the Intersection of Search and Social

The Anatomy of a Large-Scale Social Search Engine (pdf) – April, 2010

This first paper was written by the Aardvark team in the days before the company was acquired by Google, and the title was part homage to the original Google paper, The Anatomy of a Large-Scale Hypertextual Web Search Engine. It described a search that was more inspired by how people find information in a village than in a library. The Aardvark project was discontinued by Google, but I wouldn’t be surprised if it returned as part of the intersection between search and social networking in Google’s Search Plus Your World.

Ranking User Generated Web Content – October, 2009

I wrote about this patent filing in the post, How Google Might Rank User Generated Web Content in Google + and Other Social Networks, and there’s a decent chance that the credential scoring described in the patent filing is similar in many ways to what Google uses to rank social search results. The patent gives us a look at some of the signals that a search engine might look at to give reputation or credential scores to people who write blog posts or microblog posts at places like Google Plus, and at Q&A type sites. It also provides a look at how the meaningfulness of responses and interactions might be measured, and how authority and expertise in different topics might be determined.

Bigger Index Using Smaller Files and Incremental Updating

Some of the processes and technologies described in many of the patents and papers that come from the search engines aren’t quite ready for prime time. It’s not because they might not contain good ideas, but rather because the technology might not have caught up to them yet. And then we get advances in technology that make changes possible. One of those changes was an infrastructure update at Google with the name Caffeine. The paper and the patent that follow describe changes at Google that made the search engine much faster, as well as capable of holding a lot more information with the same number of servers.

Large-scale Incremental Processing Using Distributed Transactions and Notifications – November, 2010

“The Percolator-based indexing system (known as Caffeine [25]), crawls the same number of documents, but we feed each document through Percolator as it is crawled. The immediate advantage, and main design goal, of Caffeine is a reduction in latency: the median document moves through Caffeine over 100x faster than the previous system.”

Document treadmilling system and method for updating documents in a document repository and recovering storage space from invalidated documents – February, 2006

Predicting User Behavior with Web Page Features

While the past couple of documents looked at infrastructure updates at Google, the next two look at processes that involve handling very large amounts of data. The first of the papers describes how tree models might be learned from those types of data sets using Google’s MapReduce system. The second paper takes what was learned in the first paper, and applies it to features identified in landing pages and advertisements to predict the bounce rates of sponsored advertisements. The papers tell us that the methods developed could be used with other large data sets, like pages in web search, to predict user behavior based on features found in a sample set of pages.

PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce – August, 2009

Predicting Bounce Rates in Sponsored Search Advertisements – June, 2009

What’s a link worth?

The next patent confirmed something that many of us had suspected for a few years, and that had also been verified by Google’s Matt Cutts and by Priyank Garg, the director of product management for Yahoo! Search Technology. The July, 2008 interview with Priyank Garg by Eric Enge and the patent confirmed hunches about changes to the way that the major search engines were treating links. Instead of each link on a page carrying the same value as any other link, we learned about a number of features that the search engines might consider when determining how much weight each link might pass along.

Ranking documents based on user behavior and/or feature data – Filed June, 2004

My write-up of the patent cuts through some of the legalese – see: Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data
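To make the idea of unequal link value a little more concrete, here is a minimal sketch in Python. It is my own illustration and not the method from the patent: the three link features, the multipliers attached to them, and the names Link, link_score, and distribute_weight are all assumptions made up for the example. The only point is that a page’s weight can be split in proportion to how likely each link seems to be followed, rather than evenly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Link:
    target: str
    in_main_content: bool  # hypothetical feature: link sits in the body text
    in_footer: bool        # hypothetical feature: link sits in the page footer
    anchor_words: int      # hypothetical feature: words in the anchor text


def link_score(link: Link) -> float:
    """Return an unnormalized guess at how likely a visitor is to follow a link.

    The multipliers are invented for illustration only.
    """
    score = 1.0
    if link.in_main_content:
        score *= 3.0
    if link.in_footer:
        score *= 0.25
    if link.anchor_words == 0:
        score *= 0.5
    return score


def distribute_weight(page_weight: float, links: List[Link]) -> Dict[str, float]:
    """Split page_weight across link targets in proportion to their scores."""
    if not links:
        return {}
    scores = [link_score(link) for link in links]
    total = sum(scores)
    shares: Dict[str, float] = {}
    for link, score in zip(links, scores):
        # Several links to the same target add up to one combined share.
        shares[link.target] = shares.get(link.target, 0.0) + page_weight * score / total
    return shares
```

With one link in the main content, one in the footer, and one with no anchor text, the main content link ends up with most of the page’s weight, where an even split would have given each link a third.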

Big Big Data

The Unreasonable Effectiveness of Data – March/April, 2009

Great article, and an even better video, on how having access to a very large amount of data can make even weak algorithms provide useful results. The video is around an hour long, but it’s highly recommended.

Organizing and Searching the World Wide Web of Facts Step Two: Harnessing the Wisdom of the Crowds – May, 2007

The new knowledge base results at Google and Bing don’t just pull information from sources like Wikipedia and Britannica, but they also look to query logs from the search engines to understand the kinds of things that searchers might be looking for when they perform searches. As the paper tells us, millions of searchers add to the data in search query logs daily, and those “facts” about the things being searched for can help the search engines learn what searchers may be interested in when they perform a query.

I’m going to try to finish this series over the next 6 weeks, with a new post every week. What we end up with may not be the 100 best SEO documents of all time, but hopefully we’ll get a lot of the really good ones in here.

Posts in this series of the 100 Best SEO documents:


25 thoughts on “100 Best SEO Documents of All Time, part 4”

  1. Do the equations described in the book “Anatomy of a Large Scale Social Search Engine” change every time the search engine updates its algorithms? I mean like Google has its own Panda, Penguin etc.

    Do these algorithms relate to the changes in the search engine update? I am actually interested to know what a search engine algorithm involves.

  2. That’s a great list of valuable documents.
    Can you also add the publishing date of each document (at least month/year or just year). Maybe this would be helpful.

  3. SEO is so undervalued. During the week I met by chance a businessman who was scratching his head over why his website was like a deserted Midwest town with very little footfall. When I mentioned the 10-12 SEO tools I use, his eyes lit up. Now I need to use some spare time helping people uplift their sites, so interesting times ahead. Will reference back to this post and the upcoming series. Thanks Robert

  4. Can’t help but think the saying, “it takes a village..” will finally have meaning once the Aardvark team gets the right tweaks on social information retrieval over the open graph.

  5. Absolutely love your collection here. What a resource! It’s amazing that you’ve been able to gather such a tremendous digital library of seo reference documents. As a practicing (and studying!) seo professional I can say that your blog is the most valuable one on the web. I’d pay for your blog. Thanks.

  6. I remember when the Facebook Like button came out my initial reaction was links were toast. I also thought that when Google+ came out too. I’m glad to say I’m pleasantly surprised how it all turned out. Links will never be toast but it’s interesting to watch how search engines utilize other technologies to value their worth.

  7. I had one important doubt regarding SEO. When we have duplicate content on the site, does the SEO of the entire website go down or do just the pages with duplicate content get affected? I have noticed that when duplicate content happens, the traffic of the other pages, which are not duplicates, also gets affected. Is this really the case or is this just me assuming things?

  8. My worry with relying on trusted old documents is that with the rate of change implemented by Google they can quickly become out of date and could even have an adverse effect when they change the rules again. Having said that, as a newbie, following a set of guidelines to optimise your site can really help.

  9. Very interesting list that you have here Bill. I personally do not agree there are such things as ALL TIME best articles or documents, especially for SEO, since SEO (and the Internet generally) evolves so rapidly. Big G makes hundreds of updates every year. So what is considered excellent reading might not be so useful anymore even in 1-2 years’ time.
    However again great resource that you have here. Thanks

  10. Thanks, that’s indeed a lot of reading you propose. However, I actually do agree with you that you can say that there in fact are some “all-time best” documents on SEO, even if it evolves fast. Some of it will always be true, even if we implement our techniques differently from time to time. Also, there is indeed a quality factor of these sources.

  11. There are penguins, there are pandas, and yes, Google does raise pulses with their updates frequently, but the principles remain the same in my opinion. Example: Before we used idiot-proof safety matches to light up the delicious Sunday bbqs we now enjoy, our hairy ancestors would have to bust their humps using the ‘fire plough’ method (rubbing one stick against another) in order to enjoy their prime t-Rex rib. Things don’t change, they just evolve. And it sure helps knowing where you come from, right? Good reading, I’d say. :)

  12. Absolutely love your collection here. What a resource! It’s amazing that you’ve been able to gather such a tremendous digital library of seo reference documents. As a practicing (and studying!) seo professional I can say that your blog is the most valuable one on the web. I’d pay for your blog. Thanks.

  13. List is looking good so far Bill. Although, I guess due to the nature of the beast – it’s one that can never be completed anyway..

  14. Looks like I’m in for some reading! Especially the social stuff

    Amazing how the bounce rate doc from 2009 still has relevance in terms of stuff that is currently being rolled out and will prob be rolled out into the future. Scary that with enough data, you can work out what works and what won’t even before people have visited the page, to the extent it manipulates search engine results.

  15. This is a fantastic resource and a big well done to Bill for putting all of this together. I totally agree with Nelson’s comments: Google gives everyone a gentle nudge now and again with its updates, but ultimately the fundamentals of SEO remain the same.

  16. I wish I had the time to read 100 articles! SEO is so time consuming I pretty much don’t even have the time to learn more about improving my own SEO techniques. It’s like a lose-lose situation, especially when you’re running a one-man operation trying to juggle 3 job roles at once (owner, webmaster, seo guy). If only it was cheaper to hire an SEO specialist, but I do understand why they charge so much, this stuff ain’t easy! Despite what most blogs seem to tell you

  17. This is great — I’m glad you are updating because it seems like SEO tips have a half-life of 30 seconds. Would you recommend starting with your most recent documents or do you feel like the ones from your first post are still very applicable and are still your first choices?

  18. Keep going Bill – a great collection. Love Kendall’s words about half-life of 30 secs – SEO is non stop and so darn changeable – thank goodness it’s still worth the hassle to keep learning – paying Google for PPC would be death!

  19. Hi Bill

    Having not visited your blog for a little while, I thought I should pop back for a refresher! So was thrilled to see you’ve reintroduced this series on SEO… it’s gonna be very useful to read through and remind myself of the key points. As always, keep up the great work and thank you.

    Cheers

    Ross

  20. I am pretty much a newbie at SEO and have a lot to learn. It seems like the rules are constantly in flux. There is a ton of information on the web that talks about general best practices (optimizing code, getting sites to link to you) but the specifics of SEO seem to be a black art (mysterious to me at least). If I were to start from the beginning and build sites that would perform best for today – what would be the best resources for study? I wish I had time to read through 100 articles…

  21. Having these resources was a great help! These resources have given me a better understanding of SEO, I can’t begin to thank you enough for these very valuable documents! If you are new to SEO, take your time and read some of these resources, it’s hard to find quality tips and resources on the net, but when you do find quality information like this, man it feels good! Keep it up Bill, thanks!

  22. Great documents here, and very helpful for anyone who’s focusing on SEO techniques.

    In my opinion, one of the most interesting as well as valuable documents here is the one on “what’s a link worth?”

    This shows clearly that search engines are, or at least are planning on, valuing links on the same webpage in a different way, based on a number of possible factors that they can consider. This is surely something to look into and study up on!

    Thanks for sharing all of these excellent and very useful documents!
