More all-time best SEO documents:
From Resource Discovery to Knowledge Discovery on the Internet
I ended part one of this series with a document that is 50 years old. Part 2 starts with a white paper that opens with the memex of Vannevar Bush, then looks at search engines in the days before Google and the methods they used to index pages on the web. A great look at search engine technology circa 1998.
Efficient Crawling Through URL Ordering (pdf)
One of the challenges of SEO is making sure that web pages get indexed. If we consider the three major aspects of a search engine to be crawling pages, indexing them, and serving them in search results, then everything begins with that initial crawl.
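The idea behind that paper is that a crawler can choose which discovered URL to fetch next by an estimate of importance instead of plain discovery order. Here is a minimal Python sketch of that idea. It assumes backlink count, counted only from pages crawled so far, as the importance metric. The paper also considers other orderings, such as PageRank. The function name, the link map, and the toy site are hypothetical:

```python
from collections import defaultdict

def crawl_order(seeds, links, limit):
    """Return URLs in the order a greedy, backlink-ordered crawler would fetch them.

    seeds: starting URLs; links: dict mapping each URL to the URLs it links to
    (a hypothetical stand-in for fetching and parsing pages); limit: max pages.
    """
    backlinks = defaultdict(int)  # URL -> links to it seen on crawled pages
    frontier = set(seeds)         # discovered but not yet crawled
    seen = set(seeds)             # every URL ever added to the frontier
    crawled = []
    while frontier and len(crawled) < limit:
        # Most backlinks first; ties broken alphabetically for a stable order.
        url = min(frontier, key=lambda u: (-backlinks[u], u))
        frontier.remove(url)
        crawled.append(url)
        for target in links.get(url, []):
            backlinks[target] += 1
            if target not in seen:
                seen.add(target)
                frontier.add(target)
    return crawled

# Hypothetical four-page site.
SITE = {
    "home": ["about", "blog", "contact"],
    "about": ["contact"],
    "blog": [],
    "contact": [],
}

if __name__ == "__main__":
    print(crawl_order(["home"], SITE, limit=10))  # ['home', 'about', 'contact', 'blog']
```

A plain breadth-first crawl of the same toy site would fetch `blog` before `contact`. Ordering by backlinks moves `contact` up because two crawled pages point to it.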
Continue reading 100 best SEO documents of all time, part 2
I was asking myself what the classics were in the field of Search Engine Optimization. What documents would you introduce people to if you wanted to help them learn and understand what Search Engine Optimization was, and possibly what it could be?
I started a list earlier today, quickly got up to 30 articles, patents and patent applications, white papers, forum threads, books, and more, and decided to list them here and see if I could get up to 100 of the best. I'm going to post these over a number of days, and I consider the documents I'm listing to be nominees for the best that the industry has to offer.
Many of these come from search engineers, and a good number come from Search Engine Marketers, usability consultants, and other commentators on search engines and the web, as well as other folks who have played a part in making the web alive and vibrant, a place where commercial and noncommercial activities could thrive.
I’ve chosen a number of these documents because I’ve seen them mentioned a lot on SEO and marketing forums, where I seem to spend a lot of my time. Others I selected because they made me think about the web or marketing or the technology behind the internet in different ways.
Continue reading 100 best SEO documents of all time, part 1
Interviews have been big in the world of search recently.
Just ask Aaron Wall of SEO Book, who has been on a roll lately with interviews of Dan Thies, Shawn Hogan of Digital Point, NFFC, and David Naylor.
Not quite as insightful, but still interesting, is Satirewire’s interview with Ask Jeeves.
Nick W, at Threadwatch, pointed out this morning that the Internet Archive is being sued for copyright infringement. Pages from the Archive were used as evidence last year in a trade secrets dispute between two companies.
It seems that the company pursuing this case had put up a robots.txt file excluding its web pages from the Archive, which the Internet Archive usually takes as an instruction not to serve pages from a site. Somehow, though, the law firm representing the defendant in that case was able to obtain 92 pages from the Archive to use in court.
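A robots.txt exclusion like the one described here is a plain-text file at the root of a site. The following is a minimal sketch that checks such a file with Python's standard `urllib.robotparser`. The file contents and the example.com URL are hypothetical. Treating `ia_archiver` as the Archive's crawler name is an assumption made for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one archiving crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

def archive_may_serve(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    page = "https://www.example.com/about.html"
    print(archive_may_serve(ROBOTS_TXT, "ia_archiver", page))   # False
    print(archive_may_serve(ROBOTS_TXT, "SomeOtherBot", page))  # True
```

robots.txt is advisory. The parser only reports what the file asks for, and it is up to the crawler or archive to honor that request.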
Alex Wexelblat, of Corante, takes a deeper look at the case, including a link to a Star Ledger article (broken link) on the controversy and a link to the Complaint (broken link) in the case. He also commented on this subject back when the Internet Archive files were used as evidence in the original case.
Might this case have implications for search engines and their caching of pages under copyright law? Maybe. One thing that may keep this case from going very far, though, is that copyrighted material can be used as evidence in a legal dispute as a fair use of that material. A court may stop at that point without making other decisions that have far-reaching consequences. As noted by Alex Wexelblat:
Continue reading Internet Archive Under Suit for Copyright Infringement
One of my favorite usability sites surprisingly comes from the US government. It was built from lessons learned during a redesign of Cancer.net. So many user-friendly ideas came out of the redesign that the folks working on it decided to share them with others at Usability.gov.
Their Research-Based Web Design and Usability Guidelines make a great tool to use when putting together a site, or trying to update one to make it friendlier to visitors.
So, why did I mention Search Engine Spam in the title of this post?
Well, according to Yahoo!, sites that aren't very usable are spamming the search engine. On their What is search engine spam? page, Yahoo! tells us that, amongst other things, search engine spam includes "Pages that seem deceptive, fraudulent, or provide a poor user experience."
Continue reading Usability and Search Engine Spam
A new patent application (a rare short one) from Monika Henzinger, of Google, describes a way to estimate the freshness of a web page, based upon both the "last-modified-since" message a search spider receives about the page and a review of the "last-modified-since" messages received from pages that link to that page.
See: Systems and methods for determining document freshness
There are other examples of how this works, but here’s one from the patent application:
As another example, if the number of "fresh" documents of the set of documents containing links to document p is greater than the number of "not fresh" documents of the set of documents containing links to documents (i.e., as determined by freshness attribute(s) associated with each document of the set of documents), then documents can be considered "fresh," and a corresponding "high" freshness score F_r may be assigned to documents. To illustrate, if each document of set of 100 documents containing a link to document p has a freshness attribute, such as, for example, a HTTP "last-modified-since" attribute, that indicates that 70 of the documents have been recently modified or updated and, thus, are fresh, then a "high" freshness score F_r can be assigned to document p.
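The majority rule in that excerpt is simple enough to sketch, though this Python version is only an illustration and not the patent's full method. It makes three assumptions that do not come from the quoted text. First, each linking page reports a standard HTTP `Last-Modified` header. Second, "recently modified" means within 30 days. Third, anything short of a majority gets a "low" score. The function name and the dates are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def freshness_score(linker_headers, now, window=timedelta(days=30)):
    """Return "high" if most pages linking to a document look recently modified.

    linker_headers: Last-Modified header values from the pages that link to it.
    window: how recent a change must be to count as fresh (an assumption).
    """
    fresh = 0
    for header in linker_headers:
        modified = parsedate_to_datetime(header)  # timezone-aware for "GMT" dates
        if now - modified <= window:
            fresh += 1
    not_fresh = len(linker_headers) - fresh
    # Majority rule from the excerpt; returning "low" otherwise is an assumption.
    return "high" if fresh > not_fresh else "low"

# Hypothetical header values for the 70-of-100 example in the excerpt.
NOW = datetime(2005, 11, 1, tzinfo=timezone.utc)
RECENT = "Tue, 25 Oct 2005 10:00:00 GMT"
STALE = "Mon, 04 Oct 2004 10:00:00 GMT"

if __name__ == "__main__":
    linkers = [RECENT] * 70 + [STALE] * 30
    print(freshness_score(linkers, NOW))  # high
```

Reversing the split to 30 fresh and 70 stale linking pages would make the same call return "low" under these assumptions.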
Continue reading Google Gets Minty Fresh
Inviting people to a town on the Chesapeake to hang out, and share some laughs, some good food, and some thoughts on internet marketing wouldn’t have been complete without providing a chance for those folks to sail around on the waterways.
I chartered a tour today for SEO on the SEA for a cruise on one of the last of the Chesapeake oyster ships to have been built, the Skipjack Martha Lewis. It seems kind of ugly to refer to the skipjack as an "oyster dredger" because that seems to imply something slow and unsightly. Loren Baker, who knows much more of the history of Maryland watermen than I do, tells me that skipjacks were built to be swift on the waterways. They had to be. The first ones to the oyster beds were usually the ones who ended up with the best hauls.
And, relations between competitors weren’t always friendly. Border skirmishes happened, and the open waters were often laid claim to by strength of arms. There were even times when the government took action against those who harvested the riches of the sea. The oyster wars often saw watermen and government forces clashing.
Image from the Library of Congress, reference number LC-USZ62-76142, originally published in Harper’s Weekly, Mar. 1, 1884.
Continue reading The SkipJack Martha Lewis has been Chartered
Keepgoing.org has a great history of one of the first great web sites – the online Mad Magazine of its time. In The Big Fish, they take a look at Suck.com, ten years after its launch.
I came to the party late, and didn’t learn about suck.com until it had closed its doors, and stopped publishing. But this story is a great one, and there are probably a lot of lessons here to be learned by anyone interested in putting a web site online.
Promotion in the days before search engines made it big? Here’s how suck.com got the word out:
Anuff collected every magazine he could locate, at the Wired offices and at home, until he had a stack of perhaps 200, which he combed through, writing down every email address he found. “Every published email address of any journalist period ended up on this master list, and we spammed them all when we launched.” After that, there was little else to do except watch the server traffic, and wait.
Continue reading Who Knew the Web Would Suck.com? Promotion in the Days Before Search Engines