In Characteristics of the Web of Spain (pdf), by Ricardo Baeza-Yates, Carlos Castillo, and Vicente López, the authors take a close look at the web sites of Spain, and find a number of interesting results.
The paper was published last year, but I don’t see many citations to it from English-language sites listed in Google, and it probably deserves a much wider readership.
One of the hurdles that the authors faced was identifying which sites were from Spain. A .es domain name costs considerably more than a .com name, and to use a .es domain name, a site owner needs to “prove that the applicant owns a trade mark, or represents a company, with the same name as the domain being registered.” Looking only at .es domains would therefore miss Spanish sites registered under other top-level domains.
By combining sites that had IP addresses from networks physically located in Spain with sites under the .es top-level domain (TLD), the researchers were able to look at over 16 million web pages.
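To make that selection rule concrete, here is a minimal sketch in Python. It is not the authors' code: the function names are my own, and the network list uses reserved documentation ranges as placeholders, not real Spanish allocations. It assumes the IP address for each host is already known, and it treats a site as part of the Web of Spain if either condition holds.

```python
import ipaddress

# Placeholder networks standing in for "networks physically located in Spain".
# These are reserved documentation ranges (an assumption for illustration),
# not real Spanish allocations.
SPANISH_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def has_es_tld(hostname: str) -> bool:
    """True if the hostname sits under the .es top-level domain."""
    return hostname.lower().rstrip(".").endswith(".es")


def ip_in_spain(ip: str) -> bool:
    """True if the address falls inside one of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SPANISH_NETWORKS)


def in_web_of_spain(hostname: str, ip: str) -> bool:
    """A site qualifies if it has a .es name OR is hosted on a listed network."""
    return has_es_tld(hostname) or ip_in_spain(ip)


if __name__ == "__main__":
    candidates = [
        ("www.example.es", "203.0.113.5"),   # .es name, hosted elsewhere
        ("www.example.com", "192.0.2.10"),   # .com name, on a listed network
        ("www.example.com", "203.0.113.5"),  # neither condition holds
    ]
    for host, ip in candidates:
        print(host, ip, in_web_of_spain(host, ip))
```

The second candidate is the case that matters for the hurdle described above: a .com site that a TLD-only filter would skip is still picked up by its IP address.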
The executive summary of the paper lists a couple of the findings of their work:
63% of the studied Web sites are not linked to by other Web sites in Spain, which makes them harder to find.
About 60% of the sites on the Web of Spain have only one indexable Web page, and about half of these sites have other pages, but those other pages are difficult or impossible to access by current Web search engines.
But there are many other findings to ponder. The detailed rationale for some of the decisions the authors made in their study, and the resources they point to in backing those decisions, make this a document worth spending some time with.
Great study. I’d love to see a similar one done with the United States Web.
- Ricardo Baeza-Yates
- Carlos Castillo
- Vicente López
- Cybermetrics, Volume 9 (2005), Issue 1, Paper 3 (a slightly shorter version of the paper).
- Characterization of National Web Domains (pdf, 4.11 MB) from Ricardo Baeza-Yates, Carlos Castillo, and Efthimis N. Efthimiadis (looking at 120 million pages from 24 different countries)