In 2003, a paper titled Semantic Search was published by Ramanathan Guha of IBM, Rob McCool of Knowledge Systems Lab at Stanford University, and Eric Miller of W3C and MIT. Their stated goal in the paper was to take technologies such as web services and the Semantic Web and use them to “improve traditional web searching.”
This was written before Ramanathan Guha joined Google, started Google Custom Search Engines, created Google’s version of Trust Rank, and introduced Schema.
I’m working on putting together a history of the Semantic Web at Google. This early look at the Semantic Web provides insights from at least one person who played a major role in how the Semantic Web is becoming part of Google search.
A section of the Semantic Search paper introduces four important concepts about Semantic Search that are still around today and are worth thinking about if you do SEO.
Navigational Searches vs. Research Searches
Before the paper even gets to those concepts, though, it introduces two different types of searches:
Navigational Searches: In these searches, a searcher submits a phrase or combination of words that he or she expects to find in documents on the Web. The words are not meant to be interpreted as denoting a concept. Instead, the searcher uses the search engine as a navigational tool to find a particular intended document. This is the goal of most SEO, and the authors of this paper tell us, “We are not interested in this class of searches.”
Research Searches: In these searches, a searcher provides the search engine with a phrase intended to denote an object about which he or she is trying to gather information. The searcher doesn’t have a particular document in mind and may not even know whether such a document exists. The searcher hopes that some documents exist that will provide the information he or she is trying to find. This is the class of searches the authors tell us they are interested in when they use the phrase “Semantic Search.”
Google seems to be evolving towards becoming a search engine that can be useful in helping people with both types of queries.
Here are other elements that the paper tells us can help distinguish between the Semantic Web and the HTML Web.
Documents vs. Real-World Objects: When we think about the HTML Web, we think of a Web filled with pages, pictures, videos, and other documents that a web crawler such as Googlebot might crawl. A search engine uses things such as the links between those documents, and the relevance of the words that appear on them, with them, or pointing to them (in anchor text), to rank them in search results and help us find them.
Unlike the HTML Web, the Semantic Web isn’t a Web of documents, but instead a “Web of relations between resources denoting real-world objects, i.e., objects such as people, places and events.” When something happens to one of these real-world entities, the information about it on the Semantic Web should change. The paper contains a picture of what looks like an early knowledge panel, which was interesting to see.
Human vs. Machine-Readable Information: The important point about the Semantic Web is that it contains rich machine-readable information about resources. By contrast, most HTML on a web page tells a browser how the page should be displayed to visitors, and very little of the data on that page is machine-understandable.
Relation between the HTML Web & the Semantic Web: The document tells us that the Semantic Web is an extension of the current Web, and that “there is a rich set of links from the nodes in the Semantic Web to HTML documents.” The HTML Web and the Semantic Web are supposed to be connected, and those connections help both of them.
This paper was written before markup like Schema was created, and it tells us that some pages may contain some semantic markup. As I noted at the start of this post, author R. Guha was the person who officially introduced Schema to the world at Google in the Official Google Blog in 2009 – six years after this paper was published.
The last point raised in the paper is an important one:
Distributed Extensibility: Different sites may contribute data about a particular resource. Amazon.com may have data about Yo-Yo Ma’s albums. eBay might contain data about auctions related to Yo-Yo Ma. TicketMaster may carry data about his concert schedule. AllMusic has data about where he was born (Paris). None of these sites needs permission from some centralized authority to include information about Yo-Yo Ma. As the paper says, “they can all extend the cumulative knowledge on the Semantic Web about any resource in a distributed fashion.”
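To make the idea of distributed extensibility a little more concrete, here is a minimal sketch in Python of how statements published independently by different sites could be combined around a single resource. This is only an illustration, not the approach described in the paper: the identifier for Yo-Yo Ma, the predicate names, and the placeholder values are all assumptions made up for this example. The only fact taken from the paper's example is the birthplace (Paris).

```python
from collections import defaultdict

# Hypothetical identifier for the resource; not taken from the paper.
YO_YO_MA = "example:Yo-YoMa"

# Each site publishes its own (subject, predicate, object) statements.
# Predicate names and placeholder values are illustrative assumptions.
site_data = {
    "Amazon.com": [(YO_YO_MA, "hasAlbum", "<album placeholder>")],
    "eBay": [(YO_YO_MA, "hasAuction", "<auction placeholder>")],
    "TicketMaster": [(YO_YO_MA, "hasConcert", "<concert placeholder>")],
    "AllMusic": [(YO_YO_MA, "birthplace", "Paris")],
}


def merge(sources):
    """Combine statements from every site into one view per resource.

    No site needs to coordinate with any other; each one only has to
    refer to the same resource identifier.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for site, triples in sources.items():
        for subject, predicate, obj in triples:
            # Keep the contributing site alongside each value.
            merged[subject][predicate].add((obj, site))
    return merged


knowledge = merge(site_data)
for predicate, values in sorted(knowledge[YO_YO_MA].items()):
    for obj, site in sorted(values):
        print(f"{predicate}: {obj} (from {site})")
```

The point of the sketch is that the merged view grows whenever another site adds statements about the same identifier, without any central authority approving the addition.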
11 thoughts on “The Early Days of the Semantic Web”
I always learn something new from each and every one of your blog posts. Thanks for sharing the article by Ramanathan Guha on Semantic Search and explaining the Navigational Searches vs. Research Searches concept.
sweet – can’t wait to read the history of the Semantic Web at Google post 🙂
Sweet – Look forward to the most authoritative version about “Semantic Search and The Web” by Bill Slawski
I also expect to see the numerous failed attempts by those who tried various models and the patents that led up to the current version. These are always quite revealing in substance and have some historical markers leading into other areas.
Google, Bing, Yahoo, etc. are trying to accommodate multiple query types:
– Navigational, informational, transactional (based on Andrei Broder’s Taxonomy of Web Search)
– Site finding search
– Ad hoc search
– Entry page search
– Named page search
– Known-item search
I know there are more but these are the ones I remember. Something about showing commercial intent, too.
Shared! Thanks for your post. It’s hard to believe how far we’ve come in only 10 years.
You’re welcome, Miraj.
Happy to hear you enjoyed the article, and the different types of searches as he defines them.
I’m working towards it. 🙂
Hi Dr. Robert
I’m not sure that I will spend a lot of time on “failed” attempts. I don’t think I want to try to write something book-sized. But maybe that’s where the most interesting things will be. I guess we will see.
This is a different type of search than the “navigational” search described by Guha, McCool, and Miller, and the different kind defined in Broder’s paper.
That paper, and some of the other types of searches that you are describing, are much more rooted in the indexing of Web pages than in the indexing of data on the Web regardless of which page it appears upon, the way that the Semantic Web is. I would love to hear a discussion between R. Guha and A. Broder in the halls of Google.
Broder does address some of these issues in a presentation from 2010 in which he is joined by some other authors: The new frontiers of Web search: going beyond the 10 blue links – http://www.eurospider.com/fileadmin/pdf/SIGIR_Industry_Track_2010/05_SIGIR-2010-BRODER.pdf
It would help if you gave some real-life concrete examples of how both types of searches matter to the average person using search capabilities to find info on the web. This would make the info more relatable and understandable for the general population.
Thank you for your suggestion. The paper that I wrote about from Drs. Guha, McCool, and Miller didn’t provide examples like that, and therefore I didn’t either. I don’t think they were writing for the general population, and I rarely do either.
We are seeing Google try to respond to queries with actual answers to those questions, rather than with lists of links to web pages that might attempt to answer them.
That may be one major change that we see at search engines as they try to be more responsive towards searchers, and provide data as opposed to providing URLs as answers to searches.