Google Defines Semantic Closeness as a Ranking Signal

This post may get you thinking about the benefits of using heading elements and lists on web pages for SEO purposes from a slightly different perspective than you may be used to.

Google uses a large number of signals to decide upon the order of pages shown in search results. Some of those signals measure the quality or importance of a web page, while others may indicate how relevant a page is for a particular search query entered into a search engine’s search box.

One fairly obvious relevancy signal is whether or not the words in a query actually appear upon a page that might be a search result for that query. If those words appear on the page more than once, the page might be considered even more relevant for that particular query than other web pages where the terms only appear once, or not at all.
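To make the counting idea concrete, here is a minimal Python sketch. It only tallies how often each query word appears in a page's text. The function name `term_counts` and the sample text are my own illustrations, and a real search engine's relevance scoring is certainly far more involved than a raw count.

```python
import re
from collections import Counter


def term_counts(page_text, query):
    """Return how many times each query word occurs in the page text."""
    words = Counter(re.findall(r"[a-z0-9]+", page_text.lower()))
    # Counter returns 0 for words that never appear on the page.
    return {term: words[term] for term in re.findall(r"[a-z0-9]+", query.lower())}


# Illustrative page text, not taken from any real page.
page = "Saturn facts: the mass of Saturn, the volume of Saturn."
counts = term_counts(page, "Saturn mass")
print(counts)  # {'saturn': 3, 'mass': 1}
```

A page where a count comes back as zero is missing that query word altogether, which is the "not at all" case described above.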

Another factor that might indicate how relevant a page is for a particular set of terms is how close those terms might be on a page. While you could easily count the number of words between individual query terms to determine how close they are to each other, the formatting of web pages presents some challenges to the approach of simply counting words between terms, such as in a list like the following:

An example HTML list, using the heading Saturn Facts and listing a number of astronomical facts about Saturn, including orbit, rotation period, mass, volume, and distance from the sun.

Imagine that the list in the image above is all that appears upon a particular web page. Since every item listed is about Saturn, as shown by the heading of the page, it could be said that semantically each list item is equally relevant to Saturn in terms of closeness, even though the items listed grow in visual distance from the heading of the list when calculated by the number of words between “Saturn” and list items.

This way of calculating semantic closeness means that the page this list appears upon is equally relevant for the terms “Saturn Mass”, “Saturn Volume”, and “Saturn Rotation.”

A Google patent granted this week explores how the search engine might view how close words are together when they appear in semantic structures like a list, to determine how relevant a page might be to queries that contain those words.

The patent was filed back in 2004, but it provides a way of thinking about how semantic structures on web pages might be interpreted by a search engine in a way that might not be obvious on its face.

Document ranking based on semantic distance between terms in a document
Invented by Georges R. Harik and Monika H. Henzinger
Assigned to Google
US Patent 7,716,216
Granted May 11, 2010
Filed: March 31, 2004

Abstract

Techniques are disclosed that locate implicitly defined semantic structures in a document, such as, for example, implicitly defined lists in an HTML document. The semantic structures can be used in the calculation of distance values between terms in the documents.

The distance values may be used, for example, in the generation of ranking scores that indicate a relevance level of the document to a search query.

HTML Formatting used to Determine Semantic Structures

One part of the process behind this approach involves a search engine analyzing the HTML structures on a page, looking for elements such as titles and headings on a page, unordered lists (<ul>) and ordered lists (<ol>), nested tables, divs, and line breaks (<br>) that might be used to layout a list of items on a page.

Page headings might use an actual heading element such as an <h1> or a larger sized font such as <font size=16>, and text below that heading might be considered to belong to the heading.

In other words, the search engine is attempting to locate and understand visual structures on a page that might be semantically meaningful, such as a list of items associated with a heading. We’re told that this process may also look for other kinds of meaningful semantic structures other than just lists as well.
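As a rough illustration of that kind of analysis, the Python sketch below uses the standard library's `html.parser` to pull a heading and the items beneath it out of a fragment of HTML. It treats both `<li>` elements and `<br>`-separated lines as list items. The `ListFinder` class and `find_list` function are hypothetical names of mine, not anything from the patent, and the sketch ignores tables, divs, and font-size headings entirely.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ListFinder(HTMLParser):
    """Collects a heading and the list items that follow it (one list per fragment)."""

    def __init__(self):
        super().__init__()
        self.heading = ""
        self.items = []
        self._target = None  # None until a heading is seen, then "heading" or "item"
        self._buffer = []

    def _flush(self):
        # Store the buffered text, collapsing runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if not text:
            return
        if self._target == "heading":
            self.heading = text
        elif self._target == "item":
            self.items.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()
            self._target = "heading"
        elif tag in ("li", "br") and self._target == "item":
            # An explicit <li> and an implicit <br> both end the current item.
            self._flush()

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._flush()
            self._target = "item"  # text below the heading is treated as belonging to it
        elif tag == "li":
            self._flush()

    def handle_data(self, data):
        if self._target is not None:
            self._buffer.append(data)


def find_list(html):
    """Return (heading, items) for a small HTML fragment."""
    finder = ListFinder()
    finder.feed(html)
    finder.close()
    finder._flush()  # keep any trailing text that had no closing tag
    return finder.heading, finder.items


explicit = "<h1>Saturn Facts</h1><ul><li>Orbit</li><li>Rotation period</li><li>Mass</li></ul>"
implicit = "<h2>Saturn Facts</h2>Orbit<br>Rotation period<br>Mass"
print(find_list(explicit))
print(find_list(implicit))
```

Both fragments yield the same heading and the same three items, which is the point: the structure is what matters, not the particular markup used to lay it out.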

The patent gives us the following rules about headings and list items, when it comes to the distance between words appearing within them:

  1. If both terms appear in the same list item, the terms are considered close to one another;
  2. If one term appears in a list item and the other term appears in the header, this pair of terms may be considered to be approximately equally distant to another pair of terms that appear in the header and in another of the list items;
  3. Pairs of terms appearing in different list items may be considered to be farther apart than the pairs of terms falling under 1 and 2.

So, in the Saturn example above, the words “Saturn” (from the heading of the list) and “Distance” (from the last list item) are considered closer together than the words “Days” and “Rotation” even though “Days” is the last word of the first list item and “Rotation” is the first word of the second list item.

The HTML list from above showing that Saturn and Distance are semantically closer than Days and Rotation.
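The three rules can be sketched in a few lines of Python. This is only my reading of them, under stated assumptions: the numeric values are arbitrary placeholders (the rules above only tell us that pairs in different list items are farther apart than the other two kinds of pairs), the function name `semantic_distance` is hypothetical, and the list items are stand-ins for the Saturn list with the actual values left out.

```python
import re

# Placeholder values, assumed for illustration. The only ordering taken from
# the rules is that DIFFERENT_ITEMS is larger than the other two.
SAME_ITEM = 1
HEADER_ITEM = 2
DIFFERENT_ITEMS = 3


def _words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def semantic_distance(heading, items, term_a, term_b):
    """Return the smallest rule-based distance between two terms, or None."""
    a, b = term_a.lower(), term_b.lower()
    head = _words(heading)
    item_words = [_words(item) for item in items]
    in_a = [i for i, words in enumerate(item_words) if a in words]
    in_b = [i for i, words in enumerate(item_words) if b in words]

    candidates = []
    if set(in_a) & set(in_b):
        candidates.append(SAME_ITEM)        # rule 1: same list item
    if (a in head and in_b) or (b in head and in_a):
        candidates.append(HEADER_ITEM)      # rule 2: header plus any list item
    if any(i != j for i in in_a for j in in_b):
        candidates.append(DIFFERENT_ITEMS)  # rule 3: different list items
    return min(candidates) if candidates else None


heading = "Saturn Facts"
items = [
    "Orbit: ... days",
    "Rotation period: ...",
    "Mass: ...",
    "Volume: ...",
    "Distance from the Sun: ...",
]

saturn_distance = semantic_distance(heading, items, "Saturn", "Distance")
saturn_mass = semantic_distance(heading, items, "Saturn", "Mass")
days_rotation = semantic_distance(heading, items, "Days", "Rotation")
print(saturn_distance, saturn_mass, days_rotation)  # 2 2 3
```

“Saturn” is the same distance from “Distance” as it is from “Mass”, and both pairs come out closer than “Days” and “Rotation”, even though those two words sit next to each other in the raw text.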

Conclusion

This Google patent was filed way back in 2004, but it does present some interesting ideas about how the search engine might look to semantic structures like lists to determine one aspect of how relevant a page might be for a particular query.

Are you thinking about headings and lists differently now?

48 thoughts on “Google Defines Semantic Closeness as a Ranking Signal”

  1. I’m glad I stumbled upon this article. Does this mean that having a list in your content will help you in terms of SEO? How significant is this as a factor for rankings?

  2. Search engines are getting smarter and smarter. I’m particularly awed by their ability to look for meaningful semantic structures. It’s something to think about.

  3. I’ve always avoided listing things thinking that it might not be good for SEO, but this post of yours has enlightened me that lists are not bad after all. Thank you.

  4. Hi Dominic,

    This concept of semantic structures in SEO could be helpful in optimizing pages. It provides a way of thinking about how a search engine might determine which pages might be relevant for certain queries that may be different than what people may have suspected in the past.

    Lists tend to be easier to read on a monitor than solid blocks of text, and they can be a useful way of presenting information in a manner that can be scanned easily by a visitor to a page. It’s likely that Google would use the distance between different words on a page as one signal of how relevant that page might be for those words if they appeared within a query, and this concept of semantic closeness within structures like lists adds something significant to that concept.

  5. Hi Andrew,

    You’re welcome. From a pure usability stance, bulleted lists can make it more likely that people will read the content on your page. See Jakob Nielsen’s article How Users Read on the Web.

    The patent from Google points to the possibility that lists may also be beneficial from an SEO stance as well. Even if Google isn’t using this approach, using lists on your pages may lead to more people reading your content more thoroughly.

  6. Very cool insight. Goes to show that, if you want to rank for “Saturn Volume”, you may not technically need to pair those 2 as a key phrase if you structure your content in a logical manner.

  7. Hi Jeremy.

    I was pretty excited while reading through the patent to come to that conclusion as well.

    I’ve been recommending the use of lists on sites for readability purposes, so it’s good to see that they also potentially have some value from an SEO stance as well.

  8. I have been good about designing websites utilizing headings like h1 tags for keywords. Adding a list to the mix could prove to be a good formula. It certainly couldn’t hurt. Are there any testing grounds? I’d like to see some examples of proper term pairing and structure.

  9. Very nice article!
    I particularly find this awesome, because this gives Google even more possibilities to learn about associated words which is an important factor for any SEO.

  10. This article falls very much in line with some experimental layouts we’ve been working with recently. Specifically we’ve found that CSS floats have been useful in structuring navigation in a way that places more SEO friendly terms at the beginning of the navigation without sacrificing aesthetics. Good to see some other tactics put to work.

    Andrew Gouty
    Boulder SEO

  11. This is a pretty neat finding. What surprised me though is that it took from 2004 until now to get this patented! Surely you’ve done a good job summarizing!

  12. I have never really been a big fan of lists and I have never heard anyone talk about the SEO benefits of them before. (I understand this patent was just granted.) Looks like I will be using lists a little more often where they are appropriate.

  13. Pingback: » Pandia Search Engine News Wrap-up May 15
  14. Interesting, but as a relative newbie, the details are a bit deep for me. Lists are quite tidy for some information, and I do use them when they fit in, but I’m not sure if I could work a list into some pages.

  15. Hi Superstar,

    If you don’t have a site that you can test things like this on, it’s not too late to start one. I’d recommend developing one or more. It’s a good practice, so that you don’t put your own site or a client’s at risk while you try things out.

    As for proper pairing and structure, I don’t think the actual implementation of this, in terms of lists, is all that sophisticated. The patent tells us that while they would look at explicit lists, such as ordered or unordered lists, they would also look at lists created through the use of tables, or even break elements. They also note that while a heading element could be used, they would consider a heading using just a larger font size at the top of the list as being a heading or title for one of these explicit or implicit lists.

  16. Hi Andrew,

    That’s a good issue to raise.

    People have been using table tricks, absolute positioning with CSS, floats and other tricks to try to get important text to appear prominently in the HTML code of a page even if it isn’t as prominent in a visual view of the page itself. I haven’t been convinced that it was a good idea to do so, and after reading through a number of the patents and white papers from Microsoft, Google, and Yahoo about visual segmentation of content on web pages, and this patent, I’m even less convinced.

    If you focus instead on putting your important content within an area of your pages that is obviously the main content section of a page, you may be better off.

  17. Hi Robin,

    Patents take a long time to get granted, and six years isn’t all that unusual. A lot of the granted patents I’ve seen recently were filed in 2004, and there are even some from 2003 and 2002.

  18. Hi Mike,

    I don’t believe that I’ve ever seen anyone discuss the SEO value of headings and lists combined like this either. I have seen a number of posts recently that questioned the value of using <h1> elements from an SEO perspective after conducting a few experiments, but that was just for the text that appeared within the <h1>, and didn’t consider the concept of semantic closeness: how close a search engine might consider a word in a list’s heading to be to a term that appears as an item within that list.

    Maybe we’ll see the people who experimented with those <h1> elements test this semantic closeness concept as well.

  19. Hi Felix,

    A slightly different approach to explaining the concept then.

    If you’re optimizing a page for the term “New York ice cream”, ideally you would want the words “New” and “York” and “ice” and “cream” to appear on the same page together.

    A search engine, seeing a query “New York ice cream” (without the quotation marks) may try to find all the pages on the web that contain the terms “New” and “York” and “ice” and “cream”, even if they aren’t all together like that. In addition to returning web pages in search results that have the phrase “New York Ice Cream”, the search engine might show us pages that have a sentence like:

    “I went to a store in New York to buy cream and slipped on the ice.”

    The words in that sentence are only a few words apart, but they aren’t right next to each other. The closer together they are, the more “relevant” they might be for the query – New York ice cream.

    What this patent says is that if the words appear in a title and list items under that title, they might count the words between the title and list items differently than just counting all the words between them.

    Imagine that you had a list like this:

    <h1>New York Industries</h1>

    <ul><li>Milk</li>
    <li>Wine</li>
    <li>shipbuilding</li>
    <li>Tourism</li>
    <li>Ice Cream</li></ul>

    Since the list is a “semantic structure,” and “ice cream” is a list item related to “New York Industries,” it is considered to be semantically close to “New York Industries.” It’s as close as any of the other list items to that title. New York Industries Ice Cream – only one word between the “New York” in the title and “ice cream” as a list item.

    The words are closer together than in the sentence “I went to a store in New York to buy cream and slipped on the ice,” so a page with a list like that might be considered more relevant for my query than one containing that sentence if we were to just base relevancy on how close words from a query are together.
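A rough Python sketch of that comparison, using the sentence and the list from this example. The helper names are hypothetical, and reading each list item as if it directly followed the title is just one simple way to model the idea, not necessarily how the patent computes it.

```python
import re


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def words_between(toks, a, b):
    """Fewest words strictly between an occurrence of a and one of b (None if missing)."""
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b) - 1


def list_gap(heading, items, a, b):
    """Same count, but each list item is read as if it directly followed the heading."""
    gaps = [words_between(tokens(heading + " " + item), a, b) for item in items]
    gaps = [g for g in gaps if g is not None]
    return min(gaps) if gaps else None


sentence = "I went to a store in New York to buy cream and slipped on the ice."
heading = "New York Industries"
items = ["Milk", "Wine", "shipbuilding", "Tourism", "Ice Cream"]

sentence_gap = words_between(tokens(sentence), "york", "ice")
flat_gap = words_between(tokens(heading + " " + " ".join(items)), "york", "ice")
structured_gap = list_gap(heading, items, "york", "ice")
print(sentence_gap, flat_gap, structured_gap)  # 7 5 1
```

Counting straight through the list gives five words between “York” and “Ice,” but treating the list as a semantic structure brings that down to one, well under the seven words separating them in the sentence.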

  20. Hi Bill,

    It has been a long time since I have read an article about SEO as relevant as this one.
    Thank you for these precise details. I have read others of your articles, and they are very complete and very educational.

    David (from France)

  21. See, Mom, I told you I need to spend more time on the computer making my code look beautiful!

  22. Based on your description, I don’t understand why “distance” would be more closely associated with “Saturn” than “rotation.”

    Vic

  23. Bill,

    In that example you indicate that all terms at the beginning of each bullet item are considered to be the same distance from the term at the top, hence my question as to why in your other example you state a term further down the list of bullet items has more relevance than an item higher on the list, even though they are both the first word of each bullet item.

    Thanks,

    Vic

  24. Hi Vic,

    I added an illustration above to my example. I was comparing the semantic distance between the word “Saturn” and “Distance” to the semantic distance between “Days” and “Rotation.” In other words, the semantic distance between a word in a title and a word in a list item is shorter than the semantic distance between two words in different list items.

    I hope the image helps make my example clearer.

  25. If I understand this article correctly, the distance between not only keywords, but subject related words also matters?

  26. Hi Mark,

    The patent really doesn’t discuss “subject related words,” as in words that might be said to be related because, for instance, they tend to appear together a lot within the same documents on the Web.

    Instead, it’s looking at the distance between keywords, and how that distance might be calculated differently if those keywords appear in some meaningful semantic structure, like a list of items. The distance between words in a list title and words in list items isn’t measured by looking at the number of words between them, because each list item is considered to be the same distance away from the title as any other list item.

  27. Thank you.

    It does make sense that a search engine would give items within a list equal weight in relation to one another, and assume that they are equal distances apart from a heading of that list. It’s nice to see Google validate that in this patent.

    One of the things that is interesting about the patent is that the authors seem to assume that not everyone would be diligent in using HTML markup in a semantic manner, and the process behind it will consider lists that are set out within tables, or by the use of br elements to be equally valid (as lists) as ones that use ordered and unordered list elements.

    The patent tells us that they will also look at the heading or title for a list, and consider it to be a heading even if it uses a larger font size rather than an actual heading element. So, your instinct on using larger fonts in PDF documents to approximate an H1 element seems to be a good one, based upon how Google says they might interpret the semantic structure of a heading and a list that follows it.

  28. I always try and use semantic HTML. If something is a list of repeating words/phrases then it should be in an (un)ordered list; that’s why nav bars and menus should be displayed as a list. It makes sense that all items in a list should have equal weight to one another so that any of those items have the same relation with a heading or subheading. I also think size of text is an important factor, as with some PDF files in Google’s index, the largest text seems to get treated with the same importance as an H1 tag.

  29. What a great article. Thanks for sharing your insights with us. I’ve thought that proximity had been a factor for some time as I’ve noticed that certain sites rank for terms that are not necessarily present on their pages or the anchor text of incoming backlinks. It makes sense that Google uses the semantics of a page to determine whether information is relevant or not, but do you think that certain aspects of the algorithm are only triggered when a site achieves a certain PageRank, or contrarily do you think the same algorithm is applied to all sites in a non-nuanced approach?

  30. Hi Jonathan,

    Thanks. The proximity of words on a page probably has been something that search engines have been considering for a while.

    One reason why you might rank for some terms that may not appear on your pages, or in anchor text that appears upon your pages is that Google may have expanded some searchers’ queries to include synonyms that may appear on your pages.

    It’s possible that Google may not index some aspects of pages when those pages fall below a certain PageRank, or may not crawl and re-index those pages as quickly. But PageRank is only one of possibly hundreds of different factors that a search engine may look at. For some types of pages, such as news articles, Google might pay attention to some signals that it wouldn’t look at, or give as much weight to, for an ecommerce product page or a blog post. The search engine may also consider or give more weight to different signals based upon the topics of pages as well. For example, the freshness of a page containing a historical document, such as the Declaration of Independence is likely less important than a page about a popular event happening this week.

  31. I’d absolutely agree with your comment above that having test sites where you can try these things out without killing an important site (such as one belonging to a client!) is really useful. Rather than just following what others say, you can test things out empirically for yourself.

  32. Hi Mark,

    It really is helpful. It also helps to make the site something that you’re pretty much interested in, so that you have some kind of incentive to add to it, and try things out with it.

  33. Hi Bill,

    Great article. This is the first time I have visited your blog, but consider yourself bookmarked! In light of this info I can certainly see some great possibilities for the practical application, especially maintaining a natural appearance to content whilst optimising pages where clients are offering the same service to a range of localities.

  34. Hi Nick,

    Thank you, and welcome to SEO by the Sea.

    Good point on how this might be used to make a list showing a service might be available in different localities. I think there are a few different ways where this might come in useful. I was also thinking about how a title might be associated with each of the items in a list when those items are links.

  35. Yeah Bill, I remember hearing this. It seems like Google is looking for link networks when giving weight to a page. If a page is linked by pages that are linked by more pages, it should have more weight of course! :)

  36. Hi John,

    The concept of semantic closeness doesn’t have anything to do with link networks, but rather how Google might look at words on a page when they are in a semantic structure like a list.

  37. Pingback: List Items and SEO
  38. Pingback: A Great Tool You’re Not Using: Wonder Wheel | The Milwaukee SEO
  39. Bill,

    Interesting article.

    Any views on whether they are using this technology? I have always tried to make use of lists as part of onpage. It would be interesting to see if any positive increases could be connected to this approach.

    Thanks

    Simon
