The Importance of Page Layout in SEO

If a search engine could understand the layout of a web page and identify the most important part of a web page, it could pay more attention to that section of the page when indexing content from the page.

It could give links found within that section of the page more weight than links found in other sections of the page, and it could give information within that area more weight when determining what the page is about.

We’ve seen the idea of breaking pages up into parts from a couple of the major commercial search engines.

A patent application from Yahoo explores how to approximate the layout of a web page, without actually displaying the page as a web page the way that a browser program does.

Skipping the full rendering that a browser performs makes the process faster, which is important when a search engine has to look at a very large number of web pages.

The patent filing also explores ways to identify what the most important section of a page might be from the approximated version of a layout. The patent filing is:

Techniques for approximating the visual layout of a web page and determining the portion of the page containing the significant content
Invented by Anandsudhakar Kesari
US Patent Application 20080033996
Published February 7, 2008
Filed August 3, 2006

Here’s the abstract from the patent application:

To approximate a visual layout of a web page without rendering the page, an object tree representing elements within the page is recursively traversed to determine bounds for the width of the elements, resulting in lower bounds induced for non-leaf nodes by elements within these nodes and upper bounds induced by ancestors and siblings of nodes.

For each element, the minimum required width (lower bound), the desired width were there no constraints, and the maximum available width (upper bound) based on constraints of parents are computed, and an approximate width is derived therefrom.

A positioning process positions each element within its corresponding parent container by advancing a cursor according to the elements’ approximate width and appropriate constraints.

The element that contains the most meaningful content is determined based on the amount of weighted content of elements and their position within the page.
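The width and positioning steps in the abstract can be sketched roughly in Python. This is only an illustration under simplifying assumptions, not the method from the filing: the `Node` class, the fixed `CHAR_W` and `LINE_H` values, and the rule that children always stack vertically are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CHAR_W = 8    # assumed average character width in pixels (hypothetical)
LINE_H = 16   # assumed line height in pixels (hypothetical)


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    declared_width: Optional[int] = None  # e.g. taken from a width attribute
    min_w: int = 0       # lower bound, induced by the node's contents
    desired_w: int = 0   # width the node would take with no constraints
    width: int = 0       # approximate width after applying the upper bound
    x: int = 0
    y: int = 0
    height: int = 0


def measure(node: Node) -> None:
    """Bottom-up pass: compute the lower bound and the desired width."""
    if not node.children:
        words = node.text.split()
        node.min_w = max((len(w) for w in words), default=0) * CHAR_W
        node.desired_w = len(node.text) * CHAR_W
    else:
        for child in node.children:
            measure(child)
        node.min_w = max(c.min_w for c in node.children)
        node.desired_w = max(c.desired_w for c in node.children)
    if node.declared_width is not None:
        node.desired_w = max(node.declared_width, node.min_w)


def assign(node: Node, available: int) -> None:
    """Top-down pass: clamp the desired width between the two bounds."""
    node.width = max(node.min_w, min(node.desired_w, available))
    for child in node.children:
        assign(child, node.width)


def position(node: Node, x: int = 0, y: int = 0) -> int:
    """Place children by advancing a vertical cursor; return the height."""
    node.x, node.y = x, y
    if not node.children:
        chars_per_line = max(1, node.width // CHAR_W)
        lines = -(-len(node.text) // chars_per_line) if node.text else 0
        node.height = lines * LINE_H
    else:
        cursor = y
        for child in node.children:
            cursor += position(child, x, cursor)
        node.height = cursor - y
    return node.height


# Usage: a page with a short header block and a long body block.
page = Node("body", children=[
    Node("div", text="Site logo"),
    Node("div", text="word " * 100),
])
measure(page)
assign(page, available=800)   # 800 is an assumed viewport width
position(page)
print(page.width, page.height)
```

A real page would also need floats, tables, and inline elements, which this sketch leaves out. It is only meant to show how two passes over an object tree can estimate widths and positions without a browser.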

Information Extraction Systems and Data Structures

The ways that information might be presented on web sites can often be described as structured, semi-structured, or unstructured.

Structured means that the pages are generated using a common layout or template, and contain the same information fields from one page to another.

Semi-structured sites may use templates that have a number of variations to them. For example, one page may include information and fields that other pages don’t have, or some pages might show a wider range of information and values.

Some sites that may use a structured format might include job sites, or travel sites, or ecommerce product pages.

The majority of pages on one of those sites may display all the same information fields from one page to another, and if there isn’t information to fill a field, the field is shown anyway, but might show that there is no information for that field. An online bookstore might be set up that way, too.

A semi-structured format just might not display fields that are empty, or may show some new fields if there is unique information to show.

Information Extraction (IE) systems are used to gather and manipulate the unstructured and semi-structured information on the web and populate backend databases with structured records.
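To make the idea of turning semi-structured fields into structured records a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `BookRecord` fields, the label aliases, and the sample values are invented for illustration and do not come from the patent filing.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical label variants mapped to canonical field names.
ALIASES = {
    "title": "title",
    "book title": "title",
    "author": "author",
    "by": "author",
    "written by": "author",
    "price": "price",
    "our price": "price",
    "isbn": "isbn",
}


@dataclass
class BookRecord:
    """A structured record: the same fields exist for every page."""
    title: Optional[str] = None
    author: Optional[str] = None
    price: Optional[str] = None
    isbn: Optional[str] = None


def to_record(extracted: Dict[str, str]) -> BookRecord:
    """Map whatever labels a page happened to show onto fixed fields."""
    record = BookRecord()
    for label, value in extracted.items():
        name = ALIASES.get(label.strip().rstrip(":").lower())
        if name is not None and value.strip():
            setattr(record, name, value.strip())
    return record


# A semi-structured page may omit some fields and add others.
page_fields = {"Book Title:": "Example Book", "By": "A. Author", "Format": "Hardcover"}
print(to_record(page_fields))
```

Fields the page never showed stay empty (`None`) in the record, and labels the mapping doesn't recognize, like "Format" here, are simply skipped.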

One of the challenges faced by an information extraction system is to quickly and accurately extract information from HTML pages.

So, how does an information extraction system find the good stuff on a page full of HTML code, and bypass the useless content?

It might look for some cues from the HTML, such as

(a) Style of the content, like color, emphasis, size, etc.;

(b) Geometric layout of elements on the page, like the absolute and relative placement of elements; and,

(c) A visually significant region on the page which appears to contain the main content.
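As an illustration of the first kind of cue, style, a small Python sketch using the standard library's `html.parser` could tag each run of text with an emphasis weight based on the elements around it. The weights in `EMPHASIS` are assumptions picked for the example, and the sketch ignores CSS entirely, so it covers only a small part of cue (a) and none of (b) or (c).

```python
from html.parser import HTMLParser

# Assumed emphasis weights per tag; purely illustrative values.
EMPHASIS = {"h1": 3.0, "h2": 2.5, "h3": 2.0, "strong": 1.5, "b": 1.5, "em": 1.2}


class StyleCueParser(HTMLParser):
    """Collect (text, weight) pairs, weighting text by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.cues = []    # (text, weight) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            weight = max((EMPHASIS.get(t, 1.0) for t in self.stack), default=1.0)
            self.cues.append((text, weight))


parser = StyleCueParser()
parser.feed("<div><h1>Title</h1><p>Plain <b>bold</b> text</p></div>")
print(parser.cues)
```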

Looking at HTML to get cues about layout and which section might contain the main content of a page can be difficult, without using something like a browser to display a page the way that people actually see it.

But looking at a page in that way can be computationally expensive, and if a good approximation can be made without that expense, then it may be ideal for information extraction purposes.

Identifying the Most Significant Element of a Page

A search engine doesn’t really want to pay too much attention to sections of a page that it might consider noisy, such as navigation bars, or banner or targeted ads when extracting information from a page.

It probably doesn’t want to focus upon a footer part of a page, with information like a copyright notice, or the header of a page which may contain a site logo repeated from one page to another on the site.

The most significant element of the web page would be estimated by this visual layout process, by trying to find the element that contains most of the meaningful content on the page.

That most significant element of the page would be based on the amount of weighted content of elements and the position of the elements within the page as approximated by the visual layout process.
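The filing's exact scoring isn't reproduced here, but the general idea of combining weighted content with position can be sketched in Python. The `Block` class, the `position_factor` formula (which favors blocks near the horizontal center and a little above the vertical middle), and the sample numbers are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    name: str
    x: int
    y: int
    width: int
    height: int
    weighted_content: float  # e.g. text length scaled by emphasis (assumed)


def position_factor(block: Block, page_w: int, page_h: int) -> float:
    """Hypothetical position score, higher near the center of the page."""
    cx = (block.x + block.width / 2) / page_w
    cy = (block.y + block.height / 2) / page_h
    horizontal = 1.0 - abs(cx - 0.5)   # 1.0 at the horizontal center
    vertical = 1.0 - abs(cy - 0.4)     # assumed peak at 40% of page height
    return horizontal * vertical


def most_significant(blocks: List[Block], page_w: int, page_h: int) -> Block:
    """Pick the block with the best content-times-position score."""
    return max(
        blocks,
        key=lambda b: b.weighted_content * position_factor(b, page_w, page_h),
    )


blocks = [
    Block("nav", 0, 0, 1000, 50, 20),
    Block("sidebar", 0, 50, 200, 900, 150),
    Block("main", 200, 50, 800, 900, 600),
    Block("footer", 0, 950, 1000, 50, 15),
]
print(most_significant(blocks, page_w=1000, page_h=1000).name)
```

With these made-up numbers the main content block wins easily. In practice the interesting cases are pages where a sidebar or a comment area carries nearly as much text as the article itself.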

Conclusion

The patent application goes into a lot of detail on a method to estimate the layout of a page, and to understand the positions of elements within a page, as well as identifying the most significant element of pages.

If you build web pages, and you want an idea of how a search engine might be looking at and weighing the content of your pages, you may want to spend some time with this patent filing.

Considering that Google and Microsoft have also developed methods to segment the contents of web pages, it’s not a bad idea to get a sense of how they all might be breaking pages down into parts.

58 thoughts on “The Importance of Page Layout in SEO”

  1. I guess this is something worth reading, since this could be a great tip, especially for those who have static webpages, since these are the ones that do not rank well in search engines…

  2. Bill,

    Is there any way to view images on the US Patent office site? This patent has lots of drawings which can shed more light on it. I just can’t display them; it points me to a Quicktime installation, but it doesn’t work :/

    Thanks

  4. You’re welcome, Janusz.

    The images can be helpful when reading a lot of these patent filings, and that plugin does work well. Thanks for your kind words in your post.

  5. @ Richard, I’m not certain that static pages are being harmed in rankings by search engines, but content management systems and ecommerce platforms can make managing a site a lot easier…

    @ Jacques, You’re welcome. I have some more to come very soon that is related.

  6. We are truly going to a more and more semantic web. As an SEO (newbie I must admit) with a Front End Web Development background, all this just makes sense to me.

    Making semantic and more accessible web sites is worth it in the long run… I’m more than happy to see the Web going in that direction. Great post Bill :)

  7. This is interesting.

    Any mention in the filing about if or how the use of W3C xhtml markup standards ( transitional, strict etc. ) could be used by a search engine to determine the important parts of a web page?

    This filing looks like it may signal the coming importance of these web design standards in the eyes of the search engines. All the more reason to keep your page structure ( html and xhtml ) and your design ( CSS ) separate from one another.

    BTW Bill, I am glad to see you got the Taleb books.

  8. Thanks, Samuel,

    I like the direction that the process in this patent application points towards, too. People creating pages that are semantically richer, and more accessible and usable are going to end up benefiting.

    Thank you, People Finder.

    The patent application doesn’t discuss document type or markup standards. Many patent applications provide overviews without going into extreme detail, so we don’t know for certain. Writing semantically rich code that makes it easier for the search engine to understand, and carefully paying attention to what content you are putting where, within those pages may be helpful.

    Thanks again for the Taleb books. I just started on the Black Swan. Very interesting so far.

  9. Great post, Bill! This is the first time visiting your blog and your article definitely explains why web designers need to create web pages that are semantically richer. I will definitely visit again.

  10. Hello Bill and All – so this completes our conversation on page positioning from Cre8taSite Forums. Position on page does matter, and SEO is like real estate and is now about location, location, location.

  11. @ Michigan Web Design, Thanks for your kind words, and for the mention in your blog post.

    It does seem like search engines are getting smarter about things like a page’s layout. Don’t know where we are yet with them implementing things like this, but there are signs in Google Local search that they are segmenting reviews from pages, where those pages review multiple restaurants and other businesses.

    @ marianne, It may. We need to keep in mind that patent filings only indicate possibilities.

  12. If the engines dictated certain tags on page which would extract the required content, I for one wouldn’t have a major issue with that…For example always put your content inside a DIV which has an ID of content, obviously that doesn’t scale well for what is out there at the moment, and encouraging developers to do so wouldn’t be easy, but if they gained say an extra feature within the SERPS for the sites that implement it, the people who are serious about search results would soon get their developers off their ass to do something about it. A byproduct would also be more work for developers!

  13. It seems apparent to me that by now search engines are able to do some basic page content analysis and recognize patterns in sites with repetitive page layouts, specifically to highlight the value of the main content of each page and decrease the value of the “noise” (sidebars, footer, ads, etc).
    I do have a hard time picturing a visual-based approach that doesn’t include rendering the pages using an engine that operates in a way similar to those found in mainstream web browsers, because pages are designed to work well in these environments to begin with. This render would include a full understanding of the HTML markup and its styling information.
    Although the idea of advanced algorithms that can predict layout with minimal rendering processing sounds appealing from the point of view of improving results, I feel that search engines already do a pretty good job of figuring out what’s going on with the layout of the page (especially if the website is using proper HTML,) and I can’t imagine a visual analysis factor making much of a difference unless it calculates a full render of the pages.

  14. @ Webmonkey-Ireland,

    What you’re suggesting with a Div ID identifying content sounds like the content targeting in Yahoo’s beta Y!Q search system, which really hasn’t seemed to capture too many people’s attention.

    Talking about extra features in search results, I am curious as to what Yahoo’s new Open Search will bring us.

  15. Hi Jose,

    You raise a number of excellent points. I do think that it might be possible to efficiently gain some insight into the aspects of a page without rendering the page like a browser would.

    Don’t know if you caught this post of mine from around a year ago – Yahoo Research Looks at Templates and Search Engine Indexing. I’ve been wondering if we would see more detailed looks at the layouts of pages come from Yahoo after the paper described in that post.

  16. This was a very interesting write-up Bill. Like others have stated, I like the direction that the process in this patent application points towards, as well. More and more people are creating sites that are semantically improving. By this occurring, more and more people will also benefit.

  17. Is there any software available that can quickly show me what the most important section of a page is?

  18. I’ve just started learning more about SEO, and how on-page SEO works. It seems like now, it would make more sense to spend time making sure that your page is designed using HTML and CSS, rather than using tables like so many do. Yes, from a coding standpoint it is much nicer to use HTML/CSS, but now could it actually help with your SE rankings too?

  19. Hi Chris,

    I had the chance a couple of years ago to spend some time teaching someone SEO who had an extensive background in CSS and web accessibility, and who had worked on at least one massive government site, making it 508 compatible.

    Many of the things that you do for accessibility purposes, and for well-written semantically meaningful design also help with SEO, and it was fun to see him come to that realization day after day as we explored how search engine spidering programs crawl through pages, and search engines index content. Unique page titles, headings, meaningful alt text on images that are included as content instead of as decorations, and so on, are all very helpful. A search engine benefits from having code and content clearly labeled, and described for it.

    Understanding that a search engine may place more value on content and links in sections of pages that it believes are more important is also a good thing to keep in mind when determining how to lay out and label parts of pages.

    Tables still have value when they contain data that is best shown within tables (see TablesVsLists), and when they have clear headings for that data.

    Paying attention to whether or not search engine spiders can crawl pages you want indexed, using words on your pages that people who want to find what you offer will use to search for your site and expect to see on your pages, making those pages as search engine friendly as possible – those are all as important as creating a design that conveys meaning, and a layout that is easy for visitors to use.

  21. Page layout is very important and I must agree with John! This is an Excellent Post!

  24. Thanks. It’s pretty interesting to see how search engines are evolving, and the idea of segmenting content like this may make a difference in how pages are ranked by search engines in the future.

  25. Very interesting post. This raises concerns with regard to WordPress themes and templated site designs. This suggests they may be partisan to a site based on its layout criteria. They could possibly be screening for redundancy.

  26. Hi Joe,

    I thought the idea of looking for signals on a page that it was using a template (without having to compare that page to other pages on the same site to make that determination) was interesting, too.

    Certainly, a site using WordPress or other blogging software or well-known content management systems is going to use characteristics that might show the use of a template. I don’t believe that the point behind that is to penalize a site as much as it is to differentiate between content on a page that is likely to be the main content, and other content that is likely to be the same from one page to another, such as copyright notices and other possible boilerplate.

  27. Bill, i have been looking at this recently (March-April 2010) following what i thought has been a new algo change (at least in the uk anyway) and this (your post) is making more sense, but for another reason (link placement – has it finally been resolved by google?)…

    for a loooooooong time, an SEO company in the UK (justsearching.co.uk) has (had) been #1 for ‘seo’ and ‘search engine optimisation’, but have dropped quite dramatically recently.

    They were powered (seo wise) 99% by links from client sites (around 400+ as they are quite a big company) with the phrases ‘seo’ and ‘search engine optimisation’ but crucially, in all but a tiny few examples, at the VERY bottom of any page that they were linked from.

    so, this company with pretty much only footer links has sunk (this is not the only example that i have seen), but also, at the same time, another UK seo company (smart-traffic.co.uk), that has also lots of links from various sites, but crucially, places them mid-text and in various locations for clients, has now hit the #1 spot.

    this is more than a coincidence, and after a couple of months of looking at things, its looking a likely thing – that google is now able to really (ie not just speculation) work out where items are placed on a page, whether it be content or links.

    just my two cents…

  28. Hi Darren,

    It’s very much possible that Google is weighing links differently from different locations on a page, though it’s hard to tie something like this directly to one or two sites and their rankings because of the possibility of other factors and possible actions playing a role. It sounds like you’ve spent some significant time studying the sites in question, though. Thanks for sharing your observations.

  29. hi bill

    yes, thats all it is, observations, and even tho i have looked at around 20 large sites that i either run, look after, or watch, it is still such a tiny fragment of things that it could be all coincidence, i agree.

    but…my seo ‘spidey senses’ are tingling…after 1 or 2 sites i was dismissive, then after 5 or 10, i was intrigued, and at around 20 sites with (some) similar signs…

  30. Hi Darren,

    I know what you mean about sensing that the cause might be based upon the locations of links. You look at enough sites in a critical manner, and manage a number of sites over a long period of time, and patterns do start emerging.

  31. interesting observation, I usually post my backlink at the bottom of my clients’ websites as well … so as to be discreet

    I’ve recently read a similar article on the weight of links coming from the main block of a website, though it seems this is not a new phenomenon (I’m catching up)

    thx

  32. Hi Webdesign DragolinDesign,

    The idea behind a search engine possibly segmenting a page like this, and providing different weights for links in those sections has been around for a number of years, though we really hadn’t seen anything specifically from Google that said that they might be doing something like this until recently. See my post on:

    Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data

  33. Hi Bill,

    these are some interesting thoughts. What’s your opinion about HTML5? It has HTML elements for headers and footers (<header>, <footer>) within a website, so it’ll be even easier for a web crawler to identify which is which. I like the more semantic approach. It makes a lot of sense.

  34. Hi Webdesign Bergen op Zoom,

    I’ve been watching as the HTML 5 drafts have been evolving, and the elements for headers and footers definitely have the potential to make it easier for a web crawler to identify the different parts of a page. I think adding those elements to HTML 5 will make it more likely that many designers will label those sections. Of course, the flexibility of HTML means that other designers may not bother, but it’s a step in the right direction.

  35. I think HTML 5 is a little bit ahead of its time. While it has awesome features and a solid promise it just doesn’t have the backing the web world needs yet. Give it time guys.

  36. Hi Tim,

    It has been a fairly long time since HTML 4.01 became a standard. You may be right that HTML 5 is a little ahead of its time, but it’s quite possible that backing will grow over time. I have seen a few books published on HTML 5 already.

  37. Excellent tips Bill! I’ve been using HTML5 with ARIA roles and REL attributes in order to emphasize the most important page elements for screenreaders and search engines. It really seems to help in my opinion. Plus, my code has never been cleaner!

  38. Hi Angela,

    Thank you. I’m not sure how much attention the search engines are paying yet to some of the unique elements of HTML 5 when they come across it on web pages. Good to hear that you think it has helped.
