As long as there have been search engines, there have been people trying to take advantage of them to get pages to rank higher. It’s not unusual to see, within many SEO site audits, a section on negative practices that a search engine might frown upon, and Google lists a number of those practices in its Webmaster Guidelines. Linked from the Guidelines is a Google page on Hidden Text and Links, where Google tells us to be wary of doing things such as:
- Using white text on a white background
- Locating text behind an image
- Using CSS to position text off-screen
- Setting the font size to 0
- Hiding a link by only linking one small character—for example, a hyphen in the middle of a paragraph
Those are some of the same examples described in a patent granted to Google today at the USPTO:
Systems and methods for detecting hidden text and hidden links
Invented by Fritz Schneider and Matt Cutts
Assigned to Google
US Patent 8,392,823
Granted March 5, 2013
Filed: August 25, 2009
A system detects hidden elements in a document that includes a group of elements. The system may identify each of the elements in the document and create a structural representation of the document.
The structural representation may provide an interconnection of the group of elements in the document. The system may also determine whether one or more elements of the group of elements are hidden based at least in part on locations or other attributes or properties of the one or more elements in the structural representation.
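To make the abstract a little more concrete, here is a minimal sketch, using only Python's standard-library `html.parser`, of building a structural representation (a parent/child tree) and then deciding whether an element is hidden based on its attributes and its position in that tree. The `Node`, `TreeBuilder`, and `hidden_text` names are hypothetical, and so is the rule used here: an element counts as hidden if it, or any ancestor, carries an inline `display: none` or `visibility: hidden`. The patent doesn't spell out code like this, and a real system would work from computed styles produced by a full rendering engine rather than inline styles alone.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional

# Elements that never contain children, so the builder does not descend into them.
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


@dataclass
class Node:
    tag: str
    style: dict[str, str]
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)
    text: str = ""


def parse_style(value: str) -> dict[str, str]:
    """Turn an inline style such as 'color: red; display:none' into a dict."""
    props = {}
    for decl in value.split(";"):
        name, sep, val = decl.partition(":")
        if sep:
            props[name.strip().lower()] = val.strip().lower()
    return props


class TreeBuilder(HTMLParser):
    """Builds a parent/child tree: a stand-in for the 'structural representation'."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("document", {})
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        style = parse_style(dict(attrs).get("style") or "")
        node = Node(tag, style, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open element; stray end tags are ignored.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def is_hidden(node: Optional[Node]) -> bool:
    """Hidden if the node or any ancestor is display:none or visibility:hidden.

    Simplification: real CSS lets a descendant override visibility:hidden
    with visibility:visible, which this check ignores.
    """
    while node is not None:
        if node.style.get("display") == "none" or node.style.get("visibility") == "hidden":
            return True
        node = node.parent
    return False


def hidden_text(html: str) -> list[str]:
    """Collect the text of every element the tree marks as hidden."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found = []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        if node.text.strip() and is_hidden(node):
            found.append(node.text.strip())
        stack.extend(node.children)
    return found
```

For example, `hidden_text('<div><p>Visible</p><div style="display:none"><p>cheap pills</p></div></div>')` returns `['cheap pills']`: the paragraph carries no hiding style itself, but the tree shows that one of its ancestors does.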
Unsurprisingly, one of the co-inventors behind the patent is Google distinguished engineer Matt Cutts, who has spent a good part of his long career at Google exploring the many different ways that people might try to spam the search engine, and finding solutions to them.
I really enjoy seeing patents like this one, which may not tell us anything new but do provide a reference resource that other people, including clients, can be pointed towards. They sometimes fill in gaps about how a search engine might do something, and they provide some history.
For example, this patent is based upon an earlier one that was first filed in 2003. It’s not hard to imagine people at the Google of that time trying to figure out how to automatically identify text and links that were hidden by being the same color as the background they appear on, obfuscated with cascading style sheets, or written in lettering so small that it appears to be a line rather than actual text.
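Two of the tricks just described, text the same color as its background and text too small to read, can be approximated with simple checks on an element's foreground color, background color, and font size. The sketch below borrows the WCAG 2 relative-luminance contrast ratio as its color test; that is a swapped-in, well-known formula, not something the patent specifies. The `MIN_CONTRAST` and `MIN_FONT_PX` thresholds and the `hidden_text_reasons` helper are hypothetical values and names chosen for illustration.

```python
# Hypothetical thresholds, chosen only for illustration.
MIN_CONTRAST = 1.2   # WCAG ratios run from 1.0 (identical colors) to 21.0
MIN_FONT_PX = 2.0    # anything smaller reads as a line, not as text


def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Convert '#rgb' or '#rrggbb' into an (r, g, b) tuple of 0-255 ints."""
    color = color.lstrip("#")
    if len(color) == 3:
        color = "".join(ch * 2 for ch in color)
    r, g, b = (int(color[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG 2 relative luminance of an sRGB color."""
    def channel(c: int) -> float:
        s = c / 255
        return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast between two hex colors, lighter luminance over darker."""
    lighter, darker = sorted(
        (relative_luminance(hex_to_rgb(fg)), relative_luminance(hex_to_rgb(bg))),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def hidden_text_reasons(fg: str, bg: str, font_px: float) -> list[str]:
    """Return the reasons (if any) a text element looks hidden."""
    reasons = []
    if contrast_ratio(fg, bg) < MIN_CONTRAST:
        reasons.append("text color matches background")
    if font_px < MIN_FONT_PX:
        reasons.append("font too small to read")
    return reasons
```

White text on a white background, `hidden_text_reasons("#ffffff", "#fff", 12)`, returns `['text color matches background']`, while a font size of 0 trips the second check regardless of color.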
The Guidelines above mention using a single small character in a paragraph as a link, and the patent notes that extremely small (1 pixel x 1 pixel) images have also been used as hidden links on pages.
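Both link tricks lend themselves to simple checks once links have been extracted from a page. This sketch assumes each link comes with its anchor text or, if it wraps an image, the rendered size of that image; the `Link` type, `looks_like_hidden_link`, and both thresholds are hypothetical. A real system would likely weigh signals like these alongside others rather than treat either one as proof of spam.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds for illustration.
MAX_ANCHOR_CHARS = 1   # e.g. a lone hyphen in the middle of a paragraph
MAX_IMAGE_PX = 1       # e.g. a 1 x 1 pixel image


@dataclass
class Link:
    href: str
    anchor_text: str = ""
    # (width, height) in pixels when the link wraps an image, else None.
    image_size: Optional[tuple[int, int]] = None


def looks_like_hidden_link(link: Link) -> bool:
    """Flag links whose clickable area is almost impossible to notice."""
    if link.image_size is not None:
        width, height = link.image_size
        return width <= MAX_IMAGE_PX and height <= MAX_IMAGE_PX
    return len(link.anchor_text.strip()) <= MAX_ANCHOR_CHARS
```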
As the patent also notes, CSS allows webmasters to mark a block of text as hidden or to position it outside the visible area of a page. JavaScript can also be used to hide text and to modify documents to replace text.
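Text pushed off-screen with CSS (for instance, with a large negative `left` offset) can be caught by comparing each element's rendered box with the bounds of the page. This sketch assumes the boxes come from a layout step that isn't shown; the `Box` type, `positioned_off_page`, and the choice to also flag zero-sized boxes are illustrative assumptions. It doesn't attempt to handle text changed by JavaScript, which would presumably require running the page's scripts before taking measurements.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """An axis-aligned rectangle in page coordinates (pixels)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def intersects(a: Box, b: Box) -> bool:
    """True if the two rectangles overlap by a positive area."""
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def positioned_off_page(element: Box, page: Box) -> bool:
    """Flag elements with no size, or that lie entirely outside the page."""
    if element.width <= 0 or element.height <= 0:
        return True
    return not intersects(element, page)
```

With a page of `Box(0, 0, 1200, 5000)`, a block at `Box(-9999, 100, 300, 20)` is flagged, since it ends far to the left of the page's left edge.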
Part of the process behind identifying hidden text or links on a page may involve analyzing the HTML structure of the page and its elements, such as divisions or sections, headings, paragraphs, images, lists, and others. The process may look at a page’s Document Object Model (DOM) to learn about those elements, including their sizes, positions, layer orders, colors, visibility, and more.
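Position and layer-order information like this is enough to catch the Guidelines’ “text behind an image” case. In this sketch, a text element counts as hidden when an image higher in the stacking order completely covers it. The `Element` type and `hidden_behind_image` function are hypothetical, and the sketch assumes images are fully opaque, which a real system couldn’t take for granted (a transparent PNG, for example, hides nothing).

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str      # "text" or "image"
    left: float
    top: float
    width: float
    height: float
    z: int         # layer order: higher values are drawn on top


def covers(outer: Element, inner: Element) -> bool:
    """True if `outer` completely contains `inner`'s rectangle."""
    return (
        outer.left <= inner.left
        and outer.top <= inner.top
        and outer.left + outer.width >= inner.left + inner.width
        and outer.top + outer.height >= inner.top + inner.height
    )


def hidden_behind_image(text: Element, elements: list[Element]) -> bool:
    """Flag text fully covered by an image drawn above it (images assumed opaque)."""
    return any(
        e.kind == "image" and e.z > text.z and covers(e, text)
        for e in elements
    )
```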
The patent provides a few different examples of when hidden text might be found on a page, such as in the following:
In this example, server 120 may detect that the webmaster has overridden the value of the <h2> tag. Normally, the “h2” tag is a heading size, in which H1 is very large, H2 is a little smaller, H3 is smaller still, etc. Here, the webmaster has used CSS to override the value of h2 to mean “for all text in the H2 section, make the text color almost completely black, and make the height of the font be about one pixel high.”
A viewer of this document would not see the text because it is so small, but a search engine may determine that the text is relatively important because of the H2 heading label. In this situation, server 120 may determine that the text in the H2 section is very small, which can indicate that the webmaster is attempting to hide the text in this section.
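The patent’s H2 example comes down to a mismatch: markup that signals importance paired with styling that makes the text unreadable. A rough sketch of that check scans a stylesheet for heading selectors whose `font-size` is set below a minimum. The regular-expression CSS parsing is a deliberate simplification (it ignores media queries, comments, inheritance, and units other than `px`), and `MIN_HEADING_PX`, `parse_rules`, and `tiny_headings` are hypothetical; the patent describes the idea, not this code.

```python
import re

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
MIN_HEADING_PX = 4.0  # hypothetical cut-off for "too small to read"

RULE = re.compile(r"([^{}]+)\{([^}]*)\}")   # selector list { declarations }
PX_SIZE = re.compile(r"(\d*\.?\d+)px")


def parse_rules(css: str) -> dict[str, dict[str, str]]:
    """Map each selector to its declarations (later rules override earlier ones)."""
    rules: dict[str, dict[str, str]] = {}
    for selectors, body in RULE.findall(css):
        decls = {}
        for decl in body.split(";"):
            name, sep, value = decl.partition(":")
            if sep:
                decls[name.strip().lower()] = value.strip().lower()
        for selector in selectors.split(","):
            rules.setdefault(selector.strip().lower(), {}).update(decls)
    return rules


def tiny_headings(css: str) -> list[str]:
    """Return heading selectors styled with a font size below MIN_HEADING_PX."""
    flagged = []
    for selector, decls in parse_rules(css).items():
        match = PX_SIZE.fullmatch(decls.get("font-size", ""))
        if selector in HEADING_TAGS and match and float(match.group(1)) < MIN_HEADING_PX:
            flagged.append(selector)
    return flagged
```

Run against a stylesheet like the one in the patent’s example, `tiny_headings("h2 { color: #010101; font-size: 1px; } h1 { font-size: 32px }")` returns `['h2']`.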
Sometimes designers use hidden text because they want a page to display a font that isn’t a standard system font on Windows, Apple, or Linux computers, and without that font the page won’t render the way they want. Google’s John Mueller has noted in the past on Google’s Webmaster Help Forum that this is probably not a problem:
If you are using image replacement techniques and replacing the text with an image that is equivalent (with the exact same text in approximately the same visibility) then that is generally fine. This provides a nice user experience and still lets those who cannot access the images (eg crawlers or vision-impaired users) use your website normally.
Hope it helps!
As I noted above, one of the things that I really appreciate about this patent is that it provides another place to point people to when discussing things like hidden text and links other than just Google’s help pages on the topic. It also puts the problem in the framework of a business that is trying to address a challenge rather than a web institution laying out a guideline that it expects people to follow.