Hidden Text and Hidden Links
As long as there have been search engines, there have been people trying to take advantage of them to get pages to rank higher. It’s not unusual to see within many SEO site audits a section on negative practices that a search engine might frown upon, and Google lists many of those practices in their Webmaster Guidelines. Linked from the Guidelines is a Google page on Hidden Text and Links, where Google tells us to be wary about doing things such as:
- Using white text on a white background
- Locating text behind an image
- Using CSS to position text off-screen
- Setting the font size to 0
- Hiding a link by only linking one small character—for example, a hyphen in the middle of a paragraph
Those are some of the same examples described in a patent granted to Google today about hidden text and links:
Systems and methods for detecting hidden text and hidden links
Invented by Fritz Schneider and Matt Cutts
Assigned to Google
US Patent 8,392,823
Granted March 5, 2013
Filed: August 25, 2009
Abstract
A system detects hidden elements in a document that includes a group of elements. The system may identify each of the elements in the document and create a structural representation of the document.
The structural representation may provide an interconnection of the group of elements in the document. The system may also determine whether elements of the group of elements are hidden based at least in part on locations or other attributes or properties of the one or more elements in the structural representation.
One of the co-inventors behind the patent is Google distinguished engineer Matt Cutts, who has spent a good part of his long career at Google exploring the many different ways that people might try to spam the search engine and find some solutions.
I enjoy seeing patents like this one, which may not tell us something new but provide a reference resource that other people, including clients, can be pointed towards. They sometimes fill in some gaps on how a search engine might do something and provide some history.
For example, this patent is based on an earlier one that was first filed in 2003. It’s not hard to imagine people at the Google of that time trying to figure out how to automate a way to identify hidden text and links, whether hidden by being the same color as the background they appear upon, obfuscated by cascading style sheets, or written in lettering so small that it appears to be a line rather than actual text.
The Guidelines above mention a single small character in a paragraph being used as a link, and the patent mentions that small (1 pixel x 1 pixel) images have also been used as hidden links on pages.
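To make those two tricks concrete, here is a minimal Python sketch of how a checker might flag them. It is my own illustration, not code from the patent: the class name `TinyLinkFinder`, the helper `find_tiny_links`, and the rule that a lone non-alphanumeric character counts as suspicious are all assumptions, and it only looks at `width` and `height` attributes written directly on the image tag.

```python
from html.parser import HTMLParser


class TinyLinkFinder(HTMLParser):
    """Collects links whose only content is one punctuation character or a 1x1 image."""

    def __init__(self):
        super().__init__()
        self.suspicious = []   # (href, reason) pairs
        self._href = None      # href of the <a> currently open, if any
        self._text = ""        # text seen inside the open <a>
        self._images = []      # (width, height) attributes of images inside the open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            self._href, self._text, self._images = attrs["href"], "", []
        elif tag == "img" and self._href is not None:
            self._images.append((attrs.get("width"), attrs.get("height")))

    def handle_data(self, data):
        if self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        text = self._text.strip()
        if ("1", "1") in self._images and not text:
            self.suspicious.append((self._href, "1x1 image link"))
        elif len(text) == 1 and not text.isalnum() and not self._images:
            self.suspicious.append((self._href, "single-character link"))
        self._href = None


def find_tiny_links(html):
    """Hypothetical helper: return the suspicious links found in an HTML string."""
    finder = TinyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious
```

A real system would need to look at rendered sizes rather than attributes, since an image or a character can be shrunk by a stylesheet as easily as by markup.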
As the patent also notes, CSS allows web admins to mark a text block as hidden or position it outside of visible areas of a page. JavaScript can also be used to hide text and to change documents by replacing text.
Part of the process behind identifying hidden text or links on a page may involve analyzing the HTML structure of a page and its elements, such as divisions or sections, headings, paragraphs, images, lists, and others. It looks at a Document Object Model (DOM) of pages to learn about those different elements, their sizes, positions, layer orders, colors, visibility, and more.
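As a rough illustration of that kind of analysis, the following Python sketch walks the elements of a page, tracks the background color each element inherits from its ancestors, and flags a few of the hiding methods mentioned above. Everything in it is an assumption made for the example rather than the patent's method: the names (`HiddenElementScanner`, `scan_for_hidden`), the -1000 offset threshold, the default white page background, and the decision to read only inline `style` attributes and to compare colors as exact strings (so `#fff` and `#ffffff` would not match).

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def parse_style(style):
    """Turn 'color: #fff; font-size: 0' into {'color': '#fff', 'font-size': '0'}."""
    rules = {}
    for part in (style or "").split(";"):
        name, sep, value = part.partition(":")
        if sep:
            rules[name.strip().lower()] = value.strip().lower()
    return rules


def leading_number(value):
    """Return the numeric prefix of a CSS length such as '-9999px', or None."""
    digits = ""
    for ch in value:
        if ch.isdigit() or ch in "+-.":
            digits += ch
        else:
            break
    try:
        return float(digits)
    except ValueError:
        return None


def hidden_reasons(style, background):
    """List the reasons an element with this inline style looks hidden."""
    reasons = []
    if style.get("display") == "none":
        reasons.append("display:none")
    if style.get("visibility") == "hidden":
        reasons.append("visibility:hidden")
    if leading_number(style.get("font-size", "")) == 0:
        reasons.append("zero font size")
    for prop in ("left", "text-indent"):
        offset = leading_number(style.get(prop, ""))
        if offset is not None and offset <= -1000:  # assumed threshold
            reasons.append("positioned off-screen")
            break
    if "color" in style and style["color"] == background:
        reasons.append("text matches background")
    return reasons


class HiddenElementScanner(HTMLParser):
    """Keeps a stack of open elements as a simple structural representation."""

    def __init__(self, page_background="#ffffff"):
        super().__init__()
        self.findings = []                               # (tag, reasons) pairs
        self._stack = [("document", page_background)]    # (tag, inherited background)

    def handle_starttag(self, tag, attrs):
        style = parse_style(dict(attrs).get("style"))
        background = style.get("background-color", self._stack[-1][1])
        reasons = hidden_reasons(style, background)
        if reasons:
            self.findings.append((tag, reasons))
        if tag not in VOID_TAGS:
            self._stack.append((tag, background))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break


def scan_for_hidden(html):
    """Hypothetical helper: return (tag, reasons) for each suspicious element."""
    scanner = HiddenElementScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

Even this toy version hints at the false-positive problem: `display:none` is also how tabs, menus, and accordions legitimately hide content until a visitor asks for it.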
The patent provides a few different examples of when hidden text might be found on a page, such as in the following:
In this example, server may detect that the webmaster has overridden the value of the <h2> tag. Normally, the “h2” tag is a heading size, in which H1 is very large, H2 is a little smaller, H3 is still smaller, etc. Here, the webmaster has used CSS to override the value of h2 to mean “for all text in the H2 section, make the text color almost completely black, and make the height of the font be about one pixel high.”
A viewer of this document would not see the text because it is so small, but a search engine may determine that the text is relatively important because of the H2 heading label. In this situation, a server may determine that the text in the H2 section is tiny, indicating that the webmaster is attempting to hide the text in this section.
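The stylesheet from the patent's example isn't reproduced here, so the sketch below uses a stand-in of my own. It is a minimal Python illustration, under the assumption that the check boils down to "a heading tag has been given a pixel font size of about one pixel or less." The function name `tiny_heading_rules`, the `max_px` parameter, and the simple regular-expression parsing (plain tag selectors and `px` units only) are all hypothetical choices for the example.

```python
import re

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")      # selector list, then declarations
PIXELS = re.compile(r"^(\d+(?:\.\d+)?)px$")     # only plain pixel sizes are handled


def tiny_heading_rules(css, max_px=1.0):
    """Return (selector, font_size) for heading rules sized at max_px or less."""
    flagged = []
    for selectors, body in RULE.findall(css):
        declarations = {}
        for part in body.split(";"):
            name, sep, value = part.partition(":")
            if sep:
                declarations[name.strip().lower()] = value.strip().lower()
        match = PIXELS.match(declarations.get("font-size", ""))
        if not match or float(match.group(1)) > max_px:
            continue
        for selector in selectors.split(","):
            selector = selector.strip().lower()
            if selector in HEADING_TAGS:
                flagged.append((selector, declarations["font-size"]))
    return flagged


# Hypothetical stylesheet: the h2 rule makes heading text nearly black and one pixel high.
example_css = "h1 { font-size: 32px; } h2 { color: #010101; font-size: 1px; }"
print(tiny_heading_rules(example_css))
```

The reason a heading is worth singling out is the one the patent gives: a search engine may treat heading text as relatively important, so shrinking it to a pixel tries to collect that weight without showing visitors anything.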
Conclusion
There are times when designers use hidden text because they want to use a font on a page that isn’t a standard system font that might come with Windows or Apple or Linux computers, and the page won’t otherwise render the way they want. Google’s John Mueller has noted in the past on Google’s Webmaster Help Forum that this is probably not a problem:
Hi Eric
If you are using image replacement techniques and replacing the text with an equivalent image (with the same text in approximately the same visibility), that is generally fine. This provides a nice user experience and still lets those who cannot access the images (e.g., crawlers or vision-impaired users) use your website.
I hope it helps!
John
As I noted above, I appreciate this patent because it provides another place to point people to when discussing things like hidden text and links other than just Google’s help pages on the topic. It also puts the problem in the framework of a business trying to address a challenge rather than a web institution laying out a guideline that it expects people to follow.
It is always great to have a hard piece of evidence to show clients that confirms methods you have been warning of. We still see this technique being used time and time again. Perhaps we may see something appear in webmaster tools that either allows you to check manually whether this tactic is being used or automatically delivers a message, warning that this is not best practice.
Does anyone still use these kinds of tricks? This stuff seems kind of “10 years ago”.
The measures seem like perfectly reasonable measures anyone with a bit of intelligence and a few hours of learning about html could come up with. That this kind of thing is considered patentable I find ludicrous.
Hi John,
Honestly, I didn’t write about this patent to protest it, or question why it would be patentable. That’s really immaterial to me.
I wrote about it to better understand the business processes that a very large site like Google might go through, the intellectual property they attempt to develop to be able to do something like detect hidden or invisible text on hundreds of millions or billions of websites in a very short amount of time, the assumptions they make about the roles of search engines and the sites they list, and the methods that people might use to try to abuse the ranking signals that a search engine relies upon or even use hidden text in a legitimate manner, like an FIR design technique.
I’ve tried to present the ideas behind the search engine in as simple a manner as I can, but after reading through the patent, and the challenges of both scale and a possibility of coming up with false positives, I can say that it isn’t that simple. It’s not a matter of spending a few hours of studying HTML by any means. I see people with extensive backgrounds in HTML development and Web standards coming up with new ways to do “legitimate” things like Image Replacement techniques for logos and headings and menus years after someone suggested using that kind of legitimate approach to hidden text.
This patent isn’t about finding some hidden text on one page. It’s about keeping a search engine from being abused by people who attempt to take advantage of it.
Hi Bill,
This is great news for us white hat followers. But I’m just a little confused about the term “server 120” you have mentioned in the blue highlighted paragraph.
So can you just explain a little bit about what it is and what job it does to fight against hidden text/links in the document? It will be appreciated. Thanks!
The patent isn’t intended to help anybody but Google. How does this help the end user? Don’t try to manipulate the search engine? Duh.
The point is that they can patent “detect hidden text” so that other search engines like Bing and DuckDuckgo can’t use those exact methods.
It is kind of absurd you can patent things like “If the text color is the same as the background color, then the text is hidden” – now grant me my patent.
Hi Rajesh,
The number from that quote I used just refers to a number in one of the illustrations in the patent that points out a server.
Kind of surprising that they just now got a patent on this. But like you said, it is good to have things like this to point to when trying to get clients or even developers to understand how things work. I have had a long-standing argument with a well-known niche platform’s developers who hide all kinds of spammy phrases and keyword anchor links behind images claiming it is the same as an alt tag. Yes, people still use hidden text and links!
I think this has been known for so many years that the announcement is coming a bit too late. The information probably leaked more than 10 years ago.
Seems very obvious and a strange thing to seek a patent for. Will Google go after other search engines because they ignore hidden text and want to produce an authentic index?!
Hi Tom,
I don’t think I find it strange at all. The original version of this patent was filed in 2003, and the innovation might not be detecting hidden text and links as much as it is finding a way to try to automate the process for search engines. It’s very unlikely that Google would go after any other search engine for finding ways to identify web spam and attempts to manipulate search results. Other search engines, including Microsoft and Yahoo have been very open and sharing of those kinds of methods, especially with yearly AIRweb (Adversarial Information Retrieval on the Web) conferences. Ignoring hidden text doesn’t result in an “authentic” index, but rather one that’s been manipulated to the potential point of uselessness.
Hi Carl,
What I like about this patent is that it fills in a few gaps about how Google has been approaching methods like this, and treats it as a business process rather than a “guideline.” People are also coming up with new approaches to produce hidden text to this day, for things like Image Replacement Techniques. The patent also provides a nice reference point to show people other than just Google’s webmaster tools.
Hi Nick,
I agree – I think having this information about hidden text and links in a patent is going to make it easier to not have arguments like that.
This sounds like Patent Spam IMHO. Maybe the US Patent Office should come up with their own version of Google’s own Penguin and mark patents like this as useless to anyone and everyone except for the applicant/s and remove it completely from the Index of Patents. Of course the Office of Patents should use their own unpublished algorithm to seek out these patents and also hire some secret employees who manually review and remove such patents from the database according to their own taste and discretion.
It’s Patent spam like this that dilutes the actual intention and role of the US Patent Office, it clogs up the system so to speak and causes real and useful patent applicants to be placed in the eternal “back of the line” because they are NOT a very deep pocketed and heavily lawyer-ed up behemoth.
I actually just realized I am the only dissenter here of “Google GOOD – Google God” so let me translate what I said so everyone here can understand.
Bahhh bahhh bahhh, bah, baaahhhh. Bahhh bahhh baahh….. this is useless, I don’t really speak sheep 🙁
Hi Jeff,
A version of this patent was filed in 2003, and the description section reflects that. But, in terms of what it does, it was very innovative for that time. Of course if you spend enough time and effort looking over a page, you could probably uncover the different ways that text or links might be hidden on a page. But try to do that for hundreds of millions of pages, or billions of pages.
As a way of preventing at least a couple of types of manipulation of rankings for a search engine (false relevance based on hidden text and false importance based upon hidden links), something like this patent isn’t just important, but even essential. Otherwise search engines would be completely useless, and filled with results with hidden text, with hidden links, with redirects to pornography and weightloss pages and other stuff that’s completely irrelevant.
The innovations here are the ability to scale to such a large degree, and to address ways that a search engine’s algorithms can be attacked. Without patents like this one, we wouldn’t have search engines.
This is great news. I was disheartened recently to find a competitor who even after the penguin and panda updates was ranking highly and had an incredibly high PR. After a review of their back links I found they were coming from high PR sites but the links could only be found in the source code of the sites and not visibly seen. They were hidden and an IT friend told me they could have possibly hacked into these sites and planted them there?
Thanks for writing this (and all the other) search patents up in a way that’s easy to understand. If someone were to hide text using methods such as white on white, pushing it off the page, or changing the size to be really small via CSS, and then block the CSS file with robots.txt, would Google still be able to detect it? Just curious here, promise 😉
Thanks for breaking down this patent.
I recently did a small test to see how Google treats content in CSS driven tabs, toggles, etc. Do these things hide text? Yes. But, they can also be good for usability. Sometimes a page that’s X thousand words long can seem daunting (especially in something like an FAQ document) to a reader. I found that Google def. indexes the text, but I wouldn’t be surprised (at all) if they devalued links that exist in parts of the page that aren’t immediately viewable to the user without clicking on additional parts of the page. Just my 2 cents.
P.S. I noticed there’s another Spencer who commented above. This is a different Spencer!
Though the patent was just granted I would think that Google has been using this in its algo for some time. From specific and existing examples I have seen involving hackers injecting links into sites, whatever google is doing does not seem to be very effective. They need to go back to the drawing board.
Hi Bruce,
It’s very likely that people have been trying to use methods like these for years. Google originally filed for a version of this patent back in 2003, though we haven’t seen any version publicly released until within a few days of my post. Google’s guidelines mention these ways that people might try to use to hide text and links on pages, but the patent provides a lot more details on the kinds of methods that people might use (but they only had to include enough examples to be awarded the patent, so there likely are others that they know about but didn’t list).
It’s surprising that people still think they can get away with these tricks. Have the many Google updates over the past year taught them nothing?! Some website owners will have to learn the hard way that there are no short cuts – and the Google gods know when you cheat!
I have had a couple of clients ask me to do these types of things so it would be great to have a place to point them towards that will back up what I am telling them.
You can actually see that the guidelines haven’t been changed since 2011 (the first time the page was crawled by the Wayback Machine). I’ll bet they were the same for a while before that as well.
http://web.archive.org/web/20111227141347/http://support.google.com/webmasters/bin/answer.py?hl=en&answer=66353
Like you said this isn’t anything new but it’s great to have a reference. Especially if I have to explain to someone why they shouldn’t hide text. I’m surprised there are still people trying this stuff. I guess there will always be Black Hats out there.
I can still see blogs that contain hyperlinked dots forwarding to the homepage or even google.com. It makes me laugh because those ‘methods’ haven’t worked in ages.
I still hear of people doing the hidden text thing. I guess they must stumble across some seo guide from 10 years ago and not realise how out of date it is.
Did this ever work anyway?
Unfortunately I still find that some of my new customers have hidden links in their code that a prior SEO company unscrupulously placed. This will be a helpful resource to show the folks asking to place hidden links in their site.
It is good that Google finally has this kind of method. As you said, this is kind of solid evidence for clients; at least they will have an idea of how we do things. Patents like this are proof of how they fill gaps on search engines.
Thanks, Spook SEO
Like I noted above, this has been around for a long time, but I agree completely about the value of being able to point to a patent like this to show clients more about what Google says about how they are attempting to address issues like this.