What are Reciprocal Links?
Reciprocal links are links between two sites that point to each other, often because the sites cover similar topics or provide complementary goods or services. The owners of the sites may find that it helps to link to each other for their visitors’ benefit.
Site owners may do this to give visitors a chance to see both sites, because the sites are related, or as a show of partnership. If you follow the mythology and folklore surrounding search engine optimization, you may have read or heard that reciprocal links are bad and that search engines don’t like them.
The Truth About Reciprocal Links Is More Complicated Than That
What about blogs that link to each other on every page in blogrolls? Or links between sites owned by the same owner, where linking is reasonable? These could be a storefront on a different domain, a blog at a different domain or subdomain associated with a site, or a group of sites from the same company or organization that focus upon different topics.
What do search engines think of resource pages, where sites include pages of links and descriptions to other sites that they think their visitors might find helpful and useful? What happens if some of those sites link back? Does it make a difference if those resource pages include a statement that they will list your site on their page in exchange for a reciprocal link back? (I have seen someone face a manual action for doing that, and the search engine removed the action once the resource page with that message was removed.)
Search Engine Warnings on Reciprocal Links Between Pages
The major commercial search engines do provide some information about linking in their guidelines:
Google’s page on Link Schemes warns site owners that some kinds of linking might negatively impact the rankings of their websites. The kinds it has mentioned include excessive reciprocal links:
- Links intended to manipulate PageRank
- Links to web spammers or bad neighborhoods on the web
- Excessive reciprocal links or excessive link exchanging (“Link to me and I’ll link to you.”)
- Buying or selling links that pass PageRank
Yahoo, in their Search Content Quality Guidelines, provides examples of content that they don’t want included in their search engine. These include:
- Sites cross-linked excessively with other sites to inflate a site’s apparent popularity (link schemes)
Windows Live Help (where Bing is now), in its page on Guidelines for successful indexing, included the following among its list of “techniques that might prevent your website from appearing in Live Search results”:
- Using techniques, such as link farms, to artificially increase the number of links to your webpage.
How Helpful Are These Guidelines to Most Searchers, Web Admins, or Bloggers?
The chances are that some percentage of the people who use Google, or who have their websites indexed by the search engine, are familiar with PageRank. Even so, they may not know what these guidelines mean by “link schemes” or “link farms.”
Why are search engines so concerned about reciprocal links?
Classifications for Search Ranking Signals
When you perform a search, the pages that the search engine returns are ranked and ordered based on many signals. These signals try to provide you with the pages that might best match what you intended to find on the Web.
That kind of ranking is a challenge for search engines. There can often be many thousands or millions of pages containing the words you used to perform your search. So search engines want to try to provide the best pages that they can at the top of the results. Or they may try to find better pages than the other search engines are showing.
These different signals that a search engine might use to determine the order of pages in search results could be classified in a few different ways.
Content Based, Link Based, and User Behavior Based Ranking Signals
One set of classifications consists of breaking those signals into three different types: content-based, link-based, and user behavior-based.
Content-based signals look at the actual content that appears upon the pages of a website. Link-based signals pay attention to the links between your site and other sites on the web. User behavior-based signals look at data that indicates how people might react to the pages of your site. This could mean whether they view the site directly or see it in search results on a search engine.
Query Dependent and Query Independent Ranking Signals
Another way that search engines might classify the signals that they use to rank pages depends upon whether or not a signal is related to the query that you search with. This way of classifying signals breaks them down into two groupings: how important a search engine might consider a page to be, and how relevant a page might be to a specific search term or phrase.
Signals that look at the importance or “quality” of a page might look at the quality of the content of the page, at the number and perceived importance of links to that page, or at how people use the page. That use can include bookmarking it, spending time on it, annotating it somehow, or using it in some manner that might not be tied to a specific query. Search engines often refer to these kinds of signals for ranking a page as query independent signals. They don’t rely upon a query that might have been used to find that page.
Signals that look at the relevance of a page might look at how relevant that page might be to a specific query term or phrase. This could involve which words appear in links pointing to the page, and in the text surrounding and associated with those links. It could also involve how people use the page in a way that is associated with a specific query term or phrase, such as clicking on a link to the page when it appears in search results for that term or phrase, or spending a certain amount of time on the page after a search brings them to it. Search engines often refer to these kinds of signals for ranking a page as query dependent signals. They rely on a specific query used to find a page.
Mixing Signals and Reordering Page Rankings
A search engine can use a mix of a good number of signals to determine which order it might show pages to searchers in response to a search. It might also take those ordered results and reorder them before presenting them to searchers based upon other factors involving those pages. These could include which country the searcher might be from, which language they have indicated they prefer to see results in, and many others.
I’ve written about how and why a search engine might reorder search results a number of times, including the following two posts:
- 20 Ways Search Engines May Rerank Search Results
- 20 More Ways that Search Engines May Rerank Search Results
Links Between Pages as a Ranking Signal
While there are many different kinds of signals that a search engine might use to rank web pages in response to a search, one of the important differences between web pages and pages that you might find in a collection of documents on an intranet is that web pages can link to each other with hyperlinks. Those links can help search engines identify which pages might be the most important ones, if the search engines pay attention to those links. The premise behind using links as references to other pages comes from thinking about citations in academic papers and how they refer to other resources.
When someone writes an academic paper that their peers will review, they will often include a list of citations to other academic papers as references or as sources of data relied upon in their paper. One might assume that an academic paper that is referred to frequently by other papers is important. One might also assume that papers referred to by “important” papers are also important, even if not many other academic papers refer to them.
Those assumptions about citations in academic papers are one of the influences behind PageRank. It takes advantage of hyperlinks between pages on the Web to determine which pages are important.
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page’s importance or quality. PageRank extends this idea by not counting links from all pages equally and by normalizing by the number of links on a page.
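To make those two ideas concrete (links from important pages count for more, and a page's vote is divided among the links on it), here is a minimal sketch of a PageRank-style calculation. It is only an illustration, not the way any search engine actually computes rankings. The function name `pagerank`, the damping value of 0.85, the fixed number of iterations, and the tiny example graph are all my assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    The damping factor and iteration count are assumed values for illustration.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Normalize by the number of links on the page:
                # each outlink passes along an equal slice of the page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", and "c" links to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(example_links)
```

In this toy graph, "c" ends up with the highest score because two pages link to it, and "a" outranks "b" because its only inlink comes from the most important page.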
Reciprocal Links Between Pages
While links between pages on the Web might be helpful, search engines are also concerned and suspicious about links between web pages. This includes reciprocal links.
Some site owners have worked to take advantage of links between pages to make their pages look more important than they actually might be. Their primary focus hasn’t been to share links that provide value to people who visit their sites, or transparently connect to other sites that might be under their ownership or control, or link to pages that they value based upon the content of those pages. Instead, they link solely to manipulate link-based ranking signals to get their sites to rank more highly in search results.
Yahoo’s Patent Application on Excessive Reciprocal Links
A newly published patent application from Yahoo discusses how it might look at those links between pages for reciprocal links and attempt to determine whether those links exist to manipulate search results. The patent filing is:
Identifying excessively reciprocal links among web entities
Invented by Timothy M. Converse, Priyank Shankar Garg, and Konstantinos Tsioutsiouliklis
Assigned to Yahoo
US Patent Application 20090013033
Published January 8, 2009
Filed July 6, 2007
A method for identifying reciprocal links is provided. At a particular host, the set of hosts that link to the particular host and the set of hosts to which the particular host links are determined. The intersection and union of the two sets of hosts are also determined, and the sizes of the intersection and union are calculated.
The concentration of reciprocal links at the particular host is calculated based on the sizes of the intersection and union. A ratio of the intersection size to the union size determines the concentration of reciprocal links. The particular host’s rank in a list of ranked search results may be changed due to identifying a high concentration of reciprocal links.
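The ratio described in that abstract can be sketched in a few lines. This is a minimal illustration of the intersection-over-union calculation as I read it, not Yahoo's implementation. The function name `reciprocal_concentration` and the example host names are made up.

```python
def reciprocal_concentration(inlink_hosts, outlink_hosts):
    """Size of the intersection of the two host sets divided by the size of their union.

    inlink_hosts:  hosts that link to the host being examined
    outlink_hosts: hosts that the host being examined links to
    """
    inlinks = set(inlink_hosts)
    outlinks = set(outlink_hosts)
    union = inlinks | outlinks
    if not union:
        # A host with no links in either direction has no reciprocal links.
        return 0.0
    return len(inlinks & outlinks) / len(union)


# Hypothetical host: four hosts link to it, and it links out to three hosts.
inlinks = {"a.example", "b.example", "c.example", "d.example"}
outlinks = {"c.example", "d.example", "e.example"}
score = reciprocal_concentration(inlinks, outlinks)  # 2 shared / 5 total = 0.4
```

A score near 0 means few of the host's link partners link in both directions, while a score near 1 means almost every link is returned.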
Related Yahoo Patent Filings Involving Linking
This patent filing on excessive reciprocal links notes that it relates to a couple of other patent filings from Yahoo.
Search engines can keep an eye on who links to whom by creating something known as a link graph or web graph. A link graph is a representation of the web that views a web page as a node and links between pages as edges, or lines, between those nodes. The Exceptional Changes in Web graph Snapshots patent application looks for changes to that link graph over time to identify suspicious activity. The abstract from the filing tells us:
Techniques are provided through which “suspicious” web pages may be identified automatically. A “suspicious” web page possesses characteristics that indicate some manipulation to artificially inflate the web page’s position within ranked search results.
Web pages may be represented as nodes within a graph. Links between web pages may be represented as directed edges between the nodes. “Snapshots” of the current state of a network of interlinked web pages may be automatically generated at different times. In the time interval between snapshots, the state of the network may change.
By comparing an earlier snapshot to a later snapshot, the search engine can identify such changes. Extreme changes, which vary significantly from the normal range of expected changes, can be detected automatically. Web pages relative to which these extreme changes have occurred may be marked as suspicious web pages, which may merit further investigation or action.
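Here is a rough sketch of what comparing two snapshots might look like. The abstract doesn't say how an "extreme" change is measured, so the choice to look at the change in each page's inlink count and to flag pages more than `k` standard deviations away from the average change is purely my assumption, as are the function names and the example data.

```python
from collections import Counter
from statistics import mean, pstdev


def in_degrees(edges):
    """Count inbound links per target page in one snapshot of (source, target) edges."""
    return Counter(target for _source, target in edges)


def extreme_changes(old_edges, new_edges, k=2.0):
    """Return pages whose change in inlink count is more than k standard
    deviations from the mean change (an assumed test, not the patent's)."""
    old_deg = in_degrees(old_edges)
    new_deg = in_degrees(new_edges)
    pages = set(old_deg) | set(new_deg)
    changes = {page: new_deg[page] - old_deg[page] for page in pages}
    values = list(changes.values())
    if len(values) < 2:
        return []
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # Every page changed by the same amount, so nothing stands out.
        return []
    return sorted(p for p, c in changes.items() if abs(c - average) > k * spread)


# Hypothetical snapshots: every page gains one inlink, but "p6" also
# picks up 49 links from a set of brand new pages.
old_snapshot = {("s0", f"p{j}") for j in range(1, 7)}
new_snapshot = (
    old_snapshot
    | {("s1", f"p{j}") for j in range(1, 7)}
    | {(f"farm{i}", "p6") for i in range(49)}
)
flagged = extreme_changes(old_snapshot, new_snapshot)
```

With this data only "p6" is flagged. A real system would presumably look at many more properties of the graph than inlink counts.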
Suspicious Links Between Pages
The other patent filing, Link-Based Spam Detection, pays attention to links from sites that it has already identified as “suspicious.” A snippet from that one:
In this section, the concepts of a spam farm, link page ranking (commonly referred to as “PageRank”), and trust-ranking are described.
A spam farm is an artificially created set of pages that point to a spam target page to boost its significance. Trust-ranking (“TrustRank”) is a form of PageRank with special teleportation (i.e., jumps) to a subset of high-quality pages.
Using techniques described herein, a search engine can automatically find bad pages (web spam pages) and, more specifically, find those web spam pages created to boost their significance through the creation of artificial spam farms. Those would be collections of reference pages. In specific embodiments, a PageRank process with uniform teleportation and a trust-ranking process is carried out. Their results are compared as part of a test of the “spam-ness” of a page or a collection of pages.
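The snippet says that the results of the two processes are compared, but not how. The sketch below runs the same toy ranking calculation twice, once with uniform teleportation and once with teleportation only to a trusted seed set, and then takes the share of a page's ordinary rank that is not backed by trust. That comparison formula, the damping value, the function names, and the example graph are all my assumptions, not something taken from the filing.

```python
def rank_with_teleport(links, teleport, damping=0.85, iterations=50):
    """Toy PageRank where random jumps land on pages according to `teleport`,
    a dict of page -> probability that sums to 1."""
    pages = list(teleport)
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages without outlinks hand their rank back via the teleport distribution.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


def spamness(links, trusted):
    """Assumed comparison: the fraction of a page's uniform rank not matched by trust rank."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    uniform = {p: 1.0 / len(pages) for p in pages}
    seeded = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    page_rank = rank_with_teleport(links, uniform)
    trust_rank = rank_with_teleport(links, seeded)
    return {p: (page_rank[p] - trust_rank[p]) / page_rank[p] for p in pages}


# Hypothetical web: two legitimate pages that link to each other, and a
# small farm of pages that exists only to point at "target".
example_links = {
    "news": ["blog"],
    "blog": ["news"],
    "farm1": ["target"],
    "farm2": ["target"],
    "farm3": ["target"],
    "target": ["farm1", "farm2", "farm3"],
}
scores = spamness(example_links, trusted={"news"})
```

In this example no trust ever reaches "target", so all of its rank is unexplained by trusted pages, while the trusted "news" page scores below zero.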
While these other two patent filings focus upon links between pages, they don’t look at how excessively pages or domains might link between themselves, directly or indirectly through many pages or domains, the way this newly published patent application does.
It’s quite possible that the processes described in all three of these patent filings, as well as several others, might be used together to try to keep the use of linking as a ranking signal from being abused.
Reciprocal Links and “Suspicious Entities”
As I mentioned above, you may have read or heard that a reciprocal link is bad. The truth of the matter is more complicated than that.
The Yahoo patent filing gives us their definition of a reciprocal link:
A web page contains a “reciprocal link” when one of its “outlinks” is also one of its “inlinks.” A reciprocal link exists when a web page links to another web page that also links back to the first web page.
So, a reciprocal link exists whenever two sites link back and forth to each other.
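That definition translates fairly directly into code. In this small sketch, which is my own illustration rather than anything from the filing, I build the inlinks for each page from a table of outlinks and then report every pair of pages where an outlink is also an inlink. The function name and the example pages are hypothetical.

```python
from collections import defaultdict


def reciprocal_pairs(outlinks):
    """Return the set of (page, page) pairs that link to each other.

    `outlinks` maps each page to the set of pages it links to.
    """
    # Invert the outlink table to find out who links to each page.
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)

    pairs = set()
    for page, targets in outlinks.items():
        # A reciprocal link: one of the page's outlinks is also one of its inlinks.
        for other in set(targets) & inlinks[page]:
            if other != page:
                pairs.add(tuple(sorted((page, other))))
    return pairs


# Hypothetical pages: "a" and "b" link to each other, "c" links only one way.
example_outlinks = {"a": {"b", "c"}, "b": {"a"}, "c": {"d"}, "d": set()}
pairs = reciprocal_pairs(example_outlinks)
```

Here the only reciprocal pair is ("a", "b"), since "c" never links back to "a".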
Circular Links Could Also Be Considered Bad
The patent also tells us that it will consider circular links to be reciprocal links. So, for example, a page from site A points to site B, a page from site B points to site C, and a page from site C points back to site A.
If the links between pages (or domains or hosts) are a small percentage of the links on each page, domain, or host, the process described in this patent filing may not kick off. I say “kick off” because this is an automated process rather than a manual review at this point.
If the percentage of links is larger than that, the search engine might take several steps.
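The three-site example can be sketched as a search for loops of length three in a table of links. This is only my illustration of the A to B to C to A pattern. The patent filing doesn't describe its detection method in these terms, longer loops are ignored here, and the function name and site names are hypothetical.

```python
def find_three_cycles(links):
    """Return each group of three distinct sites where A links to B,
    B links to C, and C links back to A.

    `links` maps each site to the set of sites it links to.
    """
    cycles = set()
    for a, a_targets in links.items():
        for b in a_targets:
            for c in links.get(b, ()):
                closes_loop = a in links.get(c, ())
                if closes_loop and len({a, b, c}) == 3:
                    # A frozenset stores the loop once, whichever site we started from.
                    cycles.add(frozenset((a, b, c)))
    return cycles


# Hypothetical sites: A -> B -> C -> A, plus an unrelated link from C to D.
example_links = {
    "site-a": {"site-b"},
    "site-b": {"site-c"},
    "site-c": {"site-a", "site-d"},
    "site-d": set(),
}
cycles = find_three_cycles(example_links)
```

The result contains a single group made up of site-a, site-b, and site-c. The one-way link to site-d is not part of any loop.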
Review of Reciprocal Links
The sites might be reviewed manually by “human investigators.” They might also be examined by a program from the search engine that has been trained to look for signals of suspicious activity.
The patent application does tell us that pages, domains, or hosts might have a high percentage of reciprocal links for legitimate reasons:
For example, a particular web page may have many reciprocal links with a group of web pages because these web pages discuss the same subject matter in a complementary fashion. The web page authors have found it expedient for those web pages to refer to each other.
In another example, two groups of web pages refer to each other using reciprocal links because those groups belong to two company websites where the companies are part of the same conglomerate.
A review of those pages might lead to a determination that they are “suspicious.” That could lead to an automatic demotion of those pages or domains or hosts in search results.
Some Pages May Be Included in an “Allowlist”
Some web pages, hosts, or domains might be included in an “allowlist” so that they are automatically excluded from being identified as suspicious. These are sites that are known to be “popular” and “legitimate.” Not surprisingly, the patent application uses Yahoo.com as an example. 🙂
In an alternative approach, pages or domains that have been identified as “suspicious entities” might not be automatically excluded or demoted from search results. They may be further reviewed based on their content. For example, the page may be explored to see if it contains words related to pornography or prescription drugs.
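Putting the pieces of this section together, the decision flow might look something like the sketch below. The order of the checks, the 0.5 threshold, the outcome labels, and the placeholder term list are all my assumptions for illustration. The patent filing describes these steps in general terms and doesn't give specific values.

```python
def review_host(host, concentration, page_text, allowlist, flagged_terms, threshold=0.5):
    """Sketch of one possible review flow for a host.

    concentration: reciprocal link ratio for the host (0.0 to 1.0)
    threshold:     assumed cutoff for a "high" concentration
    flagged_terms: lowercase words that would count against a suspicious host
    """
    if host in allowlist:
        # Known popular and legitimate hosts are never marked suspicious.
        return "not suspicious (allowlisted)"
    if concentration <= threshold:
        return "not suspicious"
    # High concentration: look at the content before deciding what to do.
    words = set(page_text.lower().split())
    if words & flagged_terms:
        return "suspicious: demote"
    return "suspicious: needs further review"


# Hypothetical inputs.
allowlist = {"yahoo.com"}
flagged_terms = {"pills", "casino"}

allowlisted = review_host("yahoo.com", 0.9, "news mail search", allowlist, flagged_terms)
ordinary = review_host("smallblog.example", 0.1, "gardening tips", allowlist, flagged_terms)
demoted = review_host("linkring.example", 0.8, "cheap pills online", allowlist, flagged_terms)
```

The point of the ordering is that a high concentration of reciprocal links only starts the process. The allowlist and the content check both get a say before anything is demoted.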
Reciprocal Links Conclusion
Search engines use links as a ranking signal to help determine how well a page might rank in search results. It is just one of many ranking signals that a search engine may use. Site owners might attempt to have links pointing to their pages only to increase the rankings of their pages in search results.
The three patent filings that I’ve referred to in this post describe just some of the ways that a search engine might try to identify when people link solely to inflate their rankings.
The chances are that a search engine will find the links on your blog or site to be legitimate if they are open, transparent, and reasonable (rather than excessive), if they provide value to your visitors, and if they point to sites that cover similar or complementary topics, even when some of those sites link back to yours. If you include indications on your pages that you will link back to others who link to you to boost rankings in search results, you may have more reason to be concerned. If you engage in link farms, link schemes, or reciprocal link programs, a search engine might find your pages to be “suspicious” and may take a closer look.
Other Papers from Yahoo Researchers on Detecting Spam in Links
You may want to dig deeper into this topic. Here are some papers from Yahoo researchers on detecting spam pages by looking at links:
- Combating Web Spam with Trustrank (pdf)
- Link Spam Alliances
- Link Spam Detection Based on Mass Estimation (pdf)
- Link Based Characterization and Detection of Web Spam (pdf)
- Using rank propagation and probabilistic counting for link-based spam detection Slides (pdf)
- Link Based Spam Detection Slides (pdf)
- Know your Neighbors: Web Spam Detection using the Web Topology (pdf) slides(ppt)
- Link Analysis for Web Spam Detection (pdf)
- Web Spam Detection: link-based and content-based techniques (pdf)
- Technical Report YR-2008-001 – Witch: A New Approach to Web Spam Detection (pdf) (video)
- Web spam Identification Through Content and Hyperlinks (pdf)
I mentioned above that search engine ranking signals can be classified as content-based, link-based, and user behavior-based. A few of the papers above look at both links and content to find webspam. Another recent approach from Yahoo looks at user behavior and query logs to find spam pages:
Last Updated July 1, 2019.