Which Google Link Analysis Approach May Have Changed?
In the Google Inside Search blog, Google’s Amit Singhal published a post titled Search quality highlights: 40 changes for February that told us about many changes to how Google ranks pages, including the following:
Link evaluation. We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular, we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable.
I decided to look at different link analysis methods that Google has used in the past to try to identify one that they may have stopped using. I couldn’t decide which one they may have stopped, but it was interesting seeing all of these in one place.
A lot of people were guessing which “method of link analysis” might have been changed, from PageRank being turned off, to anchor text being devalued, to Google ignoring rel=”nofollow” attributes in links, to others. I was asked my opinion by a few people and mentioned that there were a number of potential approaches that Google might have changed.
I’ve made a list of a dozen possibilities and granted Google patents that describe them, but Google uses link analysis in a lot of ways, and what Google turned off might involve something else entirely, and/or something that might not even be described in a patent.
Here’s my list:
1. Local Inter-connectivity
Search results are ranked normally in response to a query; then, before they are displayed to searchers, the links between the pages in that smaller subset are explored, and some results may be boosted based upon links between those results.
The book In the Plex mentions that the inventor behind this patent, Krishna Bharat, developed an algorithm similar to the HITS algorithm that was incorporated into what Google does in 2003, the same year this patent was granted.
This process might be somewhat unnecessary these days, especially if Google is reranking search results based on something like the co-occurrence of terms in a result set based upon phrase-based indexing. – Ranking search results by reranking the results based on local inter-connectivity
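The reranking described above can be sketched roughly. This is a hypothetical simplification of the patent's idea, not its actual formula: each result's initial score is boosted in proportion to how many of the other top results link to it (the `boost` factor and data shapes are invented for illustration):

```python
def rerank_by_interconnectivity(results, links, boost=0.5):
    """Rerank a result set using links among the results themselves.

    results: list of (url, base_score) tuples, the normally ranked subset.
    links:   set of (source_url, target_url) pairs known between pages.
    Returns the results re-sorted after boosting by in-set inbound links.
    """
    urls = {url for url, _ in results}
    reranked = []
    for url, score in results:
        # Count links pointing at this result from other results in the set.
        local_votes = sum(1 for src in urls
                          if src != url and (src, url) in links)
        reranked.append((url, score * (1 + boost * local_votes)))
    return sorted(reranked, key=lambda r: r[1], reverse=True)
```

With `[("a", 1.0), ("b", 0.9)]` and a single link `("a", "b")`, the second result is boosted to 1.35 and moves into first place, illustrating how inter-connectivity among the top results can reorder them.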
2. Finding Related Sites
If you perform a search that appears to be for a specific site, you might see a list of other pages at the bottom of the search results, with a heading (that’s also a link) that reads “Pages similar to www.example.com”. If you click on it, you’ll see search results for [related:www.example.com]. The method that determined which pages were related was based upon links pointing at those pages, using a link-based analysis.
Could Google have found a better way of finding related pages? It’s possible, but the pages shown don’t seem to have changed. – Techniques for finding related hyperlinked documents using link-based analysis
3. Adaptive Page Rank
This patent describes a faster approach to calculating PageRank, taking some shortcuts. It can take a while to calculate PageRank, and a method like the one described here could speed that up.
Google has many more pages indexed now than they did when the patent behind this approach was filed, and they may still need this shortcut; then again, they’ve advanced technologically and may not. – Adaptive computation of ranking
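The shortcut can be sketched as a power-iteration PageRank that stops recomputing pages whose scores have already converged, in the spirit of adaptive computation. This is a minimal toy sketch, assuming a tiny in-memory graph; the tolerances and data shapes are invented:

```python
def adaptive_pagerank(out_links, damping=0.85, tol=1e-6,
                      freeze_tol=1e-8, max_iter=100):
    """Approximate PageRank, freezing pages once they converge.

    out_links: dict mapping each node to the list of nodes it links to.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    frozen = set()  # pages whose scores are no longer recomputed
    for _ in range(max_iter):
        new = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_links[u] or nodes  # dangling pages spread evenly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        for u in frozen:
            new[u] = rank[u]  # the adaptive shortcut: keep converged values
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        frozen |= {u for u in nodes
                   if abs(new[u] - rank[u]) < freeze_tol}
        rank = new
        if delta < tol:
            break
    return rank
```

On a symmetric two-page graph the scores settle at 0.5 each almost immediately; on large graphs, skipping converged pages is where the time savings would come from.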
4. Cross Language Information Retrieval
It might be possible to use anchor text from a link on a page in one language to understand what webpage that link is pointing to in another language, to understand what the targeted page is about.
Google has done a lot of work in building statistical machine translation models over the past 5-7 years and that technology might serve them better than an approach like this one. – Systems and methods for using anchor text as parallel corpora for cross-language information retrieval
5. Link Based Clustering
Google has probably clustered similar web pages by looking at other pages that link to pages appearing in search results, and seeing what other pages they link to.
Google might have replaced this clustering approach with one that focuses instead more upon the content and/or the concepts contained on those pages. – Link based clustering of hyperlinked documents
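One common way to cluster hyperlinked documents by their in-links is co-citation: two pages are similar when many of the same pages link to both. The sketch below is a deliberately simple, hypothetical take on that idea (the Jaccard measure, threshold, and greedy grouping are my own choices, not the patent's):

```python
def cocitation_similarity(in_links, a, b):
    """Jaccard overlap of two pages' in-link sets (co-citation).

    in_links: dict mapping a page to the set of pages that link to it.
    """
    linkers_a = in_links.get(a, set())
    linkers_b = in_links.get(b, set())
    union = linkers_a | linkers_b
    return len(linkers_a & linkers_b) / len(union) if union else 0.0

def cluster_by_cocitation(pages, in_links, threshold=0.5):
    """Greedy clustering: put each page into the first cluster containing
    a page it is co-cited with above the threshold, else start a new one."""
    clusters = []
    for page in pages:
        for cluster in clusters:
            if any(cocitation_similarity(in_links, page, other) >= threshold
                   for other in cluster):
                cluster.append(page)
                break
        else:
            clusters.append([page])
    return clusters
```

Two pages linked to by the same two sources end up in one cluster; a page with unrelated in-links gets its own, which is the intuition behind grouping similar results by their backlink neighborhoods.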
6. Personalized PageRank Scoring
Determining personalized page scores for web pages based upon links pointing to pages that appear for specific queries in search results, and upon whether the anchor text in those links is related to those query terms.
Google might use a different approach, such as one that may look at large amounts of data about searchers, pages, and queries to calculate a personalized page score for pages. – Personalizing anchor text scores in a search engine
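A toy version of query-sensitive anchor scoring might look like the following. The matching rule and `boost` parameter are assumptions of mine for illustration; the patent describes a richer scoring model:

```python
def personalized_anchor_score(base_score, query_terms, inbound_anchors,
                              boost=0.25):
    """Boost a page's base score by the share of inbound link anchors
    that share at least one term with the query.

    inbound_anchors: list of anchor-text strings from links to the page.
    """
    if not inbound_anchors:
        return base_score
    q = {t.lower() for t in query_terms}
    matching = sum(1 for anchor in inbound_anchors
                   if q & set(anchor.lower().split()))
    return base_score * (1 + boost * matching / len(inbound_anchors))
```

For a query like [seo tools], a page whose inbound anchors include “best seo tools” gets a larger boost than one whose anchors say only “click here”, tying anchor text to the query rather than treating all links equally.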
7. Anchor Text Indexing
Using anchor text for links to determine the relevance of the pages they point towards. It’s quite likely that Google continues to use an approach like this, but in a modified manner that might be influenced by things like phrase-based indexing. – Anchor tag indexing in a web crawler system
8. Link Analysis using Historical Data
In 2005, Google published a patent application that describes a wide range of temporal-based factors related to links, such as the appearance and disappearance of links, the increase and decrease of backlinks to documents, weights to links based upon freshness, weights to links based upon the authoritativeness of the documents they come from, the age of links, spikes in link growth, and the relatedness of anchor text to the page being pointed to over time.
Google may have used some of the factors described in this patent and may continue to use them, may have replaced them with something else, and may have ignored others. – Information retrieval based on historical data
9. Link Weights based upon Page Segmentation
We’ve known for a few years that Google will give different weights to links based upon the segment of a page where a link is located. It’s quite likely that something like this continues to be used today, but it might have been modified in some manner, such as limiting the amount of value a link might pass along if, for instance, it appears in the footer on multiple pages of a site.
Then again, Google probably has already been doing that. – Document segmentation based on visual gaps
10. Reasonable Surfer Model Link Features
Google’s Reasonable Surfer model describes a good number of features that might be taken together to determine how much value a link might pass along from a page in relation to other links on that page, and it’s possible that one or more of those values are no longer considered in a way that they might have been in the past. – Ranking documents based on user behavior and/or feature data
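The idea of dividing a page's link value by feature-based click likelihood can be sketched as below. The feature names and multipliers are entirely hypothetical placeholders; the patent describes using user behavior and feature data, not these specific numbers:

```python
# Hypothetical multipliers; the actual features and weights in the
# Reasonable Surfer model are not public.
FEATURE_WEIGHTS = {
    "in_footer": 0.2,        # footer links assumed rarely clicked
    "in_main_content": 1.0,  # baseline
    "large_font": 1.3,       # prominent links assumed clicked more
    "same_host": 0.5,        # internal links assumed to pass less
}

def link_pass_value(page_score, links):
    """Split a page's score across its links by feature-derived weights.

    links: list of dicts, each with a 'url' key and boolean feature flags.
    Returns a dict mapping each link's URL to its share of page_score,
    modelling a surfer who is more likely to click prominent links.
    """
    raw = []
    for link in links:
        weight = 1.0
        for feature, mult in FEATURE_WEIGHTS.items():
            if link.get(feature):
                weight *= mult
        raw.append((link["url"], weight))
    total = sum(w for _, w in raw) or 1.0
    return {url: page_score * w / total for url, w in raw}
```

A main-content link and a footer link on the same page split the score roughly 5-to-1 under these made-up weights, which captures the patent's central point: not all links on a page pass the same value.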
11. Links between Affiliated Sites
Some sites may be deemed to be related or affiliated with others in some manner, such as being owned by the same person or people. The value of links between them might be diminished because of that relationship, in comparison to other “editorially determined links.”
How that affiliation is calculated might have changed. – Determining quality of linked documents
12. Propagation of Relevance between Linked Pages
Assigning the relevance of one web page to other web pages could be based upon the click distance between the pages and/or certain features in the content of anchor text or URLs. For example, if one page links to another with the word “contact” or the word “about”, and the page being linked to includes an address, that address location might be considered relevant to the page doing that linking.
There are a few different parts to this method of having the relevance of one page on a site propagated to other pages on the same site, and one or more of those could have changed if it is in use. – Propagating useful information among related web pages, such as web pages of a website
What “method of link analysis” do you think Google turned off?
Updated July 5, 2019