How Google’s Knowledge Graph Updates Itself by Answering Questions

How A Knowledge Graph Updates Itself

Photo by Elijah Hail on Unsplash

Those of us who practice Search Engine Optimization (SEO) have long looked at URLs filled with content and at the links between that content. We have studied how algorithms such as PageRank (based upon links pointing between pages) and information retrieval scores (based upon the relevance of that content) determine how well pages rank in search results in response to queries that searchers enter into search boxes. Web pages have been seen as nodes of information connected by links. This was the first generation of SEO.

Chances are good that many of the methods that we have been using to do SEO will remain the same as new features appear in search, such as knowledge panels, rich results, featured snippets, structured snippets, search by photography, and expanded schema covering many more industries and features than it does at present.

Continue reading “How Google’s Knowledge Graph Updates Itself by Answering Questions”

How Google might Identify Primary Versions of Duplicate Pages

We know that Google doesn’t penalize duplicate pages on the Web, but it may try to identify which version it prefers to other versions of the same page.

Earlier this week I came across this statement on the Web about duplicate pages, wondered about it, and decided to investigate further:

If there are multiple instances of the same document on the web, the highest authority URL becomes the canonical version. The rest are considered duplicates.

~ Link inversion, the least known major ranking factor.
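The quoted rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `authority` numbers are hypothetical scores supplied by the caller, duplicates are detected by an exact content hash (real duplicate detection is likely fuzzier), and `pick_canonicals` is a name made up for this example, not anything from Google.

```python
import hashlib
from collections import defaultdict

def pick_canonicals(pages):
    """Group pages with identical content and pick the highest-authority URL.

    pages: iterable of (url, content, authority) tuples, where authority is
    a hypothetical score (higher means more authoritative).
    Returns {canonical_url: [duplicate_urls]}.
    """
    groups = defaultdict(list)
    for url, content, authority in pages:
        # Assumption: exact-match fingerprinting stands in for duplicate detection.
        fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups[fingerprint].append((authority, url))

    result = {}
    for members in groups.values():
        # Highest authority first; ties are broken alphabetically by URL.
        members.sort(key=lambda m: (-m[0], m[1]))
        canonical = members[0][1]
        result[canonical] = [url for _, url in members[1:]]
    return result

SAMPLE_PAGES = [
    ("https://a.example/post", "same text", 0.2),
    ("https://b.example/post", "same text", 0.9),
    ("https://c.example/other", "different text", 0.1),
]
canonicals = pick_canonicals(SAMPLE_PAGES)
```

In this sample, the copy on b.example has the higher made-up authority score, so it becomes the canonical version and the copy on a.example is listed as its duplicate.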

Man in a cave
Photo by Luke Leung on Unsplash

Continue reading “How Google might Identify Primary Versions of Duplicate Pages”

Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries

Augmentation Queries

In general, the subject matter of this specification relates to identifying or generating augmentation queries, storing the augmentation queries, and identifying stored augmentation queries for use in augmenting user searches. An augmentation query can be a query that performs well in locating desirable documents identified in the search results. The performance of an augmentation query can be determined by user interactions. For example, if many users that enter the same query often select one or more of the search results relevant to the query, that query may be designated an augmentation query.

In addition to actual queries submitted by users, augmentation queries can also include synthetic queries that are machine generated. For example, an augmentation query can be identified by mining a corpus of documents and identifying search terms for which popular documents are relevant. These popular documents can, for example, include documents that are often selected when presented as search results. Yet another way of identifying an augmentation query is mining structured data, e.g., business telephone listings, and identifying queries that include terms of the structured data, e.g., business names.

These augmentation queries can be stored in an augmentation query data store. When a user submits a search query to a search engine, the terms of the submitted query can be evaluated and matched to terms of the stored augmentation queries to select one or more similar augmentation queries. The selected augmentation queries, in turn, can be used by the search engine to augment the search operation, thereby obtaining better search results. For example, search results obtained by a similar augmentation query can be presented to the user along with the search results obtained by the user query.
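The matching step described in the passage above might look something like the following sketch. The patent text only says that terms of the submitted query are matched against terms of stored augmentation queries; the use of Jaccard similarity, the `threshold` and `limit` parameters, and the function names are all assumptions made for illustration.

```python
def tokenize(query):
    """Lowercase a query and split it into a set of terms."""
    return set(query.lower().split())

def select_augmentation_queries(user_query, store, threshold=0.5, limit=3):
    """Return stored augmentation queries whose terms overlap the user's query.

    Assumption: term overlap is scored with Jaccard similarity
    (shared terms divided by all distinct terms).
    """
    user_terms = tokenize(user_query)
    scored = []
    for candidate in store:
        candidate_terms = tokenize(candidate)
        union = user_terms | candidate_terms
        if not union:
            continue
        score = len(user_terms & candidate_terms) / len(union)
        # Skip the user's own query; it adds nothing as an augmentation.
        if score >= threshold and candidate.lower() != user_query.lower():
            scored.append((score, candidate))
    # Best score first; ties are broken alphabetically.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [candidate for _, candidate in scored[:limit]]

# Hypothetical augmentation query data store.
AUGMENTATION_STORE = [
    "pizza restaurants san diego",
    "best pizza san diego",
    "hiking trails",
]
selected = select_augmentation_queries("pizza san diego", AUGMENTATION_STORE)
```

A search engine could then run the selected queries alongside the original one and blend the results, as the passage describes.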

Continue reading “Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries”

Learning to Rank

My last post was Five Years of Google Ranking Signals, and I started that post by saying that there are other posts about ranking signals that have some issues. But there are other pages that you may want to look at while you are learning to rank webpages, and I didn’t want to turn people away from one recent post that contains a lot of useful information.

Cyrus Shepard recently published a post about Google Success Factors on Zyppy.com, which I would recommend that you also check out.

Cyrus did a video with Ross Hudgens of Siege Media, called Google Ranking Factors with Cyrus Shepard, where they talked about those ranking signals. I’m keeping this post short on purpose, to make the discussion about ranking the focus of this post, and the star. There is some really good information in the video and in the post from Cyrus. Cyrus takes a different approach to writing about ranking signals from the one I took, but it’s worth the time to visit, listen, and watch.

Continue reading “Learning to Rank”

Five Years of Google Ranking Signals

Organic Search Google Ranking Signals

1. Domain Age and Rate of Linking
2. Use of Keywords
3. Related Phrases
4. Keywords in Main Headings, Lists, and Titles
5. Page Speed
6. Watch Times for a Page
7. Context Terms on a Page
8. Language Models Using Ngrams
9. Gibberish Content
10. Authoritative Results
11. How Well Database Answers Match Queries
12. Suspicious Activity to Increase Rankings
13. Popularity Scores for Events
14. The Amount of Weight from a Link Is Based upon the Probability That Someone Might Click upon It
15. Biometric Parameters while Viewing Results
16. Click-Throughs
17. Site Quality Scores
18. Disambiguating People
19. Effectiveness and Affinity
20. Quotes
21. Category Duration Visits
22. Repeat Clicks and Visit Durations
23. Environmental Information
24. Traffic Producing Links
25. Freshness
26. Media Consumption History
27. Geographic Coordinates
28. Low Quality
29. Television Viewing
30. Quality Rankings

Semantic Search Google Ranking Signals

31. Searches using Structured Data
32. Related Entities
33. Nearby Locations
34. Attributes of Entities
35. Natural Language Search Results

Continue reading “Five Years of Google Ranking Signals”

Google Patent on Structured Data Focuses upon JSON-LD

Search Using Structured Data

Structured Data is information that is set out in a way that makes it easy for a search engine to read. Some examples include XML markup in XML sitemaps and schema vocabulary found in JSON-LD scripts.

A search engine that answers questions based upon crawling and indexing facts found within structured data on a site works differently than a search engine that looks at the words used in a query and tries to return documents that contain the same words, hoping that such a matching of strings might lead to an actual answer to the informational need that inspired the query in the first place. Search using Structured Data works a little differently, as seen in this flowchart from a 2017 Google patent:

Flow Chart Showing Structured Data in a Search

In Schema, Structured Data, and Scattered Databases such as the World Wide Web, I talked about the DIPRE algorithm in a patent from Sergey Brin, which I also described in the post Google’s First Semantic Search Invention was Patented in 1999. That patent and algorithm described how the web might be crawled to collect pattern and relation information about specific facts; in that case, facts about books. In the Google patent on structured data, we see how Google might look for factual information set out in semi-structured data such as JSON-LD to be able to answer queries about facts, such as “What is a book, by Ernest Hemingway, published in 1948-1952?”
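To make the Hemingway example concrete, here is a minimal sketch of answering that kind of query from JSON-LD rather than from matching strings in documents. It assumes each page exposes a schema.org Book item with `name`, `author`, and `datePublished` properties; the sample snippets and the `find_books` helper are made up for illustration and are not taken from the patent.

```python
import json

# Hypothetical JSON-LD scripts, as they might be collected from crawled pages.
JSONLD_SNIPPETS = [
    '{"@context": "https://schema.org", "@type": "Book",'
    ' "name": "The Old Man and the Sea",'
    ' "author": {"@type": "Person", "name": "Ernest Hemingway"},'
    ' "datePublished": "1952"}',
    '{"@context": "https://schema.org", "@type": "Book",'
    ' "name": "Across the River and Into the Trees",'
    ' "author": {"@type": "Person", "name": "Ernest Hemingway"},'
    ' "datePublished": "1950"}',
    '{"@context": "https://schema.org", "@type": "Book",'
    ' "name": "A Farewell to Arms",'
    ' "author": {"@type": "Person", "name": "Ernest Hemingway"},'
    ' "datePublished": "1929"}',
]

def find_books(snippets, author, first_year, last_year):
    """Return titles of Book items by `author` published in the given year range."""
    matches = []
    for raw in snippets:
        item = json.loads(raw)
        if item.get("@type") != "Book":
            continue
        writer = item.get("author", {})
        # The author may be a nested Person object or a plain string.
        name = writer.get("name") if isinstance(writer, dict) else writer
        # Assumption: datePublished starts with a four-digit year.
        year = int(str(item.get("datePublished", "0"))[:4])
        if name == author and first_year <= year <= last_year:
            matches.append(item["name"])
    return sorted(matches)

answers = find_books(JSONLD_SNIPPETS, "Ernest Hemingway", 1948, 1952)
```

The query is answered by filtering on facts (type, author, date) instead of looking for pages that happen to contain the words "Hemingway" and "1948".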

Continue reading “Google Patent on Structured Data Focuses upon JSON-LD”