If a search engine suggested topics for you to write about because those topics weren’t represented well in its search results and were considered inadequate search content, would you write about them? Would a search engine take that step?
A Google patent application published this week explores that topic as well as describing some approaches that they might use to gauge the quality of their search results.
The patent is a continuation of a patent granted to Google in February of this year, Identifying inadequate search content, and is interesting for a number of reasons, among them that Google’s Chief Economist (Hal Varian) and the head of Google’s Webspam Team (Matt Cutts) are listed as inventors.
The newer version was filed shortly before the original patent was granted, and while the claims sections of each present the invention in somewhat different ways, the descriptions for both patent filings are very similar.
Both explore how the search engine might use statistics associated with queries, together with a review of how relevant and important the search results for those queries are, to determine the quality and quantity of results that appear for them. I did notice that the newer version adds a discussion of results for queries in more than one language. The patent application is:
Identifying Inadequate Search Content
Invented by Jeffrey David Oldham, Hal R. Varian, Matthew D. Cutts, Matt Rosencrantz
Assigned to Google
US Patent Application 20100138421
Published June 3, 2010
Filed February 3, 2010
Systems and methods for identifying inadequate search content are provided. Inadequate search content, for example, can be identified based on statistics associated with the search queries related to the content.
Query Statistics and Document Statistics
It’s likely that Google collects an incredible amount of information about the searches people perform at the search engine. One good example comes from a Google patent granted June 1, Accelerated large scale optimization, which gives us a list of the kinds of data they might collect. That information could be stored as “triples of data,” where each of those triples involves information about users, queries, and documents. That patent tells us that Google might collect more than five million distinct features associated with users, queries, and documents, such as:
- The country in which user u is located,
- The time of day that user u provided query q,
- The language of the country in which user u is located,
- Each of the previous three queries that user u provided,
- The language of query q,
- The exact string of query q,
- The word(s) in query q,
- The number of words in query q,
- Each of the words in document d,
- Each of the words in the Uniform Resource Locator (URL) of document d,
- The top level domain in the URL of document d,
- Each of the prefixes of the URL of document d,
- Each of the words in the title of document d,
- Each of the words in the links pointing to document d,
- Each of the words in the title of the documents shown above and below document d for query q,
- The number of times a word in query q matches a word in document d,
- The number of times user u has previously accessed document d, and
- Other information.
That information might be collected as a triple of data, otherwise referred to as an “instance.” A triple of data would take information about a user, a query, and a document. For example, one of these instances might tell us:
- The country where the searcher is located
- The language the query was written in
- The words in the title of the document
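To make the idea of an “instance” a little more concrete, here’s a minimal sketch of how such a triple might be represented. This is purely my own illustration; the feature names and structure are assumptions, not anything taken from either patent.

```python
# A hypothetical sketch of a (user, query, document) "instance."
# Feature names here are illustrative, not Google's actual schema.

def build_instance(user, query, document):
    """Collect a few example features about a user, a query, and a document."""
    return {
        "user_country": user["country"],          # country in which user u is located
        "query_language": query["language"],      # language of query q
        "query_word_count": len(query["text"].split()),        # number of words in query q
        "doc_title_words": document["title"].lower().split(),  # words in the title of document d
    }

user = {"country": "US"}
query = {"language": "en", "text": "biography of Millard Fillmore"}
document = {"title": "Millard Fillmore Biography"}

instance = build_instance(user, query, document)
print(instance["query_word_count"])  # -> 4
```

In practice, the patent describes millions of distinct features rather than the four shown here, but each instance would still pair user, query, and document information in this way.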
The Inadequate Search Content patent doesn’t provide much detail about the statistical process that it might use, but it’s possible that it might use something similar to the process described in the Large Scale Optimization patent. It limits its description to telling us that it might review user-behavior information such as:
- Whether the user clicks on a result,
- How long the user examines the results of a click,
- Whether the user tags or recommends this site,
- Search queries,
- Search results,
- Time and date information associated with search queries,
- Refinements of search queries occurring during a search session, etc.
We also know that Google uses a considerable amount of information to rank pages in search results. Some of these involve creating a score based upon how relevant a page is for a query performed by a searcher, such as the words used in a title of a page, and some of them involve a score based upon the quality or importance of a page, which could include a PageRank score. The combination of relevance and quality scores can determine how highly a page might rank in response to a particular query.
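As a rough illustration of that combination, a final ranking score might blend the two kinds of scores with a weighting. The function and the weights below are my own invented example, not Google’s actual formula:

```python
# Illustrative only: a simplified way a final ranking score could combine
# a query-dependent relevance score with a query-independent quality
# score (such as PageRank). The weights are made up for this example.

def ranking_score(relevance, quality, relevance_weight=0.7):
    """Blend relevance (how well a page matches the query) with
    quality (how important the page is, independent of the query)."""
    return relevance_weight * relevance + (1 - relevance_weight) * quality

pages = [
    ("page_a", 0.9, 0.2),  # very relevant, lower quality
    ("page_b", 0.6, 0.8),  # moderately relevant, high quality
]
ranked = sorted(pages, key=lambda p: ranking_score(p[1], p[2]), reverse=True)
print([name for name, _, _ in ranked])  # -> ['page_a', 'page_b']
```

Shifting the weight toward quality would let the high-PageRank page overtake the more relevant one, which is why the balance between the two scores matters so much.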
Google also may rerank search results based upon a number of features such as the preferred country and languages of people performing a search, whether or not the results contain duplicate or near duplicate content, and others.
Uncovering Inadequate Search Content
Google might look at the statistics associated with a particular query, and the relevance and quality of pages that show up in response to that query, and then attempt to assign a topic to the query, based upon related queries, and collect statistics about that topic.
If the topic includes unserved or underserved queries (meaning that either no relevant content is found for those queries, or the demand for content outweighs the quality or quantity of the results), and the queries in question are somewhat popular based upon those statistics, then the search engine may inform content creators that the results for the query or the topic are inadequate.
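Here is a hedged sketch of how that unserved/underserved classification might look in code. The thresholds, field names, and labels are entirely made up for illustration; the patent doesn’t specify any of them:

```python
# A hypothetical sketch of the unserved/underserved test described above.
# All thresholds are assumptions, not values from the patent.

def classify_query(search_volume, result_count, avg_result_quality,
                   min_volume=1000, min_results=10, min_quality=0.5):
    """Flag a popular query whose results are too few or too poor."""
    if search_volume < min_volume:
        return "low_demand"      # not popular enough to be worth flagging
    if result_count == 0:
        return "unserved"        # no relevant content found at all
    if result_count < min_results or avg_result_quality < min_quality:
        return "underserved"     # demand outweighs the quality or quantity of results
    return "adequate"

print(classify_query(5000, 0, 0.0))   # -> unserved
print(classify_query(5000, 3, 0.9))   # -> underserved
print(classify_query(5000, 50, 0.8))  # -> adequate
```

Only queries in the first two flagged categories would be surfaced to content creators, since a query nobody searches for isn’t worth writing about even if it has no results.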
The statistics associated with queries can contain information about language, geography, demographics, and time. So, a query that might be popular around particular holidays, days of the week, or times of the day, but inadequate to the demands of searchers may be identified.
Sharing Information About Inadequate Search Content with Content Creators
The patent provides a number of possible ways that this information might be shared with content providers, and identifies a number of different content providers it could be shared with.
Information about underserved topics could be suggested to publishers who provide content on the web for free, to publishers who provide information through a subscription-based model, and to publishers who show advertising about certain topics on their web sites.
Searchers might be informed that the results for a topic are limited and that the topic could use someone to create content about it. The patent even suggests the possibility of creating a topic search engine, where potential publishers can search for queries and topics with inadequate search content that are underserved by the search engine.
The search engine might also provide an automated content generation system that aggregates information related to inadequate queries and topics, and the patent suggests it might limit the sites that appear to specific ones, possibly where the content is provided through a license to use it.
The information could also be shared with user contribution sites such as wikis, and be used to create stub articles that users of those sites could then expand upon.
The patent also mentions the possibility of selling information about underserved topics to web publishers.
One topic included in the newer patent application’s description that wasn’t in the original patent description involved identifying topics or queries or both which might be poor in one language while adequate in others, which could be useful to people willing to create new content involving a topic in other languages.
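As an illustration of that cross-language idea, a simple check might compare how well a topic is covered in each language and flag the laggards. The field names and the 10% threshold below are my assumptions, not anything from the patent filing:

```python
# Illustrative sketch (structure and threshold assumed) of spotting a topic
# that is well covered in one language but underserved in another.

def cross_language_gaps(topic_stats):
    """topic_stats maps language -> number of quality results for one topic.
    Returns languages whose coverage lags far behind the best-covered one."""
    best = max(topic_stats.values())
    return [lang for lang, count in topic_stats.items()
            if best > 0 and count < best * 0.1]

coverage = {"en": 400, "es": 15, "de": 350}
print(cross_language_gaps(coverage))  # -> ['es']
```

A publisher able to write in Spanish could then be pointed at the topic, even though searchers in English and German are already well served.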
Offline publishers might use information involving inadequate topics to create print publications. As we’re told in the patent:
For example, if searches for “biography of Millard Fillmore” get only a few results, such publishers might contemplate commissioning a book about Millard Fillmore’s life.
Long Tail and Ecommerce
This information may also be useful to people engaged in ecommerce. For instance, if this system determines that a search for a product such as “purple alligator leather belt” is popular and underserved, someone offering products on their web site (or physical storefront) might decide to offer purple alligator leather belts.
Inadequate Search Content and Query Broadening
Another important aspect, which isn’t given much discussion within the patent, involves how the search engine might respond to topics that don’t provide many results, or enough relevant results to meet the demand for them.
Where some queries provide inadequate search content, the search engine might use that information to identify areas where they should broaden a query for a searcher so that they might find relevant and quality content.
One example might be to provide results in other languages (likely with a “translate this result” link).
There are times when we perform searches at a search engine and may be completely unsatisfied with the pages that we receive. In that instance, it’s possible to blame the search engine for the lack of results rather than web publishers for not having created pages that include our query terms or cover related topics.
It’s possible that information on that query or topic exists but isn’t in a very search-engine-friendly format and couldn’t be indexed by the search engine. It’s also possible that there just aren’t very many quality pages on those topics.
We don’t know whether Google will start sharing information with us about inadequate search content for specific queries or topics, but if they do, it could make it easier for businesses and organizations to identify opportunities to provide content for popular queries that are unserved or underserved.
Is it a step that Google should take?
Added August 9, 2019: Google has introduced something they are referring to as the Question Hub, which is geared towards providing content creators with information about topics where there is inadequate search content, much like the patent I wrote about in this post might do. It appears to be available in India, but not in the United States yet. I signed up to be invited as a content creator, because it would be good to know which topics have inadequate search content.
89 thoughts on “How Google Might Identify Topics With Inadequate Search Content”
Wow! If Google starts suggesting topics to write about, then it is no longer a search engine only. It then begins to take the role of a global information management system. I am not sure I am comfortable with that as a human. As a blogger, I think it is a great idea.
I think that’s a very valid concern. The search engine goes from being an indexer and information gathering tool to becoming more of a direct influencer.
Considering that Google not only includes information from the Web, but also a growing amount of information about printed materials with book search results included in what it shows for queries, suggestions for topics to write about could be pretty influential.
And I wonder, even if a query is popular, and if there isn’t much about it in a search engine, or many relevant documents, is that always a good indication that it is an ideal subject to expand upon? Sometimes there’s a reason why people haven’t expanded upon a subject. Also, queries that do have many relevant results, upon high “quality” pages, may contain incorrect or misleading information. Just because many have written about a subject doesn’t mean that the pages about it can’t be improved upon.
Between Google’s DNS service, its toolbar, Analytics, and everything else, I’m pretty sure Google already is a global information purveyor. This if anything just makes it blatant.
I’m honestly not sure if this would beat analyzing Twitter’s trends for this sort of thing, though – there is a limited amount of time before a topic starts getting popular before the market is saturated.
I wonder who and how content creators would be contacted to fill such holes. The concept is interesting, but as David mentions, I imagine a topic could fill with results quickly, once awareness increased. And as you mention Bill, just because there’s not much information, does not mean it needs to be expanded upon.
LOL … I have to admit I still use Google as a search engine too. 🙂
Do not get me wrong … I love some of the tools Google provides to the world. It would make my life a lot harder if I could not use Gmail, Trends, Alerts, Analytics, etc. However, I see some possible harm in the fact that Google is becoming omnipresent on the web. The thought of Google using this patent to successfully beat Wikipedia with Knol did make me think of their ever-growing power. The thought of Google slowly becoming synonymous with the world wide web (i.e., no WWW without Google) can be worrying.
PS: thank you for fixing the typos in my last post
Mark – Google stopped being a search engine a long time ago.
They now push PPC, Product Results, Local Business listings, and now Brand Links
The average Joe clicks ABOVE the fold of organic listings so what’s the point of coming up with another dumb patent?
Quality Search has now turned into Paid Search. Google hates Made For Adsense (MFA) sites, yet their own search engine is MFA (made for advertisements).
This Patent is just another marketing ploy. Actually, all their patents are MARKETING PLOYS.
If they really cared about people who search then they wouldn’t push PPC (and other garbage) above the organic listings.
What is the point of creating a patent for organic listings when nowadays most people click on PAID LISTINGS not organic listings?
PPC or Paid listings – who’s got the most money to spend not who’s got the best content.
Organic listings – what’s that? does that even exist today?
One of the thoughts that came to mind when I was reading through this patent was how South Korea’s Naver.com captured a very large share of Korea’s search traffic by allowing South Koreans to answer questions from searchers. See the 2007 New York Times article: Crowd’s wisdom helps South Korean search engine beat Google and Yahoo.
While Twitter can help identify popular topics, it doesn’t look at the searchable Web (those pages that can, and have been indexed by the major search engines) to see if there is information on the Web about those topics. This patent filing isn’t so much about finding popular topics as it is about finding queries and topics that there may not be many relevant results for on the Web, or where there are some results but possibly a demand for many more.
I would actually welcome Google providing suggestions to people on queries or topics where there is a demand but sparse relevant content.
The patent does explore who those many different content creators possibly could be, and how they might be notified that specific searches could be improved upon.
My biggest issue with queries where there may not be much in terms of relevant results isn’t so much that people might try to provide more results, but rather that there may be areas where there might be many relevant results but those results might not provide the best answers possible. Some topics are more important than others. For example, I’d rather see researchers spending their time trying to help cure cancer than writing new biographies of Millard Fillmore. 🙂
I have to agree with Robert. Google stopped being a search engine long ago. They created and acquired quite a few sites/applications that are not classic search engines. Think: Docs, Maps, Buzz, Gmail, Blogger, Feedburner, Urchin/Analytics, Wave, YouTube, Picasa and last but not least Knol. The patent might be the result of a strategy to make their own sites fill in the gaps for underserved queries. Depending on where you are logged in Google could suggest you to write a Blogger post, write a Knol, make a YouTube movie, add information to Maps, etc. so they can then serve that content for previously underserved queries. It could even be part of a strategy for helping Knol to be the new Wikipedia. They do have a tool here that can tell them which content needs to be created on Knol to make it more comprehensive than Wikipedia.
When I read Bill’s comment on naver.com I noticed I forgot to mention Aardvark as one of the sites owned by Google that could benefit from this new patent.
I saw your tweet apologizing that you may have taken your comment above a little too far. I guess it’s understandable that when we rely so much upon something like a search engine that if we perceive there to be some kind of bias, it hits a nerve.
Your comment actually reminded me of a couple of paragraphs that are probably pretty well known to people who follow Google closely, written in the very early days of Google by Sergey Brin and Lawrence Page, from The Anatomy of a Large-Scale Hypertextual Web Search Engine. See 8 Appendix A: Advertising and Mixed Motives, which discusses advertising as a business model for search engines, and the potential for bias that advertising can bring.
I’ve read and written about a considerable number of patents from Google (without any urging by them or compensation to do so and even with a concern that it might bother them that I do), and I’m convinced that there are a lot better ways for them to market themselves than through patent filings.
Hi Arjen (Coding Strategist),
I still use Google as a search engine. 🙂
It’s true that Google offers many more services than just their search engine, but I’m not sure that I see the harm in them expanding beyond search to offer those other services.
Right now, they haven’t started providing information about which queries or topics might have inadequate results, and while the patent suggests a number of possibilities, there isn’t much amongst them that shows Google might limit access to that information to only people who would post it on Blogger, or Knol, or through one of their other services.
While this patent offers some possibilities into what Google might do in the future, it’s possible that they might not offer that kind of information at all. It will be interesting to see how they might move forward if they do.
I’ve done some searches on Google on a few topics and didn’t find many relevant or helpful results, but following up on those with searches on Yahoo or Bing often didn’t give me any better results. It could be easy to blame the search engine for the lack of relevant results, but often there aren’t many results because there aren’t many pages about those topics.
While Google could provide this kind of information only to people who use their services, I’m not sure that they would. There are a number of examples of information that Google provides to web publishers regardless of whether they would use that information on Blogger or a non-Google blog or web site, such as the Google Keyword Suggestions Tools, Google Insights for Search, Google Trends, Google Alerts, Google Translate.
Good post, Bill.
In addition, Google appears to be discounting blog comment links as well as blogroll links.
I thought you would find this interesting, Bill:
Google used to show several dozen links pointing to my site from your blog as well as over a hundred links pointing to my blog from the link you posted in your blogroll.
As of the past few days, they have removed all of them when I do a link: search on my main domain as well as my blog.
I am not sure if this is temporary or permanent. I just noticed it today.
Hi People Finder,
Once upon a time, Google used to show only the highest ranking backlinks (or at least a number of those) to a site using the link operator. Presently, Google only shows a small sub-sample of the backlinks for a site. If you register for Google Webmaster Tools, they will show a greater number of backlinks that they know about.
The number of links shown there is usually greater, from my experience, but may not show all links they know about as well. Yahoo’s Site Explorer usually shows a greater number of links to a site than Google, but it’s probably a bad assumption to make that the backlinks Yahoo is showing are ones that Google should also know about as well.
See the following video from Matt Cutts on the subject:
Hi Coding Strategist,
Many of those tools are very helpful to me as well.
I do understand your concern about Google increasingly becoming a walled garden, where Google search results focus upon providing searchers with content found on Google properties as opposed to other places on the web. Interestingly, one of the places that they mention might benefit from “inadequate topic notices” would be wikipedia. From the patent:
I remember that for many years, my parents were using AOL to connect to the Web, and rarely left AOL pages when they went online. The AOL browser used to show an incredible amount of popups, which would frustrate me when I went to visit them. I showed them how to minimize the AOL browser and use IE or Firefox and explore other places on the Web. During those days (years), AOL was pretty much the whole web for them, so I understand the concern.
No problem on the typos – I know how frustrating it is to submit a comment and not be able to go back and edit them. 🙂
Thanks for the info on the link issue. I never could understand why Yahoo showed so many more links than Google for a given site.
Hi People Finder,
It’s surprising sometimes how many more links Yahoo indicates than Google. It’s probably worth digging into to try to understand why Yahoo’s totals tend to be so much higher.
Google should never “officially” offer such a Trending-Topic-Service. Once begun, it would make journalism completely unobjective…
Google, Facebook, Yahoo, Twitter, Bing, Aol, Amazon, etc., are extremely good at utilizing massively scalable technologies to get us to create content for them without cash compensation to us. Where does it end? When do people take the red pill?
Bill, David: yes, Twitter *could* be a better source than Google, but there’s one small problem. Twitter to the best of my knowledge does *not* disclose anything about what people *search* for to anyone, possibly not even partners. If they do, it’s certainly a very closely-held set of information. Contrast that with Google or Bing, where Google and Microsoft give away the analytics tools to help webmasters succeed and punish the spammers every way they can.
Don’t get me wrong – Twitter and my personal blog will remain my primary web presence for the foreseeable future, and I’m sure *eventually* Twitter developers like myself who’ve chosen to stay the course and focus on the platform will be rewarded. But right now, I can only tell what people are selling on Twitter, not what they are seeking in Twitter Search.
My advice is to invest in the technology – at some point, Twitter search optimization will be possible and as necessary as Google search optimization is today.
I think a lot of people take a lot of Google’s actions the wrong way. Google’s objective first and foremost is to create a better search, and in turn a better web. If Google cannot find the content that people are searching for, then why not suggest its creation? After all, that’s what we use search engines for: to find information.
I must agree with Anthony. Google is a company for profit and must provide their customers with better search experiences. It has to reinvent itself and improve itself all the time.
Already happened, didn’t it? Didn’t Youtube bemoan the lack of Spanish video content, leading to Demand Media jumping to order?
From “When YouTube’s sales team bemoaned the tiny supply of Spanish-language videos for it to run advertisements against, YouTube’s Hoffner called up Demand. Within weeks, Demand Studios started issuing Spanish-language assignments.”
Being the big monopoly that Google is, constant changes are expected for the provision of better services to all users. And with those, Google could earn more and still be the most widely used around the globe. Google has never failed to create new things and tools that are really useful.
I’d argue the other side of the coin to Mark. I don’t see a problem with a search engine saying that what it has found so far can’t be considered sufficient.
Wikipedia does it with, “This article is a stub”. Google already pretty much controls how information is moved from point A to point B and who should be getting the biggest say. I expect that this might come over time that someone searching for information on specific topic will be told that they should consider adding what they already know. Quality… well, that will forever be a gripe of legit webmasters and searchers alike.
“There are times when we perform searches at a search engine and may be completely unsatisfied with the pages that we receive.”
The problem is that this model – using search data to drive content production – is already in place. The likes of Mahalo and eHow do it now. But the problem is that they don’t necessarily produce great content that improves the results. This game is a race to the bottom – who can produce the cheapest content the quickest that seems to match the query.
I totally agree with what Anthony said, Google’s main aim is to provide information, so it’s just but normal for them to find ways to provide quality, updated and reliable information to searchers around the world. The more useful information provided, the better.
While sometimes the phrase “less is more” is true, I don’t think it can be applied to search engines. Google’s algorithm is designed to match the best results to your query, and it is very good at it. Having more information to base those results on can only better the quality of results. You have to remember that a search engine does not understand English; it can only really come to any conclusion by comparing datasets, usually using a majority to come to that conclusion. Having more data can only improve results! As I said earlier, Google’s primary objective is to create a better search/web, so I’m actually surprised it has not done this earlier. Yes, Google is a business, but to put this move down to monetary gain alone is just naive and, quite frankly, short-sighted!
Bill, I apologize, but this comment is slightly off topic, but I had to respond to Robert Enriquez…
@Robert Enriquez…It is funny that you mentioned the following…
“Google stopped being a search engine a long time ago.
They now push PPC, Product Results, Local Business listings, and now Brand Links
The average Joe clicks ABOVE the fold of organic listings…”
Just today I was noticing Google’s new layout on their results page that they enabled a month or two ago. The paid results used to be on a slightly different colored background. It still is, but now it looks like that background is so light, I have to lean down to pick up on it and I think that organic result # 1 is now paid link # 3. Is this the beginning of the end???
Interestingly, I’ve seen a few tweets from some journalists about this post that seem to feel somewhat positive about Google possibly doing this. I’d love it if someone with a journalism background would comment further or blog on their thoughts about Google sharing information about inadequate topics.
The web does seem to be a self perpetuating content generating machine, at least with businesses that provide tools for people to interact and create.
I see some benefits to topic and query suggestions that, while they may help Google provide better information to searchers, can also benefit the publishers who create that content. I mentioned an example of ecommerce sites using this tool to identify possible products that they could sell in my post.
Being able to identify gaps in search coverage may also be helpful in identifying gaps in markets and business models.
In response to your second post, Twitter is definitely worth watching in the future, but it doesn’t presently provide information about searches the way that Google and the other search engines do. It can help us identify “trending” topics in different locations, but not the scale of those topics or the demand by others to learn more about them.
Thank you. In response to both of your comments, I agree with you that this kind of topic suggestion appears to focus upon making the search engine better, while leaving it up to others to decide what content they might create, and how they may create it. At this point, on paper, there isn’t anything to suggest that Google is doing anything to influence the creation of watered-down content that will lead people to Google owned sites to find that content.
Thank you, Mike.
I agree with you. I do think that Google is in a position where it has to constantly show that it is innovating. I think that’s both one of its strengths, and one of its burdens.
I hadn’t heard about the interaction between YouTube and Demand Media. Thanks.
I have seen a few tweets from people suggesting that they would rather see Google provide information about inadequate queries and topics to the public than to see those informational needs addressed by companies like Demand Media.
Good point about stub pages on wikipedia. The patent filing suggests that one way topic notifications on inadequate topics and queries could be used is in the creation of stub pages like wikipedia. If those topic notifications, or an inadequate topic/query search engine is created, hopefully they will be available to all so that web publishers are free to compete with each other on building quality content about those topics.
Is this method going to be something that inspires a race to the bottom, or a quality control approach that identifies either a lack of pages on a topic, or a lack of quality and important pages, and shares that information equally with all publishers? If Google moves forward with this, I guess we will see.
There are sites that spit out pages on a wide range of topics regardless of whether or not Google suggests topics. Google is still going to rank any pages created differently based upon relevance and quality.
I think we have come to expect Google to introduce new things, and make changes, and would start wondering if they stopped.
I think we’re both feeling pretty positive about Google suggesting topics like this.
Having said that, I do welcome dissenting opinions – for instance, if you think that topic suggestions from a search engine could lead to more content being created on Google-based sites like Knol or YouTube or Blogspot, or on pages running AdWords/AdSense, please leave your thoughts below…
“Google stopped being a search engine a long time ago”
That’s plainly inaccurate! Almost everything Google does is based around its search. Yes, I agree Google is moving into other areas with Gmail and Google Docs and Google Wave, but you will notice that everything has Google search integrated into it. They are simply bringing their search engine into other areas of the web/desktop, and yes, they may also integrate AdSense into these applications, but that’s just good business sense. If I’m looking to buy something online and I use Google search, so what if the result I click on is a paid-for result, as long as it’s accurate to my search term?
Look at how many fantastic services and features Google offers the user for free. If using these services means I see a few AdSense ads now and again, so be it.
It does seem like it is getting harder to tell the “sponsored” results from the organic results. I don’t think that is a good movement.
One of Google’s strengths is search, so their entry into software for TV set top boxes, their navigation system for motor vehicles, and their phone software all feature search prominently. I don’t mind seeing ads, but I have to agree with Mark that I want to be able to tell the difference between sponsored results and non-sponsored results.
Google highlights the sponsored results very well against the organic results. If you compare it to the advertising that existed before Google, it is really genius, even revolutionary. But I think that in the future Google will manage to melt them together.
The FTC has sent some warnings out to the search engines in the past about making it clearer that some results are paid for, or “sponsored,” and others aren’t. I think it might be getting a little harder to tell whether or not the sponsored links at the tops of search results are paid for, and I’m not sure that’s a good direction to take.
I can’t possibly think of many areas on the web that have “inadequate content,” but I feel it’s a great idea by Google in their quest to dominate the world! Heh. It’s a good move in terms of filling in the gaps to make them the biggest encyclopedia of knowledge on the planet. It has been, and probably always will be, scary to learn just how much user data is being collected by Google for planning new features, tools, and apps.
I disagree with the statements that Google stopped being a search engine a long time ago, though. Ninety percent of what they do is based around search and features some instance of a ranking algorithm; this is what they are good at, and the likelihood is that it always will be the case.
With billions of queries conducted a month at Google, chances are that a good percentage of those will show sets of result pages that either don’t provide much in the way of relevant results or where there aren’t enough relevant results to meet the demand for them.
Google could use the data they collect about searches and the quality of results to build a system like Demand Media, where they hire people to write pages, create videos, and publish pages where they found inadequate content, but the patent seems to indicate that they would make this information about inadequate content available to the public, to use in any way that they might want.
If the information about inadequate content is made available to the public along with search volume data, I would imagine it would not take long before this is capitalised on by link builders and SEOs as “easy rankings.”
Regardless of whether or not Google provided this type of information, an SEO conducting keyword research should be able to get a sense of how adequate search results might be for specific queries by looking at the pages that appear as results for a query and at some additional data associated with those pages.
Making information available to everyone might make it easier for SEOs to identify queries with inadequate results, but it also would make it easier for other people considering publishing information on the web to identify those queries and topics as well.
Interesting I would not have thought of that to find topics for articles. Good idea.
I agree with the above posters that Google has become focused too much on the advertisers. Just like I don’t like stumbling upon an MFA site, I don’t want Google to direct me to someone that “bought” their way to predominance on Google.
I hope they don’t lose sight of the reason people use Google.
If Google does use this, and uses it in search result suggestions, then a lot will change in the SEO industry.
We shall find good-quality (high search volume, low competition) niches and keywords more easily.
Google is great if they provide unpopular results as well as popular results to us.
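The “high search volume, low competition” idea in the comment above can be turned into a very rough scoring sketch. To be clear, everything below is invented for illustration – the query names, the numbers, and the scoring formula are my own assumptions, not anything taken from the patent or from Google.

```python
# Toy sketch: rank hypothetical queries by "content opportunity" --
# large search demand relative to the number of adequate results.
# All names and numbers below are invented for illustration.

def opportunity_score(monthly_searches, adequate_results):
    """Higher when demand is large and adequate supply is small."""
    return monthly_searches / (1 + adequate_results)

# query -> (hypothetical monthly searches, hypothetical adequate results)
queries = {
    "blue widget repair guide": (12000, 3),
    "widget": (500000, 9500),
    "antique widget appraisal": (800, 0),
}

ranked = sorted(queries, key=lambda q: opportunity_score(*queries[q]), reverse=True)
for q in ranked:
    print(q, round(opportunity_score(*queries[q]), 1))
```

With this made-up data, the niche repair guide outranks the broad head term, which is the intuition behind chasing underserved queries.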
The patent is pretty neutral when it comes to how people might use information about where and when Google might identify inadequate search results. It does mention that advertisers might be interested, but it also presents a fairly wide range of other web publishers who might find this useful as well.
I don’t think that its purpose is to help people create more made for adsense sites, but rather to make web publishers more aware of opportunities to create better results for the topics and queries identified.
I believe that this will be helpful to site owners who are interested in providing better information to searchers. We’re going to have to wait to see if Google goes ahead with this, but it would be interesting.
Google is not always to blame for bad results, and i think they are headed in the right direction in understanding the content and improving the results.
I agree – sometimes there just aren’t pages, or search engine friendly pages, that are relevant for some queries. I think a process like the one described in the patent would make their results stronger by enabling web publishers to address those terms and topics that are unrepresented or underrepresented in search results.
“I think that’s a very valid concern. The search engine goes from being an indexer and information gathering tool to becoming more of a direct influencer.”
Sounds like a valid concern in politics – “hey @Politico – there aren’t enough positive pieces about ABC, nor enough slamming criticisms of XYZ [and we’d love to rank you higher and send you traffic if you wrote those pieces about them]”
Politics, and perhaps a lot of other fields. Very good point – will tomorrow’s campaign managers be tracking inadequate content trends? Will that be a new feature in the evolution of search and SEO? Maybe.
If Google is not an AI, (Artificial Intelligence), I don’t know what it is.
It already suggests alternate directions in search with the Wonder Wheel.
It uses LSI (Latent Semantic Indexing) in its quest for relevance.
It “speaks” many languages, can translate them all for you, corrects your spelling, and even offers to search for the misspelled terms.
It has ethics, “Do no evil”, and offers the best advice it can, freely.
It is a global citizen with residences in most geographical areas.
And finally, it makes relevance judgments the same way a human would. By the interpretation of the visual display.
I agree that quality is in the eye of the beholder and that Google has no choice but to pursue the most relevant.
“Quality content” is a myth the same as “Ideal keyword density”, and IBL, (In Bound Link), influence.
Thanks. You raise a lot of interesting topics in your comment.
Google’s head of research, Peter Norvig, was asked (video) by readers of Reddit how close Google is to using AI (something he knows more than a little about), and he stated that they still have a good distance to go on that front. But they definitely do some surprising things.
There are a lot of similarities between what Google is doing with the wonder wheel, and what Quintura is doing with its visual search.
I don’t believe that Google is using LSI, but it’s quite possible that they’ve incorporated some PLSI into what they are doing – there are more than just a couple of Google patents that describe how they might be using it for both organic and paid search.
Google’s approach to statistical machine translation, and its use of that approach to doing things like finding synonyms is still in its infancy, but I think it’s impressive.
I do like a lot of Google’s policies and approaches to business and business ethics.
Representatives from Google do make a lot of use of the phrase “quality content” when describing how to create pages that will do well in search results. There is a lot more to it than just creating great content, but I don’t think it hurts. I’m not willing to call it a myth so much as to say that it’s only part of the picture. Great content still goes unseen if the page it is on is unspiderable, if it has no links pointing to it, or if it doesn’t use the language that the people who are interested in it might use when searching for it.
Google has such massive resources that they can “brute-force” natural language processing, just as Deep Blue brute-forced grandmaster chess. The fact that Google still isn’t “intelligent” says only that the Turing Test is harder to pass than grandmaster chess is. I’m *really* looking forward to the IBM computer trying to win at “Jeopardy!” 😉
Probabilistic Latent Semantic Indexing could be used, but while retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods, as well as over LSI, it is not the method of preference.
I would think that it is less than likely, as direct term matching is not one of the major factors in PLSI.
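To make the “direct term matching” baseline in this exchange concrete, here is a minimal sketch. The function and example strings are my own illustration, not code from any search engine: direct matching only counts literal term overlap, which is exactly the weakness (synonyms score zero) that latent methods like LSI and PLSI aim to address.

```python
# Minimal sketch of direct term matching: the score is simply how many
# query terms literally appear in the document. Synonyms contribute
# nothing, which is the gap latent semantic methods try to close.

def term_match_score(query, doc):
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    return len(query_terms & doc_terms)

doc = "automobile engine maintenance tips"
print(term_match_score("engine tips", doc))  # 2 -- both terms match literally
print(term_match_score("car repair", doc))   # 0 -- the synonym "automobile" is missed
```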
Don’t get me wrong.
I really like what Google is doing and have been following them closely since they first came online in September 1997. I had an interest in search from the time I got online in ’94.
They are making our jobs sooooo very interesting.
What you’re saying sounds like the conclusion to the paper The Unreasonable Effectiveness of Data from Alon Halevy, Peter Norvig, and Fernando Pereira, which ends with this line:
I hope to someday see those Deep Blue Jeopardy challenges…
I know LSI has some substantial limitations as well.
One of the things that I was thinking about when referring to Google and PLSI was a Google paper on personalized news:
Google News Personalization: Scalable Online Collaborative Filtering
Google definitely does make what we do interesting.
IMHO, Google News Personalization: Scalable Online Collaborative Filtering is unnecessary.
When one signs up for Google’s News page, one subscribes to a bunch of sections: Silicon Alley Insider, Web Analytics at Yahoo Groups, Bad Science, etc.
The content of these sections is determined by the selections and publishing schedule set by each group’s manager.
No filtering is necessary, each section only receives appropriate material.
Some of my interest has been in an 11-year-old article, http://www.cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf
I must admit I go more than slightly cross-eyed when I read it.
On another front, there’s a company called Recorded Future that says it can use information scoured from tens of thousands of websites, blogs, and Twitter accounts to predict the future. And before you laugh, it’s got some heavyweight backers, including Google and the CIA.
Recorded Future uses the term “temporal analytics” to describe what it does. It extracts information including entities, events, and the time that these events occur from the thousands of news publications, blogs, niche sources, trade publications, government web sites, financial databases, and more that the company continually scans. Using this information, the company says it is able to find the relationships between people, organizations, actions, and incidents, not only in the past and present, but also in the future.
Not only is big brother watching, it is predicting.
The changes to Google News are fairly recent ones, but the concept of collaborative filtering is something that may play a role in other places within Google’s algorithms.
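Collaborative filtering of the sort used in that Google News paper can be illustrated with a deliberately simplified sketch. The actual paper relies on scalable techniques such as MinHash clustering and PLSI; the tiny click data and the memory-based user-similarity approach below are my own assumptions, chosen only to show the basic idea.

```python
from math import sqrt

# Invented click data: user -> {story: 1 if the user clicked it}
clicks = {
    "alice": {"tech1": 1, "tech2": 1, "sports1": 1},
    "bob":   {"tech1": 1, "tech2": 1},
    "carol": {"sports1": 1, "sports2": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse click vectors."""
    shared = set(u) & set(v)
    num = sum(u[k] * v[k] for k in shared)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def recommend(user):
    """Score unseen stories by the similarity-weighted clicks of other users."""
    scores = {}
    for other, their_clicks in clicks.items():
        if other == user:
            continue
        sim = cosine(clicks[user], their_clicks)
        if sim == 0.0:
            continue
        for story in their_clicks:
            if story not in clicks[user]:
                scores[story] = scores.get(story, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("bob"))  # ['sports1'] -- alice's clicks look like bob's, and she clicked it
```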
The Hofmann paper on PLSI is very good, but I’ll confess that it makes me slightly cross-eyed as well. 🙂
I had read something recently about Recorded Future, and it reminded me of a patent filed by Yahoo a few years ago, which I wrote about in The Oracle at Yahoo: Using Yahoo News to Search the Future.
I would expect that the processes involved would be different in a number of ways, but as a decision making tool, it might be useful – and frightening as well.
pLSI got superseded by probabilistic topic models / Latent Dirichlet Allocation (LDA) somewhere along the way.
That’s what I think I’m seeing from the search engine patents as well.
This article and the relevant patent that Google filed really scare me, and I don’t know who I can talk to about it. It cuts very close to an idea I thought about years ago related to Google’s search products and search results. I am afraid that my idea is worth a complete fortune to Google or to one of their search engine competitors, in which case I’ve been sitting on it for years because I’m an idiot and don’t know how to sell my idea.
Or, my other fear is that the idea might be considered an unlawful business practice. The reason this option is so scary is that if it is considered an unfair business practice, there would be no way for the public to know whether Google was already using my proprietary idea behind closed doors. I don’t know who I can talk to about this without someone stealing my idea. Can anyone familiar with business law point me in the right direction as to who I should discuss this matter with?
I think your best option is probably to talk with an attorney who has a fair amount of expertise in intellectual property and patent law. Patents are a complex enough field that attorneys who specialize in them are allowed to advertise that specialization.
Will Google only notify some major sites about the search queries that require more content, or is it going to be available for everyone?
It would be better if Google placed a small message on popular search queries with too little content, requesting that users create more content about them. I think everyone should get the chance to create content, not just a few websites.
The patent provides a number of possible alternatives, but it doesn’t say that it would limit the availability of this kind of information to only a select few sites or individuals. I like the approach of including a message like that on search queries as well, but I wouldn’t be against being able to search for unserved or underserved query terms as well.
This seems like Google is trying to solve a problem that isn’t really a problem, because in this day and age, if a website or page doesn’t exist for a search, it could only be because it isn’t needed.
@Craig, it’s not just a case of the content not existing, but also that there would seem to be insufficient or possibly inadequate information on a subject.
Personally, I think that to believe Google has all the useful information (or that all useful information is already known and documented by mankind at all) is a rather ignorant standpoint. What about learning new information, finding new elements, making new discoveries, new… new… new? By your reasoning, have we reached total enlightenment?
Solve a problem? I wouldn’t say so, but rather that it would help speed up the identification, recording, and return of information.
But just my take on it 🙂
Fair enough, I never really thought of it like that. Thanks for helping me understand it better.
Hi Craig and Robert,
Sometimes when you search for something, and you look through the results, you may not find any that are a good fit. That’s not necessarily because the search engines aren’t doing a good job, but rather that there really just isn’t anything on the Web (that’s crawlable and indexable) that is a good result for your query.
Being able to identify queries that people do actually search for, but don’t find adequate results for, is an opportunity for people who are interested in writing about those topics, or in selling goods or services related to them, but who might not know that there’s a specific informational demand for results based upon those queries. I’d definitely like to see Google make this kind of information available to the public at large.
I totally agree with what Anthony said. Google’s main aim is to provide information, so it’s only normal for them to find ways to provide quality, updated, and reliable information to searchers around the world.
@Tommy: Where is the logic in the provision of “unpopular” results?
There are plenty of factors that allow Google to determine the popularity of any given web page, and even when accounting for the naturally bigger exposure the larger brands will receive, it is still possible to develop an algorithm that ranks pages accordingly.
If a page is unpopular, it is so for a reason. It doesn’t fall within Google’s best interests, nor anyone else’s for that matter, to place any real weight on this content. Unless I misunderstood your point?
I agree with you that sometimes there just aren’t very good results for some queries to be found on the Web.
They do seem to be finding new ways on a regular basis to try to improve quality of the results that they show.
Good question. I think sometimes some search results are ones that might present a biased or unpopular view, but still rank highly based upon both relevance and link popularity signals. It’s possible that Tommy might have been pointing that out.