I’ve been faced with a pretty difficult decision: choosing the last of the patents, or patent families, to include in this series of posts about the search-related patents most important to people who promote sites on the Web. I find I just can’t choose one.
Synonyms
For the last few weeks, I’ve been arguing with myself over a choice of at least two sets of patents. One patent that I wanted to include describes responding to informational needs by going beyond keyword matching, expanding the query terms used to find search results to include synonyms and pages on related concepts. There are several related patents granted to Google that describe how the search engine might identify synonyms, and it’s worth spending some time with all of them.
- Search queries improved based on query semantic information
- Identifying a synonym with N-gram agreement for a query phrase
- Determining query term synonyms within query context
- Identifying common co-occurring elements in lists
- Longest-common-subsequence detection for common synonyms
- Document-based synonym generation
- Machine Translation for Query Expansion
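To make synonym-based query expansion a little more concrete, here is a minimal sketch in Python. It assumes a small hand-built synonym table (the SYNONYMS entries and the expand_query name are hypothetical, not taken from any of the patents) and only loosely mirrors the "within query context" idea: a synonym is added only when another word in the query supports it. The patents describe mining synonyms from query logs, documents, lists, and translations, none of which is modeled here.

```python
# Toy synonym table (hypothetical entries, not data from any search engine).
# Each candidate is (synonym, context): if context is not None, the synonym
# is only used when that context word also appears in the query.
SYNONYMS = {
    "car": [("automobile", None)],
    "cheap": [("inexpensive", None)],
    "gm": [("general motors", "cars"), ("genetically modified", "food")],
}


def expand_query(query: str) -> list[list[str]]:
    """Return one OR-group per query term: the original term followed by
    any synonyms whose context condition is met by the rest of the query."""
    terms = query.lower().split()
    expanded = []
    for term in terms:
        group = [term]
        for synonym, context in SYNONYMS.get(term, []):
            if context is None or context in terms:
                group.append(synonym)
        expanded.append(group)
    return expanded


if __name__ == "__main__":
    print(expand_query("gm cars"))  # [['gm', 'general motors'], ['cars']]
    print(expand_query("gm food"))  # [['gm', 'genetically modified'], ['food']]
```

Each inner list can be read as an OR-group, so a page matching "general motors" could be treated as relevant to "gm cars" without also matching the unrelated "genetically modified" sense.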
Large Data Sets
Another patent, or in reality a group of patents, that I kept coming back to is a set that focuses upon “large data sets.” Yes, they use the phrase multiple times in each patent, as well as in the titles of those patents. And when they write “large,” they mean really, really big.
A couple of years back, a search engineer from one of the top commercial search engines shared with me the thought that the search engines collect so much data about how people search, use search engines, and browse the Web that their difficulty wasn’t so much gathering the data, but rather figuring out how to use it.
If you’ve purchased books at Amazon.com, you’ve likely experienced their recommendation engine which shares with you books that people who viewed or purchased some of the same books as you might have viewed or purchased.
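The “people who bought this also bought” idea can be illustrated with a simple item-to-item co-occurrence count. This is a minimal sketch, not Amazon’s actual method: the purchase histories and the build_cooccurrence and recommend names are hypothetical, and real recommendation engines normalize and filter these counts in ways not shown here.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of items shows up in the same user's history."""
    counts = {}
    for items in histories:
        for a, b in combinations(sorted(set(items)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(counts, owned, k=3):
    """Score items the user doesn't already have by how often they co-occur
    with items the user does have, and return the top k."""
    owned = set(owned)
    scores = Counter()
    for item in owned:
        for other, n in counts.get(item, {}).items():
            if other not in owned:
                scores[other] += n
    return [item for item, _ in scores.most_common(k)]


if __name__ == "__main__":
    histories = [["A", "B"], ["A", "B"], ["A", "C"], ["B", "D"]]
    counts = build_cooccurrence(histories)
    print(recommend(counts, ["A"]))  # ['B', 'C']
```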
Imagine a search engine building a model that might look at a combination of data from and about users, queries they used, and documents they might or might not have selected. Each of these combinations is referred to as an “instance.” An instance is a “triple” of data: (u, q, d), where u is user information, q is query data from the user, and d is document information relating to pages returned from the query data.
This prediction model would rank pages based upon the likelihood that a particular page, or another kind of document, would be selected by a particular searcher at a particular time and day from a certain location.
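To make the idea of an instance concrete, here is a toy sketch in Python. It treats each (u, q, d) triple as a record with a selected/not-selected label, estimates a smoothed selection rate for each combination of user location, query, and document, and ranks documents by that estimate. The field names, the smoothing constants alpha and beta, the reduction of u to a location string, and the use of simple counting are all assumptions for illustration; the time and day signals are omitted, and the patents describe models trained over enormous numbers of instances with far richer features than this lookup table.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    """One (u, q, d) triple plus a label saying whether d was selected.
    Reducing u to a location string is a simplifying assumption."""
    user_location: str  # u: user information (here, just a location)
    query: str          # q: the query the user issued
    doc: str            # d: a document returned for that query
    selected: bool      # whether the user selected the document


def train(instances, alpha=1.0, beta=2.0):
    """Build a scoring function giving a smoothed selection rate per (u, q, d).

    The rate is (selections + alpha) / (impressions + beta), so unseen
    combinations fall back to alpha / beta instead of zero."""
    shown, chosen = {}, {}
    for inst in instances:
        key = (inst.user_location, inst.query, inst.doc)
        shown[key] = shown.get(key, 0) + 1
        chosen[key] = chosen.get(key, 0) + int(inst.selected)

    def predict(user_location, query, doc):
        key = (user_location, query, doc)
        return (chosen.get(key, 0) + alpha) / (shown.get(key, 0) + beta)

    return predict


def rank(predict, user_location, query, docs):
    """Order candidate documents by predicted likelihood of selection."""
    return sorted(docs, key=lambda d: predict(user_location, query, d), reverse=True)


if __name__ == "__main__":
    log = [
        Instance("Boston", "pizza", "a.example", True),
        Instance("Boston", "pizza", "a.example", True),
        Instance("Boston", "pizza", "b.example", False),
        Instance("Boston", "pizza", "b.example", True),
        Instance("Seattle", "pizza", "b.example", True),
        Instance("Seattle", "pizza", "b.example", True),
        Instance("Seattle", "pizza", "a.example", False),
    ]
    predict = train(log)
    print(rank(predict, "Boston", "pizza", ["a.example", "b.example"]))   # ['a.example', 'b.example']
    print(rank(predict, "Seattle", "pizza", ["a.example", "b.example"]))  # ['b.example', 'a.example']
```

The same query can produce different orderings for different users, which is the point of building the model on (u, q, d) triples rather than on queries and documents alone.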
I did cover a couple of those patents in a fair amount of detail in my post, Google and Large Scale Data Models Like Panda. Rather than going over those again, I’d recommend visiting that post.
Why I look at Patents
So instead of recommending one last patent or set of patents, I’d rather use this last post to point out the reasons why I spend a lot of time looking at search-related patents:
1. Search-related patents provide insights into many assumptions that search engines and search engineers hold about search, searchers, and the Web.
2. They sometimes predict and provide a preview of things that the search engines might launch.
3. They sometimes give us previously unknown details about things that the search engines have been doing.
4. Search-related patents describe some of the research conducted by people at search engines, even if the methods and processes behind some of them may not have been implemented.
5. We get a glimpse of search engines as businesses: their desire to improve the quality of the services they provide, their methods for measuring and testing what they do, and the different directions in which they might take their offerings.
6. Search patents offer the possibility of raising questions worth asking and experimenting with about how search works or might work.
I spend a few hours every week looking for the holy grail of search, the patent that explains the latest and greatest shifts and changes to the algorithms that power how Google and Yahoo and Bing work. Most patents don’t rise to that level, but instead offer tantalizing hints of a jagged bigger picture, like picture puzzle pieces that don’t necessarily always fit together.
The search engines don’t patent every idea that they have, and some patents may even mislead by their very existence, pointing down paths that the search engines may never follow. I’m often left with more questions than answers when reading through a patent, or at least some of the best of them. Those are the patents that force me to ask myself things like:
- What would it mean if the search engines did this?
- How could I tell that the search engines aren’t doing this?
- What might the search engines be doing instead?
- How might people attempt to abuse this method?
- Does the technology exist to do this yet?
You don’t have to read through lots and lots of patents to do SEO. But I find it helps me.
I hope you have enjoyed this series. Thanks for reading.
All parts of the 10 Most Important SEO Patents series:
Part 1 – The Original PageRank Patent Application
Part 2 – The Original Historical Data Patent Filing and its Children
Part 3 – Classifying Web Blocks with Linguistic Features
Part 4 – PageRank Meets the Reasonable Surfer
Part 5 – Phrase Based Indexing
Part 6 – Named Entity Detection in Queries
Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables
Part 8 – Assigning Geographic Relevance to Web Pages
Part 9 – From Ten Blue Links to Blended and Universal Search
Part 10 – Just the Beginning
Great post. I like this series a lot; even though I’m a small business owner, this helps me understand Google a bit more.
Bill,
You continue to out-do yourself. Thanks, yet again, for your hard work.
Thank you for taking the time to put this information together. This really helps small business owners understand Google’s thinking and where they might be headed and allows us to stay one step ahead.
And here I was just listening to your Terry Van Horne interview on SEO Dojo when my RSS aggregator pinged me to let me know you had a new post.
Thanks Bill! If I ever meet you I will surely buy you a drink for all your unique efforts in the SEO space.
Why you look at SEO patents #7 – so we don’t have to!
Fascinating going through Larry’s original PageRank patent…
All of your posts are quite insightful, different and ahead of the game when it comes to SEO. Although, I wouldn’t even call it SEO. It’s like watching a researcher examine the business of search engines and computing analytics behind “the algorithm”. Thanks for sharing this…because it sheds light on what Google is planning to do for their revolutionary big change this year.
If nothing else, Patents may bring value to someone else, thus increasing the company’s worth. Looking forward to reading through the series.
Thanks Bill,
Your “10 Most Important SEO Patents series” was rocking. All posted patents explore many unexpected & unbelievable things.
This is such a rock-star series. Well done!
Thanks for this series, it is an absolute goldmine.
Great series Bill. I like checking out patent applications because usually, buried 2/3 of the way through, is where the patent attorney must have explained to the engineers “by law you have to disclose the ‘best mode’ of implementation you’re aware of”. Then they sort of try to bury the best nugget in a half-sentence in the middle of some boring stuff so it’s not too obvious – it’s exciting to find that one tiny nugget, if you can stand reading through the really boring parts.
It’s amazing to me that Press Releases go through all kinds of review processes in companies, but most companies only bother to have the inventors and a patent attorney review patent applications in full – sometimes really neat insights into key technology can make their way out into the wild this way.
Tons of great insights from this series. I especially enjoyed your analysis of large scale data sets such as Panda.
Thanks for staying on top of all of the patents and passing on your findings to the rest of us!
Just read the whole series of posts. Man, it’s good content – thanks for sharing.
Love this series. It’s nice to read the actual patent language to confirm some of the things we have known or suspected. I always like to “read between the lines,” so to speak. Oftentimes you can learn as much from “how” things are said as from “what” is said.
Thanks for your hard work.
Hi Estes,
Thank you. Part of the fun for me of putting this series together was the struggle to decide what to include and what not to include. I’ve found that there are probably some other patents that I should have paid more attention to in the past that I’m going to have a chance to cover in future blog posts. Hopefully those will be just as helpful.
Hi Camilla,
Thank you. I’m happy to hear that you’re enjoying this series as a small business owner, rather than as a full-time SEO or internet marketer. Sometimes making search-related patents accessible to an audience who doesn’t live and breathe search and SEO can be challenging.
Hi Gyi,
Thank you for the frequent comments and encouragement that you often provide.
Hi Brent,
Thanks. I don’t get enough of a chance to have in-depth conversations with other SEOs, so the chat with Dave and Terry was a lot of fun. It would be a pleasure to get a chance to meet you in person.
Hi Tom,
You can take my posts that way, but I do actually love it when people dig into the patents that I write about and possibly come up with things that I might not have included in a post.
I really liked the more human side of the original PageRank patent, too.
Hi Susan,
Thank you. I guess there is an element of business analysis in what I try to do here when I write about patents or whitepapers from the search engines.
I think in many ways that’s something that we might be able to learn as much or more from than the specific methods or processes that a patent or paper might describe, since it can give us some possible ideas about things that we might see in the future.
Hi Thomas
Yes, patents are a way of taking intellectual property, preserving and protecting it, and making it tangible in some way as an asset of a company. Thanks.
Hi Rajesh,
Thank you very much. I find myself surprised on a regular basis at many of the things I come across in these patents.
Hi Yousaf,
Thank you.
Hi Juanne,
Thank you for the compliment. Not really going for the “rock star” thing, but I appreciate the thought.
Hi Ted,
Thanks.
One of my favorite patent filings is one in which Google described how they might use Visual Gap Segmentation to improve how they might take a page filled with reviews for different places, and isolate them so that they could be used for each of those places. And hidden in a little paragraph near the end was a statement that Google might use the segmentation process to distinguish headings from footers from sidebars from main content areas, and do things like give more weight to links and content found in certain areas over links and content in other areas.
I’ve wondered how much difference there is in the review process that patents do go through at different companies. And if you look at patents long enough, you can see certain patterns in how they might have been constructed.
For example, some of the patents that come from some of the older research scientists at Yahoo read like they were first written as research papers, and then processed through a search engineer/patent attorney team. Google’s historical data patent reads a little like it was copied off a whiteboard during a mind mapping session between a large group of search engineers brainstorming on ways that they might use time based data to limit search results filled with spam and stale data.
The processes in place at many companies do seem to limit some review of what should and shouldn’t be included in a patent, and I think that kind of extended review process could potentially harm the likelihood that a patent application might be granted. I do love finding the kinds of nuggets of information that you mention.
Hi Mark,
Thank you. The large scale data patents are fascinating, and I find myself wondering how much technology had to be put into place to be able to implement the ideas within those. It’s pretty clear that Google spent a lot of time and effort in working on things like the Google File System and MapReduce and other technologies to even have a chance to manage data on that large a scale, and make something useful of it.
Sharing means creating the opportunity to discuss the kinds of things I sometimes uncover, and grow from those discussions, so thank you back.
Hi Nikolaj
Thank you very much. Happy to hear that you enjoyed the series.
Hi Chris,
Thank you. I find myself getting excited when I come across patents that describe things I’ve observed search engines doing but didn’t know much about from the search engineers’ perspectives, such as their reasons for taking certain approaches, their names for some of the features we see at the search engines, or the processes that they might be following.
And it is really interesting seeing how those ideas might be presented, and trying to understand why they are presented the way that they are. Are they purposefully limiting some of their discussions about certain algorithms to protect those, or to limit privacy concerns, or because there are parts of systems that haven’t been built yet? That does make things pretty interesting.
I love looking at patents from some of the search engine companies; they give good insight into what might become the future of SEO.
Hi Bill, thanks for everything you do for the SEO community. Actually, that was pretty useful and, well, honestly, I’ve never cared about patents in my life, but after reading that I will surely pay more attention to them.
I am always looking for new or different SEO tactics, and I greatly appreciate your knowledge and even going one step further and listing different patents. We joke in the office, but I am sure you can understand. Every time Google has an algorithm release, they put out the release and we then work on figuring out the exact changes. Just when you get a handle on them, or so we think, another release comes out. This blog we wrote is a little outdated, but it is still relevant. http://bit.ly/HRJnhd
Hi Andreas
Same here. We sometimes see whitepapers from the search engines as well that they might present at different computing conferences, but a lot of the papers that they do write are for internal use only. Would love to see some of those.
Hi Danilo,
Really appreciate your kind words. Patents can be tough to slog through sometimes, but every so often you run into one that makes you think about a lot of the things you do very differently. It’s worth all the work when you do.
Hi Justin,
Nice. Your post is like a history lesson of the changes that Google has gone through over time. Thanks for sharing it.
Google also doesn’t always tell us when they are trying something new, and sometimes you see something in a set of search results that you haven’t seen before. Constant change is one of the things that makes SEO interesting.
I just spent an hour with an English dictionary reading everything here. You are sharing very valuable information that I couldn’t find in such a great condensed form in my language. I’m halfway through the SEO patent series; it’s going to be a tough night. I really appreciate your work; trying to follow the algorithm isn’t easy.
Thanks
Thank you, Lukas.