Compound Words in Search Advertising and SEO

What’s the difference between icecream and ice cream, or paperclips and paper clips? How about sandpaper and sand paper, or thumbtack and thumb tack? A compound word comes about when two words can be joined together to form a new word.

In my examples, each compound word means the same thing as the two-word phrase it is built from.

When someone searches for icecream at Google, should they see the same results that they would see for ice cream, given that the words mean the same thing? If a page about paperclips shows AdSense advertisements, should the ads be for both paper clips and paperclips? If an AdWords advertiser runs ads alongside search results, should those ads appear for searches on both sandpaper and sand paper?

A few years back, I wrote about a patent filing that explored ways that Google might handle compound and hyphenated words and other spellings of words, in my post SEO and Compound Words, Inflections, and Alternative Spellings. That patent filing gave us a pretty high level overview of how it might treat compounds, but didn’t delve too deeply into the details.

Last week, another patent filing from Google came out on compound words, focusing primarily on how splitting compounds might be used for the advertisements that Google shows on web pages and alongside search results.

While the filing uses examples from Google search advertising, it gives us a better sense of how Google might see icecream and ice cream as pretty much meaning the same thing. Some patent filings are narrow in scope, but near the end of this one, we are told that the way Google handles compounds might apply to more than just ads:

Although the above description refers to a content item such as an advertisement, content items such as video and/or audio files, web pages for particular subjects, news articles, etc. can also be used. Also, the implementations can be used with other compound words such as for example, Finnish, as well as other languages that include compound words.

Furthermore, while the above description refers to online advertisements, the implementation described can also be used with other possible applications such as, for example, machine translation, speech recognition, information retrieval, etc.

The patent application is:

Word Decompounder
Invented by Enrique Alfonseca and Stefan H. Pharies
Assigned to Google
US Patent Application 20090063462
Published March 5, 2009
Filed: September 4, 2007

Some example languages that use compound words include: Afrikaans, Danish, Dutch-Flemish, English, Faroese, Frisian, High German, Gutnish, Icelandic, Low German, Norwegian, Swedish, and Yiddish.

Google’s advertising system might take compound words from a page where an ad could be shown, or from a search query, and split them so that one or more of the words that make up the compound can be used as keywords for choosing relevant ads.

Word Splitting

In exploring whether a word is a compound word, and in trying to find advertising keywords that are appropriate for the compound, Google might split the word apart into strings of letters and add some possibly meaningful parts of words to those strings. It might then see how often the original strings and the extended strings show up together as phrases on the web, in search queries, or in advertising campaigns.

To start, Google might take the word “icecream” and break it up into strings of letters, such as:

i, ic, ice, icec, icecr, icecre, icecrea, icecream, c, ce, cec, cecr, cecre, cecrea, cecream, e, ec, ecr, ecre, ecrea, ecream, c, cr, cre, crea, cream, r, re, rea, ream, e, ea, eam, a, am, m
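A list like this can be produced mechanically. The following is a minimal Python sketch of one way to enumerate every contiguous substring of a word; the function name is my own, and the patent filing does not say how Google actually generates its candidate strings.

```python
def substrings(word):
    """Return every contiguous substring of `word`, ordered by start position."""
    return [
        word[start:end]
        for start in range(len(word))
        for end in range(start + 1, len(word) + 1)
    ]

parts = substrings("icecream")
print(len(parts))  # an 8-letter word has 8 * 9 / 2 = 36 substrings
print(parts[:8])   # the substrings that start at the first letter
```

Repeated letters produce repeated entries (for instance, "c" and "e" each appear twice), which is why the list is not deduplicated here.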

Some commonly used semantically meaningful parts of words, or morphemes, might be added to those strings to expand the possible meanings that those strings might have. For example, in English, one common morpheme that is often added to the ends of words is the letter “s,” which can transform many words from a singular version to a plural, so that the words “ices” and “creams” might also be part of the analysis for the compound word “icecream.”

Using Query Logs to find Scores for Strings

I’ve generalized how scoring might work for different strings in the following section, but I think it provides a helpful overview.

Google might take a look at the query logs used to serve advertisements to searchers, and see how often terms such as ice, ices, cream, creams, ream, and reams show up.

Scores might then be created for each of those strings based upon that frequency.

So, for the strings created for icecream, we might see the following frequencies of appearance in advertising logs:

ice (32)
ices (6)
cream (32)
creams (6)
ream (6)
reams (18)

If the frequency of cream is 32, and the total frequency of all the keywords is 100, then the probability that the substring “cream” appears as a query keyword is 32 percent.
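The arithmetic is simple enough to show in a few lines of Python. This is only a sketch using the made-up frequencies from my example above, not anything taken from Google's logs.

```python
# Example frequencies from the list above (illustrative, not real log data).
counts = {"ice": 32, "ices": 6, "cream": 32, "creams": 6, "ream": 6, "reams": 18}

total = sum(counts.values())  # 100 in this example

# Relative frequency of each string among all the keywords counted.
probabilities = {term: count / total for term, count in counts.items()}

print(probabilities["cream"])  # 0.32, or 32 percent
```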

Now some strings, such as the words “ream” and “reams” have a lot more to do with paper than “icecream.” So the search engine may also pay attention to how often strings show up together, or co-occur. Since “ice” and “cream” tend to co-occur very frequently, that combination might be given a pretty good score.
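To make the idea a little more concrete, here is a rough Python sketch of scoring the possible two-part splits of “icecream.” The co-occurrence number is invented for illustration, and the scoring rule (ignore a split unless both pieces are known terms, then use the co-occurrence count) is my own simplification rather than the method described in the patent filing.

```python
# Frequencies from the example above.
counts = {"ice": 32, "ices": 6, "cream": 32, "creams": 6, "ream": 6, "reams": 18}

# Hypothetical count of how often two terms appear together in a query.
# This number is an assumption made up for this sketch.
cooccurrence = {("ice", "cream"): 30}

def two_part_splits(word):
    """Yield every way to cut `word` into a left piece and a right piece."""
    for i in range(1, len(word)):
        yield word[:i], word[i:]

def score(left, right):
    """Return 0 unless both pieces are known terms, else their co-occurrence count."""
    if left not in counts or right not in counts:
        return 0
    return cooccurrence.get((left, right), 0)

best = max(two_part_splits("icecream"), key=lambda pair: score(*pair))
print(best)  # ('ice', 'cream')
```

Note that “ream” is a known term, but the split “icec” + “ream” scores nothing because “icec” never shows up as a term on its own.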

Anchor Text

The search engine might also look at the anchor text in links that appear on the Web in giving scores to these strings, and their combinations.

For example, if pages about icecream have the anchor text “icecream” and “ice cream” links pointing to those pages, then that may also be considered in a score for those strings. As the patent filing tells us:

Anchor text is the text that appears in a hyperlink on the web. If, for example, two web pages have a hyperlink to the same document, then the texts in those hyperlinks can be related, because the anchor texts usually describes the place where a user is directed if the user clicks on the link. Therefore, if the anchor text of a hyperlink to a web page contains the substring “kontrollfunktion,” and in the anchor text of a hyperlink to the same web page exists “kontrolle funktion,” then a good indication exists that all substrings, e.g., “kontrollfunktion,” and “kontrolle funktion” are the same, written as a compound or separately.

Advertising Keywords

The search engine might also get some clues from advertisers, and the terms that they bid upon for a specific ad. If someone selling icecream on the web using Google AdWords chooses “icecream” and “ice cream” as keywords for the same ad, that also indicates that the compound word and the phrase are related, that “icecream” is probably a compound, and that “ice cream” is the correct way to split it.
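A simple way to picture this signal is to look through the keywords attached to one ad and pair up any single word with a phrase that spells the same thing once the spaces are removed. The Python below is a sketch under that assumption; the function name and the sample keyword list are hypothetical.

```python
def compound_pairs(keywords):
    """Return (compound, phrase) pairs where the phrase, minus spaces, equals the compound."""
    single_words = {k for k in keywords if " " not in k}
    pairs = []
    for phrase in keywords:
        joined = phrase.replace(" ", "")
        if " " in phrase and joined in single_words:
            pairs.append((joined, phrase))
    return pairs

# Hypothetical keywords chosen by one advertiser for a single ad.
ad_keywords = ["icecream", "ice cream", "frozen dessert"]
print(compound_pairs(ad_keywords))  # [('icecream', 'ice cream')]
```

Exact matching like this would miss pairs where letters change at the join, such as “kontrolle funktion” and “kontrollfunktion” from the patent filing's anchor text example, so a real system would need something more forgiving.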

Outside Sources

In addition to looking at queries related to advertisements, at the use of words in anchor text, and in the words that advertisers bid upon, Google might also look at outside sources to see how words might be split apart, or if they even should be split.

These sources could include such things as:

Dictionaries from different languages,
Lists of locations,
Proper nouns, including first names and family names of people, organizations, and trademarks, and
Suffixes of words

These sources might contain words that shouldn’t be split. The patent filing gives the example of German place names which end with “strasse” or “dorf,” and which are proper nouns that shouldn’t be “decompounded” or split into substrings.
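A check like that could sit in front of any splitting step. The Python below is a minimal sketch: the two suffixes come from the patent filing's example, while the protected word list and the function name are placeholders I made up. A plain suffix test is also cruder than what the filing describes, since it would block every word with those endings rather than only place names.

```python
# Suffixes from the patent filing's German place name example.
PROTECTED_SUFFIXES = ("strasse", "dorf")

# Hypothetical stand-in for lists of names, locations, and trademarks.
PROTECTED_WORDS = {"examplebrand"}

def should_decompound(word):
    """Return False for words that outside sources say should be left whole."""
    lowered = word.lower()
    if lowered in PROTECTED_WORDS:
        return False
    return not lowered.endswith(PROTECTED_SUFFIXES)

print(should_decompound("icecream"))    # True
print(should_decompound("Düsseldorf"))  # False
```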

Compounds and Decompounded Phrases in Search Advertising and Search Results

Should it make a difference in what pages you see in search results at Google if you search with the compound version of a word, or the two word phrase that joins together to make up the compound?

Should it make a difference in which advertisements you see on the pages of publishers who show AdSense ads? Or in the ads that you see above and to the right of search results?

In some cases, maybe.

For example, it might matter when a movie or book or place uses one version, and a searcher is looking for information about that film or publication or location. If you had a movie titled “Paper Clips” and the word “Paperclips” attracted a lot more searches, you wouldn’t refer to the movie as “Paperclips” on your web site about the movie, would you?

If you were an advertiser, and you used the phrase “paperclips” as an advertising keyword, would you want Google to show your ad in search results for the phrase “paper clips” or on pages about “paper clips” if you didn’t include the two word version as an advertising keyword?

If you show AdSense ads on the pages of your web site, and you write a page or post about icecream, would it make a difference to you if Google showed ads for both “ice cream” and “icecream” on that page or post?

If you look at the Google AdWords Keyword Tool to see how frequently people search for different phrases, you might see the following search volumes for the compounds I started this post with, and their decompounded versions:

  • icecream (average search volume – 3,350,000)
  • ice cream (average search volume – 246,000)
  • paperclips (average search volume – 49,500)
  • paper clips (average search volume – 40,500)
  • sandpaper (average search volume – 74,000)
  • sand paper (average search volume – 14,800)
  • thumbtack (average search volume – 6,600)
  • thumb tack (average search volume – 2,900)

Chances are that even though the search volumes for those pairs show some significant differences, people performing those searches are likely looking for very similar results. The version that you use on your pages, or in your advertisements, might make a big difference in how many people see your pages or your ads.

Right now, there isn’t much overlap in the sponsored results (ads) that show up at Google for the different versions of my chosen pairs.

Does Google follow this compound word or decompound process for search queries? Are compounds and decompounded words that mean the same thing considered equally relevant for the other version? If so, should the search results shown for each be the same?

In searches for the compound and the two word versions of the four pairs listed above at Google, there is overlap of the search results that are shown for each version, and both compound and two word phrase results show up in the search results for each. But the results don’t match exactly.

It’s possible that they are considered very relevant, but not equally so.

For instance, the title of the top result for a search for “icecream” is “Ben & Jerry’s Homemade Ice Cream,” and the phrase “Ice Cream” in that title is bolded. According to a granted Google patent, Systems and methods for highlighting search results, terms are highlighted, or bolded, in search results to help show searchers that the results are relevant for their query. By itself, that isn’t an indication that Google is using this decompounding approach, but it is one piece of evidence that could be considered amongst others.

19 thoughts on “Compound Words in Search Advertising and SEO”

  1. Pingback: TaT: …
  2. It is very interesting to see the difference between Compound and Decompounded Phrases. It makes you take a step back and really give some thought to which words you are deciding to focus on for your SEO campaign.

  3. In one of my campaigns we have noticed that the search frequency and results for singular and plural search strings are different to a major degree. Always be sure to check this before you go ahead. One of the searches may only carry 10,000 Avg. Search volume, the other may have 100,000+.

    -Dk

  4. I have noticed this in practice and experimentation. There is definitely a correlation although not exact.

  5. This issue of compound and/or decompounded phrases is an enormous fly in the ointment for Google (and all the other search engines) because of the skewed results that parsing in this fashion can produce. An example might be the phrase: dog town -or- dogtown. How does Google “understand” what it is you are looking for?

    Are you searching for: Stevie’s Dog Town, a dog shampooing service on the outskirts of town, or are you doing historical research on a circa 1700 neighborhood in Gloucester, MA, or perhaps you are looking for clips of Heath Ledger in the film, Dogtown?

    In what order do you return the results if you are Google and weighted according to what algorithm?

    This is why the Narrow Silos and the Deep Content of a quality SEO campaign are so important for a website. In my opinion, this is the underpinning of good strategy, regardless of how Google peels their onion.

    —SB

  6. This is an interesting topic. It’s one of those often overlooked things when trying to optimize a page for keywords. I can imagine just how complex this issue is on Google’s end.

  7. That is an interesting topic and a good spot seeing the latest patent. One thing which is also worth noting is that there is a small but significant proportion of people that don’t even enter spaces between words, often when searching using product names. For these searches Google certainly wants to use the decompounding approach.

  8. It’s strange, I read so many articles on various webmaster forums about SEO but never read anywhere else about compound words. Thanks for the detailed description.

  9. I had not thought that Google could use their logs in such way. So you think they are doing this kind of thing on-the-fly or is there some manual intervention. There is a good chance they could make some howlers if it was all auto.

    As you point out, German would be a difficult one. For example you have “Linktauschsystemen”, which means ‘link exchange system’; but you can also have “Linktausch systemen”, “Link tausch systemen”. There is also “Linkaustauschvorschlag”, which means ‘to propose a link exchange’. I can’t imagine what it would be like to manage this kind of thing in all the different languages. Finnish is another one, this: “linkkivaihtojärjestelmääni” means link exchange directory. Can also be broken up.

  10. Hi Derek,

    That’s a very good point. The difference between using a singular and a plural version of a word can go beyond big differences in search volume as well. For example, when you write about “baseball,” you might be discussing the sport or a baseball. When you write about “baseballs,” you’re less likely creating a page about the sport, and more likely discussing baseballs themselves.

    I did write a previous post about singular and plural terms, which you might find interesting:

    How a Search Engine Might Handle Singular and Plural Queries

    Sometimes a search engine will provide results that return both singular and plural versions of a query term. Makes things interesting, no?

  11. Hi Agent SEO,

    The differences are definitely worth exploring, especially if the compound, or multiple word phrase is one that is one of the most important terms for a site. I’ve seen some very large differences in search volumes based upon choosing one version or another.

  12. Hi Promotif,

    I’ve seen an interesting mix of both compounds and decompounded phrases in search results as well. One of the interesting points raised in the patent filing was how a search engine might associate the different versions based upon things like how often both versions appear in anchor text from different links pointing at the same page, or how often both versions were used in the same advertising campaigns as advertising keywords for the same ad.

  13. Hi SOCIALAMIGO,

    You raise a very good point. It is something that is addressed in the patent filing, where they discuss when compound words shouldn’t be decompounded because the term might be a proper noun, such as a person’s name or a place name or a trademarked term.

    It makes sense for a search engine to consider both compounds and decompounded versions of those, but they need to do it with care, and a concern that someone may be looking for a person or place or thing that is unique. It’s possible that, like spell correction, some of the decision making as to how much weight to give to one of these special compounds or decompounded terms might rely upon looking at user behavior in things such as search results selections.

    I think exploring how terms are treated in search results is also very important, and as essential as looking at things like search volumes for terms, and at how competitive those terms might appear to be.

  14. Hi pays to live green,

    Good points. Considering the differences between compound and decompounded versions of query terms or keywords is something I think people do sometimes miss when researching keywords. If the patent filing only shows us enough to get an idea of what they might be doing, and it probably does, the process that they may follow probably is more complex than what we are being told.

  15. Hi marketingminefield,

    Thanks. You raise a very good addition to this topic. I was considering some of the ways that decompounding might be used elsewhere. People performing searches where they join two words together for products, or even break a compound name apart is one. Breaking up domain names was another area that I considered. The processes described in this patent filing could be used by the search engine for much more than just the advertising examples that they described in some detail. Any others?

  16. Hi David,

    Thanks. Those are great questions and points.

    A number of previous patent filings and white papers from Google, and from the other major search engines describe exploring their query logs to learn about the ways that people choose query terms, select web pages from search results, and discover other interesting information about how people use the search engines. A good number of them describe ways that they try to keep the results of such efforts in their use of searching and browsing behavior relevant and meaningful, and attempt to avoid being manipulated or abused. Google’s spelling correction might be one that you’ve seen in action.

    It wasn’t surprising that Google used German and Finnish examples in the patent filing. I imagine that they are pretty challenging when it comes to something like this decompounding process. I also imagine that it’s essential for Google to get a handle on understanding how words in those languages can be joined together and presented apart if the search engine intends to provide search for people using those languages.

  17. Understanding website traffic and visitor clickstream behavior is crucial to managing a website on a daily basis.
    Make daily decisions based on customer behavior. Traffic statistics are a form of direct feedback.
