I remember at some point in the past going through many web sites and changing “e-mail” to “email” because it seemed like most of the time I had seen the word lately, people were using it without the hyphen. I also recall struggling sometimes when doing keyword research for a site, trying to decide whether to use two words or a compound word, or which spelling of a word to use when there is more than one alternative.
I enjoyed a recent post at SEO Igloo, Familiarity breeds compound words in SEO and beyond, which discusses the use of compound words and SEO. I mentioned to the authors that Google had filed a patent application that discusses how they may be trying to handle some of the issues around compounds when they receive them as parts of queries. They told me that they would be interested in learning more about the patent filing.
This isn’t a new patent application, and I wrote about it in July 2005, over at Cre8asite Forums, though it has a title that doesn’t tell you much about what it covers – Systems and methods for improving search quality.
To explain it as simply as possible, the search engine might look at a database of alternative word choices when receiving a search, to see if it should expand and refine words in that query before it searches.
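As a rough illustration of that idea, here is a minimal sketch in Python. It is not the method described in the patent filing; the ALTERNATIVES table, its entries, and the expand_query function are all hypothetical names made up for this example. It only shows what looking up alternative word choices and expanding a query before searching could look like.

```python
# Hypothetical table of alternative word choices. The entries are
# illustrative only and are not taken from the patent application.
ALTERNATIVES = {
    "icecream": ["ice cream"],
    "e-mail": ["email"],
    "colour": ["color"],
    "spoke": ["spoken"],
}


def expand_query(query, alternatives=ALTERNATIVES):
    """Return the normalized query plus variants with one term swapped out."""
    terms = query.lower().split()
    variants = [" ".join(terms)]
    for i, term in enumerate(terms):
        # Look up each term and build a variant for every known alternative.
        for alt in alternatives.get(term, []):
            candidate = " ".join(terms[:i] + [alt] + terms[i + 1:])
            if candidate not in variants:
                variants.append(candidate)
    return variants


print(expand_query("spoke word"))
```

A real search engine would presumably build such a table from data such as page content and user sessions, rather than writing it by hand.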
The patent focuses upon examples in German because it isn’t uncommon to create compound words in German. But it makes it clear that the ideas within the patent are equally applicable to other languages:
 Moreover, while many of the examples provided above have been in the context of the German language, it will be appreciated that the techniques that have been described are readily applicable to other languages as well. Each language has its own set of linguistic features that pose problems for search. Thus, to design a search engine for a given language, and/or a general-purpose search engine, an effort can be made to identify these problems and to address them.
I also found it interesting that in addition to looking at how words might be used on the pages of sites, they would also look at the words that people use while searching:
User sessions can also be analyzed to find patterns in users’ searching behavior. For example, users may apply certain transformations to compensate for problematic aspects of the language. Once a set of problem areas are identified, work can be done to generate solutions. Potential solutions can be tested or simulated to determine their effectiveness and the amount of effort needed to implement them.
I have some examples of different types of language that might be covered by this patent application. It’s interesting to see how Google treats them. A search for “Tomatoe” offers me a spelling alternative of “tomato” as in “Did you mean: tomato”. A search for “Colour” returns results for both “color” and “colour”, with the top result using the “color” variation of the word.
Searches for “ice cream” and “icecream” might return different results and different numbers of results, but they are likely referring to the same dessert. Google offers me a spelling correction of “ice cream” when I search for “icecream.”
Words take different forms, depending upon the context in which they are being used. Nouns change when they become plural, and verbs change with tense – for instance, speak, spoke, spoken. A search for “spoke word” turns up a lot of “spoken word” results.
I mentioned the different treatments in this alternative spelling above:
You may want to try others, to see how Google may be treating a compound or inflection or alternative spelling. It’s interesting that Google’s spelling correction feature seems like it may be tied to this.
The patent application does mention that the two may work together, and points to some other features that may also be combined with them:
 It will be appreciated that a variety of changes can be made to the systems and methods described above following embodiments of the present invention. For example, the techniques described above can be applied in combination with other techniques, such as spelling correction, synonym and/or related-word expansion, language translation, spam reduction, and/or the like, to further enhance search results.
When you are doing keyword research for a site, if you haven’t been using the keywords and keyword phrases in actual searches to see if a search engine might be expanding the query to cover compounds, inflections, and alternative spellings, you may want to give it a try.
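If you want a quick list of forms to try, a small script can help. The sketch below is only a convenience for keyword research; the compound_variants function is a hypothetical helper written for this post and is not something described in the patent. It produces the spaced, hyphenated, and closed forms of a phrase so that each can be searched by hand.

```python
def compound_variants(phrase):
    """Return the spaced, hyphenated, and closed forms of a keyword phrase."""
    # Treat hyphens as word breaks so "e-mail" and "e mail" start out the same.
    parts = phrase.lower().replace("-", " ").split()
    if len(parts) < 2:
        # A single word has no compound forms to generate.
        return [" ".join(parts)]
    return [" ".join(parts), "-".join(parts), "".join(parts)]


for keyword in ["ice cream", "e-mail"]:
    print(compound_variants(keyword))
```

Inflections and alternative spellings such as “colour” and “color” are not covered by this helper and would still need to be checked by hand.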