How a Search Engine Might Weigh the Relevance of Anchor Text Differently

One of the things that’s clear about how search engines work is that when they find a link pointing to a page using certain anchor text, that page might be seen to be a little more relevant for the text found in that link. Google pointed that out in one of the earliest white papers about how the search engine works:

This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents. We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.

The Anatomy of a Large-Scale Hypertextual Web Search Engine

But many people assume that each link, with its anchor text, is as important as any other link, and that a page with lots of links pointing to it using certain anchor text will rank more highly for the terms in that text than it would without those links.

How Google Might Enable You to Translate Your Webpages Through a Proxy Server

In July, Google launched a beta version of their Page Speed Service, which collects content from your pages on the fly, and republishes it on a proxy, rewritten in a manner that should provide faster pages. Search Engine Watch wrote about this proxy service on July 29, 2011, and Adam Hopkinson in the comments points to details about the configuration page one sets up to control this service. The service appears to be one that Google will charge for once the beta is over.

What if Google also offered the ability to do other things through that proxy service such as offer localization of those pages, with the ability to set up translations of text on the page in different languages, or to change logos and other images for viewers from certain locations?

Imagine that rather than using machine translation, you could edit the proxy versions of your page through a browser, like the service from Israel that NETMASK Internet Technologies provides with their Netmask.IT! tool. That tool can work on webpages as well as on software products. Customers of Netmask Internet Technologies in the past, for at least the software localization that they offer, have included Siemens, Compaq, IBM, Data General, Sun, Oracle, Motorola, HP, and SGI.

Google Acquires Non-Strategic Mosaid Technologies Patents

It’s a tangled web when it comes to patent acquisitions and infringement lawsuits, and a scorecard isn’t enough to keep track of all of the businesses involved and the nature of their relationships with one another, as a recent acquisition of patents by Google from Mosaid Technologies illustrates.

In mid-September it was reported that Mosaid Technologies sold $11 million worth of patents to an undisclosed technology company. The purchase involved 5 patent families that Mosaid considered non-strategic because there were no licensing deals in place on the patents. It was supposedly the largest sale of intellectual property by the company since it started licensing technology patents around 4 years ago.

According to the US Patent and Trademark Office database, Google was recently assigned a number of patents from Mosaid Technologies. The assignment was executed on September 9th, and recorded on September 20th. The patents acquired by Google sound similar to those described in the news report I linked to above, involving technologies covering flash memory, encoding data, data compression, database search, and encryption approaches. A number of these patents appear to have originated at companies other than Mosaid Technologies, including Micron Quantum Devices, Inc., Purple Ray, Inc., Integrated Silicon Solution, Inc., and Chrysalis-ITS, Inc. (which was acquired by Safenet).

If you’ve heard about Mosaid Technologies recently, it may have been because they’ve been involved in some newsworthy events. For example, they acquired Core Wireless Licensing S.a.r.l. at the start of September, which holds approximately 2,000 patents originally filed by Nokia:

Facebook Patent Application Describes Receiving Data from Logged-Out Users to Target Ads

Is Facebook targeting conventional and social ads to the social network’s users and their connections, based upon visits to pages outside of Facebook that show Facebook widgets or use Facebook tracking pixels, while the Facebook users are logged out of Facebook?

On Sunday, Australian tech developer Nik Cubrilovic wrote a post titled, Logging Out of Facebook is Not Enough, which describes how cookies from Facebook are sent to Facebook every time someone visits a page that contains a Facebook widget of some type, even after that person logs out of Facebook.

A Facebook engineer wrote the first comment to the post, explaining that the cookies in question are there for safety and security purposes, to provide customizations to users, and to help Facebook maintain and optimize their services. He notes that Facebook has no interest in tracking people, and, “We don’t have an ad network and we don’t sell people’s information.”

The Wall Street Journal picked up on the story yesterday, and did some exploring of their own, including contacting Facebook, who responded with an interesting statement.

Search Engine Archeology

I read a novel not long ago, Rainbows End by Vernor Vinge, that suggested that in the future one of the most popular technology positions would be that of software archeologist, with development and programming skills capable of digging through many lines of code to decipher where they originated and how they might work with other kludges within a program to interact in meaningful ways. It made me wonder how important it would be to have a sense of the history of the growth and development of the Web.

A trip to the US Library of Congress Photographs website showed me a little of the local history of my region that I didn’t know much about, including the existence of a resort I hadn’t heard of before about five miles from where I live that could house more than a thousand people, and which had been the vacation spot of Presidents, Senators, Supreme Court Justices, and more.

A postcard showing the Fauquier Virginia White Sulphur Springs resort in its heyday.

How a Search Engine May Measure the Quality of Its Search Results

When you try to gauge how effective your website is, you may decide upon certain metrics to measure its impact. Those may differ based upon the objectives of your pages, but could include things like how many orders you receive for products you might offer, how many phone calls you receive inquiring about your services, and how many people sign up for newsletters, subscribe to your RSS feed, or click upon ads on your pages. They could include whether people link to your pages, or tweet or +1 articles or blog posts that you’ve published. You may start looking at things like bounce rates on pages that have calls to action intended to have people click upon other links on that page. You could consider how long people tend to stay upon your pages. There is a range of things you could look at and measure (and take action upon) to determine how effective your site might be.

A search engine is no different in that the people who run it want to know how effective their site is. A patent granted to Yahoo today explores how the search engine might evaluate pages ranking in search results for different queries, and looks at a range of possible measurements that it might use. While this patent is from Yahoo, expect that Google and Bing are doing some similar things. And while Bing is providing search data for Yahoo, that doesn’t mean that Yahoo’s results might not be presented and formatted differently than Bing’s results, and include additional or different content as well. As a matter of fact, Yahoo recently updated its search results pages.

One of the problems you might run into when attempting to see how well your site works is determining how well the metrics you’ve chosen actually measure that. A problem that plagues large sites is that they are so large that it can be difficult to determine which metrics work best. Yahoo’s approach uses machine learning to determine the effectiveness of different “search success” metrics.

Google and IBM do it again: Google Acquires over 1,000 Patents from IBM in August

In what feels like a case of deja vu, Google has recorded the acquisition of at least 1,022 patents from International Business Machines in August of this year (there’s a 1,023rd patent listed in the USPTO assignment database as well, but the patent number appears to be wrong). The USPTO recording date for the transaction is September 13, 2011, and the execution date on the document is August 17th, 2011. I wrote about a previous acquisition of patents by Google from IBM earlier this year in the post, Google Acquires Over 1,000 IBM Patents in July.

As I noted in the earlier post, Google had lost a bidding match earlier this year for more than 6,000 patent filings from Nortel to a collective formed of Apple, Microsoft, Research in Motion, Ericsson, Sony, and EMC. That doesn’t seem to have stopped them from dipping from the same well a second time to acquire more intellectual property from IBM. Google has also recently acquired a very large number of patents from the purchase of Motorola Mobility.

Last week, Google sold nine patents to HTC to help them pursue a patent infringement case against Apple. Some of the patents that were transferred to HTC were ones Google purchased last year from Myriad Group, which I wrote about in December.

How a Search Engine Might Use Statistics to Identify New Ranking Features

I may have been a little unusual as an English major in my college days. I remember one professor asking me what I found interesting about a particular author we were studying, and my answer was about patterns involving the language that he used, and how he tended to frequently use certain words that were no longer much in fashion these days. He asked for an example, and I pointed out the use of the word “singular.” I could tell that he found my point a little odd, and I wish that the Google Books N-Gram Viewer had been around back in those days to back up my statement. As a side note, I wish I could have taken a class or two with HITS algorithm inventor Jon Kleinberg, who probably would have appreciated my response.

I point that out because I recall some unusual phrasings by search engineers at a large search conference I attended a few years back where most of the search marketers were using the term “ranking factors,” and all of the search engineers who gave presentations and participated in question and answer sessions instead used the term “signals.” I wasn’t the only one who noticed the phrasing, and someone called one of the search engine representatives on his use of the term, upon which a Google representative responded, and was seconded by the Yahoo and Microsoft reps, that they preferred to use the term “signal” instead of “factor.”

Much like in my college days, I find myself a little obsessed with the language used in the search patents I read. If Google would point their N-Gram viewer at the USPTO’s database of patents, that would be a great thing. There are a few terms that I keep on seeing spring up in some Google patents that I’ve been finding pretty interesting lately.

Getting Information about Search and SEO Directly from the Search Engines