Google has had a new patent application published at the US Patent and Trademark Office (USPTO) which provides an expanded view of how it may present real-time suggestions for queries when someone is typing words into a search box. At the same time, Google has come under fire, and faces litigation, for its predictive suggestions.
This post takes a quick look at the litigation, the new patent application, some of the additional processes that it uses in filtering and collecting information about queries, and why all this might matter to people who are interested in having their web sites found through Google.
Litigation over Google Query Suggestions
A Belgian software company is pursuing legal proceedings against Google over toolbar suggestions that point to illegal versions of the company’s software when someone searches for its name. The case was originally initiated back in February, and appears to be ready to go to trial. It raises some interesting issues involving what happens when a search engine provides suggestions in a tool like Google Suggest, or through a toolbar.
Continue reading Expanding Google Suggest in Legal Dispute
The Official Google Blog has a call out for all Pearl Jam fans, with a link to the video for Life Wasted (no longer available), which will be available until next Wednesday on Google Video for download, or as streaming media.
The video has been released under a Creative Commons license, which allows for noncommercial uses, with attribution to the band. It also can’t be changed or altered to make a derivative work under the license. If you’re a Pearl Jam fan, you may want to download a copy.
If you’re part of a band and think this is a great idea for getting the word out about your music, and you haven’t looked into Google Video or the Creative Commons licenses, it might be time to check them out. It could be a good way to let more people know about your music. If you have questions about either, let me know in a comment or by email, and I might be able to answer your questions or point you to somewhere that will. I’d love to see more songs published on Google Video with a Creative Commons license like Pearl Jam’s.
The Creative Commons pages also point to a number of sites that offer audio and video under their licenses, if you are looking for something to listen to or watch, or if you would like to publish a song or video online.
What would it take for Google to include in its index 100 billion pages?
Could they develop a way for people to search for, and look at older versions of web pages, and also simultaneously improve the quality of their search results? Would indexing words within conceptually related phrases make the search process better?
A recent patent application from Google estimates the web to contain around 200 billion pages, and guesses that the largest index among the major search engines holds around 6-8 billion pages. The document is Multiple index based information retrieval system, US Patent Application 20060106792, which was published May 18, 2006, and originally filed on January 25, 2005.
In addition to providing us with a rough estimate of the size of the web, and the number of pages indexed by search engines, it also tries to answer the questions I asked at the top of this post.
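To put those estimates in perspective, here is a quick back-of-the-envelope sketch in Python. The page counts are the ones quoted above; the function and variable names are just illustrative choices, not anything taken from the patent filing.

```python
# Figures quoted in the post; everything else here is illustrative.
WEB_SIZE_ESTIMATE = 200_000_000_000                      # estimated pages on the web
LARGEST_INDEX_RANGE = (6_000_000_000, 8_000_000_000)     # estimated largest index
TARGET_INDEX = 100_000_000_000                           # the 100 billion pages asked about


def coverage(indexed_pages, web_size=WEB_SIZE_ESTIMATE):
    """Return the fraction of the estimated web that an index covers."""
    return indexed_pages / web_size


low, high = (coverage(n) for n in LARGEST_INDEX_RANGE)
print(f"Largest index: {low:.0%} to {high:.0%} of the estimated web")
print(f"100 billion page index: {coverage(TARGET_INDEX):.0%} of the estimated web")
```

If the estimates hold, an index of 100 billion pages would be roughly 12 to 17 times larger than the biggest index the filing describes.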
The inventor listed in the patent filing is Anna Patterson, who has already built a search engine that holds more than 55 billion pages (The Internet Archive). Part of the process described in the document was the subject of a blog post here back in February – Move over pagerank: Google’s looking at phrases?
Continue reading Google Aiming at 100 Billion Pages?
Apple adds something to those songs you’ve been listening to from them, but it’s not music.
Normally, at the time a digital media file is created, there’s information about the content included with the music. This data is embedded in the digital media file’s header section, including such things as copyright information and digital rights management information, as well as title, author, and publisher.
If you’ve been watching the shelves of your local music store, you’ve probably seen enhanced CDs and DVDs, which contain hyperlinks to additional media content, often available on web sites. Apple wants to be able to include additional information, like that, in digital tunes that are downloaded.
This isn’t a problem with streaming media, which could have that kind of information added to it, but many people prefer direct access to the songs so that they can listen or watch when they don’t have access to streaming media.
Continue reading Apple to Embed Ads and Marketing Information in iTunes?
I downloaded the Google Notebook browser extension about twenty minutes ago, and have been trying it out.
In case you haven’t heard about Google Notebook yet, it’s a new tool from Google announced last week during the Google Press Day, but not planned for release until this week.
The idea behind it is that you can use it to take notes about web pages, copy snippets from those pages, and keep them in notebooks, which you can keep private or make accessible to the public. A link to the page where you found the material makes it easy to return to the source of the information.
Notebooks can be organized into sections, and can contain images as well as text. The program can be accessed from more than one computer, which means that the information contained within it is stored by Google rather than on your own computer.
I really like the way that the mini notebook and the full page notebook work together. As a tool for tracking information on the web, it’s pretty useful. I could see some value in using it as a work tool when looking at a site and considering rewriting content on the pages of that site, or when writing notes for a blog post, article, or paper.
Continue reading Google Notebook Released
Trust is essential in our reliance on search engines. But we should understand some of the risks in placing too much trust in search results.
There’s the possibility of bias in what search engines show people based upon the engines’ business practices and operating policies, limitations in indexing and ranking algorithms, and in political and cultural pressures placed upon them.
When I think of conferences like the one to be held next week in Edinburgh, Scotland, during the 15th Annual World Wide Web Conference, I don’t expect to see presentations that are critical of search engines. But, during a workshop on Models of Trust for the Web, there’s a paper being presented that takes a close look at search engine bias, from a couple of researchers at Yuan Ze University in Taiwan.
Position Paper: A Study of Web Search Engine Bias and its Assessment (pdf) by Ing-Xiang Chen and Cheng-Zen Yang
The authors of this paper describe in more detail the three different sources of bias that I mentioned above. How could business practices shape the bias of search engines?
Continue reading Trust and the Internet: Search Engine Bias
I’ve been using Google Alerts for the past year or so to stay on top of a handful of topics, and I decided this weekend that it might be worth expanding their use a little more.
So, I added about ten terms that I’m interested in tracking to my alerts list for Google.
And then, I decided that it might be fun to try out Yahoo Alerts also, and compare what the two services provide.
My experience with Google Alerts has been interesting so far. With some news articles, the alerts I’m sent have been fairly timely. But every so often, I see an alert pointing to a page that’s more than a year old. When I see that, I wonder if Google has just discovered the page, and noticed in some vast database that they hadn’t sent me a copy of it yet.
I haven’t searched to see if someone has tried this already, but it might be fun to keep track of what links I’m provided with, and compare the two alert systems over a period of a few weeks or months. How old are the pages that I receive an alert for? How many links am I provided per term over the length of time, and how many do I receive each day from both search engines?
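One way to keep that kind of tally is sketched below in Python. The `Alert` record, its fields, and the sample entries are all hypothetical placeholders invented for illustration; neither Google Alerts nor Yahoo Alerts is assumed to supply data in this form, so each alert would have to be logged by hand or by a script of your own.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Alert:
    """One link received in an alert email (hypothetical record format)."""
    service: str          # e.g. "google" or "yahoo"
    term: str             # the tracked search term
    received: date        # day the alert arrived
    page_published: date  # best guess at when the page first appeared


def summarize(alerts):
    """Group alerts by (service, term) and compute simple comparison stats."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.service, alert.term)].append(alert)

    summary = {}
    for key, items in groups.items():
        # Count only the days on which at least one alert arrived.
        active_days = {a.received for a in items}
        ages = [(a.received - a.page_published).days for a in items]
        summary[key] = {
            "links": len(items),
            "links_per_active_day": len(items) / len(active_days),
            "mean_age_days": sum(ages) / len(ages),
        }
    return summary


# Made-up sample entries, only to show the shape of the input.
sample = [
    Alert("google", "search patents", date(2006, 5, 22), date(2006, 5, 21)),
    Alert("google", "search patents", date(2006, 5, 22), date(2005, 3, 1)),
    Alert("yahoo", "search patents", date(2006, 5, 23), date(2006, 5, 23)),
]

for (service, term), stats in summarize(sample).items():
    print(service, term, stats)
```

Logging a guessed publication date for each page is the weak point of this approach, since many pages don’t state one, but even rough dates would show how often an alert points to something more than a year old.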
Continue reading Testing Google and Yahoo Alerts
Trust is a topic that has a profound effect on the way search engines work on the web.
How easy or difficult is it to come up with methods that don’t rely (much) on human judgment to identify spam free pages that can be trusted, and to locate pages that are intended solely to rank well in search engines without providing any value at all for visitors, except possibly ads that are on the topic of their search?
In a week, there will be a gathering in Edinburgh, Scotland, during the 15th Annual World Wide Web Conference, on the subject of Models of Trust for the Web. While I won’t be attending, it sounds like an interesting presentation, and I wanted to take a look at some of the papers written by presenters at the conference. In this post, I’ll be looking at one of the papers to be presented, and listing some of the other work by its authors.
Problems with Yahoo’s Trustrank Assumptions
Continue reading Trust and the Internet: Web Search Spam