Ok, so why is a blog that usually focuses on internet marketing and search-related patents publishing a post about saving amphibians?
The short answer is that I was asked very nicely, by Jeff Davis of Frog Matters and Amphibian Ark.
The longer answer is addressed by some other folks who are also posting about the Year of the Frog:
Darren Naish, vertebrate palaeontologist, and blogger at Tetrapod Zoology – Get Ready for 2008: Year of the Frog
Continue reading 2008 is the Year of the Frog
Unfortunately, there are web pages that can be harmful to visit. Google researchers discussed the identification of malicious code on web pages earlier this year in The Ghost In The Browser: Analysis of Web-based Malware (pdf).
The paper’s authors tell us that the focus of delivery of harmful code to computer users has shifted from software that someone installs, to software that is delivered directly to a browser via the Web.
Microsoft has also detailed some of the research that they’ve conducted on web-based malware in their Strider HoneyMonkey pages.
Search Engines and Malware
Continue reading Search Engine Identification and Filtering of Malicious Web Sites
Choosing the right character set for your web page might mean that it is easier for a search engine to understand what language your page is in, though there are also other ways that it might be able to determine that.
But, what about when someone types in a query?
- How does a search engine know what language a search query might be in?
- How does it handle queries in different languages made on devices that might not be capable of creating some special characters outside of the Latin alphabet?
Also, do web pages that use a certain character set (something that webmasters can choose in their HTML for a page) stand a better chance of having their language identified by a search engine?
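To make the special-characters question concrete, here is a minimal sketch of one common approach: stripping diacritics so that a query typed without accents can still match accented text. This is only an illustration and not the method described in Google's patent filing; the function names `strip_diacritics` and `query_matches` are hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, so 'café' becomes 'cafe'."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def query_matches(query, candidates):
    """Return the candidates that equal the query once accents and case are ignored."""
    key = strip_diacritics(query).lower()
    return [c for c in candidates if strip_diacritics(c).lower() == key]


if __name__ == "__main__":
    # A searcher without a 'ü' key types 'munchen'.
    print(query_matches("munchen", ["München", "Munich"]))
```

A mapping like this only covers letters that decompose into a base letter plus an accent. Characters such as "ß", or scripts outside the Latin alphabet, would need a separate mapping table.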
Continue reading How Does a Search Engine Know the Language of A Query? Google Explores Character Mapping
In my last post, I wrote about how Microsoft might use an automated method to identify blogs, and how that method might work.
I wondered why they might be interested in doing that, and received some great comments on the post. One reason that appealed to me is that a search engine would want to understand better what results it is showing searchers, so that it might be able to provide diverse and even personalized results to people using that search engine.
Of course, I’m concerned about what that might mean for what we see in search results, and how it might act to shape those results.
Another Microsoft patent application published last week, and filed within a couple of days of the application on identifying blog posts, looks at how it might present diverse results to searchers.
Continue reading Reranking Search Results Based Upon Personalization and Diversification
A new Microsoft patent application has some interesting statements within it about blogs. First it tells us of the value of blogs and blogging:
Blogging has grown rapidly on the internet over the last few years. Weblogs, referred to as blogs, span a wide range, from personal journals read by a few people, to niche sites for small communities, to widely popular blogs frequented by millions of visitors, for example.
Collectively, these blogs form a distinct subset of the internet known as blogspace, which is increasingly valuable as a source of information for everyday users.
Then it goes on to tell us that search engines work to limit results from blogs in searches, and the difficulties that search engines sometimes have in identifying blogs:
Continue reading Do Search Engines Hate Blogs? Microsoft Explores an Algorithm to Identify Blog Pages
Ok, my title is a mouthful, but you have to love a patent filing that uses South Park characters in examples. Even if it is a somewhat odd patent filing.
Kyle, Stan, Kenny, and Cartman are programmers who report their daily code production to Mr. Garrison. Kyle creates a node datum 1001 reporting 111 lines.
The node datums are illustrated as using a markup language although any defined data format can be used. Kenny creates a node datum 1003 reporting 141 lines.
Cartman creates a node datum 1004 reporting a massive 214 lines. Cartman also creates a node datum 1002 for Stan reporting 66 lines.
Perhaps Stan should not trust Cartman.
Continue reading The Southpark Google Organizational Information Flow Patent Application
There are often three pieces of information about pages displayed in search results to searchers in response to a search:
- Page title,
- The URL where that page can be found, and
- A summary of the page in the form of a snippet or snippets, taken from a meta description tag, from a description of the page in a directory like DMOZ, or from actual text on the page itself.
One mystery about search engines is how a snippet might be generated when it is taken from the text of a page.
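As a rough illustration of what generating a snippet from page text could involve, the sketch below slides a fixed-size window of words across the page and keeps the window that contains the most query words. This is a simple, assumed heuristic for the sake of example, not Google's method; the function name `pick_snippet` and the `window` parameter are hypothetical.

```python
def pick_snippet(page_text, query, window=8):
    """Return the run of `window` words that contains the most query words."""
    words = page_text.split()
    terms = {t.lower() for t in query.split()}
    best_start, best_score = 0, -1
    # Try every window position; on a short page there is just one window.
    for start in range(max(1, len(words) - window + 1)):
        chunk = words[start:start + window]
        # Count words in this window that match a query term,
        # ignoring case and trailing punctuation.
        score = sum(1 for w in chunk if w.lower().strip(".,;:!?") in terms)
        if score > best_score:
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])


if __name__ == "__main__":
    text = ("Frogs are amphibians. Many frogs live near ponds. "
            "Search engines index pages about ponds and frogs.")
    print(pick_snippet(text, "search engines", window=4))
```

A real system would likely also weigh sentence boundaries, term proximity, and readability, which this sketch ignores.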
Continue reading How does Google Pick Search Snippets for Your Pages to Show in Results?