OK, so why is a blog that usually focuses on internet marketing and search-related patents publishing a post about saving amphibians?
The short answer is that I was asked very nicely by Jeff Davis of Frog Matters and Amphibian Ark.
The longer answer is addressed by some other folks who are also posting about the Year of the Frog:
Darren Naish, vertebrate palaeontologist, and blogger at Tetrapod Zoology – Get Ready for 2008: Year of the Frog
Unfortunately, there are web pages that can be harmful to visit. Google researchers discussed the identification of malicious code on web pages earlier this year in The Ghost In The Browser: Analysis of Web-based Malware (pdf).
The paper’s authors tell us that the focus of delivery of harmful code to computer users has shifted from software that someone installs, to software that is delivered directly to a browser via the Web.
Microsoft has also detailed some of the research that it has conducted on web-based malware in its Strider HoneyMonkey pages.
Search Engines and Malware
I rarely write about the search marketing industry here, focusing mostly instead upon search-related patents and papers, and an occasional event. There are some happenings and posts that I do want to point out though.
I was asked to be a judge in the semi-finals of the 2007 Rubber Chicken Award for Humor in the SEM industry, and the winner is chosen by you as the voters. There were some very funny posts made over the course of the year, and unfortunately only 10 finalists were chosen from among those nominated. I think the winners are the folks who get to read all ten, and I’d like to thank all the nominees for making the decision of who the finalists were so difficult.
Search Engine Journal is holding its third annual awards, the Search Blogs Awards of 2007, and I would like to thank the many folks who nominated SEO by the Sea in a couple of categories, and me in another. SEO by the Sea was nominated for:
Best SEO Blog of 2007
Best Search Engine Research Blog
Choosing the right character set for your web page might mean that it is easier for a search engine to understand what language your page is in, though there are also other ways that it might be able to determine that.
But, what about when someone types in a query?
- How does a search engine know what language a search query might be in?
- How does it handle queries in different languages made on devices that might not be capable of creating some special characters outside of the Latin alphabet?
Also, do webpages that use a certain character set (something that webmasters can choose in their HTML for a page) stand a better chance of having their language identified by a search engine?
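To make the question a little more concrete, here is a minimal Python sketch of how a declared character set could act as a weak language hint. It is only an illustration: the function names and the hint table are my own assumptions, not anything taken from a search engine or a patent filing, and a real system would almost certainly weigh the text of the page far more heavily than the declaration.

```python
import re

# Hypothetical hint table: a few legacy character sets and the languages
# they are commonly used for. Illustrative and far from complete.
CHARSET_LANGUAGE_HINTS = {
    "shift_jis": ["Japanese"],
    "euc-kr": ["Korean"],
    "iso-8859-7": ["Greek"],
    "windows-1251": ["Russian", "Bulgarian", "Serbian"],
    "utf-8": [],  # covers nearly every script, so it offers no hint
}

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE
)


def declared_charset(html):
    """Return the charset declared in a meta tag, lowercased, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None


def language_hints(html):
    """Return the languages suggested by the declared charset, if any."""
    return CHARSET_LANGUAGE_HINTS.get(declared_charset(html), [])


if __name__ == "__main__":
    print(language_hints('<meta charset="Shift_JIS">'))
    print(language_hints('<meta charset="utf-8">'))
```

A page declared as Shift_JIS points fairly strongly at Japanese, while a page declared as UTF-8 tells us nothing about its language, which is one reason a character set alone is unlikely to settle the question.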
In my last post, I wrote about how Microsoft might use an automated method to identify blogs, and how that method might work.
I wondered why they might be interested in doing that, and received some great comments on the post. One reason that appealed to me is that a search engine would want to understand better what results it is showing searchers, so that it might be able to provide diverse and even personalized results to people using that search engine.
Of course, I’m concerned about what that might mean to what we see in search results, and how it might act to shape those results.
Another Microsoft patent application published last week, and filed within a couple of days of the application on identifying blog posts, looks at how it might present diverse results to searchers.
A new Microsoft patent application has some interesting statements within it about blogs. First, it tells us of the value of blogs and blogging:
Blogging has grown rapidly on the internet over the last few years. Weblogs, referred to as blogs, span a wide range, from personal journals read by a few people, to niche sites for small communities, to widely popular blogs frequented by millions of visitors, for example.
Collectively, these blogs form a distinct subset of the internet known as blogspace, which is increasingly valuable as a source of information for everyday users.
Then it goes on to tell us that search engines work to limit results from blogs in searches, and the difficulties that search engines sometimes have in identifying blogs:
Ok, my title is a mouthful, but you have to love a patent filing that uses South Park characters in examples. Even if it is a somewhat odd patent filing.
Kyle, Stan, Kenny, and Cartman are programmers who report their daily code production to Mr. Garrison. Kyle creates a node datum 1001 reporting 111 lines.
The node datums are illustrated as using a markup language although any defined data format can be used. Kenny creates a node datum 1003 reporting 141 lines.
Cartman creates a node datum 1004 reporting a massive 214 lines. Cartman also creates a node datum 1002 for Stan reporting 66 lines.
Perhaps Stan should not trust Cartman.
There are often three pieces of information about pages displayed in search results to searchers in response to a search:
- Page title,
- The URL where that page can be found, and
- A summary of the page in the form of a snippet or snippets, taken from a meta description tag, a description of the page from a directory like DMOZ, or actual text from the page itself.
One mystery about search engines is how a snippet might be generated when it is taken from the text of a page.
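As a rough illustration of the problem, here is a small Python sketch of one simple way to pull a snippet from page text: pick the sentence that contains the most distinct query terms. This is an assumption made for the sake of an example, not a description of how any search engine actually builds snippets, and the function name `pick_snippet` and the 160-character limit are hypothetical.

```python
import re


def pick_snippet(page_text, query, max_chars=160):
    """Return the sentence sharing the most distinct terms with the query.

    Ties go to the earliest sentence; the result is cut to max_chars.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())

    def score(sentence):
        words = set(re.findall(r"\w+", sentence.lower()))
        return len(terms & words)

    best = max(sentences, key=score)
    return best[:max_chars]


if __name__ == "__main__":
    text = (
        "Frogs are amphibians. "
        "Search engines show snippets for each result. "
        "The end."
    )
    print(pick_snippet(text, "search snippets"))
```

Even a toy like this raises the questions that make snippets interesting: how much weight each query term should get, how long a snippet should be, and when text from the page should win out over a meta description.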