How I Came to Love Entities

I recently revisited a website that I worked on almost a decade ago. One of my favorite pages on the site no longer exists, but its spirit and inspiration remain. The site was Baltimore.org, which at the time served the Baltimore Area Convention and Visitors Association and has since been rebranded as the more memorable “Visit Baltimore.” Back in 2005, the Association told us that they wanted a page on Black History, and that they wanted it to rank well for the term “Black History.”

The Baltimore Inner Harbor

One of our early efforts wasn’t bad, but it didn’t generate much interest and wasn’t shared much by others. We weren’t drawing much traffic to the site for the term “Black History,” and there were a lot of really good pages that deserved to rank well for the term. Ours just wasn’t competing.

Old Baltimore Black History Page

I knew Baltimore had a rich history, filled with historic churches and schools and people that should enable it to do much better.

I woke up one morning with a thought in my head that we were trying to force the term “Black History” into prominence at Google without really giving people a glimpse as to why our page from baltimore.org should rank well for the term. I went into work that day, and asked one of the copywriters I was working with, Lisa Melvin, if she would rewrite the page, and ignore any concept of word counts.

Instead, I asked her to tell visitors about the famous people and places in Baltimore that showed its Black history. Lisa was working remotely and couldn’t see how serious I looked at the time, so she asked me to repeat myself. I did. And she returned a lengthy article that did just that.

I had told her to put the locations of these historical sites into the article so that people could visit them today. That was part of the goal of a Visitor’s Association website after all, to get people to visit.

She did.

Here’s a snippet from the page, which shows off the history and tells you where to go to see these historical places:

Snippet from Newer Version of the Baltimore Black History page.

At 3,300 words, this was one of the longer articles we had published on a client’s site.

Within a couple of months, this page on Black History, which hadn’t been getting much traffic, was the 6th most visited page on the site. Even better, it was bringing actual visitors to the site. Frederick Douglass, James Hubert (“Eubie”) Blake, Fanny Coppin, Billie Holiday, and Oprah Winfrey, and their ties to Baltimore, were the kinds of things that people wanted to learn about.

Letting people know where they could see the places where people lived, where events took place, and what kinds of impacts those things had brought them to the website, and to the City.

We took a page about the words “Black History” and turned it into a real page about Baltimore’s Black History, and that made all the difference.

Summary
Article Name
How I Came to Love Entities
Description
Optimizing a page for a word or phrase was a lot less rewarding than optimizing the page for a user experience, and focusing upon entities let us do that.

13 thoughts on “How I Came to Love Entities”

  1. Hi Giovanni

    It’s likely that search engines never used LSI for ranking. It’s just not a good fit as an algorithm for indexing Web content, and there’s also a patent on it that would possibly keep others from using it. It’s also old, and newer technologies have been developed since that could have been used instead.

    Here’s the patent:

    Computer information retrieval using latent semantic structure
    http://www.google.com/patents/US4839853

    Here’s the 1990 paper about it:

    Indexing by Latent Semantic Analysis
    http://www.cob.unt.edu/itds/faculty/evangelopoulos/dsci5910/LSA_Deerwester1990.pdf

    It’s possible that a search engine might use something such as TF*IDF in an algorithm such as one that chooses query refinements to show on search results pages, and I’ve seen a Google patent filed within the last 5-10 years that suggests that possibility, but as a ranking signal for web pages, I think it’s had its day, which is long past.

  2. Hi Bill, nice to read you again! I’ve wanted to ask you a question for a long time, and this article is perfect for it 🙂
    Consider some old search engine algorithms, TF-IDF and LSI: years ago, one of the most common problems with these two algorithms was that you could obtain higher f(x) values with very long texts (more repetitions, density, and so on).

    Today, do you think that we can still obtain a small boost from very long content? The same question in different words: do you think that the concepts behind TF-IDF and LSI are still valuable for ranking? If yes, what is their weight?

    Thanks a lot for any clarification 🙂

  3. Hi Giovanni

    I’m sure that there probably is stuff from IR that may still be in use. But the things that they are being used for may not be today’s search engines, which are changed and added to and updated every day.

    TF*IDF became something more advanced years ago, such as:

    http://en.wikipedia.org/wiki/Okapi_BM25

    LSI was created before there was a web, and was intended for document databases that were much smaller than the web, and didn’t change so much. There are a number of patents assigned to Google that focus upon a probabilistic LSI, and that might be a good direction to do research in.

    I appreciate the comments, but they really don’t have anything to do with the topic of this post. I really can’t answer your questions, and don’t know about the impact of text length on a page.

  4. Hi Giovanni

    I don’t mind being off topic too much, but I don’t want to dissuade people who might be interested in the topic of my post from commenting about it. 🙂

    I also don’t want to have them misled by questions that involve “density” which is more myth than anything. 🙁

  5. Thank you very much Bill.
    I’m studying some IR and, talking about pure theory, I very often read about LSI and TF-IDF. It’s clear that we are talking about 1970s-era material, but I thought that modern search engines could use some evolutions of these original functions that share some common concepts. That’s why I asked about text length and its impact on ranking.
    So, is there nothing left we can study about IR that is still used (even partially) for ranking?
    Thanks again and have a great day!

  6. Clear. Thank you for your kind reply, and sorry if I took you off-topic.

  7. Appealing to readers can be difficult. On one hand, you want to provide the best information for a given subject. On the other, you want to make sure that you are taking emotional factors into account in your copywriting. It is definitely a fine balance, so it is no surprise that appealing to people in that way produced better results.

  8. Hi Bill,

    Great article. I imagine that the traffic came from many long tail keywords that you never planned for, and that the content was shared through natural backlinks. Do you remember the SERP after that in Google.com for “black history”?

    Yes, this is also my approach to SEO: provide content that is helpful for people and help search engines find it.

    I enjoy your posts very much; only when they get too technical do I sometimes have to reread them several times.

    It is a pity that they deleted this article instead of updating and keeping it.
    Often the problem is not Google but the client, who is then surprised that performance is not as expected.

    Thanks,

  9. Hi Geoff,

    Thank you. There’s a link to the Internet Archive version of the page in my post above, with the anchor text “lengthy article.”

    I wasn’t so much trying to make any definitive statement about Google’s use of entities as much as I was about how I learned how useful they could be, and how much they enrich the content of pages in significant ways. 🙂

  10. Hi Hans

    The additional long tail terms did lead to additional visitors to the site. Since there weren’t many articles online about many of those places, the page tended to rank well for those terms, and the people who were searching for them online were interested in searching for them in the real world too. And they were sharing the page with other people.

    I don’t remember the SERP ranking, but I agree that it is a shame that the article is no longer online. I thought it was pretty useful.

  11. Hi Ryan,

    Once we made the switch on the page to focusing upon appealing to readers of the site instead of the search engine, we started getting a whole lot more traffic to it, and that made us feel really good about it.

    We also had a page about the Baltimore Ravens, and I suggested that they ask players on the team what they liked about Baltimore. They didn’t do that. They created a video instead, and asked the coach to talk about Baltimore and what he liked about it. That was close enough for me. 🙂

  12. Hi Bill,

    Interesting. I wish you’d posted the whole page, because it’s great content, especially for someone who lives in Baltimore. I suppose you’re trying to say that, despite the recent hype, entities have been used for a long time by Google. Thanks for the great read!

  13. Hi Bill,

    Best explanation of search entities I’ve read. The final paragraph is a great sound bite for anyone who doesn’t grasp the concept:

    “We took a page about the words “Black History” and turned it into a real page about Baltimore’s Black History, and that made all the difference.”

    I’m still being asked to ‘check the copy for SEO’ and I generally bounce it straight back. Writing about the actual subject instead of writing around one keyword phrase isn’t an SEO thing, it’s just the way online writing should be done.
