LDA / topic modelling explained and why it’s needed for SEO and content…


Whether you’re new to SEO or not, have you heard of the term ‘latent Dirichlet allocation’, also known as LDA?

Wikipedia defines LDA as the following:

“In statistics, latent Dirichlet allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word’s creation is attributable to one of the document’s topics. LDA is an example of a topic model and was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael Jordan in 2002.”
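To make that definition a bit more concrete, here is a minimal Python sketch of the "generative" story it describes: a document gets a mixture of topics, and each word is drawn from one of those topics. The topic names, words and probabilities are made up for illustration. It only generates toy documents; it does not fit a model to real copy, and it says nothing about how Google works.

```python
import random

# Hypothetical topics: each maps words to probabilities (invented for illustration).
TOPICS = {
    "guitars": {"strings": 0.4, "frets": 0.3, "pickups": 0.3},
    "drums": {"snare": 0.4, "cymbal": 0.3, "sticks": 0.3},
}


def sample_dirichlet(alpha, k, rng):
    """Draw a k-part topic mixture from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha=0.5, seed=None):
    """Generate (topic mixture, [(word, topic), ...]) for one toy document."""
    rng = random.Random(seed)
    names = list(TOPICS)
    # Step 1: the document is a mixture of topics.
    mixture = sample_dirichlet(alpha, len(names), rng)
    words = []
    for _ in range(n_words):
        # Step 2: each word is attributed to one topic from that mixture...
        topic = rng.choices(names, weights=mixture)[0]
        # ...and is then drawn from that topic's own word distribution.
        vocab = TOPICS[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()))[0]
        words.append((word, topic))
    return dict(zip(names, mixture)), words


if __name__ == "__main__":
    mixture, words = generate_document(10, seed=1)
    print(mixture)
    print(words)
```

Fitting LDA to real pages means running this story in reverse: starting from the words and working back to the hidden topics. That is the job of a proper library rather than a short sketch like this.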


LDA has long been rumoured to be part of the Google algorithm, and is classed as an on-page factor relating to website content.

Now this all sounds very complicated, but it’s actually quite simple…

In a nutshell, LDA is a form of topic modelling. Topic modelling relates to SEO because a piece of content can be defined by the words it uses that are relevant to its primary keyword.

So let’s say you have a piece of web copy which refers to drums and guitars, but you want to make it more targeted towards guitars. Including words such as these will help:

  • Strings
  • Frets
  • Neck
  • Pickups
  • Whammy bar

Let’s say you were writing a piece of content about Nottingham. Here are a few examples of key phrases you may want to include:

  • River Trent
  • Robin Hood
  • Sherwood Forest
  • Galleries of Justice
  • Lace

So, when Google visits the page and sees the above keywords, its algorithm should be able to determine that the page is about guitars or Nottingham. As you can see, you are modelling the copy towards one topic via related phrases. Doing this is great for making pages even more targeted without being spammy.
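As a rough way to check your own copy, here is a small Python sketch that counts which of the related phrases from the lists above actually appear on a page. This is a simple phrase-matching check, not LDA and not Google's algorithm. The function names and the "most matches wins" rule are assumptions made for illustration.

```python
import re

# Related phrases taken from the two lists above.
TOPIC_TERMS = {
    "guitars": ["strings", "frets", "neck", "pickups", "whammy bar"],
    "nottingham": ["river trent", "robin hood", "sherwood forest",
                   "galleries of justice", "lace"],
}


def topic_coverage(copy):
    """Return, per topic, the related phrases found in the copy (whole words only)."""
    tokens = re.findall(r"[a-z0-9']+", copy.lower())
    # Pad with spaces so "lace" does not match inside "necklace".
    padded = " " + " ".join(tokens) + " "
    return {
        topic: [term for term in terms if f" {term} " in padded]
        for topic, terms in TOPIC_TERMS.items()
    }


def likely_topic(copy):
    """Pick the topic with the most related phrases present (an assumed rule)."""
    coverage = topic_coverage(copy)
    return max(coverage, key=lambda topic: len(coverage[topic]))


if __name__ == "__main__":
    sample = "The whammy bar sits near the pickups, and the frets run along the neck."
    print(topic_coverage(sample))
    print(likely_topic(sample))
```

A phrase that is missing from the result is a candidate to work into the copy, provided it still reads naturally.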

Topic modelling is also good for content because you can avoid keyword stuffing by adding other variations. Let’s say you are targeting the term “cheap holidays”. Other terms that would work around this include:

  • affordable holidays
  • low cost holidays

You can find quick alternatives by putting the tilde (~) symbol in front of a term in a Google search, which asks Google to include synonyms for that term.

The theory put to the test…

Now, I’ve tested this theory on a two-word key term with the following stats:

  • 67 million results in Google
  • Estimated 20,000 visits per month
  • Estimated AdWords CPC of £6.75

I took a page of content and made it more targeted using the above theory. By the first time Google re-cached the page, it had jumped from page five to page two of the results, with no other on-page or off-page activity. I also checked the backlink profile before and after the change, and no new backlinks were gained in that time.

Findings from the test…

Here is a breakdown of the page in question:

  • The length of the page was 700 words
  • The primary keyword was mentioned 4 times within the content
  • There were 3 variations of the primary term evenly distributed across the content
  • 1 variation was included in the META title and description
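If you want to produce the same kind of breakdown for your own page, a minimal Python sketch along these lines would do it. The function names and the "share of copy" measure are assumptions made for illustration. The sample text is invented, not the page from the test.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_phrase(words, phrase):
    """Count whole-word occurrences of a phrase in a list of word tokens."""
    target = tokenize(phrase)
    if not target:
        return 0
    n = len(target)
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))


def page_stats(copy, primary, variations):
    """Word count, primary-term mentions, variation mentions, and primary share."""
    words = tokenize(copy)
    mentions = count_phrase(words, primary)
    # Share of the copy taken up by the primary term (mentions x words per mention).
    share = mentions * len(tokenize(primary)) / len(words) if words else 0.0
    return {
        "word_count": len(words),
        "primary_mentions": mentions,
        "variation_mentions": {v: count_phrase(words, v) for v in variations},
        "primary_share": share,
    }


if __name__ == "__main__":
    sample = ("Cheap holidays are popular. Find affordable holidays and "
              "low cost holidays, or more cheap holidays.")
    print(page_stats(sample, "cheap holidays",
                     ["affordable holidays", "low cost holidays"]))
```

On the figures above, a two-word term mentioned 4 times in 700 words accounts for 8 words, or roughly 1.1% of the copy.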

And finally…

Whilst this theory may seem complicated, following this method can be a great way to write content which reads well and is focused on the user, while also being very targeted and search engine friendly.

This post was written by Dave, a digital marketer specialising in SEO and PPC, who can be followed on Twitter, Facebook, LinkedIn and Google+.
