
SEO Quality Control: How to Use Landing Page Hashes

Google can stop indexing your pages if they become too similar to each other. This blog post is a guide to preventing this by using landing page hashes.

This blog post explains how to optimize automatically created pages for search engines by using landing page hashes. Automated SEO is on the rise, but did you know that Google can stop indexing, and even crawling, your pages if they become too similar to each other? At a recent meetup of Growth Hackers of Vienna (GHV on Facebook), search engine expert Franz Enzenhofer explained how landing page hashes can be used to ensure that all of your pages are unique. I have summarized his insights for you in the following article.

Why do I need SEO Quality Control?

First things first. Before we look into what landing page hashes are and why and how you should use them, let's consider why quality control is essential. When Google crawls and indexes pages on the web, its highest concern is to deliver good value to its users. As the crawler is “only” a machine, its ability to judge a page’s quality is limited and relies on certain standardized indicators. One of them is a page’s uniqueness: if a website repeats the same pages and content over and over, it does not deliver any additional value to users. While this is typically not an issue for manually created pages, it can be a severe problem for:

  • Marketplaces
  • Aggregators
  • Platforms of all kinds
  • Large e-commerce sites

For example, this issue, also referred to as “self-spamming”, occurs when one shop category largely coincides with another, so that their corresponding result pages overlap too. If Google recognizes this, it first stops indexing the pages in question or, worse, refuses to crawl them at all, which often defeats the very SEO purpose for which these pages were created in the first place.
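To make “overlap” concrete, here is a minimal sketch that assumes each result page can be reduced to the set of product IDs it lists. It measures the overlap of two categories with the Jaccard index (items shared by both pages divided by all distinct items on either page), a standard set-similarity measure chosen here for illustration. The category names and IDs are hypothetical.

```python
def jaccard_overlap(items_a: set[str], items_b: set[str]) -> float:
    """Share of items two pages have in common (0.0 = disjoint, 1.0 = identical)."""
    if not items_a and not items_b:
        return 1.0  # two empty pages are treated as identical
    return len(items_a & items_b) / len(items_a | items_b)


# Hypothetical product IDs shown on two overlapping shop categories.
running_shoes = {"p1", "p2", "p3", "p4", "p5"}
trail_shoes = {"p2", "p3", "p4", "p5", "p6"}

print(f"{jaccard_overlap(running_shoes, trail_shoes):.2f}")  # → 0.67
```

Four of the six distinct products appear in both categories, so two thirds of the combined listing is duplicated, which is the kind of overlap that leads to self-spamming.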

What are Landing Page Hashes?

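Now let’s look into landing page hashes. In general, a hash function takes some input and calculates a fixed-size value from it; a good hash function makes it very unlikely that two different inputs produce the same value. In the case of landing page hashes, the input is the content of a certain web page or, more practically, the unique IDs of the items listed on it. How the individual items are sorted on a page matters neither to Google’s crawler nor to the landing page hash: the IDs are treated as a set (for example, by sorting them before hashing), so the same items in a different order produce the same hash.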

A landing page hash is a sort-independent, comparable identifier of the visible items on a list landing page.
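A minimal sketch of the “sort-independent” property, assuming each page can be reduced to a list of item IDs (the IDs and the function name are hypothetical). Sorting and de-duplicating the IDs before hashing makes the result independent of the on-page order. SHA-256 is a cryptographic hash, not a locality-sensitive one, so this version only detects pages with exactly the same items; near-duplicates need the locality-sensitive hashing discussed below.

```python
import hashlib


def landing_page_hash(item_ids: list[str]) -> str:
    """Order-independent fingerprint of the items listed on a page.

    Sorting (and de-duplicating) the IDs first means the same items in a
    different order produce the same hash. Assumes IDs contain no newlines.
    Being an exact hash, it only flags pages whose item sets are identical.
    """
    canonical = "\n".join(sorted(set(item_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


page_a = ["sku-17", "sku-03", "sku-42"]
page_b = ["sku-42", "sku-17", "sku-03"]  # same items, different sort order
page_c = ["sku-42", "sku-17", "sku-99"]  # one item differs

print(landing_page_hash(page_a) == landing_page_hash(page_b))  # → True
print(landing_page_hash(page_a) == landing_page_hash(page_c))  # → False
```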

How do you use this knowledge?

If the number of pages to compare is limited, for example in the case of category results or list pages, you can resort to out-of-the-box tools that calculate page similarity. For instance, if you are concerned about two shop categories overlapping, you can use Small SEO Tools’ Page Comparison: simply paste in the URLs you want to compare and receive a similarity score. If the score reveals a lot of similarity, you can readjust your categories to reduce the overlap and thus improve SEO.

If you have too many pages for these tools to handle, you (or your developer) will have to set up a hash comparison in your website’s backend. There, comparing landing page hashes is a relatively simple, quick and scalable way to check automatically how similar each of your pages is to all the others, a task that would otherwise take an eternity for large platforms with hundreds or thousands of pages.

This can be done the following way:

  1. Assign each page element (for example, each product in a web shop) a unique ID and collect the IDs of the elements shown on each page.
  2. Use these IDs as input to a locality-sensitive hashing (LSH) function. Only locality-sensitive hashing preserves your pages’ similarity in a meaningful way; other hash functions produce completely different values even for tiny differences in the input. The specific hashing implementation will depend on the programming language used in your backend (for example PHP or Node.js).
  3. Compare the hash of each page with the hashes of all other pages, for example by using the Levenshtein distance as a measure of similarity.
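The three steps above could look roughly like this in Python; the article mentions PHP or Node.js backends, and the same logic ports directly. This is a minimal sketch under stated assumptions: each page is given as a list of item IDs, SimHash serves as the locality-sensitive hash, and, as a swapped-in choice, the fixed-length fingerprints are compared with Hamming distance (the number of differing bits) rather than Levenshtein distance. All function names, URLs and IDs are hypothetical.

```python
import hashlib

HASH_BITS = 64


def _id_bits(item_id: str) -> int:
    """Stable 64-bit integer for an item ID (first 8 bytes of its SHA-256)."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(item_ids: list[str]) -> int:
    """SimHash: similar sets of IDs yield fingerprints differing in few bits."""
    votes = [0] * HASH_BITS
    for item_id in set(item_ids):  # a set, so the sort order on the page is irrelevant
        bits = _id_bits(item_id)
        for i in range(HASH_BITS):
            votes[i] += 1 if (bits >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:  # bit i is set when most items "vote" for it
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints (0 = identical)."""
    return bin(a ^ b).count("1")


def compare_all(pages: dict[str, list[str]]) -> list[tuple[str, str, int]]:
    """Compare every page's fingerprint with every other page's, closest pairs first."""
    fingerprints = {url: simhash(ids) for url, ids in pages.items()}
    urls = sorted(fingerprints)
    pairs = [
        (a, b, hamming_distance(fingerprints[a], fingerprints[b]))
        for i, a in enumerate(urls)
        for b in urls[i + 1:]
    ]
    return sorted(pairs, key=lambda pair: pair[2])


# Hypothetical pages: two shoe categories share 19 of their 20 products.
pages = {
    "/shoes/running": [f"p{n}" for n in range(1, 21)],
    "/shoes/trail": [f"p{n}" for n in range(2, 22)],
    "/jackets": [f"j{n}" for n in range(1, 21)],
}
for a, b, distance in compare_all(pages):
    print(a, b, distance)
```

The two shoe categories should end up at the top of the list with a small distance, while unrelated pages typically differ in around half of the 64 bits. Comparing all pairs grows quadratically with the number of pages, which is still manageable for a nightly job over a few thousand pages.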

How to leverage these insights:

Equipped with the insights of your pages’ similarity, you can:

  • Rework your whole website’s hierarchy and SEO strategy to produce more unique pages
  • Adjust your list pages and (shop) categories to reduce their overlap with others
  • Generate an XML sitemap that lists only the more relevant pages for Google to index

For example, you can adapt the script that generates your XML sitemaps (most likely overnight) so that it also produces a daily report on identical and nearly identical pages. If several pages are very similar, submit only the most relevant one to search engines via the sitemap, or mark the less relevant ones as “noindex”.
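A minimal sketch of that nightly step. It assumes the comparison has already produced (page, page, distance) tuples and that you have some per-page relevance score, such as visits or revenue. The `relevance` values, the threshold `max_distance=3`, the function names and the URLs are all hypothetical. Pages are visited from most to least relevant; a page is kept unless it nearly duplicates a page already kept. Only kept pages go into the sitemap, which uses the standard sitemaps.org `urlset` format, and the rest form the noindex report.

```python
from collections import defaultdict
from xml.sax.saxutils import escape


def select_pages(
    pairs: list[tuple[str, str, int]],
    relevance: dict[str, float],
    max_distance: int = 3,
) -> tuple[list[str], list[str]]:
    """Split pages into (keep, noindex) so that no two kept pages are near-duplicates."""
    near: dict[str, set[str]] = defaultdict(set)
    for a, b, distance in pairs:
        if distance <= max_distance:  # hypothetical threshold for "nearly identical"
            near[a].add(b)
            near[b].add(a)

    keep: list[str] = []
    noindex: list[str] = []
    # Most relevant first: a page is marked noindex only if it nearly duplicates a kept page.
    for url in sorted(relevance, key=relevance.get, reverse=True):
        if near[url] & set(keep):
            noindex.append(url)
        else:
            keep.append(url)
    return keep, noindex


def sitemap_xml(base_url: str, paths: list[str]) -> str:
    """Minimal XML sitemap listing only the given paths."""
    entries = "\n".join(
        f"  <url><loc>{escape(base_url + path)}</loc></url>" for path in paths
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


# Hypothetical output of the comparison step: (page, page, distance).
pairs = [
    ("/shoes/running", "/shoes/trail", 2),
    ("/jackets", "/shoes/running", 30),
    ("/jackets", "/shoes/trail", 31),
]
# Hypothetical relevance score per page, e.g. monthly visits.
relevance = {"/shoes/running": 120.0, "/shoes/trail": 45.0, "/jackets": 80.0}

keep, noindex = select_pages(pairs, relevance)
print("keep:", keep)        # → keep: ['/shoes/running', '/jackets']
print("noindex:", noindex)  # → noindex: ['/shoes/trail']
sitemap = sitemap_xml("https://www.example.com", keep)
print(sitemap)
```

Visiting pages in order of relevance means that a cluster of near-identical pages is represented by its strongest member. The `noindex` list doubles as the daily report on pages that need attention.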

For more details on using landing page hashes for SEO, take a look at Franz Enzenhofer’s article on Medium.com.

If you liked this blog post, stay ahead of the competition by following Growth Hackers of Vienna on Facebook or joining the LinkedIn Group!
