Ann Yaroshenko 2 September 2019


What’s a Canonical URL? When and How to Use It

When a search bot crawls your website and finds similar content on multiple URLs, it doesn’t know which version to treat as the original. In most cases, the bot trusts the clues you give it (unless you are trying to manipulate search results). So the game plan is to specify which pages are original and which are near-duplicates of them. Let's find the best way to do it.

Are you sure you don’t have duplicates?

A canonical URL is the URL that Google treats as the most representative one among a set of duplicate URLs on a website. You may think: I don't copy any URLs, so there is nothing to worry about. In fact, duplicates can be generated automatically.
For instance, search crawlers might be able to reach your landing page in different ways:

HTTP and HTTPS protocols:

- http://www.yourwebsite.com
- https://www.yourwebsite.com

WWW and non-WWW:

- http://example.com
- http://www.example.com/

Which of these paths to your website is preferable? Choose one and don’t forget to tell Google about your choice.
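To make the idea concrete, here is a minimal Python sketch of how the protocol and WWW variants above collapse onto a single preferred URL. It assumes, purely for illustration, that HTTPS plus www (https://www.example.com) is the chosen version; the function name `to_preferred` is hypothetical. On a real site you would communicate that choice to Google with redirects or canonical tags; the sketch only shows which URLs count as "the same page".

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for illustration: HTTPS + www is the preferred version.
SITE_DOMAIN = "example.com"
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def to_preferred(url: str) -> str:
    """Map HTTP/HTTPS and www/non-www variants of a URL to one preferred form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare = host[4:] if host.startswith("www.") else host
    if bare != SITE_DOMAIN:
        return url  # not our site: leave it untouched
    path = parts.path or "/"  # "" and "/" are the same home page
    # Fragments (#...) are never sent to the server, so drop them.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))


variants = [
    "http://www.example.com",
    "https://www.example.com",
    "http://example.com",
    "http://www.example.com/",
]
print({to_preferred(u) for u in variants})  # {'https://www.example.com/'}
```

All four entry points end up as one URL, which is exactly the single version you want Google to index.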

Let’s move on. E-commerce sites often provide different paths to similar content: URL parameters for sizes, colors, brands, etc. can generate thousands of duplicate pages.

Of course, you don’t create duplicates deliberately, but parameters can create them automatically. Large-scale duplication may dilute your ranking signals across many URLs. Moreover, even if your content does rank, Googlebot may pick the wrong URL as the canonical one. Canonicalization helps you control duplicate content.
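The parameter problem can be sketched the same way. This snippet is an illustration only: it assumes that `color`, `size` and `brand` are pure filter parameters that never change the core content, and the `FILTER_PARAMS` set and `canonical_for` function are hypothetical names. Under that assumption, every filtered variant of a category page maps back to one canonical URL, while parameters that do change the content (such as a page number) are kept.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that only filter the same listing.
FILTER_PARAMS = {"color", "size", "brand"}


def canonical_for(url: str) -> str:
    """Drop filter parameters so filtered variants share one canonical URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in FILTER_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


urls = [
    "https://www.example.com/shoes?color=red",
    "https://www.example.com/shoes?size=42&color=red",
    "https://www.example.com/shoes?brand=acme&size=42",
]
for u in urls:
    print(canonical_for(u))  # https://www.example.com/shoes each time
```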

You can canonicalize URLs in several ways. We will cover the pros, cons, and specifics of each method below, but first let’s go through the general rules that apply whichever method you choose.

Canonicalization: General DON’Ts

Don’t mark several URLs as canonical for the same content. Suppose you specify page A as the canonical version of your content with a rel=canonical <link> tag in the HTML and also mark page B as canonical in the sitemap. Oops, you’ve just confused Googlebot, and it hates to be confused. Canonicalize your content carefully and always choose only one URL as the original.
Don’t rely on the rel=canonical <link> tag or HTTP header alone for category filter pages
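One way to catch this mistake before Googlebot does is to compare the canonical URL that each signal declares for the same piece of content. The sketch below assumes you have already collected those declarations (for example, from your page templates and your sitemap generator) into a list of tuples; the data layout, the content identifiers and the `find_conflicts` name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical input: (content item, signal source, URL that source marks as canonical).
declarations = [
    ("red-shoes", "link tag", "https://www.example.com/shoes"),
    ("red-shoes", "sitemap", "https://www.example.com/shoes-red"),
    ("leather-bags", "link tag", "https://www.example.com/bags"),
    ("leather-bags", "sitemap", "https://www.example.com/bags"),
]


def find_conflicts(decls):
    """Return content items whose signals disagree on the canonical URL."""
    canonicals = defaultdict(set)
    for item, _source, url in decls:
        canonicals[item].add(url)
    return {item: sorted(urls) for item, urls in canonicals.items() if len(urls) > 1}


print(find_conflicts(declarations))
# {'red-shoes': ['https://www.example.com/shoes', 'https://www.example.com/shoes-red']}
```

Here the link tag and the sitemap point at different URLs for the red shoes, which is exactly the "page A versus page B" situation described above.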

With the many possible filter combinations on an e-commerce website (color, size, brand, etc.) for each item, a search bot could waste a lot of your crawl budget crawling in and out of navigational filters.

While the rel=canonical element will help you avoid duplicate content issues, it won’t save you any crawl budget: the bot still has to fetch each filtered URL to see the tag. Furthermore, search engines treat canonical tags as hints and can ignore them, so you should combine this approach with another one to direct search engines toward the preferred version of each page.
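A quick back-of-the-envelope calculation shows why filters eat crawl budget. The numbers are made up for illustration: one category with 8 colors, 10 sizes and 25 brands, where each filter is either unset or set to a single value. Every combination is a distinct crawlable URL, while only the unfiltered category page is the one you want indexed.

```python
from math import prod

# Illustrative (made-up) number of values per filter in one category.
filters = {"color": 8, "size": 10, "brand": 25}

# Each filter is either unset or set to one of its values: (n + 1) choices.
combinations = prod(n + 1 for n in filters.values())
duplicates = combinations - 1  # everything except the unfiltered page

print(combinations, duplicates)  # 2574 2573
```

That is for a single category, and before counting the same filters appearing in a different order in the query string, which multiplies the number of URLs further.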

Don’t use robots.txt directives for canonicalization. Robots.txt is a roadmap for Google: it shows which URLs should be crawled and which should be ignored. However, it can’t be used for canonicalization, as John Mueller confirmed on Twitter.
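Python’s standard `urllib.robotparser` module illustrates what robots.txt can and cannot express: it only answers "may this user agent fetch this URL?". The rules below are a hypothetical example that blocks printer-friendly duplicates under /print/. There is no directive for naming a preferred URL, and a blocked duplicate is never fetched, so Googlebot cannot even see a rel=canonical tag placed on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks printer-friendly duplicate pages.
rules = """
User-agent: *
Disallow: /print/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# robots.txt only answers yes/no questions about crawling.
print(parser.can_fetch("Googlebot", "https://www.example.com/shoes"))        # True
print(parser.can_fetch("Googlebot", "https://www.example.com/print/shoes"))  # False
```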
Don't block duplicates with a noindex <meta> tag or HTTP header. This directive keeps a URL out of search results; it is not a way to canonicalize.

(Source: English Google Webmaster Central office-hours hangout, 00:20:50.)

Google recommends using rel=canonical pointing to the preferred version of the URL, so that the signals for both versions can be combined rather than the signals from the noindexed page being dropped.

Don’t link to duplicate URLs within your website. If you specify some version of your content as canonical, you consider it the original, most relevant and most profitable version. Building your internal linking structure on duplicate URLs sends Google the opposite signal. That confuses Google and, as we found out above, Google hates to be confused.
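A simple audit can flag internal links that point at duplicates instead of canonicals. This sketch uses Python’s standard `html.parser` module; the `CANONICAL_OF` mapping from duplicate URLs to their canonical versions is a hypothetical input you would build from your own canonicalization rules, and the check compares absolute URLs only (relative links are not resolved).

```python
from html.parser import HTMLParser

# Hypothetical mapping: duplicate URL -> its canonical URL.
CANONICAL_OF = {
    "http://www.example.com/shoes": "https://www.example.com/shoes",
    "https://www.example.com/shoes?color=red": "https://www.example.com/shoes",
}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_duplicates(html: str):
    """Return (linked URL, canonical URL it should point to) pairs."""
    collector = LinkCollector()
    collector.feed(html)
    return [(h, CANONICAL_OF[h]) for h in collector.hrefs if h in CANONICAL_OF]


page = """
<a href="https://www.example.com/shoes">All shoes</a>
<a href="http://www.example.com/shoes">Shoes (old link)</a>
"""
for found, fix in links_to_duplicates(page):
    print(f"{found} -> link to {fix} instead")
```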
Don't submit duplicates to the URL removal tool in Google Search Console. This tool temporarily hides ALL versions of the URL, duplicate and canonical alike, from search results.

By blocking a page this way, you stop Google from showing that URL in search results. In other words, users won’t be able to find either the canonical or the duplicate version of the URL in search.

Don’t use an HTTP URL as canonical when an HTTPS version exists. Google prefers HTTPS over HTTP URLs as canonical, except when there are conflicting signals such as those described in Google’s guidelines.

So, now we know which canonicalization methods to avoid. Let’s talk about how to show your preferred content to Google correctly.

Proven ways to canonicalize your URL

Google lists four techniques for showing it your preferred version in search results.

1. Rel=canonical <link> tag

Setting this tag is similar in concept to a 301 redirect, only without actually redirecting. With this tag, you can point as many duplicate URLs as you want to one canonical URL.
Implementation is simple: in the <head> section of the HTML code of each non-canonical (duplicate) page, insert a <link> tag with the attribute rel="canonical" whose href points to the canonical URL.

Note that this tag only works for HTML URLs. Read the detailed guide on how to work with rel=canonical <link> tag like a pro for further information.
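For reference, the tag itself looks like <link rel="canonical" href="https://www.example.com/shoes">. The Python sketch below goes the other way and reads the declared canonical from a page’s HTML with the standard `html.parser` module, which can be handy for spot-checking templates. The URLs are placeholders, and the sketch deliberately treats a page with zero or several canonical tags as having no usable canonical; it does not check that the tag sits inside <head>.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of every <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel may hold several space-separated values.
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attrs.get("href"):
            self.canonicals.append(attrs["href"])


def declared_canonical(html: str):
    """Return the single declared canonical URL, or None if missing or ambiguous."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if len(finder.canonicals) == 1 else None


duplicate_page = """<html><head>
<link rel="canonical" href="https://www.example.com/shoes">
</head><body>Red shoes</body></html>"""

print(declared_canonical(duplicate_page))  # https://www.example.com/shoes
```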

2. Rel=canonical HTTP header

To indicate the canonical URL for non-HTML documents such as PDF files, add a rel="canonical" HTTP header (rather than a rel=canonical <link> tag).

For example, you may offer information about prices as both an HTML page and as a downloadable file in PDF:

http://www.example.com/price.html

http://www.example.com/price.pdf

In this example, since both files serve the same information, the server can send a rel="canonical" HTTP header with the PDF that points to the HTML page.
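The header uses the standard Link header syntax, for example Link: <http://www.example.com/price.html>; rel="canonical". Below is a minimal sketch of that idea as a plain WSGI application using only the Python standard library. It assumes the HTML page is the version you want indexed, and `load_pdf` is a hypothetical placeholder for however your server reads the file. In practice this header is often set in the web server or CDN configuration instead.

```python
# Hypothetical map: PDF path -> canonical HTML URL with the same content.
CANONICAL_FOR_PDF = {
    "/price.pdf": "http://www.example.com/price.html",
}


def load_pdf(path):
    """Hypothetical placeholder: return the PDF file's bytes."""
    return b"%PDF-1.4 placeholder"


def app(environ, start_response):
    """Serve PDFs with a Link header naming their canonical HTML page."""
    path = environ.get("PATH_INFO", "/")
    if path not in CANONICAL_FOR_PDF:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    body = load_pdf(path)
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        # Point search engines at the HTML version of the same content.
        ("Link", f'<{CANONICAL_FOR_PDF[path]}>; rel="canonical"'),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, you can pass `app` to `wsgiref.simple_server.make_server` and request /price.pdf.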

3. 301 redirect

A redirect sends both visitors and Google to a different page from the one they originally requested. A 301 status code is a permanent redirect, which passes most of the link equity (ranking power) to the redirected page.

You should use 301 redirect if you are moving your website to a new location, changing your URLs to a new structure or if you have expired content on your website such as old products or news items.
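Here is a comparable minimal WSGI sketch of a permanent redirect. The `MOVED` mapping and its URLs are hypothetical; many sites configure 301s in the web server (for example, Apache or nginx) rather than in application code, but the HTTP response is the same: a 301 status plus a Location header naming the new URL.

```python
# Hypothetical map of old paths to their new permanent locations.
MOVED = {
    "/old-product": "https://www.example.com/new-product",
    "/2019/news-item": "https://www.example.com/news/news-item",
}


def redirect_app(environ, start_response):
    """Answer moved paths with a 301; serve everything else normally."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        # 301 = moved permanently; browsers and crawlers follow Location.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Current page"]
```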

4. XML sitemap

A sitemap is a file that lists the pages on your website for Google. Google's John Mueller said in a video hangout, at the 3:16 mark, that the URLs in your XML sitemaps are often used to determine your canonical URLs.

(Source: English Google Webmaster Central office-hours hangout, 3:16 mark.)

“...we use site maps URLs as a part of trying to understand which URL should be the canonical for a piece of content.”

Note that Google still has to determine which duplicates belong to each canonical URL you declare in a sitemap. Also, sitemaps are a weaker signal to Google than the rel=canonical method.
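Because the sitemap is a supporting signal, it should list canonical URLs only. The sketch below builds a minimal sitemap with Python’s `xml.etree.ElementTree` using the sitemaps.org namespace. The `CANONICAL_OF` mapping and the URLs are hypothetical; the point is that duplicates are replaced by their canonical URL before anything is written, so the sitemap never contradicts your rel=canonical tags.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical mapping from duplicate URLs to their canonical versions.
CANONICAL_OF = {
    "https://www.example.com/shoes?color=red": "https://www.example.com/shoes",
    "https://www.example.com/price.pdf": "https://www.example.com/price.html",
}


def build_sitemap(pages):
    """Build sitemap XML that lists each canonical URL exactly once."""
    # Replace duplicates with their canonical URL; dict.fromkeys dedupes in order.
    canonicals = dict.fromkeys(CANONICAL_OF.get(p, p) for p in pages)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in canonicals:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?color=red",
    "https://www.example.com/price.pdf",
]
print(build_sitemap(pages))
```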

Wrapping up

So, canonicalization is a way to tell Google which pages are preferable for indexation and ranking, and which are not. Unless you want to confuse bots, respect these Don’ts:

- Don’t mark several URLs as canonical for the same content.
- Don’t use robots.txt for canonicalization.
- Don’t block duplicates with a noindex <meta> tag or HTTP header.
- Don’t link to duplicate URLs within your website.
- Don’t submit duplicates to the URL removal tool in Google Search Console (GSC).
- Don’t use HTTP URLs as canonical when an HTTPS version exists.

Instead of the unreliable methods mentioned above, choose one of the four proven ways to canonicalize your URLs:

- 301 redirect: when you are moving content permanently to a new location;
- XML sitemap: as an additional way to show Google your preferred content (you still have to mark duplicates with rel=canonical);
- Rel=canonical <link> tag: when you need to point any number of duplicate HTML pages to one canonical URL;
- Rel=canonical HTTP header: when you need to canonicalize non-HTML files, such as PDFs.

Keep learning

We have also gathered a list of relevant guides for diving deeper into canonicalization methods:

Google: Tell Google about your duplicate content
JetOctopus: Work with Rel=canonical Like a Pro
Moz: How To: Advanced rel="canonical" HTTP Headers
Google: Block crawling of parameterized duplicate content
Yoast: Canonical. The ultimate guide

If you have any questions about technical SEO in general, and canonical URLs in particular, feel free to ask.


Estonia

A passion for proven technical and on-page SEO techniques and a data-driven approach to creating content.
