This is the fifth article in a series on the perils of Duplicate Content. We have previously covered the problems of repeating Title tags and Meta tags across your site and using standard product descriptions.
I know what you’re thinking:
“Sure, first day back from a long weekend, and the guy’s making up words that are impossible to say!”
I figure I just lost 20% of the audience with that one word. Oh well, it’s a fact, it’s a problem, and it needs to be dealt with. When we at Strider are reviewing florist websites, the vast majority have canonicalization issues, and in many cases it’s hurting them.
So what is this tongue-twisting peril? The term “canonicalization” refers to having a canonical (Def: authoritative and confirmed, trusted, true) link to each page on your site. Having secondary links that display the same page makes the page appear duplicated. Search engines treat each URL as a separate page, and when they find identical content at several addresses they will typically store only one version. The “link juice” flowing to the other URLs will be lost.
Using our FlowerChat.com site, we’ll list several URLs that will all return the forum index page.
All of the URLs above will show you the same page. Since they are six different addresses, a search engine will view them as six different pages. We need to select one URL format as the “canon” and prevent the rest from being used.
Session IDs & URL Parameters
But wait – it gets better! Since forums can have different styles, you may well see a URL like this:
So now, in addition to the six duplicate URLs above, we have another six options, all of which will still serve the same page contents. This is further multiplied by the number of style options. If FlowerChat were to have three active style options, we would have at least 24 different URLs pointing to the same page: the 6 originals plus 6 more for each of the 3 styles (6 + 6 × 3 = 24)!
Many Content Management Systems (CMS – some well known ones are Joomla, WordPress, Drupal) are guilty of producing URL structures that lead to duplicate content. They do this through the use of URL parameters that appear at the end of the URL, like the styleid example above.
Some popular florist shopping carts are very dependent on URL variables. You may have your home page at www.myflowershop.com/default.aspx. To display the other pages of the site, the website software will add parameters to the URL to specify what page to display.
The parameters may be used to specify category, page, language, currency …
While Google has stated they will index up to two URL parameters, they have been known to index more. It’s important to note that URL parameters can be in any order – it doesn’t matter to the website! However, changing the order means the URL is different, and duplicate content is created again!
www.myflowershop.com/default.aspx?cur=USD&l=en&p=2&c=3 will serve exactly the same content as the example above.
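To make the parameter-order problem concrete, here is a minimal Python sketch that puts query parameters into one fixed (alphabetical) order, so two differently ordered URLs collapse to a single form. The function name `sort_query_params` and the `http://` prefix on the URLs are assumptions for illustration, not something your shopping cart provides.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def sort_query_params(url: str) -> str:
    """Return the URL with its query parameters sorted alphabetically."""
    parts = urlsplit(url)
    # parse_qsl keeps the parameters as (name, value) pairs in their original order.
    params = parse_qsl(parts.query, keep_blank_values=True)
    sorted_query = urlencode(sorted(params))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, sorted_query, parts.fragment)
    )


# Two orderings of the same four parameters (the second ordering is hypothetical).
first = "http://www.myflowershop.com/default.aspx?cur=USD&l=en&p=2&c=3"
second = "http://www.myflowershop.com/default.aspx?c=3&p=2&l=en&cur=USD"

# Both normalize to the same address.
print(sort_query_params(first) == sort_query_params(second))
```

Alphabetical order is just one possible convention. What matters is that every link on your site uses the same order every time.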
One of the worst offenders for creating duplicate content is the Session ID. This is used by some (mostly outdated) shopping cart platforms and CMSs to track an individual user. A Session ID is a unique string of characters assigned to the end of a URL. If Cathy, Rick and Jon are all viewing the same website they will each have a different Session ID:
- Cathy: www.myflowershop.com?scid=d351g999
- Rick: www.myflowershop.com?scid=b10gvv1z
- Jon: www.myflowershop.com?scid=c0k1ee0g
This means the search engine will see different URLs every time they visit, and in every link to the site. Using this example from a florist in California we see this long URL:
The section of bold text is the Session ID. Visiting the same site a bit later I am given a different Session ID of: osCsid=ba0237201e14617e57c747e860d73b33. Not only is that ugly and a source of duplicate content, but try giving that URL to a customer over the phone 🙂
Note: This site is also a victim of another sign of web provider laziness: serving the contents of the site on the provider’s domain. But we’ll cover this a bit later.
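If you want to see how much of the mess comes from the Session ID alone, a rough Python sketch like the one below strips it from a URL. The parameter names `scid` and `osCsid` come from the examples in this article; the function name, the `http://` prefix, the `example.com` address and the `cPath` parameter are assumptions made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Session parameter names seen in this article, compared in lowercase.
SESSION_PARAMS = {"scid", "oscsid"}


def strip_session_id(url: str) -> str:
    """Return the URL without any known session ID parameter."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


visitors = [
    "http://www.myflowershop.com/?scid=d351g999",  # Cathy
    "http://www.myflowershop.com/?scid=b10gvv1z",  # Rick
    "http://www.myflowershop.com/?scid=c0k1ee0g",  # Jon
]

# Three different addresses collapse into one once the Session ID is gone.
print({strip_session_id(url) for url in visitors})
```

This only shows the size of the problem. It does not fix the site, because the platform will keep handing out new Session IDs until it is switched to cookies.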
How do you refer to your own pages? While there are a number of solutions available to help prevent duplicate content, it’s important to remain consistent with your chosen (canonical) URL format. If you have chosen www.myflowershop.com/ as your canonical URL for your homepage, always use that exact link from within your own site.
After all, if you can’t be clear about the correct address of your site, how can you expect the search engines to figure it out?
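For those comfortable with a little scripting, here is a small Python sketch of how you might check one page for inconsistent internal links. It assumes `www.myflowershop.com` is the chosen canonical host and flags internal links that drop the www or that point at an /index or /default document with no parameters. The class and function names are hypothetical, and a real audit would need to crawl every page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

CANONICAL_HOST = "www.myflowershop.com"  # assumed canonical choice
DEFAULT_DOC_NAMES = ("index", "default")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def bare_host(host: str) -> str:
    """Lowercase a host name and drop a leading 'www.'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def non_canonical_links(html: str, page_url: str) -> list:
    """Return the internal hrefs that do not use the canonical format."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        parts = urlsplit(urljoin(page_url, href))
        if bare_host(parts.netloc) != bare_host(CANONICAL_HOST):
            continue  # external link, not our concern here
        wrong_host = parts.netloc.lower() != CANONICAL_HOST
        last_segment = parts.path.rsplit("/", 1)[-1].lower()
        # Only flag /index or /default when no parameters select another page.
        default_doc = (
            last_segment.split(".")[0] in DEFAULT_DOC_NAMES and not parts.query
        )
        if wrong_host or default_doc:
            flagged.append(href)
    return flagged


sample_html = """
<a href="/">Home</a>
<a href="/index.html">Home</a>
<a href="http://myflowershop.com/roses">Roses</a>
<a href="http://www.myflowershop.com/roses">Roses</a>
<a href="http://example.com/index.html">Elsewhere</a>
"""
print(non_canonical_links(sample_html, "http://www.myflowershop.com/about"))
```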
There are a number of solutions to prevent indexing of duplicate content on your site. Unfortunately, many require some knowledge of programming and/or access to your server. If you don’t have this knowledge or access, ask your website host to fix it. If they won’t … then find a new host.
- Session IDs: Don’t use them. Period. Use a cookie. If you can’t make the switch, find a new web host or at least a new website platform. Don’t ignore this.
- Internal Linking: This is one thing within your control. Start by checking the links in your site’s navigation: does “home” point to your domain name alone, or the domain name with /index or /default after it? When you’re linking to other pages from within the content of a page, make sure to use the full and proper URL for the target page.
- WWW vs. No WWW: There is no definitive answer from an SEO perspective as to which is better. It’s a matter of personal preference. Pick one. Stick to it.
- 301 Redirects: If your page can be accessed by more than one URL, create 301 (never 302!) redirects on your server to point all versions of a URL to the canonical version. This definitely requires a technical explanation that is beyond the scope of this article. Ask your host or provider. If they won’t fix it, lose them and upgrade.
- Search Engine Friendly URLs: Replace your spaghetti URLs filled with parameters and values with search (and people) friendly URLs. Some software platforms have an integrated option for this. Others will have plug-ins that you can add. Some instances may require the manual use of mod_rewrite statements on your server – again, outside the scope of this article. Ask your provider.
- Noindex, Follow: Pages that will display duplicate contents by nature (e.g., search results on your site) should use the Meta Robots tag to instruct the search engines to follow all links from the page, but not actually index the results page.
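To give your provider something concrete to discuss, here is a rough Python sketch of the decision behind a 301 redirect: given a requested URL, work out the canonical version and say whether a redirect is needed. It assumes the www form is the chosen canon and that `index.html`, `index.php` and `default.aspx` are the default documents. Those names, and the function `redirect_for`, are illustrative assumptions, and on a real server this logic would normally live in the server configuration rather than in a script. The Meta Robots tag for the Noindex, Follow case is included as a plain string.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.myflowershop.com"  # assumed canonical choice
BARE_HOST = "myflowershop.com"
# Hypothetical default document names; check what your own server uses.
DEFAULT_DOCS = {"index.html", "index.php", "default.aspx"}

# Standard Meta Robots tag for pages such as on-site search results.
NOINDEX_FOLLOW_TAG = '<meta name="robots" content="noindex, follow">'


def redirect_for(url: str):
    """Return (301, canonical_url) if a redirect is needed, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"

    # WWW vs. no WWW: send the bare domain to the chosen www form.
    if host == BARE_HOST:
        host = CANONICAL_HOST

    # Drop a trailing default document when no parameters select another page.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCS and not parts.query:
        path = head + "/"

    canonical = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    if canonical == url:
        return None  # already canonical, serve the page normally
    return (301, canonical)


print(redirect_for("http://myflowershop.com/index.html"))
print(redirect_for("http://www.myflowershop.com/"))
```

Every variant ends up at one address in a single hop, which is what passes the link value along to the canonical URL.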
Duplicate content serves to dilute the power of incoming links to your site, and an abundance of duplicate content on a low-authority site (98% of florist sites are not authority sites) may cause a search engine to determine the entire domain is of little or no value.
In our next article we’ll look at a few ways to check your site to identify duplicate content. Following that, we’ll begin to look at ways to create unique content on your site.