In the previous posts in this series I’ve focused a lot on the “conceptual” aspects of SEO – the non-technical things that can make a big difference to your SEO efforts. Many of these aspects have other practical and usability benefits.
Over the next few posts I’m turning to some of the more technically-oriented things you can do to optimise for search engines. These posts definitely sway towards the geek end of the spectrum (just a fair warning if that’s not your thing). However, even if you’re in management, it helps to have an overview of such matters, if only for when you’re briefing your tech team.
Today’s post focuses on technical matters that are visible to your participants (i.e. they impact how your users access the site). Future posts will look at some of the behind-the-scenes things you can do to assist search engines.
As before, many of these tips are best practices for other reasons, but they all certainly provide SEO benefits as well. Some techniques will have a bigger impact than others, and how much impact a particular technique may have on rankings is largely unknown (as far as I can tell) as most search engine algorithms are closely guarded secrets. So even if you can’t apply all these techniques, it’s still worth incorporating as many as you can into your site.
Web addresses are technically known as Uniform Resource Locators, or “URLs” for short. URLs uniquely identify an individual “resource” (a web page, image etc.) on the web, which is how a browser is able to display all the content we view each day.
Search engines are affected by URLs in a couple of ways:
- URLs that include query string information (which appears at the end of a URL following a “?”) are ignored by search engines under certain conditions
- Pages that are too deep in a site hierarchy (i.e. greater than 4 levels deep) may also be ignored
- Some search engines will use the words in the URL as part of their index (i.e. to determine relevancy for specific keywords)
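To see where the query string sits in a URL, here’s a small sketch using Python’s standard urllib.parse module (the URL is a hypothetical product-catalogue address, not a real one):

```python
from urllib.parse import urlparse

# A hypothetical product-catalogue URL that uses a query string
url = "http://example.com/catalogue?product=1234"

parts = urlparse(url)
print(parts.path)   # the path portion: /catalogue
print(parts.query)  # the query string after the "?": product=1234
```

The `query` attribute holds everything after the “?” – the part that, under certain conditions, search engines may ignore.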
For these reasons it’s important to get the URLs of your site right.
Some CMSs or systems that query databases (e.g. a product catalogue) will be implemented using the query string (the bit after the “?” in certain URLs) to identify content/pages etc. within a site. Sometimes this query string will contain things like product codes or database record identifiers that don’t contain much meaning.
By default the popular blogging engine WordPress uses this format to access blog posts if you haven’t enabled the clean URL options in your admin settings – for example, “http://zum.io/?p=213” is the “non-clean” URL for a post on this site. The query string – “?p=213” – contains the blog post ID, which WordPress can understand but which isn’t very useful to anyone else.
URLs like this can be problematic for search engines. If at all possible, you should create and enable what are known as “clean” URLs on your site, and you should make sure that your CMS supports this requirement.
Clean URLs don’t have the query string parameter and usually include a human-readable component. The clean URL for the example page mentioned above is “http://zum.io/2009/02/16/exploring-seo-part-4-writing-effective-copy/” which includes the keywords from the title and date information which is useful to both humans and search engines alike.
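As a rough sketch of the difference, Python’s standard urllib.parse module can pull the human-readable keywords out of the clean URL, whereas the non-clean URL yields only an opaque post ID (both URLs are the examples from this post):

```python
from urllib.parse import urlparse

clean = "http://zum.io/2009/02/16/exploring-seo-part-4-writing-effective-copy/"
dirty = "http://zum.io/?p=213"

# The clean URL's last path segment is a slug containing the title keywords
slug = urlparse(clean).path.strip("/").split("/")[-1]
print(slug.split("-"))  # ['exploring', 'seo', 'part', '4', 'writing', 'effective', 'copy']

# The non-clean URL carries only an opaque post ID in its query string
print(urlparse(dirty).query)  # p=213
```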
(As an aside: in WordPress you can do this under “Settings > Permalinks”)
In a previous post I mentioned that getting the information architecture of the site – the structure of information across the site – right was important. This is one place where doing it well helps.
As most websites use section names, or derivatives of them, in URLs, your URLs will naturally contain your trigger words if you’ve thought through your information architecture well.
For example, WWF-Australia has a section called “Our Work” with a sub-section called “Climate Change”. This is represented in a URL as “http://wwf.org.au/ourwork/climatechange/”. You can see the keywords “climate” and “change” there in the URL. So good information architecture makes this process easy ;)
Search engines pay a lot of attention to the title of a page, so it’s important to ensure your titles are keyword-rich and well structured. When I refer to “title” here, I’m talking about the <title> element, which is rendered in the title bar or current tab of the browser.
Some CMSs automatically prepend the title with the site name and section (e.g. “Site name > Section title > Page title”). If yours does, see if your team can flip the order of these elements in the template so that the unique content – the page title – appears at the front (e.g. “Page title / Site name / Section title” or similar). You can see this at work on this blog – the theme is designed to support this requirement.
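For example, a template might render the title element like this (the page and site names here are made up for illustration):

```html
<!-- Unique, keyword-rich page title first; generic site name last -->
<title>Writing effective copy / My Organisation</title>
```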
See the previous post for some tips on creating effective titles. These same tips apply here.
Some web servers can be configured to accept requests at both “<domain-name>.com” and “www.<domain-name>.com” (note the missing “www.” in the first example). Search engines may treat these as different domains (as “www.” is technically a subdomain) – potentially affecting your rankings adversely, as an incoming link to the “www.” URL may be considered distinct from one pointing to the plain domain.
For example: if one site links to “http://www.<domain-name>.com/section/page.html” and another site links to the same page, but without the “www.” – “http://<domain-name>.com/section/page.html” – a search engine would consider that a vote for two separate pages, reducing the ranking of each as a result.
Home page URLs
This also applies to home page URLs. Your server may be configured to respond to requests for the same page using different URLs – for example “www.<domain-name>.com”, “www.<domain-name>.com/” (note the “/”) and “www.<domain-name>.com/index.html” are all considered different URLs by search engines, even though in most web servers they would result in the same page (the home page) being shown.
CMSs that don’t enforce unique URLs
Lastly, your CMS may allow you to use the same content in two different parts of the site, so the exact same content/page can be accessed at the end of two different URLs.
For example, you may access the same page at: “http://<domain-name>.com/section/mypage.html” and “http://<domain-name>.com/a-different-section/mypage.html”. Again – these are seen as two separate pages by search engines, even though the same content appears in both places.
In each case, your server or CMS will need to be configured to avoid these scenarios.
For example, your web server should be configured to redirect to one canonical domain – e.g. requests to “www.<domain-name>.com” are redirected to “<domain-name>.com” (or vice versa) so that there is only ever one domain being referenced by incoming links. I emphasise redirect, as the server should return an HTTP 301 status to indicate to the browser/search engine that only one URL should be used.
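As a sketch, on an Apache server this kind of redirect is commonly set up with mod_rewrite in an .htaccess file. The domain here is a placeholder – have your tech team adapt and test the rules before deploying:

```apache
# Permanently (301) redirect any "www." request to the bare domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

Other web servers (IIS, nginx etc.) have their own equivalents, so the right configuration depends on your hosting setup.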
Speak to your tech team about implementing this if you don’t have it set up already.
(A quick aside: if your website doesn’t work when accessed using “<domain-name>.com” it should – speak to your team or ISP about setting this up if it isn’t already.)
In cases where you don’t have access to your server, or your ISP does not allow configuration of these settings, you can use a special HTML tag to advise search engines of the correct URL to use. This is not as good as the server-configured solutions (as it doesn’t fix the problem of duplicate URLs, it simply works around it), but it may be the only practical solution in your circumstances.
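The tag in question is the rel="canonical" link element, which goes in the <head> of each duplicate page and points at the preferred URL (the URL below is a placeholder):

```html
<link rel="canonical" href="http://example.com/section/mypage.html" />
```

Search engines that support it will then treat links to any of the duplicate URLs as links to the canonical one.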
More on URL canonicalisation:
- Matt Cutts: SEO advice: URL canonicalization
- Google Webmaster Central: Demystifying the “duplicate content penalty”
- Michael Nguyen: Cleaning up canonical URLs with redirects (technical post on one technique to use with Apache)
In the next installment we’ll look at some of the “invisible”, behind-the-scenes things you can do.