RationalSpace

Archive for the ‘SEO’ Category

Remove Duplicate link of Amazon Cloud Services



You just migrated your website to Amazon Cloud and found that a whole set of URLs with the AWS IP address are getting crawled by Google? Here is a guide to resolving this issue. It can be quite harmful for your site’s SEO, as it leads to duplicate content being indexed by Google!
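
A common fix, sketched below as one option (this may not be exactly what the guide referred to above does, and "www.yoursite.com" is a placeholder): 301-redirect any request whose Host header is not your canonical domain, so pages served under the AWS IP or EC2 hostname collapse onto a single domain. A rel="canonical" tag on every page pointing at the canonical URL is a good complement.

<?php
// Sketch: 301-redirect requests arriving under the AWS IP / EC2 hostname
// to the canonical domain, so only one set of URLs gets indexed.
$canonicalHost = "www.yoursite.com"; // placeholder: your real domain

if (strcasecmp($_SERVER['HTTP_HOST'], $canonicalHost) !== 0) {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://" . $canonicalHost . $_SERVER['REQUEST_URI']);
    exit;
}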

Written by rationalspace

May 5, 2014 at 12:45 pm

Improving website performance – high speed delivered!


Performance is a big thing. The faster the website, the better. In the world of new-age internet, speed is not only an important factor in retaining users, it also matters from an SEO perspective: search engines give more and more weight to speed when ranking websites. There are a number of tools, like Pingdom and Google PageSpeed Insights, that help you analyse your site’s performance.

Since I have been working on improving the performance of our website for quite some time, I thought of jotting down all the pointers required to optimise websites in one place.

  1. Use CSS sprites
  2. Zip your content – Use Content Encoding Header
  3. Add expire headers
  4. Remove render-blocking JavaScript – place assets optimally: CSS at the top, JS at the bottom
  5. Reduce Cookie Size
  6. Serve static content from another domain/CDN
  7. Optimise CSS
  8. Minify JS and CSS
  9. Cache resources – use the "Cache-Control" and "Expires" headers (see the sketch after this list)
  10. Render Google Ads asynchronously
  11. Compress images
  12. Don’t call desktop ad units on mobile and merely hide them responsively – hidden ad slots still cost requests
  13. Minimise use of plugins in a CMS
  14. Optimise Queries
  15. Use APC Caching
  16. Add Character Set Header
  17. Add dimensions to images
  18. Load scripts asynchronously whenever possible
  19. Use Google PageSpeed Module
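
As a minimal sketch of items 2, 3 and 9 (assuming a plain PHP page, with a one-week lifetime chosen purely as an example), you can gzip the output and send caching headers from the top of the script:

<?php
// Compress the response if the client supports it (Content-Encoding: gzip).
ob_start("ob_gzhandler");

// Far-future caching headers: Cache-Control and Expires, one week out.
$oneWeek = 7 * 24 * 60 * 60;
header("Cache-Control: public, max-age=" . $oneWeek);
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $oneWeek) . " GMT");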

Written by rationalspace

February 26, 2014 at 3:49 pm

Tools to find broken links in your site


If you analyse your web logs, you may find that quite a few 404s are reported by Google and other crawlers. A lot of the time it is very difficult to figure out the source of these bad links. Google Webmaster Tools has started reporting the source along with crawl errors, but it is still often not very useful, since it is not accurate and does not cover everything. It becomes particularly difficult when analysing medium to large sites, where manually checking every page would be extremely labour-intensive (or impossible!) and where you can easily miss a redirect, meta refresh or duplicate-page issue.

I came across 2 tools that help to do this.
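
If you want a quick do-it-yourself check in the meantime, here is a rough sketch (not one of the tools referred to above; the page URL is a placeholder) that pulls the links out of one page and flags any that return a 404:

<?php
// Rough DIY sketch: list the links on a page and flag the ones returning 404.
$page = "http://yoursite.com/some-page.html"; // placeholder: page to check
$html = file_get_contents($page);
preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $matches);

foreach (array_unique($matches[1]) as $link) {
    $ch = curl_init($link);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request is enough
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_exec($ch);
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 404) {
        echo "Broken link: " . $link . "\n";
    }
    curl_close($ch);
}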

Hope this helps with debugging and keeps the crawlers happy!

Written by rationalspace

November 29, 2013 at 3:36 pm

Posted in SEO

Test your website with lynx for SEO friendliness


You may have developed a website that is very attractive in terms of UI but is not very friendly for crawlers. Since SEO is a very important part of any success on the internet, it is crucial that you check your website in a text browser like Lynx.

Here is a snippet from Google’s SEO guidelines:

Make your site easily accessible

Build your site with a logical link structure. Every page should be reachable from at least one static text link.

Use a text browser, such as Lynx, to examine your site. Most spiders see your site much as Lynx would. If features such as JavaScript, cookies, session IDs, frames, DHTML, or Macromedia Flash keep you from seeing your entire site in a text browser, then spiders may have trouble crawling it.

So go ahead and "brew install lynx" or "apt-get install lynx" to set it up (once installed, "lynx -dump http://yoursite.com/" prints a page much as a crawler would see it). There may be many surprises in store for you once you start checking your site with it: your table structure may not show well, your alt text may be really bad, or your lists may not render nicely. Well, this is how you start cleaning up!

Written by rationalspace

November 22, 2013 at 11:53 am

Posted in SEO

Sitemaps – do’s and don’ts


Sitemaps are basically a log of all the URLs that you, as a website owner, want Google, Bing and other search engines to crawl and index. Though one can create such a log in formats like HTML and plain text, Google recommends the XML format.

Before generating sitemaps, you must first understand your site content well and try to categorise which pages are most important, which pages are most frequently updated, which pages may have duplicate content (content that you might be aggregating), and so on.

Based on the above criteria, you can come up with two kinds of sitemaps.

Static XML sitemap:
This sitemap contains all the static landing pages of the site, as well as the pages that have duplicate content or content aggregated from vendors.

Dynamic XML sitemap:
The main focus of this sitemap is to get the important pages of the site indexed. It includes all the original content you publish on the site, plus the most important pages you want to target for SEO.
Key things to make sure of when you publish an XML sitemap:
  1. The sitemap should be accessible to crawlers – put it in the root folder of your web directory.
  2. It should not contain pages that redirect to other pages of your site.
  3. Any page that is blocked in robots.txt should not be in the sitemap.
  4. Declare the correct XML namespace, and declare it correctly.
  5. Every entry should be the URL of one of your website’s pages, not the URL of another sitemap.
  6. Dates must be valid (W3C datetime format).
  7. Tag values must be valid.
  8. Don’t put too many URLs in one sitemap – split it if it is becoming too big (the limit is 50,000 URLs per sitemap).
  9. URLs must be allowed – they should not return a forbidden (403) error to the crawler.
  10. Don’t put too many sitemaps in the index file.

You can have a master index file, like sitemap.xml, which contains the URLs of the other sitemaps.
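
For example, here is a minimal sketch of a script that prints such an index (the sitemap file names are just placeholders):

<?php
// Sketch: print a sitemap index that points at the other sitemaps.
$sitemaps = array(
    "http://yoursite.com/sitemap-static.xml",   // placeholder names
    "http://yoursite.com/sitemap-dynamic.xml",
);

header("Content-Type: application/xml; charset=utf-8");
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($sitemaps as $url) {
    echo "  <sitemap>\n";
    echo "    <loc>" . htmlspecialchars($url) . "</loc>\n";
    echo "    <lastmod>" . date("Y-m-d") . "</lastmod>\n";
    echo "  </sitemap>\n";
}
echo "</sitemapindex>\n";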

You can also ping Google to submit a sitemap. Here is some sample PHP code. But be careful: Google takes its own time to process sitemaps, so it does not make sense to make Google reprocess pages that are not changing every day. That might in turn affect the crawling of other, more critical pages!

// Ping Google with the sitemap URL and log whether the submission worked.
$url = "http://www.google.com/webmasters/tools/ping?sitemap=" .
       urlencode("http://yoursite.com/sitemap1.xml");
$response = file_get_contents($url);
$httpCode = isset($http_response_header[0]) ? $http_response_header[0] : "";
$logFile  = "logs/sitemap" . date("Ymd");
if ($response !== false && strpos($httpCode, "200") !== false) {
    file_put_contents($logFile, "sitemap submitted successfully\n", FILE_APPEND);
} else {
    file_put_contents($logFile, "sitemap could not be submitted\n", FILE_APPEND);
}

Written by rationalspace

August 26, 2013 at 2:13 pm

Posted in SEO

Fixing Soft 404 errors


In any large-scale website with constantly changing data, it is quite possible that a lot of pages will show no data, depending on the date or time: news items expire, and products become unavailable or obsolete.
It is common practice to show a page with a search box or some other default page in this kind of scenario, so that the user sees some response and does not simply go away.

However, Google does not like the web server sending a 200 OK status code for a page that has no data. This is what it reports as a soft 404 error in Webmaster Tools. Google argues that its crawler could spend that time crawling some other relevant page of the website instead of crawling something that has no data, but if the web server sends a 200 status code for such an empty page, the crawler cannot really make the decision not to crawl it.

An important directive that we can put in .htaccess to avoid this is the following:
ErrorDocument 404 /404.php

404.php is a custom page that you show to the user, with a proper not-found message along with links to some other popular pages of your website.

But there could be situations where you need to do a check in your web script as well: make sure that data exists in your system before blindly rendering a page with empty data. If no data is found, redirect to a non-existent page:
header("Location: not-found-page.html");

Since this page “not-found-page.html” does not exist on your server, the Apache directive in .htaccess described above will send a 404 status in the header and show the content of 404.php. This way you can solve the problem of soft 404 errors without giving a bad user experience :)
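
Putting it together, here is a minimal sketch of the check (fetch_items() and the category parameter are hypothetical placeholders for however your page looks up its data):

<?php
// Sketch: make sure data exists before rendering the listing page.
$category = isset($_GET['category']) ? $_GET['category'] : '';
$items = fetch_items($category); // hypothetical data lookup

if (empty($items)) {
    // Redirect to a page that does not exist, so the ErrorDocument
    // directive above serves 404.php with a real 404 status.
    header("Location: not-found-page.html");
    exit;
    // Alternatively, send the 404 directly and skip the extra redirect:
    // header("HTTP/1.0 404 Not Found");
    // include "404.php";
    // exit;
}
// ...render the page normally with $items...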

Written by rationalspace

August 13, 2013 at 3:27 pm

Posted in Apache, SEO

SEO Checklist – Steps to do before you launch your website


I wanted to prepare a checklist of things one should do with respect to Search Engine Optimisation before a site is launched. This is not an attempt to discuss what each of these items means – for that, one can refer to Google’s SEO guide.

This is just a summary of all big items in one place.

  1. Title tags in each page – should be unique, relevant to page content and brief. Within 70 characters.
  2. Description meta tag – summarize each page, should be relevant, don’t put just keywords there. Within 160 characters. Try to use this wisely – Not very long and not very short.
  3. Simple-to-understand URLs – use words, don’t be cryptic, avoid parameters and session IDs
  4. Provide one version of a URL – don’t have multiple URLs with the same content. If they exist, use rel="canonical" or a 301 redirect. rel="canonical" should link to the URL of the original content.
  5. Navigation by breadcrumbs
  6. Allow for the possibility of part of the URL being missing – if one removes the last bit of the URL, the rest of it should still work instead of showing a 404.
  7. Create html sitemap with organisation of content  along with xml sitemap
  8. Content – avoid spelling mistakes,  organize your article, create fresh & unique content
  9. Write better anchor text – the anchor text and the title attribute of an "<a>" tag help search engines understand what the href is about.
  10. Use heading tags well
  11. Use robots.txt to restrict crawling where it’s not needed.
  12. Use rel="nofollow" to tell search engines not to follow a link or pass your page’s reputation to the linked page. For example: a comment by a spammer on your site, or some other site’s URL that you may have referenced.
  13. Go Social , create back-links
  14. Use alt tags well – Alt tags in image is a way to make the search engines understand what the image is about. If you are posting an image of a car let the alt tag say that and not just image1 or any such meaningless text. Also try to keep the image names meaningful.
  15. If you are using pagination for any listing-type content, make sure each new page has a unique title and description. People mostly forget to add page numbers to the titles and descriptions of the pages that pagination creates, even though those pages have unique content. For example, the title could be “Job listing – Internet Industry – Bangalore – Page 1”.
  16. Add rel="prev" and rel="next" link tags on the paginated pages. If you are on page 1 of your listing, add a rel="next" tag pointing to page 2. On page 2, add rel="prev" pointing to page 1 and rel="next" pointing to page 3. Be careful not to add rel="next" on the last page 🙂 (see the sketch after this list)
  17. Use structured data: if Google understands the content on your pages, it can create rich snippets (detailed information intended to help users with specific queries). For example, the snippet for a restaurant might show the average review and price range. Read more about this here: structured data
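
To illustrate items 4 and 16, here is a minimal sketch that prints the canonical and rel="prev"/rel="next" link tags for a paginated listing (the URL pattern and page count are placeholders):

<?php
// Sketch: canonical plus rel="prev"/rel="next" tags for a paginated listing.
$page       = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$totalPages = 20;                                   // placeholder: from your query
$base       = "http://yoursite.com/jobs/internet-industry/bangalore"; // placeholder

$self = ($page > 1) ? $base . "?page=" . $page : $base;
echo '<link rel="canonical" href="' . htmlspecialchars($self) . '">' . "\n";

if ($page > 1) {
    $prev = ($page == 2) ? $base : $base . "?page=" . ($page - 1);
    echo '<link rel="prev" href="' . htmlspecialchars($prev) . '">' . "\n";
}
if ($page < $totalPages) {
    echo '<link rel="next" href="' . $base . "?page=" . ($page + 1) . '">' . "\n";
}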

Written by rationalspace

April 29, 2013 at 12:42 pm

Posted in SEO, Social Media
