How To Protect Your Files From Robots

By Erika Lawal (c) 2003

Optimizing website pages for the search engines without running into trouble keeps most of us webmasters' brain cells finely honed at best, and induces massive migraines at worst!

One of the most common challenges for us all is how to present "clean", relevant and original content to a wide range of visitors.


You may find that you want to exclude search engine and other robots from all or part of your website for a number of reasons, including:

  • you want to write similar pages for different types of visitors, but don't want to be penalized for duplication.

  • you want to prepare pages or files that you don't want viewed.

It's very easy to achieve this by one of two means. You can use either a robots.txt file or a meta tag.

Let's de-mystify the process of writing these files and tags!

WHAT IS A ROBOTS.TXT FILE?

A robots.txt file is a set of instructions for the robots that travel the web, spidering the pages they find there. Such instructions can cover whether a robot may traverse the site at all, which parts it may visit, and how often.

The robots.txt file we're considering here is an exclusion instruction - think of it as a "no entry" sign to robots.

You can write a file to exclude ("disallow") robots from all, or just part of your site.

Before you begin, you need to know how to write the .txt file.

Prepare it in a text editor such as Notepad. Don't attempt it in Word or an HTML editor such as FrontPage. When you're finished, save it as "robots.txt".

WHAT TO PUT IN YOUR ROBOTS.TXT FILE

If you want to disallow all robots, you'd write:

User-agent: *
Disallow: /

And that's all. Nothing else.
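
One way to confirm what these two lines do is to feed them to a parser. The sketch below uses Python's standard urllib.robotparser module; the robot name "AnyBot" and the example.com URLs are placeholders chosen for illustration, not part of the article.

```python
from urllib.robotparser import RobotFileParser

# The two-line "disallow everything" file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "AnyBot" and example.com are hypothetical.
    for url in ("http://example.com/", "http://example.com/ca/index.html"):
        print(url, is_allowed(ROBOTS_TXT, "AnyBot", url))
```

Every URL on the site should come back as not allowed, whichever robot name you pass in.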

What about if you only want to exclude part of your site?

Let's pretend you're running a website which advises on raising children. Your material will be relevant to surfers who live in many countries, but if you want them to really sit up and look, especially if you want them to buy from you, you'll need to make sure that your content is region-specific, including references, idiom and spelling.

This situation is an ideal candidate for a robots exclusion .txt file.

You've written all the pages you want to show to surfers in Canada, UK, and Australia in 3 separate directories which surfers will access by clicking on an appropriate link on your main pages.

The directories are:
/ca/
/uk/
/au/

To disallow robots from these directories, write the following .txt file:

User-agent: *
Disallow: /ca/
Disallow: /uk/
Disallow: /au/
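
To see which pages this file closes off, you could run it through Python's standard urllib.robotparser module. This is only a sketch: the robot name "AnyBot", the example.com host and the page names are all made up for the test.

```python
from urllib.robotparser import RobotFileParser

# The three-directory exclusion file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ca/
Disallow: /uk/
Disallow: /au/
"""


def blocked_paths(robots_txt: str, agent: str, paths: list[str]) -> list[str]:
    """Return the paths that `agent` is NOT allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    base = "http://example.com"  # hypothetical host
    return [p for p in paths if not parser.can_fetch(agent, base + p)]


if __name__ == "__main__":
    # Hypothetical page names; /canada.html is outside the /ca/ directory.
    pages = ["/index.html", "/ca/index.html", "/uk/sleep.html",
             "/au/", "/canada.html"]
    print(blocked_paths(ROBOTS_TXT, "AnyBot", pages))
```

Each Disallow line is matched as a path prefix, so "/ca/" covers everything inside that directory but leaves a page such as /canada.html open.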

It may be that you want to allow some robots and disallow others.

In our example, it may be that you want to disallow just one robot, from one directory, in which case you'd write:

User-agent: NastyBot
Disallow: /ca/
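
A quick check of this single-robot file, again sketched with Python's standard urllib.robotparser module. "NastyBot" comes from the example above; "OtherBot" and the example.com URLs are invented for the test.

```python
from urllib.robotparser import RobotFileParser

# One robot excluded from one directory, as shown above.
ROBOTS_TXT = """\
User-agent: NastyBot
Disallow: /ca/
"""


def can_visit(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the (hypothetical) site."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    print("NastyBot /ca/:", can_visit("NastyBot", "/ca/index.html"))
    print("NastyBot /uk/:", can_visit("NastyBot", "/uk/index.html"))
    print("OtherBot /ca/:", can_visit("OtherBot", "/ca/index.html"))
```

Only NastyBot is kept out of /ca/. Robots with no record of their own, and no "*" record to fall back on, are unrestricted.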

Or to exclude all robots from one directory except one robot, which you want to traverse all of your site:

User-agent: NiceBot
Disallow:

User-agent: *
Disallow: /ca/

Note that a Disallow line with nothing after it (no slash, no path) means the robot is permitted to read the whole site. " * " means every robot that doesn't have a record of its own. So in the last .txt file example, all robots are excluded from your Canadian directory, except NiceBot, which can read the whole site.

Easy, isn't it?
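
The NiceBot file can be checked the same way. The sketch below uses Python's standard urllib.robotparser module; "NiceBot" comes from the example, while "OtherBot" and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Two records: NiceBot may go anywhere, everyone else is kept out of /ca/.
ROBOTS_TXT = """\
User-agent: NiceBot
Disallow:

User-agent: *
Disallow: /ca/
"""


def can_visit(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` on the (hypothetical) site."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for agent in ("NiceBot", "OtherBot"):
        for path in ("/ca/index.html", "/index.html"):
            print(agent, path, can_visit(agent, path))
```

The blank line between the two records matters: it is what tells a parser where one robot's rules end and the next begin.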

WHERE TO PUT YOUR ROBOTS.TXT FILE

Once created, your file needs to go into your root directory. This is the same directory which contains your home page. Don't put it anywhere else, because the robots won't see it.
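
Because robots look in one fixed place, you can work out where they will ask for the file from any page address on the site. A minimal Python sketch, using only the standard urllib.parse module; the function name and the example.com address are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path, dropping query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page deep inside the /uk/ directory.
    print(robots_txt_url("http://example.com/uk/toddlers/sleep.html"))
```

However deep the page sits, the result points at the root directory, which is why a robots.txt file placed inside /uk/ would never be read.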

Note that you can only have ONE robots.txt file per site, so any modifications will need to be integrated into your original file.

Note also that disallowing pages in your robots.txt file means robots that obey it won't crawl those pages, so they won't normally be indexed, but that won't matter if you've optimized your indexed pages properly.

In our Ca/UK/Au example above, your traffic will find your indexed global/US pages via the search engines, then follow a link from that entry page to the right "nationality" page. We've all seen the little flag links on other sites: just put up a flag graphic and say, for example, "UK Visitors Click Here".

If you want to learn more about exclusion robots.txt files, visit:

Web Server Administrator's Guide to the Robots Exclusion Protocol

If you prefer/need to exclude individual pages from being viewed by robots, you can do this using a robots.txt file, but you can also achieve it using a meta tag on your web page between the <head> tags. The universal exclusion is as follows:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
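
To check that a page really carries this tag, you could read it back the way a robot might. The sketch below uses Python's standard html.parser module; the class name, function name and sample page are invented for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercase robots directives declared in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page using the tag shown above.
    page = ('<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
            '</head><body>Hello</body></html>')
    print(sorted(robots_directives(page)))
```

The comparison is done in lowercase, so the tag is found whether it is written as ROBOTS or robots.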

It may be that you want robots to index your pages, but not to archive them. There may be a range of reasons why you don't want search engines to keep copies of old pages - the most prevalent one among webmasters is because they are cloaking pages and don't want it known that the page served to search engines is a different one to that seen by surfers, but it's also possible to have perfectly "legitimate" reasons for wanting to exclude parts of your site from public scrutiny.

Whatever your reason, if you want to avoid your page being archived, the universal tag is:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

For Google (whose cache feature makes it the search engine you are most likely to want to stop archiving your pages), the tag is:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
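
The difference between the universal tag and the Google-only tag can be sketched in Python with the standard html.parser module. This assumes a robot obeys both the generic ROBOTS tag and any tag carrying its own name; the class name, function name, "otherbot" and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Group comma-separated meta directives by the tag's name attribute."""

    def __init__(self):
        super().__init__()
        self.by_name = {}  # lowercase meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if not name:
            return
        values = {p.strip().lower() for p in attr.get("content", "").split(",")
                  if p.strip()}
        self.by_name.setdefault(name, set()).update(values)


def may_archive(html: str, robot: str) -> bool:
    """Return False if a generic or robot-specific tag says NOARCHIVE."""
    collector = MetaCollector()
    collector.feed(html)
    # Assumption: a robot combines the generic tag with its own named tag.
    combined = (collector.by_name.get("robots", set())
                | collector.by_name.get(robot.lower(), set()))
    return "noarchive" not in combined


if __name__ == "__main__":
    page = '<head><META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"></head>'
    print("googlebot:", may_archive(page, "googlebot"))
    print("otherbot:", may_archive(page, "otherbot"))
```

With only the GOOGLEBOT tag on the page, other robots are still free to archive it; swapping in the ROBOTS tag would apply to all of them.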

To learn more about exclusion meta tags, visit:

The Robots META tag

Don't be put off by the jargon; writing these files and tags is one of the easiest and most useful technical tasks you can undertake as a webmaster - write a file today and save yourself hundreds of hours!

About The Author
Erika Lawal writes Daily Internet Marketing Tips for webmasters desperately in search of cutting edge site optimization and marketing advice that produces results. Get a FREE series of our Tips by visiting: DailyInternetMarketingTips.com

Copyright © by Metamorphosis Website Design 2003-2012. All rights reserved