<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
	<channel>
		<title><![CDATA[Latest posts for the topic "Robots.txt"]]></title>
		<link>https://proxy2.de/forum/posts/list/2.php</link>
		<description><![CDATA[Latest messages posted in the topic "Robots.txt"]]></description>
		<generator>JForum - http://www.jforum.net</generator>
			<item>
				<title>Robots.txt</title>
				<description><![CDATA[ Here is a nice robots.txt file that will allow the good bots to enter and keep out the bad bots.<br /> <br /> [quote][color="green"]#<br /> # robots.txt generated by www.1-hit.com's robot generator<br /> # Please, we do NOT allow nonauthorized robots any longer.<br /> #<br /> User-agent: *<br /> Disallow: /cgi-bin/<br /> <br /> User-agent: googlebot<br /> Disallow:<br /> <br /> User-agent: Scooter<br /> Disallow:<br /> <br /> User-agent: Openbot<br /> Disallow:<br /> <br /> User-agent: fast<br /> Disallow:<br /> <br /> User-agent: ZyBorg<br /> Disallow:<br /> <br /> User-agent: Slurp<br /> Disallow:<br /> <br /> User-agent: Googlebot-Image<br /> Disallow:<br /> <br /> User-agent: msnbot<br /> Disallow:<br /> <br /> <br /> User-agent: URL_Spider_Pro<br /> Disallow: /<br /> <br /> User-agent: CherryPicker<br /> Disallow: /<br /> <br /> User-agent: EmailCollector<br /> Disallow: /<br /> <br /> User-agent: EmailSiphon<br /> Disallow: /<br /> <br /> User-agent: WebBandit<br /> Disallow: /<br /> <br /> User-agent: EmailWolf<br /> Disallow: /<br /> <br /> User-agent: ExtractorPro<br /> Disallow: /<br /> <br /> User-agent: CopyRightCheck<br /> Disallow: /<br /> <br /> User-agent: Crescent<br /> Disallow: /<br /> <br /> User-agent: SiteSnagger<br /> Disallow: /<br /> <br /> User-agent: ProWebWalker<br /> Disallow: /<br /> <br /> User-agent: CheeseBot<br /> Disallow: /<br /> <br /> User-agent: LNSpiderguy<br /> Disallow: /<br /> <br /> User-agent: Black Hole<br /> Disallow: /<br /> <br /> User-agent: Titan<br /> Disallow: /<br /> <br /> User-agent: WebStripper<br /> Disallow: /<br /> <br /> User-agent: NetMechanic<br /> Disallow: /<br /> <br /> User-agent: Wget<br /> Disallow: /<br /> <br /> User-agent: mozilla/4<br /> Disallow: /<br /> <br /> User-agent: mozilla/5<br /> Disallow: /<br /> <br /> User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)<br /> Disallow: /<br /> <br /> User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)<br /> Disallow: /<br /> <br /> 
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98 )<br /> Disallow: /<br /> <br /> User-agent: Teleport<br /> Disallow: /<br /> <br /> User-agent: TeleportPro<br /> Disallow: /<br /> <br /> User-agent: MIIxpc<br /> Disallow: /<br /> <br /> User-agent: Telesoft<br /> Disallow: /<br /> <br /> User-agent: Website Quester<br /> Disallow: /<br /> <br /> User-agent: WebZip<br /> Disallow: /<br /> <br /> User-agent: moget/2.1<br /> Disallow: /<br /> <br /> User-agent: WebZip/4.0<br /> Disallow: /<br /> <br /> User-agent: WebSauger<br /> Disallow: /<br /> <br /> User-agent: WebCopier<br /> Disallow: /<br /> <br /> User-agent: NetAnts<br /> Disallow: /<br /> <br /> User-agent: Mister PiX<br /> Disallow: /<br /> <br /> User-agent: WebAuto<br /> Disallow: /<br /> <br /> User-agent: TheNomad<br /> Disallow: /<br /> <br /> User-agent: WWW-Collector-E<br /> Disallow: /<br /> <br /> User-agent: RMA<br /> Disallow: /<br /> <br /> User-agent: libWeb/clsHTTP<br /> Disallow: /<br /> <br /> User-agent: asterias<br /> Disallow: /<br /> <br /> User-agent: httplib<br /> Disallow: /<br /> <br /> User-agent: turingos<br /> Disallow: /<br /> <br /> User-agent: spanner<br /> Disallow: /<br /> <br /> User-agent: InfoNaviRobot<br /> Disallow: /<br /> <br /> User-agent: Harvest/1.5<br /> Disallow: /<br /> <br /> User-agent: Bullseye/1.0<br /> Disallow: /<br /> <br /> User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)<br /> Disallow: /<br /> <br /> User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0<br /> Disallow: /<br /> <br /> User-agent: CherryPickerSE/1.0<br /> Disallow: /<br /> <br /> User-agent: CherryPickerElite/1.0<br /> Disallow: /<br /> <br /> User-agent: WebBandit/3.50<br /> Disallow: /<br /> <br /> User-agent: NICErsPRO<br /> Disallow: /<br /> <br /> User-agent: Microsoft URL Control - 5.01.4511<br /> Disallow: /<br /> <br /> User-agent: DittoSpyder<br /> 
Disallow: /<br /> <br /> User-agent: Foobot<br /> Disallow: /<br /> <br /> User-agent: WebmasterWorldForumBot<br /> Disallow: /<br /> <br /> User-agent: SpankBot<br /> Disallow: /<br /> <br /> User-agent: BotALot<br /> Disallow: /<br /> <br /> User-agent: lwp-trivial/1.34<br /> Disallow: /<br /> <br /> User-agent: lwp-trivial<br /> Disallow: /<br /> <br /> User-agent: Wget/1.6<br /> Disallow: /<br /> <br /> User-agent: BunnySlippers<br /> Disallow: /<br /> <br /> User-agent: Microsoft URL Control - 6.00.8169<br /> Disallow: /<br /> <br /> User-agent: URLy Warning<br /> Disallow: /<br /> <br /> User-agent: Wget/1.5.3<br /> Disallow: /<br /> <br /> User-agent: LinkWalker<br /> Disallow: /<br /> <br /> User-agent: cosmos<br /> Disallow: /<br /> <br /> User-agent: moget<br /> Disallow: /<br /> <br /> User-agent: hloader<br /> Disallow: /<br /> <br /> User-agent: humanlinks<br /> Disallow: /<br /> <br /> User-agent: LinkextractorPro<br /> Disallow: /<br /> <br /> User-agent: Offline Explorer<br /> Disallow: /<br /> <br /> User-agent: Mata Hari<br /> Disallow: /<br /> <br /> User-agent: LexiBot<br /> Disallow: /<br /> <br /> User-agent: Web Image Collector<br /> Disallow: /<br /> <br /> User-agent: The Intraformant<br /> Disallow: /<br /> <br /> User-agent: True_Robot/1.0<br /> Disallow: /<br /> <br /> User-agent: True_Robot<br /> Disallow: /<br /> <br /> User-agent: BlowFish/1.0<br /> Disallow: /<br /> <br /> User-agent: JennyBot<br /> Disallow: /<br /> <br /> User-agent: MIIxpc/4.2<br /> Disallow: /<br /> <br /> User-agent: BuiltBotTough<br /> Disallow: /<br /> <br /> User-agent: ProPowerBot/2.14<br /> Disallow: /<br /> <br /> User-agent: BackDoorBot/1.0<br /> Disallow: /<br /> <br /> User-agent: toCrawl/UrlDispatcher<br /> Disallow: /<br /> <br /> User-agent: WebEnhancer<br /> Disallow: /<br /> <br /> User-agent: TightTwatBot<br /> Disallow: /<br /> <br /> User-agent: 
suzuran<br /> Disallow: /<br /> <br /> User-agent: VCI WebViewer VCI WebViewer Win32<br /> Disallow: /<br /> <br /> User-agent: VCI<br /> Disallow: /<br /> <br /> User-agent: Szukacz/1.4<br /> Disallow: /<br /> <br /> User-agent: QueryN Metasearch<br /> Disallow: /<br /> <br /> User-agent: Openfind data gathere<br /> Disallow: /<br /> <br /> User-agent: Openfind<br /> Disallow: /<br /> <br /> User-agent: Xenu's Link Sleuth 1.1c<br /> Disallow: /<br /> <br /> User-agent: Xenu's<br /> Disallow: /<br /> <br /> User-agent: Zeus<br /> Disallow: /<br /> <br /> User-agent: RepoMonkey Bait &amp; Tackle/v1.01<br /> Disallow: /<br /> <br /> User-agent: RepoMonkey<br /> Disallow: /<br /> <br /> User-agent: Zeus 32297 Webster Pro V2.9 Win32<br /> Disallow: /<br /> <br /> User-agent: Webster Pro<br /> Disallow: /<br /> <br /> User-agent: EroCrawler<br /> Disallow: /<br /> <br /> User-agent: LinkScan/8.1a Unix<br /> Disallow: /<br /> <br /> User-agent: Keyword Density/0.9<br /> Disallow: /<br /> <br /> User-agent: Kenjin Spider<br /> Disallow: /<br /> <br /> User-agent: Cegbfeieh<br /> Disallow: /<br /> <br /> [/color][/quote]]]></description>
				<guid isPermaLink="true">https://proxy2.de/forum/posts/preList/4083/12963.php</guid>
				<link>https://proxy2.de/forum/posts/preList/4083/12963.php</link>
				<pubDate><![CDATA[Sun, 9 Jan 2005 16:07:44]]> GMT</pubDate>
				<author><![CDATA[ JTD]]></author>
			</item>
			<item>
				<title></title>
				<description><![CDATA[ This serves little purpose. Email harvesters just ignore robots.txt files. I would also like to point out that Xenu is actually a rather nice program for checking that all the links on your site are valid.]]></description>
				<guid isPermaLink="true">https://proxy2.de/forum/posts/preList/4083/12964.php</guid>
				<link>https://proxy2.de/forum/posts/preList/4083/12964.php</link>
				<pubDate><![CDATA[Sun, 9 Jan 2005 16:35:00]]> GMT</pubDate>
				<author><![CDATA[ Carbonize]]></author>
			</item>
			<item>
				<title></title>
				<description><![CDATA[ Blocking the bad bots from .htaccess definitely works. The problem is that there are so many, and not everyone can use .htaccess.<br /> <br /> Block bad bots in .htaccess like this:<br /> [list]SetEnvIfNoCase User-Agent "^NaverBot" bad_bot<br /> SetEnvIfNoCase User-Agent "^TurnitinBot" bad_bot<br /> SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot<br /> SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot<br /> SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot<br /> SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot<br /> SetEnvIfNoCase User-Agent "^webcollage" bad_bot<br /> SetEnvIfNoCase User-Agent "^Port_Huron_Labs" bad_bot<br />  <br /> &lt;Limit GET POST&gt;<br /> Order Allow,Deny<br /> Allow from all<br /> Deny from env=bad_bot<br /> &lt;/Limit&gt;[/list]<br /> Add new bots to the top. You don't have to use the bot's full name: each pattern is anchored with ^, so a prefix such as the first 3 letters will match, though a very short prefix can also block legitimate user agents. Get bot names from your server stats.<br /> <br /> Everybody should block that Naverbot!<br /> <br /> Another alternative is the robot trap. Bots that ignore the robots.txt file are caught in an endless loop.<br /> <br /> Info on building a robot trap:<br /> [list]How to keep bad robots, spiders and web crawlers away:  <a class="snap_shots" href="http://www.fleiner.com/bots/#trap" target="_blank" rel="nofollow">http://www.fleiner.com/bots/#trap</a><br /> <br /> A Rogue Robot Trap:  <a class="snap_shots" href="http://www.braemoor.co.uk/software/robottrap.shtml" target="_blank" rel="nofollow">http://www.braemoor.co.uk/software/robottrap.shtml</a><br /> <br /> How to build a Bot Trap and keep bad bots away from a web site: <a class="snap_shots" href="http://www.kloth.net/internet/bottrap.php" target="_blank" rel="nofollow">http://www.kloth.net/internet/bottrap.php</a>[/list]<br /> WebmasterWorld has a good forum about robots, where members share info on the function of various bots.
<br /> Here's their PHP version of the robot trap:  [list]http://www.webmasterworld.com/forum88/3104.htm[/list]]]></description>
				<guid isPermaLink="true">https://proxy2.de/forum/posts/preList/4083/12969.php</guid>
				<link>https://proxy2.de/forum/posts/preList/4083/12969.php</link>
				<pubDate><![CDATA[Sun, 9 Jan 2005 22:09:21]]> GMT</pubDate>
				<author><![CDATA[ amber222]]></author>
			</item>
			<item>
				<title></title>
				<description><![CDATA[ Well I also use .htaccess along with the robots.txt file. Double protection I guess or maybe paranoid. <img src="https://proxy2.de/forum//images/smilies/385970365b8ed7503b4294502a458efa.gif" />]]></description>
				<guid isPermaLink="true">https://proxy2.de/forum/posts/preList/4083/12970.php</guid>
				<link>https://proxy2.de/forum/posts/preList/4083/12970.php</link>
				<pubDate><![CDATA[Sun, 9 Jan 2005 22:33:01]]> GMT</pubDate>
				<author><![CDATA[ JTD]]></author>
			</item>
	</channel>
</rss>