User agent | Description | Category | |
|---|---|---|---|
| -- | Search Engine | |
AAfter | AAfter looks like a legit search engine. () | Search Engine | |
AboutUsBot | AboutUsBot is used by the About Us website to determine the contents, aspect, logo and owner of a website. It is legit. | Search Engine | |
aiHitBot | aiHitBot, seems to be a legit search engine. () | Search Engine | |
ia_archiver | Alexa ia_archiver () | Search Engine | |
almaden | almaden, Unstructured Information Analysis and Search @ IBM () | Search Engine | |
Scooter | AltavistaBot, Altavista () | Search Engine | |
aport | Aport () | Search Engine | |
appie | Appie-spider/Walhello () | Search Engine | |
ApptusBot | ApptusBot is the Apptus crawler bot, some business driven search engine. () | Search Engine | |
Ask Jeeves | Ask Jeeves, Teoma () | Search Engine | |
askpeter_bot | Ask Peter is a German based search engine () | Search Engine | |
ASPseek | ASPSeek () | Search Engine | |
Baiduspider | Baidu search engine web crawler. Not always respectful of robots.txt, sometimes a bit pushy as well. () | Search Engine | |
BecomeBot | BecomeBot shopping search () | Search Engine | |
Blazer | Blazer Browser, Sharp Zaurus () | Search Engine | |
CatchBot | CatchBot is a business page crawler. They claim to resell information for companies, academics and various professional fields. Bot behaves correctly. | Search Engine | |
abby/ | Ellerdale determines trends through the semantic web, usually by gathering recent tweets or Facebook entries. () | Search Engine | |
ExaBot | Exalead Exabot () | Search Engine | |
facebookexternalhit | Facebook External Hit () | Search Engine | |
fast | FAST-WebCrawler () | Search Engine | |
sitedossier.com | Featuring sitedossier.com as a referer, the IP 69.71.222.186 seems to check for websites recently crawled by one of their competitors, domaintools. Seems harmless. () | Search Engine | |
feedfetcher | Feedfetcher Google, gathers news feeds from websites () | Search Engine | |
Feedtrace-bot | Feedtrace-bot makes a list of the most popular Twitter feeds (it parses the most recent feeds around the clock). () | Search Engine | |
ftxbrowser | ftxBrowser, Windows CE () | Search Engine | |
gais | Gais () | Search Engine | |
Gigabot | Gigablast's Gigabot () | Search Engine | |
Mediapartners | Google AdSense () | Search Engine | |
Google Desktop | Google Desktop is a desktop data manager/search. It should be harmless. () | Search Engine | |
googlebot | Googlebot () | Search Engine | |
ichiro | ichiro @ Goo Japan / Inktomi () | Search Engine | |
IconSurf | Icon Surf () | Search Engine | |
ICRA_Semantic_spider | ICRA semantic spider, Internet Content Rating association. () | Search Engine | |
infoseek | InfoSeek () | Search Engine | |
JS-Kit | JS-Kit is a blabla software for blogs. It usually connects here and there to promote their stuff through curiosity. For that reason no URL is provided here. | Search Engine | |
Linguee Bot | Linguee Bot is a legit search engine bot. However it WILL get banned from your Beamreactor enabled website for its extreme crawling speed with the argument 'flood'. () | Search Engine | |
LinkWalker | LinkWalker () | Search Engine | |
grub | Looksmart/Grub () | Search Engine | |
Mail.RU | mail.ru (Поиск@mail.ru) is tied to the mail.ru search engine () | Search Engine | |
MJ12bot | Majestic-12 distributed search engine bot () | Search Engine | |
MetaQuerier | MetaQuerier (University of Illinois in Urbana-Champaign) () | Search Engine | |
bingbot | Microsoft Bing () | Search Engine | |
Media Center PC 5.0 | Microsoft EnhanceIE enables Windows users to mod their useragents and emulate other user agents (!?) () | Search Engine | |
MLbot | MLBot is an MP3/video crawler. The true purpose of MLbot is undisclosed but might be related to piracy protection. This robot is fairly clean. () | Search Engine | |
Yahoo-MMCrawler | MM Crawler, looks for images on the web. () | Search Engine | |
MOT-A768 | Motorola A768 browser client. Might be fairly harmless. | Search Engine | |
msnbot | MSN Search Crawler () | Search Engine | |
MSTV | MSTV WebTV () | Search Engine | |
MyIE2 | MyIE2 @ turkey? | Search Engine | |
netcraft | Netcraft () | Search Engine | |
Naverbot | NHN Corp bot/Naver.com () | Search Engine | |
Ocelli | Ocelli Engineering search () | Search Engine | |
OnetSzukaj | OnetSzukaj () | Search Engine | |
avantgo | PalmOs AvantGo () | Search Engine | |
PSbot | Picsearch web crawler () | Search Engine | |
plucker | Plucker Browser, Windows CE () | Search Engine | |
Plukkie | Plukkie: a search engine robot, fairly harmless. () | Search Engine | |
pompos | Pompos () | Search Engine | |
Moo | qsdfqs () | Search Engine | |
quepasacreep | QuePasaCreep () | Search Engine | |
StackRambler | Rambler search robot () | Search Engine | |
RSScache | RSS Cache website bandwidth saver () | Search Engine | |
SapphireWebCrawler | SapphireWebCrawler crawls for a computer science project from Carnegie Mellon University. | Search Engine | |
scrubby | scrubby () | Search Engine | |
Shim | Shim Crawler (University of Tokyo) () | Search Engine | |
slurp | Slurp () | Search Engine | |
spbot | spbot; "we just want to find out to which web pages you link to" () | Search Engine | |
Speedy Spider | Speedy Spider is a part of the highly advanced search engine Entireweb.com, that was developed in Halmstad, Sweden during 1998-2000. () | Search Engine | |
sproose | Sproose Crawler () | Search Engine | |
Apple-PubSub | The PubSub client is checking your RSS for an Apple computer owner! Don't remove or block this client/IP. () | Search Engine | |
bnf.fr_bot | This robot comes from the French National Library. It makes a web archive of your website for various reasons and may or may not respect robots.txt, depending on its settings. Harmless nonetheless. () | Search Engine | |
seznambot | Tied to the Seznam Czech search engine. () | Search Engine | |
turnitin | Turn It In () | Search Engine | |
Twiceler | Twiceler is the legit Cuil search engine crawler () | Search Engine | |
Twingly Recon | Twingly Recon is an RSS parser focused on blogs. It is usually triggered by syndication tools, such as third-party Facebook/Twitter posting APIs. () | Search Engine | |
Twitturls | Twitter URL parser. Someone linked your content on Twitter. () | Search Engine | |
Vagabondo | Vagabondo () | Search Engine | |
VideoSurf_bot | VideoSurf bot looks for videos. It uses social networks to find URLs to visit, so its visit might be related to some of your website data being posted on Twitter or Facebook. | Search Engine | |
VoilaBot | VoilaBot is from the Voila search engine, owned by the "France Telecom - Orange" group. Basically harmless. () | Search Engine | |
Jigsaw | W3C CSS validator - JFouffa () | Search Engine | |
W3C_Validator | W3C Validator () | Search Engine | |
WMP | Windows Media Player | Search Engine | |
Xenu Link Sleuth | Xenu Link Sleuth validates your website for dead links () | Search Engine | |
Yahoo! Mindset | Yahoo Mindset () | Search Engine | |
Yandex | Yandex. I at the end refers to search. H looks for mirror copies, P for images, F for favicons, D for Yandex declared websites, B for RSS () | Search Engine | |
YandexSomething | YandexSomething searches for news related RSS feeds for their news system. () | Search Engine | |
zyborg | Zyborg () | Search Engine | |
ImagesiftBot | AI image analysis crawler () | AI Crawler | |
Anthropic-AI | Anthropic AI services general agent () | AI Crawler | |
Claude-Web | Anthropic Claude web crawler () | AI Crawler | |
Bytespider | ByteDance (TikTok) AI crawler () | AI Crawler | |
CCBot | CCBot, the CommonCrawl bot, has claimed since 2009, like many others, that it will bring interesting search content for xyz in the near future. It offers nothing but crawling stats. Its "CC" name suggests "Creative Commons" but has NOTHING in common with it. () | AI Crawler | |
ChatGPT-User | ChatGPT user agent for browsing () | AI Crawler | |
cohere-ai | Cohere AI language model crawler () | AI Crawler | |
Diffbot | Diffbot AI extraction and analysis () | AI Crawler | |
Omgilibot | Omgili conversation analysis bot () | AI Crawler | |
GPTBot | OpenAI GPT crawler for AI training () | AI Crawler | |
PerplexityBot | Perplexity AI search crawler () | AI Crawler | |
MyIE2 | | Scraper | |
_viewer | -- | Scraper | |
<? | '<?': Some script kiddie attempted to bypass your website's security through PHP injection. | Scraper | |
aboundex | Aboundexbot claims to index websites and, whilst it abides by robots.txt, its activity remains doubtful. () | Scraper | |
America Online | America Online is a rather weak mockup of the AOL referer and hides a strong forum spammer | Scraper | |
synapse | Apache Synapse isn't documented. Frequently seen. Suspicious. | Scraper | |
avantbrowswer.com | Avant Browser Second Street Research @ Shawcable.net proxy, AB, CA | Scraper | |
BackStreet | BackStreet Browser | Scraper | |
Bluecoat | Bluecoat DRTR | Scraper | |
Brick House | Brick House | Scraper | |
calif univ | Calif Univ Tools @ btcentralplus.com | Scraper | |
Cam finder | Cam finder | Scraper | |
Orbiter | DailyOrbit's Orbiter, supposedly dead. Shouldn't crawl your website. | Scraper | |
domainratio | Domainratio bot belongs to a website that claims to "sort interesting websites", but is really a whois frontend with a lot of advertising. () | Scraper | |
drupal | Drupal web management () | Scraper | |
Daum | EDI/Edacious & Intelligent Web Robot | Scraper | |
EmeraldShield | EmeraldShield.com web spider () | Scraper | |
FlashGet | FlashGet | Scraper | |
Franklin | Franklin Locator (eclipse.net.uk) @ XO Communications, Reston, VA, US | Scraper | |
FunWebProducts | FunWebProducts enters the Adware/spyware category with their set of dirty toys from IWon. Their bot can be related to Mr Sputnik. | Scraper | |
gqbi | gqbi hnxupsxgfgnX berXjteu (!) | Scraper | |
http generic | http generic @ mchsi.com, Mediacom, NY, US | Scraper | |
Huasai | Huasai ignores robots.txt. It is a harvester; its purpose is unknown. | Scraper | |
huaweisymantec | Huawei Symantec; Chinese bot that claims to "fix websites security holes". It isn't in any way related to Symantec, and is most probably a scam. () | Scraper | |
IE/4.0 | IE/4.0 @ mesh.ad.jp, JP | Scraper | |
intelium_bot | Intelium does respect robots.txt and doesn't flood, but is much too discreet to be considered safe. | Scraper | |
Internet Explore 5.x | Internet Explore 5.x @ Dynegy-Comm, Beijing, CN | Scraper | |
ISC Systems | ISC Systems iRc Search 2.1 | Scraper | |
Java/ | Java/xxxx. Various users, usually used for cheap crawlers (hispeed.ch), more rarely for harmful actions. | Scraper | |
Java1.3.1 | Java1.3.1 @ antelecom.net | Scraper | |
Java1.4.0_02 | Java1.4.0_02 @ Speakeasy.NET, US | Scraper | |
jobo | JoBo/1.3 @ Technikzentrum Luebeck tzl.de, DE () | Scraper | |
libwww-perl | libwww-perl | Scraper | |
linguee | Linguee bot. Flooder () | Scraper | |
Mac Finder | Mac Finder 1.0.26 @ rr.com | Scraper | |
Mac_Power | Mac_Power | Scraper | |
mnoGoSearch | mnoGoSearch () | Scraper | |
Mozilla(IE Compatible) | Mozilla(IE Compatible) @ UNSX, RU | Scraper | |
indy library | Mozilla/3.0 (compatible; Indy Library) @ Bijing Gold, sina.com, CN | Scraper | |
MRSPUTNIK | Mr Sputnik is a strong adware/malware crap from IWon - maybe linked to hiyo.com | Scraper | |
nerdbynature | Nerd By Nature indexes French and German websites. It is supposed to establish maps of links tied to a website, but does it in some unobvious way. () | Scraper | |
Nextopia | NextopiaBOT () | Scraper | |
OmniExplorer | Omni Explorer () | Scraper | |
PlantyNet_WebRobot | PlantyNet Web Robot @ hinet.net, TW | Scraper | |
Poirot | Poirot | Scraper | |
Program Shareware | Program Shareware | Scraper | |
RPT-HTTPClient | RPT-HTTPClient/0.3-3 | Scraper | |
Search17Bot | Search17Bot claims to be a semantic search engine. It is closed to the public, therefore might be anything. | Scraper | |
second life lsl | Second Life LSL. LSL is a programming language for the Second Life and OpenSIM game environments. In the wrong hands, and provided Linden Lab doesn't check the contents of outgoing traffic, it can be used to flood, spam or seriously hit a website. () | Scraper | |
SiteBot | SiteBot is a link collector. Provided its origin and customers aren't disclosed, we may consider it as privacy infringing or some cheap harvester. () | Scraper | |
HMSE | Spammer | Scraper | |
surveybot | SurveyBot () | Scraper | |
teleport pro | Teleport pro @ interbusiness.it, IT | Scraper | |
tiehttp | Tiehttp: related to AskPeter, the tiehttp software was developed as freeware by Kyriacos Michael for the Delphi platform. It is a free bot. Normally shouldn't be on your website. () | Scraper | |
W3CRobot | W3CRobot/5.4.0 @ CommunityEngine, Tokyo, JP | Scraper | |
wbdbot | WBD bot @ hostcasters.com, TX, USA | Scraper | |
WebDataCentre | WebDataCentre is, according to their website, yet another 'team of scientists' cruising the web with automated systems to reveal the future of the internet, or whatever. The bot ignores robots.txt and leeches full website content whenever it finds a (yet undetermined) trigger word; otherwise it goes away after hitting the website homepage. () | Scraper | |
WebDAV | WebDAV shared server document editor for Excel () | Scraper | |
WGet | WebGet "Multi-Threaded File Downloader". It leeches your content. () | Scraper | |
WEP search | WEP search 00 @ rr.com, USA | Scraper | |
lwp | lwp-trivial/1.32 & LWP::Simple/5.48 @ OLM Llc, Lisle, IL, US | Scraper | |
wsowner | WSOwner is a poorly maintained PHP crawler tied to a broken website. () | Scraper | |
Zeus 2.6 | Zeus 2.6 @ Dynegy-Comm, Beijing, CN | Scraper | |
Psycheclone | | Malware | |
8484 boston | -- | Malware | |
cerberian | Cerberian drtrs @ TW () | Malware | |
core-project | core-project/1.0 frontpage exploiter | Malware | |
DataCha0s | DataCha0s | Malware | |
DigExt | Dig Extense | Malware | |
GalaxyBot | GalaxyBot/1.0 (http://www.galaxy.com/galaxybot.html) @ Logika Corp, Chicago, IL, US () | Malware | |
GetRight | GetRight () | Malware | |
google_three_web | Google_three_web is... not Google, obviously. Related to #*$! viewer, probably other '_viewers' using Larbin. Ignores robots.txt and insists on trying to reach documents forbidden by robots.txt. | Malware | |
Green Research, inc | Green research, inc [ Nigerian 419-scam email ] @ linkserve Nigeria, linkserve.com.ng, NG | Malware | |
metabot | human-guided@lerly.net @ Cogent Co, DC, US | Malware | |
IPiumBot | IPiumBot laurion(dot)com @ CommunityEngine, Tokyo, JP | Malware | |
JikeSpider | Jike is a very doubtful Chinese crawler, tied to phishing and spyware () | Malware | |
Lachesis | Lachesis @ NEC Research Inst. Corp., Princeton, NJ, US | Malware | |
URL control | Microsoft URL Control - 6.00.8862 () | Malware | |
chartercom.com | Microsoft URL Control - 6.00.8862 @ chartercom.com | Malware | |
Missigua | Missigua Locator 1.9 | Malware | |
NetResearchServer | NetResearchServer/2.7 @ RNCI New Media, Pittsburgh, PA, US. Theoretically bankrupt. | Malware | |
nhnbot | NHNbot@naver.com, KR | Malware | |
Offline Explorer/([0-9].[0-9]{ | OfflineExplorer | Malware | |
openfind | Openfind/Openbot 3.0+ (robot-response@openfind.com.tw) @ OpenFind.com.tw, HINET-NET, CHTD, TW () | Malware | |
PHP version tracker | PHP version tracker | Malware | |
Port Huron Labs | Port Huron Labs @ cox.net, GA, USA | Malware | |
Purebot | Purebot, malicious Content Scraper and Spam Agent, rule breaker. () | Malware | |
river valley | River Valley inc @ cox.net, GA, USA | Malware | |
Snapbot | Snapbot | Malware | |
TopBlogsInfo | TopBlogsInfo is a spammer, potentially harmful. | Malware | |
URL_Spider_SQL | URL_Spider_SQL/1.0 | Malware | |
WebCopier | webcopier @ Hughes Network Systems / HOT, hns.com, DE | Malware | |
webdup | Webdup/0.9 @ Chinanet-BJ Beijing Province network, Beijing, CN | Malware | |
wells | Wells search @ NL | Malware | |
yanga | Yanga WorldSearch is a dangerous personal data harvester. They are probably related to identity theft. | Malware | |
fimap | Yet another free, yet dangerous, piece of software that leads to catastrophes in the wrong hands: a Python tool that finds, prepares, audits, exploits and googles automatically for local and remote file inclusion bugs in webapps. Someone wants to upload crap into your website. It is usually just visiting, but multiple hits reveal a clear attempt to ruin your website and should be monitored carefully. () | Malware | |
AddThis.com | AddThisCrawler, tied to the "Add This" social network plugin. () | Legitimate | |
ahrefs | Ahrefs indexes the links of websites. It doesn't abide by robots.txt. () | Legitimate | |
arste.info | arste.info, related to AskPeter.info, probably running the tiehttp software, crawls for a cheap search engine from Germany. Doesn't abide by robots.txt. | Legitimate | |
BlackBerry | BlackBerry device browser. Normally not a threat. () | Legitimate | |
DotBot | DotBot claims to be making a structure display of the web. Whilst fairly open about it, it is unverifiable, hence the level 1 rank. () | Legitimate | |
findlinks | Find Links () | Legitimate | |
HTTPRetriever | HTTP Retriever PHP class | Legitimate | |
InfoPath.1 | InfoPath is a Microsoft web environment/framework normally not supposed to reach the web (most generally limited to a LAN/WAN). Whilst not a big threat, it is still doubtful () | Legitimate | |
Jakarta Commons | Jakarta Commons Java HTTP client () | Legitimate | |
Lipperhey | Lipperhey usually crawls websites to advertise their SEO tools () | Legitimate | |
mAgent | mAgent is adware at the user's browser level | Legitimate | |
MyFamilyBot | MyFamilyBot () | Legitimate | |
Nutch | Nutch robot software @ Apache () | Legitimate | |
page_verifier | page_verifier is Secure Computing's anti-malware () | Legitimate | |
ParchBot | ParchBot is a robot from Parchment Hill supposedly optimised to provide websites instead of webpages during a search. Bot is currently down (03/2008) () | Legitimate | |
Pingdom | Pingdom Website monitoring () | Legitimate | |
ShareThisFetch | ShareThisFetch. Probably linked to the twitter or facebook API. Respects robots.txt and only parses documents. | Legitimate | |
SheenBot | SheenBot belongs to Amazon web services (cloud computing). The behaviour can be mixed, but it is most often harmless. () | Legitimate | |
Sogou | Sogou web crawler is dirty, it systematically ignores robots.txt (although seems to parse it). Sogou is otherwise a legit search engine. | Legitimate | |
Steeler | Steeler () | Legitimate | |
Szukacz | Szukacz () | Legitimate | |
ezooms | There is very little information about Ezooms. It behaves correctly, abides by robots.txt and doesn't flood websites. | Legitimate | |
WebVulnCrawl | WebVulnCrawl.blogspot.com () | Legitimate | |
wotbox | Wotbox complies with robots.txt and doesn't flood. It is tied to an obscure search engine. () | Legitimate | |
CMS Survey | CMS Survey belongs to punkt.de, a CMS creators company. This robot seems to be "visiting" their competitors. It follows robots.txt, but comes with no explanation. () | Unknown | |
compspybot | CompSpyBot - Competitive Spying and Scraping. This robot is probably a joke made by some bored wannabe James Bond or content leecher. It does seem to abide by robots.txt. () | Unknown | |
F-Bot test pilot | f-bot test @ pilotad.jp | Unknown | |
larbin | Larbin is a multi purpose bot. Usually not too critical. | Unknown | |
NaverRobot | minibot(NaverRobot) @ KORNET-NETINFRA-JUNGANG-KR, Seoul, KR | Unknown | |
Missouri College | Missouri College Browse @ Sprint DSL-Net, sprint-hsd.net, KS, US | Unknown | |
natcrawl | natcrawler (France Telecom Interactive, Orange, Voila, Voilachat, etc) often misses robots.txt. Worse, it can suddenly LOOP over the very same page tens of thousands of times; many examples of this behavior can be found on the web. | Unknown | |
PycURL | PycURL, Python interface for cURL, might be good or bad news, syndication related or SEO related, or a crawler. () | Unknown | |
Python-urllib | Python-urllib is used by the Python high level language to "open arbitrary resources by URL" () | Unknown | |
robotgenius | Robotgenius is supposed to monitor company PCs over the www. () | Unknown | |
ScoutJet | ScoutJet is the web crawler for blekko, a new Silicon Valley based search engine. Seems ok, with interesting leaders. However, this search engine has been down for months, and this raises their RR () | Unknown | |
Second Street | Second Street Research @ Shawcable.net proxy, AB, CA | Unknown | |
SISTRIX | Sistrix is a German-based private SEO engine. It crawls at a very high speed and triggers the Beamreactor anti-flood protection. () | Unknown | |
sitecheck | Sitecheck () | Unknown | |
SBider | Sitesell statistics robot () | Unknown | |
Syntryx | Syntryx ANT Scout Chassis Pheromone () | Unknown | |
T-H-U-N-D-E-R-S-T-O-N-E | T-H-U-N-D-E-R-S-T-O-N-E is a free web crawler. Also refers to 'webinator'. Might or might not be dangerous, depending on the use made of it by script kiddies. () | Unknown | |
T312461 | T312461 UNKNOWN BOT @ Corex technologies | Unknown | |
TuringOS | TuringOS anonymizer | Unknown | |
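
The first and third columns of the table above can be read as a lookup from a User-Agent fragment to a category. The following is only a minimal sketch of that idea, not a description of how any particular tool applies the list. It assumes case-insensitive substring matching with the first matching row winning. The names `SIGNATURES` and `classify` are hypothetical, and only a handful of rows are copied in for illustration.

```python
from typing import Optional

# A few (pattern, category) pairs copied from the table above.
# Order matters in this sketch: the first matching pattern wins.
SIGNATURES = [
    ("googlebot", "Search Engine"),
    ("bingbot", "Search Engine"),
    ("GPTBot", "AI Crawler"),
    ("libwww-perl", "Scraper"),
    ("fimap", "Malware"),
    ("Pingdom", "Legitimate"),
    ("larbin", "Unknown"),
]


def classify(user_agent: str) -> Optional[str]:
    """Return the category of the first pattern found in user_agent.

    Matching is a case-insensitive substring test, which is an
    assumption of this sketch. Returns None when nothing matches.
    """
    ua = user_agent.lower()
    for pattern, category in SIGNATURES:
        if pattern.lower() in ua:
            return category
    return None


if __name__ == "__main__":
    print(classify("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(classify("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
```

Because some patterns appear in more than one category (MyIE2 is listed under both Search Engine and Scraper), a first-match rule makes the row order significant. A real deployment would need to decide which entry takes precedence.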