Beamreactor Crawler Bots knowledge base

Number of known bots: 237


Search Engine
AAfter (risk level 0): AAfter looks like a legitimate search engine.
AboutUsBot (risk level 0): AboutUsBot is used by the AboutUs website to determine the contents, appearance, logo, and owner of a website. It is legitimate.
aiHitBot (risk level 0): aiHitBot seems to be a legitimate search engine.
ia_archiver (risk level 0): Alexa ia_archiver.
almaden (risk level 0): almaden, Unstructured Information Analysis and Search @ IBM.
Scooter (risk level 0): AltavistaBot, AltaVista.
aport (risk level 0): Aport.
appie (risk level 0): Appie-spider/Walhello.
ApptusBot (risk level 0): ApptusBot is the crawler of Apptus, a business-oriented search engine.
Ask Jeeves (risk level 0): Ask Jeeves, Teoma.
askpeter_bot (risk level 0): Ask Peter is a German search engine.
ASPseek (risk level 0): ASPSeek.
Baiduspider (risk level 0): Baidu search engine web crawler. It does not always respect robots.txt and is sometimes a bit pushy.
BecomeBot (risk level 0): BecomeBot shopping search.
Blazer (risk level 0): Blazer Browser, Sharp Zaurus.
CatchBot (risk level 0): CatchBot is a business page crawler. Its operators claim to resell information to companies, academics, and various professional fields. The bot behaves correctly.
abby/Ellerdale (risk level 0): Ellerdale determines trends through the semantic web, usually by gathering recent tweets or Facebook posts.
ExaBot (risk level 0): Exalead Exabot.
facebookexternalhit (risk level 0): Facebook External Hit.
fast (risk level 0): FAST-WebCrawler.
sitedossier.com (risk level 0): With sitedossier.com as its referrer, the IP 69.71.222.186 seems to check for websites recently crawled by one of its competitors, DomainTools. Seems harmless.
feedfetcher (risk level 0): Google Feedfetcher gathers news feeds from websites.
Feedtrace-bot (risk level 0): Feedtrace-bot compiles a list of the most popular Twitter feeds, continuously parsing the most recent ones.
ftxbrowser (risk level 0): ftxBrowser, Windows CE.
gais (risk level 0): Gais.
Gigabot (risk level 0): Gigablast's Gigabot.
Mediapartners (risk level 0): Google AdSense.
Google Desktop (risk level 0): Google Desktop is a desktop data manager and search tool. It should be harmless.
googlebot (risk level 0): Googlebot.
ichiro (risk level 0): ichiro @ Goo Japan / Inktomi.
IconSurf (risk level 0): Icon Surf.
ICRA_Semantic_spider (risk level 0): ICRA semantic spider, Internet Content Rating Association.
infoseek (risk level 0): InfoSeek.
JS-Kit (risk level 0): JS-Kit is yet another piece of blog software. It connects here and there, counting on curiosity to promote its products. For that reason, no URL is provided here.
Linguee Bot (risk level 0): Linguee Bot is a legitimate search engine bot. However, it WILL get banned from your Beamreactor-enabled website for its extreme crawling speed, with the reason 'flood'.
LinkWalker (risk level 0): LinkWalker.
grub (risk level 0): LookSmart/Grub.
Mail.RU (risk level 0): mail.ru (Поиск@mail.ru, "Search@mail.ru") is tied to the mail.ru search engine.
MJ12bot (risk level 0): Majestic-12 distributed search engine bot.
MetaQuerier (risk level 0): MetaQuerier (University of Illinois at Urbana-Champaign).
bingbot (risk level 0): Microsoft Bing.
Media Center PC 5.0 (risk level 0): Microsoft EnhanceIE lets Windows users modify their user agents and emulate other user agents (!?).
MLbot (risk level 0): MLBot is an MP3/video crawler. Its true purpose is undisclosed but might be related to piracy protection. This robot is fairly clean.
Yahoo-MMCrawler (risk level 0): MM Crawler seeks images on the web.
MOT-A768 (risk level 0): Motorola A768 browser client. Probably fairly harmless.
msnbot (risk level 0): MSN Search crawler.
MSTV (risk level 0): MSTV WebTV.
MyIE2 (risk level 0): MyIE2 @ Turkey?
netcraft (risk level 0): Netcraft.
Naverbot (risk level 0): NHN Corp bot/Naver.com.
Ocelli (risk level 0): Ocelli Engineering search.
OnetSzukaj (risk level 0): OnetSzukaj.
avantgo (risk level 0): PalmOS AvantGo.
PSbot (risk level 0): Picsearch web crawler.
plucker (risk level 0): Plucker Browser, Windows CE.
Plukkie (risk level 0): Plukkie: a search engine robot, fairly harmless.
pompos (risk level 0): Pompos.
Mooqsdfqs (risk level 0)
quepasacreep (risk level 0): QuePasaCreep.
StackRambler (risk level 0): Rambler search robot.
RSScache (risk level 0): RSS Cache website bandwidth saver.
SapphireWebCrawler (risk level 0): SapphireWebCrawler crawls for a computer science project at Carnegie Mellon University.
scrubby (risk level 0): scrubby.
Shim (risk level 0): Shim Crawler (University of Tokyo).
slurp (risk level 0): Slurp.
spbot (risk level 0): spbot; "we just want to find out to which web pages you link to".
Speedy Spider (risk level 0): Speedy Spider is part of the highly advanced search engine Entireweb.com, which was developed in Halmstad, Sweden, during 1998-2000.
sproose (risk level 0): Sproose Crawler.
Apple-PubSub (risk level 0): The PubSub client is checking your RSS feed for an Apple computer owner! Don't remove or block this client/IP.
bnf.fr_bot (risk level 0): This robot comes from the French National Library. It makes a web archive of your website for various reasons and may or may not respect robots.txt, depending on its settings. Harmless nonetheless.
seznambot (risk level 0): Tied to the Seznam Czech search engine.
turnitin (risk level 0): Turn It In.
Twiceler (risk level 0): Twiceler is the legitimate Cuil search engine crawler.
Twingly Recon (risk level 0): Twingly Recon is an RSS parser focused on blogs. It is usually triggered by syndication tools, such as third-party Facebook/Twitter posting APIs.
Twitturls (risk level 0): Twitter URL parser. Someone linked to your content on Twitter.
Vagabondo (risk level 0): Vagabondo.
VideoSurf_bot (risk level 0): VideoSurf bot looks for videos. It uses social networks to find URLs to visit, so its visit may be related to some of your website data being posted on Twitter or Facebook.
VoilaBot (risk level 0): VoilaBot is from the Voila search engine, owned by the France Telecom - Orange group. Basically harmless.
Jigsaw (risk level 0): W3C CSS validator - JFouffa.
W3C_Validator (risk level 0): W3C Validator.
WMP (risk level 0): Windows Media Player.
Xenu Link Sleuth (risk level 0): Xenu Link Sleuth checks your website for dead links.
Yahoo! Mindset (risk level 0): Yahoo Mindset.
Yandex (risk level 0): Yandex. A trailing letter identifies the crawler's role: I for search, H for mirror copies, P for images, F for favicons, D for Yandex-declared websites, and B for RSS.
YandexSomething (risk level 0): YandexSomething searches for news-related RSS feeds for the Yandex news system.
zyborg (risk level 0): Zyborg.
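The Yandex entry above says a single trailing letter identifies the crawler's role. The Python sketch below decodes that letter. It makes one hypothetical assumption: the letter is the last semicolon-separated token inside the parentheses of a user agent that begins with `Yandex/`, as in the example string `Yandex/1.01.001 (compatible; Win16; I)`. Only the letters listed in the entry are mapped, and any other letter is reported as unknown.

```python
import re
from typing import Optional

# Role letters as listed in the Yandex entry of this knowledge base.
YANDEX_ROLES = {
    "I": "search",
    "H": "mirror copies",
    "P": "images",
    "F": "favicons",
    "D": "Yandex-declared websites",
    "B": "RSS",
}

# Assumed format: "Yandex/<version> (<tokens>; <role letter>)".
_YANDEX_UA = re.compile(r"^Yandex/[\d.]+ \(([^)]*)\)")


def yandex_role(user_agent: str) -> Optional[str]:
    """Return the crawler role for a Yandex user agent, or None if it is not one."""
    match = _YANDEX_UA.match(user_agent)
    if match is None:
        return None
    # The role letter is taken to be the last ';'-separated token in the comment.
    letter = match.group(1).split(";")[-1].strip()
    return YANDEX_ROLES.get(letter, "unknown")


print(yandex_role("Yandex/1.01.001 (compatible; Win16; I)"))  # search
print(yandex_role("Yandex/1.01.001 (compatible; Win16; P)"))  # images
print(yandex_role("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # None
```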
AI Crawler
ImagesiftBot (risk level 2): AI image analysis crawler.
Anthropic-AI (risk level 0): Anthropic AI services general agent.
Claude-Web (risk level 0): Anthropic Claude web crawler.
Bytespider (risk level 2): ByteDance (TikTok) AI crawler.
CCBot (risk level 4): CCBot, or CommonCrawl bot, has claimed since 2009, like many others, that it will eventually deliver interesting search content. It offers nothing but crawling statistics. Its "CC" name suggests "Creative Commons", but it has NOTHING in common with it.
ChatGPT-User (risk level 0): ChatGPT user agent for browsing.
cohere-ai (risk level 1): Cohere AI language model crawler.
Diffbot (risk level 1): Diffbot AI extraction and analysis.
Omgilibot (risk level 2): Omgili conversation analysis bot.
GPTBot (risk level 0): OpenAI GPT crawler for AI training.
PerplexityBot (risk level 0): Perplexity AI search crawler.
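Most AI crawlers in this section announce themselves with a dedicated user-agent token, so a site can address them individually in robots.txt. The sketch below uses Python's standard `urllib.robotparser` to check how a robots.txt file treats a few of them. The robots.txt content is a hypothetical example that blocks GPTBot and CCBot site-wide and leaves every other agent unrestricted. `example.com` is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt GPTBot and CCBot out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "CCBot", "PerplexityBot"):
    allowed = parser.can_fetch(agent, "https://example.com/articles/1")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
# GPTBot: blocked
# CCBot: blocked
# PerplexityBot: allowed
```

Whether a crawler actually honours these rules is a separate question. That behaviour is what the risk levels in this list try to capture.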
Scraper
MyIE2 (risk level 5)
_viewer (risk level 6)
<? (risk level 6): '<?' means a script kiddie attempted to bypass your website's security through PHP injection.
aboundex (risk level 5): Aboundexbot claims to index websites, and while it abides by robots.txt, its activity remains doubtful.
America Online (risk level 6): America Online is a rather weak imitation of the AOL referrer and hides an aggressive forum spammer.
synapse (risk level 5): Apache Synapse is undocumented. Frequently seen. Suspicious.
avantbrowswer.com (risk level 5): Avant Browser, Second Street Research @ Shawcable.net proxy, AB, CA.
BackStreet (risk level 5): BackStreet Browser.
Bluecoat (risk level 6): Bluecoat DRTR.
Brick House (risk level 6): Brick House.
calif univ (risk level 6): Calif Univ Tools @ btcentralplus.com.
Cam finder (risk level 6): Cam finder.
Orbiter (risk level 6): DailyOrbit's Orbiter, supposedly defunct. It shouldn't be crawling your website.
domainratio (risk level 4): Domainratio bot belongs to a website that claims to "sort interesting websites" but is really a WHOIS front end with a lot of advertising.
drupal (risk level 5): Drupal web management.
Daum (risk level 5): EDI/Edacious & Intelligent Web Robot.
EmeraldShield (risk level 4): EmeraldShield.com web spider.
FlashGet (risk level 6): FlashGet.
Franklin (risk level 5): Franklin Locator (eclipse.net.uk) @ XO Communications, Reston, VA, US.
FunWebProducts (risk level 6): FunWebProducts falls into the adware/spyware category with its set of dirty toys from IWon. Its bot may be related to Mr Sputnik.
gqbi (risk level 5): gqbi hnxupsxgfgnX berXjteu (!)
http generic (risk level 5): http generic @ mchsi.com, Mediacom, NY, US.
Huasai (risk level 4): Huasai ignores robots.txt. It is a harvester of unknown purpose.
huaweisymantec (risk level 5): Huawei Symantec: a Chinese bot that claims to "fix website security holes". It is in no way related to Symantec and is most probably a scam.
IE/4.0 (risk level 5): IE/4.0 @ mesh.ad.jp, JP.
intelium_bot (risk level 6): Intelium respects robots.txt and doesn't flood, but it is far too discreet to be considered safe.
Internet Explore 5.x (risk level 5): Internet Explore 5.x @ Dynegy-Comm, Beijing, CN.
ISC Systems (risk level 6): ISC Systems iRc Search 2.1.
Java/ (risk level 4): Java/xxxx. Used by various parties, usually for cheap crawlers (hispeed.ch), more rarely for harmful actions.
Java1.3.1 (risk level 5): Java1.3.1 @ antelecom.net.
Java1.4.0_02 (risk level 4): Java1.4.0_02 @ Speakeasy.NET, US.
jobo (risk level 6): JoBo/1.3 @ Technikzentrum Luebeck tzl.de, DE.
libwww-perl (risk level 6): libwww-perl.
linguee (risk level 4): Linguee bot. Flooder.
Mac Finder (risk level 5): Mac Finder 1.0.26 @ rr.com.
Mac_Power (risk level 5): Mac_Power.
mnoGoSearch (risk level 6): mnoGoSearch.
Mozilla(IE Compatible) (risk level 6): Mozilla(IE Compatible) @ UNSX, RU.
indy library (risk level 5): Mozilla/3.0 (compatible; Indy Library) @ Bijing Gold, sina.com, CN.
MRSPUTNIK (risk level 6): Mr Sputnik is nasty adware/malware from IWon, possibly linked to hiyo.com.
nerdbynature (risk level 4): Nerd By Nature indexes French and German websites. It is supposed to build maps of the links tied to a website, but it does so in a non-obvious way.
Nextopia (risk level 5): NextopiaBOT.
OmniExplorer (risk level 5): Omni Explorer.
PlantyNet_WebRobot (risk level 6): PlantyNet Web Robot @ hinet.net, TW.
Poirot (risk level 5): Poirot.
Program Shareware (risk level 5): Program Shareware.
RPT-HTTPClient (risk level 5): RPT-HTTPClient/0.3-3.
Search17Bot (risk level 4): Search17Bot claims to be a semantic search engine. It is closed to the public, so it might be anything.
second life lsl (risk level 4): Second Life LSL. LSL is a programming language for the Second Life and OpenSim game environments. In the wrong hands, and provided Linden Lab doesn't check the content of outgoing traffic, it can be used to flood, spam, or seriously hit a website.
SiteBot (risk level 4): SiteBot is a link collector. Since its origin and customers aren't disclosed, we may consider it privacy-infringing or a cheap harvester.
HMSE (risk level 4): Spammer.
surveybot (risk level 6): SurveyBot.
teleport pro (risk level 6): Teleport Pro @ interbusiness.it, IT.
tiehttp (risk level 4): Tiehttp: related to AskPeter, the tiehttp software was developed as freeware by Kyriacos Michael for the Delphi platform. It is a free bot that normally shouldn't be on your website.
W3CRobot (risk level 6): W3CRobot/5.4.0 @ CommunityEngine, Tokyo, JP.
wbdbot (risk level 6): WBD bot @ hostcasters.com, TX, USA.
WebDataCentre (risk level 5): According to its website, WebDataCentre is yet another 'team of scientists' cruising the web with automated systems to reveal the future of the internet, or whatever. The bot ignores robots.txt and leeches the full website content whenever it finds a (yet undetermined) trigger word. Otherwise, it leaves after hitting the website homepage.
WebDAV (risk level 6): WebDAV shared server document editor for Excel.
WGet (risk level 6): WebGet "Multi-Threaded File Downloader". It leeches your content.
WEP search (risk level 6): WEP search 00 @ rr.com, USA.
lwp (risk level 4): lwp-trivial/1.32 & LWP::Simple/5.48 @ OLM Llc, Lisle, IL, US.
wsowner (risk level 6): WSOwner is a poorly maintained PHP crawler tied to a broken website.
Zeus 2.6 (risk level 5): Zeus 2.6 @ Dynegy-Comm, Beijing, CN.
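Most entries in the scraper list are recognised by a fixed token in the user-agent string (libwww-perl, Java/, WGet, and so on), each paired with a risk level. The Python sketch below shows that kind of lookup. It is hypothetical in several ways:

- The token table is a small hand-copied subset of this knowledge base.
- Tokens are assumed to match as case-insensitive substrings.
- The riskiest match wins.
- The `threshold=5` cut-off is an arbitrary illustration, not a Beamreactor setting.

```python
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    token: str
    risk: int


# Hand-copied subset of this knowledge base: user-agent token -> risk level.
KNOWN_TOKENS = {
    "googlebot": 0,
    "bingbot": 0,
    "CCBot": 4,
    "Java/": 4,
    "Huasai": 4,
    "libwww-perl": 6,
    "WGet": 6,
    "<?": 6,
}


def classify(user_agent: str) -> Optional[Verdict]:
    """Return the riskiest known token found in the user agent, or None.

    Assumption: tokens match as case-insensitive substrings.
    """
    ua = user_agent.lower()
    hits = [Verdict(tok, risk) for tok, risk in KNOWN_TOKENS.items() if tok.lower() in ua]
    return max(hits, key=lambda v: v.risk, default=None)


def should_block(user_agent: str, threshold: int = 5) -> bool:
    """Hypothetical policy: block any user agent at or above `threshold`."""
    verdict = classify(user_agent)
    return verdict is not None and verdict.risk >= threshold


print(classify("libwww-perl/6.05"))    # Verdict(token='libwww-perl', risk=6)
print(should_block("Java/1.8.0_151"))  # False (risk 4 is below the threshold)
```

Taking the riskiest match, rather than the first one, means a browser-like string that also carries `libwww-perl` is not waved through. User agents are trivially forged, as the EnhanceIE entry points out, so a lookup like this is only a first filter.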
Malware
Psycheclone (risk level 9)
8484 boston (risk level 7)
cerberian (risk level 7): Cerberian drtrs @ TW.
core-project (risk level 8): core-project/1.0 FrontPage exploiter.
DataCha0s (risk level 8): DataCha0s.
DigExt (risk level 7): Dig Extense.
GalaxyBot (risk level 9): GalaxyBot/1.0 (http://www.galaxy.com/galaxybot.html) @ Logika Corp, Chicago, IL, US.
GetRight (risk level 8): GetRight.
google_three_web (risk level 8): Google_three_web is... not Google, obviously. It is related to #*$! viewer, and probably to other '_viewers' using Larbin. It ignores robots.txt and insists on trying to reach the documents robots.txt forbids.
Green Research, inc (risk level 8): Green Research, inc [Nigerian 419-scam email] @ linkserve Nigeria, linkserve.com.ng, NG.
metabot (risk level 9): human-guided@lerly.net @ Cogent Co, DC, US.
IPiumBot (risk level 8): IPiumBot laurion(dot)com @ CommunityEngine, Tokyo, JP.
JikeSpider (risk level 7): Jike is a very dubious Chinese crawler, tied to phishing and spyware.
Lachesis (risk level 8): Lachesis @ NEC Research Inst. Corp., Princeton, NJ, US.
URL control (risk level 8): Microsoft URL Control - 6.00.8862.
chartercom.com (risk level 9): Microsoft URL Control - 6.00.8862 @ chartercom.com.
Missigua (risk level 7): Missigua Locator 1.9.
NetResearchServer (risk level 8): NetResearchServer/2.7 @ RNCI New Media, Pittsburgh, PA, US. Supposedly bankrupt.
nhnbot (risk level 7): NHNbot@naver.com, KR.
Offline Explorer/([0-9].[0-9]{ (risk level 8): OfflineExplorer.
openfind (risk level 7): Openfind/Openbot 3.0+ (robot-response@openfind.com.tw) @ OpenFind.com.tw, HINET-NET, CHTD, TW.
PHP version tracker (risk level 9): PHP version tracker.
Port Huron Labs (risk level 7): Port Huron Labs @ cox.net, GA, USA.
Purebot (risk level 7): Purebot: malicious content scraper and spam agent, rule breaker.
river valley (risk level 7): River Valley inc @ cox.net, GA, USA.
Snapbot (risk level 8): Snapbot.
TopBlogsInfo (risk level 7): TopBlogsInfo is a spammer, potentially harmful.
URL_Spider_SQL (risk level 9): URL_Spider_SQL/1.0.
WebCopier (risk level 8): WebCopier @ Hughes Network Systems / HOT, hns.com, DE.
webdup (risk level 9): Webdup/0.9 @ Chinanet-BJ Beijing Province network, Beijing, CN.
wells (risk level 8): Wells search @ NL.
yanga (risk level 9): Yanga WorldSearch is a dangerous personal data harvester, probably related to identity theft.
fimap (risk level 9): Yet another free yet dangerous tool that leads to catastrophe in the wrong hands: a Python tool that automatically finds, prepares, audits, exploits, and googles for local and remote file inclusion bugs in web apps. Someone wants to upload junk to your website. It is usually just a visit, but multiple hits reveal a clear attempt to ruin your website and should be monitored carefully.
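The fimap entry describes automated probing for local and remote file inclusion bugs, and the '<?' entry in the scraper list describes PHP injection attempts. One crude way to spot such requests is to inspect the values in the query string, as the Python sketch below does. Its rules are illustrative assumptions, not fimap signatures:

- The marker list is made up for this example.
- Any parameter that holds an absolute http/https/ftp URL is treated as suspicious.

Legitimate redirect parameters will trip the URL check.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative markers only; a real filter would need a much richer rule set.
INCLUSION_MARKERS = ("<?", "../", "php://", "data://")
REMOTE_SCHEMES = ("http://", "https://", "ftp://")


def looks_like_file_inclusion(request_target: str) -> bool:
    """Heuristically flag a request whose query string suggests LFI/RFI or PHP injection."""
    query = urlsplit(request_target).query
    # parse_qsl percent-decodes values, so "%3C%3F" is seen as "<?".
    for _name, value in parse_qsl(query, keep_blank_values=True):
        lowered = value.lower()
        if lowered.startswith(REMOTE_SCHEMES):
            return True  # parameter carries a remote URL: possible RFI
        if any(marker in lowered for marker in INCLUSION_MARKERS):
            return True  # traversal, PHP tag or stream wrapper: possible LFI/injection
    return False


print(looks_like_file_inclusion("/index.php?page=http://evil.example/shell.txt"))  # True
print(looks_like_file_inclusion("/index.php?page=../../etc/passwd"))               # True
print(looks_like_file_inclusion("/index.php?page=about"))                          # False
```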
Legitimate
AddThis.com (risk level 2): AddThisCrawler, tied to the "Add This" social network plugin.
ahrefs (risk level 1): Ahrefs indexes the links of websites. It doesn't abide by robots.txt.
arste.info (risk level 1): arste.info, related to AskPeter.info and probably running the tiehttp software, crawls for a cheap German search engine. It doesn't abide by robots.txt.
BlackBerry (risk level 1): BlackBerry device browser. Normally not a threat.
DotBot (risk level 1): DotBot claims to be building a structural map of the web. While it is fairly open about this, the claim is unverifiable, hence the level 1 rank.
findlinks (risk level 1): Find Links.
HTTPRetriever (risk level 2): HTTP Retriever PHP class.
InfoPath.1 (risk level 2): InfoPath is a Microsoft web environment/framework that normally isn't supposed to reach the web (it is generally limited to a LAN/WAN). While not a big threat, it is still doubtful.
Jakarta Commons (risk level 2): Jakarta Commons Java HTTP client.
Lipperhey (risk level 1): Lipperhey usually crawls websites to advertise its SEO tools.
mAgent (risk level 1): mAgent is adware at the user browser level.
MyFamilyBot (risk level 1): MyFamilyBot.
Nutch (risk level 2): Nutch robot software @ Apache.
page_verifier (risk level 1): page_verifier is Secure Computing's anti-malware tool.
ParchBot (risk level 2): ParchBot is a robot from Parchment Hill, supposedly optimised to return websites rather than webpages in search results. The bot is currently down (03/2008).
Pingdom (risk level 2): Pingdom website monitoring.
ShareThisFetch (risk level 1): ShareThisFetch. Probably linked to the Twitter or Facebook API. It respects robots.txt and only parses documents.
SheenBot (risk level 1): SheenBot belongs to Amazon Web Services (cloud computing). Its behaviour can be mixed, but it is most often harmless.
Sogou (risk level 1): Sogou's web crawler is dirty: it systematically ignores robots.txt (although it seems to parse it). Sogou is otherwise a legitimate search engine.
Steeler (risk level 1): Steeler.
Szukacz (risk level 1): Szukacz.
ezooms (risk level 2): There is very little information about Ezooms. It behaves correctly, abides by robots.txt, and doesn't flood websites.
WebVulnCrawl (risk level 2): WebVulnCrawl.blogspot.com.
wotbox (risk level 2): Wotbox complies with robots.txt and doesn't flood. It is tied to an obscure search engine.
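Several entries in this section (Ahrefs, arste.info, Sogou) are judged on whether they honour robots.txt. One way to check that against your own traffic is to replay access-log entries through Python's standard `urllib.robotparser`, as sketched below. The robots.txt content, the log excerpt, and the `robots_violations` helper are hypothetical illustrations. `example.com` is a placeholder base URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the audited site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical access-log excerpt, already reduced to (product token, path).
ACCESS_LOG = [
    ("AhrefsBot", "/blog/post-1"),
    ("AhrefsBot", "/private/report.pdf"),
    ("Sogou web spider", "/private/index.html"),
    ("Ezooms", "/blog/post-2"),
]


def robots_violations(robots_txt, log, base="https://example.com"):
    """Return the (agent, path) pairs that robots.txt disallowed for that agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent, path in log
        if not parser.can_fetch(agent, base + path)
    ]


for agent, path in robots_violations(ROBOTS_TXT, ACCESS_LOG):
    print(f"{agent} fetched disallowed {path}")
# AhrefsBot fetched disallowed /private/report.pdf
# Sogou web spider fetched disallowed /private/index.html
```

The log is reduced to product tokens because `can_fetch` keeps only the part of the agent string before the first "/" when matching per-bot groups. A full header such as `Mozilla/5.0 (compatible; AhrefsBot/7.0)` would therefore be matched as `mozilla` and miss a `User-agent: AhrefsBot` group.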
Unknown
CMS Survey (risk level 3): CMS Survey belongs to punkt.de, a CMS developer. This robot seems to be "visiting" its competitors. It follows robots.txt but comes with no explanation.
compspybot (risk level 3): CompSpyBot - Competitive Spying and Scraping. This robot is probably a joke by some bored would-be James Bond or a content leecher. It does seem to abide by robots.txt.
F-Bot test pilot (risk level 3): f-bot test @ pilotad.jp.
larbin (risk level 3): Larbin is a multi-purpose bot. Usually not too critical.
NaverRobot (risk level 3): minibot(NaverRobot) @ KORNET-NETINFRA-JUNGANG-KR, Seoul, KR.
Missouri College (risk level 3): Missouri College Browse @ Sprint DSL-Net, sprint-hsd.net, KS, US.
natcrawl (risk level 3): natcrawler (France Telecom Interactive, Orange, Voila, Voilachat, etc.) often ignores robots.txt. Worse, it can suddenly LOOP over the very same page tens of thousands of times; many examples of this behaviour can be found on the web.
PycURL (risk level 3): PycURL, the Python interface to cURL, might be good or bad news: it may be syndication-related, SEO-related, or a crawler.
Python-urllib (risk level 3): Python-urllib is the user agent sent by Python's standard urllib library, used to "open arbitrary resources by URL".
robotgenius (risk level 3): Robotgenius is supposed to monitor company PCs over the web.
ScoutJet (risk level 3): ScoutJet is the web crawler for blekko, a new Silicon Valley search engine. Seems OK, with interesting leaders. However, this search engine has been down for months, and this raises its RR.
Second Street (risk level 3): Second Street Research @ Shawcable.net proxy, AB, CA.
SISTRIX (risk level 3): Sistrix is a German private SEO engine. It crawls at very high speed and triggers the Beamreactor anti-flood protection.
sitecheck (risk level 3): Sitecheck.
SBider (risk level 3): Sitesell statistics robot.
Syntryx (risk level 3): Syntryx ANT Scout Chassis Pheromone.
T-H-U-N-D-E-R-S-T-O-N-E (risk level 3): T-H-U-N-D-E-R-S-T-O-N-E is a free web crawler, also associated with 'webinator'. It might or might not be dangerous, depending on the use script kiddies make of it.
T312461 (risk level 3): T312461 UNKNOWN BOT @ Corex technologies.
TuringOS (risk level 3): TuringOS anonymizer.
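Several entries (Linguee, SISTRIX, natcrawl) describe crawlers that trip flood protection by requesting too fast or by looping over one page. A common way to detect this is a sliding-window request counter, sketched below in Python. This is a hypothetical illustration, not Beamreactor's actual anti-flood algorithm. The `FloodDetector` name and its `max_hits`/`window` limits are invented for the example, and timestamps are assumed to arrive in non-decreasing order. 203.0.113.7 is a documentation-range IP.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Sliding-window request counter keyed by client (for example an IP).

    Hypothetical sketch: the limits are illustrative and this is not
    Beamreactor's actual anti-flood algorithm.
    """

    def __init__(self, max_hits: int = 10, window: float = 1.0) -> None:
        self.max_hits = max_hits  # allowed requests per window
        self.window = window      # window length in seconds
        # client key -> timestamps of that client's recent requests
        self._hits = defaultdict(deque)

    def record(self, client: str, timestamp: float) -> bool:
        """Record one request and return True if the client is now flooding.

        Assumes timestamps for a given client arrive in non-decreasing order.
        """
        hits = self._hits[client]
        hits.append(timestamp)
        # Discard timestamps that have slid out of the window.
        while timestamp - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.max_hits


detector = FloodDetector(max_hits=3, window=1.0)
for t in (0.0, 0.1, 0.2, 0.3):
    flooding = detector.record("203.0.113.7", t)
print(flooding)  # True: four requests within one second
```

To catch a loop like natcrawl's, the same class can be keyed on the client and the requested path together, for example `f"{ip} {path}"`. Repeated hits on one page then stand out even when overall traffic is moderate.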