Agents
All Agent Types
A directory of every known artificial agent (bot) on the internet. You can track their activity on your website with Agent Analytics or control their behavior with Automatic Robots.txt.
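As a rough illustration of what controlling agents through robots.txt looks like, the sketch below parses a hand-written robots.txt with Python's standard `urllib.robotparser` and checks which agents it permits. The rules, the helper name `allowed_agents`, and the example.com URL are assumptions made for illustration; they are not output from Automatic Robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI data scrapers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent name: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Agent names are taken from this directory; the URL is a placeholder.
result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "meta-externalagent", "PerplexityBot", "archive.org_bot"],
    "https://example.com/article",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Grouping several `User-agent` lines above one `Disallow` rule keeps the file short when many agents of the same type get the same treatment.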
AI Data Scrapers
CloudVertexBot
CloudVertexBot is a Google-operated crawler available to site owners to request targeted crawls of their own sites for AI training purposes on the Vertex AI platform.
AI Data Scraper
Downloads website content to include in datasets used for training AI models such as LLMs
cohere-training-data-crawler
cohere-training-data-crawler is a web crawler operated by Cohere to download training data for the large language models (LLMs) that power its enterprise AI products.
Cotoyogi
Cotoyogi is a research crawler operated by Japan's Research Organization of Information and Systems that collects web content to build AI training datasets for research p…
Datenbank Crawler
Datenbank Crawler is a web crawler operated by the German company netEstate to collect and sell international website data.
FacebookBot
FacebookBot is a web crawler used by Meta to download training data for its AI speech recognition technology.
Google-Extended
Google-Extended is a robots.txt product token, not a separate crawler, that site owners use to control whether content crawled by Google may be used as AI training content for its AI products like the Gemini assistant and its Vertex AI generative APIs.
GoogleOther
GoogleOther is Google's generic crawler used by various product teams for fetching publicly accessible content, including one-off crawls for internal research and develop…
GPTBot
GPTBot is OpenAI's web crawler that collects data from publicly accessible web pages to improve AI models like ChatGPT, while respecting robots.txt and opt-out preference…
ICC-Crawler
ICC-Crawler is NICT's research crawler that automatically collects web pages from the Internet for academic research at Japan's National Institute of Information and Comm…
imageSpider
imageSpider is a web crawler operated by ByteDance, the company behind TikTok, Douyin, and other content platforms. The bot collects images from websites across the inter…
Kangaroo Bot
Kangaroo Bot is used by the company Kangaroo LLM to download data for training open-source AI models tailored to Australian language and culture.
laion-huggingface-processor
laion-huggingface-processor is a web crawler operated by LAION (Large-scale Artificial Intelligence Open Network), a non-profit organization that creates open datasets fo…
LCC
LCC is a web crawler operated by the University of Leipzig that collects text data from websites to build large-scale linguistic corpora for research purposes. The bot ga…
meta-externalagent
meta-externalagent crawls web content for training AI models and improving Meta's products by indexing content directly across the internet.
netEstate Imprint Crawler
netEstate Imprint Crawler is an AI data scraper operated by netEstate.
omgili
omgili is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.
PanguBot
PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu.
SBIntuitionsBot
SBIntuitionsBot is a web crawler operated by SB Intuitions, a Japanese company that develops generative language models optimized for the Japanese language and culture. T…
Spider
Spider is a web crawler designed for AI projects, including AI agents, LLMs, RAG systems, and data analysis. It collects and converts web data into multiple formats inclu…
Timpibot
Timpibot is used by Timpi's decentralized network of independent node operators. The index they build can be used to train LLMs (Large Language Models).
VelenPublicWebCrawler
VelenPublicWebCrawler is a web crawler developed by Velen for Hunter that analyzes millions of publicly accessible internet pages every month. The bot builds business dat…
webzio-extended
webzio-extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.
AI Search Crawlers
AddSearchBot
AddSearchBot is a web crawler that indexes website content for AddSearch's AI-powered site search solution, collecting data to provide fast and accurate search results.
AI Search Crawler
Indexes website content to possibly include as citations in AI-powered search results
Amzn-SearchBot
Amzn-SearchBot is an AI search crawler operated by Amazon that indexes web content for use in improving Alexa and other Amazon services.
Anomura
Anomura is Direqt's web crawler that discovers and indexes links and metadata from websites for inclusion in Direqt's AI search results.
Applebot
Applebot is Apple's web crawler that gathers data to power search features across Apple's ecosystem including Spotlight, Siri, and Safari search functionality.
atlassian-bot
atlassian-bot is Atlassian's AI web crawler that indexes custom websites for Rovo, allowing the indexed content to appear in Rovo Search results and be used by Rovo Chat …
AzureAI-SearchBot
AzureAI-SearchBot is an AI search crawler operated by Microsoft that indexes web content to support Azure AI services and improve search capabilities across Microsoft's A…
Channel3Bot
Channel3Bot is Channel3's web crawler that visits publicly accessible product detail pages to index product information into a universal product catalog. The bot collects…
Claude-SearchBot
Claude-SearchBot is used to create an index of websites that can be surfaced as results in Anthropic's Claude AI assistant search feature.
Cloudflare-AutoRAG
Cloudflare-AutoRAG is Cloudflare's web crawler that indexes websites for AI Search (formerly AutoRAG), a managed retrieval-augmented generation service. The crawler autom…
Google-CloudVertexBot
Google-CloudVertexBot is a web crawler operated by Google that collects content for Google Cloud's Vertex AI Search service. This crawler indexes web pages to power enter…
LinkupBot
LinkupBot is a web crawler operated by Linkup that indexes enterprise web content for its AI search platform. The bot visits websites to collect relevant business informa…
meta-webindexer
meta-webindexer is Meta's web crawler that browses the internet to improve search results for Meta AI users. It analyzes online content to make Meta AI's responses more r…
OAI-SearchBot
OAI-SearchBot is OpenAI's web crawler that indexes websites for SearchGPT, collecting and analyzing web content to power AI-driven search results and real-time informatio…
PerplexityBot
PerplexityBot is the web crawler for Perplexity AI that indexes web content to power the AI search engine's real-time information retrieval and answer generation capabili…
PetalBot
PetalBot is Huawei's web crawler that indexes PC and mobile websites to build search databases for Petal Search engine and provide AI-powered content recommendations for …
ZanistaBot
ZanistaBot is an AI search crawler operated by Zanista.
Archivers
Archive-It
Archive-It is a web archiving crawler operated by Internet Archive that preserves copies of web pages for long-term digital preservation and historical record-keeping. It…
Archiver
Captures and stores historical website snapshots for long-term digital preservation
archive.org_bot
archive.org_bot is the Internet Archive's web crawler for the Wayback Machine, systematically crawling and preserving publicly accessible web pages for historical record …
Arquivo-web-crawler
Arquivo-web-crawler is the Portuguese web archive's bot that systematically crawls and preserves Portuguese websites for historical research, creating a comprehensive dig…
Authory
Authory is an automated content archiving crawler that systematically searches for and backs up published articles, podcasts, and videos by journalists and content creato…
bl.uk_lddc_bot
bl.uk_lddc_bot is operated by the British Library as part of their legal deposit web archiving program, which collects and preserves UK web content for the national archi…
bne.es_bot
bne.es_bot is an archiver.
bnf.fr_bot
bnf.fr_bot is the official web crawler of the Bibliothèque nationale de France (BNF), systematically collecting and archiving digital content from French websites to pres…
CUCH.org Archive Bot
CUCH.org Archive Bot is a web crawler that archives web content on behalf of C U Cyber History.
heritrix
heritrix is an archiver operated by Internet Archive.
ia_archiver
ia_archiver is an archiver operated by Internet Archive.
ia_archiver-web.archive.org
ia_archiver-web.archive.org is an archiver operated by Internet Archive.
IABot
IABot (InternetArchiveBot) is a Wikipedia bot operated by the Internet Archive that combats link rot by finding dead links on Wikipedia articles and adding archived versi…
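The agent types in this directory can also serve as buckets when you review bot traffic in your own logs. The following is a minimal sketch, assuming you already have a list of User-Agent header values; the lookup table covers only a handful of agents named above, the sample strings are made up, and the helpers `classify` and `count_by_category` are hypothetical names. It does not describe how Agent Analytics works.

```python
from collections import Counter
from typing import Iterable, Optional

# Partial lookup built from agent names in this directory (not exhaustive).
AGENT_CATEGORIES = {
    "GPTBot": "AI Data Scraper",
    "meta-externalagent": "AI Data Scraper",
    "PerplexityBot": "AI Search Crawler",
    "OAI-SearchBot": "AI Search Crawler",
    "archive.org_bot": "Archiver",
    "ia_archiver": "Archiver",
}


def classify(user_agent: str) -> Optional[str]:
    """Return the agent type for a User-Agent string, or None if unknown."""
    ua = user_agent.lower()
    for token, category in AGENT_CATEGORIES.items():
        # Case-insensitive substring match: a simplification for this sketch.
        if token.lower() in ua:
            return category
    return None


def count_by_category(user_agents: Iterable[str]) -> Counter:
    """Count recognized bot requests per agent type."""
    counts = Counter()
    for ua in user_agents:
        category = classify(ua)
        if category is not None:
            counts[category] += 1
    return counts


# Hypothetical User-Agent values, made up for illustration.
SAMPLE = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)",
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (compatible; archive.org_bot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
]
counts = count_by_category(SAMPLE)
print(dict(counts))
```

Substring matching is easy to spoof, since any client can send any User-Agent value, so treat these counts as an estimate.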