Agents | Known Agents

AI Data Providers

TerraCotta is Ceramic's web crawler that indexes public content to power their web-scale search API for AI and LLMs.

AI Data Provider

Crawls websites to supply structured content to AI systems as a third-party service

YouBot is a web crawler by You.com that indexes and extracts web content to power its real-time search, contents, and research APIs, delivering grounded web data to AI ag…

AI Data Provider

AI Data Scrapers

AI2Bot is operated by Ai2, a non-profit AI research institute. It's used to download data to train open source AI models.

AI Data Scraper

Downloads website content to include in datasets used for training AI models such as LLMs

Ai2Bot-Dolma is operated by Ai2, a non-profit AI research institute. It's used to download data to train open source AI models.

AI Data Scraper

Amazonbot is a web crawler operated by Amazon that builds an index of web content to improve Amazon products and services, including Alexa, Kindle, and Amazon Shopping. T…

AI Data Scraper

Applebot-Extended

Applebot-Extended is used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer…

AI Data Scraper

Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok. It's allegedly used to download training data for its LLMs (Large Language Models) includin…

AI Data Scraper

CCBot is Common Crawl's web crawler that creates an open repository of web data, making crawled content universally accessible for research, analysis, and AI training pur…

AI Data Scraper

ChatGLM-Spider is a web crawler operated by Zhipu AI, the Chinese company behind ChatGLM. It is used for collecting data to train and evaluate the company's large languag…

AI Data Scraper

ClaudeBot is a web crawler operated by Anthropic to download training data for its LLMs (Large Language Models) that power AI products like Claude.

AI Data Scraper

CloudVertexBot is a Google-operated crawler available to site owners to request targeted crawls of their own sites for AI training purposes on the Vertex AI platform.

AI Data Scraper

cohere-training-data-crawler

cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products.

AI Data Scraper

CohereBot is a web crawler operated by Cohere, a company that provides large language models and enterprise solutions. This bot collects data from websites to support the…

AI Data Scraper

Cotoyogi is a research crawler operated by Japan's Research Organization of Information and Systems that collects web content to build AI training datasets for research p…

AI Data Scraper

Datenbank Crawler

Datenbank Crawler is a web crawler operated by German company netEstate used for collecting and selling international website data.

AI Data Scraper

FacebookBot is a web crawler used by Meta to download training data for its AI speech recognition technology.

AI Data Scraper

Google-Extended

Google-Extended is a web crawler used by Google to download AI training content for its AI products like the Gemini assistant and its Vertex AI generative APIs.

AI Data Scraper

GoogleOther is Google's generic crawler used by various product teams for fetching publicly accessible content, including one-off crawls for internal research and develop…

AI Data Scraper

GPTBot is OpenAI's web crawler that collects data from publicly accessible web pages to improve AI models like ChatGPT, while respecting robots.txt and opt-out preference…

AI Data Scraper

ICC-Crawler is NICT's research crawler that automatically collects web pages from the Internet for academic research at Japan's National Institute of Information and Comm…

AI Data Scraper

imageSpider is a web crawler operated by ByteDance, the company behind TikTok, Douyin, and other content platforms. The bot collects images from websites across the inter…

AI Data Scraper

Kangaroo Bot is used by the company Kangaroo LLM to download data to train open source AI models tailored to Australian language and culture.

AI Data Scraper

KimiBot is a web crawler operated by Moonshot AI that collects web content for training Kimi's AI models. Sites can block this bot via robots.txt to prevent their content…

AI Data Scraper

Downloads website content to include in datasets used for training AI models such as LLMs
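
As a minimal sketch of the robots.txt opt-out mentioned in the KimiBot entry, the Python snippet below checks whether a given robots.txt would block a crawler. It uses the standard library's `urllib.robotparser`. The robots.txt content, the example URL, and the helper name `is_allowed` are hypothetical, and the sketch assumes the crawler identifies itself with the token `KimiBot` and honors these rules, which depends on the crawler itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block one AI data scraper,
# allow every other user agent.
ROBOTS_TXT = """\
User-agent: KimiBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    for agent in ("KimiBot", "GPTBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

robots.txt is advisory: this check only shows what a compliant crawler would do, not whether a particular bot actually complies.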

laion-huggingface-processor

LAION-huggingface-processor is a web crawler operated by LAION (Large-scale Artificial Intelligence Open Network), a non-profit organization that creates open datasets fo…

AI Data Scraper

LCC is a web crawler operated by the University of Leipzig that collects text data from websites to build large-scale linguistic corpora for research purposes. The bot ga…

AI Data Scraper

meta-externalagent

meta-externalagent crawls web content for training AI models and improving Meta's products by indexing content directly across the internet.

AI Data Scraper

MistralBot is a web crawler operated by Mistral, a company that develops frontier language models and enterprise AI platforms. The bot collects data from web pages to sup…

AI Data Scraper

netEstate Imprint Crawler

netEstate Imprint Crawler is an AI data scraper operated by netEstate. If you think this is incorrect or can provide additional detail about its purpose, please let us kn…

AI Data Scraper

omgili is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.

AI Data Scraper

PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu.

AI Data Scraper

SBIntuitionsBot

SBIntuitionsBot is a web crawler operated by SB Intuitions, a Japanese company that develops generative language models optimized for the Japanese language and culture. T…

AI Data Scraper

Spider is a web crawler designed for AI projects, including AI agents, LLMs, RAG systems, and data analysis. It collects and converts web data into multiple formats inclu…

AI Data Scraper

Timpibot is used by Timpi's decentralized network of independent node operators. The index they build can be used to train LLMs (Large Language Models).

AI Data Scraper

VelenPublicWebCrawler

VelenPublicWebCrawler is a web crawler developed by Velen for Hunter that analyzes millions of publicly accessible internet pages every month. The bot builds business dat…

AI Data Scraper

webzio-extended

webzio-extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.

AI Data Scraper

AI Search Crawlers

AddSearchBot is a web crawler that indexes website content for AddSearch's AI-powered site search solution, collecting data to provide fast and accurate search results.

AI Search Crawler

Indexes website content for possible inclusion as citations in AI-powered search results

Amzn-SearchBot is an AI search crawler operated by Amazon that indexes web content for use in improving Alexa and other Amazon services.

AI Search Crawler

Anomura is Direqt's web crawler that discovers and indexes links and metadata from websites for inclusion in Direqt's AI search results.

AI Search Crawler

Applebot is Apple's web crawler that gathers data to power search features across Apple's ecosystem including Spotlight, Siri, and Safari search functionality.

AI Search Crawler

atlassian-bot is Atlassian's AI web crawler that indexes custom websites for Rovo, allowing the indexed content to appear in Rovo Search results and be used by Rovo Chat …

AI Search Crawler

AzureAI-SearchBot

AzureAI-SearchBot is an AI search crawler operated by Microsoft that indexes web content to support Azure AI services and improve search capabilities across Microsoft's A…

AI Search Crawler

Channel3Bot is Channel3's web crawler that visits publicly accessible product detail pages to index product information into a universal product catalog. The bot collects…

AI Search Crawler

Claude-SearchBot

Claude-SearchBot is used to create an index of websites that can be surfaced as results in Anthropic's Claude AI assistant search feature.

AI Search Crawler

Cloudflare-AutoRAG

Cloudflare-AutoRAG is Cloudflare's web crawler that indexes websites for AI Search (formerly AutoRAG), a managed retrieval-augmented generation service. The crawler autom…

AI Search Crawler

Google-CloudVertexBot

Google-CloudVertexBot is a web crawler operated by Google that collects content for Google Cloud's Vertex AI Search service. This crawler indexes web pages to power enter…

AI Search Crawler

Kimi-SearchBot is a web crawler operated by Moonshot AI that powers Kimi's search functionality. It analyzes and indexes web pages so they can appear in Kimi search resul…

AI Search Crawler

LinkupBot is a web crawler operated by Linkup that indexes enterprise web content for its AI search platform. The bot visits websites to collect relevant business informa…

AI Search Crawler

meta-webindexer

meta-webindexer is Meta's web crawler that browses the internet to improve search results for Meta AI users. It analyzes online content to make Meta AI's responses more r…

AI Search Crawler

MistralAI-Index

MistralAI-Index is one of Mistral AI's automated web crawlers. It indexes web content to power search capabilities in Vibe, Mistral's AI assistant product.

AI Search Crawler

OAI-SearchBot is OpenAI's web crawler that indexes websites for SearchGPT, collecting and analyzing web content to power AI-driven search results and real-time informatio…

AI Search Crawler
