Public Notes on

Reliably access web content in markdown format by simply prefixing any URL with `pure.md/`. Avoids bot detection, renders JavaScript-heavy websites, and converts HTML, PDFs, images, and more into pure markdown.

#crawler #web #extraction #markdown #api #ai #agent
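
A minimal sketch of the prefixing described in the pure.md note above, assuming the target URL is appended unchanged after `https://pure.md/`. The helper names `to_pure_md` and `fetch_markdown` are hypothetical, and the assumption that the response body is UTF-8 text is not confirmed by the note.

```python
from urllib.request import urlopen

# Prefix taken from the note above; the https:// scheme is an assumption.
PURE_MD_PREFIX = "https://pure.md/"


def to_pure_md(url: str) -> str:
    """Return the pure.md-prefixed form of `url` (hypothetical helper)."""
    cleaned = url.strip()
    if not cleaned:
        raise ValueError("url must not be empty")
    # Assumption: the original URL is passed through as-is after the prefix.
    return PURE_MD_PREFIX + cleaned


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch `url` through pure.md and return the body as text.

    Assumes the service answers with UTF-8 encoded markdown.
    """
    with urlopen(to_pure_md(url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_markdown("https://example.com")` would request the prefixed address over the network; `to_pure_md` alone only builds the string.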

Crawlspace is a centralized platform for developers to build and deploy web crawlers. Gather fresh data for your apps and agents while contributing to a platform-wide cache for crawler traffic.

via: https://pure.md/

#crawler #api #ai #agent

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation. - apify/crawlee

#nodejs #crawler #scraper #headless-browser #framework

Twitter image and video crawler; one-click download. Contribute to caolvchong-top/twitter_download development by creating an account on GitHub.

#twitter #crawler #python

Home - Firecrawl www.firecrawl.dev
Firecrawl crawls and converts any website into clean markdown.

#api #crawler #markdown #ai #llm #readability

A social networking service scraper in Python. Contribute to JustAnotherArchivist/snscrape development by creating an account on GitHub.

#osint #crawler #twitter #telegram #python #scraper

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. - mendableai/firecrawl

#crawler #api #ai #markdown #service

Incredibly fast crawler designed for OSINT. Contribute to s0md3v/Photon development by creating an account on GitHub.

#crawler #python #osint #archive

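Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments - NanmiCoder/MediaCrawler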

#crawler #bilibili #xiaohongshu #python

Web crawling framework based on asyncio. Contribute to gaojiuli/gain development by creating an account on GitHub.

#python #crawler #asyncio

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM) - ultrafunkamsterdam/undetected-chromedriver

#headless #chrome #crawler #anti-crawler

HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the 'mirrored' website in your browser, and you can browse the site from link to link, as if you were viewing it online....

#website #archiving #download #crawler #pub

Teleport Pro: The world's most widely used webspider. Fast, reliable, robust, comprehensive webspidering, Teleport Pro by Tennyson Maxwell Information Systems, Inc.

#website #archiving #crawler #pub