Public Notes on
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation. - apify/crawlee

#nodejs #crawler #scraper #headless-browser #framework

Twitter image and video crawler; one-click download. Contribute to caolvchong-top/twitter_download development by creating an account on GitHub.

#twitter #crawler #python

Home - Firecrawl www.firecrawl.dev
Firecrawl crawls and converts any website into clean markdown.

#api #crawler #markdown #ai #llm #readability

A social networking service scraper in Python. Contribute to JustAnotherArchivist/snscrape development by creating an account on GitHub.

#osint #crawler #twitter #telegram #python #scraper

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. - mendableai/firecrawl

#crawler #api #ai #markdown #service

Incredibly fast crawler designed for OSINT. Contribute to s0md3v/Photon development by creating an account on GitHub.

#crawler #python #osint #archive

Crawler for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments - NanmiCoder/MediaCrawler

#crawler #bilibili #xiaohongshu #python

Web crawling framework based on asyncio. Contribute to gaojiuli/gain development by creating an account on GitHub.

#python #crawler #asyncio

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM) - ultrafunkamsterdam/undetected-chromedriver

#headless #chrome #crawler #anti-crawler

HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the 'mirrored' website in your browser, and you can browse the site from link to link, as if you were viewing it online....

#website #archiving #download #crawler #pub

Teleport Pro: The world's most widely used webspider. Fast, reliable, robust, comprehensive webspidering, Teleport Pro by Tennyson Maxwell Information Systems, Inc.

#website #archiving #crawler #pub