Public Notes by chase_ats Tagged #crawlers

"Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site. The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful." #ruby #crawlers #mechanize #automation #pub
commoncrawl.org
"non-profit foundation dedicated to providing an open repository of web crawl data that can be accessed and analyzed by everyone" #open_source #open_data #crawlers #Google #spiders #Data #text #web_index #pub
urlsearch.commoncrawl.org
"Enter a domain to find the location of files in the corpus that have pages from that URL. The output will be an alphabetically ordered list and a JSON file that can be downloaded" #open_data #web_index #crawlers #search_engines #Data #open_source #bootstrap_layout #pub
streamified.me
"All your social networks, blogs and feeds in one convenient place. Read, like, share, crosspost, track, analyze, monitor and collaborate." #text_extraction #crawlers #web_crawling #social_networks #web_2.0 #social_media #aggregation #not_sure #web_app #pub
wappalyzer.com
#freemium #crawlers #searching #SaaS #$kippt_bookmark #listings #services #tech_stacks #ideas #browser_extensions #open_source #%on_github #$AFMA #$afma-v #pub
#SaaS #listings #searching #services #ideas #tech_stacks #browser_extensions #$kippt_bookmark #freemium #crawlers #$AFMA_umbrella #site_technology_analyzer #$site_technology_stack #pub
#%product_hunt #startups #SaaS #free* #beta #alternatives #$AFMA_umbrella #$kippt_bookmark #browser_extensions #crawlers #freemium #ideas #listings #searching #services #$site_technology_stack #site_technology_analyzer #tech_stacks #pub
#$AFMA_umbrella #$kippt_bookmark #$site_technology_stack #%product_hunt #SaaS #alternatives #beta #browser_extensions #crawlers #free* #freemium #ideas #listings #searching #services #site_technology_analyzer #startups #tech_stacks #pub