Public Notes on histre
Node.js or Ruby for Scraping - Stack Overflow
stackoverflow.com
"When you say that mechanize can't scrape dynamic content, you really mean that it's a little bit more work to figure out which ajax requests need to be made and make them. The other side of that is that once you do you generally get a nice json response that's easy to deal with. Mechanize is also much faster than a full browser solution so my opinion is that it's usually worth the extra work.
As far as Node goes, there's potential and maybe once it's been around for a while some great libraries will become available, but I haven't seen anything yet that would make up for the Ruby things I would miss."
#dynamic_screen_scraping #project #mechanize #ruby #ajax #javascript #screen_scraping #pub
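The approach described in the quote (find the Ajax request the page makes, issue it yourself, and parse the JSON that comes back) might look roughly like the sketch below. It uses Ruby's standard library (Net::HTTP and JSON) rather than Mechanize so that it runs without extra gems. The endpoint URL, the `X-Requested-With` header choice, and the `{"items": [{"title": ...}]}` response shape are all assumptions made for illustration; a real site's request would be discovered in the browser's network tab.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical endpoint, standing in for whatever Ajax URL the real page calls.
AJAX_URL = "https://example.com/api/items?page=1"

# Build the GET request that the page's JavaScript would normally send.
# Returns the parsed URI together with the prepared request object.
def build_ajax_request(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/json"
  # Many servers use this header to recognise Ajax calls (an assumption here).
  request["X-Requested-With"] = "XMLHttpRequest"
  [uri, request]
end

# Send the request and return the raw response body as a string.
def fetch_json_body(url)
  uri, request = build_ajax_request(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request).body
  end
end

# Extract item titles from a JSON body shaped like {"items":[{"title":"..."}]}.
# The shape is assumed; a missing "items" key yields an empty list.
def extract_titles(json_body)
  data = JSON.parse(json_body)
  data.fetch("items", []).map { |item| item["title"] }
end
```

Against a real endpoint, `extract_titles(fetch_json_body(AJAX_URL))` would chain the two steps. Keeping the parsing separate from the network call makes it easy to check the parsing against a saved response.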
Anemone - Ruby Web-Spider Framework
anemone.rubyforge.org
"Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site.
The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful."
#ruby #crawlers #mechanize #automation #pub
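To make the "shortest path to a given page" idea from the quote concrete, here is a minimal breadth-first search over an in-memory link graph. This is not Anemone's implementation or API; it is a standalone sketch using plain Ruby, and the `LINKS` site map and the `shortest_path` helper are hypothetical names introduced for illustration.

```ruby
# Hypothetical site map: each page path maps to the paths it links to.
LINKS = {
  "/" => ["/about", "/blog"],
  "/about" => ["/team"],
  "/blog" => ["/blog/post-1", "/about"],
  "/blog/post-1" => ["/team"],
  "/team" => []
}.freeze

# Breadth-first search from start to target. Returns the pages along one
# shortest path (fewest link clicks), or nil when target is unreachable.
def shortest_path(links, start, target)
  previous = { start => nil } # page => the page we first reached it from
  queue = [start]

  until queue.empty?
    page = queue.shift
    break if page == target

    links.fetch(page, []).each do |linked|
      next if previous.key?(linked) # already visited
      previous[linked] = page
      queue << linked
    end
  end

  return nil unless previous.key?(target)

  # Walk backwards from the target to the start to rebuild the path.
  path = []
  node = target
  while node
    path.unshift(node)
    node = previous[node]
  end
  path
end
```

Calling `shortest_path(LINKS, "/", "/team")` on this sample map returns the route through `/about`. In a real crawler the link graph would be filled in as pages are fetched rather than written out by hand.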