Public Notes by chase_ats Tagged #mechanize


Node.js or Ruby for Scraping - Stack Overflow stackoverflow.com

"When you say that mechanize can't scrape dynamic content, you really mean that it's a little bit more work to figure out which Ajax requests need to be made and make them. The other side of that is that once you do you generally get a nice JSON response that's easy to deal with. Mechanize is also much faster than a full browser solution so my opinion is that it's usually worth the extra work. As far as Node goes, there's potential and maybe once it's been around for a while some great libraries will become available, but I haven't seen anything yet that would make up for the Ruby things I'd miss." #dynamic_screen_scraping #project #mechanize #ruby #ajax #javascript #screen_scraping #pub
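
The approach described in that answer (call the Ajax endpoint directly and parse the JSON it returns) might look roughly like the sketch below. The endpoint URL and the "items"/"title" keys are hypothetical, made up for illustration; the real ones have to be found by watching the browser's network inspector while the page loads. The only Mechanize behaviour assumed is that `Mechanize.new.get(url)` returns an object with a `body` string.

```ruby
require "json"

# Hypothetical endpoint, for illustration only.
AJAX_URL = "https://example.com/api/items?page=1"

# Fetch the endpoint and parse the JSON body. `agent` is anything that
# responds to `get(url)` and returns an object with a `body` string,
# which is how a Mechanize agent behaves.
def fetch_json(agent, url)
  JSON.parse(agent.get(url).body)
end

# Pull one field out of each record. The "items" and "title" keys are
# assumptions about the payload, not part of any real API.
def item_titles(payload)
  payload.fetch("items", []).map { |item| item["title"] }
end

# Build a real agent. Needs the mechanize gem, loaded only when called.
def build_agent
  require "mechanize"
  Mechanize.new
end
```

With the gem installed, `puts item_titles(fetch_json(build_agent, AJAX_URL))` would print one title per line. No HTML parsing is involved, which is the "nice JSON response" the quote refers to.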

Anemone - Ruby Web-Spider Framework anemone.rubyforge.org

"Anemone is a Ruby library that makes it quick and painless to write programs that spider a website. It provides a simple DSL for performing actions on every page of a site, skipping certain URLs, and calculating the shortest path to a given page on a site. The multi-threaded design makes Anemone fast. The API makes it simple. And the expressiveness of Ruby makes it powerful." #ruby #crawlers #mechanize #automation #pub
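
To make the "shortest path to a given page" idea concrete, here is a minimal sketch using a breadth-first search over a link graph. It is plain Ruby and not Anemone's API or implementation; the `shortest_path` name and the `SITE` link map are invented for illustration.

```ruby
# Breadth-first search over a link graph of the form { url => [linked urls] }.
# Returns the list of URLs from `start` to `target`, or nil if unreachable.
def shortest_path(links, start, target)
  previous = { start => nil } # url => the page we first reached it from
  queue = [start]
  until queue.empty?
    url = queue.shift
    if url == target
      # Walk the `previous` chain back to the start to rebuild the path.
      path = []
      while url
        path.unshift(url)
        url = previous[url]
      end
      return path
    end
    links.fetch(url, []).each do |nxt|
      next if previous.key?(nxt) # already reached by an equal or shorter route
      previous[nxt] = url
      queue << nxt
    end
  end
  nil
end

# Hypothetical site structure, as a crawler might record it.
SITE = {
  "/" => ["/about", "/blog"],
  "/blog" => ["/blog/post-1"],
  "/about" => ["/blog/post-1"],
  "/blog/post-1" => []
}
```

Here `shortest_path(SITE, "/", "/blog/post-1")` returns `["/", "/about", "/blog/post-1"]`: two clicks from the home page, with the tie between the two equal-length routes broken by link order.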
