Web scraping with Ruby/Mechanize
Intro
The Mechanize library automates interaction with websites. Mechanize is also available for Perl and Python. It automatically stores and sends cookies, follows redirects, and can follow links and populate and submit forms. Mechanize also keeps a history of the sites you have visited.
Mechanize uses Nokogiri to parse HTML. What does this mean for you? You can treat a Mechanize page like a Nokogiri object. After you have used Mechanize to navigate to the page you need to scrape, scrape it using Nokogiri methods that search the DOM via XPath or CSS3 selectors ...
BBCode markup in Django
Why BBCode?
I'm a fan of reStructuredText, but it forced me to keep two versions of one text field: one without any markup for RSS and one with markup for rendering HTML. I found no easy way to strip reST markup, so I decided to switch markup languages for this project.
I chose BBCode because it's pretty easy for ordinary users to understand and easy to remove, since BBCode uses a syntax similar to HTML, just with square brackets ...
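To illustrate why square-bracket tags are easy to remove, here is a minimal sketch in plain Python. It is not the implementation used in the project; the `strip_bbcode` function name and the regular expression are assumptions made for this example, and the pattern only covers simple tags such as `[b]`, `[/b]`, and `[url=...]`.

```python
import re

# Matches opening tags like [b] or [url=http://example.com]
# and closing tags like [/b]. Hypothetical pattern for illustration only.
BBCODE_TAG = re.compile(r"\[/?[a-zA-Z*]+(?:=[^\]]*)?\]")


def strip_bbcode(text: str) -> str:
    """Return text with BBCode tags removed, keeping the content between them."""
    return BBCODE_TAG.sub("", text)


if __name__ == "__main__":
    print(strip_bbcode("[b]Hello[/b] [url=http://example.com]world[/url]"))
```

Because the pattern requires a letter-based tag name, bracketed text such as `array[0]` is left alone. A plain-text field for RSS could then be derived from the marked-up field instead of being stored separately.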