[clug] Perl (or python) page scraping tools

David Tulloh david at tulloh.id.au
Tue Aug 8 09:37:41 GMT 2006


Michael James wrote:
> The grubby task of HTML page-scraping is rearing its ugly head.
> 
> The first task is to snarf a 2 column table,
>  the first column is the variable name
>  the second, its value.
> 
> Ideally I'd like to get back a hash:   table{name} = value
> 
> Sounds simple but the HTML::TableParse module
>  returns a complicated and too general structure.
> 
> Anyone got any recommendations of modules for scraping HTML?
> 

I have used TableThing.pm from http://www.gellyfish.com/htexamples/ in
the past.

It converts an HTML table to an array of rows, each containing an array
of fields.  As an added bonus, it's dead simple to understand how it
works and to tweak it if you desire.
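
Since the subject line allows Python, here is a rough sketch of the same
rows-of-fields idea using only the standard library's html.parser. It is
not TableThing.pm and does not mirror its interface. The names TableRows
and table_to_dict are made up for illustration, and it assumes a single
table whose cells have explicit closing tags. Rows with exactly two cells
are folded into the name -> value mapping asked for.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect each <tr> as a list of cell text strings (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed


def table_to_dict(html):
    """Map first column to second column for every two-cell row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


if __name__ == "__main__":
    sample = (
        "<table>"
        "<tr><td>host</td><td>example</td></tr>"
        "<tr><td>port</td><td> 8080 </td></tr>"
        "</table>"
    )
    print(table_to_dict(sample))
```

Rows that do not have exactly two cells (a header spanning both columns,
say) are silently skipped; nested tables would need more bookkeeping.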


David

