help dl all of olympics.com

Jeremy jepri at webone.com.au
Thu Mar 7 14:44:45 EST 2002


Wget has full support for your needs.  From the man page (always a good read
when you need help):

-D domain-list
--domains=domain-list
    Set domains to be accepted and DNS looked-up, where
    domain-list is a comma-separated list.  Note that it
    does not turn on -H.  This option speeds things up, even
    if only one host is spanned.

--exclude-domains domain-list
    Exclude the domains given in a comma-separated
    domain-list from DNS-lookup.



>Am trying a websuck on olympics.com;
>it's a huge mess of frames, javascript, asp, etc.
>
>$ wget -r -H -nc -R jpg http://olympics.com
>
>is a start: -r to be recursive,
>and we need to span hosts (-H) because akamai.net has some of the
>content (?), but there are some hosts we don't want, like apple.com.
>Not sure how to reject/accept hosts with wget,
>but I did work out that -R jpg rejects all the pretty pictures.
>
>Am trying to extract results and athlete profiles
>for a data-mining project (yes, I'm nuts).
>Is anyone interested?
>I can keep on hacking at this on my own (with e.g. python),
>but if anyone is curious/has ideas, that would be a help.
>It's a big project (again). Will I end up manually saving
>pages with Mozilla...? Stay tuned...
>
>-simon




