wget: wget *keyword*.html?

Post by oak » Sat, 19 Dec 1998 04:00:00



Anyone know how I can invoke wget
to get all URLs in, say, a home directory and all
its subdirectories only, with a particular keyword in
the file name?

Something like:

wget http://somewhere.com/~joe/*football*.html

where I'm looking for web pages with the word "football"
somewhere in them (i.e., somewhere in the web page FILE NAME).
In this instance I'd like to retrieve pages such as
scores-football.html, home_football.html, footballquotes.html, etc.

Thanks.

-Tony                                                

--------------------------------------------------------
 Abbreviate - af 2 millenia, a btr wy t rd n wri.
         http://www.eskimo.com/~oak/abr/

 
 
 

wget: wget *keyword*.html?

Post by Mengmeng Zhang » Sat, 19 Dec 1998 04:00:00


: Anyone know how I can invoke wget
: to get all URLs in, say, a home directory and all
: its subdirectories only, with a particular keyword in
: the file name?

: Something like:

: wget http://somewhere.com/~joe/*football*.html

: where I'm looking for web pages with the word "football"
: somewhere in them (i.e., somewhere in the web page FILE NAME).
: In this instance I'd like to retrieve pages such as
: scores-football.html, home_football.html, footballquotes.html, etc.

: Thanks.

: -Tony                                                

I'm not a wget expert, but as far as I know this is quite impossible. The
reason is that you can't use wildcards at all with HTTP URLs. If, say,
http://somewhere.com/~joe/index.html exists, wget can fetch that page, but the
actual contents of the ~joe directory stay completely hidden from the client,
so there is nothing for a wildcard to expand against. I suppose what you really
want is for wget to download only the HTML files matching a certain pattern
while using the -r option, which should be possible in principle. I don't think
wget currently supports something like that, though (only extensions and
domains are matched against). Why not join the wget mailing list or contact the
author? You might get some more information there. The addresses are in the docs.
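
For what it's worth, if your version of wget lets the -A/--accept option take
wildcard patterns rather than just extensions (I'm not sure the one you have
does), a recursive crawl with an accept pattern might get close. A rough
sketch, reusing the URL from your post:

  # recurse (-r), never ascend above ~joe/ (-np),
  # and accept only files whose names match the pattern (-A)
  wget -r -np -A '*football*.html' http://somewhere.com/~joe/

The pattern is quoted so the shell doesn't try to expand it locally. Keep in
mind wget would still have to fetch non-matching HTML pages in order to follow
the links inside them, so depending on the version you may need to clean up
the extra files afterward.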

Hope this helps,
MZhang

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GM/CS d- s+: a--->? C++(+++) UL+(++) P+ L++(+++) E- W+(+++) N++ o+(++) K?

!r y?
------END GEEK CODE BLOCK------
Get your own Geek Code at http://www.geekcode.com
Visit the Z at http://www.math.swt.edu/~mz33062/

 
 
 
