> Hi All,
> I just got my first linux mandrake box 3 days ago.
> Linux email is not yet working. So don't blast me, please.
> I want to make some kind of fairly simple script that will get all the linked pages of a given URL.
> I strongly suspect there is something already available to do this. I just don't know the right terminology, so it is difficult to
> find.
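For the "get all the linked pages" part of the question, Python's standard library is enough for a first attempt. This is only a rough sketch (the class and function names below are my own, not from any existing tool); it parses an HTML document and collects the targets of its links, and fetching the page itself is left as a comment:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    """Return the href values of all links in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

# To fetch the page first, something like this should work:
# from urllib.request import urlopen
# html = urlopen("http://example.com/").read().decode("utf-8", "replace")
# print(extract_links(html))
```

From there, downloading each collected URL in a loop gives a crude recursive fetcher, which is essentially what wwwoffle's recursive fetch option does for you.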
I use wwwoffle. Below is an extract from the wwwoffle welcome page:
regards, bernward
The WWWOFFLE programs simplify World Wide Web browsing from computers that
use intermittent (dial-up) connections to the internet.
Description
The wwwoffled program is a simple proxy server with special features for use
with dial-up internet links. This means that it is possible to browse web pages
and read them without having to remain connected.
While Online
- Caching of pages that are viewed for review later.
- Conditional fetching to only get pages that have changed.
While Offline
- The ability to follow links and mark other pages for download.
- Browser or command-line interface to select pages for downloading.
- Optional info at the bottom of pages showing the cached date and allowing a refresh.
- Works with pages containing forms.
- Works with pages that require basic username/password authentication.
- Can be configured to use dial-on-demand for pages that are not cached.
Automated Download
- Downloading of specified pages non-interactively.
- Can automatically fetch inlined images in pages fetched this way.
- Can automatically fetch the contents of all frames on pages fetched this way.
- Automatically follows links for pages that have been moved.
- Can monitor pages at regular intervals to fetch those that have changed.
- Makes backup copies of cached pages so server errors don't overwrite them.
Provides
- Caching of web pages (http), ftp sites and the finger command.
- An introductory page with information and links to the built-in pages.
- Multiple indexes of pages stored in the cache for easy selection.
- Interactive or command-line control of online/offline status.
- User-selectable purging of pages from the cache based on URL matching.
- Interactive or command-line option to fetch pages and links recursively.
- Interactive web page to allow editing of the configuration file.
- Built-in simple web server for local pages.
- Automatic proxy configuration for Netscape.
General
- Can be used with one or more external proxies based on hostname.
- Automates proxy authentication for external proxies that require it.
- Configurable to still allow use on intranets while offline.
- Can be configured to block or not cache URLs based on file type or host.
- Can censor outgoing HTTP headers to maintain user privacy.
- All options controlled using a simple configuration file.
- Optional password control for management functions.
- User-customisable error message and control pages.
Further WWWOFFLE Links
The WWWOFFLE FAQ is now provided with the program and there is also an online version at
http://www.gedanken.demon.co.uk/wwwoffle/version-2.3/FAQ.html
The WWWOFFLE homepage on the internet is available at http://www.gedanken.demon.co.uk/wwwoffle/index.html and
contains the latest information about the program in general.
More information about using this version of WWWOFFLE is on the WWWOFFLE Version 2.3 Users Page at
http://www.gedanken.demon.co.uk/wwwoffle/version-2.3/user.html