Automatic Web Page Replication


Post by Ed Wehn » Thu, 08 Aug 1996 04:00:00



Does anyone know how to download a web page from another site and keep
it on a local server?  We have a lot of access for stock quotes from
our local LAN to internet sources.  We would like to copy certain key
stock prices at regular intervals and store them locally, then publish
them as part of our intranet web site.

First, is this legal?

Second, if so, how do we pass the arguments in an automated fashion to
retrieve only the stocks we are interested in?  We were thinking of using
cron to execute a script of some type every 15 minutes and download
the price.  What kind of script can we use?

Thanks in advance,

Ed


Automatic Web Page Replication

Post by Theo Van Dinter » Fri, 09 Aug 1996 04:00:00


: Does anyone know how to download a web page from another site and keep
: it on a local server?  We have a lot of access for stock quotes from
: our local LAN to internet sources.  We would like to copy certain key
: stock prices at regular intervals and store them locally, then publish
: them as part of our intranet web site.
:
: First, is this legal?

I would imagine... the stock quotes are publicly available, but I guess
it depends on where you get the quotes from.  As for how to do it: my
solution would be to write a small script, run from a cron job, that
requests the HTML documents, parses out the information you need, and then
writes your own page.  (To see what I mean, go to
http://www.kluge.net/weather.html ...  I wrote a small script that requests
3 different weather reports from another web server (whom I give credit to
on the page), rips out the table (in this case) of weather information, and
creates my own web page with all three...  You'd want the same thing, only
different... <g>)
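
Something along these lines would do it.  This is only a rough sketch,
assuming Perl 5 with the libwww-perl (LWP) module installed; the quote URL,
the table-matching regex and the output path are all invented placeholders,
so adapt them to a source you actually have permission to copy from:

    #!/usr/bin/perl -w
    # Rough sketch (untested): fetch a quote page, pull out the first HTML
    # table, and write it into a local intranet page.  The URL, the regex
    # and the output path are placeholders; adjust them for your own source.
    use strict;
    use LWP::Simple qw(get);

    my $url = 'http://quotes.example.com/lookup?sym=XYZ';   # hypothetical source
    my $out = '/usr/local/www/intranet/quotes.html';        # hypothetical target

    my $html = get($url);
    defined $html or die "could not fetch $url\n";

    # Grab the first <table>...</table> block; crude, but often enough.
    my ($table) = $html =~ m{(<table.*?</table>)}is
        or die "no table found in page\n";

    open(OUT, "> $out") or die "cannot write $out: $!\n";
    print OUT "<html><head><title>Stock quotes</title></head><body>\n";
    print OUT "<p>Copied from $url (with permission).</p>\n";
    print OUT "$table\n</body></html>\n";
    close(OUT);

Run that from cron and the result is a plain HTML file your intranet
server can publish as-is.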

: Second, if so, how do we pass the arguments in an automated fashion to
: retrieve only the stocks we are interested in?  We were thinking of using
: cron to execute a script of some type every 15 minutes and download
: the price.  What kind of script can we use?

I worked up a little script called geturl (available via
ftp.kluge.net:/NES/geturl ...) which can retrieve a URL from the
command line.  That would at least get you the page (you'd have to modify
it to pass form data around...); then just parse the text that comes
back and go from there.
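
For the cron half of the question, the crontab entry itself is the easy
part.  Something like the line below would run such a script every 15
minutes; the script path is hypothetical, and the explicit 0,15,30,45 list
avoids relying on the */15 step syntax, which not every cron supports:

    # Fetch quotes every 15 minutes, weekdays 09:00-17:00.
    # The script path and the hours are examples only.
    0,15,30,45 9-17 * * 1-5 /usr/local/bin/fetch_quotes.pl >> /var/log/fetch_quotes.log 2>&1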

--
-----------------------------------------------------------------------------
Theo Van Dinter                            Vice-President WPI Lens and Lights
Systems Engineering - Web Development        Active Member in SocComm and ACM
Cabletron Systems, Inc.

Rochester, NH 03867                      www: http://www.kluge.net/~felicity/
-----------------------------------------------------------------------------


Automatic Web Page Replication

Post by Alan J. Flavel » Fri, 09 Aug 1996 04:00:00



> Does anyone know how to download a web page from another site and keep
> it on a local server?
> First, is this legal?

In general, no.  It's a breach of copyright.  Ask permission first.

I'm no lawyer, but the presence of a page on the WWW only gives implied
license to view it, along with doing those things that are ancillary to
viewing it (such as caching, or writing temporary copies to disk).  Making
a permanent copy or creating a derivative work is surely excluded, except
for things that come under the term "fair use" (which are rather
restrictive).  This is the wrong group for discussing such issues:
there are FAQs that deal in much more detail with issues of copyright
in relation to the Internet.

> Second, if so, how do we pass the arguments in an automated fashion to
> retrieve only the stocks we are interested in?  We were thinking of using
> cron to execute a script of some type every 15 minutes and download
> the price.  What kind of script can we use?

How long's a piece of string?

There are many ways of implementing what you have in mind, depending
to some extent on what skills you have, how complex the page is that
you're retrieving, how much the format varies over time, and how
detailed the extraction you propose to perform on it is.  If you
can handle it, Perl is probably a great tool for this.  But again, the
c.i.w.s.u group is inappropriate - you'd want to take that to the
c.i.w.authoring.cgi group.

Once you have determined that you can go ahead legally, I'd recommend a
look at the CGI FAQ to see if you can find a fit between the available
library resources etc. and the skills that are available to you.

best regards


Automatic Web Page Replication

Post by Douglas Stewar » Sat, 10 Aug 1996 04:00:00



> Does anyone know how to download a web page from another site and keep
> it on a local server?  We have a lot of access for stock quotes from
> our local LAN to internet sources.  We would like to copy certain key
> stock prices at regular intervals and store them locally, then publish
> them as part of our intranet web site.

Set up a caching proxy server.  Apache has a proxy module in the 1.1.x
releases.  It's experimental but it does HTTP okay.
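
As a rough illustration, the httpd.conf side might look like the snippet
below, assuming the experimental mod_proxy is compiled in; the directives
follow early mod_proxy and the paths and sizes are examples only.  Note
that a generic cache honours the origin server's expiry headers rather
than a fixed 15-minute schedule, so it solves a slightly different problem
from the cron-driven script discussed above.

    # Hypothetical caching-proxy setup for the experimental Apache mod_proxy;
    # paths and sizes are examples only.
    ProxyRequests  On
    CacheRoot      /usr/local/etc/httpd/proxy
    # Cache size in KB, and the maximum time (in hours) to keep a document.
    CacheSize      5120
    CacheMaxExpire 1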

  Douglas


Automatic Web Page Replication

Post by Parviz Doust » Sat, 10 Aug 1996 04:00:00


Does anyone know if it is possible to add an entry to the environment
variable list that NS passes to CGIs?

It appears that CGI does not inherit the server's environment, so setting
the environment in the server (via NSAPI) does not do the trick.

Any help is appreciated,

Parviz


Automatic Web Page Replication

Post by Ken Overto » Sat, 10 Aug 1996 04:00:00



> Does anyone know how to download a web page from another site and keep
> it on a local server?

Questions of legality aside, have you ever heard of WebWhacker?  Point
it at a URL and it sucks down everything it can reach and writes the
files (whilst recoding links, filenames, etc.) into a directory on your
hard drive.  It's out for PC and Mac; I don't know about Unix ports.

- kov


Automatic Web Page Replication

Post by Steff Watki » Tue, 13 Aug 1996 04:00:00


: Does anyone know how to download a web page from another site and keep
: it on a local server?  We have a lot of access for stock quotes from
: our local LAN to internet sources.  We would like to copy certain key
: stock prices at regular intervals and store them locally, then publish
: them as part of our intranet web site.
:
: First, is this legal?

Hi Ed,

  try emailing the webmaster at the site and asking whether you can make
copies, or whether there are any problems with you making copies!

: Second, if so, how do we pass the arguments in an automated fashion to
: retrieve only the stocks we are interested in?  We were thinking of using
: cron to execute a script of some type every 15 minutes and download
: the price.  What kind of script can we use?

I upped my version of Perl to 5.003 and then added the LWP module
(libwww-perl).  This gave me the capability of doing the following from
the command line:

  perl -MLWP::Simple -e 'getprint "http://sw.cse.bris.ac.uk/"' > page.html

This sort of line is SOOOOO easy to script!!!!

On the question of argument passing: you'd have to tell us how the
arguments are read at the remote side.  URL parameters, command-line
input, or some form of cookie?
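
If it turns out to be URL parameters, you can build the query string
yourself and keep using LWP.  A rough sketch, in which the host name and
the "sym" parameter are invented (check the real quote page's form to see
what it actually expects):

    #!/usr/bin/perl -w
    # Sketch: fetch one page per stock symbol by building a query-string URL.
    # The host and the "sym" parameter name are hypothetical.
    use strict;
    use LWP::Simple qw(get);
    use URI::Escape qw(uri_escape);

    my @symbols = qw(ABC XYZ FOO);
    for my $sym (@symbols) {
        my $url  = 'http://quotes.example.com/lookup?sym=' . uri_escape($sym);
        my $page = get($url);
        unless (defined $page) {
            warn "failed to fetch $url\n";
            next;
        }
        # Hand the raw HTML to whatever parsing you settle on.
        print "fetched ", length($page), " bytes for $sym\n";
    }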

Steff


Automatic Web Page Replication

Post by Kevin Stev » Wed, 21 Aug 1996 04:00:00



>Does anyone know if it is possible to add an entry to the environment
>variable list that NS passes to CGIs?
>It appears that CGI does not inherit the server's environment, so setting
>the environment in the server (via NSAPI) does not do the trick.

In 1.12 you can use the init-cgi Init function in magnus.conf:

    Init fn=init-cgi MY_ENV_VAR="foo bar baz"

I suspect it's the same in the 2.0 versions, but I have not tried it.
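
An easy way to confirm the variable actually reaches your CGIs is a
throwaway test script.  This sketch assumes a Perl CGI and reuses the
MY_ENV_VAR name from the magnus.conf line above:

    #!/usr/bin/perl -w
    # Minimal test CGI: print MY_ENV_VAR (and the rest of %ENV) as plain
    # text to confirm the init-cgi setting is being passed through.
    use strict;

    print "Content-type: text/plain\r\n\r\n";
    print "MY_ENV_VAR = ",
          (defined $ENV{MY_ENV_VAR} ? $ENV{MY_ENV_VAR} : "(not set)"), "\n\n";
    print "$_=$ENV{$_}\n" for sort keys %ENV;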


Automatic Web Page Replication

Post by Dav Amann » Wed, 21 Aug 1996 04:00:00




> >Does anyone know if it is possible to add an entry to the environment
> >variable list that NS passes to CGIs?

> >It appears that CGI does not inherit the server's environment, so setting
> >the environment in the server (via NSAPI) does not do the trick.

> In 1.12 you can use the init-cgi Init function in magnus.conf:

>     Init fn=init-cgi MY_ENV_VAR="foo bar baz"

> I suspect it's the same in the 2.0 versions, but I have not tried it.

This also works in FastTrack and Enterprise 2.0.

-=dav

--
**********************************************************************
Dav Amann                     | "That which does not kill me had
Operations Program Manager    |  better run damn fast!"
Netscape Support              |                     -- Bumper Sticker

**********************************************************************

