get website content

Post by unplu » Wed, 09 Aug 2000 04:00:00



Hi all,

  I want to write a shell script that extracts one part of a web page.
For example:

The website www.abc.com has the following content.
<html>
<body>
You can buy whatever you want.<br>
The price of the car is 100. The price of the ship is 30.
</body>
</html>

I only want to get "100" from the above website.

I wrote the following command, which gets the whole sentence:
lynx -dump http://www.abc.com/index.html | grep car

Is it possible to write a script that gets only the "100"?

Rgds,
unplug

 
 
 

Post by Andre van Straate » Wed, 09 Aug 2000 04:00:00



> Hi all,
>   I want to write a shell script to get a part of the website content.
> For example.
> The website www.abc.com has the following content.
> <html>
> <body>
> You can buy whatever you want.<br>
> The price of the car is 100. The price of the ship is 30.
> </body>
> </html>
> I only want to get "100" from the above website.
> I write the following script to get the whole sentence.
> lynx -dump http://www.abc.com/index.html | grep car
> Is is possible to write a script to get the "100" only??

I assume you want to read a variable value (the car price) in a line where
the word "car" appears.

If the line in the HTML stays constant and only the car price changes,
you could use:

lynx -dump www.abc.com/index.html | awk '{if (match($0, /car/)) print $7}'

where the magic number 7 refers to the 7th field (100, i.e. the car price)
in a line that matches "car".

But that's quite dodgy, and it falls over if the word "car" appears in
other lines of the HTML document, or if anything else changes.
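
A minimal sketch of that field-based approach, under some assumptions: the
sample text below stands in for the real "lynx -dump" output, and the
function name car_price is made up for illustration. With default field
splitting the 7th field of the sample line is "100." including the period
that ends the sentence, so the sketch strips a trailing period before
printing.

```shell
#!/bin/sh
# Hypothetical helper: print the 7th field of every line containing "car",
# with a trailing period removed ("100." becomes "100").
car_price() {
    awk '{ if ($0 ~ /car/) { price = $7; sub(/\.$/, "", price); print price } }'
}

# Sample text standing in for the output of:
#   lynx -dump http://www.abc.com/index.html
printf '%s\n' \
    '   You can buy whatever you want.' \
    '   The price of the car is 100. The price of the ship is 30.' |
    car_price
# prints: 100
```

On the real page the pipeline would presumably be
"lynx -dump http://www.abc.com/index.html | car_price", with the same
caveat that the position 7 is tied to the exact wording of the line.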

You can also read the lines and split them up with a "while" loop and
"read".

I have done that to analyze a line of an httpd log file in a bash script on
my Web page (see footer, go to "scripting", "log files").

That's cumbersome, but you check each field separately, and it's very
flexible.
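
A rough sketch of what such a "while"/"read" loop could look like for the
car price. This is not the log-file script mentioned above; the function
name car_price_read and the word positions are assumptions based on the
sample sentence.

```shell
#!/bin/sh
# Hypothetical helper: read lines from stdin, let "read" split each line
# into words, and print the word that follows "car is".
car_price_read() {
    while read -r w1 w2 w3 w4 w5 w6 w7 rest; do
        # Check each word separately, as described above.
        if [ "$w5" = "car" ] && [ "$w6" = "is" ]; then
            printf '%s\n' "${w7%.}"   # drop a trailing period, if any
        fi
    done
}

# Sample text standing in for the "lynx -dump" output.
printf '%s\n' \
    '   You can buy whatever you want.' \
    '   The price of the car is 100. The price of the ship is 30.' |
    car_price_read
```

Because every word lands in its own variable, extra checks (for example
that the fourth word is "the") can be added one test at a time.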

-- avs

Andre van Straaten
http://www.vanstraatensoft.com
______________________________________________


 
 
 

Post by unplu » Thu, 10 Aug 2000 04:00:00




> I assume you want to read a variable value (the car price) in a line where
> the word "car" appears.

> If the line in the HTML keeps constant, and only the car price changes,
> you could use:

> lynx -dump www.abc.com/index.html | awk '{if (match($0, /car/)) print $7}'

  I got "syntax error" and "illegal statement" errors when I executed the
above command.

> where the magic number 7 refers to the 7th record (100, i.e car price) in
> a line that matches car.

> But thats quite dodgy, and falls over if the word car appears in other
> lines in the HTML document, or if something else changes.

> You can read the lines and split them up with a "while" loop, and using
> "read".

> I have done that analyzing a line of a httpd log file in a bash script on
> my Web page (see footer, go "scripting", "log files").

> That's cumbersome, but you check each record separately, and it's very
> flexible.

  Your script seems very complicated.  But I will try to study it.
Thanks.

> -- avs

> Andre van Straaten
> http://www.vanstraatensoft.com
> ______________________________________________


 
 
 

Post by Andre van Straate » Thu, 10 Aug 2000 04:00:00





>> I assume you want to read a variable value (the car price) in a line where
>> the word "car" appears.

>> If the line in the HTML keeps constant, and only the car price changes,
>> you could use:

>> lynx -dump www.abc.com/index.html | awk '{if (match($0, /car/)) print $7}'
>   I got syntax error and illegal statement when I execute the above
> statement.

It really doesn't work with nawk on SPARC Solaris 5.7, but it does with
gawk 3.0 on Linux and with MKS awk on Windows 95.
As I don't have my awk book handy (I'm at work now) and cannot figure out
why, try this:

awk '{if($0 ~ /car/) print $7}'

which looks cleaner and works on all three platforms (what a coincidence).

>> where the magic number 7 refers to the 7th record (100, i.e car price) in
>> a line that matches car.

>> But thats quite dodgy, and falls over if the word car appears in other
>> lines in the HTML document, or if something else changes.

>> You can read the lines and split them up with a "while" loop, and using
>> "read".

>> I have done that analyzing a line of a httpd log file in a bash script on
>> my Web page (see footer, go "scripting", "log files").

>> That's cumbersome, but you check each record separately, and it's very
>> flexible.
>   Your script seems very complicated.  But I will try to study it.
> Thanks.

Well, only the "read" block might be of interest.

-- avs

Andre van Straaten
http://www.vanstraatensoft.com
______________________________________________

 
 
 

Post by Ken Pizzi » Fri, 11 Aug 2000 04:00:00



> try this:

>awk '{if($0 ~ /car/) print $7}'

>which looks cleaner and works on all three platforms (what a coincidence).

While you're at it, you can make that more idiomatic by using the
condition as a pattern that controls the action block:
  awk '/car/ {print $7}'
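
A possible variation on the same pattern-action form, offered as a sketch
only: instead of relying on the fixed position 7, it looks for the word
"car" followed by "is" and prints the next field, so the price is still
found if words are added earlier in the line. The function name
car_price_by_word and the second sample line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: on lines matching /car/, find the field "car"
# followed by "is" and print the field after that, minus a trailing period.
car_price_by_word() {
    awk '/car/ {
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "car" && $(i + 1) == "is") {
                price = $(i + 2)
                sub(/\.$/, "", price)
                print price
            }
    }'
}

# Sample input: the second line has an extra leading word, so the price
# is no longer field 7 there.
printf '%s\n' \
    '   The price of the car is 100. The price of the ship is 30.' \
    '   Today the price of the car is 250.' |
    car_price_by_word
```

A line that merely contains "car" inside another word, such as "carpet",
matches the pattern but prints nothing, because the field comparison is
exact.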

                --Ken Pizzini

 
 
 
