bash script fetch email address

Post by defko » Thu, 04 Jan 2001 10:47:21



Hi,

Within a directory I have dozens of text files. In these texts appear
lines which may contain email addresses.

Have you got a BASH script or do you know a web site that provides a
script to fetch these email addresses and gather them in, let's say,
another separate file??

Thank you!

Sent via Deja.com
http://www.deja.com/

 
 
 

Post by poeppin » Thu, 04 Jan 2001 11:34:19


go into the directory and type


or a fancier way of doing it.
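
The two commands in this post did not survive in the archive. As a stand-in, here is a minimal sketch of one way to do it with grep, assuming a grep that supports the -o and -h options (GNU and BSD grep do; neither option is in POSIX). The function name extract_emails and the regular expression are illustrative assumptions, not the poster's original commands, and the pattern is deliberately loose.

```shell
#!/bin/sh
# Hypothetical helper: print every address-like token found in the given files.
# -E = extended regex, -o = print only the matched text, -h = omit file names.
extract_emails() {
    grep -Eoh '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$@"
}

# Example use from inside the directory (emaillist is an arbitrary name):
#   extract_emails ./*.txt | sort -u > emaillist
```

The pattern will also match things that merely look like addresses, so the result is a candidate list to review, not a validated one.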




> Hi,

> Within a directory I have dozens of text files. In these texts appear
> lines which may contain email addresses.

> Have you got a BASH script or do you know a web site that provides a
> script to fetch these email addresses and gather them in, let's say,
> another separate file??

> Thank you!

> Sent via Deja.com http://www.deja.com/


 
 
 

Post by Faux_Pseu » Thu, 04 Jan 2001 16:33:24


Not very efficient, but it's faster than stripping all the excess data by hand:



| sed -e "s/[<>;,]//g" -e 's/mailto://g' -e 's/remove//g' -e 's/spam//g'
done > emaillist
# sort first: uniq only collapses adjacent duplicates, so sort -u does both steps
sort -u emaillist > emaillist~
mv -f emaillist~ emaillist
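
The start of the loop above was lost in the archive, so the fragment cannot run as shown. The following is a minimal sketch of a complete pipeline in the same spirit: split the input into words, keep the words containing an @, and clean them with the sed expressions from the post. The function name collect_emails, the word-splitting step, and the grep filter are assumptions, not the poster's original code.

```shell
#!/bin/sh
# Hypothetical reconstruction: $1 is the output file, the rest are input files.
collect_emails() {
    out=$1
    shift
    # 1. put each whitespace-separated word on its own line
    # 2. keep only the words that contain an @
    # 3. strip the decoration the original sed command removed
    # 4. sort and drop duplicates in one step
    cat "$@" |
        tr -s '[:space:]' '\n' |
        grep '@' |
        sed -e 's/[<>;,]//g' -e 's/mailto://g' -e 's/remove//g' -e 's/spam//g' |
        sort -u > "$out"
}

# Example use: collect_emails emaillist ./*
```

Note that deleting the strings "remove" and "spam" everywhere also alters genuine addresses that happen to contain them, so those two expressions are worth dropping if that matters.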

--(Once upon a time, in comp.unix.shell,)--
                --(defkon said it like only they can.)--

>Hi,

>Within a directory I have dozens of text files. In these texts appear
>lines which may contain email addresses.

>Have you got a BASH script or do you know a web site that provides a
>script to fetch these email addresses and gather them in, let's say,
>another separate file??

>Thank you!

>Sent via Deja.com
>http://www.deja.com/

--
--(UIN=66618055)--

GUI's are for slackers.  ibpconf.sh 5 on freshmeat.net  
The easiest way to customize the command line.  By Faux_Pseudo
 
 
 

Post by Mike Dowli » Thu, 04 Jan 2001 22:04:02


>Within a directory I have dozens of text files. In these texts appear
>lines which may contain email addresses.

>Have you got a BASH script or do you know a web site that provides a
>script to fetch these email addresses and gather them in, let's say,
>another separate file??

You have a couple of suggestions already, which will probably get you
started. However, depending on how those addresses are encoded, they may
or may not be adequate.

I do hope you are not a spammer trying to work out how to extract email
addresses from Usenet postings!

For HTML, you'll have to cope with text lines like


(taken from one of today's spams).

If the text contains email, then you won't want the message-ids that
look like


(Again, taken from a recent spam).
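
One way to keep message-ids out of the result is to drop the header lines that carry them before extracting anything. This is a minimal sketch under that assumption; the function name drop_message_ids and the list of header names are illustrative choices, not something given in this thread.

```shell
#!/bin/sh
# Hypothetical filter: remove header lines whose values look like addresses
# but are message identifiers; every other line passes through unchanged.
# -v = invert the match, -i = ignore case, -E = extended regex.
drop_message_ids() {
    grep -viE '^(Message-ID|References|In-Reply-To|Content-ID):'
}

# Example use: drop_message_ids < mbox | grep '@'
```

Headers folded across several lines are only caught on their first line, so a long References header can still leak identifiers through this filter.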

And the complicated bit is that comments in the form of possibly nested
"()" are legal, as is white space, so the following are the same email
address:


and


Also remember that case should be irrelevant.
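
The last two points can be sketched as a normalisation pass run over the extracted addresses, one per line: delete "()" comments from the inside out, close up white space around the @, and fold everything to lower case so that duplicates compare equal. The function name normalize_addresses is hypothetical, and the sketch only handles white space next to the @, not elsewhere in the address.

```shell
#!/bin/sh
# Hypothetical normaliser for a list of addresses, one per line.
normalize_addresses() {
    # The sed loop removes the innermost "(...)" groups and repeats until
    # none are left, which copes with nested comments.
    sed -e ':again' -e 's/([^()]*)//g' -e 't again' \
        -e 's/[[:space:]]*@[[:space:]]*/@/g' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# Example use: normalize_addresses < emaillist > emaillist.clean
```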

Cheers,
Mike
--

address.  It is a mail alias.  Once spammed, the alias is deleted, and
the integer 'N' incremented.  Currently, mike[41,42] are valid.  If
email to mikeN bounces, try mikeN+1.

 
 
 

Post by Angel Blu » Thu, 04 Jan 2001 23:50:03



> Hi,
> Within a directory I have dozens of text files. In these texts appear
> lines which may contain email addresses.
> Have you got a BASH script or do you know a web site that provides a
> script to fetch these email addresses and gather them in, let's say,
> another separate file??

Below is a script I use (the comments explain it):

================================================================
#!/bin/sh
#
# Takes mail addresses from a file
#
# Use: this_script file_to_find

awk '{ for (i = 1; i <= NF; i++) {

    print $i }  }' $1\
      |\

#
# This script takes addresses from ONE file. To process all the
# files in a directory you can do:
#
# for i in ./*
# do
# this_script "$i"
# done
#
# and then filter the output again with sort and uniq and
# redirect it to a file.
#

# using awk and filters them with sed to eliminate some
# garbage. It tries to eliminate garbage also with awk regexp:

# matched). And sed removes all chars not letters, numbers or
# one of -+_.
#
# It uses GNU awk (gawk) and "character classes", e.g. [:alnum:],
# in the regexp (you can substitute a class with the characters
# you want). It also extracts all Message-IDs from an mbox,
# because we must match numbers on the left side of an
# address (many people have addresses containing numbers).
#
# Better regexps are surely possible, and are welcome!
#
================================================================
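
The awk condition and the sed command of the script above were lost in the archive, so it does not run as shown. Below is a minimal sketch of what the comments describe: awk prints each field that looks like an address, and sed then removes every character that is not a letter, a digit, or one of @-+_. The function name fields_with_addresses and both regular expressions are assumptions; explicit character ranges are used in awk instead of [:alnum:] so that it does not depend on gawk.

```shell
#!/bin/sh
# Hypothetical reconstruction; reads the named files, or stdin if none given.
fields_with_addresses() {
    # awk: walk the whitespace-separated fields and print those that
    # contain something shaped like local-part@domain.tld
    # sed: delete every character outside letters, digits and @ _ . + -
    awk '{ for (i = 1; i <= NF; i++)
               if ($i ~ /[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9.-]+/)
                   print $i }' "$@" |
        sed 's/[^[:alnum:]@_.+-]//g'
}

# Example use: fields_with_addresses ./* | sort -u > emaillist
```

As the script's comments warn, this also picks up Message-IDs, because their left-hand side is allowed to contain digits just like a real address.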

--

  Angel Blue  |-------------------------------------------------------------
      _*_     |  "The revolution is dead"
       |      |      (Urge Overkill)