Questions about parsing email headers

Post by koy » Tue, 03 Dec 1996 04:00:00



I am writing a C program to execute on Unix systems that will receive
incoming mail on its stdin and I need to parse out the "to", "from",
"reply-to", "subject", and message body.  I have some questions.

1) What are the standard ways of doing this?  Is there a standard format
specification (at least among Unix systems)?  How standard are mail
headers found in Unix mailbox files?  I would like my program to work
under as many variants of Unix as possible.

2) What is the standard way of distinguishing the from address?  I have
seen forms such as these in my mail



    Joe Smith <joe>            Mail from another user at my host
    joe (Joe Smith)

I would like to separate out just the from address without the name.

3) How will multiple "to"'s or "cc"'s appear in a header?

4) What is the best way to get the correct from address from a mail
message to automatically send a reply (e.g. for an autoresponder
application).  Would you look for "reply-to" and if not found, look for
"from"?

Thanks in advance for any assistance.

Terry Koyn

 
 
 


Post by Joerg Winkelma » Tue, 03 Dec 1996 04:00:00


: I am writing a C program to execute on Unix systems that will receive
: incoming mail on its stdin and I need to parse out the "to", "from",
: "reply-to", "subject", and message body.  I have some questions.
:
: 1) What are the standard ways of doing this?  Is there a standard format
: specification (at least among Unix systems)?  How standard are mail
: headers found in Unix mailbox files?  I would like my program to work
: under as many variants of Unix as possible.
Such standards are usually specified in so-called RFCs.
One place to find them is
ftp://ds.internic.net/rfc

Joerg

 
 
 


Post by Andrew Gabri » Tue, 03 Dec 1996 04:00:00




>I am writing a C program to execute on Unix systems that will receive
>incoming mail on its stdin and I need to parse out the "to", "from",
>"reply-to", "subject", and message body.  I have some questions.

>1) What are the standard ways of doing this?  Is there a standard format
>specification (at least among Unix systems)?  How standard are mail
>headers found in Unix mailbox files?  I would like my program to work
>under as many variants of Unix as possible.

The format of the headers is not Unix-specific.
It is described in RFC 822 (with some updates in later RFCs).
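
As a rough illustration of the RFC 822 layout: the header block ends at
the first empty line, and a header line that begins with a space or tab
continues the previous one. The sketch below assumes the whole message
has already been read from stdin into a NUL-terminated buffer; the names
unfold_headers and get_header are made up for this example and do not
come from any library.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Case-insensitive comparison of the first n characters of a and b. */
static int ci_equal_n(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Copy the header block of msg into out, joining folded continuation
 * lines (those starting with space or tab) onto the previous line.
 * Returns a pointer to the start of the body, or NULL if out is too small. */
const char *unfold_headers(const char *msg, char *out, size_t cap)
{
    size_t n = 0;
    const char *p = msg;

    if (cap == 0)
        return NULL;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        const char *next = eol ? eol + 1 : p + len;

        if (len > 0 && p[len - 1] == '\r')   /* tolerate CRLF line ends */
            len--;
        if (len == 0) {                      /* empty line: headers end */
            p = next;
            break;
        }
        if ((*p == ' ' || *p == '\t') && n > 0)
            n--;                             /* drop the previous '\n' */
        if (n + len + 2 > cap)               /* text + '\n' + NUL */
            return NULL;
        memcpy(out + n, p, len);
        n += len;
        out[n++] = '\n';
        p = next;
    }
    out[n] = '\0';
    return p;
}

/* Look up header `name` (case-insensitive) in an unfolded header block
 * and copy its value, minus leading blanks, into out.
 * Returns 1 if found, 0 if absent or out is too small. */
int get_header(const char *headers, const char *name, char *out, size_t cap)
{
    size_t nlen = strlen(name);
    const char *p = headers;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len > nlen && p[nlen] == ':' && ci_equal_n(p, name, nlen)) {
            const char *v = p + nlen + 1;
            size_t vlen = len - nlen - 1;
            while (vlen > 0 && (*v == ' ' || *v == '\t')) {
                v++;
                vlen--;
            }
            if (vlen + 1 > cap)
                return 0;
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = eol ? eol + 1 : p + len;
    }
    return 0;
}
```

A caller would run unfold_headers once, then call get_header for "To",
"From", "Reply-To" and "Subject"; the returned pointer is the message
body. A Unix mailbox "From " separator line, if present, is not treated
specially here.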

Mailbox formats depend on the software in use (user agents and MTAs);
there are a few different ones typically found on Unix.
I am not aware of any documentation covering them - you have to
read the source of the programs that use them.

>2) What is the standard way of distinguishing the from address?  I have
>seen forms such as these in my mail



>    Joe Smith <joe>            Mail from another user at my host
>    joe (Joe Smith)

>I would like to separate out just the from address without the name.

Basically, text in round brackets is a comment (ignore it).
Then, if angle brackets are present, use what's inside them;
otherwise use the whole field.
However, there is quite a lot more to handle with quoting
and escaping - see RFC 822.
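
A minimal sketch of that rule in C, assuming the field value has already
been unfolded onto one line. It skips (possibly nested) comments, keeps
quoted strings intact, and prefers the contents of angle brackets. It
does not attempt group or route syntax, so it is not a full RFC 822
parser. The name extract_address is invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Append c to out, leaving room for the terminating NUL. */
static int put(char *out, size_t cap, size_t *n, char c)
{
    if (*n + 1 >= cap)
        return 0;
    out[(*n)++] = c;
    return 1;
}

/* Extract the bare address from a From-style field value.
 * Returns 1 on success, 0 if no address was found or out is too small. */
int extract_address(const char *field, char *out, size_t cap)
{
    size_t n = 0;
    int depth = 0;    /* nesting level of ( ) comments */
    int quoted = 0;   /* inside a "quoted string" */
    int angle = 0;    /* inside < > */
    const char *p;

    if (cap == 0)
        return 0;
    for (p = field; *p != '\0'; p++) {
        char c = *p;

        if (depth > 0) {                 /* inside a comment: discard */
            if (c == '\\' && p[1] != '\0')
                p++;                     /* skip the escaped character */
            else if (c == '(')
                depth++;
            else if (c == ')')
                depth--;
            continue;
        }
        if (quoted) {                    /* inside a quoted string: keep */
            if (c == '\\' && p[1] != '\0') {
                if (!put(out, cap, &n, c))
                    return 0;
                c = *++p;                /* keep the escaped character too */
            } else if (c == '"') {
                quoted = 0;
            }
            if (!put(out, cap, &n, c))
                return 0;
            continue;
        }
        if (c == '(') { depth = 1; continue; }
        if (c == '"') {
            quoted = 1;
            if (!put(out, cap, &n, c))
                return 0;
            continue;
        }
        if (c == '<' && !angle) {        /* address starts: drop the name */
            angle = 1;
            n = 0;
            continue;
        }
        if (c == '>' && angle)
            break;
        if (c == ' ' || c == '\t')       /* drop blanks outside quotes */
            continue;
        if (!put(out, cap, &n, c))
            return 0;
    }
    out[n] = '\0';
    return n > 0;
}
```

With this, both "Joe Smith <joe>" and "joe (Joe Smith)" reduce to "joe".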

>3) How will multiple "to"'s or "cc"'s appear in a header?

Basically, they are comma-separated, but again, see RFC 822.
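
Splitting on commas needs the same care as above, since a comma inside a
quoted string, a comment, or angle brackets is not a separator. Here is
one way it might look, as a sketch only: next_list_item is a made-up
name, the field is assumed to be unfolded, and RFC 822 group syntax
("name: a, b;") is not handled.

```c
#include <stddef.h>
#include <string.h>

/* Copy the next comma-separated element of an address list into out and
 * advance *cursor past it. Commas inside quoted strings, comments and
 * angle brackets do not split. Empty elements are skipped.
 * Returns 1 if an element was copied, 0 at the end of the list,
 * -1 if out is too small. */
int next_list_item(const char **cursor, char *out, size_t cap)
{
    const char *p = *cursor;
    size_t n = 0;
    int depth = 0, quoted = 0, angle = 0;

    if (cap == 0)
        return -1;

    /* Skip separators and blanks left over from the previous element. */
    while (*p != '\0' && strchr(", \t", *p) != NULL)
        p++;
    if (*p == '\0') {
        *cursor = p;
        return 0;
    }

    for (; *p != '\0'; p++) {
        char c = *p;

        if (c == '\\' && (quoted || depth > 0) && p[1] != '\0') {
            /* Backslash escape: copy both characters unexamined. */
            if (n + 2 >= cap)
                return -1;
            out[n++] = c;
            out[n++] = *++p;
            continue;
        }
        if (quoted) {
            if (c == '"')
                quoted = 0;
        } else if (depth > 0) {
            if (c == '(')
                depth++;
            else if (c == ')')
                depth--;
        } else if (c == '"') {
            quoted = 1;
        } else if (c == '(') {
            depth = 1;
        } else if (c == '<') {
            angle = 1;
        } else if (c == '>') {
            angle = 0;
        } else if (c == ',' && !angle) {
            break;                       /* a real separator */
        }
        if (n + 1 >= cap)
            return -1;
        out[n++] = c;
    }

    /* Trim trailing blanks from the element. */
    while (n > 0 && (out[n - 1] == ' ' || out[n - 1] == '\t'))
        n--;
    out[n] = '\0';
    *cursor = p;
    return 1;
}
```

The caller starts with a pointer to the "To" or "Cc" value and calls
next_list_item in a loop until it returns 0; each element can then be
reduced to a bare address separately.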

>4) What is the best way to get the correct from address from a mail
>message to automatically send a reply (e.g. for an autoresponder
>application).  Would you look for "reply-to" and if not found, look for
>"from"?

Yes.
Also, you should work out how to stop two auto-responders having an
argument with each other :-).
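
A sketch of both points, under stated assumptions: pick_reply_field and
should_autoreply are invented names, and the lists of Precedence values
and sender names are only a guess at the kind of thing vacation-style
programs check, not a complete or authoritative set. The Precedence
value is assumed to be already trimmed of surrounding blanks.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Case-insensitive test: do the len characters at s spell `word`? */
static int ci_word_is(const char *s, size_t len, const char *word)
{
    size_t i;

    if (strlen(word) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)word[i]))
            return 0;
    return 1;
}

/* Prefer a non-empty Reply-To value, else fall back to From.
 * Returns NULL if neither is usable. */
const char *pick_reply_field(const char *reply_to, const char *from)
{
    if (reply_to != NULL && reply_to[0] != '\0')
        return reply_to;
    if (from != NULL && from[0] != '\0')
        return from;
    return NULL;
}

/* Return 0 if the message looks automated and should not be answered.
 * precedence: value of the Precedence header, or NULL if absent.
 * sender:     bare sender address, e.g. "joe@example.org", or NULL. */
int should_autoreply(const char *precedence, const char *sender)
{
    /* Both lists are assumptions made for this example. */
    static const char *const bulk[] = { "bulk", "junk", "list" };
    static const char *const robots[] = { "mailer-daemon", "postmaster" };
    size_t i;

    if (precedence != NULL)
        for (i = 0; i < sizeof bulk / sizeof bulk[0]; i++)
            if (ci_word_is(precedence, strlen(precedence), bulk[i]))
                return 0;

    if (sender != NULL) {
        /* Compare only the part before '@', if there is one. */
        const char *at = strchr(sender, '@');
        size_t len = at ? (size_t)(at - sender) : strlen(sender);

        for (i = 0; i < sizeof robots / sizeof robots[0]; i++)
            if (ci_word_is(sender, len, robots[i]))
                return 0;
    }
    return 1;
}
```

Checks like these only reduce the risk of a loop; they cannot rule one
out if the other responder does not mark its own messages.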

You could look for a public implementation of the vacation program,
which does much of what you seem to require.

--


 
 
 


Post by Brian S Hil » Wed, 04 Dec 1996 04:00:00


: I am writing a C program to execute on Unix systems that will receive
: incoming mail on its stdin and I need to parse out the "to", "from",
: "reply-to", "subject", and message body.  I have some questions.
: 1) What are the standard ways of doing this?  Is there a standard format
: specification (at least among Unix systems)?  How standard are mail
: headers found in Unix mailbox files?  I would like my program to work
: under as many variants of Unix as possible.

: 2) What is the standard way of distinguishing the from address?  I have
: seen forms such as these in my mail



:    
:     Joe Smith <joe>            Mail from another user at my host
:     joe (Joe Smith)

: I would like to separate out just the from address without the name.

: 3) How will multiple "to"'s or "cc"'s appear in a header?

: 4) What is the best way to get the correct from address from a mail
: message to automatically send a reply (e.g. for an autoresponder
: application).  Would you look for "reply-to" and if not found, look for
: "from"?

Read RFC822 from the internic.net resources. Also, I hope that you are
using one of the many regular expression libraries available; failure to
do so could turn a day-long task into a month-long task!

man re_comp regcmp regexp regexpr

-Brian
--
   ,---.     ,---.     ,---.     ,---.     ,---.     ,---.     ,---.  
  /  _  \   /  _  \   /  _  \   /  _  \   /  _  \   /  _  \   /  _  \  

__,'   `.___,'   `.___,'   `.___,'   `.___,'   `.___,'   `.___,'   `.__

 
 
 


Post by Heiko Hero » Fri, 06 Dec 1996 04:00:00




)>I am writing a C program to execute on Unix systems that will receive
)>incoming mail on its stdin and I need to parse out the "to", "from",
)>"reply-to", "subject", and message body.  I have some questions.
)>
...

)Yes.
)Also, you should work out how to stop two auto-responders having an
)argument with each other :-).

One way: add a custom header such as X-mycode: carrying some identifying
information, and check for it. But I have no idea whether it is "normal"
for such headers to be propagated in a reply. Does anybody know? Is
there some other way?
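
The checking half of that idea might look like the sketch below. It only
shows the test on the receiving side; whether a remote responder copies
the header into its reply is exactly the open question above. The
function name has_marker and the token "autoresponder-1" are made up for
illustration. procmail's own example recipes use a similar marker header
named X-Loop.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Case-insensitive comparison of the first n characters of a and b. */
static int ci_equal_n(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Return 1 if the header block contains a header called `name`
 * (case-insensitive) whose value contains `token` (case-sensitive).
 * An empty token tests only for the presence of the header.
 * Scanning stops at the first empty line, so body text is never matched.
 * Folded continuation lines are assumed to have been joined already. */
int has_marker(const char *headers, const char *name, const char *token)
{
    size_t nlen = strlen(name), tlen = strlen(token);
    const char *p = headers;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len == 0)                    /* end of the header block */
            break;
        if (len > nlen && p[nlen] == ':' && ci_equal_n(p, name, nlen)) {
            const char *v = p + nlen + 1;
            size_t vlen = len - nlen - 1, i;

            for (i = 0; i + tlen <= vlen; i++)
                if (memcmp(v + i, token, tlen) == 0)
                    return 1;
        }
        p = eol ? eol + 1 : p + len;
    }
    return 0;
}
```

The autoresponder would add "X-mycode: autoresponder-1" to every reply
it sends and decline to answer any incoming message for which
has_marker(headers, "X-mycode", "autoresponder-1") returns 1.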

)You could look for a public implementation of the vacation program,
)which does much of what you seem to require.

Much better, look into the sources for "procmail", AFAIK the best tool
for such things - either it is all you need, or you should be able to
extract all the knowledge you need beyond the RFC from that code.


Interestingly enough, the gods of the Disc[world] have never bothered
much about judging the souls of the dead, and so people only go to
hell if that's where they believe, in their deepest heart, that they
deserve to go.
Which they won't do if they don't know about it.
This explains why it is important to shoot missionaries on sight.

 
 
 


Post by Floyd Davids » Sun, 08 Dec 1996 04:00:00






>)>I am writing a C program to execute on Unix systems that will receive
>)>incoming mail on its stdin and I need to parse out the "to", "from",
>)>"reply-to", "subject", and message body.  I have some questions.
>)>
...
>)You could look for a public implementation of the vacation program,
>)which does much of what you seem to require.

>Much better, look into the sources for "procmail", AFAIK the best tool
>for such things - either it is all you need, or you should be able to
>extract all the knowledge you need beyond the RFC from that code.

Anyone who is interested might want to look at

   ftp://ftp2.polarnet.com:/archives/unix/hparse/*

which is a program specifically intended to extract return
addresses from email messages (it will also extract any specific
header or the text body).  It is fully RFC 822 compliant and might be
useful as an example of implementing RFC 822, as well as being usable
directly.

Floyd

--

 
 
 


Post by Al A » Tue, 17 Dec 1996 04:00:00


                                sed
 is the answer


: : I am writing a C program to execute on Unix systems that will receive
: : incoming mail on its stdin and I need to parse out the "to", "from",
: : "reply-to", "subject", and message body.  I have some questions.
: : 1) What are the standard ways of doing this?  Is there a standard format
: : specification (at least among Unix systems)?  How standard are mail
: : headers found in Unix mailbox files?  I would like my program to work
: : under as many variants of Unix as possible.

: : 2) What is the standard way of distinguishing the from address?  I have
: : seen forms such as these in my mail



: :    
: :     Joe Smith <joe>            Mail from another user at my host
: :     joe (Joe Smith)

: : I would like to separate out just the from address without the name.

: : 3) How will multiple "to"'s or "cc"'s appear in a header?

: : 4) What is the best way to get the correct from address from a mail
: : message to automatically send a reply (e.g. for an autoresponder
: : application).  Would you look for "reply-to" and if not found, look for
: : "from"?

: Read RFC822 from the internic.net resources. Also, I hope that you are
: using one of the many regular expression libraries available; failure to
: do so could turn a day-long task into a month-long task!

: man re_comp regcmp regexp regexpr

: -Brian
: --
:    ,---.     ,---.     ,---.     ,---.     ,---.     ,---.     ,---.  
:   /  _  \   /  _  \   /  _  \   /  _  \   /  _  \   /  _  \   /  _  \  

: __,'   `.___,'   `.___,'   `.___,'   `.___,'   `.___,'   `.___,'   `.__
--
=-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
al aab, seders moderator                                      sed u soon
               it is not zat we do not see the  s o l u t i o n          
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

 
 
 


Post by Boyd Rober » Wed, 18 Dec 1996 04:00:00



>                                sed
> is the answer

I think you should read RFC 822 before making such pronouncements.
Sed is not up to the job, except in the trivial case.

--

``Not only is UNIX dead, but it's starting to smell really bad.''  -- rob

 
 
 
