Eliminating duplicate fields from a line

Post by hwk.. » Thu, 26 Aug 1999 04:00:00



Here is the input:

1|mary||1|2|4|12
2|bob|admin|2|2|3|107
3|dave|operator|4|3|4|7|8|14
4|ron|analyst|8|8|110

Here is what I want:

1|mary||1|2|4|12
2|bob|admin|2|3|107
3|dave|operator|4|3|7|8|14
4|ron|analyst|8|110

As you can see, the duplicate numbers in lines 2, 3 and 4 were
eliminated.

I thought about using an array to de-dupe them, but shells don't have
arrays, and for reasons I don't want to go into we can't use Perl.

Thanks in advance...


 
 
 

Eliminating duplicate fields from a line

Post by Al Shark » Thu, 26 Aug 1999 04:00:00



> Here is the input:

> 1|mary||1|2|4|12
> 2|bob|admin|2|2|3|107
> 3|dave|operator|4|3|4|7|8|14
> 4|ron|analyst|8|8|110

> Here is what I want:

> 1|mary||1|2|4|12
> 2|bob|admin|2|3|107
> 3|dave|operator|4|3|7|8|14
> 4|ron|analyst|8|110

> I thought about using an array to dupe them but shells don't have
> arrays, and for reasons I don't want to go into we can't use Perl.

I'm sure someone will say there's a more efficient way, but here goes:

awk -F"|" '{printf "%s|%s|%s|%s", $1, $2, $3, $4;
            for (i = 5; i <= NF; i++) {
                for (j = i-1; j >= 4; j--) {
                   if ($i == $j) break;
                   else if (j == 4) printf "|%s", $i;
                }
            }
            printf "\n";
           }' your_file

Print the first four fields without a newline.
Starting with the fifth field, compare the value to all the previous
fields (except the first three), and stop looking if you find a match.
If you got all the way back to the fourth field, it didn't match, so
print it without a newline.

Do the same with each successive field on the line.
When you get to the end of the line, print a newline character.

 
 
 

Eliminating duplicate fields from a line

Post by Ken Pizzi » Thu, 26 Aug 1999 04:00:00



>Here is the input:

>1|mary||1|2|4|12
>2|bob|admin|2|2|3|107
>3|dave|operator|4|3|4|7|8|14
>4|ron|analyst|8|8|110

>Here is what I want:

>1|mary||1|2|4|12
>2|bob|admin|2|3|107
>3|dave|operator|4|3|7|8|14
>4|ron|analyst|8|110

>You may notice, the duplicate numbers in lines 2 3 and 4 were
>eliminated.

>I thought about using an array to dupe them but shells don't have
>arrays, and for reasons I don't want to go into we can't use Perl.

Well, some shells lack arrays, but many do have them (ksh, bash,
zsh, rc, es; even csh and tcsh).

But, looking at your task, my first inclination is to suggest
using "awk".
  awk -F\| '{printf "%s|%s|%s", $1,$2,$3
             delete a    # forget the values seen on the previous line
             for (i=4;i<=NF;++i) { if (!a[$i]++) printf "|%s", $i}
             print ""}'

(The special-casing of $1,$2,$3 is because I note that you have
  2|bob|admin|2|2|3|107
mapping to
  2|bob|admin|2|3|107
instead of
  2|bob|admin|3|107
.)

                --Ken Pizzini
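
For comparison, a shell-only route is also workable even in a shell with
no arrays at all. What follows is only a rough sketch, assuming a POSIX sh:
it uses the positional parameters in place of an array and a "|"-delimited
string to remember what has already been printed. The function name
dedupe_fields is made up for illustration and does not come from the thread.

```shell
#!/bin/sh
# Sketch only: drop repeated values from field 4 onward, keeping the
# first occurrence. dedupe_fields is a hypothetical helper name.
dedupe_fields() {
    while IFS= read -r line; do
        set -f                       # no filename expansion while splitting
        old_ifs=$IFS; IFS='|'
        set -- $line                 # fields become $1, $2, ...
        IFS=$old_ifs
        set +f
        out="$1|$2|$3"               # first three fields pass through
        [ $# -ge 3 ] && shift 3 || shift $#
        seen='|'                     # values already printed, as |a|b|c|
        for f in "$@"; do
            case $seen in
                *"|$f|"*) ;;         # duplicate: skip it
                *) out="$out|$f"; seen="$seen$f|" ;;
            esac
        done
        printf '%s\n' "$out"
    done
}
```

It reads standard input, so it would be called as: dedupe_fields < your_file
For anything but small files the awk versions are likely to be much faster.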

 
 
 

Eliminating duplicate fields from a line

Post by Neil Schemenau » Thu, 26 Aug 1999 04:00:00



>I thought about using an array to dupe them but shells don't have
>arrays, and for reasons I don't want to go into we can't use Perl.

I hope you have awk.  Note that I am not an awk expert.  Go easy
on me.  This script works with mawk on my Debian system.

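#!/usr/bin/awk -f
BEGIN { FS="|" }
{
        printf "%s|%s|%s", $1, $2, $3
        for (i=4; i<=NF; i++) {
                # print a value only the first time it shows up on this
                # line, so the original field order is kept
                if (!($i in nums)) {
                        printf "|%s", $i
                        nums[$i] = ""
                }
        }
        # empty the array before the next line
        for (num in nums) {
                delete nums[num]
        }
        printf "\n"
}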

 
 
 

Eliminating duplicate fields from a line

Post by Raja » Sun, 29 Aug 1999 04:00:00


Use the associative arrays of awk or nawk.
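
A minimal sketch of that idea, for what it is worth: the array is indexed
by the field value itself, so "have I seen this already?" is a single
lookup. The wrapper name dedupe_awk and the array name seen are made up
for illustration, and the first three fields are passed through untouched,
as in the earlier replies. split("", seen) is used to empty the array
because it should work in older awks that lack a whole-array delete.

```shell
#!/bin/sh
# Sketch only: dedupe_awk is a hypothetical wrapper name.
dedupe_awk() {
    awk -F'|' '{
        split("", seen)                 # empty the array for each new line
        out = $1 FS $2 FS $3            # first three fields pass through
        for (i = 4; i <= NF; i++)
            if (!(($i) in seen)) {      # first time this value appears
                seen[$i] = 1
                out = out FS $i
            }
        print out
    }' "$@"
}
```

Called as: dedupe_awk your_file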

> Here is the input:

> 1|mary||1|2|4|12
> 2|bob|admin|2|2|3|107
> 3|dave|operator|4|3|4|7|8|14
> 4|ron|analyst|8|8|110

> Here is what I want:

> 1|mary||1|2|4|12
> 2|bob|admin|2|3|107
> 3|dave|operator|4|3|7|8|14
> 4|ron|analyst|8|110

> You may notice, the duplicate numbers in lines 2 3 and 4 were
> eliminated.

> I thought about using an array to dupe them but shells don't have
> arrays, and for reasons I don't want to go into we can't use Perl.

> Thanks in advance...


 
 
 
