extracting out of a string

Post by David Gild » Tue, 18 Jun 2002 15:38:48



Hello,

What is the best way to get just the date out of this string?

$line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';


print $date[0];

# Or

($date,$time,$event,$place,$city) = split(/\|/,$line);

And just not use these other vars.

Or is there a better way with the substr function?

Thanks


Web designer for Afropop Worldwide
<http://www.afropop.org/community/contributors.php?ID=8>

====================================================
Cora Connection: Your West African Music Source
Resources, Recordings, Instruments & More!
<http://www.coraconnection.com/>
====================================================

 
 
 

Post by Sudarsan Raghav » Tue, 18 Jun 2002 15:49:11



> Hello,

> What is the best way to get just the date out of this string?

> $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';




my $date = (split(/\|/,$line))[0];
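A minimal, self-contained sketch of the list-slice idiom above, using the sample line from the original question (the `use strict`/`use warnings` lines and the print are additions for illustration):

```perl
use strict;
use warnings;

my $line = 'June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';

# split() returns all five fields here; the list slice (...)[0] keeps the
# first one and throws the other four away.
my $date = (split /\|/, $line)[0];

print "$date\n";    # prints "June 29, 2002"
```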

> print $date[0];

> # Or


 
 
 

Post by Jenda Krynic » Tue, 18 Jun 2002 18:30:32




> > Hello,

> > What is the best way to get just the date out of this string?

> > $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';



> my $date = (split(/\|/,$line))[0];

        my $date = (split(/\|/,$line,2))[0];
would be better and

        my $date;
        $line =~ /^([^\|])\|/
                and $date = $1;
even better.    
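To see what the third (LIMIT) argument to split changes, here is a small sketch on the same sample line. With a LIMIT of 2, split stops after the first delimiter, so it does less work when only the first field is wanted:

```perl
use strict;
use warnings;

my $line = 'June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';

# With LIMIT = 2, split stops after the first pipe: the second element
# holds everything after it, left unsplit.
my @parts = split /\|/, $line, 2;

print scalar(@parts), "\n";   # prints 2
print "$parts[0]\n";          # prints "June 29, 2002"
print "$parts[1]\n";          # prints "6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH"
```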

        use Benchmark;

        $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';

        sub Sudarsan {
                my $date = (split(/\|/,$line))[0];
                return $date;
        }

        sub SplitUBound {
                my $date = (split(/\|/,$line,2))[0];
                return $date;
        }

        sub Re {
                my $date;
                $line =~ /^([^\|])\|/
                        and $date = $1;
                return $date;
        }

        timethese 1000000, {
                Sudarsan => \&Sudarsan,
                SplitUBound => \&SplitUBound,
                Re => \&Re,
        }

Benchmark: timing 1000000 iterations of Re, SplitUBound, Sudarsan...
         Re:  0 wallclock secs ( 0.76 usr +  0.00 sys =  0.76 CPU)
SplitUBound:  4 wallclock secs ( 3.37 usr +  0.00 sys =  3.37 CPU)
   Sudarsan:  5 wallclock secs ( 5.91 usr +  0.00 sys =  5.91 CPU)

Jenda

There is a reason for living. There must be. I've seen it somewhere.
It's just that in the mess on my table ... and in my brain
I can't find it.
                                        --- me

 
 
 

Post by David Vd Geer Inhuur Tbv Ipl » Tue, 18 Jun 2002 23:30:07


my $date = (split /\|/, $line)[0];

Regs David

> > Hello,

> > What is the best way to get just the date out of this string?

> > $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';



> > print $date[0];

> You could also try

> my ($date) = $line =~ /^(.*?)\|/;

> Cheerio,
> Janek

> --



 
 
 

Post by Janek Schleich » Tue, 18 Jun 2002 23:24:49


David Gilden wrote at Mon, 17 Jun 2002 08:38:48 +0200:

> Hello,

> What is the best way to get just the date out of this string?

> $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';



> print $date[0];

You could also try

my ($date) = $line =~ /^(.*?)\|/;
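A sketch of Janek's one-liner wrapped in a helper, showing what happens when the line has no pipe. The name `get_date` (Ovid's suggestion later in the thread) and the test strings are chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner above.
sub get_date {
    my ($line) = @_;
    # In list context a successful match returns its captured groups;
    # a failed match returns an empty list, leaving $date undefined.
    my ($date) = $line =~ /^(.*?)\|/;
    return $date;
}

my $date = get_date('June 29, 2002|6-7:45pm|Nashua Summer Fest');
my $none = get_date('no delimiter here');

print "$date\n";                                    # prints "June 29, 2002"
print defined $none ? "matched\n" : "no match\n";   # prints "no match"
```

Because the non-greedy `.*?` may match zero characters, a line that starts with a pipe yields an empty string rather than undef; the `[^\|]+` form discussed later in the thread requires at least one character.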

Cheerio,
Janek

 
 
 

Post by Shishir K. Sin » Tue, 18 Jun 2002 23:32:13


I was just wondering: which one would be faster, the regex or the split? If I had a million records to process, it would make a difference, wouldn't it?
-----Original Message-----
From: David vd Geer Inhuur tbv IPlib


Sent: Monday, June 17, 2002 10:30 AM

Subject: Re: extracting out of a string

my $date = (split /\|/, $line)[0];

Regs David

> > Hello,

> > What is the best way to get just the date out of this string?

> > $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';



> > print $date[0];

> You could also try

> my ($date) = $line =~ /^(.*?)\|/;

> Cheerio,
> Janek

> --


--
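One way to answer this for your own data is the core Benchmark module. The sketch below assumes the sample line is representative of the real records; the candidate names are made up for illustration. It first checks that every candidate returns the same date, because a version that silently fails to match does almost no work and can look misleadingly fast:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample record from the original question; real data may behave differently.
my $line = 'June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';

# Candidate names are made up for this sketch.
my %candidates = (
    split_all => sub { (split /\|/, $line)[0] },
    split_two => sub { (split /\|/, $line, 2)[0] },
    regex     => sub { $line =~ /^([^\|]+)\|/ ? $1 : undef },
    index_sub => sub {
        my $pos = index $line, '|';
        $pos > -1 ? substr($line, 0, $pos) : undef;
    },
);

# Correctness first: every candidate must produce the same date.
for my $name (sort keys %candidates) {
    my $got = $candidates{$name}->();
    die "$name returned the wrong value\n"
        unless defined $got && $got eq 'June 29, 2002';
}

# A negative count means "run each candidate for at least that many CPU
# seconds"; cmpthese then prints a rate table comparing them.
cmpthese(-1, \%candidates);
```

Timings depend on the Perl version, the machine and the shape of the data, so the table from one run is a hint rather than a general answer.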



 
 
 

Post by Drie » Tue, 18 Jun 2002 23:37:03



> Hello,

> What is the best way to get just the date out of this string?

> $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';



> print $date[0];

> # Or

> ($date,$time,$event,$place,$city) = split(/\|/,$line);

> And just not use these other vars.

if you don't need the rest then you could get away with

        my ($date) = split(/\|/,$line);
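A short sketch of why the parentheses around `($date)` matter. In list assignment the first field lands in `$date`, and perlfunc documents that an omitted LIMIT is treated as one more than the number of variables on the left, so the rest of the line is not split needlessly. Without the parentheses, split runs in scalar context and returns the number of fields instead:

```perl
use strict;
use warnings;

my $line = 'June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';

# List assignment: implicit LIMIT of 2 (one variable + 1).
my ($date) = split /\|/, $line;

# Scalar assignment: split in scalar context returns the field count.
my $count = split /\|/, $line;

print "$date\n";    # prints "June 29, 2002"
print "$count\n";   # prints 5
```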

or if you're not sure, the trick some of us use is


HTH

ciao
drieux

---

 
 
 

Post by Drie » Wed, 19 Jun 2002 00:05:37


[..]

>    sub Re {
>            my $date;
>            $line =~ /^([^\|])\|/
>                    and $date = $1;
>            return $date;
>    }

[..]
http://www.wetware.com/drieux/pbl/BenchMarks/split_v_Re.txt

Do hate to ding you here, but you will notice that
your regEx will never 'work' hence will make no assignment
hence runs really fast.

if you had made it

        sub Re {
                my $date;
                $line =~ /^([^\|]+)\|/
                        and $date = $1;
                return $date;
        }
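A small sketch contrasting the two character classes on the sample line, to show why the original pattern never matched:

```perl
use strict;
use warnings;

my $line = 'June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';

# [^\|] matches exactly ONE non-pipe character, so this pattern needs the
# second character of the line to be a pipe. Here it is 'u', so no match.
my ($one) = $line =~ /^([^\|])\|/;

# [^\|]+ matches a RUN of non-pipe characters up to the first pipe.
my ($many) = $line =~ /^([^\|]+)\|/;

print defined $one ? "one: $one\n" : "one: no match\n";   # prints "one: no match"
print "many: $many\n";                                     # prints "many: June 29, 2002"
```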

it would have assigned $1 to $date and slowed the
process down a bit....

ciao
drieux

---

 
 
 

Post by David Gild » Wed, 19 Jun 2002 00:43:56


This one seems to work fine,

($date) = split (/\|/, $line);

I guess it's basically the same as:

my $date = (split(/\|/,$line))[0];

comments?

Have a great day...

Dave

> > What is the best way to get just the date out of this string?

> > $line ='June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';



> my $date = (split(/\|/,$line))[0];

> > print $date[0];

> > # Or

 
 
 

Post by Ovi » Wed, 19 Jun 2002 00:49:38



> Do hate to ding you here, but you will notice that
> your regEx will never 'work' hence will make no assignment
> hence runs really fast.

> if you had made it

>    sub Re {
>            my $date;
>            $line =~ /^([^\|]+)\|/
>                    and $date = $1;
>            return $date;
>    }

> it would have assigned $1 to $date and slowed the
> process down a bit....

The following is not meant to correct anyone, but rather, to show some of the beginning
programmers a few aspects of good coding style and a couple of subtle Perl "gotchas".

drieux is correct about that '+' being needed there.  In this context, however, the final \| may
not be needed (unless it's explicitly put there to tell the regex that there *must* be a trailing
pipe).  However, here's how I might change this subroutine:

  sub Re {
    my $line = shift;
    if ( -1 < index $line, '|' ) {
      return (split /\|/, $line, 2)[0];
    }
    return;
  }

This has several benefits.  Going through it line by line, with an explanation after the line:

  sub Re {

I would change the name to something more descriptive, such as "get_date()".  This makes the code
more self-documenting.

    my $line = shift;

It's a "black box".  The subroutine is not dependent on anything declared outside of itself, such
as the $line variable.  This makes the sub easier to reuse.  If you need to change the $line
variable name to something else, you won't have to touch this subroutine.

    if ( -1 < index $line, '|' ) {

This ensures that we actually have a pipe character in there, which means that we won't return a
false positive.  This may be irrelevant with truly trusted data, or if this check is done earlier.
If we know the date will be a minimum length, we can change the -1 constant to a number setting
the minimum length (minus one: if the date is 8 characters, we'd use 7).

      return (split /\|/, $line, 2)[0];

Here, we use the optional 3rd parameter to split, which tells it to split $line into at most two
chunks.  If there are multiple pipes, there is no need to split $line into a lengthy list if all we
will do is throw the rest away.

Further, by wrapping the split function in parentheses, we force this into list context and use
the [0] to take the first element.

    return;

A bare return here is to stop subtle bugs if we call this in list context.  Here's a
demonstration.  One might think that this will have the same result for both print statements, but
it doesn't.





  sub foo { my $foo; return $foo }
  sub bar { return }




which is also something we probably don't want.

However, the bar() subroutine has a bare return, which in list context returns an empty list, thus
not adding any elements to the array.  It's easy to get bitten by this.
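The print statements from the original demonstration did not survive in this copy of the thread. The following is a separate sketch (not Ovid's original code) of the same point, reusing the foo() and bar() definitions; the variable names are chosen for illustration:

```perl
use strict;
use warnings;

sub foo { my $foo; return $foo }   # explicitly returns undef
sub bar { return }                 # bare return

my @foo = foo();   # (undef): a one-element list
my @bar = bar();   # ():      an empty list

print scalar(@foo), "\n";   # prints 1
print scalar(@bar), "\n";   # prints 0

# The practical trap: a list assignment in boolean context is true whenever
# the list is non-empty, even if its only element is undef.
if ( my @dates = foo() ) {
    print "foo() looks like it succeeded\n";   # this line IS printed
}
if ( my @dates = bar() ) {
    print "bar() looks like it succeeded\n";   # this line is NOT printed
}

# In scalar context both subs return undef.
my $from_foo = foo();
my $from_bar = bar();
```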

Cheers,
Curtis "Ovid" Poe

=====
"Ovid" on http://www.perlmonks.org/
Someone asked me how to count to 10 in Perl:


__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com

 
 
 

Post by Jenda Krynic » Wed, 19 Jun 2002 00:49:09




> [..]
> >       sub Re {
> >               my $date;
> >               $line =~ /^([^\|])\|/
> >                       and $date = $1;
> >               return $date;
> >       }
> [..]
> http://www.wetware.com/drieux/pbl/BenchMarks/split_v_Re.txt

> Do hate to ding you here, but you will notice that
> your regEx will never 'work' hence will make no assignment
> hence runs really fast.

Whoooops. That big difference should have warned me.

Thanks for catching this. God knows what I was thinking of.

Jenda


 
 
 

Post by Drie » Wed, 19 Jun 2002 03:38:09


[..]

>> if you had made it

>>        sub Re {
>>                my $date;
>>                $line =~ /^([^\|]+)\|/

[..]

> drieux is correct about that '+' being needed there.  In this context,
> however, the final \| may not be needed (unless it's explicitly put there
> to tell the regex that there *must* be a trailing pipe).

Technically you are correct, since we are trying to break out
only the lead element - and my compliments for noting!

hence let us break the RegEx down piece by piece:

        [^\|]           - all things not "|"
        [^\|]+          - one or more of these "not pipes"
        ([^\|]+)        - set the $1 to that which we match as not pipe
        ^([^\|]+)       - set the $1 to the first notPipe at the head of the $line
        ^([^\|]+)\|     - set the $1 to the first notPipe at the head of the $line
                                        that is followed by a pipe
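The same step-by-step breakdown can live inside the pattern itself with the /x modifier, which ignores whitespace and allows comments. A sketch on the sample line:

```perl
use strict;
use warnings;

my $line = 'June 29, 2002|6-7:45pm|Nashua Summer Fest|Downtown TBA|Nashua, NH';

# Equivalent to /^([^\|]+)\|/, written with /x so each piece is commented.
my ($date) = $line =~ m{
    ^           # start at the head of the line
    ( [^|]+ )   # capture one or more "not pipe" characters into $1
    \|          # ...that must be followed by a literal pipe
}x;

print "$date\n";   # prints "June 29, 2002"
```

Inside a character class the pipe is already literal, so `[^|]` and `[^\|]` mean the same thing.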

IF you were absolutely Positive that the $line would always be "|"
delimited, then the simpler and faster form would work. The problem with
this shorter form is that IF the $line is not "pipe delimited", that
simpler approach would GOBBLE UP the whole line of, say,

        my $line = "This is all the Evil Things that the Voices in my Head Say";

since we want to KNOW that we got an Undefined value when we
play with these things:

cf: http://www.wetware.com/drieux/pbl/bloopers/delimGuard.txt
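A short sketch of the "gobble" problem using drieux's example line: without the trailing `\|` the capture swallows the entire line, while the guarded pattern fails and leaves the variable undefined:

```perl
use strict;
use warnings;

my $evil = 'This is all the Evil Things that the Voices in my Head Say';

# No trailing \|: the pattern matches a line with no pipe at all,
# so the capture is the entire line.
my ($loose) = $evil =~ /^([^\|]+)/;

# Trailing \| required: the match fails and $guarded stays undefined,
# which lets the caller tell "not pipe delimited" apart from a real date.
my ($guarded) = $evil =~ /^([^\|]+)\|/;

print "loose:   $loose\n";
print 'guarded: ', (defined $guarded ? $guarded : 'undef'), "\n";   # prints "guarded: undef"
```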

Since we know three things:

        a) Other Coders may dishonor the API - and pass us Drek

        b) End Users should breathe oxygen while reading the documents
                about what they pass to code - it helps them avoid passing Drek

        c) Management has seen code maintenance issues somewhere at some time
                and may need to have the code protected for the code maintenance issues,
                hence we code to make our lives collectively easier.

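But I am willing to benchmark the classical form
using your basic thesis:

        sub Re3 {
                # "my $x = ... if ..." has undefined behaviour in Perl,
                # so declare first and then assign conditionally.
                my $date;
                $date = $1 if $line =~ /^([^\|]+)/;
                return $date;
        } # end of Re3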

cf:
http://www.wetware.com/drieux/pbl/BenchMarks/split_v_Re.txt

ciao
drieux

---

 
 
 

Post by John W. Kra » Wed, 19 Jun 2002 06:18:32




> > Do hate to ding you here, but you will notice that
> > your regEx will never 'work' hence will make no assignment
> > hence runs really fast.

> > if you had made it

> >       sub Re {
> >               my $date;
> >               $line =~ /^([^\|]+)\|/
> >                       and $date = $1;
> >               return $date;
> >       }

> > it would have assigned $1 to $date and slowed the
> > process down a bit....

> The following is not meant to correct anyone, but rather,
> to show some of the beginning programmers a few aspects of
> good coding style and a couple of subtle Perl "gotchas".

> drieux is correct about that '+' being needed there.  In this
> context, however, the final \| may not be needed (unless it's
> explicitly put there to tell the regex that there *must* be a
> trailing pipe).  However, here's how I might change this subroutine:

>   sub Re {
>     my $line = shift;
>     if ( -1 < index $line, '|' ) {
>       return (split /\|/, $line, 2)[0];
>     }
>     return;
>   }

A more robust subroutine would be:

sub Re {
    $_[0] and (split /\|/, $_[0])[0];
    }

John
--
use Perl;
program
fulfillment

 
 
 

Post by Jeff 'Japhy' Piny » Wed, 19 Jun 2002 06:46:30


On Jun 17, John W. Krahn said:


>>   sub Re {
>>     my $line = shift;
>>     if ( -1 < index $line, '|' ) {
>>       return (split /\|/, $line, 2)[0];
>>     }
>>     return;
>>   }

Ovid, if you're going to calculate the location of the first '|' in $line,
why not use it and avoid split() altogether?

  sub Re {
    my $line = shift;
    (my $pos = index $line, '|') > -1 or return;
    return substr $line, 0, $pos;
  }
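A sketch exercising japhy's index/substr version on a few edge cases. The name `get_date` follows Ovid's naming suggestion, and the test strings are chosen for illustration:

```perl
use strict;
use warnings;

# japhy's index/substr version under a descriptive name.
sub get_date {
    my $line = shift;
    # index() returns -1 when there is no pipe; bail out with a bare return.
    (my $pos = index $line, '|') > -1 or return;
    # Otherwise everything before the first pipe is the date.
    return substr $line, 0, $pos;
}

my $date  = get_date('June 29, 2002|6-7:45pm|Nashua Summer Fest');
my $empty = get_date('|6-7:45pm');      # pipe at position 0: returns ''
my $none  = get_date('no pipe here');   # no pipe: undef in scalar context

print "$date\n";   # prints "June 29, 2002"
```

Note that a leading pipe yields an empty (but defined) string; if an empty date should also count as invalid, the `> -1` test would become `> 0`, in the spirit of Ovid's minimum-length remark.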

>A more robust subroutine would be:

>sub Re {
>    $_[0] and (split /\|/, $_[0])[0];
>    }

I'd use defined(), because I'm a stickler.

--

RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]

 
 
 

Post by Ovi » Wed, 19 Jun 2002 06:58:41



> On Jun 17, John W. Krahn said:


> >>   sub Re {
> >>     my $line = shift;
> >>     if ( -1 < index $line, '|' ) {
> >>       return (split /\|/, $line, 2)[0];
> >>     }
> >>     return;
> >>   }

> Ovid, if you're going to calculate the location of the first '|' in $line,
> why not use it and avoid split() altogether?

>   sub Re {
>     my $line = shift;
>     (my $pos = index $line, '|') > -1 or return;
>     return substr $line, 0, $pos;
>   }

Ooh.  Good point.  I was making extra work when I didn't need to.

As for japhy's reply to John's comments:

> >A more robust subroutine would be:

> >sub Re {
> >    $_[0] and (split /\|/, $_[0])[0];
> >    }

> I'd use defined(), because I'm a stickler.

That concerns me:

    perl -e 'print Re("asdf");sub Re{$_[0] and (split /\|/,$_[0])[0]}'

That snippet prints "asdf".  The problem is that nothing checks that the data is in a valid form
(i.e., that it has at least one pipe character).  I feel more comfortable with japhy's snippet
provided in his reply to me above.

Cheers,
Curtis "Ovid" Poe


 
 
 
