How to find which numbers are missing in a sequence of 19- or 20-digit numbers.

Post by chan » Thu, 26 Sep 2002 10:02:06



I have a data file of 19- or 20-digit numbers in sequence, and I need to
check which numbers are missing from that sequence.
Ex:
89014102069901080256
89014102069901080264
89014102069901080272
89014102069901080280
89014102069901080298
89014102069901080306
89014102069901080314
89014102069901080322
89014102069901080330
89014102069901080348
89014102069901080355
89014102069901080363
89014102069901080371
89014102069901080389
89014102069901080397
89014102069901080405
89014102069901080413
89014102069901080421
89014102069901080439
89014102069901080447

Thanks
Chan

 
 
 

Post by Andreas Kähäri » Thu, 26 Sep 2002 10:41:12


Submitted by "chan01" to comp.unix.shell:

> I have a data file of 19- or 20-digit numbers in sequence, and I need to
> check which numbers are missing from that sequence.
> Ex:
> 89014102069901080256
> 89014102069901080264
> 89014102069901080272
> 89014102069901080280
> 89014102069901080298
> 89014102069901080306
> 89014102069901080314
> 89014102069901080322
> 89014102069901080330
> 89014102069901080348
> 89014102069901080355
> 89014102069901080363
> 89014102069901080371
> 89014102069901080389
> 89014102069901080397
> 89014102069901080405
> 89014102069901080413
> 89014102069901080421
> 89014102069901080439
> 89014102069901080447

> Thanks
> Chan

Try this:

#!/bin/ksh

FILE=seq                    # Our file containing numbers.
PREFIX=89014102069901080    # Needed to avoid integer overflow.

while read NUM; do
    NUM=${NUM#$PREFIX}      # Strip the prefix.

    if [[ -z "$COUNT" ]]; then
        COUNT=$NUM          # First number in file.
    else
                            # Step over found number.
        COUNT=$(( $COUNT + 1 ))
    fi

    while (( $COUNT < $NUM )); do
                            # Missing all up until $NUM
        print "Missing $PREFIX$COUNT"
        COUNT=$(( $COUNT + 1 ))
    done
done < $FILE

--
Andreas Kähäri               +------ Have a Unix: netbsd.org
-----------------------------+------ This post ends with :wq

 
 
 

Post by Stephane Chazela » Thu, 26 Sep 2002 20:42:34



> I have a data file of 19- or 20-digit numbers in sequence, and I need to
> check which numbers are missing from that sequence.
> Ex:
> 89014102069901080256
> 89014102069901080264

[...]

The right tool for this would be awk, or a combination of tail, head,
GNU seq and comm. But your numbers are too big for most standard UNIX
tools that deal with numbers (including most shells).

So, you'll have to use some text utilities or dc.
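
To make the size problem concrete: the largest signed 64-bit integer,
9223372036854775807, has 19 digits, so the 20-digit sample values cannot
fit in a shell integer of that width. A minimal sketch of doing the
arithmetic with bc instead (dc would serve equally well), since neither has
a fixed integer width. The function name next_number is only illustrative.

```sh
# next_number N: print N + 1.
# bc works on arbitrary-precision decimals, so a 20-digit N is fine.
next_number() {
    echo "$1 + 1" | bc
}

next_number 89014102069901080256    # prints 89014102069901080257
```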

Here is a sed solution:

sed -ne '1{h;d;}
  x;:2
  s/$/,,0123456789,0/;s/^,/0,/
  s/\(.\),\([^,]*\).*\1\(,*.\).*/\3\2/;/,/b2
  ;G;/^\(.*\)\n\1$/!{s/\n.*//;p;b2
  }'

--
Stéphane

 
 
 

Post by Tapani Tarvaine » Thu, 26 Sep 2002 22:45:38



> I have a data file of 19- or 20-digit numbers in sequence, and I need to
> check which numbers are missing from that sequence.

Do you simply want to print all numbers that are missing?
Something like this perhaps:

read number
while read number2 ;do
   echo 'for (i='$number';++i<'$number2';) print i,"\n"'
   number=$number2
done | bc

--
Tapani Tarvainen

 
 
 

Post by mats.blomstr.. » Fri, 27 Sep 2002 18:26:52



> I have a data file of 19- or 20-digit numbers in sequence, and I need to
> check which numbers are missing from that sequence.
> Ex:
> 89014102069901080256
> 89014102069901080264

Here is my attempt!
(I use bash; this should work in ksh too, but I have not tested it.)

bash$ cat script
#!/bin/bash
#
NEXT=`head -1 "$1"`
cat "$1" |while read VAR ;do
  while [ "$NEXT" != "$VAR" ] ;do
    if [ "`echo ${NEXT}+1 |bc -q`" != "$VAR" ] ;then
      echo "`echo ${NEXT}+1 |bc -q`"
    fi
    NEXT=`echo ${NEXT}+1 |bc -q`
  done
  NEXT="$VAR"
done
bash$ cat input
10
12
14
16
18
20
bash$ ./script input
11
13
15
17
19
bash$

And the same on your sample data. It's long, so bye! :)
//Mats

bash$ ./script input2
89014102069901080257
89014102069901080258
89014102069901080259
89014102069901080260
89014102069901080261
89014102069901080262
89014102069901080263
89014102069901080265
89014102069901080266
89014102069901080267
89014102069901080268
89014102069901080269
89014102069901080270
89014102069901080271
89014102069901080273
89014102069901080274
89014102069901080275
89014102069901080276
89014102069901080277
89014102069901080278
89014102069901080279
89014102069901080281
89014102069901080282
89014102069901080283
89014102069901080284
89014102069901080285
89014102069901080286
89014102069901080287
89014102069901080288
89014102069901080289
89014102069901080290
89014102069901080291
89014102069901080292
89014102069901080293
89014102069901080294
89014102069901080295
89014102069901080296
89014102069901080297
89014102069901080299
89014102069901080300
89014102069901080301
89014102069901080302
89014102069901080303
89014102069901080304
89014102069901080305
89014102069901080307
89014102069901080308
89014102069901080309
89014102069901080310
89014102069901080311
89014102069901080312
89014102069901080313
89014102069901080315
89014102069901080316
89014102069901080317
89014102069901080318
89014102069901080319
89014102069901080320
89014102069901080321
89014102069901080323
89014102069901080324
89014102069901080325
89014102069901080326
89014102069901080327
89014102069901080328
89014102069901080329
89014102069901080331
89014102069901080332
89014102069901080333
89014102069901080334
89014102069901080335
89014102069901080336
89014102069901080337
89014102069901080338
89014102069901080339
89014102069901080340
89014102069901080341
89014102069901080342
89014102069901080343
89014102069901080344
89014102069901080345
89014102069901080346
89014102069901080347
89014102069901080349
89014102069901080350
89014102069901080351
89014102069901080352
89014102069901080353
89014102069901080354
89014102069901080356
89014102069901080357
89014102069901080358
89014102069901080359
89014102069901080360
89014102069901080361
89014102069901080362
89014102069901080364
89014102069901080365
89014102069901080366
89014102069901080367
89014102069901080368
89014102069901080369
89014102069901080370
89014102069901080372
89014102069901080373
89014102069901080374
89014102069901080375
89014102069901080376
89014102069901080377
89014102069901080378
89014102069901080379
89014102069901080380
89014102069901080381
89014102069901080382
89014102069901080383
89014102069901080384
89014102069901080385
89014102069901080386
89014102069901080387
89014102069901080388
89014102069901080390
89014102069901080391
89014102069901080392
89014102069901080393
89014102069901080394
89014102069901080395
89014102069901080396
89014102069901080398
89014102069901080399
89014102069901080400
89014102069901080401
89014102069901080402
89014102069901080403
89014102069901080404
89014102069901080406
89014102069901080407
89014102069901080408
89014102069901080409
89014102069901080410
89014102069901080411
89014102069901080412
89014102069901080414
89014102069901080415
89014102069901080416
89014102069901080417
89014102069901080418
89014102069901080419
89014102069901080420
89014102069901080422
89014102069901080423
89014102069901080424
89014102069901080425
89014102069901080426
89014102069901080427
89014102069901080428
89014102069901080429
89014102069901080430
89014102069901080431
89014102069901080432
89014102069901080433
89014102069901080434
89014102069901080435
89014102069901080436
89014102069901080437
89014102069901080438
89014102069901080440
89014102069901080441
89014102069901080442
89014102069901080443
89014102069901080444
89014102069901080445
89014102069901080446
bash$

 
 
 

Post by Stephane Chazela » Fri, 27 Sep 2002 18:46:25


[...]

> read number
> while read number2 ;do
>    echo 'for (i='$number';++i<'$number2';) print i,"\n"'
>    number=$number2
> done | bc

HP-UX's bc, for instance, needs a non-empty third "for" expression and
doesn't know "print".

The "\n" may be expanded by some "echo"es.

read number
while read number2 ;do
   echo 'for (i='$number'+1;i<'$number2';i++) i'
   number=$number2
done | bc

Should be better.
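
A usage sketch of that loop, assuming the numbers are in a file passed as
the first argument and that the generated statements are piped to bc as in
the post being corrected. The name missing_bc and the temporary file are
made up for the example.

```sh
# missing_bc FILE: print every integer that falls strictly between
# consecutive lines of FILE. One bc "for" statement is generated per gap.
missing_bc() {
    {
        read number
        while read number2; do
            echo 'for (i='$number'+1;i<'$number2';i++) i'
            number=$number2
        done
    } < "$1" | bc
}

# Example: a file holding 10, 12 and 14 yields 11 and 13.
printf '%s\n' 10 12 14 > /tmp/seq.$$
missing_bc /tmp/seq.$$
rm -f /tmp/seq.$$
```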

--
Stéphane

 
 
 

Post by Chris F.A. Johnso » Sat, 28 Sep 2002 03:00:30



> Submitted by "chan01" to comp.unix.shell:
>> I have a data file of 19- or 20-digit numbers in sequence, and I need to
>> check which numbers are missing from that sequence.
>> Ex:
>> 89014102069901080256
[snip]
>> 89014102069901080447

> Try this:

> #!/bin/ksh

> FILE=seq                    # Our file containing numbers.
> PREFIX=89014102069901080    # Needed to avoid integer overflow.

> while read NUM; do
>     NUM=${NUM#$PREFIX}      # Strip the prefix.

>     if [[ -z "$COUNT" ]]; then
>         COUNT=$NUM          # First number in file.
>     else
>                             # Step over found number.
>         COUNT=$(( $COUNT + 1 ))
>     fi

>     while (( $COUNT < $NUM )); do
>                             # Missing all up until $NUM
>         print "Missing $PREFIX$COUNT"
>         COUNT=$(( $COUNT + 1 ))
>     done
> done < $FILE

    I have 2 quibbles with this:

      It will fail if $COUNT is already defined.

      I don't like testing whether $COUNT is empty on each and every
      line of the file.

    Both can be mitigated by reading the first line outside
    the loop:

       {
       read COUNT
       PREFIX=${COUNT%???}  ## generalize for any file; adjust ??? to taste,
                            ## but keep the suffix free of a leading 0, which
                            ## $(( )) reads as octal in many shells
       COUNT=${COUNT#$PREFIX}
       while read NUM
       do
         COUNT=$(( $COUNT + 1 ))
         while [ $COUNT -lt ${NUM#$PREFIX} ]
         do
           echo "Missing $PREFIX$COUNT"
           COUNT=$(( $COUNT + 1 ))
         done
       done
       } < "$FILE"

    (OK, 3 quibbles: I don't like using ksh-specific code when
    there is no reason to do so.)
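
    One thing to check when choosing the number of ? characters:
    arithmetic expansion treats a constant with a leading 0 as octal, so
    a four-character suffix such as 0256 from the sample data would be
    misread, and one such as 0298 is rejected by most shells. A sketch of
    one workaround, assuming a helper (strip_zeros, not part of the posts
    above) that removes leading zeros before the arithmetic, with printf
    padding the result back to the original width:

```sh
# strip_zeros N: print N without leading zeros (a lone 0 is kept).
strip_zeros() {
    n=$1
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    printf '%s\n' "$n"
}

SUFFIX=0256                          # what ${COUNT#$PREFIX} would yield
COUNT=$(strip_zeros "$SUFFIX")       # 256, safe to use in $(( ))
COUNT=$(( COUNT + 1 ))
printf '%04d\n' "$COUNT"             # prints 0257
```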

--
    Chris F.A. Johnson                        http://cfaj.freeshell.org
    ===================================================================
    My code (if any) in this post is copyright 2002, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License

 
 
 

Post by Stephane CHAZELA » Sat, 28 Sep 2002 03:41:03



> I have a data file of 19- or 20-digit numbers in sequence, and I need to
> check which numbers are missing from that sequence.
> Ex:
> 89014102069901080256
> 89014102069901080264

[...]

Another solution:

sed -ne '1p;$s/.*/sa&[dlap1+dsa<f]dsfx/p' file|dc|comm -23 - file

where "file" is the file that contains the numbers.

--
Stéphane

 
 
 

Post by Stephane CHAZELA » Sat, 28 Sep 2002 04:11:13


[...]

> sed -ne '1p;$s/.*/sa&[dlap1+dsa<f]dsfx/p' file|dc|comm -23 - file

Just to explain:

<< sed -ne '1p;$s/.*/sa&[dlap1+dsa<f]dsfx/p' file|dc >>
returns a list of numbers from <first> to <last> - 1

comm -23 - file
compares that list with "file", and returns the lines that are
only in that list.

sed -ne '1p;$s/.*/sa&[dlap1+dsa<f]dsfx/p' file

returns
<first> sa <last> [d la p 1+ d sa <f]d sf x

That must be read as a dc script

<first> sa: stores <first> in register "a"
[d la p 1+ d sa <f]d sf x: stores the "d la p 1+ d sa <f" dc
script in register "f" and executes it.

d la p 1+ d sa <f:

d: duplicates <last> on the dc stack
lap1+dsa: prints register "a" (p), increments it, and leaves the
result on top of the stack

<f: if "a" < <last>, executes the script stored in register f.

So, written another way, that dc script means:

a=<first>
do {
  print a
  a++
} while (a < <last>)
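
The whole pipeline can be wrapped up and tried on a small file. This is
only a sketch: the function name missing_dc and the temporary file are
illustrative, and comm expects both inputs to be sorted, which holds here
as long as all the numbers have the same number of digits.

```sh
# missing_dc FILE: list the numbers between the first and last lines of
# FILE that do not appear in FILE.
missing_dc() {
    # sed emits "<first>" and "sa<last>[dlap1+dsa<f]dsfx";
    # dc runs that to print <first> .. <last>-1, one per line;
    # comm -23 keeps the lines found only in dc's output.
    sed -ne '1p;$s/.*/sa&[dlap1+dsa<f]dsfx/p' "$1" | dc | comm -23 - "$1"
}

printf '%s\n' 10 12 14 > /tmp/seq.$$
missing_dc /tmp/seq.$$               # prints 11 and 13
rm -f /tmp/seq.$$
```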

--
Stéphane
 
 
 

Post by Stephane CHAZELA » Sat, 28 Sep 2002 04:11:14


[...]

> sed -ne '1{h;d;}
>   x;:2
>   s/$/,,0123456789,0/;s/^,/0,/
>   s/\(.\),\([^,]*\).*\1\(,*.\).*/\3\2/;/,/b2
>   ;G;/^\(.*\)\n\1$/!{s/\n.*//;p;b2
>   }'

To explain a bit:
1 # for first line only of file,
{h;d;} # store the line into hold space and skip to line 2
       # The hold space will hold an incremented value starting
       # with the content of line1 (that we'll call "counter")

# now, for every line:
x # swap hold-space and pattern-space

:2 # this is the start of our "counter++" procedure
s/$/,,0123456789,0/ # append ",,0123456789,0" to the
                    # pattern-space (the counter)

s/^,/0,/   # if pattern-space starts with "," prepend 0

s/\(.\),\([^,]*\).*\1\(,*.\).*/\3\2/ # we increment the
                                     # digit that is before
                                     # the first comma; if that
                                     # digit was 9, then we end
                                     # up with ",0"

/,/b2 # while there's a "," in the pattern space, we branch to 2

G # append the hold space to the pattern space

/^\(.*\)\n\1$/!{ # if the pattern space does not (!) consist of a
                 # sequence of chars followed by \n followed by
                 # that very same sequence of chars (i.e. if the
                 # current line is not equal to our counter),
                 # then:

s/\n.*// # retrieve our counter

p  # print it

b2 # branch to our counter++ procedure

}'
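
The same digit-by-digit increment can be written in plain shell, which may
be easier to follow than the sed version. This is a sketch only, with
made-up names (incr_dec, out, last); it handles the number as text, so its
length does not matter.

```sh
# incr_dec N: print N + 1, treating N as a string of decimal digits.
incr_dec() {
    n=$1
    out=
    while [ -n "$n" ]; do
        last=${n#"${n%?}"}      # last digit of what is left
        n=${n%?}                # drop that digit
        if [ "$last" = 9 ]; then
            out=0$out           # 9 becomes 0 and the carry moves left
        else
            printf '%s\n' "$n$(( last + 1 ))$out"
            return
        fi
    done
    printf '%s\n' "1$out"       # every digit was 9: prepend the carry
}

incr_dec 89014102069901080259   # prints 89014102069901080260
incr_dec 999                    # prints 1000
```

As in the sed script, a 9 is turned into 0 and the loop moves one digit to
the left; the first digit that is not 9 is bumped and the loop stops.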

--
Stéphane
 
 
 

Post by mats.blomstr.. » Sat, 28 Sep 2002 22:22:34



> Another solution:
> sed -ne '1p;$s/.*/sa&[dlap1+dsa<f]dsfx/p' file|dc|comm -23 - file
> where "file" is the file that contains the numbers.

That is a quick one. Impressive!
I didn't know about 'comm'; it's brand new to me.
I've looked into 'bc' and 'dc' and gave up when I realized that they
can't do modulus ( '%' ) correctly if 'scale' > 0.
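
For reference, bc defines a % b as a - (a/b)*b with the division carried
out to 'scale' digits, so the familiar integer remainder only appears when
'scale' is 0 (the default unless bc is started with -l). A small sketch of
the difference, with remainder as an illustrative name:

```sh
# remainder A B: integer remainder of A / B, forcing scale=0 so that
# bc truncates the quotient before multiplying back.
remainder() {
    echo "scale=0; $1 % $2" | bc
}

remainder 17 5                      # prints 2
echo 'scale=2; 17 % 5' | bc         # not 2: 17/5 is taken as 3.40 first
```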

Speed comparison on sample data from OP (original poster)

Mine                            Yours
----                            -----
real    0m2.300s                real    0m0.020s
user    0m1.260s                user    0m0.000s
sys     0m0.990s                sys     0m0.000s

115 times faster!
//Mats

 
 
 

Post by mats.blomstr.. » Sat, 28 Sep 2002 22:37:05



> 115 times faster!

I've modified my script to be a little faster, but it's still
40 times slower than Stephane's solution :(
//Mats

bash $ cat script
#!/bin/bash
#
NEXT=`head -1 "$1"`
cat "$1" |while read VAR ;do
  while [ "$NEXT" != "$VAR" ] ;do
    NEXTANDONE="`echo ${NEXT}+1 |bc -q`"
    if [ "$NEXTANDONE" != "$VAR" ] ;then
      echo "$NEXTANDONE"
    fi
    NEXT="$NEXTANDONE"
  done
  NEXT="$VAR"
done
bash $

 
 
 

Post by John W. Krah » Sun, 29 Sep 2002 07:48:49




> > Another solution:
> > sed -ne '1p;$s/.*/sa&[dlap1+dsa<f]dsfx/p' file|dc|comm -23 - file
> > where "file" is the file that contains the numbers.

> That is a quick one. Impressive!
> I didn't know about 'comm'; it's brand new to me.
> I've looked into 'bc' and 'dc' and gave up when I realized that they
> can't do modulus ( '%' ) correctly if 'scale' > 0.

> Speed comparison on sample data from OP (original poster)

> Mine                            Yours
> ----                            -----
> real    0m2.300s                real    0m0.020s
> user    0m1.260s                user    0m0.000s
> sys     0m0.990s                sys     0m0.000s

> 115 times faster!

If you want speed this is about twice as fast (on my system) as the
sed/dc/comm version:

perl -nle'BEGIN{chomp($a=<>);$a++}print$a++ while $a ne $_;$a=$_;$a++' file
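
This appears to rely on Perl's string increment: when ++ is applied to a
variable that holds only digits and has only been used as a string, Perl
increments the text itself instead of converting it to a machine number,
and the comparison uses ne (a string test). A usage sketch from the shell,
with missing_perl and the temporary file as illustrative names:

```sh
# missing_perl FILE: run the one-liner on FILE.
missing_perl() {
    perl -nle 'BEGIN{chomp($a=<>);$a++} print $a++ while $a ne $_; $a=$_; $a++' "$1"
}

printf '%s\n' 89014102069901080298 89014102069901080301 > /tmp/seq.$$
missing_perl /tmp/seq.$$    # prints 89014102069901080299 and 89014102069901080300
rm -f /tmp/seq.$$
```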

John
--
use Perl;
program
fulfillment

 
 
 

Post by mats.blomstr.. » Tue, 01 Oct 2002 20:14:50



> If you want speed this is about twice as fast (on my system) as the
> sed/dc/comm version:

It's four times faster on my machine.

  real    0m0.005s
  user    0m0.010s
  sys     0m0.000s

But for me it will not be better in this case. You will have to add
the time I need to learn how to handle Perl :)
//Mats