Filtering Variable length files

Post by stuar » Sun, 07 Jul 1996 04:00:00



Does anyone know how to filter a file with variable-length records
so that records of one specified length go to one file
and the others are trashed?

thanks

-reality is an illusion created by * deficiency-

 
 
 

Filtering Variable length files

Post by Harrison Berger » Mon, 08 Jul 1996 04:00:00


 >Does any one know how to filter a file with variable length records
 >so that records of the same specified length go to one file
 >and the others are trashed
 >
 >
 >thanks
 >

        Are they ASCII files? What separates the records?
        What separates the fields in a record? How many
        fields are in a record? Is the number of fields per record
        different for the saved and trashed records, or is the
        choice based solely on the length of a record?

        A solution for an ASCII file which depends only
        on line length (here, lines longer than somenumber;
        use == instead of > for an exact length) would be:

awk 'length >somenumber {print}' filein>fileout
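
Since the original question asks for records of one specific length, an exact-length variant may be closer to what was wanted. The following is only a sketch under assumptions: records are newline-terminated lines, and the function name filter_by_length and the variable n are made up for illustration. Lines that do not match are simply not printed, which "trashes" them.

```shell
# Keep only the lines whose length is exactly n characters.
# filter_by_length and n are hypothetical names; input comes from stdin.
filter_by_length() {
    awk -v n="$1" 'length($0) == n'
}

# Example: keep only the 5-character records (prints hello and world).
printf 'abc\nhello\nworld\ntoolong\n' | filter_by_length 5
```

Passing the length with -v keeps the awk program itself fixed, so the same command can be reused as filter_by_length 80 < filein > fileout.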

or, for an ASCII file filtered by the number of fields:

awk 'NF >somenumber {print}' filein>fileout

or, for an ASCII file filtered by the number of fields, with the
3rd field longer than 3 characters:

awk 'NF >somenumber && (length($3)>3) {print}' filein>fileout
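
The field-based filters can be tried on a few made-up lines before running them on a real file. This sketch assumes whitespace-separated fields (awk's default); the function name filter_by_fields and the variable min are hypothetical.

```shell
# Keep lines with more than min fields whose 3rd field is longer than 3 characters.
# filter_by_fields and min are hypothetical names; fields are whitespace-separated.
filter_by_fields() {
    awk -v min="$1" 'NF > min && length($3) > 3'
}

# Example: only lines with at least 4 fields and a long 3rd field survive.
printf 'a b c\na b long d\na b abc d\nw x yzzy z q\n' | filter_by_fields 3
```

A pattern with no action prints the matching line, so the explicit {print} is optional.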

This could go on ad infinitum; please supply more information.

--

    Steinberger:
State of the Instrument

 
 
 

1. what is fastest way to reformat file from variable to fixed length

We need to convert a file from a variable-length delimited format to a
fixed-length format.

Speed is the biggest concern since we have close to 100 gigs of files we want
to change.

For example, we have a 24 gig file in 12 pieces.  Currently the file is
variable length, delimited by | with each field enclosed in ^.
Sample:
^717764002^|^71776401.^|^2000-09-11-19.23.00.000000^|^2000-05-25^
^300102^|^30011.^|^2000-06-28-19.57.29.670634^|^2000-05-30^

We need to convert this to a fixed-length format, with each field taking a
fixed width.  In addition, the 3rd field needs to be broken into 2 pieces.
Sample:
  717764002   71776401.    2000-09-11   -19.23.00.000000        2000-05-25
       300102         30011.   2000-06-28   -19.57.29.670634        2000-05-30

I know that this can be done with various tools (C, awk, Perl).  A couple of
questions:

1 - Any opinions or experience on what would be the very fastest method?

2 - Is there any other Unix utility that would be able to do this?

The hardware is a Sun ES10k with 8 CPUs, 800 gigs of disk over 36 disks, and
6 gigs of RAM.  We are the only users on the machine.

2. ServerAlias? by another newbie

3. tcsh filter to evaluate variables in a text file

4. CD Rom Troubles

5. Filters for Apache2 - removing Content-length

6. LINUX FOR MCA/SCSI

7. Filters, Filters, where are you Filters...

8. Possible to mix SuExec and CGI ScriptAlias?

9. Compile error : variable length argument processing

10. Variable length recs: Mainframe -> Unix - Help!!!

11. Q: any way to expand variable length in csh beyond 1024?

12. variable length subnet mask

13. Reading tapes in variable block length mode with UnixWare
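
For the delimited-to-fixed-length conversion described in thread 1 of the list above, an awk sketch in the style of the earlier one-liners might look like the following. It makes no claim about speed. The column widths (12, 12, 13, 19, 18), the right alignment, and the function name to_fixed_width are assumptions, since the post does not state the target layout; the split of the 3rd field after its 10th character follows the posted sample.

```shell
# Convert ^field^|^field^ records on stdin to fixed-width columns.
# to_fixed_width and the widths 12/12/13/19/18 are illustrative assumptions.
to_fixed_width() {
    awk -F'|' '{
        gsub(/\^/, "")            # drop the ^ quoting; awk then re-splits $0 on |
        date = substr($3, 1, 10)  # e.g. 2000-09-11
        time = substr($3, 11)     # e.g. -19.23.00.000000 (hyphen kept, as in the sample)
        printf "%12s%12s%13s%19s%18s\n", $1, $2, date, time, $4
    }'
}

# Example using one record from the post.
printf '%s\n' '^300102^|^30011.^|^2000-06-28-19.57.29.670634^|^2000-05-30^' | to_fixed_width
```

Because each input piece is independent, the 12 pieces could be run through the same filter in parallel, one process per file.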