two unix questions: split and inserting text in each file.

Post by shre » Sat, 05 Oct 2002 12:38:58



Dear Unix gurus,

I have to solve the following type of problem, and I'm hoping it can
be solved more efficiently than my own crude ways of solving it.
Basically I have lists on the order of 20,000 records against which I
need to run a SQL query involving a 'WHERE' clause. My files/environment
are on a Sun (OS 5.6) unix box.
The SQL 'WHERE' clause can only accept 1000 records at a time - hence, I
need to split the list into individual files of 1000 records each.

Here is a sample input file - it's a list, so each record is separated
by a newline character and there are no duplicates in the list.

InputFile
---------
100789A10
100789A11
100789A12
100789A13
...
...
887648A55
887648A56
887648A57

When I use the split command as shown below
split -1000 InputFile
The resulting files get named as
xaa
xab
xac
..
..
xgh
Question 1: how can I make the starting and ending line numbers appear
in the names of the resulting output files? For example,
if my input file is named InputFile and has 20000 records, I would like
the resulting output files named as
InputFile_00000-01000
InputFile_01001-02000
..
..
InputFile_19001-20000

Upon running the split command, I now have 20 output files, each with
1000 records. I need to insert some text into each output file so as to
convert it to a .sql file. When I run the split command, the
resulting xaa, xab, etc. files will all be in the same directory.
Question 2: can anyone show me how to do this? An illustration is
below.

xaa (sample file)
100789A10
100789A11
100789A12
100789A13
...
...
887648A55
887648A56
887648A57

xaa.sql (resulting .sql file)
SELECT PART_NUMBER '|' DESCRIPTION FROM PARTS_TABLE where PART_NUMBER
IN (
'100789A10',
'100789A11',
'100789A12',
'100789A13',
'...      ',
'...      ',
'887648A55',
'887648A56',
'887648A57'
);

Thank you,
Sri

 
 
 

Post by Bruce Burhan » Sat, 05 Oct 2002 17:43:45



> Dear Unix gurus,

> I have to solve the following types of problem - which I'm hoping can
> be solved more efficiently than my crude and inefficient ways of
> solving. Basically I have lists in the order 20,000 records which I
> need to run a SQL query involving 'WHERE' clause. My files/environment
> is a Sun (OS 5.6) unix box.
> SQL 'WHERE' clause can only accept 1000 records at a time - hence, I
> need to split the list in individual files of 1000 records each.

> Here is a sample input file - its a list, so each record is separated
> by a new line character and there are no duplicates in the list.

> InputFile
> ---------
> 100789A10
> 100789A11
> 100789A12
> 100789A13
> ...
> ...
> 887648A55
> 887648A56
> 887648A57

> When I use the split command as shown below
> split -1000 InputFile
> The resulting files get named as
> xaa
> xab
> xac
> ..
> ..
> xgh
> question 1: how can I make the starting and ending line numbers appear
> in the resulting output files. For example
> if my Input file name is InputFile and has 20000 records, I would like
> the resulting Output files named as
> InputFile_00000-01000
> InputFile_01001-02000
> ..
> ..
> InputFile_19001-20000

> Upon running the split command, I now have 20 output files - each has
> 1000 records. I need to insert some text in each output file so to
> convert them to .sql files. When I run the split command, the
> resulting xaa, xab..etc files will obviously be in the same directory?
> question - can anyone show me how to do this. An illustration is
> below.

> xaa (sample file)
> 100789A10
> 100789A11
> 100789A12
> 100789A13
> ...
> ...
> 887648A55
> 887648A56
> 887648A57

-------------------------------------------------------------------
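#!/bin/bash

cd /dir || exit 1               ## `ls /dir` prints bare names, so work inside /dir
for file in `ls /dir ` ; do     ## notice backticks
 head -1 "$file"  > /tmp/D$$    ## first record of the chunk
 tail -1 "$file" >> /tmp/D$$    ## last record of the chunk
 sed 'H;$!d;${g;s/\n/-/g;s/ //;}' /tmp/D$$ > /tmp/E$$   ## join as -first-last
k=`cat /tmp/E$$`    ## notice backticks
mv "$file" "inputfile$k"
done
rm -f /tmp/D$$ /tmp/E$$         ## remove the scratch files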
______________________________________

This will yield

inputfile-100789A10-100789A13,   etc.

Almost what you want. Replace /dir with the full path to the directory with
xaa etc. in it (and NOTHING else).

Do chmod +rx on the script and put it in your $PATH.

Should work for you. May tackle the second half
tomorrow.

Bruce<+>
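
For the line-number naming asked about in question 1, one option is to let split do the cutting and then rename each piece from a running line count. What follows is only a minimal sketch, assuming a POSIX shell (ksh or bash rather than the old Bourne sh, because of $(( ))) and a split that accepts -l. The names rename_chunks and the ".part." prefix are made up for illustration and do not come from any post in this thread.

```shell
#!/bin/sh
# Sketch: split a list into chunks and name each chunk after its line range.
# rename_chunks is a hypothetical helper name.

rename_chunks() {
    input=$1                 # file to split, e.g. InputFile
    size=$2                  # lines per chunk, e.g. 1000
    prefix="$input.part."    # temporary prefix for split's own output names

    split -l "$size" "$input" "$prefix" || return 1

    first=1
    for chunk in "$prefix"*; do          # the glob expands in aa, ab, ac, ... order
        count=$(wc -l < "$chunk")        # number of lines in this chunk
        last=$((first + count - 1))
        mv "$chunk" "$(printf '%s_%05d-%05d' "$input" "$first" "$last")"
        first=$((last + 1))
    done
}

# Usage: rename_chunks InputFile 1000
```

With a 20000-line InputFile this should leave InputFile_00001-01000 through InputFile_19001-20000 in the current directory. The first range starts at 00001 rather than 00000 so that every file name describes exactly the lines it holds.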

> xaa.sql (resulting .sql file)
> SELECT PART_NUMBER '|' DESCRIPTION FROM PARTS_TABLE where PART_NUMBER
> IN (
> '100789A10',
> '100789A11',
> '100789A12',
> '100789A13',
> '...      ',
> '...      ',
> '887648A55',
> '887648A56',
> '887648A57'
> );

> Thank you,
> Sri



 
 
 

Post by Peter J. Ackl » Sat, 05 Oct 2002 20:40:47



> for file in `ls /dir ` ; do     ## notice backticks

Why not just

   for file in /dir/*; do

>  [...]
>   k=`cat /tmp/E$$`    ## notice backticks

Why not just

    k=$(</tmp/E$$)

Peter

--
No electrons used in the production of this message were harmed or
mistreated in any manner.

 
 
 

Post by Doug Mill » Sat, 05 Oct 2002 22:22:09



>Dear Unix gurus,

>I have to solve the following types of problem - which I'm hoping can
>be solved more efficiently than my crude and inefficient ways of
>solving. Basically I have lists in the order 20,000 records which I
>need to run a SQL query involving 'WHERE' clause. My files/environment
>is a Sun (OS 5.6) unix box.
>SQL 'WHERE' clause can only accept 1000 records at a time - hence, I
>need to split the list in individual files of 1000 records each.

[major snippage]

>SELECT PART_NUMBER '|' DESCRIPTION FROM PARTS_TABLE where PART_NUMBER
>IN (
>'100789A10',
>'100789A11',
>'100789A12',
>'100789A13',
>'...      ',
>'...      ',
>'887648A55',
>'887648A56',
>'887648A57'
>);

A *far* better solution to this problem is to create a second SQL table (call
it parts_list, for example), load it with the list of part numbers you're
interested in, and rewrite your SQL query along these lines:

SELECT parts_list.part_number '|' parts_table.description
FROM parts_list JOIN parts_table
ON (parts_list.part_number = parts_table.part_number)
;
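
Loading the list into such a table could itself be scripted from the shell. The sketch below is one hedged possibility, not necessarily how the poster would do it: it turns each part number into an INSERT statement. It assumes a POSIX awk (nawk on Solaris, since it uses -v) and a table parts_list with a single part_number column, as named in the query above; make_inserts is a hypothetical helper name. A database's own bulk loader would likely be faster for large lists.

```shell
#!/bin/sh
# Sketch: emit one INSERT per part number read from files or stdin.
# make_inserts is a hypothetical helper name; parts_list/part_number are
# the table and column names used in the query above.

make_inserts() {
    awk -v q="'" '
        NF {                          # skip blank lines
            v = $1
            gsub(q, q q, v)           # double any single quote inside the value
            printf "INSERT INTO parts_list (part_number) VALUES (%s%s%s);\n", q, v, q
        }
    ' "$@"
}

# Usage: make_inserts InputFile > load_parts_list.sql
```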

Regards,
        Doug Miller
--
Real email address is alphageek /at/ milmac /dot/ com

.. Ted Kennedy's car has killed more people than my gun.

 
 
 

Post by F.T » Sun, 06 Oct 2002 03:15:05



> question 1: how can I make the starting and ending line numbers appear
> in the resulting output files. For example
> if my Input file name is InputFile and has 20000 records, I would like
> the resulting Output files named as
> InputFile_00000-01000
> InputFile_01001-02000
> ..
> ..
> InputFile_19001-20000

Sri,

Though it might not be exactly what you are looking for, here's a
little _perl_ script. It reads the files named on the command line

$ splt file1 file2 ...

and writes the pieces into the same directory as file1, file2 etc. as

file1_00001-01000
file1_01001-02000
...
file2_00001-01000
file2_01001-02000
...

each file containing 1000 lines. One drawback, however, is that
the last line of each input file has to be a blank one or else it
will be lost. Perhaps someone else can work around that one, as
my perl skills are rather limited (for the record: this is one
of my first perl scripts ever, uh, so comments are welcome :-)).

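#! /usr/bin/perl      # modify the path to suit your needs

# Lines marked "reconstructed" were lost from the archived post and are
# filled in from context; the logic is otherwise as posted.
foreach $filename (@ARGV) {    # reconstructed
    open FILE, $filename or die "can't open file: $!";
    @lines=<FILE>;             # reconstructed: read the whole file
    close FILE;                # reconstructed

    $max=$#lines;   # index of the last line, i.e. one less than the line count
    $init=0;
    $step=1000; # modify to change the nr. of lines in each file

    while ($step<=$max) {
        &generate_output;
        &write_to_file;
        $max -= $step;
        $init += $step;
    }

    # whatever is left over goes into a final, shorter file
    $step=$max;
    &generate_output;
    &write_to_file;
}

sub generate_output {
    for ($i=$init; $i<$init+$step; $i++) {
        $line_nr=$i+1;
        $out[$line_nr]=$lines[$i];
    }
}

sub write_to_file {
    $begin=(sprintf "%05d", $init+1);
    $end=(sprintf "%05d", $line_nr);
    open OUT, "> ${filename}_$begin-$end";
    print OUT @out;            # reconstructed
    @out=();                   # reconstructed: start the next chunk empty
    close OUT;
}
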
> Upon running the split command, I now have 20 output files - each has
> 1000 records. I need to insert some text in each output file so to
> convert them to .sql files. When I run the split command, the
> resulting xaa, xab..etc files will obviously be in the same directory?
> question - can anyone show me how to do this.

Not sure if I understood correctly, but you could try that one:

---sqlify.awk---
BEGIN{print "SELECT PART_NUMBER '|' DESCRIPTION FROM PARTS_TABLE where PART_NUMBER\nIN ("}
{print "'" $1"',"}
END{print ");"}
---

$ awk -f sqlify.awk file1_00001-01000

There is, however, a comma on the last line that should not be
there :-(

...
'887648A56',
'887648A57', <--
);

Anyway, HTH

Florian

 
 
 

Post by John W. Krah » Sun, 06 Oct 2002 08:01:35




> > question 1: how can I make the starting and ending line numbers appear
> > in the resulting output files. For example
> > if my Input file name is InputFile and has 20000 records, I would like
> > the resulting Output files named as
> > InputFile_00000-01000
> > InputFile_01001-02000
> > ..
> > ..
> > InputFile_19001-20000

> [Florian's description and script snipped]

Here's a smaller one that works correctly.

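#!/usr/bin/perl -w
use strict;

my $temp  = "temp-$$";      # scratch file for the chunk being written
my $first = 1;              # line number of the first record in the chunk
open OUT, ">$temp" or die "Cannot open $temp: $!";
while ( <> ) {
    print OUT;
    if ( $. % 1000 == 0 or eof ) {
        # chunk is full or the input file has ended: name it after its line range
        rename $temp, sprintf( "${ARGV}_%05d-%05d", $first, $. )
            or die "Cannot rename $temp: $!";
        open OUT, ">$temp" or die "Cannot open $temp: $!";
        close ARGV if eof;  # resets $. so numbering restarts for each input file
        $first = $. + 1;
        }
    }
unlink $temp or die "Cannot unlink $temp: $!";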

__END__

John
--
use Perl;
program
fulfillment

 
 
 

Post by Chris F.A. Johnso » Sun, 06 Oct 2002 08:09:54




>>   k=`cat /tmp/E$$`    ## notice backticks

> Why not just

>     k=$(</tmp/E$$)

    Because it's not portable?
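
To make the portability point concrete, here is a small sketch of forms that do not depend on the $(<file) shorthand. It assumes a POSIX shell for the second and third forms; only the backtick form is expected to work in the old Bourne sh. The scratch file name and its sample contents are placeholders chosen to mirror the script discussed above.

```shell
#!/bin/sh
# Sketch: three ways to get a one-line file into a variable without $(<file).
tmp=${TMPDIR:-/tmp}/E$$                       # placeholder scratch file
printf '%s\n' '-100789A10-100789A13' > "$tmp"

k1=`cat "$tmp"`            # backticks: understood by the old Bourne sh as well
k2=$(cat "$tmp")           # POSIX spelling of the same command substitution
IFS= read -r k3 < "$tmp"   # shell builtin, no cat process; reads the first line only

rm -f "$tmp"
printf '%s\n' "$k1" "$k2" "$k3"
```

All three variables end up holding the same string here because the file has a single line. For a multi-line file the read form would keep only the first line, while both command substitutions would keep everything except trailing newlines.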

--
    Chris F.A. Johnson                        http://cfaj.freeshell.org
    ===================================================================
    My code (if any) in this post is copyright 2002, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License

 
 
 

Post by shre » Sun, 06 Oct 2002 12:29:48


Dear Everyone,

Thanks to everyone for replying back. I'm grateful and hope someday
can be of service to you. Special thanks to Florian and John
Krahn..both of their proposed perl solutions for question 1 worked
perfectly.

With reference to my 2nd question, which involved taking an input file
and transforming it into a .sql file by adding SQL-related lines to
it: to answer Florian's question, no, there is not an error in the
second-to-last line of my post.

...
'887648A56',
'887648A57' <-- note no comma
);

Florian's awk solution did exactly what the script was intended to do. It
does add the last comma, and removing it might be tricky. I guess I will
manually remove them after running the awk script.
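
The trailing comma can probably be avoided without hand editing by printing each comma before the next entry instead of after the current one. This is a minimal sketch of that idea, assuming a POSIX awk (nawk on Solaris, since it uses -v); sqlify is a hypothetical function name, and the SELECT text is copied unchanged from the posts above.

```shell
#!/bin/sh
# Sketch: wrap a list of part numbers in the query, with no comma after the last one.
# sqlify is a hypothetical helper name.

sqlify() {
    awk -v q="'" '
        BEGIN {
            print "SELECT PART_NUMBER " q "|" q " DESCRIPTION FROM PARTS_TABLE where PART_NUMBER"
            print "IN ("
        }
        NF == 0 { next }               # ignore blank lines
        n++     { printf ",\n" }       # from the 2nd entry on: close the previous line with a comma
                { printf "%s%s%s", q, $1, q }
        END     { if (n) printf "\n"; print ");" }
    ' "$@"
}

# Usage, one .sql file per chunk:
#   for f in InputFile_*; do sqlify "$f" > "$f.sql"; done
```

Because the comma is emitted only once another entry is known to follow, the final entry is followed directly by the closing parenthesis line.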

Sri

 
 
 

Post by Peter J. Ackl » Sun, 06 Oct 2002 17:58:45





> :
> : > Why not just
> : >
> : >     k=$(</tmp/E$$)
> :
> :     Because it's not portable?

> But this is comp.unix.ksh!  Can't you tell by the way that
> everyone recommends ksh-specific solutions to every problem?

ksh-specific?  That syntax works with ksh, bash, and zsh.

Peter

--
"If I haven't seen further it is because giants have
been standing in my way."     -- Peter J. Acklam

 
 
 

Post by Peter J. Ackl » Sun, 06 Oct 2002 17:58:54





>>>   k=`cat /tmp/E$$`    ## notice backticks

>> Why not just

>>     k=$(</tmp/E$$)

>     Because it's not portable?

What version of bash does not support it?

Peter

--
"If I haven't seen further it is because giants have
been standing in my way."     -- Peter J. Acklam

 
 
 

Post by Peter J. Ackl » Mon, 07 Oct 2002 03:20:22





> : >
> : > But this is comp.unix.ksh!  Can't you tell by the way that
> : > everyone recommends ksh-specific solutions to every problem?
> :
> : ksh-specific?  That syntax works with ksh, bash, and zsh.

> Bash and zsh are both ksh clones.  I know bash didn't start out
> that way, but it is now.

So what?  If people really want a "pure" Bourne shell script, then
they should say so.  This is a group for all UNIX shells.  Ask a
general question and you get a general answer.  It's not like ksh,
bash, and zsh are rare, quite the contrary, and specifically OP's
operating environment Solaris 5.6 has ksh.

Peter

--
"If I haven't seen further it is because giants have
been standing in my way."     -- Peter J. Acklam

 
 
 

Post by Peter J. Ackl » Mon, 07 Oct 2002 03:27:47





> :

> : >
> : >> Why not just
> : >>
> : >>     k=$(</tmp/E$$)
> : >
> : >     Because it's not portable?
> :
> : What version of bash does not support it?

> Any version before 2.0.

> Besides, the original poster did NOT ask for a bash solution.

The original poster didn't ask for ANY particular solution...

Anyway, Bruce Burhans suggested a bash solution, and I commented
it as such.

Peter

--
"If I haven't seen further it is because giants have
been standing in my way."     -- Peter J. Acklam

 
 
 

Post by Paul D. Smit » Tue, 08 Oct 2002 05:45:25



  db> Bash and zsh are both ksh clones.  I know bash didn't start out
  db> that way, but it is now.

I don't know what would prompt you to say that.  Bash and ksh support
the same functionality insofar as both of them attempt to conform to the
POSIX definition of the UNIX shell.

However, bash has many features that ksh doesn't provide, and there are
some features of ksh that bash doesn't provide.

--
-------------------------------------------------------------------------------

 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
-------------------------------------------------------------------------------
   These are my opinions---Nortel Networks takes no responsibility for them.

 
 
 

Post by Doug Mill » Wed, 09 Oct 2002 02:56:09






>> :

>> : >
>> : >> Why not just
>> : >>
>> : >>     k=$(</tmp/E$$)
>> : >
>> : >     Because it's not portable?
>> :
>> : What version of bash does not support it?

>> Any version before 2.0.

>> Besides, the original poster did NOT ask for a bash solution.

>The original poster didn't ask for ANY particular solution...

>Anyway, Bruce Burhans suggested a bash solution, and I commented
>it as such.

And I'm sitting here wondering why there's *any* discussion of which shell to
use for the solution, when the problem is so clearly one of deficient design
of the SQL query being used by the OP. The proper solution is to use SQL in
the way it's intended to be used, and then shell scripting ceases to be an
issue.

Regards,
        Doug Miller
--
Real email address is alphageek /at/ milmac /dot/ com

.. Ted Kennedy's car has killed more people than my gun.

 
 
 

Post by Chris F.A. Johnso » Wed, 09 Oct 2002 04:41:12



> And I'm sitting here wondering why there's *any* discussion of which shell to
> use for the solution, when the problem is so clearly one of deficient design
> of the SQL query being used by the OP. The proper solution is to use SQL in
> the way it's intended to be used, and then shell scripting ceases to be an
> issue.

    Because this newsgroup is comp.unix.shell.

    As far as this group is concerned, the problem consisted of
    manipulating a number of files; for all we care, they could have
    been produced by monkeys.

    We don't know whether the OP even has access to SQL, or just
    received a number of files.

--
    Chris F.A. Johnson                        http://cfaj.freeshell.org
    ===================================================================
    My code (if any) in this post is copyright 2002, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License

 
 
 
