large file and dilemma

Post by Kira » Thu, 05 Mar 1998 04:00:00



I have a very large file, our web server access_log, which is around 320 MB,
and another one which is around 290 MB.  I need to erase the first
few (around 1 million) lines from the top of the file.  I tried using vi,
but it gave me an error message and crashed.  I tried a few other
editors, like pico and jot, and they did the same thing.  I was wondering if
anyone has any suggestions on how to edit these large files.

Thanks

Kiran


large file and dilemma

Post by Arthur Hag » Thu, 05 Mar 1998 04:00:00



Quote:

> I have a very large file, our web server access_log, which is around 320 MB,
> and another one which is around 290 MB.  I need to erase the first
> few (around 1 million) lines from the top of the file.  I tried using vi,
> but it gave me an error message and crashed.  I tried a few other
> editors, like pico and jot, and they did the same thing.  I was wondering if
> anyone has any suggestions on how to edit these large files.

This will keep just the last 100000 lines of the file:

last -100000 filename >/path/to/new_file
mv /path/to/new_file filename

Regards,
--
*Art


large file and dilemma

Post by Walter Robers » Fri, 06 Mar 1998 04:00:00




:> I have a very large file, our web server access_log, which is around 320 MB,
:> and another one which is around 290 MB.  I need to erase the first
:> few (around 1 million) lines from the top of the file.  I tried using vi,
:> but it gave me an error message and crashed.  I tried a few other
:> editors, like pico and jot, and they did the same thing.  I was wondering if
:> anyone has any suggestions on how to edit these large files.
:
:This will keep just the last 100000 lines of the file:
:
:last -100000 filename >/path/to/new_file
:mv /path/to/new_file filename

I would modify that slightly to

last -100000 filename >/path/to/new_file
cat /path/to/new_file > filename
/bin/rm /path/to/new_file

as this will preserve the ownership and permissions, where Art's version
would not.
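Why the cat-back step preserves ownership can be seen in a small sketch (file names and sizes here are illustrative, not the real access_log): overwriting with `cat ... > file` reuses the existing inode, whereas `mv` would replace the directory entry with a new file owned by whoever ran the command.

```shell
# Illustrative demo: trim via a temp file, then cat back in place.
printf 'a\nb\nc\n' > file
ino_before=$(ls -i file | awk '{print $1}')
tail -n 2 file > file.tmp        # keep the last 2 lines
cat file.tmp > file              # overwrite in place: same inode, owner, perms
rm file.tmp
ino_after=$(ls -i file | awk '{print $1}')
[ "$ino_before" = "$ino_after" ] && echo "same inode"
```

The same reasoning is why `cp file.tmp file` would also work, while `mv file.tmp file` would not.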

However, I would also worry about locking. If there is any chance that
the server is going to be writing new records while you are working with
the file, then you must take a different approach in order to not
lose the newest records. You must either stop the server, or else convince
it to start writing to a new log file on the fly.

Unfortunately the NCSA 1.4 server doesn't offer a way to change the
log file on the fly, so I usually stop the server. [I haven't had time
to install the 1.5 server.] I haven't ever looked to see whether the
Netscape servers can handle starting a new file.

A number of applications will close and re-open their log file if you
send them a HUP signal. If your WWW server will do that, then you can [e.g.]

mv filename /path/to/new_file/on/same/filesystem
killall -HUP server_name

This works because when you 'mv' on the same filesystem, only the
directory entry changes, and open connections are preserved.
The process will continue to write to the now-renamed file until
it is told otherwise. Then when it gets the HUP signal, it will close
that file and open 'filename' again and write to that... but that's
now a different file than before.
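That rename behaviour can be demonstrated with plain shell redirection (hypothetical file names): a descriptor opened before the mv keeps writing to the renamed file, just as the server process would.

```shell
# Demo: an open file descriptor follows the inode across a rename.
printf 'line1\n' > logfile
exec 3>> logfile            # fd 3 now holds logfile open for append
mv logfile logfile.old      # same filesystem: only the directory entry moves
printf 'line2\n' >&3        # still lands in logfile.old
exec 3>&-                   # close fd 3 (a real daemon would reopen on HUP)
cat logfile.old             # both lines are there
```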

Note: not all programs are set up to use the HUP signal in this manner!
Generally speaking, programs that are not set up to treat HUP this way
will die when sent a HUP signal -- so don't do your experimentation
during a mission-critical time!


large file and dilemma

Post by Vibhavasu Vuppa » Fri, 06 Mar 1998 04:00:00


most probably you meant 'tail' and not 'last'.

- vasu




: :
: :This will keep just the last 100000 lines of the file:
: :
: :last -100000 filename >/path/to/new_file
: :mv /path/to/new_file filename

: I would modify that slightly to

: last -100000 filename >/path/to/new_file
: cat /path/to/new_file > filename
: /bin/rm /path/to/new_file

: as this will preserve the ownership and permissions, where Art's version
: would not.


large file and dilemma

Post by Arthur Hag » Fri, 06 Mar 1998 04:00:00



> most probably you meant 'tail' and not 'last'.

Hehe, speaking for both Walter and myself, yes, we did.
C|N>K

--
*Art


large file and dilemma

Post by Lyle Batema » Fri, 06 Mar 1998 04:00:00



> I have a very large file, our web server access_log, which is around 320 MB,
> and another one which is around 290 MB.  I need to erase the first
> few (around 1 million) lines from the top of the file.  I tried using vi,
> but it gave me an error message and crashed.  I tried a few other
> editors, like pico and jot, and they did the same thing.  I was wondering if
> anyone has any suggestions on how to edit these large files.

> Thanks

> Kiran

You could achieve the result you want with the command 'tail -1000 filename >
new_filename'.  This will store the last thousand lines of filename in
new_filename.  Another option would be a stream editor like awk or sed; the
advantage with these is that you can use expressions to test each line and
decide whether to keep it based on its content rather than its position
in the file.  Awk and sed scripts for this would be a bit more complicated,
but also more functional.
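A minimal sketch of both styles, run against a tiny stand-in for the real log (file names and line counts are illustrative):

```shell
# Positional trim vs. content-based filter on a small demo file.
printf 'old1\nold2\nkeep1\nkeep2\n' > access_log
sed '1,2d' access_log > access_log.trimmed   # drop the first 2 lines by position
awk '/keep/' access_log > matches.log        # keep lines by content instead
```

On the real 320 MB log the sed address would be '1,1000000d'; both tools stream the file a line at a time, so neither needs to hold it in memory the way vi does.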

--
Sincerely,

Lyle W. Bateman
System Consultant
PECC Ltd.

NOTE: My views are my own, and do not represent the views
of my employer, unless explicitly stated.


large file and dilemma

Post by Scott Lurnd » Fri, 06 Mar 1998 04:00:00


|>

|> >
|> > most probably you meant 'tail' and not 'last'.
|>
|> Hehe, speaking for both Walter and myself, yes, we did.
|> C|N>K
|>
|> --
|> *Art

Unfortunately, when tail is used relative to the end of the file (e.g. -100000),
only the last 256k bytes of the file will be examined;  so, the only way
you can get the last 100,000 lines is if each line is about 2.5 bytes (see man tail).
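On a tail with that byte limit, one workaround is to count the lines first and then read forward from a computed starting line, which avoids the backwards scan entirely (file name and counts below are illustrative; modern tails spell the line-offset form `-n +N`):

```shell
# Compute the starting line, then read forward from it.
seq 1 10 > biglog                        # stand-in for the big log file
total=$(wc -l < biglog)                  # total number of lines
keep=4                                   # how many trailing lines to keep
tail -n +"$(( total - keep + 1 ))" biglog > lastpart
```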

scott


large file and dilemma

Post by Larry Hunte » Fri, 06 Mar 1998 04:00:00


Kiran asks:

  I have a very large file, our web server access_log which is around 320 MB
  and another one which is around 290 MB.

Emacs can handle files more or less as large as your swap space, at least
since version 19 (now at 20.2).  

Larry


large file and dilemma

Post by Peter Shenki » Fri, 06 Mar 1998 04:00:00



> Kiran asks:

>   I have a very large file, our web server access_log which is around 320 MB
>   and another one which is around 290 MB.

> Emacs can handle files more or less as large as your swap space, at least
> since version 19 (now at 20.2).

So can vi if you set TMPDIR equal to a directory which has enough
room for a backup.

awk would be easy:

awk '(NR>1000000) { print $0 }' < input > output

--
************** In memoriam, Grandpa Jones, 1913-1998, R.I.P. **************
* Peter S. Shenkin; Chemistry, Columbia U.; 3000 Broadway, Mail Code 3153 *
* MacroModel WWW page: http://www.columbia.edu/cu/chemistry/mmod/mmod.html *


large file and dilemma

Post by Peter Shenki » Fri, 06 Mar 1998 04:00:00




> |>

> |> >
> |> > most probably you meant 'tail' and not 'last'.
> |>
> |> Hehe, speaking for both Walter and myself, yes, we did.
> |> C|N>K
> |>
> |> --
> |> *Art

> Unfortunately, when tail is used relative to the end of the file (e.g. -100000),
> only the last 256k bytes of the file will be examined;  so, the only way
> you can get the last 100,000 lines is if each line is about 2.5 bytes (see man tail).

In another posting, I suggested awk.  But I believe tail can be used
this way with a "+" argument:

tail +1000001 < input > output

should give you all lines starting with line 1000001.
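On current systems the bare "+N" form is usually spelled `-n +N`; a quick check on a small illustrative input:

```shell
# tail -n +N prints everything from line N onward.
printf 'a\nb\nc\nd\n' > input
tail -n +3 input > output    # lines 3 and 4
```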

        -P.

--
************** In memoriam, Grandpa Jones, 1913-1998, R.I.P. **************
* Peter S. Shenkin; Chemistry, Columbia U.; 3000 Broadway, Mail Code 3153 *
* MacroModel WWW page: http://www.columbia.edu/cu/chemistry/mmod/mmod.html *


large file and dilemma

Post by Serge Pachkovs » Sat, 07 Mar 1998 04:00:00


: So can vi if you set TMPDIR equal to a directory which has enough
: room for a backup.

Quoting from 'man vi':

     vi has a limit of 15,687,678 editable lines.  Attempts to edit or create
     files larger than this limit will cause vi to terminate with an
     appropriate error message.

For a 320 MB file, this means that the lines in the said file should be longer
than 21 characters on average, or vi will fail.
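The arithmetic behind that 21-character figure, as a quick check:

```shell
# 320 MB divided by vi's 15,687,678-line limit = minimum average line length.
echo $(( 320 * 1024 * 1024 / 15687678 ))    # prints 21
```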

Regards,

/Serge.P

--

Russian guy from the Zurich university...


large file and dilemma

Post by Peter Shenki » Sat, 07 Mar 1998 04:00:00




> : So can vi if you set TMPDIR equal to a directory which has enough
> : room for a backup.

> Quoting from 'man vi':

>      vi has a limit of 15,687,678 editable lines....

I stand corrected.  However, my "awk" and "tail +<n>" solutions
should work, as should the "sed" solution recently posted.

        -P.

--
************** In memoriam, Grandpa Jones, 1913-1998, R.I.P. **************
* Peter S. Shenkin; Chemistry, Columbia U.; 3000 Broadway, Mail Code 3153 *
* MacroModel WWW page: http://www.columbia.edu/cu/chemistry/mmod/mmod.html *