Finding identical files

Post by Alexander Sclearuc » Thu, 04 May 2000 04:00:00

Hello,

How can I find all identical files (identical - 'cmp' command returns 0)
from one directory tree?

I already have a solution for that: 3 scripts. The first one finds (using
find) files in the directory and calls the second one with the name of
the file it found (-exec option); the second one does the same and
executes the third one, which in turn compares the files and returns the
result.

I suppose that it is not the best solution (a lot of forks), so if
somebody has any ideas how to improve it, please let me know.

CY,
Alexander Sclearuc


 
 
 

Post by Michael Sternberg » Thu, 04 May 2000 04:00:00

> How can I find all identical files (identical - 'cmp' command returns 0)
> from one directory tree?

I wrote a perl script to identify disk hogging files; it shows _possible_ dups
based on name and size [short of identifying links or running cmp] but you may
find it helpful anyway:

    http://www.phys.uni-paderborn.de/~stern/filestat/filestat
    http://www.phys.uni-paderborn.de/~stern/filestat/filestat.sample

Regards,
--
Michael Sternberg                        | Uni-GH Paderborn
http://www.phys.uni-paderborn.de/~stern/ | FB6 Theoretische Physik
phone: +49-(0)5251-60-2329   fax: -3435  | 33098 Paderborn, Germany
"Who disturrrbs me at this time?"  << Zaphod Beeblebrox IV >>     <*>

 
 
 

Post by Barry Margoli » Thu, 04 May 2000 04:00:00




> Hello,

>How can I find all identical files (identical - 'cmp' command returns 0)
>from one directory tree?

>I already have a solution for that: 3 scripts. The first one finds (using
>find) files in the directory and calls the second one with the name of
>the file it found (-exec option); the second one does the same and
>executes the third one, which in turn compares the files and returns the
>result.

>I suppose that it is not the best solution (a lot of forks), so if
>somebody has any ideas how to improve it, please let me know.

What I would do is:

find <directory> -type f -print | xargs sum /dev/null | sort > tempfile

Then find all the lines with the same checksum (they'll be adjacent because
of the sort) and cmp the two files.  An awk script would be appropriate for
this step.
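
A rough sketch of how that pipeline might be completed, not a tested tool. It assumes POSIX cksum in place of sum (its output is CRC, byte count, path), file names without tabs or newlines, and a hypothetical function name dups_by_cksum. It only compares neighbouring lines, so a checksum collision in the middle of a group could hide a real match.

```shell
#!/bin/sh
# Sketch: checksum every file, sort, then confirm same-checksum
# neighbours with cmp. dups_by_cksum is a hypothetical name.

dups_by_cksum() {
    tab=$(printf '\t')
    # cksum prints "CRC SIZE PATH"; "{} +" batches files like xargs does
    find "$1" -type f -exec cksum {} + |
    sort -k1,1n -k2,2n |
    awk '{
        key = $1 " " $2                      # CRC plus byte count
        name = $0
        sub(/^[0-9]+ [0-9]+ /, "", name)     # rest of the line is the path
        if (key == prev_key) print prev_name "\t" name
        prev_key = key; prev_name = name
    }' |
    while IFS=$tab read -r a b; do
        # the checksum match is only a hint; cmp has the final word
        if cmp -s "$a" "$b"; then
            printf '%s = %s\n' "$a" "$b"
        fi
    done
}
```

Called as dups_by_cksum some/dir, it prints one "path = path" line per confirmed pair. Every file is still read once for the checksum, but cmp only runs on candidates.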

--

Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

 
 
 

Post by Ken Pizzini » Fri, 05 May 2000 04:00:00


On Wed, 03 May 2000 19:12:43 GMT,


>How can I find all identical files (identical - 'cmp' command returns 0)
>from one directory tree?

Below is what I've used.  Its basic algorithm is to look at
file sizes and then only do a compare on files of the same
size (since files of different sizes will never cmp the same).

                --Ken Pizzini

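#! /usr/bin/perl -w

# find-eq-files -- find files that are equal
# Ken Pizzini, 1999-02-09, based on a sh+awk+sed version by
# Carlos Duarte, 990131/990201
# updated (simplified) 1999-09-18, kpp

# usage: find-eq-files [dirs] [find options]
# example: find-eq-files ~/tmp ~/sources

# NOTE: every line marked "reconstructed" is missing from the archived
# post and is a guess at the original, not Ken's actual code.

use File::Find qw(find);
use File::Compare qw(compare); # from CPAN
use strict;

my %files_by_size = ();

# reconstructed: default to ".", then bucket regular files by size
@ARGV = ('.') unless @ARGV;
find(sub {
    return unless -f $_;
    push @{ $files_by_size{(-s _) || 0} }, $File::Find::name;
}, @ARGV);

for my $to_check (values %files_by_size) {
    my @candidates = @$to_check;                  # reconstructed
    while (@candidates > 1) {                     # reconstructed
        my $first = shift @candidates;            # reconstructed
        my (@same, @different);                   # reconstructed
        for my $other (@candidates) {             # reconstructed
            if (compare($first, $other) == 0)
                { push @same, $other }            # reconstructed
            else
                { push @different, $other }       # reconstructed
        }
        print join(' ', $first, @same), "\n" if @same;   # reconstructed
        @candidates = @different;                 # reconstructed
    }
}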
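
For comparison, the same size-first idea might look like this in shell. This is only a sketch under assumptions: dups_by_size is a hypothetical name, paths must not contain tabs or newlines, and sizes are taken with one wc per file, which costs a fork each. Files are paired only within a group of equal byte counts, and cmp decides.

```shell
#!/bin/sh
# Sketch: only files with equal byte counts are ever handed to cmp.
# dups_by_size is a hypothetical name.

dups_by_size() {
    tab=$(printf '\t')
    # emit "SIZE<TAB>PATH" for every regular file
    find "$1" -type f -exec sh -c '
        for f do
            printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} + |
    sort -n |
    awk -F '\t' '
        BEGIN { size = -1 }
        $1 != size { n = 0; size = $1 }       # a new size group starts
        {
            # pair this file with every earlier file of the same size
            for (i = 1; i <= n; i++) print seen[i] "\t" $2
            seen[++n] = $2
        }' |
    while IFS=$tab read -r a b; do
        if cmp -s "$a" "$b"; then
            printf '%s = %s\n' "$a" "$b"
        fi
    done
}
```

With GNU find, -printf '%s\t%p\n' could replace the inner sh loop and avoid the per-file wc. Three identical files are reported as three pairs, since no attempt is made to merge pairs into groups.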

 
 
 

1. find identical files

Through my own stupidity I created copies of about 2000 small files
(approx. 5k each, some up to 5M), and now I need to remove all the duplicates.

What's the best approach?

The best idea I had would be to run md5sum on each file and then run diff on
files with the same hash. Any better ideas?

thnx.
peter

--
peter pilsl

http://www.goldfisch.at

2. termio errors on console after adding ifconfig

3. find multiple identical files?

4. Apache 1.2.4 as SSL proxy

5. Identical lines in multiple (large) files, how to find?

6. Forms submit to CGI question

7. Finding files that are identical (in Unix)

8. Click here for Extreme Net Toolz

9. - Two questions: find files with specific permission, find files that belong to..

10. removing multiple identical lines from a file

11. File restored to two dif. servers: Date and size identical but cksum different?

12. Identical files?

13. How install two identical linux file systems