Find Duplicate Files in the Terminal

I posted an Automator Service last week for finding duplicate photos in an iPhoto Library. Here is a slightly modified version of the internal script it uses. You can save this script and run it in a terminal to find duplicate files of any kind in any directory tree of your choice. It can also be embedded in Automator workflows with the Run Shell Script action.

findDuplicates.pl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# #####################################
# Filename:    findDuplicates.pl
# Author:      Jeremy Pyne
# Licence:     CC:BY/NC/SA http://creativecommons.org/licenses/by-nc-sa/3.0/
# Last Update: 02/10/2010
# Version:     1.5
# Requires:    perl
# Description:
#   This script looks through a directory tree and finds any duplicate files,
#   then prints a list of the duplicates it found. It works by calculating the
#   md5 checksum of each file and recording it along with the filename. The
#   list is then sorted by checksum and read back line by line; whenever
#   consecutive records share a checksum, the file names are written to
#   stdout. Note that, as a result, all empty files are flagged as duplicates
#   of each other as well.
# #####################################

# Get the path from the command line. This could be expanded to provide more
# granular control.
my $dir = shift;

# Set up the locations of the temp files.
my $file = "/tmp/pictures.txt";
my $sort = "/tmp/sorted.txt";

# Find all files in the selected directory and calculate their md5 checksums.
# This is by far the longest step. (`md5 -r` is the macOS command; on Linux,
# use `md5sum` instead.)
`find "$dir" -type f -print0 | xargs -0 md5 -r > $file`;

# Sort the resulting file by checksum.
`sort $file > $sort`;

open FILE, "<$sort" or die $!;

my $lastmd5   = "";
my $lastfile  = "";
my $lastprint = 0;

# Read each line from the file.
while(<FILE>) {
    # Extract the md5 checksum and the filename.
    /([^ ]+) (.+)/ or next;
    my $newmd5  = $1;
    my $newfile = $2;

    # If this file has the same checksum as the last one, flag it.
    if($newmd5 eq $lastmd5) {
        # If this is the first duplicate for this checksum, print the first
        # file's name as well.
        if(!$lastprint) {
            print("$lastfile\n");
            $lastprint = 1;
        }
        # Print the conflicting file's name.
        print("$newfile\n");
    } else {
        $lastprint = 0;
    }

    # Record the last filename and checksum for the next comparison.
    $lastmd5  = $newmd5;
    $lastfile = $newfile;
}

close(FILE);

# Remove the temp files.
unlink($file);
unlink($sort);
```
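The same approach the Perl script takes — checksum every file, then group paths that share a checksum — can be sketched in Python without temp files or external commands. This is an illustrative sketch, not part of the original script; the function name `find_duplicates` is my own:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under `root` by MD5 and return the duplicate groups.

    Same idea as the Perl script: hash every file, then report any
    checksum shared by more than one path. As in the original, all
    empty files hash identically and are flagged as duplicates.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.md5()
            # Read in chunks so large files don't have to fit in memory.
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)
    # Keep only checksums seen more than once.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Unlike the sort-and-scan pipeline, this keeps the hash table in memory, which is fine for a few hundred thousand files and avoids writing anything to /tmp.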