I want to create a Java application that identifies duplicate files. So far I can find duplicates only by name, but I also need to compare size, file type, and maybe content. This is my code so far, using a HashMap:
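public static void find(Map<String, List<String>> lists, File dir) {
    File[] entries = dir.listFiles();
    if (entries == null) {
        return; // listFiles() returns null if dir is not a directory or cannot be read
    }
    for (File f : entries) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            // Key is the file name followed by the file size in bytes
            String hash = f.getName() + f.length();
            List<String> list = lists.get(hash);
            if (list == null) {
                list = new LinkedList<String>();
                lists.put(hash, list);
            }
            list.add(f.getAbsolutePath());
        }
    }
}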
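One way to cover the size and file type criteria without relying on the name is to build the map key from the extension and the length. The following is only a sketch under assumptions: the class name SizeTypeGrouper, the helper extensionOf, and the "extension|size" key format are made up for illustration, and "file type" is taken to mean the file extension. Files that share a key are only candidates; equal size and extension do not prove equal content.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class SizeTypeGrouper {

    // Hypothetical helper: lower-cased text after the last dot, or "" if there is none.
    static String extensionOf(File f) {
        String name = f.getName();
        int dot = name.lastIndexOf('.');
        return (dot > 0 && dot < name.length() - 1)
                ? name.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "";
    }

    // Recursively groups files under dir by "extension|size"; file names are ignored.
    static void group(Map<String, List<String>> groups, File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or it could not be read
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                group(groups, f);
            } else {
                String key = extensionOf(f) + "|" + f.length();
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(f.getAbsolutePath());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new HashMap<>();
        group(groups, new File(args.length > 0 ? args[0] : "."));
        // Only groups with more than one path are duplicate candidates.
        groups.forEach((key, paths) -> {
            if (paths.size() > 1) {
                System.out.println(key + " -> " + paths);
            }
        });
    }
}
```

Grouping by size first is cheap because it only reads file metadata, so a content comparison is needed only for the groups that contain more than one path.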
I used MessageDigest, checked some files, and found the duplicates according to all the criteria I listed in the title and description. Thank you all.
private static MessageDigest messageDigest;
static {
    try {
        messageDigest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("cannot initialize SHA-512 hash function", e);
    }
}
And this is the result after using it in the duplicate search code:
public static void find(Map<String, List<String>> lists, File dir) {
    File[] entries = dir.listFiles();
    if (entries == null) {
        return; // listFiles() returns null if dir is not a directory or cannot be read
    }
    for (File f : entries) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            try {
                // Read the whole file; unlike a single InputStream.read call, this
                // always returns every byte (needs import java.nio.file.Files)
                byte[] fileData = Files.readAllBytes(f.toPath());
                // Create a unique hash ID for the current file
                String hash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                List<String> list = lists.get(hash);
                if (list == null) {
                    list = new LinkedList<String>();
                    // Register the new list in the hash table
                    lists.put(hash, list);
                }
                // Add the path to the list
                list.add(f.getAbsolutePath());
            } catch (IOException e) {
                throw new RuntimeException("cannot read file " + f.getAbsolutePath(), e);
            }
        }
    }
}
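The find method above loads each file completely into a byte array before hashing it, which can be a problem for very large files. A possible alternative, shown here only as a sketch, is to feed the file to MessageDigest in chunks. The class name StreamingHash, the method name sha512Hex, and the 8 KiB buffer size are assumptions made for this example. It also creates a fresh MessageDigest per call instead of sharing the static one, and pads the hex string to a fixed 128 characters.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StreamingHash {

    // Hashes a file in 8 KiB chunks, so memory use does not grow with file size.
    static String sha512Hex(File f) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(f.toPath())) {
            int n;
            // read() may return fewer bytes than requested, so loop until end of stream
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        // SHA-512 yields 64 bytes; %0128x keeps leading zeros in the hex form
        return String.format("%0128x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(sha512Hex(new File(path)) + "  " + path);
        }
    }
}
```

The returned string can be used as the map key in place of the hash computed in find. The try-with-resources block closes the stream even when reading fails.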