
Java: How to search for duplicate files in a folder not only by name, but also by size and content?


I want to create a Java application that identifies duplicate files. So far I can only find duplicates by name, but I also need to compare size, file type, and maybe content. This is my code so far, using a HashMap:

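import java.io.File;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public static void find(Map<String, List<String>> lists, File dir) {
    File[] children = dir.listFiles();
    if (children == null) {
        return; // dir is not a directory or could not be read
    }
    for (File f : children) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            // Key = name + size; the separator keeps e.g. ("a1", 23) apart from ("a12", 3)
            String hash = f.getName() + "|" + f.length();
            List<String> list = lists.get(hash);
            if (list == null) {
                list = new LinkedList<String>();
                lists.put(hash, list);
            }
            list.add(f.getAbsolutePath());
        }
    }
}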
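Instead of building a String key, a small record can serve as the map key for the name-and-size approach. This is a minimal sketch, assuming Java 16+ for records; `NameSizeDuplicates`, `FileKey` and `duplicates` are hypothetical names. The record's generated `equals` and `hashCode` compare the fields individually, and since the extension is part of the file name, the "file type" is matched as well.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameSizeDuplicates {

    // Hypothetical composite key: equals/hashCode compare name and size as
    // separate fields, so no string concatenation or separator is needed.
    public record FileKey(String name, long size) {
        static FileKey of(File f) {
            return new FileKey(f.getName(), f.length());
        }
    }

    // Same recursive walk as in the question, but keyed by FileKey.
    public static void find(Map<FileKey, List<String>> groups, File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or it could not be read
        }
        for (File f : children) {
            if (f.isDirectory()) {
                find(groups, f);
            } else {
                groups.computeIfAbsent(FileKey.of(f), k -> new ArrayList<>())
                      .add(f.getAbsolutePath());
            }
        }
    }

    // Keep only keys shared by at least two files: the candidate duplicates.
    public static Map<FileKey, List<String>> duplicates(File root) {
        Map<FileKey, List<String>> groups = new HashMap<>();
        find(groups, root);
        groups.values().removeIf(paths -> paths.size() < 2);
        return groups;
    }
}
```

Note that two files with the same name and size can still have different contents, so this only finds candidates; comparing contents needs a hash or a byte-by-byte check.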

Solution

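I used MessageDigest to compute a SHA-512 hash of each file's contents and grouped the file paths by that hash. I checked it on some files and it finds the duplicates: files with identical contents necessarily have the same size, so this covers both size and content (file names are not compared). Thank you all.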

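    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // One shared digest instance; fine for this single-threaded scan,
    // but MessageDigest itself is not thread-safe.
    private static MessageDigest messageDigest;
    static {
        try {
            messageDigest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("cannot initialize SHA-512 hash function", e);
        }
    }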
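Hashing with `messageDigest.digest(fileData)` requires the whole file to be loaded into a byte array first. A minimal sketch of an alternative, assuming Java 7+ NIO: create a `MessageDigest` per call (instances are not thread-safe) and feed it the file in 8 KB chunks with `update`. `StreamingHash` and `sha512Hex` are hypothetical names; the result uses the same `BigInteger(...).toString(16)` encoding as the answer, so the keys are interchangeable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingHash {

    // Hash a file in fixed-size chunks so it never has to fit in memory at once.
    public static String sha512Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512"); // fresh instance per call
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // only the n bytes actually read
            }
        }
        // Same key format as the answer: positive BigInteger rendered in base 16.
        return new BigInteger(1, md.digest()).toString(16);
    }
}
```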
    

And this is the duplicate search after adding the hash:

    import java.io.File;
    import java.io.IOException;
    import java.math.BigInteger;
    import java.nio.file.Files;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;

    public static void find(Map<String, List<String>> lists, File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or it could not be read
        }
        for (File f : children) {
            if (f.isDirectory()) {
                find(lists, f);
            } else {
                try {
                    // Read the entire file; a single read() call may return fewer bytes
                    byte[] fileData = Files.readAllBytes(f.toPath());
                    // Compute a hash ID for the current file's contents
                    String hash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                    // Add the path to the list for this hash, creating the list if needed
                    lists.computeIfAbsent(hash, k -> new LinkedList<>()).add(f.getAbsolutePath());
                } catch (IOException e) {
                    throw new RuntimeException("cannot read file " + f.getAbsolutePath(), e);
                }
            }
        }
    }
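Hashing every file is the expensive part. A common refinement, sketched here under the assumption of Java 8+ and NIO (`Files.walk`), is to group by size first and read and hash only files that share a size, since files of different sizes cannot have identical contents. `SizeThenHash`, `findDuplicates` and its helpers are hypothetical names; the hash is SHA-512 as in the answer, computed in chunks, and symbolic links are skipped so a link is not reported as a copy of its target.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SizeThenHash {

    public static List<List<Path>> findDuplicates(Path root) throws IOException {
        // Pass 1: group regular files (not following symlinks) by size.
        Map<Long, List<Path>> bySize;
        try (Stream<Path> paths = Files.walk(root)) {
            bySize = paths
                    .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                    .collect(Collectors.groupingBy(SizeThenHash::size));
        }

        List<List<Path>> duplicates = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // a file with a unique size cannot have a duplicate
            }
            // Pass 2: only files that share a size are read and hashed.
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path p : sameSize) {
                byHash.computeIfAbsent(sha512Hex(p), k -> new ArrayList<>()).add(p);
            }
            for (List<Path> group : byHash.values()) {
                if (group.size() > 1) {
                    duplicates.add(group);
                }
            }
        }
        return duplicates;
    }

    // Method references used in streams cannot throw checked IOException.
    private static long size(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // SHA-512 over the file contents, read in 8 KB chunks.
    private static String sha512Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return new BigInteger(1, md.digest()).toString(16);
    }
}
```

Each inner list holds paths with identical contents (barring a SHA-512 collision, which is negligible in practice). Names are ignored, as in the answer; requiring matching names as well would mean adding the file name to the second grouping key.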