java, algorithm, file, hashmap, tf-idf

Count the frequency of each query word individually


I want to search a file named a.java for a query. If my query is "String name", I want the frequency of each word of the query in the text file: first count the occurrences of "String", then the occurrences of "name", and then add the two frequencies together. How can I implement this in Java?

public class Tf2 {
Integer k;
int totalword = 0;
int totalfile, containwordfile = 0;
Map<String, Integer> documentToCount = new HashMap<>();
File file = new File("H:/java");
File[] files = file.listFiles();
public void Count(String word) {
   File[] files = file.listFiles();
    Integer count = 0;
    for (File f : files) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(f));
            count = documentToCount.get(word);

            documentToCount.clear();

            String line;
            while ((line = br.readLine()) != null) {
                String term[] = line.trim().replaceAll("[^a-zA-Z0-9 ]", " ").toLowerCase().split(" ");


                for (String terms : term) {
                    totalword++;
                    if (count == null) {
                        count = 0;
                    }
                    if (documentToCount.containsKey(word)) {

                        count = documentToCount.get(word);
                        documentToCount.put(terms, count + 1);
                    } else {
                        documentToCount.put(terms, 1);

                    }

                }

            }
          k = documentToCount.get(word);

            if (documentToCount.get(word) != null) {
                containwordfile++;
       
               System.out.println("" + k);

            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

public static void main(String[] args) throws IOException {
    Tf2 ob = new Tf2();
    String query = "String name";
    ob.Count(query);
}
}

I tried this with a HashMap, but it does not count the frequency of each query word individually.


Solution

  • If I have a file that contains the line "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world" and I search for the query "edited Wikipedia volunteers", my program should first count the frequency of "edited" in the text file, then the frequency of "Wikipedia", then the frequency of "volunteers", and finally sum all the frequencies. Can I solve this using a HashMap?

    You can do it as follows:

    import java.util.HashMap;
    import java.util.Map;
    
    public class Main {
        public static void main(String[] args) {
            // The given string
            String str = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world.";
    
            // The query string
            String query = "edited Wikipedia volunteers";
    
            // Split the given string and the query string on space
            String[] strArr = str.split("\\s+");
            String[] queryArr = query.split("\\s+");
    
            // Map to hold the frequency of each word of query in the string
            Map<String, Integer> map = new HashMap<>();
    
            for (String q : queryArr) {
                for (String s : strArr) {
                    if (q.equals(s)) {
                        map.put(q, map.getOrDefault(q, 0) + 1);
                    }
                }
            }
    
            // Display the map
            System.out.println(map);
    
            // Get the sum of all frequencies
            int sumFrequencies = map.values().stream().mapToInt(Integer::intValue).sum();
    
            System.out.println("Sum of frequencies: " + sumFrequencies);
        }
    }
    

    Output:

    {edited=1, Wikipedia=1, volunteers=1}
    Sum of frequencies: 3
    

    Check the documentation of Map#getOrDefault to learn more about it.
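
    As a small illustration of what getOrDefault does while counting, the sketch below increments a counter per word and shows Map#merge as an equivalent alternative. The class name GetOrDefaultDemo and its two helper methods are made up for this example.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    class GetOrDefaultDemo {
        // Counts words using getOrDefault: a key that is not in the map yet is treated as 0.
        static Map<String, Integer> countWithGetOrDefault(String[] words) {
            Map<String, Integer> counts = new HashMap<>();
            for (String w : words) {
                counts.put(w, counts.getOrDefault(w, 0) + 1);
            }
            return counts;
        }

        // Same result using merge: stores 1 if the key is absent, otherwise adds 1 to the old value.
        static Map<String, Integer> countWithMerge(String[] words) {
            Map<String, Integer> counts = new HashMap<>();
            for (String w : words) {
                counts.merge(w, 1, Integer::sum);
            }
            return counts;
        }

        public static void main(String[] args) {
            String[] words = {"edited", "Wikipedia", "edited"};
            // Both approaches build equal maps, so this prints: true
            System.out.println(countWithGetOrDefault(words).equals(countWithMerge(words)));
        }
    }
    ```

    Without getOrDefault, the first occurrence of a word would need an explicit null check before adding 1.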

    Update

    In the original answer, I've used the Java Stream API to get the sum of values. Given below is an alternative way of doing it:

    // Get the sum of all frequencies
    int sumFrequencies = 0;
    for (int value : map.values()) {
        sumFrequencies += value;
    }
    

    Your other question is:

    If I have multiple files in a folder, how can I find out how many times this query occurs in each file?

    You can create a Map<String, Map<String, Integer>> in which the key will be the name of the file and the value (i.e. Map<String, Integer>) will be the frequency map for the file. I've already shown above the algorithm to create this frequency map. All you will have to do is to loop through the list of files and populate this map (Map<String, Map<String, Integer>>).
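
    A minimal sketch of that idea, assuming Java 8 or later and UTF-8 text files. The class name PerFileFrequency, the methods countTerms and countPerFile, and the command-line arguments are hypothetical names chosen for illustration. Unlike the code above, this sketch splits on any non-alphanumeric character (so "world." counts as "world"), scans each file once, and records a count of 0 for query words that do not occur.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    class PerFileFrequency {
        // Counts how often each query word occurs in the given lines (case-sensitive).
        static Map<String, Integer> countTerms(List<String> lines, String query) {
            Map<String, Integer> freq = new LinkedHashMap<>(); // keeps the query's word order
            for (String q : query.trim().split("\\s+")) {
                if (!q.isEmpty()) {
                    freq.put(q, 0);
                }
            }
            for (String line : lines) {
                // Assumption: a word is a run of letters and digits; everything else separates words.
                for (String token : line.split("[^A-Za-z0-9]+")) {
                    if (freq.containsKey(token)) {
                        freq.merge(token, 1, Integer::sum);
                    }
                }
            }
            return freq;
        }

        // Builds file name -> (query word -> count) for every regular file directly inside dir.
        static Map<String, Map<String, Integer>> countPerFile(Path dir, String query) throws IOException {
            Map<String, Map<String, Integer>> result = new HashMap<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path p : stream) {
                    if (Files.isRegularFile(p)) {
                        // readAllLines throws an IOException for files that are not valid UTF-8.
                        List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
                        result.put(p.getFileName().toString(), countTerms(lines, query));
                    }
                }
            }
            return result;
        }

        public static void main(String[] args) throws IOException {
            // Directory and query come from the command line; the defaults are placeholders.
            Path dir = Paths.get(args.length > 0 ? args[0] : ".");
            String query = args.length > 1 ? args[1] : "String name";
            for (Map.Entry<String, Map<String, Integer>> e : countPerFile(dir, query).entrySet()) {
                int sum = 0;
                for (int v : e.getValue().values()) {
                    sum += v;
                }
                System.out.println(e.getKey() + " -> " + e.getValue() + ", sum = " + sum);
            }
        }
    }
    ```

    Run it with the folder as the first argument and the query as the second, for example: java PerFileFrequency H:/java "String name". Each output line shows one file, its per-word counts, and their sum.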