
Trying to create a hashtable to get an arraylist from the text file - Java


I am trying to read a text file into an ArrayList, count the words with a Hashtable, and then write the counts to another text file. I need to tokenize the text into words and store each word as a key with its count as the value. I am still at the beginning and I can't see what is wrong with my code: there seems to be no error, but it either doesn't read the text file and fill the ArrayList, or my code is simply wrong. I would appreciate any help. Thanks.

This is the Map file


public class Map {
    public static String fileName= "C:Users\\ruken\\OneDrive\\Desktop\\workshop.txt";

    private ArrayList<String> arr = new ArrayList<String>();
    public ArrayList <String>getList () {
        return this.arr;
    }

    private Hashtable<String, Integer> map = new Hashtable<String, Integer>();

    public void load(String path) {
        try{
            FileReader f2 = new FileReader("C:Users\\ruken\\OneDrive\\Desktop\\workshop.txt");
            Scanner s = new Scanner(f2);
            while (s.hasNextLine()) {
                String line = s.nextLine();
                String[] words = line.split("\\s");
                for (int i=0;i<words.length; i++){
                    String word = words[i];
                    if (! word.isEmpty()){
                        System.out.println(word);
                        arr.add(word);
                    }
                }
            }
            f2.close();
            System.out.println("An error occurred");
        }
        catch(IOException ex1)
        {
            Collections.sort(arr);
            System.out.println("An error occurred.");
            for (String counter: arr) {
                System.out.println(counter);
            }
            ex1.printStackTrace();
        }

    }

    public static void main(String[] args) {
        Map m =new Map();
        m.load("C:Users\\ruken\\OneDrive\\Desktop\\out.txt");
    }


    public Object get(String word) {
        return null;
    }

    public void put(String word, int i) {

    }


}

This is the Reduce file

package com.company;

import java.io.*;
import java.util.*;

public class Reduce {

    private Hashtable<String, Integer> map=new Hashtable< String, Integer>();

    public Hashtable < String, Integer> getHashTable () {
        return map;
    }

    public void setHashTable ( Hashtable < String, Integer> map){
        this.map =map;
    }

    public void findMin () {

    }

    public void findMax() {

    }

    public void sort (ArrayList<String> arr) throws IOException {
        Collections.sort(arr);
        Iterator it1 = arr.iterator();
        while (it1.hasNext()) {
            String word = it1.next().toString();
            System.out.println(word);

        }
    }
    //constructors
    public void reduce (ArrayList<String> words) {
        Iterator<String> it1 =words.iterator();
        while (it1.hasNext()) {
            String word=it1.next();
            System.out.println (word);
            if (map.containsKey(word)) {
                map.put(word, 1);
            }
            else {
                int count = map.get(word);
                map.put(word, count+1);
            }

            System.out.println(map.containsValue(word));
        }
    }
}

Here is part of workshop.txt. It is basic, simple text:

" Acknowledgements

I would like to thank Carl Fleischhauer and Prosser Gifford for the opportunity to learn about areas of human activity unknown to me a scant ten months ago, and the David and Lucile Packard Foundation for supporting that opportunity. The help given by others is acknowledged on a separate page.

                                                      19 October 1992


           ***   ***   ***   ******   ***   ***   ***


                          INTRODUCTION

The Workshop on Electronic Texts (1) drew together representatives of various projects and interest groups to compare ideas, beliefs, experiences, and, in particular, methods of placing and presenting historical textual materials in computerized form. Most attendees gained much in insight and outlook from the event. But the assembly did not form a new nation, or, to put it another way, the diversity of projects and interests was too great to draw the representatives into a cohesive, action-oriented body.(2)"


Solution

  • Counting word frequency in text can be accomplished using the Java Stream API.

    Here is my implementation, followed by explanatory notes.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Arrays;
    import java.util.Hashtable;
    import java.util.Map;
    import java.util.function.BiConsumer;
    import java.util.function.BinaryOperator;
    import java.util.function.Function;
    import java.util.function.Supplier;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;
    
    public class WordFreq {
    
        public static void main(String[] args) {
            Path path = Paths.get("workshop.txt");
            // Each word is its own key and contributes a count of 1.
            Function<String, String> keyMapper = Function.identity();
            Function<String, Integer> valueMapper = (word) -> Integer.valueOf(1);
            // When the same word is seen again, the two counts are added together.
            BinaryOperator<Integer> mergeFunction = (a, b) -> Integer.valueOf(a.intValue() + b.intValue());
            Supplier<Hashtable<String, Integer>> mapSupplier = () -> new Hashtable<>();
            // try-with-resources closes the file once the stream has been consumed.
            try (Stream<String> lines = Files.lines(path)) {
                Map<String, Integer> map = lines
                     .flatMap(line -> Arrays.stream(line.split("\\b")))
                     .filter(word -> word.matches("^\\w+$"))
                     .map(word -> word.toLowerCase())
                     .collect(Collectors.toMap(keyMapper, valueMapper, mergeFunction, mapSupplier));
                // Print every entry as "<count> <word>".
                BiConsumer<String, Integer> action = (k, v) -> System.out.printf("%3d %s%n", v, k);
                map.forEach(action);
            }
            catch (IOException xIo) {
                xIo.printStackTrace();
            }
        }
    }
    
    • Method lines() in class java.nio.file.Files creates a stream of the lines of text in the file. In this case the file is your workshop.txt file.
    • For each line of the file that is read, I split it into words using method split() in class java.lang.String and convert the array returned by method split() into another stream.
    • Actually each line of text is split at every word boundary so the array of words that method split() returns may contain strings that aren't really words. Therefore I filter the "words" in order to extract only real words.
    • Then I convert each word to lower case so that my final map will be case-insensitive. In other words, the word The and the word the will be considered the same word.
    • Finally I create a Map where the map key is a distinct word in the text of file workshop.txt and the map value is an Integer which is the number of occurrences of that word in the text.
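To see what the split, filter and lower-case steps described above do to a single line, here is a minimal sketch that applies them to two fragments taken from the sample text. The class name TokenizeSketch and the helper tokenize() are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, used only to illustrate the tokenizing steps.
class TokenizeSketch {

    // Splits one line at word boundaries, keeps only the pieces made up
    // entirely of word characters (letters, digits, underscore), and
    // converts them to lower case.
    static List<String> tokenize(String line) {
        return Arrays.stream(line.split("\\b"))
                .filter(piece -> piece.matches("^\\w+$"))
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The Workshop on Electronic Texts (1) drew"));
        System.out.println(tokenize("a cohesive, action-oriented body.(2)"));
    }
}
```

Note that with this approach digits count as word characters, so (1) is kept as the "word" 1, and a hyphenated term such as action-oriented is counted as two separate words. If that is not what you want, the filter is the place to change it.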

    Since you stipulated that the Map must be a Hashtable, I explicitly created a Hashtable to store the results of the collect operation on the stream.
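If you would rather keep a plain loop, as in the reduce() method of your Reduce class, the same counting can be sketched as follows. Note that the two branches in your version are swapped: when the word is not yet in the table, map.get(word) returns null, and assigning that to an int throws a NullPointerException. The class name ReduceSketch and the method name count() are hypothetical names used only for this sketch.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

// Hypothetical class, used only to illustrate loop-based counting.
class ReduceSketch {

    // Counts how many times each word occurs in the list.
    static Hashtable<String, Integer> count(List<String> words) {
        Hashtable<String, Integer> map = new Hashtable<>();
        for (String word : words) {
            if (map.containsKey(word)) {
                // Word seen before: increase its existing count.
                map.put(word, map.get(word) + 1);
            } else {
                // First occurrence of the word: start at 1.
                map.put(word, 1);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "workshop", "the", "texts", "the");
        Hashtable<String, Integer> counts = count(words);
        System.out.println(counts.get("the"));
        System.out.println(counts.get("workshop"));
    }
}
```

Since Java 8 the whole if/else can also be written as map.merge(word, 1, Integer::sum), which does the same thing in a single call.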

    The last part of the above code displays the contents of the Hashtable.
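A Hashtable does not keep its entries in any particular order, so the words are displayed in an unpredictable sequence. Since you also mentioned writing the counts to another text file, here is a minimal sketch, under some assumptions, that copies the counts into a TreeMap to sort them alphabetically and writes one line per word. The class name WriteCountsSketch, the helper format(), the sample counts and the output file name out.txt are all assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class, used only to illustrate sorted output to a file.
class WriteCountsSketch {

    // Builds one "<count> <word>" line per entry, sorted alphabetically by word.
    static List<String> format(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        // Copying into a TreeMap orders the entries by key.
        for (Map.Entry<String, Integer> entry : new TreeMap<>(counts).entrySet()) {
            lines.add(String.format("%3d %s", entry.getValue(), entry.getKey()));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for the real word counts.
        Hashtable<String, Integer> counts = new Hashtable<>();
        counts.put("workshop", 1);
        counts.put("the", 2);

        // Assumed output location; adjust the path as needed.
        Path out = Paths.get("out.txt");
        Files.write(out, format(counts));
    }
}
```

The format string is the same one used in the printf call above, so the file contents match what is printed to the console, only sorted.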