Search code examples
javastringdesign-patterns

When is it beneficial to flyweight Strings in Java?


I understand the basic idea of java's String interning, but I'm trying to figure out which situations it happens in, and which I would need to do my own flyweighting.

Somewhat related:

Together they tell me that String s = "foo" is good and String s = new String("foo") is bad but there's no mention of any other situations.

In particular, if I parse a file (say a csv) that has a lot of repeated values, will Java's string interning cover me or do I need to do something myself? I've gotten conflicting advice about whether or not String interning applies here in my other question

HashMap alternatives for memory-efficient data storage


The full answer came in several fragments, so I'll sum up here:

By default, java only interns strings that are known at compile-time. String.intern(String) can be used at runtime, but it doesn't perform very well, so it's only appropriate for smaller numbers of Strings that you're sure will be repeated a lot. For larger sets of Strings it's Guava to the rescue (see ColinD's answer).


Solution

  • Don't use String.intern() in your code. At least not if you might get 20 or more different strings. In my experience using String.intern slows down the whole application when you have a few millions strings.

    To avoid duplicated String objects, just use a HashMap.

    private final Map<String, String> pool = new HashMap<String, String>();
    
    private void interned(String s) {
      String interned = pool.get(s);
      if (interned != null) {
        return interned;
      pool.put(s, s);
      return s;
    }
    
    private void readFile(CsvFile csvFile) {
      for (List<String> row : csvFile) {
        for (int i = 0; i < row.size(); i++) {
          row.set(i, interned(row.get(i)));
          // further process the row
        }
      }
      pool.clear(); // allow the garbage collector to clean up
    }
    

    With that code you can avoid duplicate strings for one CSV file. If you need to avoid them on a larger scale, call pool.clear() in another place.