Search code examples
lispracket

How do I convert a string to a hash ( key: character, value: list of the index of the character ) in Racket?


I want to convert a string to a hash table, whose keys are the characters in the string, and values are are the list of index of this character in the string. Is there some elegant way to do this?

For example:

String: "abcaad"

Hash table: { a: [0 3 4] , b: [1] , c: [2] , d: [5] }

I tried to write one with hash-set! and iterate the string from the end to avoid append which is slower than insert at begin. But I hope there will be some more elegant way (better if I can get a vector instead of a list for each hash value):

(define (s->hash s)
  (define h (make-hash))
  (for ([c (in-string s (sub1 (string-length s)) -1 -1)]
        [i (in-range (sub1 (string-length s)) -1 -1)])
    (hash-set! h c (cons i (hash-ref h c empty))))
  h)

Solution

  • We can simplify the solution a bit by using in-indexed:

    ; procedural programming
    (define (s->hash s)
      (define ht (make-hash))
      (for (((c i) (in-indexed s)))
        (hash-set! ht c (cons i (hash-ref ht c empty))))
      ht)
    

    Or we can write a functional programming style solution by composing existing built-in procedures and using hash-update to get rid of the hash-set! mutation operation, which is frowned upon when programming in Scheme:

    ; functional programming
    (define (s->hash s)
      (for/fold ((ht #hash()))
                (((key index) (in-indexed s)))
        (hash-update ht
                     key
                     (lambda (acc) (cons index acc))
                     '())))
    

    Either solution returns a list of indexes in reverse order, if this is a problem you can reverse the lists later as shown below, or use append instead of cons (discouraged for performance reasons.)

    ; procedural programming
    (define (reverse-index-lists ht)
      (hash-for-each
       ht
       (lambda (key value) (hash-set! ht key (reverse value))))
      ht)
    
    ; functional programming
    (define (reverse-index-lists ht)
      (foldl (lambda (key acc) (hash-update acc key reverse))
             ht
             (hash-keys ht)))
    

    Using a standard vector is not a good fit for this problem, because we don't know beforehand the size of each vector (in Scheme vectors are more like arrays in other programming languages, in the sense that they cannot be resized). Let's see the solutions in action:

    (s->hash "abcaad")
    '#hash((#\a . (4 3 0)) (#\b . (1)) (#\c . (2)) (#\d . (5)))
    
    (reverse-index-lists (s->hash "abcaad"))
    '#hash((#\a . (0 3 4)) (#\b . (1)) (#\c . (2)) (#\d . (5)))