Encoding "ä", "ö", "ü" and "ß" in Scheme (reading German text from a file)


I'm new to Scheme and am trying to build a simple application in it. I want to use plain text files as a database: the program loads text from a file, does something with it, and saves the result to a (possibly different) file. My problem is that the text I'm working with is in German, so I need the Latin-1 characters Ä/ä, Ö/ö, Ü/ü and ß. I am using Chez Scheme and I have read about transcoders (R6RS), but I can't get them working.

Could someone please give me a simple example of how to read text from a file (input port) and save text to a file (output port) using the Latin-1 codec in Chez Scheme?

Thank you very much.

With the information in the R6RS report, the R6RS standard libraries document (R6RS-lib) and the Chez Scheme User's Guide, I couldn't solve the problem, and I couldn't find any further explanation online. If someone knows a good source of learning material, I would appreciate it.


Solution

  • With strict R6RS, you have to use the 4-argument version of open-file-input-port/open-file-output-port with a transcoder returned by (make-transcoder (latin-1-codec)) to get a textual port that converts from (or to) ISO-8859-1. Unfortunately, there's no version of call-with-input-file, with-input-from-file, etc. that lets you specify a transcoder, so you have to close the returned port yourself (or pass it to R6RS call-with-port, which closes it when its procedure returns). Chez also has a current-transcoder parameter that can be used to change the default transcoder, which lets you use those convenience procedures after all.

    Examples for input (the same concepts apply to output):

    #!r6rs
    
    (import (rnrs base)
            (rnrs io simple)
            (rnrs io ports)
            (only (chezscheme) current-transcoder parameterize))
    
    (define filename "sample.txt") ; A file with Latin-1 text
    
    ;; Read one line from a port and echo it to current-output-port
    (define (echo-line p)
      (display (get-line p))
      (newline))
    
    ;;; R6RS
    (define p (open-file-input-port filename
                                    (file-options)
                                    (buffer-mode block)
                                    (make-transcoder (latin-1-codec))))
    (echo-line p)
    (close-port p)
    
    ;;; Chez specific
    (parameterize ((current-transcoder (make-transcoder (latin-1-codec))))
      (call-with-input-file filename echo-line))
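The examples above only cover input. The following is a minimal sketch of the output side plus a round trip, assuming strict R6RS as implemented by Chez Scheme and assuming this program file itself is saved as UTF-8 so that the string literal with umlauts is read correctly. It uses R6RS call-with-port so that no port has to be closed by hand. The file name "result.txt" and the helpers write-latin-1-file and read-latin-1-file are hypothetical names chosen for illustration.

```scheme
#!r6rs

(import (rnrs base)
        (rnrs io simple)
        (rnrs io ports))

;; Hypothetical output file, used only for this illustration.
(define out-file "result.txt")

;; One transcoder shared by both directions: ISO-8859-1 (Latin-1).
(define latin-1 (make-transcoder (latin-1-codec)))

;; Write TEXT to FILE encoded as Latin-1. Plain (file-options) would raise
;; an exception if FILE already exists; no-fail truncates it instead.
;; call-with-port closes the port once the lambda returns.
(define (write-latin-1-file file text)
  (call-with-port
   (open-file-output-port file
                          (file-options no-fail)
                          (buffer-mode block)
                          latin-1)
   (lambda (port) (put-string port text))))

;; Read the whole of FILE as a string, decoding it as Latin-1.
;; get-string-all returns the eof object for an empty file; map that to "".
(define (read-latin-1-file file)
  (call-with-port
   (open-file-input-port file
                         (file-options)
                         (buffer-mode block)
                         latin-1)
   (lambda (port)
     (let ((text (get-string-all port)))
       (if (eof-object? text) "" text)))))

;; Round trip: save German text as Latin-1, read it back and echo it.
(write-latin-1-file out-file "Äpfel, Öl, Übung und Straße")
(display (read-latin-1-file out-file))
(newline)
```

In the resulting file each character occupies a single byte: "Ä" is stored as #xC4, whereas a UTF-8 file would contain the two bytes #xC3 #x84. Opening the file with the 1-argument open-file-input-port gives a binary port, which is a quick way to check this.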