Tags: smalltalk, squeak, pharo, file-type, visualworks

How to identify binary and text files using Smalltalk


I want to verify that the file at a given path is a text file, i.e. not binary, i.e. readable by a human. I guess I could read the first characters and check each one with:

  • isAlphaNumeric
  • isSpecial
  • isSeparator
  • isOctetCharacter ???

but chaining all those tests with and: [ ... and: [ ... and: [ ] ] ] does not seem very Smalltalkish. Any suggestions for a more elegant way?

(There is a Python version, "How to identify binary and text files using Python?", which could be useful, but its syntax and implementation look like C.)


Solution

  • There are only heuristics; you can never be completely certain.

    For ASCII, the following may do:

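    |isPlausibleAscii numChecked isPossiblyText|

    "text is assumed to hold the file's contents (or at least its beginning) as a String"
    isPlausibleAscii :=
        [:char |
            "printable ASCII (space to tilde; 127 is DEL, a control character) or whitespace"
            (char codePoint between: 32 and: 126)
                or: [char isSeparator]].

    "check at most the first 1024 characters"
    numChecked := text size min: 1024.
    isPossiblyText := text from: 1 to: numChecked conform: isPlausibleAscii.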
    

    For Unicode (UTF-8?), things become more difficult; you could try to decode the bytes, and if decoding raises an error, assume the file is binary.

    PS: If your dialect does not have from:to:conform:, use (text copyFrom: 1 to: numChecked) conform: isPlausibleAscii instead.

    PPS: If you don't have conform: either, try allSatisfy:.
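The decode-and-catch idea for UTF-8 might look roughly like the sketch below. It assumes Pharo 7 or later, where ByteArray>>utf8Decoded signals an Error on malformed input; other dialects (Squeak, VisualWorks, Smalltalk/X) name their UTF-8 decoders differently, so treat this as an illustration rather than portable code. The block names decodedOrNil and looksLikeText are made up for this example. Because NUL and other control bytes are perfectly valid UTF-8, the sketch also rejects control characters other than whitespace, mirroring the ASCII test above.

```smalltalk
| decodedOrNil looksLikeText |

"Answer the decoded String, or nil if byteArray is not well-formed UTF-8.
 Assumes Pharo 7+, where ByteArray>>utf8Decoded signals an Error
 (a ZnCharacterEncodingError) on invalid input."
decodedOrNil := [:byteArray |
    [byteArray utf8Decoded]
        on: Error
        do: [:ex | ex return: nil]].

"Heuristic: the bytes decode as UTF-8 AND contain no control characters
 except whitespace (tab, CR, LF, ...). DEL (127) is rejected too."
looksLikeText := [:byteArray |
    | decoded |
    decoded := decodedOrNil value: byteArray.
    decoded notNil
        and: [decoded allSatisfy: [:char |
                (char codePoint >= 32 and: [char codePoint ~= 127])
                    or: [char isSeparator]]]].

looksLikeText value: 'hello world' utf8Encoded.   "plain text encoded as UTF-8"
looksLikeText value: #[255 254 0 1].               "255 can never occur in UTF-8"
looksLikeText value: #[0 0 0 0].                   "valid UTF-8, but only NUL bytes"
```

To apply it to a file in Pharo, something like 'example.txt' asFileReference binaryReadStreamDo: [:stream | stream next: 1024] could supply the leading bytes (the path is hypothetical). Note that cutting the input at a fixed byte count can split a multi-byte character at the end, which a strict decoder may report as an error; trimming an incomplete trailing sequence, or checking the whole file, avoids misclassifying such text files as binary.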