Tags: haskell, text, functional-programming, obfuscation, purely-functional

Haskell Selective Text Obfuscation


I would like to obfuscate a text file report without obscuring certain keywords, like report titles, column headers, etc. I've built such a program using newLisp. I'm trying to implement the functionality in Haskell from scratch. Here is the code I've got so far, which compiles and runs successfully for the case of simple obfuscation.

module Main where

import Data.Char (isAlpha, isAlphaNum, isNumber, isUpper)
import System.Environment (getArgs)
import System.Random (getStdGen, randomR, StdGen)

helpMessage = [ "Usage: cat filename(s) | obfuscate [-x filename] > filename",
  "",
  "Obfuscates text files. This obliterates the text--there is no recovery. This",
  "is not encryption. It's simple, if slow, obfuscation.",
  "",
  "To include a list of words not to obfuscate, use the -x option. List one word",
  "per line in the file.",
  "" ]

data CLOpts = CLOpts { help           :: Bool
                     , exceptionFileP :: Bool
                     , exceptionFile  :: String }

main = do
  args <- getArgs
  if length args > 0
  then do let opts = parseCL args CLOpts { help=False, exceptionFileP=False, exceptionFile="" }
          if help opts
          then do putStrLn $ unlines helpMessage
          else do if exceptionFileP opts
                  then do exceptions <- readFile $ exceptionFile opts
                          obf complexObfuscation $ lines exceptions
                  else do obf simpleObfuscation []
  else do obf simpleObfuscation []
  where obf f xs = do
          g <- getStdGen
          c <- getContents
          putStrLn $ f xs g c

parseCL :: [String] -> CLOpts -> CLOpts
parseCL []          opts = opts
parseCL ("-x":f:xs) opts = parseCL xs opts { exceptionFileP=True, exceptionFile=f }
parseCL      (_:xs) opts = parseCL xs opts { help=True }

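simpleObfuscation :: [String] -> StdGen -> String -> String
simpleObfuscation _ = obfuscate

complexObfuscation :: [String] -> StdGen -> String -> String
complexObfuscation exceptions g c = undefined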

obfuscate :: StdGen -> String -> String
obfuscate g = obfuscate' g []
  where
    obfuscate' _ a [] = reverse a
    obfuscate' g a text@(c:cs)
      | isAlpha  c = obf obfuscateAlpha g a text
      | isNumber c = obf obfuscateDigit g a text
      | otherwise  = obf id             g a text
    obf f g a (c:cs) = let (x,g') = f (c,g) in obfuscate' g' (x:a) cs

obfuscateAlpha, obfuscateDigit :: (Char, StdGen) -> (Char, StdGen)
obfuscateAlpha (c,g) = obfuscateChar g range
  where range
          | isUpper c = ('A','Z')
          | otherwise = ('a','z')

obfuscateDigit (c,g) = obfuscateChar g ('0','9')

obfuscateChar :: StdGen -> (Char, Char) -> (Char, StdGen)
obfuscateChar = flip randomR

I can't get my head around how to obfuscate all of the text except the words passed in as exceptions. My newLisp implementation relied on its built-in regular expression handling. I haven't had much luck with regular expressions in Haskell, perhaps because the libraries I found are old, or because of something I'm missing.

I've tried splitting the text up into lines and words and creating what in J would be called a fret. That approach is quickly getting unwieldy. I tried to use a parser, but I think that's going to get pretty hairy, too.

Does anyone have suggestions for a simple, straightforward approach to identifying the exception words in the text and keeping them from being passed to the obfuscate function? Haskell is such a brilliant language; surely I'm missing something right under my nose.

I tried Google, but supplying an exception list of words not to obfuscate seems to be an unusual requirement. Otherwise, obfuscation is quite simple.

Update

Following the idea I marked as the answer, I created my own words function:

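words' :: String -> [String]
words' text = f text [] []
  where
    -- wa: the word currently being built, stored in reverse
    -- ta: the finished tokens, stored in reverse
    f [] wa ta = reverse (flush wa ta)
    f (c:cs) wa ta
      | isAlphaNum c = f cs (c:wa) ta
      | otherwise    = f cs [] ([c] : flush wa ta)
    -- Push the pending word (if any) onto the tokens in reading order.
    flush wa ta = if null wa then ta else reverse wa : ta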

Using break didn't work. I think mutual recursion with break and span would have worked, but I went with the code above before I thought of trying that.
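A minimal sketch of that span/break idea, written as two mutually recursive helpers. The names `wordTokens`, `sepTokens`, and `nonEmpty` are hypothetical. Like `words'`, it assumes a word is a run of `isAlphaNum` characters; unlike `words'`, a run of consecutive separators comes back as one token rather than one token per character. Concatenating the result always reproduces the input, so nothing is lost when the tokens are glued back together.

```haskell
import Data.Char (isAlphaNum)

-- Split text into alternating runs of word characters and separators.
-- wordTokens takes a run of isAlphaNum characters with span and hands the
-- rest to sepTokens, which takes a run of separators with break.
wordTokens, sepTokens :: String -> [String]
wordTokens [] = []
wordTokens s  = let (w, rest) = span isAlphaNum s in nonEmpty w (sepTokens rest)
sepTokens []  = []
sepTokens s   = let (sep, rest) = break isAlphaNum s in nonEmpty sep (wordTokens rest)

-- Skip the empty run produced when the text starts with the other kind of character.
nonEmpty :: String -> [String] -> [String]
nonEmpty [] ts = ts
nonEmpty t  ts = t : ts
```

For example, `wordTokens "Total: 42 items."` yields `["Total", ": ", "42", " ", "items", "."]`.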

Then I implemented complexObfuscation as follows:

complexObfuscation exceptions g = unlines . map obfuscateLine . lines
  where obfuscateLine = concatMap obfuscateWord . words'
        obfuscateWord word =
          if word `elem` exceptions
          then word
          else obfuscate g word

This accomplished what I was after. Unfortunately, I did not anticipate that passing the same generator to every call to obfuscate would produce the same sequence of random characters each time, so every obfuscated word starts with the same characters. Lol. A problem for another day.
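One way around the repeated characters is to thread the generator through the words with `Data.List.mapAccumL`, so each word starts from the generator the previous word left behind. This is a sketch under assumptions: the generator type is left abstract so it only needs `base`, the text is assumed to be tokenized already (for example with `words'`), and `obfuscateTokens`, `keep`, and `obfChar` are hypothetical names.

```haskell
import Data.List (mapAccumL)

-- Obfuscate a list of tokens, passing the generator from one token to the
-- next instead of reusing the same generator for every word.
--   keep    : True for tokens that must stay as they are (e.g. exceptions)
--   obfChar : replaces one character and returns the updated generator
obfuscateTokens :: (String -> Bool)
                -> (g -> Char -> (g, Char))
                -> g -> [String] -> (g, [String])
obfuscateTokens keep obfChar = mapAccumL step
  where
    step g tok
      | keep tok  = (g, tok)                 -- kept token: text and generator unchanged
      | otherwise = mapAccumL obfChar g tok  -- thread g across the token's characters
```

With the question's code, `keep` could be ``(`elem` exceptions)`` and `obfChar` could call `obfuscateAlpha` or `obfuscateDigit` and flip the resulting pair with `Data.Tuple.swap`, returning other characters and the generator unchanged. Because the final generator is returned as well, it can be carried on to the next line.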


Solution

  • Read the exceptions file and build a Data.Set.Set.

    After splitting the input file into lines, split it further into words.

    Then, obfuscate each word individually. If a word is an element of the Set you built before, leave it as it is. Otherwise, apply your obfuscate function to each character.
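A minimal sketch of that approach, assuming the `containers` package (which provides `Data.Set`) and treating a word as a run of `isAlphaNum` characters found with `groupBy`. The names `exceptionSet`, `selectiveObfuscate`, and `obfWord` are hypothetical. Here `obfWord` is a pure stand-in for the per-word obfuscation; a random version would also need to pass its generator from word to word.

```haskell
import Data.Char (isAlphaNum)
import Data.Function (on)
import Data.List (groupBy)
import qualified Data.Set as Set

-- Build the exception set from the contents of the -x file (one word per line;
-- 'words' also strips stray spaces and carriage returns).
exceptionSet :: String -> Set.Set String
exceptionSet = Set.fromList . words

-- Split each line into word and separator runs, obfuscate the words that are
-- not exceptions, and copy separators and exception words through unchanged.
selectiveObfuscate :: Set.Set String -> (String -> String) -> String -> String
selectiveObfuscate exceptions obfWord = unlines . map obfuscateLine . lines
  where
    obfuscateLine = concatMap obfuscateToken . groupBy ((==) `on` isAlphaNum)
    obfuscateToken tok
      | all isAlphaNum tok && not (tok `Set.member` exceptions) = obfWord tok
      | otherwise = tok
```

For example, `selectiveObfuscate (exceptionSet "TOTAL\nName\n") (map (const 'x')) "Name: Bob\nTOTAL 42\n"` gives `"Name: xxx\nTOTAL xx\n"`. Using a `Set` makes each lookup logarithmic in the number of exceptions, whereas `elem` on a list scans the whole list for every word.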