I would like to obfuscate a text file report without obscuring certain keywords, like report titles, column headers, etc. I've built such a program using newLisp. I'm trying to implement the functionality in Haskell from scratch. Here is the code I've got so far, which compiles and runs successfully for the case of simple obfuscation.
module Main where
import Data.Char (isAlpha, isNumber, isUpper, toUpper)
import System.Environment (getArgs)
import System.Random (getStdGen, randomR, StdGen)
helpMessage = [ "Usage: cat filename(s) | obfuscate [-x filename] > filename",
                "",
                "Obfuscates text files. This obliterates the text--there is no recovery. This",
                "is not encryption. It's simple, if slow, obfuscation.",
                "",
                "To include a list of words not to obfuscate, use the -x option. List one word",
                "per line in the file.",
                "" ]
data CLOpts = CLOpts { help :: Bool
                     , exceptionFileP :: Bool
                     , exceptionFile :: String }
main = do
    args <- getArgs
    if length args > 0
        then do let opts = parseCL args CLOpts { help=False, exceptionFileP=False, exceptionFile="" }
                if help opts
                    then do putStrLn $ unlines helpMessage
                    else do if exceptionFileP opts
                                then do exceptions <- readFile $ exceptionFile opts
                                        obf complexObfuscation $ lines exceptions
                                else do obf simpleObfuscation []
        else do obf simpleObfuscation []
    where obf f xs = do
            g <- getStdGen
            c <- getContents
            putStrLn $ f xs g c
parseCL :: [String] -> CLOpts -> CLOpts
parseCL [] opts = opts
parseCL ("-x":f:xs) opts = parseCL xs opts { exceptionFileP=True, exceptionFile=f }
parseCL (_:xs) opts = parseCL xs opts { help=True }
simpleObfuscation xs = obfuscate
complexObfuscation exceptions g c = undefined
obfuscate :: StdGen -> String -> String
obfuscate g = obfuscate' g []
    where
        obfuscate' _ a [] = reverse a
        obfuscate' g a text@(c:cs)
            | isAlpha c = obf obfuscateAlpha g a text
            | isNumber c = obf obfuscateDigit g a text
            | otherwise = obf id g a text
        obf f g a (c:cs) = let (x,g') = f (c,g) in obfuscate' g' (x:a) cs
obfuscateAlpha, obfuscateDigit :: (Char, StdGen) -> (Char, StdGen)
obfuscateAlpha (c,g) = obfuscateChar g range
    where range
            | isUpper c = ('A','Z')
            | otherwise = ('a','z')
obfuscateDigit (c,g) = obfuscateChar g ('0','9')
obfuscateChar :: StdGen -> (Char, Char) -> (Char, StdGen)
obfuscateChar = flip randomR
I cannot get my head around how to obfuscate all text except words passed in as exceptions. My newLisp implementation relied on its built-in regular expression handling. I've not had much luck using regular expressions in Haskell. Probably old libraries or something.
I've tried splitting the text up into lines and words and creating what in J would be called a fret. That approach is quickly getting unwieldy. I tried to use a parser, but I think that's going to get pretty hairy, too.
Does anyone have suggestions for a simple, straightforward approach to identifying exception words in the text and keeping them away from the obfuscate function? Haskell is such a brilliant language, surely I'm missing something right under my nose.
I tried Google, but it seems my desire for providing an exception list of words not to obfuscate is novel. Otherwise, obfuscation is quite simple.
Update
Following the idea I marked as the answer, I created my own words function:
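-- Requires isAlphaNum from Data.Char.
words' :: String -> [String]
words' text = f text [] []
    where f [] wa ta = reverse $ reverse wa : ta  -- wa is accumulated backwards, so flip the last word too
          f (c:cs) wa ta =
              if isAlphaNum c
                  then f cs (c:wa) ta
                  else f cs [] $ if length wa > 0 then [c]:(reverse wa):ta else [c]:ta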
Using break didn't work. I think mutual recursion with break and span would have worked, but I went with the code above before I thought of trying that.
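For reference, the break/span idea could look roughly like the sketch below. This is not the code I ended up using; the name splitRuns is made up for illustration. It alternates between taking a run of alphanumeric characters with span and a run of everything else with break, so concatenating the result gives back the original string.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical helper: alternate between a run of word characters
-- and a run of separator characters until the input is exhausted.
splitRuns :: String -> [String]
splitRuns [] = []
splitRuns s@(c:_)
    | isAlphaNum c = let (w, rest)   = span isAlphaNum s  in w   : splitRuns rest
    | otherwise    = let (sep, rest) = break isAlphaNum s in sep : splitRuns rest

main :: IO ()
main = print (splitRuns "Total: 42 items")
-- prints ["Total",": ","42"," ","items"]
```

Unlike words', this keeps consecutive separators together in one token, which makes no difference once the tokens are concatenated again.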
Then I implemented complexObfuscation as follows:
complexObfuscation exceptions g = unlines . map obfuscateLine . lines
    where obfuscateLine = concatMap obfuscateWord . words'
          obfuscateWord word =
              if word `elem` exceptions
                  then word
                  else obfuscate g word
This accomplished what I was after. Unfortunately, I did not anticipate that the same generator would generate the same characters with every call to obfuscate. So each word starts with the same characters. Lol. A problem for another day.
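One possible way around the repeated-generator problem is to thread the generator through every word instead of reusing the original g. The sketch below is untested against the full program and assumes the tokens have already been split; obfuscateCharG, obfuscateWordG and obfuscateTokens are names invented here. It uses mapAccumL from Data.List, which carries an accumulator (the generator) from one element to the next.

```haskell
import Data.Char (isAlpha, isNumber, isUpper)
import Data.List (mapAccumL)
import System.Random (StdGen, mkStdGen, randomR)

-- Replace one character and return the advanced generator first,
-- matching mapAccumL's (acc -> x -> (acc, y)) shape.
obfuscateCharG :: StdGen -> Char -> (StdGen, Char)
obfuscateCharG g c
    | isAlpha c  = draw (if isUpper c then ('A','Z') else ('a','z'))
    | isNumber c = draw ('0','9')
    | otherwise  = (g, c)
  where draw range = let (x, g') = randomR range g in (g', x)

-- Obfuscate one word and hand back the generator state that follows it.
obfuscateWordG :: StdGen -> String -> (StdGen, String)
obfuscateWordG = mapAccumL obfuscateCharG

-- Obfuscate every token that is not an exception, passing the
-- generator left by one word on to the next.
obfuscateTokens :: [String] -> StdGen -> [String] -> [String]
obfuscateTokens exceptions g = snd . mapAccumL step g
  where step gen w
          | w `elem` exceptions = (gen, w)
          | otherwise           = obfuscateWordG gen w

main :: IO ()
main = putStrLn . concat $
    obfuscateTokens ["Total"] (mkStdGen 42) ["Total", ":", " ", "alpha", " ", "alpha"]
```

Because each word starts from the generator state the previous word left behind, the two occurrences of "alpha" should no longer be replaced by the same letters.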
Read the exceptions file and build a Data.Set.Set. After splitting the input file into lines, split it further into words. Then, obfuscate each word individually. If a word is an element of the Set you built before, leave it as it is. Otherwise, apply your obfuscate function to each character.
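A minimal sketch of those steps, assuming the containers package is available for Data.Set. The names tokens, protect and mask are made up for illustration, and mask is only a stand-in for the real obfuscate function so that the output is predictable.

```haskell
import Data.Char (isAlphaNum)
import Data.List (groupBy)
import qualified Data.Set as Set

-- Split a line into runs of alphanumerics and runs of everything else.
tokens :: String -> [String]
tokens = groupBy (\a b -> isAlphaNum a == isAlphaNum b)

-- Apply a transformation to every token that is not in the exception set.
protect :: Set.Set String -> (String -> String) -> String -> String
protect exceptions transform = unlines . map (concatMap keepOrChange . tokens) . lines
  where keepOrChange w
          | w `Set.member` exceptions = w
          | otherwise                 = transform w

-- Stand-in for obfuscate: mask letters and digits, keep everything else.
mask :: String -> String
mask = map (\c -> if isAlphaNum c then '#' else c)

main :: IO ()
main = do
    let exceptions = Set.fromList (lines "Report\nTotal\n")
    putStr (protect exceptions mask "Report for Alice\nTotal: 42\n")
-- prints:
-- Report ### #####
-- Total: ##
```

Set.member is a logarithmic-time lookup, which matters only if the exception list is long; for a handful of words a plain list with elem behaves the same.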