
I want to read a file, remove duplicate lines and write (unique lines) into another file


I am new to Haskell and I'm trying to work out this problem. Given two files, an input file and an output file, I want to read from the input file, remove duplicate lines, then write the unique lines into the output file. I managed to read the file and remove duplicates, but now I'm having trouble writing to the file.

This is my thought process so far:

onlyUnique :: FilePath -> FilePath -> IO ()
onlyUnique inputFile outputFile = do
-- read input file 
-- remove duplicated lines in file 
-- writes remaining lines into outputFile 
  contents <- readFile inputFile
  let noDups = nub (words contents)
  fileNoDups <- foldr (\x acc -> (writeFile acc x)) outputFile noDups

The issue with this solution is that writeFile has the wrong type, but I don't know how to go through the list of strings and write them to the file without using foldr. Recursion? But how would I do that in this case?

OR

onlyUnique :: FilePath -> FilePath -> IO ()
onlyUnique inputFile outputFile = do
-- read input file 
-- remove duplicated lines in file 
-- writes remaining lines into outputFile 
  contents <- readFile inputFile
  let noDups = nub (words contents)
  when (length noDups > 0) $ 
     writeFile outputFile (noDups)

Maybe I could create a helper function? But what would that look like? I'm just confused.


Solution

  • You said in your question that you want to remove duplicate lines, but your program uses words, which splits the input on whitespace, so it removes duplicate words instead.

    Otherwise, you're on the right track. Assuming it's lines you're interested in, then you can use the pair of functions lines (which breaks a String into a list of lines) and unlines (which glues a list of lines back together into a single string) to do what you want.

    import Data.List (nub)

    onlyUnique :: FilePath -> FilePath -> IO ()
    onlyUnique inputFile outputFile = do
      -- read input file 
      contents <- readFile inputFile
      -- remove duplicated lines in file
      let noDups = nub (lines contents)
      -- put lines back together
      let output = unlines noDups
      -- write the result to output file
      writeFile outputFile output
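
On the foldr attempt in the question: writeFile has type FilePath -> String -> IO () and replaces the whole file on every call, so folding it over the list cannot accumulate lines. If you do want to write the lines one at a time, a minimal sketch (not the only way to do it) is to open the output file once and run hPutStrLn for each line with mapM_. The names uniqueLines and onlyUniqueByLine are hypothetical, introduced here only for illustration, and the sketch assumes the input and output paths are different files.

```haskell
import Data.List (nub)
import System.IO (IOMode (WriteMode), hPutStrLn, withFile)

-- Pure core (hypothetical helper): split into lines and drop repeats,
-- keeping the first occurrence of each line.
uniqueLines :: String -> [String]
uniqueLines = nub . lines

-- Hypothetical variant of onlyUnique that writes one line at a time.
-- Assumes inputFile and outputFile are different paths.
onlyUniqueByLine :: FilePath -> FilePath -> IO ()
onlyUniqueByLine inputFile outputFile = do
  contents <- readFile inputFile
  -- Open the output file once; mapM_ runs hPutStrLn for every
  -- line in order and discards the unit results.
  withFile outputFile WriteMode $ \h ->
    mapM_ (hPutStrLn h) (uniqueLines contents)
```

Loaded in GHCi, onlyUniqueByLine "input.txt" "output.txt" should produce the same file as the unlines version above. Keeping the pure part separate from the IO also makes it easy to swap nub for something faster later, since nub compares every line against all the lines kept so far.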