
Pattern doesn't match in Haskell ByteString


I am writing a DNA translator in Haskell (using ByteStrings in particular). I have the following code:

import Data.Maybe
import Data.Monoid ((<>))
import System.Environment

import qualified Data.ByteString as B
import Data.ByteString.Lazy.Char8 (ByteString, singleton, splitWith)
import qualified Data.ByteString.Lazy as LB
  
-- Extract DNA sequence from fasta file
xtractDNA :: [ByteString] -> Maybe ByteString
xtractDNA dna = Just (LB.concat dna)
--xtractDNA = foldr ((<>) . Just) Nothing 

-- Reverse Complement DNA
compStrand :: Maybe ByteString -> Maybe ByteString
compStrand = foldr ((<>) . compPairs) Nothing
  where
    compPairs nt | nt == (singleton 'A') = Just (singleton 'T')
                 | nt == (singleton 'T') = Just (singleton 'A')
                 | nt == (singleton 'G') = Just (singleton 'C')
                 | nt == (singleton 'C') = Just (singleton 'G')
                 | otherwise = Nothing


main :: IO ()
main = do
  putStrLn "Welcome to volcano"
  let fname = "/home/trinix/Development/hs_devel/local_data/shbg.fasta"
  fid <- LB.readFile fname
  let dna = LB.concat $ tail (LB.splitWith (==10) fid)
  --putStrLn $ show (LB.length (head dna))
  let dsDna = compStrand (Just dna)
  print dsDna

When I run it, I get Nothing as the result. Part of the input is:

"AATTCTCCATGTGCTTGGATCGTGGGGAAGATGTGATTAAGGTCTAAGGTATGTCTTCCACCAGACAACGGACACAGTCAATTAGAAGCTGGGTAAAGGGGTCTCTCCTGCGGAGCGGGGAGCGCCAAGCCAGGGACAATAATGGCCTGAAGTTCATTCTCCCGGAGATGGGGGTAGAAGCAGGTGCAGGTGCCTTAGAGGGGTCAAAAATAAGAGGAACAGGGTTCACTCTAAGCGGTCTCCCAGGGAAGGCTGCGGGTTGGAGCAAGGGTCCAAGATTCTAAGGGCCAGGACTCAGCTCCAGAAGCTCGATCCCGCCCCACGCGTTCCTGCTCCGGCCAGGGGAGGGGGCTAAGGACCGGCGTCCCCAGTCGGCGCGCCGTCTCACCTTGTAGAAGGCCCCGTTGGAGCCGCGCACCTCGACGGGCAGTCCCGGCTCCACATCCCCCCCAGAGGCCAGGCCGCCCATGGCGCCGCCACCGCCTCCGACTCCCCCGGCGGCGGCTGCAGCAGCAGTCTGAGTGCGGGCCGGGCCAGGCCCCCGGCGTCTCCCCGGAGGAGGAGCCGGAGGGGGAGCCGCGGGGGGCGGGAGCCGGGCCGGCCCCACGGCGGCCCTGCCACAGCCAACGAGCAGGGGGCCGGGGCCGGGCCGCTCCCCGTCCGCCGCCGCCGCCTTGGTCTCCGCC...ACAAGGTCAGAGGCTGGATGTGGACCAGGCCCTGAACAGAAGCCATGAGATCTGGACTCACAGCTGCCCCCAGAGCCCAGGCAATGGCACTGACGCTTCCCATTAAAGCTCCACCTAAGAACCCCC"

I suspect that my pattern-matching guard has a problem. How can I find out, and how can I fix it? Any insights would be much appreciated.
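One way to test the guard in isolation is to call the fold on tiny inputs and compare a one-base string with a two-base one. A minimal sketch: compPairs is copied from the question with an explicit type signature added, and foldMaybe is a hypothetical name standing in for the question's compStrand.

```haskell
import Data.ByteString.Lazy.Char8 (ByteString, pack, singleton)

-- The question's compPairs, with an explicit type signature.
compPairs :: ByteString -> Maybe ByteString
compPairs nt
  | nt == singleton 'A' = Just (singleton 'T')
  | nt == singleton 'T' = Just (singleton 'A')
  | nt == singleton 'G' = Just (singleton 'C')
  | nt == singleton 'C' = Just (singleton 'G')
  | otherwise           = Nothing

-- The question's compStrand under a hypothetical name. The fold runs over
-- the Maybe (zero or one element), not over the bytes of the ByteString.
foldMaybe :: Maybe ByteString -> Maybe ByteString
foldMaybe = foldr ((<>) . compPairs) Nothing

main :: IO ()
main = do
  -- For Foldable, a Just holds exactly one element: the whole ByteString.
  print (length (Just (pack "AATTC")))
  -- One-byte input: compPairs receives "A" and the first guard matches.
  print (foldMaybe (Just (pack "A")))
  -- Two-byte input: compPairs receives "AT" as a whole and falls through.
  print (foldMaybe (Just (pack "AT")))
```

The one-base input is complemented while the two-base input falls through to otherwise, which points away from the guards themselves and towards what the fold actually passes to compPairs.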


Solution

  • You are using foldr with Maybe as the Foldable, not the ByteString. It therefore inspects the Maybe ByteString itself: if it is a Just, it calls compPairs once with the entire DNA ByteString; otherwise it returns Nothing.

    Your compPairs returns Nothing for any ByteString that is empty or has two or more bytes, so for a full DNA sequence the result is Nothing.

    You can use mapM :: Monad m => (a -> m b) -> [a] -> m [b] (here with m = Maybe) to build a Maybe String from the unpacked characters, and then convert it back to a ByteString:

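    import Data.ByteString.Lazy.Char8 (ByteString, pack, unpack)
    
    -- Unpack to a String, complement each base with mapM in the Maybe monad
    -- (one non-ACGT character makes the whole result Nothing), then pack back.
    compStrand :: Maybe ByteString -> Maybe ByteString
    compStrand = (>>= fmap pack . mapM comPairs . unpack)
        where comPairs 'A' = Just 'T'
              comPairs 'C' = Just 'G'
              comPairs 'G' = Just 'C'
              comPairs 'T' = Just 'A'
              comPairs _ = Nothing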
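The compStrand above only complements the strand, while the comment in the question says "Reverse Complement DNA". If the reverse complement is what is wanted (an assumption about the intended output, not part of the original answer), one option is to reverse after complementing, for example with Data.ByteString.Lazy.Char8.reverse. In this sketch revComp is a hypothetical name, and main also shows that mapM in the Maybe monad gives Nothing as soon as one character has no complement.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC

-- Complement a single base; any other character (for example '\n', '\r',
-- 'N' or a lowercase base) yields Nothing.
comPairs :: Char -> Maybe Char
comPairs 'A' = Just 'T'
comPairs 'C' = Just 'G'
comPairs 'G' = Just 'C'
comPairs 'T' = Just 'A'
comPairs _   = Nothing

-- Hypothetical helper: complement every base with mapM (Nothing on the
-- first unexpected character), pack, then reverse the packed ByteString.
revComp :: LC.ByteString -> Maybe LC.ByteString
revComp = fmap (LC.reverse . LC.pack) . mapM comPairs . LC.unpack

main :: IO ()
main = do
  print (mapM comPairs "ACGT")          -- Just "TGCA"
  print (mapM comPairs "ACNT")          -- Nothing
  print (revComp (LC.pack "AATTCTCC"))  -- Just "GGAGAATT"
```

Because any byte other than A, C, G or T yields Nothing, lowercase (soft-masked) bases, an N, or a stray '\r' left by Windows line endings in the FASTA file would also turn the whole result into Nothing.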