How to read a CSV Format from an URL in Haskell


I started learning Haskell 24 hours ago. I’m starting to understand it, but it’s not as simple as Python for me yet. I’ve tried a lot of things and read the docs, but I’m not making progress: I can’t read CSV data from a URL. I suspect I have an issue with the ByteString library. Does anyone have an idea?

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Conduit (simpleHttp)
import Data.Csv
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector as V

readCSV :: String -> IO (Either String (V.Vector BL.ByteString))
readCSV url = do
    csvData <- simpleHttp url
    return $ decode HasHeader csvData

main :: IO ()
main = do
    let url = "https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2023-12-30&endtime=2023-12-31&minlatitude=32&maxlatitude=42&minlongitude=-124&maxlongitude=-114&minmagnitude=3"
    result <- readCSV url
    case result of
        Left err -> print err
        Right csv -> print csv

EDIT:

The error message.

    • Couldn't match expected type ‘BL.ByteString’
                  with actual type ‘bytestring-0.11.5.3:Data.ByteString.Lazy.Internal.ByteString’
      NB: ‘BL.ByteString’
            is defined in ‘Data.ByteString.Lazy.Internal’
                in package ‘bytestring-0.12.0.2’
          ‘bytestring-0.11.5.3:Data.ByteString.Lazy.Internal.ByteString’
            is defined in ‘Data.ByteString.Lazy.Internal’
                in package ‘bytestring-0.11.5.3’
    • In the second argument of ‘decode’, namely ‘csvData’
      In the second argument of ‘($)’, namely ‘decode HasHeader csvData’
      In a stmt of a 'do' block: return $ decode HasHeader csvData
   |
11 |     return $ decode HasHeader csvData

Solution

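    I wasn't able to reproduce your error, but the message shows two different versions of bytestring (0.11.5.3 and 0.12.0.2) in the same build, so the lazy ByteString returned by simpleHttp is a different type from the one decode expects. A solution might be to loosen the version bounds in your cabal file so that every package is built against the same bytestring.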

    However, your code won't work even after we fix that specific error, so let's work through the actual issue.

    In the type signature readCSV :: String -> IO (Either String (V.Vector BL.ByteString)), the second type argument to Either, here V.Vector BL.ByteString, tells decode which type to decode the data into: the element type of the vector is what each row of the file becomes. So we are asking it to decode each row into a single bytestring. That doesn't really make sense: a CSV consists of rows (records) made up of fields, and we are telling decode to lump all the fields of a record into one value. Cassava has no FromRecord instance for a lone ByteString, so this doesn't compile either.
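To see that the element type is what drives decode, here is a small sketch that decodes the same inline CSV into two different vector types. The sample data and the names sample, asPairs and asLists are hypothetical, chosen only for illustration; it needs the cassava and vector packages but no network access.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (..), decode)
import qualified Data.Vector as V

-- Hypothetical inline CSV: one header line and two data rows.
sample :: BL.ByteString
sample = "place,mag\nA,3.5\nB,4.25\n"

-- Each row becomes a (String, Double) pair via the tuple FromRecord instance.
asPairs :: Either String (V.Vector (String, Double))
asPairs = decode HasHeader sample

-- The same rows, this time as lists of String fields.
asLists :: Either String (V.Vector [String])
asLists = decode HasHeader sample

main :: IO ()
main = do
  print asPairs
  print asLists
```

Only the type annotations differ between asPairs and asLists; decode picks the matching FromRecord instance from them.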

    Instead, what we really want is to decode the file into a vector of records, and cassava has a type alias for exactly that: Record (a vector of fields, each field a strict ByteString). If we change our type signature to readCSV :: String -> IO (Either String (V.Vector Record)), our code works! If you run it now, it prints the data as a list of lists of fields.
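A minimal sketch of the question's program with only that type changed, assuming the bytestring version mismatch has already been resolved and the http-conduit, cassava and vector packages are available. The pure helper decodeRecords is a hypothetical name introduced here so that the decoding step can be tried without a network connection.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (..), Record, decode)
import qualified Data.Vector as V
import Network.HTTP.Conduit (simpleHttp)

-- Hypothetical helper: decode a CSV body (skipping the header row)
-- into raw records, i.e. vectors of strict ByteString fields.
decodeRecords :: BL.ByteString -> Either String (V.Vector Record)
decodeRecords = decode HasHeader

-- Download the CSV body and decode it.
readCSV :: String -> IO (Either String (V.Vector Record))
readCSV url = decodeRecords <$> simpleHttp url

main :: IO ()
main = do
  let url = "https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2023-12-30&endtime=2023-12-31&minlatitude=32&maxlatitude=42&minlongitude=-124&maxlongitude=-114&minmagnitude=3"
  result <- readCSV url
  case result of
    Left err -> putStrLn err
    Right csv -> print csv
```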

    That probably won't be quite satisfactory yet, because all the fields are now ByteStrings and the only way to select a specific field is to index it with an integer. Not very convenient or type-safe, I would say.
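For comparison, this is roughly what pulling one column out of a raw Record looks like. It assumes latitude is the second column (index 1), which is also what the FromRecord instance below relies on; latitudeOf is a hypothetical helper. Each field has to be looked up by position and then parsed by hand with cassava's parseField and runParser.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Csv (Record, parseField, runParser)
import qualified Data.Vector as V

-- Hypothetical helper: read the field at index 1 of a raw record and
-- parse it as a Double. Nothing stops us from using the wrong index.
latitudeOf :: Record -> Either String Double
latitudeOf r = case r V.!? 1 of
  Nothing -> Left "record has no field at index 1"
  Just f -> runParser (parseField f)

main :: IO ()
main = print (latitudeOf (V.fromList ["2023-12-30T00:00:00Z", "36.5", "-121.25"]))
```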

    Instead, we'd probably like to have our own type to represent the data, something like this (shortened for brevity):

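    -- Only the first three columns of the data: time, latitude, longitude
    data EarthQuake = EarthQuake
        { time :: String
        , latitude :: Double
        , longitude :: Double
        }
        deriving (Show) -- needed so that main can print the decoded vector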
    

    But of course decode can't magically know how to use this type, so we can't just slap it into the type signature and be done with it. We have to make the type an instance of the type class FromRecord:

    instance FromRecord EarthQuake where
        -- v is the raw record; (.!) parses the field at the given index
        parseRecord v =
            EarthQuake
            <$> v .! 0
            <*> v .! 1
            <*> v .! 2
    

    This instance tells decode how to convert a record into our own datatype. Now that we've done that, we can change our type signature again: readCSV :: String -> IO (Either String (V.Vector EarthQuake)). Much more convenient.
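Putting the pieces together, a sketch of the whole program could look like the following. It assumes, as the instance above does, that time, latitude and longitude are the first three columns of the USGS CSV. parseEarthQuakes is a hypothetical helper added so that decoding can be tested without a network connection, and Eq is derived only to make that comparison possible.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import Data.Csv (FromRecord (..), HasHeader (..), decode, (.!))
import qualified Data.Vector as V
import Network.HTTP.Conduit (simpleHttp)

-- Only the first three columns of the data (assumed: time, latitude, longitude).
data EarthQuake = EarthQuake
  { time :: String
  , latitude :: Double
  , longitude :: Double
  }
  deriving (Show, Eq)

-- Build an EarthQuake from fields 0, 1 and 2 of each record;
-- any further columns in the row are ignored.
instance FromRecord EarthQuake where
  parseRecord v =
    EarthQuake
      <$> v .! 0
      <*> v .! 1
      <*> v .! 2

-- Hypothetical pure helper: skip the header row and decode every data row.
parseEarthQuakes :: BL.ByteString -> Either String (V.Vector EarthQuake)
parseEarthQuakes = decode HasHeader

readCSV :: String -> IO (Either String (V.Vector EarthQuake))
readCSV url = parseEarthQuakes <$> simpleHttp url

main :: IO ()
main = do
  let url = "https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2023-12-30&endtime=2023-12-31&minlatitude=32&maxlatitude=42&minlongitude=-124&maxlongitude=-114&minmagnitude=3"
  result <- readCSV url
  case result of
    Left err -> putStrLn err
    Right quakes -> mapM_ print quakes
```

If a row has a field that can't be parsed (for example a non-numeric latitude), decode returns Left with an error message instead of a partial result.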