Search code examples
haskelltypestype-systems

In Haskell, how do you restrict functions to only one constructor of a data type?


I'm not sure how to word this question. Say I'm trying to pass the paths of tmpfiles around, and I want to capture the idea that there are different formats of tmpfile, and each function only works on one of them. This works:

data FileFormat
  = Spreadsheet
  | Picture
  | Video
  deriving Show

data TmpFile = TmpFile FileFormat FilePath
  deriving Show

videoPath :: TmpFile -> FilePath
videoPath (TmpFile Video p) = p
videoPath _ = error "only works on videos!"

But there must be a better way to write it without runtime errors right? I thought of two alternatives, this:

type TmpSpreadsheet = TmpFile Spreadsheet
type TmpPicture     = TmpFile Picture
type TmpVideo       = TmpFile Video

videoPath :: TmpVideo -> FilePath

Or this:

data TmpFile a = TmpFile a FilePath
  deriving Show

videoPath :: TmpFile Video -> FilePath

But obviously they don't compile. What's the proper way to do it? Some other ideas, none particularly appealing:

  • Wrap TmpFile in the format instead of the other way around, so the values are Video (TmpFile "test.avi") etc.
  • Make lots of separate data types VideoTmpFile, PictureTmpFile etc.
  • Make a TmpFile typeclass
  • Use partial functions everywhere, but add guard functions to abstract the pattern matching

I also considered learning the -XDataKinds extension, but suspect I'm missing something much simpler that can be done without it.

EDIT: I'm learning a lot today! I tried both the approaches outlined below (DataKinds and phantom types, which have dummy value constructors that can be removed with another extension), and they both work! Then I tried to go a little further. They both let you make a nested type TmpFile (ListOf a) in addition to regular TmpFile a, which is cool. But I've tentatively decided to go with plain phantom types (intact value constructors), because you can pattern match on them. For example, I was surprised that this actually works:

data Spreadsheet = Spreadsheet deriving Show
data Picture     = Picture     deriving Show
data Video       = Video       deriving Show
data ListOf a    = ListOf a    deriving Show

data TmpFile a = TmpFile a FilePath
  deriving Show

videoPath :: TmpFile Video -> FilePath
videoPath (TmpFile Video p) = p

-- read a file that contains a list of filenames of type a,
-- and return them as individual typed tmpfiles
listFiles :: TmpFile (ListOf a) -> IO [TmpFile a]
listFiles (TmpFile (ListOf fmt) path) = do
  txt <- readFile path
  let paths = map (TmpFile fmt) (lines txt)
  return paths

vidPath :: TmpFile Video
vidPath = TmpFile Video "video1.txt"

-- $ cat videos.txt
-- video1.avi
-- video2.avi
vidsList :: TmpFile (ListOf Video)
vidsList = TmpFile (ListOf Video) "videos.txt"

main :: IO [FilePath]
main = do
  paths <- listFiles vidsList  -- [TmpFile Video "video1.avi",TmpFile Video "video2.avi"]
  return $ map videoPath paths -- ["video1.avi","video2.avi"]

As far as I can tell, the equivalent with DataKinds is very similar, but can't access fmt as a value:

{-# LANGUAGE DataKinds, KindSignatures #-}

data FileFormat
  = Spreadsheet
  | Picture
  | Video
  | ListOf FileFormat
  deriving Show

data TmpFile (a :: FileFormat) = TmpFile FilePath
  deriving Show

vidPath :: TmpFile Video
vidPath = TmpFile "video.avi"

vidsList :: TmpFile (ListOf Video)
vidsList = TmpFile "videos.txt"

videoPath :: TmpFile Video -> FilePath
videoPath (TmpFile p) = p

listFiles :: TmpFile (ListOf a) -> IO [TmpFile a]
listFiles (TmpFile path) = do
  txt <- readFile path
  let paths = map TmpFile (lines txt)
  return paths

main :: IO [FilePath]
main = do
  paths <- listFiles vidsList
  return $ map videoPath paths

(It may seem like a weird thing to want, but my actual program is going to be an interpreter for a small language that compiles to Shake rules with a tmpfile corresponding to each variable, so typed lists of tmpfiles will be useful)

Does that seem right? I like the idea of DataKinds better, so I would go with it instead if I could inspect them as values, or if it turns out that's never needed.


Solution

  • You're right: with -XDataKinds, the TmpFile Video -> FilePath approach would work. And indeed I think this may be a good application for that extension.

    {-# LANGUAGE DataKinds #-}
    
    data TmpFile (a :: FileFormat) = TmpFile FilePath
      deriving Show
    
    videoPath :: TmpFile Video -> FilePath
    

    The reason you need this extension to write TmpFile Video is that the constructors of FileFormat are ab initio value-level (thus only exist at runtime), while TmpFile is type-level / compile-time.

    Of course there's another way to generate type-level entities: define types!

    data Spreadsheet = Spreadsheet
    data Picture = Picture
    data Video = Video
    
    data TmpFile a = TmpFile a FilePath
      deriving Show
    
    videoPath :: TmpFile Video -> FilePath
    

    Such types are called phantom types. But really, they're a bit of a hack to work around the former lack of proper type-level values, which DataKinds has now given us. So, unless you need compatibility with old compilers, do use DataKinds!

    An alternative would be to not enforce the file type at compile time, but simply make it explicit that the functions are partial.

    data TmpFile = TmpFile FileFormat FilePath
      deriving Show
    
    videoPath :: TmpFile -> Maybe FilePath
    videoPath (TmpFile Video p) = p
    videoPath _ = Nothing
    

    In fact, that approach might well be the more rational one, depending on what you're planning to do.