Search code examples
typestypeerrornim-lang

Nim : How to constrain an existing type


I have a question about type definition.

I would like to restrict an existing type to enforce certain additional criterion. For example, I would like to construct a type for a DNA string.

A DNA strand can be seen as an arbitrary long string of characters that only contains the characters 'A', 'C', 'G', 'T' (Nucleotides). Similarly, I would define a RNA string as a string with only characters 'A', 'C', 'G', 'U' .

A RNA string can be decomposed into codons, which is a string with only three characters among the four nucleotides ('A', 'C', 'G', 'U'). Can I make a codon type, that would automatically check (e.g. at the initialisation or after a type conversion), whether the string is of length 3 and does not contain any other characters than the ones valid ?

I have attempted to use a concept type :

var
  NucleotideSet: set[char] = {'A','C','G','U'}

type
  Nucleotide {.explain.} = concept var a
    a is char
    a in {'A','C','G','U'}

  RnaCodon = seq[Nucleotide]

but this experimental feature does not coerce existing type, it only checks if a type verifies some properties, but I might be mistaken.

What I want to do is to manipulate RNA strings without having to check manually that each character is indeed a Nucleotide.

With the definitions in my code above, the following fails :

echo 'A' is Nucleotide

I get a type mismatch : ''A'' is char but expected Nucleotide. What I have done wrong, in this example and how could I fix it to define a RNAstring and a codon ? My guess now is that in the concept type, a is not the type but the variable and I would probably need to write something like :

type
  Nucleotide {.explain.} = concept var a, type T
    a is T
    T is char
    a in {'A','C','G','U'}

but I get also a type mismatch error.


Solution

  • As far as you have explained the only problem you have is that you want to have a kind of variables were you are sure only a certain value is held. I'd use normal strings as a distinct type. The avoiding sql injection attacks section in the documentation explain how this could word:

    1. Create a distinct string for both a DNA and RNA strand.
    2. Create a validation/parsing function that converts any input to DNA or RNA distinct strings.
    3. For each input string, pass the input to those conversion functions. If they are valid DNA/RNA, your function returns the string converted to the distinct type, otherwise the input is discarded and/or an error is generated.
    4. From then on, other code uses only the distinct type, so you can't pass unvalidated strings there, and have confidence that the data sent to those procs has been validated.

    When you are using the distinct types you don't even need to check of a certain element of the distinct type is a nucleotide or not, since your input validation/conversion proc already deals with that once.