Tags: json, parsing, rust, serde, serde-json

How to parse complicated JSON data in Rust correctly?


I'm writing my own Magic: The Gathering implementation in Rust as an exercise in (futility and) learning the language, and I'm trying to parse the JSON data into data structures using serde and serde_json. The problem I'm running into is that some of the properties on the structs and enums in my data structure are IMPLIED by the data in the JSON, so some properties have to be run through functions when serde parses the JSON, and I don't know the right way to do that. I've tried using #[serde(default="parse_costs")], but I need to be able to pass arguments to the function that default calls.

Here's what the json data looks like:

{
  "library" : {
    "+2 Mace": [{
      "colorIdentity": [
        "W"
      ],
      "colors": [
        "W"
      ],
      "convertedManaCost": 2,
      "keywords": [
        "Equip"
      ],
      "layout": "normal",
      "manaCost": "{1}{W}",
      "manaValue": 2,
      "name": "+2 Mace",
      "subtypes": [
        "Equipment"
      ],
      "supertypes": [ ],
      "text": "Equipped creature gets +2/+2.\nEquip {3} ({3}: Attach to target creature you control. Equip only as a sorcery.)",
      "type": "Artifact — Equipment",
      "types": [
        "Artifact"
      ]
    }],
    // ... about 30,000 more cards that look more or less like the above.
  }
}

Here's an excerpt of what my rust implementation looks like:

#[derive(Debug, Clone, PartialEq, Eq, Hash, Deserialize, EnumString)]
enum Color {
    #[strum(
        serialize = "black",
        serialize = "b",
        serialize = "{black}",
        ascii_case_insensitive
    )] // All colors have these strum serialize attributes; I'm leaving them out for brevity.
    B,
    U,
    C,
    G,
    R,
    W,
    None,
}

// All structs have these derives; I'm leaving them out for brevity.
#[derive(Debug, Deserialize)]
struct Payment {
    color: Color,
    quantity: u8,
}

struct Cost {
    cost: HashMap<Color, u8>,
}

impl Cost {
    fn new(payments: Vec<Payment>) -> Self {
        let mut cost = HashMap::new();
        if payments.is_empty() {
            cost.insert(Color::None, 0);
        }
        // Sum the quantities per color, starting at 0 the first time a color is seen.
        for payment in payments {
            *cost.entry(payment.color).or_insert(0) += payment.quantity;
        }
        Self { cost }
    }
}

fn parse_costs(mana_cost: &str) -> Cost {
    let re = Regex::new(r"\{(\w+)\}").unwrap();
    let mut payments_vec: Vec<Payment> = Vec::new();

    for (_, [symbol]) in re.captures_iter(mana_cost).map(|c| c.extract()) {
        if let Ok(quantity) = symbol.parse::<u8>() {
            // Numeric symbols such as "{3}" are generic mana, stored as colorless.
            payments_vec.push(Payment { color: Color::C, quantity });
        } else {
            payments_vec.push(Payment { color: Color::from_str(symbol).unwrap(), quantity: 1 });
        }
    }
    Cost::new(payments_vec)
}
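If you'd like this parsing without the regex and strum dependencies, the same logic can be sketched with plain string slicing. This is a hypothetical alternative, not a drop-in replacement: `parse_mana_cost` and `Color::from_symbol` are made-up names, it returns a plain `HashMap` instead of `Cost`, it only understands single-letter color symbols, and it returns `None` instead of panicking on a symbol it doesn't recognize (for example a hybrid `{W/U}`, which the `\w+` regex above would silently skip).

```rust
use std::collections::HashMap;

/// Same variants as the question's `Color`; `Copy` is added so it can be used freely as a key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Color {
    B,
    U,
    C,
    G,
    R,
    W,
    None,
}

impl Color {
    /// Hypothetical stand-in for the strum-derived `from_str`:
    /// accepts only single-letter symbols, case-insensitively.
    fn from_symbol(symbol: &str) -> Option<Color> {
        match symbol.to_ascii_uppercase().as_str() {
            "B" => Some(Color::B),
            "U" => Some(Color::U),
            "C" => Some(Color::C),
            "G" => Some(Color::G),
            "R" => Some(Color::R),
            "W" => Some(Color::W),
            _ => None,
        }
    }
}

/// Parses a mana cost such as "{1}{W}" into per-color totals.
/// Returns `None` if a symbol is unknown or a brace is left unclosed.
fn parse_mana_cost(mana_cost: &str) -> Option<HashMap<Color, u8>> {
    let mut cost = HashMap::new();
    let mut rest = mana_cost;

    // Repeatedly take the text between the next '{' and the '}' after it.
    while let Some(start) = rest.find('{') {
        let after_open = &rest[start + 1..];
        let end = after_open.find('}')?;
        let symbol = &after_open[..end];

        let (color, quantity) = match symbol.parse::<u8>() {
            // Numeric symbols are generic mana, counted as colorless like the question's code.
            Ok(n) => (Color::C, n),
            Err(_) => (Color::from_symbol(symbol)?, 1),
        };

        // Saturate rather than overflow (which panics in debug builds).
        let total = cost.entry(color).or_insert(0u8);
        *total = total.saturating_add(quantity);

        rest = &after_open[end + 1..];
    }

    // Mirror `Cost::new`: a card with no symbols is recorded as `None => 0`.
    if cost.is_empty() {
        cost.insert(Color::None, 0);
    }
    Some(cost)
}
```

For example, `parse_mana_cost("{W}{W}{3}")` yields `W => 2` and `C => 3`.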

struct Card {
    // ... other properties that work fine and aren't complicated
    #[serde(rename(deserialize = "text"), default)]
    description: String,
    #[serde(default)]
    keywords: Vec<String>,
    layout: String,
    #[serde(rename(deserialize = "manaCost"), default)]
    mana_cost: String,
    #[serde(rename(deserialize = "manaValue"), default)]
    mana_value: u8,
    name: String,

    // THE PROBLEM:

    // This is what I'm trying to do, but it doesn't work because you can't pass arguments
    // to the default function.
    #[serde(default = "parse_costs(manaCost)")]
    cost: Cost,
}

So, firstly, is there a way to do what I'm doing without manually implementing Deserialize for a struct? This looks SUPER complicated and I only want to go down that rabbit hole if I absolutely have to.

And secondly, is there a better way of doing what I'm trying to do here? If this were JavaScript, I would just convert the JSON into a JavaScript object, iterate over each of the properties in library, and map and transform each one into my data structure along the way. Serde is getting me REALLY close to what I need without duplicating a bunch of stuff in memory, but... I just don't know how to get it to do the last bit.


Solution

  • When using Serde, separate fields in the same struct aren't aware of each other, so such transformations have to happen on the struct as a whole. What Serde does allow is doing as much processing as you want on a single value.

    Indeed, implementing Deserialize can be very complex, but you can avoid most of that by leveraging another type's Deserialize implementation.

    Here's how you can do that for Cost:

    use serde::{Deserialize, Deserializer};

    impl<'de> Deserialize<'de> for Cost {
        fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where
            D: Deserializer<'de>,
        {
            // `&str` can't deserialize JSON strings that contain escapes, so
            // deserialize an owned `String` and pass a `&str` to `parse_costs`.
            // Note that `Cow::<str>::deserialize` would not save the allocation:
            // serde's `Cow` impl always yields `Cow::Owned` unless the field is
            // marked `#[serde(borrow)]` in a derived struct.
            let s = String::deserialize(deserializer)?;
            Ok(parse_costs(&s))
        }
    }
    

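    Then use Cost as the type of mana_cost instead of String. rename_all = "camelCase" maps mana_cost and mana_value to the JSON keys manaCost and manaValue, so those per-field renames are no longer needed. Because mana_cost keeps #[serde(default)], Cost also needs a Default implementation, for example one that returns Cost::new(Vec::new()).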

    #[derive(Debug, Deserialize)]
    #[serde(rename_all = "camelCase")]
    struct Card {
        #[serde(rename = "text", default)]
        description: String,
        #[serde(default)]
        keywords: Vec<String>,
        layout: String,
        #[serde(default)]
        mana_cost: Cost,
        mana_value: u8,
        name: String,
    }
    

    The whole playground, with some other style fixes.