Vega-Lite calculate issues


I am trying to create a chart that displays a new column whose values are calculated from several existing columns in my CSV file. I know the data contains many more countries than the chart shows, and each country has a score out of 10 for each of the five fields, so out of 50 in total.

    {
      "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
      "data": {
        "url": "https://raw.githubusercontent.com/Anika6138/InfoVis/master/arabica_data_cleaned.csv",
        "format": {
          "type": "csv"
        }
      },
      "transform": [
        {
          "calculate": "datum.Aroma + datum.Flavor + datum.Aftertaste + datum.Acidity + datum.Sweetness",
          "as": "Taste_Points"
        }
      ],
      "mark": "bar",
      "encoding": {
        "y": {
          "field": "Country_of_Origin",
          "type": "nominal"
        },
        "x": {
          "field": "Taste_Points",
          "type": "quantitative"
        }
      },
      "config": {}
    }


This is what I get. Many countries that have values are missing from the chart, even though I have not added any filters.


Solution

  • Your data is specified as CSV, which means the fields used in the calculation are interpreted as strings unless you specify otherwise. There are two ways to fix this. You can add a parse statement in the data format definition:

      "data": {
        "url": "https://raw.githubusercontent.com/Anika6138/InfoVis/master/arabica_data_cleaned.csv",
        "format": {
          "type": "csv",
          "parse": {"Aroma": "number", "Flavor": "number", "Aftertaste": "number", "Acidity": "number", "Sweetness": "number"}
        }
      }
    

    or you can use parseFloat within the calculate expression:

      "transform": [
        {
          "calculate": "parseFloat(datum.Aroma) + parseFloat(datum.Flavor) + parseFloat(datum.Aftertaste) + parseFloat(datum.Acidity) + parseFloat(datum.Sweetness)",
          "as": "Taste_Points"
        }
      ]
    

    Rows were implicitly filtered out in your original specification because applying + to strings concatenates them, so the result of the "sum" was, in many cases, a string that could not be parsed as a valid number. Invalid values such as null and NaN are implicitly removed from quantitative encodings. For example, no bars are drawn for C and D in this chart:

    {
      "data": {
        "values": [
          {"y": "A", "x": 1},
          {"y": "B", "x": 2},
          {"y": "C", "x": null},
          {"y": "D", "x": null}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "ordinal"}
      }
    }
    

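To see the string concatenation in isolation, here is a minimal sketch that uses inline data instead of your CSV. The field names (country, aroma, flavor) and the values are made up for illustration, and the scores are deliberately written as strings to mimic unparsed CSV fields. It assumes the same Vega-Lite v4 schema as your specification.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": [
      {"country": "A", "aroma": "7.5", "flavor": "7.6"},
      {"country": "B", "aroma": "8", "flavor": "8.2"}
    ]
  },
  "transform": [
    {
      "calculate": "datum.aroma + datum.flavor",
      "as": "concatenated"
    },
    {
      "calculate": "parseFloat(datum.aroma) + parseFloat(datum.flavor)",
      "as": "summed"
    }
  ],
  "mark": "bar",
  "encoding": {
    "y": {"field": "country", "type": "nominal"},
    "x": {"field": "summed", "type": "quantitative"},
    "tooltip": [
      {"field": "concatenated", "type": "nominal"},
      {"field": "summed", "type": "quantitative"}
    ]
  }
}
```

Hovering over the bars should show the difference: for A, concatenated is the string "7.57.6", which is not a valid number, while summed is about 15.1. For B, concatenated is "88.2", which does parse as a number but is not the intended total of about 16.2. That second case is why some rows in the original chart may survive the implicit filtering and still show the wrong value.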