
How to add column names to a Polars DataFrame when using CsvReader


I can read a CSV file that has no header row using Polars in Rust, with the following code:

use polars::prelude::*;

fn read_wine_data() -> Result<DataFrame> {
    let file = "datastore/wine.data";
    CsvReader::from_path(file)?
        .has_header(false)
        .finish()
}


fn main() {
    let df = read_wine_data();
    match df {
        Ok(content) => println!("{:?}", content.head(Some(10))),
        Err(error) => panic!("Problem reading file: {:?}", error)
    }
}
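When has_header(false) is set, Polars still has to give every column a name, so it generates placeholders. As far as I know, recent versions use the pattern column_1, column_2, and so on, but treat the exact pattern as an assumption and check the printed header for your version. The std-only sketch below only reproduces that assumed naming scheme; placeholder_names is a hypothetical helper, not part of Polars.

```rust
/// Hypothetical helper (not a Polars API): the placeholder names that a
/// headerless CSV with `width` columns is assumed to receive.
fn placeholder_names(width: usize) -> Vec<String> {
    // Numbering is assumed to be 1-based: position 0 becomes "column_1".
    (1..=width).map(|i| format!("column_{}", i)).collect()
}
```

Under that assumption, the second field of each row (Alcohol, in the list below) shows up as column_2 until it is renamed.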

Now I want to add column names to the DataFrame, either while reading or afterwards. Here are the names:

const COLUMN_NAMES: [&str; 14] = [
    "Class label", "Alcohol",
    "Malic acid", "Ash",
    "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids",
    "Nonflavanoid phenols",
    "Proanthocyanins",
    "Color intensity", "Hue",
    "OD280/OD315 of diluted wines",
    "Proline"
];

How can I add these names to the DataFrame? The data can be downloaded with the following command:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data
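Whether the names are attached while reading or afterwards, they are matched to columns purely by position, so the list needs exactly one name per field. Each line of wine.data holds 14 comma-separated values (the class label followed by 13 measurements), which lines up with the 14 names above. For renaming after reading, Polars' DataFrame has a set_column_names method that, as far as I know, rejects a list whose length differs from the frame's width; its exact signature has changed between releases, so check the docs for your version. The std-only sketch below models that positional rule; rename_positionally is a hypothetical helper, not a Polars API.

```rust
/// Hypothetical helper (not a Polars API) modelling positional renaming:
/// field i of a headerless CSV line gets the name at index i.
/// Fails when the counts differ, like the width check Polars is assumed
/// to apply when column names are set after reading.
fn rename_positionally<'a>(
    line: &'a str,
    names: &[&'a str],
) -> Result<Vec<(&'a str, &'a str)>, String> {
    // wine.data is plain numeric CSV with no quoting, so splitting on ',' suffices here.
    let fields: Vec<&'a str> = line.trim().split(',').collect();
    if fields.len() != names.len() {
        return Err(format!(
            "row has {} fields but {} names were given",
            fields.len(),
            names.len()
        ));
    }
    // zip pairs the two sequences strictly by index.
    Ok(names.iter().copied().zip(fields).collect())
}
```

Calling it on a line of the downloaded file with the full list of 14 names should return 14 (name, value) pairs; a shorter or longer list returns an Err instead.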

Solution

  • This seemed to work: create a Schema object and pass it to the with_schema method on the CsvReader:

    use polars::prelude::*;
    use polars::datatypes::DataType;

    fn read_wine_data() -> Result<DataFrame> {
        let file = "datastore/wine.data";

        // Each with_column call appends one (name, dtype) entry; this schema
        // only describes the first column of the file.
        let mut schema: Schema = Schema::new();
        schema.with_column("wine".to_string(), DataType::Float32);

        // The file has no header row, so the names come from the schema.
        CsvReader::from_path(file)?
            .has_header(false)
            .with_schema(&schema)
            .finish()
    }


    fn main() {
        let df = read_wine_data();
        match df {
            Ok(content) => println!("{:?}", content.head(Some(10))),
            Err(error) => panic!("Problem reading file: {:?}", error)
        }
    }
    

    Granted, I don't know what the column names should be, but this is the output I got after adding the one column:

    shape: (10, 1)
    ┌──────┐
    │ wine │
    │ ---  │
    │ f32  │
    ╞══════╡
    │ 1.0  │
    ├╌╌╌╌╌╌┤
    │ 1.0  │
    ├╌╌╌╌╌╌┤
    │ 1.0  │
    ├╌╌╌╌╌╌┤
    │ 1.0  │
    ├╌╌╌╌╌╌┤
    │ ...  │
    ├╌╌╌╌╌╌┤
    │ 1.0  │
    ├╌╌╌╌╌╌┤
    │ 1.0  │
    ├╌╌╌╌╌╌┤
    │ 1.0  │
    ├╌╌╌╌╌╌┤
    │ 1.0  │
    └──────┘
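The output above has a single column, presumably because the schema lists only one field. To name all 14 columns while reading, the schema needs one (name, type) entry per column, in file order, for example by calling schema.with_column(name.to_string(), dtype) once for each name in the question's list. The std-only sketch below builds that ordered list; Dtype is a hypothetical stand-in for Polars' DataType, and typing the class label as an integer while reading everything else as floats is a choice made here, not something the answer specifies.

```rust
/// Hypothetical stand-in for Polars' DataType, with only the variants used here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dtype {
    Int64,
    Float64,
}

/// Builds an ordered (name, dtype) list, one entry per CSV column.
/// Order matters: entry i describes field i of every headerless row.
fn build_schema(names: &[&str]) -> Vec<(String, Dtype)> {
    names
        .iter()
        .enumerate()
        .map(|(i, &name)| {
            // Assumption: the first column is the integer class label;
            // all remaining measurements are read as floats.
            let dtype = if i == 0 { Dtype::Int64 } else { Dtype::Float64 };
            (name.to_string(), dtype)
        })
        .collect()
}
```

With the question's 14 names this yields 14 entries, and each pair would then be added to the Polars Schema before calling with_schema.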