Search code examples
sqlpostgresqlrustrust-diesel

How do I count the number of distinct elements in an array column with Diesel?


I'm trying to implement the count_distinct_labels function to count the distinct elements in a column of arrays with Diesel and PostgreSQL.

For example, I have a table like this:

------------------
|     labels     |
------------------
| ['foo', 'bar'] |
------------------
| ['bar', 'baz'] |
------------------

In this case, count_distinct_labels() should be 3, because there are 3 unique labels ('foo', 'bar', 'baz').

I found that the following SQL returns the desired result, but I don't know how to translate this into a Diesel expression.

SELECT COUNT(*) FROM (SELECT DISTINCT unnest(labels) FROM t) AS label;

Here is my source code:

#[macro_use]
extern crate diesel;
extern crate dotenv;

use diesel::pg::PgConnection;
use diesel::prelude::*;
use dotenv::dotenv;
use std::env;

mod schema {
    table! {
        t (id) {
            id -> Int4,
            labels -> Array<Text>,
        }
    }

    #[derive(Insertable)]
    #[table_name = "t"]
    pub struct NewRow<'a> {
        pub labels: &'a [String],
    }
}

fn count_distinct_labels(conn: &PgConnection) -> i64 {
    // SELECT COUNT(*) FROM (SELECT DISTINCT unnest(labels) FROM t) AS label
    unimplemented!()
}

fn main() {
    dotenv().ok();

    let database_url = env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    let conn = PgConnection::establish(&database_url)
        .expect(&format!("Error connecting to {}", database_url));

    diesel::insert_into(schema::t::dsl::t)
        .values(&vec![
            schema::NewRow {
                labels: &["foo".to_string(), "bar".to_string()],
            },
            schema::NewRow {
                labels: &["bar".to_string(), "baz".to_string()],
            },
        ]).execute(&conn)
        .unwrap();

    // how to implement?
    assert_eq!(count_distinct_labels(&conn), 3);
}

and Cargo.toml:

[package]
name = "how-to-count-distinct"
version = "0.1.0"
authors = ["name"]

[dependencies]
diesel = { version = "1.0", features = ["postgres"] }
dotenv = "0.13"

I also created a repo containing the full example. If you want to reproduce, clone this repo and cargo run. Note you have to start the Postgres service before you run the code.


Solution

  • Add a view

    The simplest thing to do as of Diesel 1.31 is to add a view to the database:

    CREATE VIEW unique_labels AS (SELECT DISTINCT unnest(labels) FROM t);
    

    You can then tell Diesel about the view:

    table! {
        unique_labels (unnest) {
            unnest -> Text,
        }
    }
    

    And query it directly:

    fn count_distinct_labels(conn: &PgConnection) -> i64 {
        use schema::unique_labels::dsl::*;
        use diesel::dsl;
    
        unique_labels.select(dsl::count_star()).first(conn).unwrap()
    }
    

    Use sql_query

    You can always fall back to the "big hammer" of directly executing SQL:

    fn count_distinct_labels(conn: &PgConnection) -> i64 {
        use diesel::sql_types::BigInt;
    
        #[derive(QueryableByName)]
        struct Count {
            #[sql_type = "BigInt"]
            count: i64,
        }
    
        diesel::sql_query(r#"SELECT COUNT(*) FROM (SELECT DISTINCT unnest(labels) FROM t) AS label"#)
            .load::<Count>(conn)
            .expect("Query failed")
            .pop()
            .expect("No rows")
            .count
    }
    

    1 Diesel 1.3 does not have the ability to pass your own FROM clause manually, perhaps among other restrictions.