Search code examples
sqldatetimeapache-spark-sqlgreatest-n-per-groupmin

Trying to fetch the first record from a group in SQL


I'm trying to query the earliest timestamp for a candidate, for a project, for a contract on spark SQL.

spark.sql(
      """
        |SELECT
        | DISTICT
        | timestamp,
        | candidate_id,
        | project_id,
        | contract_id
        |FROM candidatesHistory
        |GROUP BY timestamp, candidate_id, project_id, contract_id
        |ORDER BY timestamp DESC
        |LIMIT 1
        |""".stripMargin)

This code doesn't do it, it just fetches one record - how do I get the oldest timestamp for a candidate for a project for a contract?

Any help appreciated


Solution

  • If you only have 4 columns in the table, then you can use aggregation:

    select candidate_id, project_id, contract_id, min(timestamp) first_timestamp
    from candidateshistory
    group by candidate_id, project_id, contract_id
    

    If there are more columns and you want to bring all of them, then you can use row_number() to filter the table:

    select ch.*
    from (
        select ch.*,
            row_number() over(partition by candidate_id, project_id, contract_id order by timestamp) rn
        from candidateshistory ch
    ) ch
    where rn = 1
    

    For each (candidate_id, project_id, contract_id) tuple, this gives you the row with the earliest timestamp.