Search code examples
mysqlsqlrconditional-statementsrmysql

SQL query for calling multiple columns with conditions(with NAs)


This is a continuous question from my previous question. I want to write an SQL query calling for several columns with conditions. I'm working on R Studio using RMySQL package. My server is MySQL.

The table looks like this.

organisation          A           B           C           D
Ikea         2018-04-01  2018-05-07  2018-05-09  2018-05-01
Ikea         2018-06-01  2018-05-03  2018-05-29          NA   
Orange       2018-04-02  2018-05-01  2018-07-08  2018-05-26 
Ikea         2018-06-02  2018-05-01          NA  2018-05-26
Nestle       2018-06-02  2018-05-01          NA  2018-05-26
Ikea                 NA  2018-05-05  2018-04-02  2018-06-01

And I want to get a row where the organisation is Ikea, and where the earliest date among four columns (A, B, C, D) is between 2018-05-01 and 2018-05-31.

In a row which contains NA values, I want to ignore the NAs and see what's the earliest date among the rest of the values. For example, for the second row, the earliest date is "2018-05-03"(column B) therefore it meets the criteria.

Therefore only the second the fourth row of the original table above match the conditions. And the result I want to get should be:

organisation          A           B           C           D
Ikea         2018-06-01  2018-05-03  2018-05-29          NA    
Ikea         2018-06-02  2018-05-01          NA  2018-05-26

How should I write an SQL query? Here is my attempt after getting an answer from my previous question, but it doesn't work well for rows with NAs.

SELECT * FROM myTable 
WHERE organisation LIKE Ikea
LEAST(A, B, C, D) >= '2018-05-01' AND
LEAST(A, B, C, D) < '2018-06-01'

Thank you for any kinds of help!


Solution

  • Just use coalesce():

    SELECT *
    FROM myTable 
    WHERE organisation LIKE 'Ikea' AND
          LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) >= '2018-05-01' AND
          LEAST(COALESCE(A, '2019-01-01'), COALESCE(B, '2019-01-01'), COALESCE(C, '2019-01-01'), COALESCE(D, '2019-01-01')) < '2018-06-01';
    

    '2019-01-01' is an arbitrary date after the range.