Search code examples
sqlpostgresqlduplicatespostgresql-9.3

Postgresql group by for multiple lines


I have this table named hr_holidays_by_calendar. I just want to filter out the rows where the same employee is having two leaves in same day.

Table hr_holidays_by_calendar:

enter image description here

Query I tried:
Wasn't anywhere near in solving this.

select hol1.employee_id, hol1.leave_date, hol1.no_of_days, hol1.leave_state
from hr_holidays_by_calendar hol1
inner join
    (select employee_id, leave_date 
    from hr_holidays_by_calendar hol1
    group by employee_id, leave_date 
    having count(*)>1)sub
on hol1.employee_id=sub.employee_id and hol1.leave_date=sub.leave_date
where hol1.leave_state != 'refuse'
order by hol1.employee_id, hol1.leave_date

Solution

  • This returns all rows where a duplicate exists:

    SELECT employee_id, leave_date, no_of_days, leave_state
    FROM   hr_holidays_by_calendar h
    WHERE  EXISTS (
       SELECT -- select list can be empty for EXISTS
       FROM   hr_holidays_by_calendar
       WHERE  employee_id = h.employee_id
       AND    leave_date = h.leave_date
       AND    leave_state <> 'refuse'
       AND    ctid <> h.ctid
       )
    AND    leave_state <> 'refuse'
    ORDER  BY employee_id, leave_date;
    

    It's unclear where leave_state <> 'refuse' should apply. You would have to define requirements. My example excludes rows with leave_state = 'refuse' (and leave_state IS NULL with it!) completely.

    ctid is a poor man's surrogate for your undisclosed (undefined?) primary key.

    Related: