Search code examples
postgresqljoinduplicatesidentify

'View' (NOT DELETE) Duplicate Rows from a Postgresql table obtained from joins


So I have temp table I created by joining three tables :

  1. Trips
  2. Stops
  3. Stop_times

The Stop_times table has a list of trip_ids, the corresponding stops and the scheduled arrival and departure times of buses at those stops.

I searched online and everywhere I seem to find answers for how to delete duplicates (using ctid, nested queries) but not view them.

My query looks something like this :

CREATE TEMP TABLE temp as
SELECT 
 (CASE st.arrival_time < current_timestamp::time     
 WHEN true THEN (current_timestamp::date + interval '1 day') + st.arrival_time     
 ELSE (current_timestamp::date) + st.arrival_time     
 END) as arrival,      
 CASE st.departure_time < current_timestamp::time     
 WHEN true THEN (current_timestamp::date + interval '1 day') + st.departure_time     
 ELSE (current_timestamp::date) + st.departure_time     
 END as departure,     st.trip_id, st.stop_id, st.stop_headsign,route_id,   t.trip_headsign, s.stop_code, s.stop_name,      s.stop_lat, s.stop_lon

 FROM schema.stop_times st     
 JOIN schema.trips t ON t.trip_id=st.trip_id     
 JOIN schema.stops s ON s.stop_id=st.stop_id

 order by arrival, departure;  

I know that there are duplicates (by running the select * and select DISTINCT on temp), I just need to identify the duplicates...any help will be appreciated!

PS : I know I can use DISTINCT and get rid of duplicates, but it is slowing down the query a lot so I need to rework the query for which I need to identify the duplicates, the resultant records are greater than 200,000 so exporting them to excel and filtering duplicates is not an option either (I tried but excel can't handle it)


Solution

  • I believe this will give you what you want:

    SELECT arrival, departure, trip_id, stop_id, stop_headsign, route_id,
    headsign, stop_code, stop_name, stop_lat, stop_lon, count(*)
    FROM temp
    GROUP BY arrival, departure, trip_id, stop_id, stop_headsign, route_id,
    headsign, stop_code, stop_name, stop_lat, stop_lon
    HAVING count(*) > 1;