So I have temp table I created by joining three tables :
The Stop_times table has a list of trip_ids, the corresponding stops and the scheduled arrival and departure times of buses at those stops.
I searched online and everywhere I seem to find answers for how to delete duplicates (using ctid, nested queries) but not view them.
My query looks something like this :
CREATE TEMP TABLE temp as
SELECT
(CASE st.arrival_time < current_timestamp::time
WHEN true THEN (current_timestamp::date + interval '1 day') + st.arrival_time
ELSE (current_timestamp::date) + st.arrival_time
END) as arrival,
CASE st.departure_time < current_timestamp::time
WHEN true THEN (current_timestamp::date + interval '1 day') + st.departure_time
ELSE (current_timestamp::date) + st.departure_time
END as departure, st.trip_id, st.stop_id, st.stop_headsign,route_id, t.trip_headsign, s.stop_code, s.stop_name, s.stop_lat, s.stop_lon
FROM schema.stop_times st
JOIN schema.trips t ON t.trip_id=st.trip_id
JOIN schema.stops s ON s.stop_id=st.stop_id
order by arrival, departure;
I know that there are duplicates (by running the select * and select DISTINCT on temp), I just need to identify the duplicates...any help will be appreciated!
PS : I know I can use DISTINCT and get rid of duplicates, but it is slowing down the query a lot so I need to rework the query for which I need to identify the duplicates, the resultant records are greater than 200,000 so exporting them to excel and filtering duplicates is not an option either (I tried but excel can't handle it)
I believe this will give you what you want:
SELECT arrival, departure, trip_id, stop_id, stop_headsign, route_id,
headsign, stop_code, stop_name, stop_lat, stop_lon, count(*)
FROM temp
GROUP BY arrival, departure, trip_id, stop_id, stop_headsign, route_id,
headsign, stop_code, stop_name, stop_lat, stop_lon
HAVING count(*) > 1;