python, pandas, datetime, timezone, data-analysis

Python pandas DataFrame: convert local UTM time to GMT using longitude


I have a very large pandas DataFrame. A small example:

          BaseDateTime       LAT        LON
0  2018-10-18T00:00:00  36.97696  -89.10680
1  2018-10-18T00:00:00  46.08972 -122.92928
2  2018-10-18T00:00:00  48.10739 -122.77227
3  2018-10-18T00:00:00  28.72571  -89.52151
4  2018-10-18T00:00:00  61.11447 -146.35110

How can I convert the BaseDateTime column (local time) to GMT, based on the LON column, which indicates the Universal Transverse Mercator (UTM) time zone in which the data was measured?

I've googled for answers. There are a lot of tutorials about time zones, but none of them use longitude + local time.


Solution

  • Local shipboard time for ships at sea is complicated by three factors:

    • In open international waters, time zone can be computed from longitude as a fixed hour offset from UTC. For example, UTC-9 is in use far west of the Pacific coast. In the TZ database that is referenced as Etc/GMT+9 - the sign is inverted.

    • In territorial waters, the time zone is usually that of the nearest land-based zone. For example, immediately west of the Pacific coast, the zone identifier is still America/Los_Angeles, which is UTC-8 during standard time and UTC-7 during daylight time.

    • The rules/regulations/laws/etc that govern this sort of thing are inconsistent. In all practicality, the captain of the ship can declare whatever time zone they like. This happens a lot on passenger cruise ships, where the local ship time might be switched overnight. Passengers often get confused when their cell phone picks up a signal from elsewhere that switches the time to something other than the shipboard time.

    You can read more about this in Wikipedia's article about Nautical Time. There's also a note about ships at sea in the IANA tz database sources.
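As a rough sketch of the first factor above, the open-water offset can be derived by splitting the globe into 15-degree bands centered on multiples of 15° longitude. The function names are hypothetical, and the result ignores territorial waters and whatever time the captain has declared.

```python
import math


def nautical_utc_offset(lon: float) -> int:
    """Whole-hour offset from UTC for a longitude in degrees (east positive)."""
    # Each band spans 15 degrees and is centered on a multiple of 15,
    # so shift by half a band (7.5 degrees) before dividing.
    return math.floor((lon + 7.5) / 15)


def etc_zone_name(offset_hours: int) -> str:
    """TZ database identifier for a fixed offset; the sign is inverted."""
    if offset_hours == 0:
        return "Etc/GMT"
    return f"Etc/GMT{-offset_hours:+d}"


offset = nautical_utc_offset(-146.35110)  # last row of the sample data
zone = etc_zone_name(offset)
```

For the data set in the question this calculation turns out to be unnecessary, but it shows why UTC-9 maps to Etc/GMT+9.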

    Ignoring the third point above, you can get the time zone id from the lat/lon coordinates. Randy's answer is generally correct with regard to that. However, I'd recommend timezonefinder rather than pytzwhere, as it uses the timezone-boundary-builder data set, which includes land-based time zones in territorial waters. (Pytzwhere uses older "tz_world" data, which does not have territorial water boundaries.)

    That said, in the data set you are using, you do not need to convert time zones. The data is already in UTC. Here is the chart posted in their FAQ:

    chart
    (source: marinecadastre.gov)

    Field 2, the BaseDateTime is described as the "Full UTC date and time". In other words, they should have a Z at the end. Interpret 2017-02-01T20:05:07 as 2017-02-01T20:05:07Z. (UTC and GMT are essentially the same.) Thus, all the timestamps in the files already have the same basis - UTC. That eliminates the ambiguity of local time at sea.
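In pandas, treating the column as UTC could look something like the sketch below. The DataFrame is a two-row stand-in built from the sample in the question; `utc=True` labels the naive strings as UTC without shifting the clock values.

```python
import pandas as pd

# Two rows from the sample in the question, as a stand-in for the real file.
df = pd.DataFrame({
    "BaseDateTime": ["2018-10-18T00:00:00", "2018-10-18T00:00:00"],
    "LAT": [36.97696, 61.11447],
    "LON": [-89.10680, -146.35110],
})

# Parse the strings and mark them as UTC. No offset is applied, because
# the values are already UTC according to the data description.
df["BaseDateTime"] = pd.to_datetime(df["BaseDateTime"], utc=True)
```

After this step the column is timezone-aware, so later comparisons and conversions all work from the same UTC basis regardless of LON.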

    With regard to UTM, that's not related to time, but rather position. A UTM zone is a 6°-wide band of longitude with specific calculated boundaries, not a time zone. In the data set you're working with, each smaller file is divided up by UTM zone (Zone1, Zone2, Zone3, etc.). All the data points within each file will have lat/lon in their respective UTM zones. There also appear to be links (below the main data on the same page) to larger files that have data from all zones combined, so you could use those instead if you were interested in the whole world, and UTM would not need to be considered.
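To illustrate that a UTM zone is purely positional, here is a minimal sketch that derives the zone number from longitude alone, using the standard 6-degree bands numbered eastward from 180°W. The function name is hypothetical, and the special-case zones around Norway and Svalbard are ignored.

```python
import math


def utm_zone_number(lon: float) -> int:
    """UTM zone number (1-60) for a longitude in degrees (east positive)."""
    # Zone 1 starts at 180 degrees west; each zone is 6 degrees wide.
    # The modulo folds longitude +180 back onto zone 1.
    return math.floor((lon + 180) / 6) % 60 + 1


# Longitudes taken from the sample rows in the question.
zones = [utm_zone_number(lon) for lon in (-89.10680, -122.92928, -146.35110)]
```

Nothing in this calculation involves a clock, which is why the zone number in the file names says nothing about the timestamps.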

    You said:

    ... I get trajectories. Some look normal, and some are horrible (looks like time travel) ...

    This is addressed in the same FAQ:

    Q: How do I account for apparent inconsistencies of the AIS timestamp and of an observed vessel’s voyage?

    A: The full timestamp is added to the record by the base station, using the time clock of the base station that is reporting in UTC. Be sure to account for your time zone shift to UTC and other offsets such as Daylight Savings Time.

    They appear to be speaking with regard to how you are observing the data. Since the timestamp is added to the record by the base station, and that station reports in UTC, time zone offsets and DST only come into play if you need to convert to some local time to observe the data. If you are only tracking trajectories, then you should keep things in their original UTC basis.
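If a local view is needed for display, one option is to convert a copy of the UTC values and leave the originals alone. This is only a sketch; the choice of America/Los_Angeles is an arbitrary example, not something the data set prescribes.

```python
import pandas as pd

# One UTC timestamp from the sample data.
utc_times = pd.Series(pd.to_datetime(["2018-10-18T00:00:00"], utc=True))

# Convert for presentation only. The instant is unchanged; just the
# wall-clock rendering differs, and DST rules come from the tz database.
pacific = utc_times.dt.tz_convert("America/Los_Angeles")
```

Sorting and trajectory calculations should still use `utc_times`, since the converted values represent the same instants.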

    If the data is still wildly incorrect, then there is probably just bad data. Incorrect GPS pings could well produce trajectories that look wildly wrong. You may need to find another way to filter such anomalies out of the data.
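One possible way to filter such anomalies is to flag pings whose implied speed from the previous ping of the same vessel is implausibly high. The sketch below makes several assumptions not stated above: a vessel identifier column named `MMSI`, a 60-knot threshold, and the hypothetical helper names.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Guard against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.minimum(a, 1.0)))


def flag_implausible_jumps(df, id_col="MMSI", max_knots=60.0):
    """Return a sorted copy with speed_knots and suspect columns added."""
    out = df.copy()
    out["BaseDateTime"] = pd.to_datetime(out["BaseDateTime"], utc=True)
    out = out.sort_values([id_col, "BaseDateTime"]).reset_index(drop=True)
    # Previous ping of the same vessel; the first ping of each vessel gets NaN/NaT.
    prev = out.groupby(id_col)[["LAT", "LON", "BaseDateTime"]].shift(1)
    dist_km = haversine_km(prev["LAT"], prev["LON"], out["LAT"], out["LON"])
    hours = (out["BaseDateTime"] - prev["BaseDateTime"]).dt.total_seconds() / 3600
    out["speed_knots"] = dist_km / KM_PER_NAUTICAL_MILE / hours
    # NaN speeds (first ping per vessel) compare as False and are not flagged.
    out["suspect"] = out["speed_knots"] > max_knots
    return out


# Made-up pings for one vessel: a short hop, then an impossible jump.
pings = pd.DataFrame({
    "MMSI": [1, 1, 1],
    "BaseDateTime": ["2018-10-18T00:00:00", "2018-10-18T01:00:00",
                     "2018-10-18T02:00:00"],
    "LAT": [36.0, 36.1, 46.0],
    "LON": [-89.0, -89.0, -122.0],
})
flagged = flag_implausible_jumps(pings)
```

Rows with `suspect` set can then be dropped or inspected by hand. The threshold is a judgment call and would need tuning for the vessels in question.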

    With regard to:

    ... (I suspect) their time is measured according to the zone they're in ...

    I didn't see anything that would lead to that conclusion in the data description on the source web site. The data is in UTC, not shipboard time.