Tags: python, pandas, numpy, datetime, python-polars

How should I parse times in the Japanese "30-hour" format for data analysis?


I'm considering a data analysis project involving information on Japanese TV broadcasts. The relevant data will include broadcast times, and some of those will be for programs that aired late at night.

Late-night Japanese TV schedules follow a non-standard time format called the 30-hour system (brief English explanation here). Most times are given in normal Japan Standard Time, formatted as %H:%M. Times from midnight to 6 AM, however, are treated as an extension of the previous day and numbered accordingly, under the logic that that's how people staying up late experience them. For example, Macross Frontier was broadcast in Kansai at 1:25 AM, but it was written as 25:25.

I want to use this data in a Pandas or Polars DataFrame. Theoretically, it could be left as a string, but it'd be more useful to convert it to a standard format for datetimes -- either Python's built-in type, or the types used in NumPy or Polars. One simple approach could be:

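from datetime import date, time, datetime, timedelta
from zoneinfo import ZoneInfo

def process_30hour(d: date, t: str) -> datetime:
    h, m = [int(n) for n in t.split(':')] # assumes format 'HH:MM' for t
    if h > 23:
        # 24:00-29:59 belongs to the next calendar day
        h -= 24
        d += timedelta(days=1) # a date can't be incremented with a plain int
    return datetime.combine(d, time(h, m), ZoneInfo('Japan'))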

This could then be applied to a whole DataFrame with DataFrame.apply(). There may be a more performant way, however, especially considering the vectorization features of DataFrames -- both libraries recommend avoiding DataFrame.apply() if there's an alternative.
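For reference, the apply()-based baseline might look like the sketch below. The column names `date` and `time` and the sample rows are assumptions for illustration. The day rollover uses `timedelta(days=1)` because a `date` cannot be incremented by an int, and `'Asia/Tokyo'` is used as the canonical name for the same zone as `'Japan'`.

```python
from datetime import date, time, datetime, timedelta
from zoneinfo import ZoneInfo

import pandas as pd

def process_30hour(d: date, t: str) -> datetime:
    h, m = (int(n) for n in t.split(':'))  # assumes format 'HH:MM' for t
    if h > 23:
        # 24:00-29:59 belongs to the next calendar day
        h -= 24
        d += timedelta(days=1)
    return datetime.combine(d, time(h, m), ZoneInfo('Asia/Tokyo'))

# Hypothetical sample data: broadcast date and 30-hour time in separate columns
df = pd.DataFrame({
    'date': [date(2024, 12, 20), date(2024, 12, 20)],
    'time': ['20:25', '25:25'],
})

# Row-wise apply: one Python-level function call per row, hence slow on large frames
df['datetime'] = df.apply(
    lambda row: process_30hour(row['date'], row['time']), axis=1
)
```

This calls the function once per row in Python, which is the cost a vectorized approach would avoid.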


Solution

  • IIUC, you could create a datetime with '00:00' as the time and add the hours and minutes as a timedelta:

    from datetime import date, time, datetime, timedelta
    from zoneinfo import ZoneInfo
    
    def process_30hour(d: date, t: str):
        h, m = map(int, t.split(':')) # assumes format 'HH:MM' for t
        return (datetime.combine(d, time(), ZoneInfo('Japan'))
               + timedelta(hours=h, minutes=m))
    
    process_30hour(date(2024, 12, 20), '25:25')
    

    Output:

    datetime.datetime(2024, 12, 21, 1, 25, tzinfo=zoneinfo.ZoneInfo(key='Japan'))
    

    The same logic can be applied in vectorized form with pandas:

    import pandas as pd
    
    df = pd.DataFrame({'date': ['2024-12-20 20:25', '2024-12-20 25:25']})
    
    # split string
    tmp = df['date'].str.split(expand=True)
    
    # convert to datetime/timedelta, combine
    df['datetime'] = pd.to_datetime(tmp[0]) + pd.to_timedelta(tmp[1]+':00')
    

    For fun, as a one-liner:

    df['datetime'] = (df['date'].add(':00')
                      .str.split(expand=True)
                      .astype({0: 'datetime64[ns]',
                               1: 'timedelta64[ns]'})
                      .sum(axis=1)
                     )
    

    Output:

                   date            datetime
    0  2024-12-20 20:25 2024-12-20 20:25:00
    1  2024-12-20 25:25 2024-12-21 01:25:00