Tags: python, pandas, dataframe, jupyter-lab, datestamp

Printing row based on datestamp condition of another column


Background:
I have a DataFrame ('weather_tweets') with two columns of interest: weather (the weather on the planet Mars) and date (the date the weather relates to).

Objective:
I am trying to write code that will determine the latest datestamp (date column) and print that row's corresponding weather column value.

Sample rows:
Here is the header row followed by one sample row:

weather_tweets = [
    ('tweet', 'weather', 'date'),
    ('Mars Weather@MarsWxReport·Jul 15InSight sol 58', 'InSight sol 580 (2020-07-14) low -88.8ºC (-127.8ºF) high -8.4ºC (16.8ºF) winds from the WNW at 5.9 m/s (13.3 mph) gusting to 15.4 m/s (34.4 mph) pressure at 7.80 hPa', '2020-07-14')]

My code:
Thus far, I have only been able to formulate some messy code that will return the latest dates in order, but it's pretty useless for my expected results:

latest_weather = weather_tweets.groupby(['tweet', 'weather'])['date'].transform(max) == weather_tweets['date']

print(weather_tweets[latest_weather])

Any advice on how to reach the desired result would be much appreciated.


Solution

  • Try:

    weather_tweets[weather_tweets.date == weather_tweets.date.max()].weather
    

    You can add to_frame() at the end to get the result as a DataFrame instead of a Series:

    weather_tweets[weather_tweets.date == weather_tweets.date.max()].weather.to_frame()
    

    Or create a new DataFrame that holds only the latest row(s), with renamed columns:

    df_latest = weather_tweets.loc[weather_tweets.date == weather_tweets.date.max(), ['weather', 'date']].copy()
    df_latest.columns = ['latest_weather', 'latest_date']
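
For anyone who wants to try this end to end, here is a minimal sketch. It assumes the date column holds ISO-formatted strings such as '2020-07-14'; the first row of the sample data is a made-up placeholder and the second is shortened from the question. Converting with pd.to_datetime first is optional for ISO strings, but it guards against other date formats being compared as plain text. The idxmax() variant is an alternative to the boolean mask, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: the first row is an invented placeholder,
# the second is shortened from the row shown in the question.
weather_tweets = pd.DataFrame(
    {
        "weather": [
            "InSight sol 579 (illustrative placeholder)",
            "InSight sol 580 (2020-07-14) low -88.8ºC (-127.8ºF) high -8.4ºC (16.8ºF)",
        ],
        "date": ["2020-07-13", "2020-07-14"],
    }
)

# Parse the strings so max() compares real timestamps rather than text.
weather_tweets["date"] = pd.to_datetime(weather_tweets["date"])

# Boolean mask: keeps every row that shares the latest date (ties included).
latest_rows = weather_tweets[weather_tweets["date"] == weather_tweets["date"].max()]
print(latest_rows["weather"].to_frame())

# idxmax() returns the index label of the first row holding the latest date,
# which is convenient when a single scalar value is wanted.
latest_label = weather_tweets["date"].idxmax()
latest_weather = weather_tweets.loc[latest_label, "weather"]
print(latest_weather)
```

The mask version returns all rows if several tweets share the latest date, while idxmax() returns only the first of them, so pick whichever matches what you want to print.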