Search code examples
rdataframedata-analysis

Time sheet with data for each days of month given in columns – how to convert to date?


I have a data.frame representing the time sheet for several staff over a period of several months spanning 2 years. The data looks like:

Name         Month      1     2     3   ...  31    Total  Job     ... [more columns]

John Smith   Aug 2017   1:20                       1:20   Typing
Mary Jones   Sep 2017                                     Prooing
John Smith   Oct 2017         0:15           1:10  1:25   Typing
...
Jim Miles    Feb 2018         1:30  2:10           3:40   Admin

There are 31 columns, each representing a date in the corresponding month. There will be multiple rows with the same Name.

So looking at the first entry, John Smith did 1 hour and 20 minutes of work on 1 August 2017.

What I want to do is to analyse these data in a granular way, e.g.

  • How many hours did John Smith spend on Typing in Sept 2017?
  • How much Proofing was done in Jan-Feb 2018?

I am a bit stuck on how to proceed in order to have the data to analyse. Suggestions appreciated.

Added for clarification:

Having read three very helpful replies and looked at tidyr, I have clarified my thoughts and think that I need to modify the data so there is one row for each entry, so the example table will become:

Name         Date          Duration  Job     ... [more columns]

John Smith   01 Aug 2017   1:20      Typing
John Smith   02 Oct 2017   0:15      Typing
John Smith   31 Oct 2017   0:15      Typing
...
Jim Miles    02 Feb 2018   1:30      Admin
Jim Miles    03 Feb 2018   2:10      Admin

Date will need to be formatted correctly but that is not major. The problem is matching the day of month to the relevant Month and year to produce the composite date. Any ideas welcome.


Solution

  • After following up on the suggestions of @Khlick, I succeeded in using gather():

    mydata <- mydata %>% gather(new_date, time_spent, "1":"31")
    

    This produced two new columns, new_date and time_spent, then created a new row for each data point of columns 1–31.

    So now I had, for each data point, the month, e.g. Aug 2017 in one column, the day the work was done, e.g. 12 in another. I changed the month to a date in the original spreadsheet, so it became 2017-08-01 (all dates now have 01). Then in R I used substr() and paste() to replace the day to the correct one, i.e. 2017-08-12.

    Finally, I was left with a large number of rows with no value in time_spent. I removed those rows.

    I now have:

    Name Date Duration Job ... [more columns]

    John Smith   2017-08-01   1:20      Typing
    John Smith   2017-10-02   0:15      Typing
    John Smith   2017-10-31   0:15      Typing
    ...
    Jim Miles    2018-02-02   1:30      Admin
    Jim Miles    2018-02-03   2:10      Admin
    

    I did a few spot checks and seems data was transformed correctly. Thanks to all, especially to @Khlick.