Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes

Posted on

Solving problem is about exposing yourself to as many situations as possible like Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes, which can be followed any time. Take easy to follow this discuss.

Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes

I have two columns, fromdate and todate, in a dataframe.

import pandas as pd
data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
df = pd.DataFrame(data)

I add a new column, diff, to find the difference between the two dates using

df['diff'] = df['fromdate'] - df['todate']

I get the diff column, but it contains days, when there’s more than 24 hours.

                   todate                fromdate                   diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000

How do I convert my results to only hours and minutes (i.e. days are converted to hours)?

Asked By: sbalajis

||

Answer #1:

Pandas timestamp differences returns a datetime.timedelta object. This can easily be converted into hours by using the *as_type* method, like so

import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')

to yield,

0    58
1     3
2     8
dtype: float64
Answered By: nitin

Answer #2:

This was driving me bonkers as the .astype() solution above didn’t work for me. But I found another way. Haven’t timed it or anything, but might work for others out there:

t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')
print pd.Timedelta(t2 - t1).seconds / 3600.0

…if you want hours. Or:

print pd.Timedelta(t2 - t1).seconds / 60.0

…if you want minutes.

Answered By: elPastor

Answer #3:

  • How do I convert my results to only hours and minutes
    • The accepted answer only returns days + hours. Minutes are not included.
  • To provide a column that has hours and minutes, as hh:mm or x hours y minutes, would require additional calculations and string formatting.
  • This answer shows how to get either total hours or total minutes as a float, using timedelta math, and is faster than using .astype('timedelta64[h]')
  • Pandas Time Deltas User Guide
  • Pandas Time series / date functionality User Guide
  • python timedelta objects: See supported operations.
  • The following sample data is already a datetime64[ns] dtype. It is required that all relevant columns are converted using pandas.to_datetime().
import pandas as pd
# test data from OP, with values already in a datetime format
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
        'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
# test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
df = pd.DataFrame(data)
# add a timedelta column if wanted. It's added here for information only
# df['time_delta_with_sub'] = df.from_date.sub(df.to_date)  # also works
df['time_delta'] = (df.from_date - df.to_date)
# create a column with timedelta as total hours, as a float type
df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)
# create a colume with timedelta as total minutes, as a float type
df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)
# display(df)
                  to_date               from_date             time_delta  tot_hour_diff  tot_mins_diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000      58.636061    3518.163667
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000       3.684528     221.071667
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000       8.714933     522.896000

Other methods

  • An item of note from the podcast in Other Resources, .total_seconds() was added and merged when the core developer was on vacation, and would not have been approved.
    • This is also why there aren’t other .total_xx methods.
# convert the entire timedelta to seconds
# this is the same as td / timedelta(seconds=1)
(df.from_date - df.to_date).dt.total_seconds()
[out]:
0    211089.82
1     13264.30
2     31373.76
dtype: float64
# get the number of days
(df.from_date - df.to_date).dt.days
[out]:
0    2
1    0
2    0
dtype: int64
# get the seconds for hours + minutes + seconds, but not days
# note the difference from total_seconds
(df.from_date - df.to_date).dt.seconds
[out]:
0    38289
1    13264
2    31373
dtype: int64

Other Resources

%%timeit test

import pandas as pd
# dataframe with 2M rows
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.DataFrame(data)
df = pd.concat([df] * 1000000).reset_index(drop=True)
%%timeit
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
[out]:
43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
(df.from_date - df.to_date).astype('timedelta64[h]')
[out]:
59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Answered By: Trenton McKinney
The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published. Required fields are marked *