Solving problem is about exposing yourself to as many situations as possible like Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes, which can be followed any time. Take easy to follow this discuss.
I have two columns, fromdate
and todate
, in a dataframe.
import pandas as pd
data = {'todate': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
'fromdate': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
df = pd.DataFrame(data)
I add a new column, diff
, to find the difference between the two dates using
df['diff'] = df['fromdate'] - df['todate']
I get the diff
column, but it contains days
, when there’s more than 24 hours.
todate fromdate diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000
How do I convert my results to only hours and minutes (i.e. days are converted to hours)?
Answer #1:
Pandas timestamp differences returns a datetime.timedelta object. This can easily be converted into hours by using the *as_type* method, like so
import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')
to yield,
0 58
1 3
2 8
dtype: float64
Answer #2:
This was driving me bonkers as the .astype()
solution above didn’t work for me. But I found another way. Haven’t timed it or anything, but might work for others out there:
t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')
print pd.Timedelta(t2 - t1).seconds / 3600.0
…if you want hours. Or:
print pd.Timedelta(t2 - t1).seconds / 60.0
…if you want minutes.
Answer #3:
- How do I convert my results to only hours and minutes
- The accepted answer only returns
days + hours
. Minutes are not included.
- The accepted answer only returns
- To provide a column that has hours and minutes, as
hh:mm
orx hours y minutes
, would require additional calculations and string formatting. - This answer shows how to get either total hours or total minutes as a float, using
timedelta
math, and is faster than using.astype('timedelta64[h]')
- Pandas Time Deltas User Guide
- Pandas Time series / date functionality User Guide
- python
timedelta
objects: See supported operations. - The following sample data is already a
datetime64[ns] dtype
. It is required that all relevant columns are converted usingpandas.to_datetime()
.
import pandas as pd
# test data from OP, with values already in a datetime format
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')]}
# test dataframe; the columns must be in a datetime format; use pandas.to_datetime if needed
df = pd.DataFrame(data)
# add a timedelta column if wanted. It's added here for information only
# df['time_delta_with_sub'] = df.from_date.sub(df.to_date) # also works
df['time_delta'] = (df.from_date - df.to_date)
# create a column with timedelta as total hours, as a float type
df['tot_hour_diff'] = (df.from_date - df.to_date) / pd.Timedelta(hours=1)
# create a colume with timedelta as total minutes, as a float type
df['tot_mins_diff'] = (df.from_date - df.to_date) / pd.Timedelta(minutes=1)
# display(df)
to_date from_date time_delta tot_hour_diff tot_mins_diff
0 2014-01-24 13:03:12.050 2014-01-26 23:41:21.870 2 days 10:38:09.820000 58.636061 3518.163667
1 2014-01-27 11:57:18.240 2014-01-27 15:38:22.540 0 days 03:41:04.300000 3.684528 221.071667
2 2014-01-23 10:07:47.660 2014-01-23 18:50:41.420 0 days 08:42:53.760000 8.714933 522.896000
Other methods
- An item of note from the podcast in Other Resources,
.total_seconds()
was added and merged when the core developer was on vacation, and would not have been approved.- This is also why there aren’t other
.total_xx
methods.
- This is also why there aren’t other
# convert the entire timedelta to seconds
# this is the same as td / timedelta(seconds=1)
(df.from_date - df.to_date).dt.total_seconds()
[out]:
0 211089.82
1 13264.30
2 31373.76
dtype: float64
# get the number of days
(df.from_date - df.to_date).dt.days
[out]:
0 2
1 0
2 0
dtype: int64
# get the seconds for hours + minutes + seconds, but not days
# note the difference from total_seconds
(df.from_date - df.to_date).dt.seconds
[out]:
0 38289
1 13264
2 31373
dtype: int64
Other Resources
- Talk Python to Me: Episode #271: Unlock the mysteries of time, Python’s datetime that is!
- Timedelta begins at 31 minutes
- As per Python core developer Paul Ganssle and python
dateutil
maintainer:- Use
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
- Don’t use
(df.from_date - df.to_date).dt.total_seconds() / 3600
- Use
- Real Python: Using Python datetime to Work With Dates and Times
- The
dateutil
module provides powerful extensions to the standarddatetime
module.
%%timeit
test
import pandas as pd
# dataframe with 2M rows
data = {'to_date': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000')], 'from_date': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000')]}
df = pd.DataFrame(data)
df = pd.concat([df] * 1000000).reset_index(drop=True)
%%timeit
(df.from_date - df.to_date) / pd.Timedelta(hours=1)
[out]:
43.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
(df.from_date - df.to_date).astype('timedelta64[h]')
[out]:
59.8 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)