Question :
Based on the pandas documentation from here: Docs
And the examples:
>>> index = pd.date_range('1/1/2000', periods=9, freq='T')
>>> series = pd.Series(range(9), index=index)
>>> series
20000101 00:00:00 0
20000101 00:01:00 1
20000101 00:02:00 2
20000101 00:03:00 3
20000101 00:04:00 4
20000101 00:05:00 5
20000101 00:06:00 6
20000101 00:07:00 7
20000101 00:08:00 8
Freq: T, dtype: int64
After resampling:
>>> series.resample('3T', label='right', closed='right').sum()
20000101 00:00:00 0
20000101 00:03:00 6
20000101 00:06:00 15
20000101 00:09:00 15
In my thoughts, the bins should looks like these after resampling:
=========bin 01=========
20000101 00:00:00 0
20000101 00:01:00 1
20000101 00:02:00 2
=========bin 02=========
20000101 00:03:00 3
20000101 00:04:00 4
20000101 00:05:00 5
=========bin 03=========
20000101 00:06:00 6
20000101 00:07:00 7
20000101 00:08:00 8
Am I right on this step??
So after .sum
I thought it should be like this:
20000101 00:02:00 3
20000101 00:05:00 12
20000101 00:08:00 21
I just do not understand how it comes out:
20000101 00:00:00 0
(because label='right'
, 20000101 00:00:00 cannot be any right edge of any bins in this case).
20000101 00:09:00 15
(the label 20000101 00:09:00 even does not exists in the original Series.
Answer #1:
Short answer: If you use closed='left'
and loffset='2T'
then you’ll get what you expected:
series.resample('3T', label='left', closed='left', loffset='2T').sum()
20000101 00:02:00 3
20000101 00:05:00 12
20000101 00:08:00 21
Long answer: (or why the results you got were correct, given the arguments you used) This may not be clear from the documentation, but open and closed in this setting is about strict vs nonstrict inequality (e.g. <
vs <=
).
An example should make this clear. Using an interior interval from your example, this is the difference from changing the value of closed
:
closed='right' => ( 3:00, 6:00 ] or 3:00 < x <= 6:00
closed='left' => [ 3:00, 6:00 ) or 3:00 <= x < 6:00
You can find an explanation of the interval notation (parentheses vs brackets) in many places like here, for example:
https://en.wikipedia.org/wiki/Interval_(mathematics)
The label
parameter merely controls whether the left (3:00) or right (6:00) side is displayed, but doesn’t impact the results themselves.
Also note that you can change the starting point for the intervals with the loffset
parameter (which should be entered as a time delta).
Back to the example, where we change only the labeling from ‘right’ to ‘left’:
series.resample('3T', label='right', closed='right').sum()
20000101 00:00:00 0
20000101 00:03:00 6
20000101 00:06:00 15
20000101 00:09:00 15
series.resample('3T', label='left', closed='right').sum()
19991231 23:57:00 0
20000101 00:00:00 6
20000101 00:03:00 15
20000101 00:06:00 15
As you can see, the results are the same, only the index label changes. Pandas only lets you display the right or left label, but if it showed both, then it would look like this (below I’m using standard index notation where (
on the left side means open and ]
on the right side means closed):
( 19991231 23:57:00, 20000101 00:00:00 ] 0 # = 0
( 20000101 00:00:00, 20000101 00:03:00 ] 6 # = 1+2+3
( 20000101 00:03:00, 20000101 00:06:00 ] 15 # = 4+5+6
( 20000101 00:06:00, 20000101 00:09:00 ] 15 # = 7+8
Note that the first bin (23:57:00,00:00:00] is not empty, it’s just that it contains a single row and the value in that single row is zero. If you change ‘sum’ to ‘count’ this becomes more obvious:
series.resample('3T', label='left', closed='right').count()
19991231 23:57:00 1
20000101 00:00:00 3
20000101 00:03:00 3
20000101 00:06:00 2
Answer #2:
Per JohnE’s answer I put together a little helpful infographic which should settle this issue once and for all:
Answer #3:
It is important that resampling is performed by first producing a raster which is a sequence of instants (not periods, intervals, durations), and it is done independent of the ‘label’ and ‘closed’ parameters. It uses only the ‘freq’ parameter and ‘loffset’. In your case, the system will produce the following raster:
20000101 00:00:00
20000101 00:03:00
20000101 00:06:00
20000101 00:09:00
Note again that at this moment there is no interpretation in terms of intervals or periods. You can shift it using ‘loffset’.
Then the system will use the ‘closed’ parameter in ordre to choose among two options:

(start, end]

[start, end)
Here start and end are two adjacent time stamps in the raster. The ‘label’ parameter is used to choose whether start or end are used as a representative of the interval.
In your example, if you choose closed=’right’ then you will get the following intervals:
( previous_interval , 20000101 00:00:00]  {0}
(20000101 00:00:00, 20000101 00:03:00]  {1,2,3}
(20000101 00:03:00, 20000101 00:06:00]  {1,2,3}
(20000101 00:06:00, 20000101 00:09:00]  {4,5,6}
(20000101 00:09:00, next_interval ]  {7,8}
Note that after you aggregate the values over these intervals, the result is displayed in two versions depending on the ‘label’ parameter, that is, whether one and the same interval is represented by its left or right time stamp.