# Is there a way to check if NumPy arrays share the same data?

Posted on

### Question :

Is there a way to check if NumPy arrays share the same data?

My impression is that in NumPy, two arrays can share the same memory. Take the following example:

``````import numpy as np
a=np.arange(27)
b=a.reshape((3,3,3))
a[0]=5000
print (b[0,0,0]) #5000

#Some tests:
a.data is b.data #False
a.data == b.data #True

c=np.arange(27)
c[0]=5000
a.data == c.data #True ( Same data, not same memory storage ), False positive
``````

So clearly `b` didn’t make a copy of `a`; it just created some new meta-data and attached it to the same memory buffer that `a` is using. Is there a way to check if two arrays reference the same memory buffer?

My first impression was to use `a.data is b.data`, but that returns false. I can do `a.data == b.data` which returns True, but I don’t think that checks to make sure `a` and `b` share the same memory buffer, only that the block of memory referenced by `a` and the one referenced by `b` have the same bytes.

I think jterrace’s answer is probably the best way to go, but here is another possibility.

``````def byte_offset(a):
"""Returns a 1-d array of the byte offset of every element in `a`.
Note that these will not in general be in order."""
stride_offset = np.ix_(*map(range,a.shape))
element_offset = sum(i*s for i, s in zip(stride_offset,a.strides))
element_offset = np.asarray(element_offset).ravel()
return np.concatenate([element_offset + x for x in range(a.itemsize)])

def share_memory(a, b):
"""Returns the number of shared bytes between arrays `a` and `b`."""
a_low, a_high = np.byte_bounds(a)
b_low, b_high = np.byte_bounds(b)

beg, end = max(a_low,b_low), min(a_high,b_high)

if end - beg > 0:
# memory overlaps
amem = a_low + byte_offset(a)
bmem = b_low + byte_offset(b)

return np.intersect1d(amem,bmem).size
else:
return 0
``````

Example:

``````>>> a = np.arange(10)
>>> b = a.reshape((5,2))
>>> c = a[::2]
>>> d = a[1::2]
>>> e = a[0:1]
>>> f = a[0:1]
>>> f = f.reshape(())
>>> share_memory(a,b)
80
>>> share_memory(a,c)
40
>>> share_memory(a,d)
40
>>> share_memory(c,d)
0
>>> share_memory(a,e)
8
>>> share_memory(a,f)
8
``````

Here is a plot showing the time for each `share_memory(a,a[::2])` call as a function of the number of elements in `a` on my computer.

You can use the base attribute to check if an array shares the memory with another array:

``````>>> import numpy as np
>>> a = np.arange(27)
>>> b = a.reshape((3,3,3))
>>> b.base is a
True
>>> a.base is b
False
``````

Not sure if that solves your problem. The base attribute will be `None` if the array owns its own memory. Note that an array’s base will be another array, even if it is a subset:

``````>>> c = a[2:]
>>> c.base is a
True
``````

Just do:

``````a = np.arange(27)
a.__array_interface__['data']
``````

The second line will return a tuple where the first entry is the memory address and the second is whether the array is read only. Combined with the shape and data type, you can figure out the exact span of memory address that the array covers, so you can also work out from this when one array is a subset of another.

To solve the problem exactly, you can use

``````import numpy as np

a=np.arange(27)
b=a.reshape((3,3,3))

# Checks exactly by default
np.shares_memory(a, b)

# Checks bounds only
np.may_share_memory(a, b)
``````

Both `np.may_share_memory` and `np.shares_memory` take an optional `max_work` argument that lets you decide how much effort to put in to ensure no false positives. This problem is NP-complete, so always finding the correct answer can be quite computationally expensive.