Question:
Using the routines from "how to load Matlab .mat files with scipy", I could not access deeper nested structures in order to recover them as dictionaries.
To show the problem in more detail, here is a toy example:
import scipy.io as spio
a = {'b':{'c':{'d': 3}}}
# my dictionary: a['b']['c']['d'] = 3
spio.savemat('xy.mat',a)
Now I want to read the .mat file back into Python. I tried the following:
vig=spio.loadmat('xy.mat',squeeze_me=True)
If I now want to access the fields I get:
>> vig['b']
array(((array(3),),), dtype=[('c', '|O8')])
>> vig['b']['c']
array(array((3,), dtype=[('d', '|O8')]), dtype=object)
>> vig['b']['c']['d']
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/<ipython console> in <module>()
ValueError: field named d not found.
However, with the option struct_as_record=False the field can be accessed:
v=spio.loadmat('xy.mat',squeeze_me=True,struct_as_record=False)
Now the field can be reached with attribute access:
>> v['b'].c.d
array(3)
Answer #1:
Here are functions that reconstruct the dictionaries. Just use this loadmat instead of scipy.io’s loadmat:
import scipy.io as spio
def loadmat(filename):
    '''
    this function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    '''
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)

def _check_keys(d):
    '''
    checks if entries in dictionary are mat-objects. If yes
    todict is called to change them to nested dictionaries
    '''
    for key in d:
        if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
            d[key] = _todict(d[key])
    return d

def _todict(matobj):
    '''
    A recursive function which constructs from matobjects nested dictionaries
    '''
    d = {}
    for strg in matobj._fieldnames:
        elem = matobj.__dict__[strg]
        if isinstance(elem, spio.matlab.mio5_params.mat_struct):
            d[strg] = _todict(elem)
        else:
            d[strg] = elem
    return d
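As a rough usage sketch, the round trip on the toy dictionary from the question could look like the following. The helper names struct_to_dict and load_nested are made up for this illustration, and the hasattr check on _fieldnames is an assumption used here instead of the isinstance test above, so the sketch does not depend on where mat_struct lives in a given SciPy version.

```python
import os
import tempfile

import scipy.io as spio


def struct_to_dict(obj):
    """Recursively turn mat_struct-like objects into plain dictionaries."""
    # Duck typing (an assumption of this sketch): the objects returned with
    # struct_as_record=False carry a `_fieldnames` list of their field names.
    if hasattr(obj, "_fieldnames"):
        return {name: struct_to_dict(getattr(obj, name))
                for name in obj._fieldnames}
    return obj


def load_nested(filename):
    """Load a .mat file and convert its structs to nested dictionaries."""
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return {key: struct_to_dict(value) for key, value in data.items()}


# Round trip of the toy dictionary from the question, in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xy.mat")
    spio.savemat(path, {"b": {"c": {"d": 3}}})
    restored = load_nested(path)

print(restored["b"]["c"]["d"])
```

Like the answer's version, this sketch does not descend into cell arrays.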
Answer #2:
Just an enhancement to mergen’s answer, which unfortunately stops recursing when it reaches a cell array of objects. The following version makes lists of them instead and continues the recursion into the cell array elements where possible.
import scipy.io as spio
import numpy as np

def loadmat(filename):
    '''
    this function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    '''
    def _check_keys(d):
        '''
        checks if entries in dictionary are mat-objects. If yes
        todict is called to change them to nested dictionaries
        '''
        for key in d:
            if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
                d[key] = _todict(d[key])
        return d

    def _todict(matobj):
        '''
        A recursive function which constructs from matobjects nested dictionaries
        '''
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, spio.matlab.mio5_params.mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                d[strg] = _tolist(elem)
            else:
                d[strg] = elem
        return d

    def _tolist(ndarray):
        '''
        A recursive function which constructs lists from cellarrays
        (which are loaded as numpy ndarrays), recursing into the elements
        if they contain matobjects.
        '''
        elem_list = []
        for sub_elem in ndarray:
            if isinstance(sub_elem, spio.matlab.mio5_params.mat_struct):
                elem_list.append(_todict(sub_elem))
            elif isinstance(sub_elem, np.ndarray):
                elem_list.append(_tolist(sub_elem))
            else:
                elem_list.append(sub_elem)
        return elem_list

    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)
Answer #3:
I was advised on the scipy mailing list (https://mail.python.org/pipermail/scipy-user/) that there are two more ways to access this data.
This works:
import scipy.io as spio
vig=spio.loadmat('xy.mat')
print(vig['b'][0, 0]['c'][0, 0]['d'][0, 0])
Output on my machine:
3
The reason for this kind of access: “For historic reasons, in Matlab everything is at least a 2D array, even scalars.”
So scipy.io.loadmat mimics Matlab's behavior by default.
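To make the "at least 2D" point concrete, here is a small sketch that writes the toy dictionary from the question to a temporary file and inspects the shape of what comes back with the default options. The temporary path and the variable names outer_shape and value are only for illustration.

```python
import os
import tempfile

import scipy.io as spio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xy.mat")
    spio.savemat(path, {"b": {"c": {"d": 3}}})
    # Default options: nothing is squeezed, structs come back as record arrays.
    vig = spio.loadmat(path)

# Every struct level is a 1x1 array, hence the [0, 0] at each step.
outer_shape = vig["b"].shape
value = vig["b"][0, 0]["c"][0, 0]["d"][0, 0]
print(outer_shape, value)
```

Passing squeeze_me=True removes these unit dimensions, which is why the other answers index differently.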
Answer #4:
Found a solution: the content of the “scipy.io.matlab.mio5_params.mat_struct” object can be accessed via:
v['b'].__dict__['c'].__dict__['d']
Answer #5:
Another method that works:
import scipy.io as spio
vig=spio.loadmat('xy.mat',squeeze_me=True)
print(vig['b']['c'].item()['d'])
Output:
3
I learned this method on the scipy mailing list, too. I certainly don’t understand (yet) why ‘.item()’ has to be added in, and:
print(vig['b']['c']['d'])
will throw an error instead:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
but I’ll be back to supplement the explanation when I know it. Explanation of numpy.ndarray.item (from the numpy reference):
Copy an element of an array to a standard Python scalar and return it.
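One possible reading of why '.item()' is needed: vig['b']['c'] is a zero-dimensional array of dtype object that merely wraps the inner record, so a field lookup is applied to the wrapper, which has no fields of its own. The sketch below imitates that situation with plain numpy; the wrapped dict is a stand-in chosen so the example runs without a .mat file, it is not what loadmat actually stores.

```python
import numpy as np

# A 0-d object array holding one Python object, standing in for vig['b']['c'].
wrapped = np.array({"d": 3}, dtype=object)

try:
    wrapped["d"]  # the lookup hits the wrapper array, not the stored object
    lookup_failed = False
except (IndexError, ValueError):
    # Recent numpy raises IndexError here; the question shows a ValueError
    # from an older setup, so both are caught.
    lookup_failed = True

inner = wrapped.item()  # unwrap the single stored object
print(wrapped.shape, lookup_failed, inner["d"])
```

After '.item()' the lookup goes to the stored object itself, which does know about 'd'.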
(Please note that this answer is basically the same as hpaulj's comment on the initial question, but I felt that the comment is not ‘visible’ or understandable enough. I certainly did not notice it when I first searched for a solution some weeks ago.)