Find all occurrences of a key in nested dictionaries and lists

Posted on

Question :

Find all occurrences of a key in nested dictionaries and lists

I have a dictionary like this:

{ "id" : "abcde",
  "key1" : "blah",
  "key2" : "blah blah",
  "nestedlist" : [ 
    { "id" : "qwerty",
      "nestednestedlist" : [ 
        { "id" : "xyz",
          "keyA" : "blah blah blah" },
        { "id" : "fghi",
          "keyZ" : "blah blah blah" }],
      "anothernestednestedlist" : [ 
        { "id" : "asdf",
          "keyQ" : "blah blah" },
        { "id" : "yuiop",
          "keyW" : "blah" }] } ] } 

Basically a dictionary with nested lists, dictionaries, and strings, of arbitrary depth.

What is the best way of traversing this to extract the values of every “id” key? I want to achieve the equivalent of an XPath query like “//id”. The value of “id” is always a string.

So from my example, the output I need is basically:

["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"]

Order is not important.

Answer #1:

I found this Q/A very interesting, since it provides several different solutions for the same problem. I took all these functions and tested them with a complex dictionary object. I had to take two functions out of the test, because they had to many fail results and they did not support returning lists or dicts as values, which i find essential, since a function should be prepared for almost any data to come.

So i pumped the other functions in 100.000 iterations through the timeit module and output came to following result:

0.11 usec/pass on gen_dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6.03 usec/pass on find_all_items(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.15 usec/pass on findkeys(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.79 usec/pass on get_recursively(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.14 usec/pass on find(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.36 usec/pass on dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

All functions had the same needle to search for (‘logging’) and the same dictionary object, which is constructed like this:

o = { 'temparature': '50', 
      'logging': {
        'handlers': {
          'console': {
            'formatter': 'simple', 
            'class': 'logging.StreamHandler', 
            'stream': 'ext://sys.stdout', 
            'level': 'DEBUG'
          }
        },
        'loggers': {
          'simpleExample': {
            'handlers': ['console'], 
            'propagate': 'no', 
            'level': 'INFO'
          },
         'root': {
           'handlers': ['console'], 
           'level': 'DEBUG'
         }
       }, 
       'version': '1', 
       'formatters': {
         'simple': {
           'datefmt': "'%Y-%m-%d %H:%M:%S'", 
           'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
         }
       }
     }, 
     'treatment': {'second': 5, 'last': 4, 'first': 4},   
     'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]]
}

All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my function adapted from the functions here, actually it is pretty much like the find function from Alfe, with the main difference, that i am checking if the given object has iteritems function, in case strings are passed during recursion:

def gen_dict_extract(key, var):
    if hasattr(var,'iteritems'):
        for k, v in var.iteritems():
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract(key, d):
                        yield result

So this variant is the fastest and safest of the functions here. And find_all_items is incredibly slow and far off the second slowest get_recursivley while the rest, except dict_extract, is close to each other. The functions fun and keyHole only work if you are looking for strings.

Interesting learning aspect here 🙂

Answered By: hexerei software

Answer #2:

d = { "id" : "abcde",
    "key1" : "blah",
    "key2" : "blah blah",
    "nestedlist" : [ 
    { "id" : "qwerty",
        "nestednestedlist" : [ 
        { "id" : "xyz", "keyA" : "blah blah blah" },
        { "id" : "fghi", "keyZ" : "blah blah blah" }],
        "anothernestednestedlist" : [ 
        { "id" : "asdf", "keyQ" : "blah blah" },
        { "id" : "yuiop", "keyW" : "blah" }] } ] } 


def fun(d):
    if 'id' in d:
        yield d['id']
    for k in d:
        if isinstance(d[k], list):
            for i in d[k]:
                for j in fun(i):
                    yield j

>>> list(fun(d))
['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop']
Answered By: kev

Answer #3:

d = { "id" : "abcde",
    "key1" : "blah",
    "key2" : "blah blah",
    "nestedlist" : [
    { "id" : "qwerty",
        "nestednestedlist" : [
        { "id" : "xyz", "keyA" : "blah blah blah" },
        { "id" : "fghi", "keyZ" : "blah blah blah" }],
        "anothernestednestedlist" : [
        { "id" : "asdf", "keyQ" : "blah blah" },
        { "id" : "yuiop", "keyW" : "blah" }] } ] }


def findkeys(node, kv):
    if isinstance(node, list):
        for i in node:
            for x in findkeys(i, kv):
               yield x
    elif isinstance(node, dict):
        if kv in node:
            yield node[kv]
        for j in node.values():
            for x in findkeys(j, kv):
                yield x

print(list(findkeys(d, 'id')))
Answered By: arainchi

Answer #4:

def find(key, value):
  for k, v in value.iteritems():
    if k == key:
      yield v
    elif isinstance(v, dict):
      for result in find(key, v):
        yield result
    elif isinstance(v, list):
      for d in v:
        for result in find(key, d):
          yield result

EDIT: @Anthon noticed that this will not work for directly nested lists. If you have this in your input, you can use this:

def find(key, value):
  for k, v in (value.iteritems() if isinstance(value, dict) else
               enumerate(value) if isinstance(value, list) else []):
    if k == key:
      yield v
    elif isinstance(v, (dict, list)):
      for result in find(key, v):
        yield result

But I think the original version is easier to understand, so I will leave it.

Answered By: Alfe

Answer #5:

Another variation, which includes the nested path to the found results (note: this version doesn’t consider lists):

def find_all_items(obj, key, keys=None):
    """
    Example of use:
    d = {'a': 1, 'b': 2, 'c': {'a': 3, 'd': 4, 'e': {'a': 9, 'b': 3}, 'j': {'c': 4}}}
    for k, v in find_all_items(d, 'a'):
        print "* {} = {} *".format('->'.join(k), v)    
    """
    ret = []
    if not keys:
        keys = []
    if key in obj:
        out_keys = keys + [key]
        ret.append((out_keys, obj[key]))
    for k, v in obj.items():
        if isinstance(v, dict):
            found_items = find_all_items(v, key, keys=(keys+[k]))
            ret += found_items
    return ret
Answered By: Dolan Antenucci

Answer #6:

I just wanted to iterate on @hexerei-software’s excellent answer using yield from and accepting top-level lists.

def gen_dict_extract(var, key):
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from gen_dict_extract(v, key)
    elif isinstance(var, list):
        for d in var:
            yield from gen_dict_extract(d, key)
Answered By: Chris

Answer #7:

This function recursively searches a dictionary containing nested dictionaries and lists. It builds a list called fields_found, which contains the value for every time the field is found. The ‘field’ is the key I’m looking for in the dictionary and its nested lists and dictionaries.

def get_recursively(search_dict, field):
    """Takes a dict with nested lists and dicts,
    and searches all dicts for a key of the field
    provided.
    """
    fields_found = []

    for key, value in search_dict.iteritems():

        if key == field:
            fields_found.append(value)

        elif isinstance(value, dict):
            results = get_recursively(value, field)
            for result in results:
                fields_found.append(result)

        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, field)
                    for another_result in more_results:
                        fields_found.append(another_result)

    return fields_found
Answered By: Becca Petrin

Answer #8:

pip install nested-lookup
does exactly what you are looking for:

document = [ { 'taco' : 42 } , { 'salsa' : [ { 'burrito' : { 'taco' : 69 } } ] } ]

>>> print(nested_lookup('taco', document))
[42, 69]
Answered By: VengaVenga

Leave a Reply

Your email address will not be published. Required fields are marked *