Iterate an iterator by chunks (of n) in Python? [duplicate]

Question :

Can you think of a nice way (maybe with itertools) to split an iterator into chunks of given size?

For example, l = [1,2,3,4,5,6,7] with chunks(l, 3) should become an iterator over [1,2,3], [4,5,6], [7].

I can think of a small program to do that, but not of a nice way, maybe using itertools.

Asked By: Gerenuk

Answer #1:

The grouper() recipe from the itertools documentation comes close to what you want:

from itertools import zip_longest  # named izip_longest in Python 2

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

It will fill up the last chunk with a fill value, though.

A less general solution that only works on sequences but does handle the last chunk as desired is

[my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
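To make the "only works on sequences" limitation concrete, here is a minimal sketch that wraps the comprehension in a function (the name chunk_sequence is hypothetical, not from the answer). It works on anything that supports len() and slicing, such as lists, tuples and strings, but a generator has no len(), so it raises TypeError.

```python
def chunk_sequence(seq, chunk_size):
    """Slice a sequence (list, tuple, str, ...) into chunk_size pieces."""
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]

print(chunk_sequence([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
print(chunk_sequence("ABCDEFG", 3))              # ['ABC', 'DEF', 'G']

try:
    chunk_sequence((x for x in range(7)), 3)
except TypeError as exc:
    # Generators do not support len(), so the slicing approach cannot be used.
    print("TypeError:", exc)
```

Each chunk has the same type as the input (lists give lists, strings give strings), which the iterator-based versions cannot offer.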

Finally, a solution that works on general iterators and behaves as desired is

import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:  # islice produced nothing: the iterator is exhausted
            return
        yield chunk
Answered By: Sven Marnach

Answer #2:

Although the OP asks for a function that returns chunks as lists or tuples, if you need the chunks returned as iterators, Sven Marnach's solution can be modified:

import itertools

def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

Some benchmarks: http://pastebin.com/YkKFvm8b

It is slightly more efficient, but only if your code iterates through the elements of every chunk.
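A consequence worth spelling out: all the chunks from grouper_it pull from one shared iterator, so each chunk has to be consumed before the next one is requested. The sketch below reuses the answer's grouper_it (adding the import it needs) and contrasts consuming the chunks in order with collecting the outer generator into a list first:

```python
import itertools

def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

# Consuming each chunk before asking for the next gives the expected groups.
in_order = [list(chunk) for chunk in grouper_it(3, range(7))]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting the chunks first leaves each one holding only its first element:
# creating every following chunk already advanced the shared iterator.
premature = [list(chunk) for chunk in list(grouper_it(3, range(7)))]
print(premature)  # [[0], [1], [2], [3], [4], [5], [6]]
```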

Answered By: reclosedev

Answer #3:

This will work on any iterable. It returns a generator of generators (for full flexibility). I now realize that it's basically the same as @reclosedev's solution, but without the fluff. There is no need for try...except, because the for loop simply ends when the underlying iterator is exhausted. (Don't instead call next() in a while True loop and let its StopIteration propagate up: since Python 3.7 (PEP 479), a StopIteration escaping a generator is turned into a RuntimeError.)

Pulling the first element before calling islice is what makes the generator stop when the iterator is empty, since islice will continue spawning empty iterators forever if you let it.

It's only a few lines long, yet easy to comprehend.

import itertools

def grouper(iterable, n):
    it = iter(iterable)
    for first in it:  # ends cleanly once the iterator is exhausted
        yield itertools.chain((first,), itertools.islice(it, n - 1))

Note that the first element is put into a tuple. Otherwise, if it were itself iterable, itertools.chain would flatten it out. Thanks to Jeremy Brown for pointing out this issue.
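To illustrate the note about the tuple: itertools.chain iterates over each argument it is given, so a string element passed directly is split into characters, while wrapping it in a one-element tuple keeps it intact. A small sketch with made-up data:

```python
import itertools

first = "spam"
rest = ["eggs"]

# chain() iterates over every argument, so a bare string is split up.
flattened = list(itertools.chain(first, rest))
print(flattened)  # ['s', 'p', 'a', 'm', 'eggs']

# A one-element tuple keeps the element as a single item.
kept = list(itertools.chain((first,), rest))
print(kept)  # ['spam', 'eggs']
```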

Answered By: Svein Lindal

Answer #4:

I was working on something today and came up with what I think is a simple solution. It is similar to jsbueno's answer, but I believe his would yield an empty group when the length of the iterable is divisible by n. My answer does a simple check when the iterable is exhausted.

def chunk(iterable, chunk_size):
    """Generate sequences of `chunk_size` elements from `iterable`."""
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))  # iterable.next() in Python 2
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break
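A quick check of the edge cases mentioned above, assuming the chunk function with next(iterable) in place of the Python 2 iterable.next(): when the length is a multiple of chunk_size no empty trailing group is produced, and an empty input produces no chunks at all.

```python
def chunk(iterable, chunk_size):
    """Generate lists of `chunk_size` elements from `iterable`."""
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:  # emit the partial last group only if it is non-empty
                yield chunk
            break

print(list(chunk(range(6), 3)))  # [[0, 1, 2], [3, 4, 5]]
print(list(chunk(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
print(list(chunk([], 3)))        # []
```

Because the StopIteration from next() is caught inside the generator, this version is unaffected by the Python 3.7 change (PEP 479).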
Answered By: eidorb

Answer #5:

Here’s one that returns lazy chunks; use map(list, chunks(...)) if you want lists.

from itertools import islice, chain
from collections import deque

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n-1))
        yield chunk
        # Exhaust whatever the caller left unread, so the next chunk
        # starts at the right position.
        deque(chunk, 0)

if __name__ == "__main__":
    for chunk in map(list, chunks(range(10), 3)):
        print(chunk)

    for i, chunk in enumerate(chunks(range(10), 3)):
        if i % 2 == 1:
            print("chunk #%d: %s" % (i, list(chunk)))
        else:
            print("skipping #%d" % i)
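The deque(chunk, 0) line (that is, a deque with maxlen=0) is what makes skipping chunks safe: such a deque discards everything it is fed, so it is a cheap way to exhaust whatever the caller left unread; the consume() recipe in the itertools documentation relies on the same trick. A minimal illustration with made-up data:

```python
from collections import deque
from itertools import islice

it = iter(range(10))
chunk = islice(it, 3)

first = next(chunk)      # the caller reads only part of the chunk
deque(chunk, maxlen=0)   # drain the unread rest (1 and 2) without storing it
after = next(it)         # the shared iterator now sits at the next chunk
print(first, after)      # 0 3
```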
Answered By: Peter Otten

Answer #6:

A succinct implementation is:

from itertools import filterfalse, zip_longest  # ifilterfalse, izip_longest in Python 2

chunker = lambda iterable, n: (filterfalse(lambda x: x == (), chunk) for chunk in zip_longest(*[iter(iterable)]*n, fillvalue=()))

This works because [iter(iterable)]*n is a list containing the same iterator n times; zipping over that takes one item from each iterator in the list, which is the same iterator, with the result that each zip-element contains a group of n items.
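The shared-iterator trick can be checked directly: the list holds n references to one iterator object, so plain zip pulls consecutive items into each tuple, but it stops as soon as one input runs out and silently drops the incomplete remainder. That is why zip_longest (izip_longest in Python 2) is used here. A small sketch:

```python
it = iter(range(7))
args = [it] * 3

# All three entries are the very same iterator object.
print(all(a is it for a in args))  # True

# zip() stops as soon as one input runs out, so the leftover 6 is dropped.
groups = list(zip(*args))
print(groups)  # [(0, 1, 2), (3, 4, 5)]
```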

zip_longest (izip_longest in Python 2) is needed to fully consume the underlying iterable; plain zip stops at the first exhausted iterator, which chops off any remainder from iterable. As a result, the fill value has to be filtered out afterwards. A slightly more robust implementation would therefore be:

from itertools import filterfalse, zip_longest

def chunker(iterable, n):
    class Filler(object): pass  # private sentinel, never an item of iterable
    return (filterfalse(lambda x: x is Filler, chunk) for chunk in zip_longest(*[iter(iterable)]*n, fillvalue=Filler))

This guarantees that the fill value is never an item in the underlying iterable. Using the definition above:

>>> iterable = range(1, 11)
>>> list(map(tuple, chunker(iterable, 3)))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]
>>> list(map(tuple, chunker(iterable, 2)))
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
>>> list(map(tuple, chunker(iterable, 4)))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]

This implementation almost does what you want, but it has issues:

from itertools import islice

def chunks(it, step):
    start = 0
    while True:
        end = start + step
        yield islice(it, start, end)
        start = end

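(The problem is that calling islice on an exhausted iterator does not raise StopIteration or anything else; it just returns an empty iterator, so this generator yields forever. There is also the slightly tricky issue that each islice result must be consumed before this generator is advanced, and, when it is an iterator rather than a sequence, islice(it, start, end) counts start from the iterator's current position, so every chunk after the first skips items.)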

To generate the moving window functionally:

zip(count(0, step), count(step, step))

So this becomes:

(it[start:end] for (start, end) in zip(count(0, step), count(step, step)))

But, that still creates an infinite iterator. So, you need takewhile (or perhaps something else might be better) to limit it:

from itertools import count, takewhile  # use izip instead of zip in Python 2

chunk = lambda it, step: takewhile((lambda x: len(x) > 0), (it[start:end] for (start, end) in zip(count(0, step), count(step, step))))

>>> g = chunk(list(range(1, 11)), 3)
>>> tuple(g)
([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])

Answered By: Marcin

Answer #7:

I forget where I found the inspiration for this. I've modified it a little to work with MSI GUIDs in the Windows Registry:

def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        if reverse: yield s[:n][::-1]
        else: yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

reverse doesn’t apply to your question, but it’s something I use extensively with this function.

>>> [i for i in nslice([1,2,3,4,5,6,7], 3)]
[[1, 2, 3], [4, 5, 6], [7]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True)]
[[1, 2, 3], [4, 5, 6]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True, reverse=True)]
[[3, 2, 1], [6, 5, 4]]
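One caveat with nslice: each s = s[n:] builds a new copy of the remaining sequence, so for long inputs the total copying grows roughly quadratically with the length. A sketch of the same interface that steps an index instead (the name nslice_indexed is hypothetical, and it raises ValueError rather than using assert) gives the same results on the examples above:

```python
def nslice_indexed(s, n, truncate=False, reverse=False):
    """Split sequence s into n-sized chunks, optionally reversing each chunk."""
    if n <= 0:
        raise ValueError("n must be positive")
    full = len(s) - len(s) % n  # end index of the last complete chunk
    for start in range(0, full, n):
        piece = s[start:start + n]
        yield piece[::-1] if reverse else piece
    if full < len(s) and not truncate:
        yield s[full:]  # leftover partial chunk, not reversed (as in nslice)

print(list(nslice_indexed([1, 2, 3, 4, 5, 6, 7], 3)))
# [[1, 2, 3], [4, 5, 6], [7]]
print(list(nslice_indexed([1, 2, 3, 4, 5, 6, 7], 3, truncate=True, reverse=True)))
# [[3, 2, 1], [6, 5, 4]]
```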
Answered By: Zach Young

Answer #8:

Here you go.

def chunksiter(l, chunks):
    # Build every slice up front, then return an iterator over the list.
    rl = []
    for i in range(0, len(l), chunks):
        rl.append(l[i:i + chunks])
    return iter(rl)


def chunksiter2(l, chunks):
    # Generator version: each slice is produced only when requested.
    for i in range(0, len(l), chunks):
        yield l[i:i + chunks]

Examples:

for l in chunksiter([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]

for l in chunksiter2([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]


for l in chunksiter2([1,2,3,4,5,6,7,8],5):
    print(l)

[1, 2, 3, 4, 5]
[6, 7, 8]
Answered By: Carlos Quintanilla
