Splitting a list based on a delimiter word

Question :

I have a list containing various string values. I want to split the list whenever I see WORD. The result should be a list of lists (sublists of the original list), each containing exactly one instance of WORD. I can do this using a loop, but is there a more Pythonic way to achieve this?

Example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']

result = [['A'], ['WORD','B','C'],['WORD','D']]

This is what I have tried, but it does not achieve what I want, since it puts WORD in a different list than the one it should be in:

def split_excel_cells(delimiter, cell_data):

    result = []

    temp = []

    for cell in cell_data:
        if cell == delimiter:
            temp.append(cell)
            result.append(temp)
            temp = []
        else:
            temp.append(cell)

    return result

Answer #1:

I would use a generator:

def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

ex = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
result = list(group(ex, 'WORD'))
print(result)

This prints

[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

The code accepts any iterable and produces an iterable (which you don’t have to convert into a list if you don’t want to).
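As a small sketch of the "any iterable" point, the same generator can be fed a plain iterator. Note that when the input starts with the separator, the first group yielded is empty. Filtering it out, as done here, is an assumption about what is wanted, not part of the answer above:

```python
def group(seq, sep):
    # Same generator as above: start a new group each time sep is seen.
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

# An iterator (not a list) that begins with the separator.
cells = iter(['WORD', 'A', 'WORD', 'B', 'C'])

# Drop empty groups, e.g. the leading [] produced when the
# input starts with the separator.
chunks = [chunk for chunk in group(cells, 'WORD') if chunk]
print(chunks)  # [['WORD', 'A'], ['WORD', 'B', 'C']]
```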

Answered By: NPE

Answer #2:

import itertools

lst = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
w = 'WORD'

spl = [list(y) for x, y in itertools.groupby(lst, lambda z: z == w) if not x]

This creates a split list without the delimiters, which looks more logical to me:

[['A'], ['B', 'C'], ['D']]

If you need the delimiters to be included, this should do the trick:

spl = [[]]
for x, y in itertools.groupby(lst, lambda z: z == w):
    if x: spl.append([])
    spl[-1].extend(y)
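One thing worth knowing about the groupby version: consecutive delimiters fall into the same group, so they end up in a single sublist. The sketch below wraps the loop in a function (the name split_keep is made up for illustration) to show both the normal case and that edge case:

```python
import itertools

def split_keep(lst, w):
    # Hypothetical wrapper around the loop above.
    spl = [[]]
    for is_sep, items in itertools.groupby(lst, lambda z: z == w):
        if is_sep:
            spl.append([])      # a run of delimiters starts a new sublist
        spl[-1].extend(items)   # the whole run goes into that one sublist
    return spl

print(split_keep(['A', 'WORD', 'B', 'C', 'WORD', 'D'], 'WORD'))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

# Two delimiters in a row are grouped together by groupby:
print(split_keep(['A', 'WORD', 'WORD', 'B'], 'WORD'))
# [['A'], ['WORD', 'WORD', 'B']]
```

If every delimiter must open its own sublist, the generator from Answer #1 behaves that way.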
Answered By: georg

Answer #3:

  • @NPE’s solution looks very Pythonic to me. This is another one using itertools:
  • izip exists only in Python 2. In Python 3 the built-in zip is already lazy, so the code below uses zip.
from itertools import chain
example = ['A', 'WORD', 'B' , 'C' , 'WORD' , 'D']
# positions of every delimiter
indices = [i for i, x in enumerate(example) if x == "WORD"]
# pair each start index with the next delimiter index (None means "to the end")
pairs = zip(chain([0], indices), chain(indices, [None]))
result = [example[i:j] for i, j in pairs]
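To make the slicing idea easier to follow, here is the same approach for Python 3 with the intermediate values printed. The names starts and stops are only illustrative:

```python
from itertools import chain

example = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
indices = [i for i, x in enumerate(example) if x == 'WORD']

# Each slice starts at 0 or at a delimiter, and stops at the
# next delimiter (None means "slice to the end of the list").
starts = list(chain([0], indices))
stops = list(chain(indices, [None]))
pairs = list(zip(starts, stops))

result = [example[i:j] for i, j in pairs]

print(indices)  # [1, 4]
print(pairs)    # [(0, 1), (1, 4), (4, None)]
print(result)   # [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
```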
Answered By: A. Rodas

Answer #4:

Given

import more_itertools as mit


iterable = ["A", "WORD", "B" , "C" , "WORD" , "D"]
pred = lambda x: x == "WORD"

Code

list(mit.split_before(iterable, pred))
# [['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

more_itertools is a third-party library, installable with: pip install more_itertools

See also split_at and split_after.

Answered By: pylang
