Python: How to loop through blocks of lines

Question :

How to go through blocks of lines separated by an empty line? The file looks like the following:

ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25

I want to loop through the blocks and grab the fields Name, Family name and Age in a list of 3 columns:

Y X 20
F H 23
Y S 13
Z M 25
Asked By: Adia

Answer #1:

Here’s another way, using itertools.groupby.
The function groupby iterates through the lines of the file and calls isa_group_separator(line) for each one. isa_group_separator returns either True or False (called the key), and itertools.groupby then groups all the consecutive lines that yielded the same True or False result.

This is a very convenient way to collect lines into groups.
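
To see the grouping on its own, here is a minimal sketch that runs groupby over an in-memory list instead of a file. The list sample_lines is made-up input for illustration, and the key function simply tests for a bare newline.

```python
import itertools

# Hypothetical sample input: two partial records separated by a blank line.
sample_lines = ["ID: 1\n", "Name: X\n", "\n", "ID: 2\n", "Name: H\n"]

def isa_group_separator(line):
    # True only for a line that consists of a single newline character.
    return line == "\n"

# Each run of consecutive lines with the same key becomes one group.
groups = [(key, list(group))
          for key, group in itertools.groupby(sample_lines, isa_group_separator)]

for key, lines in groups:
    print(key, lines)
```

Groups with a False key hold the data lines of one record; groups with a True key hold only the separator.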

import itertools

def isa_group_separator(line):
    return line == '\n'

with open('data_file') as f:
    for key,group in itertools.groupby(f,isa_group_separator):
        # print(key,list(group))  # uncomment to see what itertools.groupby does.
        if not key:
            data={}
            for item in group:
                field,value=item.split(':')
                value=value.strip()
                data[field]=value
            print('{FamilyN} {Name} {Age}'.format(**data))

# Y X 20
# F H 23
# Y S 13
# Z M 25
Answered By: unutbu

Answer #2:

import re
result = re.findall(
    r"""(?mx)           # multiline, verbose regex
    ^ID:.*\s*           # Match ID: and anything else on that line
    Name:\s*(.*)\s*     # Match name, capture all characters on this line
    FamilyN:\s*(.*)\s*  # etc. for family name
    Age:\s*(.*)$        # and age""",
    subject)

Result will then be

[('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]

which can be trivially changed into whatever string representation you want.
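
For instance, the "FamilyN Name Age" lines asked for in the question could be produced as follows. This is only a sketch: result is copied from the list shown above, and the other variable names are illustrative.

```python
# The list of (name, family_name, age) tuples, as shown above.
result = [('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]

# Reorder each tuple to family name, name, age and join the parts with spaces.
lines = [' '.join((family, name, age)) for name, family, age in result]
print('\n'.join(lines))
```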

Answered By: Tim Pietzcker

Answer #3:

Use a generator.

def blocks( iterable ):
    accumulator= []
    for line in iterable:
        if start_pattern( line ):
            if accumulator:
                yield accumulator
                accumulator= []
        # elif other significant patterns
        else:
            accumulator.append( line )
    if accumulator:
        yield accumulator
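
A possible way to apply the generator to the question's data, as a sketch under assumptions: the answer does not define start_pattern, so here it is assumed to be a function that returns True for blank separator lines, and sample is made-up input.

```python
def start_pattern(line):
    # Assumption: a block boundary is any line that is empty after stripping.
    return not line.strip()

def blocks(iterable):
    accumulator = []
    for line in iterable:
        if start_pattern(line):
            if accumulator:
                yield accumulator
                accumulator = []
        else:
            accumulator.append(line)
    # Emit the final block if the input did not end with a separator.
    if accumulator:
        yield accumulator

# Hypothetical sample input: two records separated by a blank line.
sample = ["ID: 1\n", "Name: X\n", "FamilyN: Y\n", "Age: 20\n", "\n",
          "ID: 2\n", "Name: H\n", "FamilyN: F\n", "Age: 23\n"]

rows = []
for block in blocks(sample):
    # Turn "Key: value" lines into a dict for easy field access.
    record = dict(line.strip().split(": ", 1) for line in block)
    rows.append("{FamilyN} {Name} {Age}".format(**record))

print("\n".join(rows))
```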
Answered By: S.Lott

Answer #4:

If your file is too large to read into memory all at once, you can still use a regular-expression-based solution by working on a memory-mapped file, with the mmap module:

import sys
import re
import os
import mmap

# Python 3: an mmap object exposes bytes, so the pattern must be a bytes pattern.
block_expr = re.compile(rb'ID:.*?\nAge: \d+', re.DOTALL)

filepath = sys.argv[1]
with open(filepath, 'rb') as fp:
    contents = mmap.mmap(fp.fileno(), os.stat(filepath).st_size, access=mmap.ACCESS_READ)
    for block_match in block_expr.finditer(contents):
        print(block_match.group().decode())

The mmap trick will provide a “pretend string” to make regular expressions work on the file without having to read it all into one large string. And the finditer() method of the regular expression object will yield matches one at a time, without creating an entire list of all matches at once (which findall() does).

I do think this solution is overkill for this use case however (still: it’s a nice trick to know…)
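
To try the memory-mapped approach without preparing an input file, the following sketch writes a small temporary file first. It assumes Python 3, where a memory map yields bytes, so both the pattern and the sample data are bytes. The sample content and variable names are made up for the demonstration.

```python
import mmap
import os
import re
import tempfile

# A bytes pattern, because the memory map exposes the file as bytes.
block_expr = re.compile(rb"ID:.*?\nAge: \d+", re.DOTALL)

# Hypothetical sample data written to a temporary file.
sample = (b"ID: 1\nName: X\nFamilyN: Y\nAge: 20\n\n"
          b"ID: 2\nName: H\nFamilyN: F\nAge: 23\n")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

found = []
with open(path, "rb") as fp:
    # Length 0 maps the whole file; the file must not be empty.
    with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as contents:
        # finditer yields one match at a time instead of building a list.
        for block_match in block_expr.finditer(contents):
            found.append(block_match.group().decode())

os.remove(path)
print(len(found), "blocks found")
```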

Answered By: Steven

Answer #5:

import itertools

# Assuming input in file input.txt
data = open('input.txt').readlines()

records = (lines for valid, lines in itertools.groupby(data, lambda l: l != '\n') if valid)
output = [tuple(field.split(':')[1].strip() for field in itertools.islice(record, 1, None)) for record in records]

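# Alternatively, build output lazily with a generator expression in place of the
# list comprehension above (records can only be consumed once, so use one or the other):
# output = (tuple(field.split(':')[1].strip() for field in itertools.islice(record, 1, None)) for record in records)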

# output = [('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]    
#You can iterate and change the order of elements in the way you want    
# [(elem[1], elem[0], elem[2]) for elem in output] as required in your output
Answered By: Anoop

Answer #6:

If the file is not huge, you can read the whole file with:

content = open(filename).read()

Then you can split the content into blocks using:

blocks = content.split('\n\n')

Now you can create a function to parse a block of text. I would use split('\n') to get the lines of a block and split(':') to get the key and value, possibly with str.strip() or some help from regular expressions.

Without checking whether each block has the required data, the code can look like this:

with open('data.txt', 'r') as f:
    content = f.read()
# strip() drops a trailing newline so the last block has no empty line
for block in content.strip().split('\n\n'):
    person = {}
    for l in block.split('\n'):
        k, v = l.split(': ')
        person[k] = v
    print('%s %s %s' % (person['FamilyN'], person['Name'], person['Age']))
Answered By: Michał Niklas

Answer #7:

This answer isn’t necessarily better than what’s already been posted, but as an illustration of how I approach problems like this it might be useful, especially if you’re not used to working with Python’s interactive interpreter.

I’ve started out knowing two things about this problem. First, I’m going to use itertools.groupby to group the input into lists of data lines, one list for each individual data record. Second, I want to represent those records as dictionaries so that I can easily format the output.

One other thing that this shows is how using generators makes breaking a problem like this down into small parts easy.

>>> # first let's create some useful test data and put it into something 
>>> # we can easily iterate over:
>>> data = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25"""
>>> data = data.split("\n")
>>> # now we need a key function for itertools.groupby.
>>> # the key we'll be grouping by is, essentially, whether or not
>>> # the line is empty.
>>> # this will make groupby return groups whose key is True if we
>>> # care about them.
>>> def is_data(line):
        return True if line.strip() else False

>>> # make sure this really works
>>> "\n".join([line for line in data if is_data(line)])
'ID: 1\nName: X\nFamilyN: Y\nAge: 20\nID: 2\nName: H\nFamilyN: F\nAge: 23\nID: 3\nName: S\nFamilyN: Y\nAge: 13\nID: 4\nName: M\nFamilyN: Z\nAge: 25'

>>> # does groupby return what we expect?
>>> import itertools
>>> [list(value) for (key, value) in itertools.groupby(data, is_data) if key]
[['ID: 1', 'Name: X', 'FamilyN: Y', 'Age: 20'], ['ID: 2', 'Name: H', 'FamilyN: F', 'Age: 23'], ['ID: 3', 'Name: S', 'FamilyN: Y', 'Age: 13'], ['ID: 4', 'Name: M', 'FamilyN: Z', 'Age: 25']]
>>> # what we really want is for each item in the group to be a tuple
>>> # that's a key/value pair, so that we can easily create a dictionary
>>> # from each item.
>>> def make_key_value_pair(item):
        items = item.split(":")
        return (items[0].strip(), items[1].strip())

>>> make_key_value_pair("a: b")
('a', 'b')
>>> # let's test this:
>>> dict(make_key_value_pair(item) for item in ["a:1", "b:2", "c:3"])
{'a': '1', 'c': '3', 'b': '2'}
>>> # we could conceivably do all this in one line of code, but this 
>>> # will be much more readable as a function:
>>> def get_data_as_dicts(data):
        for (key, value) in itertools.groupby(data, is_data):
            if key:
                yield dict(make_key_value_pair(item) for item in value)

>>> list(get_data_as_dicts(data))
[{'FamilyN': 'Y', 'Age': '20', 'ID': '1', 'Name': 'X'}, {'FamilyN': 'F', 'Age': '23', 'ID': '2', 'Name': 'H'}, {'FamilyN': 'Y', 'Age': '13', 'ID': '3', 'Name': 'S'}, {'FamilyN': 'Z', 'Age': '25', 'ID': '4', 'Name': 'M'}]
>>> # now for an old trick:  using a list of column names to drive the output.
>>> columns = ["Name", "FamilyN", "Age"]
>>> print("\n".join(" ".join(d[c] for c in columns) for d in get_data_as_dicts(data)))
X Y 20
H F 23
S Y 13
M Z 25
>>> # okay, let's package this all into one function that takes a filename
>>> def get_formatted_data(filename):
        with open(filename, "r") as f:
            columns = ["Name", "FamilyN", "Age"]
            for d in get_data_as_dicts(f):
                yield " ".join(d[c] for c in columns)

>>> print("\n".join(get_formatted_data("c:\\temp\\test_data.txt")))
X Y 20
H F 23
S Y 13
M Z 25
Answered By: Robert Rossney

Answer #8:

Use a dict, namedtuple, or custom class to store each attribute as you come across it, then append the object to a list when you reach a blank line or EOF.
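
One way this could look, as a sketch rather than a definitive implementation: the Person namedtuple, the read_people function and the sample lines are all names invented for this example.

```python
from collections import namedtuple

# Hypothetical record type; the field names mirror the keys in the file.
Person = namedtuple("Person", ["ID", "Name", "FamilyN", "Age"])

def read_people(lines):
    people = []
    fields = {}
    for line in lines:
        line = line.strip()
        if line:
            # Store each attribute as it is encountered.
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
        elif fields:
            # A blank line ends the current record.
            people.append(Person(**fields))
            fields = {}
    if fields:
        # End of input: flush the last record, which has no trailing blank line.
        people.append(Person(**fields))
    return people

# Made-up input: two records separated by an empty line.
sample = ["ID: 1", "Name: X", "FamilyN: Y", "Age: 20", "",
          "ID: 2", "Name: H", "FamilyN: F", "Age: 23"]
people = read_people(sample)
for p in people:
    print(p.FamilyN, p.Name, p.Age)
```

The same function accepts an open file object in place of the list, since it only iterates over lines.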
