iterating over a range of rows using ws.iter_rows in the optimised reader of openpyxl

Posted on

Question :

iterating over a range of rows using ws.iter_rows in the optimised reader of openpyxl

I need to read an xlsx file of 10×5324 cells

This is the gist of what i was trying to do:

from openpyxl import load_workbook
filename = 'file_path'

wb = load_workbook(filename)
ws = wb.get_sheet_by_name('LOG')

col = {'Time':0 ...}

for i in ws.columns[col['Time']][1:]:
    print i.value.hour

The code was taking much too long to run then it should (I was performing operations, not printing) and after a while I got impatient and cancelled it.

Any idea how I can work it in the optimized reader?
I need to iterate over a range of rows, not over all rows.
This is what i tried, but it’s wrong:

wb = load_workbook(filename, use_iterators = True)
ws = wb.get_sheet_by_name('LOG')
for i in ws.iter_rows[1:]:
    print i[col['Time']].value.hour

Is there any way I can do it without the range function?

I guess one way to do it would be:

for i in ws.iter_rows[1:]:
    if i.row == startrow:
    print i[col['Time']].value.hour
    if i.row == endrow:

but is there a more elegant solution?
(that doesn’t work either btw)

Answer #1:

The simplest solution with a lower bound would be something like this:

# Your code:
from openpyxl import load_workbook
filename = 'file_path'
wb = load_workbook(filename, use_iterators=True)
ws = wb.get_sheet_by_name('LOG')

# Solution 1:
for row in ws.iter_rows(row_offset=1):
    # code to execute per row...

Here another way to execute what you describe, with the enumerate function:

# Solution 2:
start, stop = 1, 100    # This will allow you to set a lower and upper limit
for index, row in enumerate(ws.iter_rows()):
    if start < index < stop:
        # code to execute per row...

The index variable keeps count of what row you are on, so it can be used in place of range or xrange. This method is pretty straightforward and works with iterators unlike range or slicing, and can be used with just the lower bound too, if desired. Cheers!

Answered By: Ali Haroon

Answer #2:

From the documentation:

Note: When a worksheet is created in memory, it contains no cells. They
are created when first accessed. This way we don’t create objects that
would never be accessed, thus reducing the memory footprint.

Warning: Because of this feature, scrolling through cells instead of accessing
them directly will create them all in memory, even if you don’t assign
them a value. Something like

>>> for i in xrange(0,100):
...             for j in xrange(0,100):
...                     ws.cell(row = i, column = j)

will create 100×100 cells in memory, for nothing.

However, there is a way to clean all those unwanted cells, we’ll see
that later.

I think accessing the columns or rows properties will cause many cells to have to be loaded into memory. I would suggest only trying to directly access the cells you need.


col_name = 'A'
start_row = 1
end_row = 99

range_expr = "{col}{start_row}:{col}{end_row}".format(
    col=col_name, start_row=start_row, end_row=end_row)

for (time_cell,) in ws.iter_rows(range_string=range_expr):
    print time_cell.value.hour
Answered By: mikeybaby173

Leave a Reply

Your email address will not be published. Required fields are marked *