smartest way to join two lists into a formatted string

Question :

Let's say I have two lists of the same length:

a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']

and I want to produce the following string:

c = 'a1=b1, a2=b2, a3=b3'

What is the best way to achieve this?

I have the following implementations:

import timeit

a = [str(f) for f in range(500)]
b = [str(f) for f in range(500)]

def func1():
    return ', '.join([aa+'='+bb for aa in a for bb in b if a.index(aa) == b.index(bb)])

def func2():
    parts = []  # avoid shadowing the built-in name `list`
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)

t = timeit.Timer(setup='from __main__ import func1', stmt='func1()')
print 'func1 =', t.timeit(10)  # 'func1 = ' + float would raise TypeError

t = timeit.Timer(setup='from __main__ import func2', stmt='func2()')
print 'func2 =', t.timeit(10)

and the output is:

func1 = 32.4704790115
func2 = 0.00529003143311

Is there an approach with a better trade-off?

Asked By: Jib


Answer #1:

a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']

pat = '%s=%%s, %s=%%s, %s=%%s'

print pat % tuple(a) % tuple(b)

gives a1=b1, a2=b2, a3=b3
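The pattern above is hard-coded for three pairs. A minimal Python 3 sketch that builds the same two-stage template for any length follows; the helper name `two_stage_join` is hypothetical. The first `%` fills in the values of `a` and turns each `%%s` into `%s`; the second `%` fills in the values of `b`.

```python
def two_stage_join(a, b):
    """Build 'a1=b1, a2=b2, ...' with two rounds of %-formatting (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # Joining the pieces with ', ' avoids a trailing separator.
    pat = ', '.join(['%s=%%s'] * len(a))
    # Round 1 inserts a's values and turns every '%%s' into '%s'; round 2 inserts b's values.
    return pat % tuple(a) % tuple(b)


print(two_stage_join(['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']))  # a1=b1, a2=b2, a3=b3
```

One caveat: after the first round, the values of `a` become part of the template, so a value containing `%` (for example `'50%'`) makes the second round fail or produce the wrong string. The `zip`-based versions in the other answers do not have that problem.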



from timeit import Timer
from itertools import izip

n = 300

a = [str(f) for f in range(n)]
b = [str(f) for f in range(n)]

def func1():
    return ', '.join([aa+'='+bb for aa in a for bb in b if a.index(aa) == b.index(bb)])

def func2():
    parts = []
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)

def func3():
    return ', '.join('%s=%s' % t for t in zip(a, b))

def func4():
    return ', '.join('%s=%s' % t for t in izip(a, b))

def func5():
    # join the pieces so the pattern has no trailing ', '
    pat = ', '.join(['%s=%%s'] * n)
    return pat % tuple(a) % tuple(b)

d = dict(zip((1,2,3,4,5),('heavy','append','zip','izip','% formatting')))
for i in xrange(1,6):
    t = Timer(setup='from __main__ import func%d'%i, stmt='func%d()'%i)
    print 'func%d = %s  %s' % (i,t.timeit(10),d[i])


func1 = 16.2272833558  heavy
func2 = 0.00410247671143  append
func3 = 0.00349569568199  zip
func4 = 0.00301686387516  izip
func5 = 0.00157338432678  % formatting
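The harness above is Python 2 (`print` statement, `izip`, `xrange`). A rough Python 3 equivalent is sketched below, assuming Python 3.6+ for f-strings. In Python 3 `zip` is already lazy, so the separate `izip` variant disappears, and the cubic `func1` is left out because it would dominate the run. The function names are hypothetical, and no timings are quoted because they depend on the machine.

```python
from timeit import timeit

n = 300
a = [str(f) for f in range(n)]
b = [str(f) for f in range(n)]


def join_append():
    parts = []
    for i in range(len(a)):
        parts.append('%s=%s' % (a[i], b[i]))
    return ', '.join(parts)


def join_zip():
    return ', '.join('%s=%s' % t for t in zip(a, b))


def join_format():
    return ', '.join('{}={}'.format(x, y) for x, y in zip(a, b))


def join_two_stage():
    return ', '.join(['%s=%%s'] * n) % tuple(a) % tuple(b)


candidates = {
    'append': join_append,
    'zip': join_zip,
    'format': join_format,
    'two-stage %': join_two_stage,
}

# Make sure every variant produces the same string before comparing timings.
expected = join_append()
assert all(f() == expected for f in candidates.values())

for name, f in candidates.items():
    # In Python 3, timeit accepts a zero-argument callable, so no setup string is needed.
    print(f'{name:12s} {timeit(f, number=1000):.4f} s')
```

Checking that the variants agree first is cheap insurance against timing functions that do different things.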

Answered By: eyquem

Answer #2:

This implementation is, on my system, faster than either of your two functions and still more compact.

c = ', '.join('%s=%s' % t for t in zip(a, b))

Thanks to @JBernardo for the suggested improvement.

In more recent syntax, str.format is more appropriate:

c = ', '.join('{}={}'.format(*t) for t in zip(a, b))

This produces the same output. Like the %s version, it converts each element to text, so two lists of integers would also work here.
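A quick check of that point, as a Python 3 sketch with made-up integer lists. The f-string line (Python 3.6+) is an additional alternative that the answer does not mention.

```python
a = [1, 2, 3]
b = [10, 20, 30]

# %s calls str(); {} calls format(x, ''), which matches str(x) for ints.
with_percent = ', '.join('%s=%s' % t for t in zip(a, b))
with_format = ', '.join('{}={}'.format(*t) for t in zip(a, b))
# f-strings do the same formatting inline.
with_fstring = ', '.join(f'{x}={y}' for x, y in zip(a, b))

print(with_format)  # 1=10, 2=20, 3=30
```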

Answered By: CB Bailey

Answer #3:

Those two solutions do very different things. The first loops in a nested way, then computes indices with list.index, which scans the list linearly; this effectively makes it a triply nested loop (two explicit loops plus the scan inside list.index), requiring what you could think of as 125,000,000 operations. The second iterates in lockstep, producing 500 pairs directly instead of examining 250,000 candidate pairs. No wonder they’re so different!

Are you familiar with Big O notation for describing the complexity of algorithms? If so, the first solution is cubic and the second solution is linear. The cost of choosing the first one over the second one is going to grow at an alarming rate as a and b get longer, so no one would use an algorithm like that.
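To make the counting concrete, here is a Python 3 sketch (the helper names are hypothetical) that mirrors `func1`'s nested loops and tallies how many list elements `list.index` has to scan, assuming all values are distinct so that `a.index(a[i]) == i`. For n elements the tally is n²(n + 1), which for n = 500 is 125,250,000, in line with the estimate above; doubling n multiplies it by roughly 8. As a side observation not made in the answer, the `index`-based approach also mispairs lists with repeated values, because `list.index` always returns the first occurrence.

```python
def nested_index_join(a, b):
    """Same shape as the question's func1; also returns how many elements list.index scanned."""
    scanned = 0
    parts = []
    for aa in a:
        for bb in b:
            i, j = a.index(aa), b.index(bb)
            # list.index walks from the start up to and including the first match.
            scanned += (i + 1) + (j + 1)
            if i == j:
                parts.append(aa + '=' + bb)
    return ', '.join(parts), scanned


def zip_join(a, b):
    """Lockstep version: one pass over n pairs."""
    return ', '.join(aa + '=' + bb for aa, bb in zip(a, b))


for n in (10, 20, 40):
    a = [str(k) for k in range(n)]
    b = [str(k) for k in range(n)]
    result, scanned = nested_index_join(a, b)
    assert result == zip_join(a, b)
    print(n, scanned, n * n * (n + 1))  # last two columns match for distinct values

# With repeated values, pairing by first occurrence gives a different (wrong) answer.
dup_result, _ = nested_index_join(['x', 'x'], ['1', '2'])
print(dup_result)                        # x=1, x=1
print(zip_join(['x', 'x'], ['1', '2']))  # x=1, x=2
```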

Personally, I would almost certainly use code like

import itertools
', '.join('%s=%s' % pair for pair in itertools.izip(a, b))

or, if I weren’t too worried about the size of a and b and were just writing something quick, I would use zip instead of itertools.izip. This code has several advantages:

  • It’s linear. Although premature optimization is a huge problem, it’s best not to cavalierly use an algorithm with an unnecessarily bad asymptotic performance.

  • It’s simple and idiomatic. I see other people write code like this frequently.

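  • It’s memory efficient. By using a generator expression instead of a list comprehension (and itertools.izip rather than zip), I don’t build unnecessary intermediate lists in memory. The saving is smaller than going from O(n) (linear) to O(1) (constant) memory, though: the result string is itself O(n), and CPython’s str.join collects a generator into a sequence before joining, so the clearest win is skipping the list of pairs that zip would build.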
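In Python 3, `itertools.izip` no longer exists; the built-in `zip` returns a lazy iterator, so the advice above carries over by simply writing `zip`. A small Python 3 sketch:

```python
import itertools

a = ['a1', 'a2', 'a3']
b = ['b1', 'b2', 'b3']

pairs = zip(a, b)
# A zip object is its own iterator: pairs are produced one at a time, not stored in a list.
assert iter(pairs) is pairs
print(next(pairs))  # ('a1', 'b1')

# izip was folded into the built-in zip in Python 3.
print(hasattr(itertools, 'izip'))  # False

c = ', '.join('%s=%s' % pair for pair in zip(a, b))
print(c)  # a1=b1, a2=b2, a3=b3
```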

As for timing snippets to find the fastest solution, this would almost certainly be an example of premature optimization. To write performant programs, we use theory and experience to write high-quality, maintainable, good code. Experience shows it’s at best futile and at worst counterproductive to stop at random operations, ask the question “What is the best way to do this particular operation?”, and try to answer it by guessing or even by testing.

In reality, the programs with the best performance are the ones written with code of the highest quality and very selective optimizations. High-quality code that values readability and simplicity over microbenchmarks ends up being easier to test, less buggy, and nicer to refactor; these factors are key to optimizing your program effectively. The time you would otherwise spend fixing unnecessary bugs, understanding complicated code, and fighting with refactoring can be spent optimizing instead.

When it comes time to optimize a program (after it’s tested and probably documented), this is not done on random snippets but on ones identified by actual use cases and/or performance tests, with measurements collected by profiling. If a particular piece of code takes only 0.1% of the program’s run time, no amount of speeding up that piece is going to do any real good.
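As a sketch of “measurements collected by profiling”, Python’s standard-library `cProfile` and `pstats` modules can rank where a run actually spends its time. The `build_report` function and its workload below are hypothetical stand-ins for a real program in which the join is only one step (Python 3.6+ assumed).

```python
import cProfile
import io
import pstats


def build_report(a, b):
    """Hypothetical program step: the join is one small part of the work."""
    text = ', '.join(f'{x}={y}' for x, y in zip(a, b))
    # Stand-in for heavier processing that would dominate a real profile.
    return sorted(text.split(', '), key=len)


a = [str(k) for k in range(10_000)]
b = [str(k) for k in range(10_000)]

profiler = cProfile.Profile()
profiler.enable()
build_report(a, b)
profiler.disable()

# Print the five entries with the largest cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
print(out.getvalue())
```

The `cumtime` column shows which calls are worth attention; an entry that accounts for a tiny share of the total is, as the answer says, not worth micro-tuning.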

Answered By: Mike Graham

Answer #4:

>>> ', '.join(i + '=' + j for i,j in zip(a,b))
'a1=b1, a2=b2, a3=b3'
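Note that `+` only concatenates strings, so this one-liner assumes `a` and `b` hold `str` values. A small Python 3 sketch, with a made-up integer list, showing the failure and an explicit `str()` conversion:

```python
a = [1, 2, 3]
b = ['b1', 'b2', 'b3']

try:
    ', '.join(i + '=' + j for i, j in zip(a, b))
except TypeError as exc:
    # int + str is not defined, so the generator fails on the first pair.
    print('TypeError:', exc)

# Converting explicitly makes the one-liner work for any element type.
c = ', '.join(str(i) + '=' + str(j) for i, j in zip(a, b))
print(c)  # 1=b1, 2=b2, 3=b3
```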

Answered By: JBernardo
