Finding multiple occurrences of a string within a string in Python

Posted on

Question :

Finding multiple occurrences of a string within a string in Python

How do I find multiple occurrences of a string within a string in Python? Consider this:

>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>> 

So the first occurrence of ll is at 1 as expected. How do I find the next occurrence of it?

Same question is valid for a list. Consider:

>>> x = ['ll', 'ok', 'll']

How do I find all the ll with their indexes?

Answer #1:

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
         print('ll found', m.start(), m.end())

ll found 1 3
ll found 10 12
ll found 16 18

Alternatively, if you don’t want the overhead of regular expressions, you can also repeatedly use str.find to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
        index = text.find('ll', index)
        if index == -1:
            break
        print('ll found at', index)
        index += 2 # +2 because len('ll') == 2

ll found at  1
ll found at  10
ll found at  16

This also works for lists and other sequences.

Answered By: poke

Answer #2:

I think what you are looking for is string.count

"Allowed Hello Hollow".count('ll')
>>> 3

Hope this helps
NOTE: this only captures non-overlapping occurences

Answered By: inspectorG4dget

Answer #3:

For the list example, use a comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print [n for (n, e) in enumerate(l) if e == 'll']
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print [n for n in xrange(len(text)) if text.find('ll', n) == n]
[1, 10, 16]

this will list adjacent runs of “ll’, which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print [n for n in xrange(len(text)) if text.find('ll', n) == n]
[1, 2, 11, 17, 18]
Answered By: bstpierre

Answer #4:

FWIW, here are a couple of non-RE alternatives that I think are neater than poke’s solution.

The first uses str.index and checks for ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second tests uses str.find and checks for the sentinel of -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    return iter(next_index(len(sub)).next, -1)

To apply any of these functions to a list, tuple or other iterable of strings, you can use a higher-level function —one that takes a function as one of its arguments— like this one:

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)
Answered By: intuited

Answer #5:

For your list example:

In [1]: x = ['ll','ok','ll']

In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print idx, value       
0 ll
2 ll

If you wanted all the items in a list that contained ‘ll’, you could also do that.

In [3]: x = ['Allowed','Hello','World','Hollow']

In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print idx, value
   ...:         
   ...:         
0 Allowed
1 Hello
3 Hollow
Answered By: chauncey

Answer #6:

>>> for n,c in enumerate(text):
...   try:
...     if c+text[n+1] == "ll": print n
...   except: pass
...
1
10
16
Answered By: ghostdog74

Answer #7:

Brand new to programming in general and working through an online tutorial. I was asked to do this as well, but only using the methods I had learned so far (basically strings and loops). Not sure if this adds any value here, and I know this isn’t how you would do it, but I got it to work with this:

needle = input()
haystack = input()
counter = 0
n=-1
for i in range (n+1,len(haystack)+1):
   for j in range(n+1,len(haystack)+1):
      n=-1
      if needle != haystack[i:j]:
         n = n+1
         continue
      if needle == haystack[i:j]:
         counter = counter + 1
print (counter)
Answered By: Aaron Semeniuk

Answer #8:

This version should be linear in length of the string, and should be fine as long as the sequences aren’t too repetitive (in which case you can replace the recursion with a while loop).

def find_all(st, substr, start_pos=0, accum=[]):
    ix = st.find(substr, start_pos)
    if ix == -1:
        return accum
    return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])

bstpierre’s list comprehension is a good solution for short sequences, but looks to have quadratic complexity and never finished on a long text I was using.

findall_lc = lambda txt, substr: [n for n in xrange(len(txt))
                                   if txt.find(substr, n) == n]

For a random string of non-trivial length, the two functions give the same result:

import random, string; random.seed(0)
s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])

>>> find_all(s, 'th') == findall_lc(s, 'th')
True
>>> findall_lc(s, 'th')[:4]
[564, 818, 1872, 2470]

But the quadratic version is about 300 times slower

%timeit find_all(s, 'th')
1000 loops, best of 3: 282 µs per loop

%timeit findall_lc(s, 'th')    
10 loops, best of 3: 92.3 ms per loop
Answered By: beardc

Leave a Reply

Your email address will not be published. Required fields are marked *