# How can I find the number of overlapping sequences in a String with Python? [duplicate]

Posted on

### Question :

How can I find the number of overlapping sequences in a String with Python? [duplicate]

I have a long sequence, and I would like to know how often some sub-sequences occur in this sequence.

I know string.count(s, sub), but it only counts non-overlapping sequences.

Does a similar function which also counts overlapping sequences exist?

As an alternative to writing your own search function, you could use the `re` module:

``````In : import re

In : haystack = 'abababa baba alibababa'

In : needle = 'baba'

In : matches = re.finditer(r'(?=(%s))' % re.escape(needle), haystack)

In : print [m.start(1) for m in matches]
[1, 3, 8, 16, 18]
``````

The above prints out the starting positions of all (potentially overlapping) matches.

If all you need is the count, the following should do the trick:

``````In : len(re.findall(r'(?=(%s))' % re.escape(needle), haystack))
Out: 5
``````

A simple to understand way to do it is:

``````def count(sub, string):
count = 0
for i in xrange(len(string)):
if string[i:].startswith(sub):
count += 1
return count

count('baba', 'abababa baba alibababa')
#output: 5
``````

If you like short snippets, you can make it less readable but smarter:

``````def count(subs, s):
return sum((s[i:].startswith(subs) for i in xrange(len(s))))
``````

This uses the fact that Python can treat boolean like integers.

``````matches =[]
st = 'abababa baba alibababa'
needle = 'baba'
for i in xrange(len(st)-len(needle)+1):
i = st.find(needle,i,i+len(needle))
if(i >= 0):
matches.append(st.find(needle,i,i+len(needle)))
print(str(matches))
``````

Did not benchmark it for long strings, see if its efficient enough for your use.

I learnt today that you could use a running index to fetch the next occurrence of your substring:

``````string = 'bobobobobobobob' # long string or variable here
count = 0
start = 0
while True:
index = string.find('bob', start)
if index >= 0:
count += 1
start += 1
else:
break
print(count)
``````

Returns `7`