# String count with overlapping occurrences

Posted on

Solving problem is about exposing yourself to as many situations as possible like String count with overlapping occurrences and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about String count with overlapping occurrences, which can be followed any time. Take easy to follow this discuss.

String count with overlapping occurrences

What’s the best way to count the number of occurrences of a given string, including overlap in Python? This is one way:

``````def function(string, str_to_search_for):
count = 0
for x in xrange(len(string) - len(str_to_search_for) + 1):
if string[x:x+len(str_to_search_for)] == str_to_search_for:
count += 1
return count
function('1011101111','11')
``````

This method returns 5.

Is there a better way in Python?

Well, this might be faster since it does the comparing in C:

``````def occurrences(string, sub):
count = start = 0
while True:
start = string.find(sub, start) + 1
if start > 0:
count+=1
else:
return count
``````

``````>>> import re
>>> text = '1011101111'
>>> len(re.findall('(?=11)', text))
5
``````

If you didn’t want to load the whole list of matches into memory, which would never be a problem! you could do this if you really wanted:

``````>>> sum(1 for _ in re.finditer('(?=11)', text))
5
``````

As a function (`re.escape` makes sure the substring doesn’t interfere with the regex):

``````>>> def occurrences(text, sub):
return len(re.findall('(?={0})'.format(re.escape(sub)), text))
>>> occurrences(text, '11')
5
``````

You can also try using the new Python regex module, which supports overlapping matches.

``````import regex as re
def count_overlapping(text, search_for):
return len(re.findall(search_for, text, overlapped=True))
count_overlapping('1011101111','11')  # 5
``````

Python’s `str.count` counts non-overlapping substrings:

``````In [3]: "ababa".count("aba")
Out[3]: 1
``````

Here are a few ways to count overlapping sequences, I’m sure there are many more ðŸ™‚

How to find overlapping matches with a regexp?

``````In [10]: re.findall("a(?=ba)", "ababa")
Out[10]: ['a', 'a']
``````

### Generate all substrings

``````In [11]: data = "ababa"
In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i))
Out[17]: 2
``````

``````s = "bobobob"
sub = "bob"
ln = len(sub)
print(sum(sub == s[i:i+ln] for i in xrange(len(s)-(ln-1))))
``````

# How to find a pattern in another string with overlapping

This function (another solution!) receive a pattern and a text. Returns a list with all the substring located in the and their positions.

``````def occurrences(pattern, text):
"""
input: search a pattern (regular expression) in a text
returns: a list of substrings and their positions
"""
p = re.compile('(?=({0}))'.format(pattern))
matches = re.finditer(p, text)
return [(match.group(1), match.start()) for match in matches]
print (occurrences('ana', 'banana'))
print (occurrences('.ana', 'Banana-fana fo-fana'))
``````

[(‘ana’, 1), (‘ana’, 3)]
[(‘Bana’, 0), (‘nana’, 2), (‘fana’, 7), (‘fana’, 15)]

``````def count_substring(string, sub_string):
count = 0
for pos in range(len(string)):
if string[pos:].startswith(sub_string):
count += 1
return count
``````

This could be the easiest way.

``````s = 'azcbobobegghaklbob'