How to extract the substring between two markers?


Solving problems is about exposing yourself to as many situations as possible, such as extracting the substring between two markers, and practicing these strategies over and over. With time, this becomes second nature and a natural way to approach any problem. Big or small, always start with a plan and use the other strategies mentioned here until you are confident and ready to code the solution.
In this post, I give an overview of how to extract the substring between two markers, which you can refer back to at any time.


Let’s say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters directly before and after the part I am interested in: AAA comes right before 1234, and ZZZ comes right after it.

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

How to do the same thing in Python?

Asked By: miernik


Answer #1:

Using regular expressions (see the re module documentation for further reference):

import re
text = 'gfgfdAAA1234ZZZuijjk'
m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)
# found: 1234

or:

import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = '' # apply your error handling
# found: 1234
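The +? in the patterns above is a non-greedy quantifier, which matters when the closing marker appears more than once. A small illustration, using a made-up input string (not from the question) that contains two ZZZ markers:

```python
import re

# Hypothetical input with two occurrences of the closing marker.
text = 'xxAAA12ZZZ34ZZZyy'

# Non-greedy: the group stops at the first ZZZ after AAA.
lazy = re.search('AAA(.+?)ZZZ', text).group(1)

# Greedy: the group extends to the last ZZZ in the string.
greedy = re.search('AAA(.+)ZZZ', text).group(1)

print(lazy)    # 12
print(greedy)  # 12ZZZ34
```

With only one ZZZ in the string, as in the question, both patterns give the same result.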
Answered By: eumiro

Answer #2:

>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> start = s.find('AAA') + 3
>>> end = s.find('ZZZ', start)
>>> s[start:end]
'1234'

You can also use regular expressions with the re module if you want, but that’s not necessary in your case.
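One caveat with str.find: it returns -1 when the substring is absent, so s.find('AAA') + 3 silently evaluates to 2 if AAA is missing, and slicing then returns unrelated text. A minimal sketch of a guarded version; the helper name between is hypothetical, and it uses len() instead of the hard-coded 3:

```python
def between(s, start_marker, end_marker):
    """Hypothetical helper: return the text between the first start_marker
    and the next end_marker, or None if either marker is missing."""
    i = s.find(start_marker)
    if i == -1:  # start marker not found
        return None
    start = i + len(start_marker)
    end = s.find(end_marker, start)  # search only after the start marker
    if end == -1:  # end marker not found
        return None
    return s[start:end]

print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between('gfgfd1234ZZZuijjk', 'AAA', 'ZZZ'))     # None
```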

Answered By: Lennart Regebro

Answer #3:

regular expression

import re
re.search(r"(?<=AAA).*?(?=ZZZ)", your_text).group(0)

As written, the above will fail with an AttributeError if “AAA” or “ZZZ” is missing from your_text, because re.search returns None when there is no match.
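If the markers come from variables, or contain characters that are special in regular expressions (such as $, ( or .), they should be escaped with re.escape. A sketch, assuming a hypothetical helper extract_between that uses a capturing group rather than the lookarounds above and returns None instead of raising:

```python
import re

def extract_between(text, left, right):
    # Hypothetical helper. re.escape makes markers such as '$(' match literally.
    pattern = re.escape(left) + '(.*?)' + re.escape(right)
    m = re.search(pattern, text)
    # re.search returns None on no match, so check before calling .group().
    return m.group(1) if m else None

print(extract_between('price: $(42).', '$(', ')'))             # 42
print(extract_between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(extract_between('no markers here', 'AAA', 'ZZZ'))       # None
```

Note that . does not match a newline by default, so content spanning several lines would need the re.DOTALL flag.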

string methods

your_text.partition("AAA")[2].partition("ZZZ")[0]

The above returns an empty string if “AAA” doesn’t exist in your_text; if “AAA” exists but “ZZZ” doesn’t, it returns everything after “AAA”.
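Because str.partition returns a (head, separator, tail) tuple whose middle element is an empty string when the separator is not found, you can check that element to tell a missing marker apart from empty content between the markers. A minimal sketch; the name between_partition is hypothetical:

```python
def between_partition(text, left, right):
    # Hypothetical helper. partition() returns (head, sep, tail);
    # sep is '' when the separator does not occur in the string.
    _, found_left, rest = text.partition(left)
    if not found_left:
        return None
    inner, found_right, _ = rest.partition(right)
    if not found_right:
        return None
    return inner

print(between_partition('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between_partition('gfgfdAAA1234', 'AAA', 'ZZZ'))          # None
print(repr(between_partition('AAAZZZ', 'AAA', 'ZZZ')))          # ''
```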

PS Python Challenge?

Answered By: tzot

Answer #4:

I’m surprised that nobody has mentioned this; it’s my quick version for one-off scripts:

>>> x = 'gfgfdAAA1234ZZZuijjk'
>>> x.split('AAA')[1].split('ZZZ')[0]
'1234'
Answered By: Uncle Long Hair

Answer #5:

import re
print(re.search('AAA(.*?)ZZZ', 'gfgfdAAA1234ZZZuijjk').group(1))
Answered By: infrared

Answer #6:

You can do it with just one line of code:

>>> import re
>>> re.findall(r'\d{1,5}', 'gfgfdAAA1234ZZZuijjk')
['1234']

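The result is a list. Note that this pattern ignores the AAA and ZZZ markers and simply matches runs of 1 to 5 digits, so it only works when the wanted part is the only run of digits in the string and is at most five digits long.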
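To anchor the search on the AAA and ZZZ markers and collect every marked section, re.findall with one capturing group returns just the group’s text for each match. A short example on a made-up input with two marked sections:

```python
import re

# Hypothetical input containing two AAA...ZZZ sections.
text = 'gfgfdAAA1234ZZZuijjkAAA5678ZZZ'

# With exactly one capturing group, findall returns a list of the group contents.
parts = re.findall(r'AAA(.*?)ZZZ', text)
print(parts)  # ['1234', '5678']
```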

Answered By: Mahesh Gupta

Answer #7:

You can use the re module for that:

>>> import re
>>> re.compile(".*AAA(.*)ZZZ.*").match("gfgfdAAA1234ZZZuijjk").groups()
('1234',)
Answered By: andreypopp

Answer #8:

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

You can do the same with the re.sub function using the same regex:

>>> import re
>>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
'1234'

In basic sed, capturing groups are written as \(..\), but in Python they are written as (..).
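One thing to watch with the re.sub approach: when the markers are missing, nothing is substituted and the whole input string comes back unchanged. re.subn returns a (new_string, count) tuple, so the count can be used to detect that case. A small sketch:

```python
import re

pattern = r'.*AAA(.*)ZZZ.*'

# re.subn returns (new_string, number_of_substitutions_made).
result, n = re.subn(pattern, r'\1', 'gfgfdAAA1234ZZZuijjk')
print(result, n)  # 1234 1

# No markers: no substitution happens and the input is returned as-is.
result, n = re.subn(pattern, r'\1', 'no markers here')
print(result, n)  # no markers here 0
```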

Answered By: Avinash Raj
