Question:
I want to extract exactly 5 consecutive digits from a string.
The code I have written:
re.findall(r"((\D|^)*)\d\d\d\d\d((\D|$)*)", s)
but it fails on the string
"Helpdesk-Agenten (m/w) Kennziffer: 12966"
The expected result is:
12966
Example 2:
# input
"Helpdesk-Agenten (m/w) Kennziffer: 12966abc"
# expected
12966
Example 3:
# input
"Helpdesk-Agenten (m/w) Kennziffer: 12966345"
# expected
"" (because the run of consecutive digits is longer than 5)
Answer #1:
Your current regex, ((\D|^)*)\d\d\d\d\d((\D|$)*), used with re.findall won't return the digit chunks because they are not captured: when a pattern contains groups, re.findall returns the group values instead of the whole match. Moreover, the (\D|^)* and (\D|$)* parts can match zero times, so they do not do what they are supposed to do, and the regex will find 5-digit chunks inside longer digit chunks.
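To make both problems concrete, here is a small sketch that runs the question's pattern (with its backslashes restored) against two of the sample strings. It only inspects the shape of the results rather than printing exact group values, since those depend on how the optional groups happen to match.

```python
import re

# The regex from the question, with its backslashes restored.
original = r"((\D|^)*)\d\d\d\d\d((\D|$)*)"

# Problem 1: the five digits are not inside any group, so findall
# returns one 4-tuple of group values per match, not the digits.
result = re.findall(original, "Helpdesk-Agenten (m/w) Kennziffer: 12966")
print(result)

# Problem 2: (\D|^)* and (\D|$)* may both match zero times, so nothing
# stops \d{5} from matching the first five digits of a longer run.
too_long = re.search(original, "Helpdesk-Agenten (m/w) Kennziffer: 12966345")
print(too_long is not None)  # True: the 8-digit run is not rejected
```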
If you must find 5-digit chunks not enclosed by other digits, use
re.findall(r"(?<!\d)\d{5}(?!\d)", s)
Details:
(?<!\d) - no digit is allowed before the current location
\d{5} - 5 digits
(?!\d) - no digit is allowed after the current location.
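A quick check of the lookaround pattern against the three examples from the question (precompiling it with re.compile is just a convenience here, not a requirement):

```python
import re

# Five digits with no digit immediately before or after them.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

samples = [
    "Helpdesk-Agenten (m/w) Kennziffer: 12966",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966abc",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966345",
]
results = [FIVE_DIGITS.findall(s) for s in samples]
print(results)  # [['12966'], ['12966'], []]
```

The 8-digit run yields an empty list because every 5-digit window inside it has a digit on at least one side.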
Answer #2:
Using a word boundary (\b), which matches at the beginning / end of a word:
>>> re.findall(r"\b\d\d\d\d\d\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']
\d\d\d\d\d can be replaced with \d{5}:
>>> re.findall(r"\b\d{5}\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']
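Running the word-boundary version over all three examples shows where it differs from the lookaround approach: \b treats letters (and underscores) as word characters, so there is no boundary between "6" and "a" in "12966abc".

```python
import re

samples = [
    "Helpdesk-Agenten (m/w) Kennziffer: 12966",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966abc",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966345",
]
word_boundary = [re.findall(r"\b\d{5}\b", s) for s in samples]
print(word_boundary)  # [['12966'], [], []]
```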
UPDATE: If you need to get 12966 out of 12966abc, see Wiktor Stribiżew's answer, which uses negative lookaround assertions.
or
>>> [match.group(2) for match in re.finditer(r'(\D|^)(\d{5})(\D|$)', '12345abc')]
['12345']
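One caveat with the group-based finditer pattern: (\D|^) and (\D|$) consume the neighbouring characters, so two 5-digit chunks separated by a single non-digit cannot both be found. The input string below is made up purely to illustrate this; the lookaround pattern from the first answer checks the neighbours without consuming them.

```python
import re

s = "12345a67890"  # hypothetical input: two 5-digit chunks, one separator

# The separator "a" is consumed by the first match, so the second chunk
# has no (\D|^) left in front of it and is skipped.
consuming = [m.group(2) for m in re.finditer(r"(\D|^)(\d{5})(\D|$)", s)]

# Lookarounds only inspect the neighbours, so both chunks are found.
lookaround = re.findall(r"(?<!\d)\d{5}(?!\d)", s)

print(consuming)   # ['12345']
print(lookaround)  # ['12345', '67890']
```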
or combine a simple regular expression with a list comprehension:
>>> [match for match in re.findall(r'\d+', '12345abc') if len(match) == 5]
['12345']
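The question asks for a single string, with "" when there is no valid chunk. A minimal sketch of two helpers that return exactly that, one using the lookaround pattern and one using the \d+ list-comprehension idea; both function names are hypothetical and they return only the first qualifying chunk.

```python
import re

_FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

def extract_five_digits(s: str) -> str:
    """Return the first run of exactly five digits in s, or "" if none."""
    match = _FIVE_DIGITS.search(s)
    return match.group() if match else ""

def extract_five_digits_no_lookaround(s: str) -> str:
    """Same result, built from maximal digit runs and a length check."""
    return next((run for run in re.findall(r"\d+", s) if len(run) == 5), "")

for s in ["Helpdesk-Agenten (m/w) Kennziffer: 12966",
          "Helpdesk-Agenten (m/w) Kennziffer: 12966abc",
          "Helpdesk-Agenten (m/w) Kennziffer: 12966345"]:
    print(repr(extract_five_digits(s)))  # '12966', '12966', ''
```

Because \d+ always grabs a whole digit run, the second helper never looks inside a longer run, which is why it agrees with the lookaround version.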