Regex matching 5-digit substrings not enclosed with digits

Question :

I want to extract a run of exactly 5 consecutive digits from a string.

This is the code I have written:

re.findall(r"((\D|^)*)\d\d\d\d\d((\D|$)*)", s)

but it does not work on the string

"Helpdesk-Agenten (m/w) Kennziffer: 12966"

The expected result is:

12966

Example 2:

#input
"Helpdesk-Agenten (m/w) Kennziffer: 12966abc"
# expected
12966

Example 3:

#input
"Helpdesk-Agenten (m/w) Kennziffer: 12966345"
# expected
"" (because the run of consecutive digits is longer than 5)

Answer #1:

Your current regex, ((\D|^)*)\d\d\d\d\d((\D|$)*), used with re.findall won't return the digit chunks because they are not captured: when a pattern contains capturing groups, re.findall returns the groups rather than the whole match. Moreover, the (\D|^)* and
(\D|$)* parts are optional, so they do not enforce anything: the regex will still find 5-digit chunks inside longer digit chunks.
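
Both problems can be observed directly. The following is a minimal sketch that reuses the inputs from the question; the variable names are only illustrative.

```python
import re

# The pattern from the question, with its backslashes in place.
pattern = r"((\D|^)*)\d\d\d\d\d((\D|$)*)"

s1 = "Helpdesk-Agenten (m/w) Kennziffer: 12966"
s3 = "Helpdesk-Agenten (m/w) Kennziffer: 12966345"

# With capturing groups, findall returns one tuple of groups per match,
# and none of the four groups wraps the five digits.
groups = re.findall(pattern, s1)
print(groups)

# The optional guards do not prevent a match inside the 8-digit run.
inside_longer_run = re.search(pattern, s3)
print(inside_longer_run is not None)  # True
```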

If you need to find a 5-digit chunk not enclosed by other digits, use

re.findall(r"(?<!\d)\d{5}(?!\d)", s)

Details:

  • (?<!\d) – no digit is allowed immediately before the current location
  • \d{5} – 5 digits
  • (?!\d) – no digit is allowed immediately after the current location.
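
As a quick check, the lookaround pattern can be run against the three inputs from the question. This is only a sketch; the name FIVE_DIGITS is made up for illustration.

```python
import re

# Hypothetical constant name, not part of the original answer.
FIVE_DIGITS = re.compile(r"(?<!\d)\d{5}(?!\d)")

samples = [
    "Helpdesk-Agenten (m/w) Kennziffer: 12966",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966abc",
    "Helpdesk-Agenten (m/w) Kennziffer: 12966345",
]

# One list of matches per input; the 8-digit run yields no match.
results = [FIVE_DIGITS.findall(s) for s in samples]
print(results)  # [['12966'], ['12966'], []]
```

Because lookarounds do not consume characters, the pattern has no capturing groups and re.findall returns the digit runs themselves.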
Answered By: Wiktor Stribiżew

Answer #2:

Use a word boundary (\b), which matches at the beginning or end of a word:

>>> re.findall(r"\b\d\d\d\d\d\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']

\d\d\d\d\d can be replaced with \d{5}:

>>> re.findall(r"\b\d{5}\b", "Helpdesk-Agenten (m/w) Kennziffer: 12966")
['12966']
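
One caveat with \b: letters, digits and the underscore all count as word characters, so there is no boundary between a digit and a letter that directly follows it. A small sketch, using the second example input from the question, shows the difference from a digit-only lookaround:

```python
import re

text = "Helpdesk-Agenten (m/w) Kennziffer: 12966abc"

# No word boundary exists between "6" and "a", so \b finds nothing here.
with_boundaries = re.findall(r"\b\d{5}\b", text)

# The lookarounds only forbid neighbouring digits, so letters are fine.
with_lookarounds = re.findall(r"(?<!\d)\d{5}(?!\d)", text)

print(with_boundaries)   # []
print(with_lookarounds)  # ['12966']
```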

UPDATE: If you need to get 12966 out of 12966abc, see Wiktor Stribiżew's answer, which uses negative lookaround assertions.

or

>>> [match.group(2) for match in re.finditer(r'(\D|^)(\d{5})(\D|$)', '12345abc')]
['12345']

or combine a simple regular expression with a list comprehension:

>>> [match for match in re.findall(r'\d+', '12345abc') if len(match) == 5]
['12345']
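
The two alternatives above can behave differently when two 5-digit runs are separated by a single character, because the (\D|$) group consumes the separator that the next match would need for its (\D|^) group. A minimal sketch with a made-up input:

```python
import re

text = "12345 67890"  # hypothetical input, not from the question

# The space is consumed by the first match, so the second run is skipped.
consuming = [m.group(2) for m in re.finditer(r"(\D|^)(\d{5})(\D|$)", text)]

# Collecting whole digit runs and filtering by length finds both.
by_length = [run for run in re.findall(r"\d+", text) if len(run) == 5]

print(consuming)  # ['12345']
print(by_length)  # ['12345', '67890']
```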
Answered By: falsetru
