# re.findall behaves weirdly

The source string is:

``````
# Python 3.4.3
s = r'abc123d, hello 3.1415926, this is my book'
``````

and here is my pattern:

``````
pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'
``````

`re.search` gives me the correct result:

``````
import re

m = re.search(pattern, s)
print(m)  # output: <_sre.SRE_Match object; span=(3, 6), match='123'>
``````

but `re.findall` just dumps out a list of empty strings:

``````
L = re.findall(pattern, s)
print(L)  # output: ['', '', '']
``````

Why can't `re.findall` give me the expected list?

``````
['123', '3.1415926']
``````

``````
import re

s = r'abc123d, hello 3.1415926, this is my book'
print(re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))
``````

You don't need to escape the backslash twice when you use a raw string literal (`r''`).

Output: `['123', '3.1415926']`
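The "escape twice" point can be checked directly. This small sketch compares a raw and a regular string literal and then asks the regex engine what each one matches; the variable names are only for illustration.

```python
import re

# In a raw string the backslash is kept as-is, so r'\.' is two characters:
# a backslash followed by a dot. That is what the regex engine needs to
# see in order to match a literal dot.
raw_single = r'\.'
plain_double = '\\.'  # the same two characters, written without r''

# Doubling the backslash inside a raw string hands the engine two
# backslashes ("a literal backslash") followed by `.` ("any character").
raw_double = r'\\.'

print(raw_single == plain_double)        # True
print(len(raw_single), len(raw_double))  # 2 3

print(re.fullmatch(raw_single, '.') is not None)    # True: a literal dot
print(re.fullmatch(raw_double, '.') is not None)    # False: no backslash
print(re.fullmatch(raw_double, '\\x') is not None)  # True: backslash, then any char
```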

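Also, the result is a list of strings. If you want integers and floats instead, convert each item, for example with `map` and `ast.literal_eval`:

``````
import re
import ast

s = r'abc123d, hello 3.1415926, this is my book'
# In Python 3, map() returns an iterator, so wrap it in list() before printing
print(list(map(ast.literal_eval, re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))))
``````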

Output: `[123, 3.1415926]`
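If you would rather not evaluate Python literals at all, a plain `int`/`float` conversion is another option. This is a minimal sketch; `to_number` is a hypothetical helper that assumes every item is a number matched by the pattern above (it returns a float whenever the text contains a decimal point).

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'

def to_number(text):
    """Hypothetical helper: int when there is no decimal point, else float."""
    return float(text) if '.' in text else int(text)

numbers = [to_number(x) for x in re.findall(pattern, s)]
print(numbers)  # [123, 3.1415926]
```

`float()` also accepts forms such as `'3.'` and `'-.5'`, which the pattern can produce.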

There are two things to note here:

• `re.findall` returns the captured texts if the regex pattern contains capturing groups
• the `r'\\.'` part in your pattern matches two consecutive chars: a literal `\` and any char other than a newline.

From the `re.findall` documentation:

> If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
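The three result shapes described in the documentation can be seen side by side on a toy string (chosen only for illustration):

```python
import re

text = 'a1 b2 c3'

# No groups: findall returns the whole matches.
no_groups = re.findall(r'[a-z]\d', text)
# One group: findall returns only what that group captured.
one_group = re.findall(r'([a-z])\d', text)
# Two groups: findall returns one tuple per match.
two_groups = re.findall(r'([a-z])(\d)', text)

print(no_groups)   # ['a1', 'b2', 'c3']
print(one_group)   # ['a', 'b', 'c']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
```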

Note that to make `re.findall` return just the match values, you can usually:

• remove redundant capturing groups (e.g. `(a(b)c)` -> `abc`)
• convert all capturing groups into non-capturing ones (that is, replace `(` with `(?:`) unless the pattern contains backreferences to the group values (in that case, use the next option)
• use `re.finditer` instead (`[x.group() for x in re.finditer(pattern, s)]`)
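Applied to the corrected pattern from this question, the non-capturing and `finditer` options give the same list. The last two lines use a toy backreference pattern, `(\w)\1` (an assumption for illustration, not from the question), where the group cannot be removed, so only `finditer` recovers the whole matches.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'

# Convert the capturing group into a non-capturing one.
non_capturing = re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s)

# Keep the capturing group, but collect whole matches via finditer.
with_finditer = [m.group() for m in re.finditer(r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+', s)]

print(non_capturing)  # ['123', '3.1415926']
print(with_finditer)  # ['123', '3.1415926']

# A backreference (\1) needs the group, so findall returns only the captured char.
doubled = 'aa bc dd'
backref_findall = re.findall(r'(\w)\1', doubled)
backref_finditer = [m.group() for m in re.finditer(r'(\w)\1', doubled)]
print(backref_findall)   # ['a', 'd']
print(backref_finditer)  # ['aa', 'dd']
```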

In your case, `findall` returned the captured texts, which were all empty, because the `\\` inside your `r''` string literal tries to match a literal `\`. There is no backslash in the input, so the optional group never participates in any match.
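Assuming the question's pattern used `\\.` inside `r''`, as described above, comparing `findall` with the whole matches from `finditer` shows what happened: the regex did match three pieces of text, but the group captured nothing in any of them.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'
# Assumed original pattern: `\\.` in a raw string means "backslash + any char".
doubled = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

captured = re.findall(doubled, s)
whole = [m.group() for m in re.finditer(doubled, s)]

print(captured)  # ['', '', '']
print(whole)     # ['123', '3', '1415926']
```

Because `\\.` never matches, `3.1415926` is split at the dot into the two integer matches `3` and `1415926`.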

To match the numbers, you can use

``````
-?\d*\.?\d+
``````

The regex matches:

• `-?` – Optional minus sign
• `\d*` – Zero or more digits
• `\.?` – Optional decimal separator
• `\d+` – One or more digits.
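The same pattern can be written with `re.VERBOSE` so that each piece carries the comment from the list above. The second sample string is an assumption, added to show negative numbers and a missing leading digit.

```python
import re

# Same pattern as -?\d*\.?\d+; with re.VERBOSE, whitespace in the pattern
# is ignored and '#' starts a comment.
number = re.compile(r'''
    -?     # optional minus sign
    \d*    # zero or more digits
    \.?    # optional decimal separator
    \d+    # one or more digits
''', re.VERBOSE)

found = number.findall('abc123d, hello 3.1415926, this is my book')
signed = number.findall('x=-2.5, y=.75, z=-4')
trailing_dot = number.findall('3.')

print(found)         # ['123', '3.1415926']
print(signed)        # ['-2.5', '.75', '-4']
print(trailing_dot)  # ['3']
```

Unlike the question's pattern, this one requires at least one digit after the decimal point, so a trailing dot as in `3.` is not included in the match.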


Here is an IDEONE demo:

``````
import re

s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?\d*\.?\d+'
L = re.findall(pattern, s)
print(L)  # ['123', '3.1415926']
``````

To explain why `search` seemed to return what you wanted while `findall` didn't:

`search` returns an `SRE_Match` object that holds information such as:

• `string`: the string that was passed to `search`.
• `re`: the compiled regex object used by `search`.
• `groups()`: a tuple of the strings captured by the capturing groups in the regex (`None` for a group that did not participate in the match).
• `group(index)`: the string captured by group number `index`, for `index > 0`.
• `group(0)`: the whole string matched by the regex.

`search` stops at the first match, builds the `SRE_Match` object and returns it. Check this code:
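Each attribute in that list can be inspected on a single match. The input string here is a shortened sample chosen for illustration.

```python
import re

pattern = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
m = re.search(pattern, 'hello 3.1415926')

print(m.string)      # hello 3.1415926  (the string passed to search)
print(m.re.pattern)  # the pattern text of the compiled regex that was used
print(m.group(0))    # 3.1415926        (the whole match)
print(m.group(1))    # .1415926         (the text captured by group 1)
print(m.groups())    # ('.1415926',)    (all groups, as a tuple)
```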

``````
import re

pattern = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'

s = r'abc123d'
m = re.search(pattern, s)
print(m.string)    # abc123d
print(m.group(0))  # 123 (the text matched by the regex)
print(m.groups())  # (None,): the only group, (\.[0-9]*), did not take part in the match

s = ', hello 3.1415926, this is my book'
m2 = re.search(pattern, s)
print(m2.string)    # , hello 3.1415926, this is my book
print(m2.group(0))  # 3.1415926 (the text matched by the regex)
print(m2.groups())  # ('.1415926',): the group captured the fractional part
``````

`findall` behaves differently: it doesn't stop at the first match but keeps going until the end of the text, and if the regex contains at least one capturing group, `findall` returns the strings captured by the groups instead of the whole matched strings:
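That rule can be expressed on top of `re.finditer`. The sketch below is hypothetical (`findall_like` is not part of `re` and is not how CPython implements `findall`); it only mirrors the documented result shape: whole matches when there are no groups, the group's text when there is one, tuples when there are several, with `''` for groups that did not participate.

```python
import re

def findall_like(pattern, string):
    """Hypothetical sketch of findall's result shape, built on finditer."""
    compiled = re.compile(pattern)
    results = []
    for m in compiled.finditer(string):
        if compiled.groups == 0:
            results.append(m.group(0))                   # whole match
        elif compiled.groups == 1:
            results.append(m.groups(default='')[0])      # the single group's text
        else:
            results.append(m.groups(default=''))         # tuple of all groups
    return results

s = r'abc123d , hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
print(findall_like(pattern, s))  # ['', '.1415926']
```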

``````
import re

s = r'abc123d , hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
m = re.findall(pattern, s)
print(m)  # ['', '.1415926']
``````

The first element comes from the first match, `'123'`, in which the capturing group captured nothing, so `findall` returns `''` for it. The second element comes from the second match, `'3.1415926'`, in which the capturing group captured `'.1415926'`.

If you want `findall` to return the matched strings, turn all capturing groups `()` in your regex into non-capturing groups `(?:)`:
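The empty string in the first element is `findall`'s stand-in for a group that did not participate; the match object itself reports such a group as `None`. A short comparison on the first part of the string:

```python
import re

pattern = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'

m = re.search(pattern, 'abc123d')
print(m.group(0))  # 123
print(m.group(1))  # None: the group did not take part in this match

# findall reports the same non-participating group as an empty string.
found = re.findall(pattern, 'abc123d')
print(found)  # ['']
```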

``````
import re

s = r'abc123d , hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'
m = re.findall(pattern, s)
print(m)  # ['123', '3.1415926']
``````
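If you need both the whole number and its fractional part, the group can stay and be read from each match returned by `re.finditer`. This is a sketch; the group name `frac` is chosen only for illustration.

```python
import re

s = r'abc123d , hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(?P<frac>\.[0-9]*)?|-?\.[0-9]+'

# group(0) is the whole match; group('frac') is None when the group didn't participate.
pairs = [(m.group(0), m.group('frac')) for m in re.finditer(pattern, s)]
print(pairs)  # [('123', None), ('3.1415926', '.1415926')]
```
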
The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .