Question :
I’ve got some Python code that runs through a list of strings and converts them to integers or floating point numbers if possible. Doing this for integers is pretty easy
if element.isdigit():
newelement = int(element)
Floating point numbers are more difficult. Right now I’m using partition('.')
to split the string and checking to make sure that one or both sides are digits.
partition = element.partition('.')
if (partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit())
or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit())
or (partition[0].isdigit() and partition[1] == '.' and partition[2] == ''):
newelement = float(element)
This works, but obviously the if statement for that is a bit of a bear. The other solution I considered is to just wrap the conversion in a try/catch block and see if it succeeds, as described in this question.
Anyone have any other ideas? Opinions on the relative merits of the partition and try/catch approaches?
Answer #1:
I would just use..
try:
float(element)
except ValueError:
print "Not a float"
..it’s simple, and it works
Another option would be a regular expression:
import re
if re.match(r'^-?d+(?:.d+)$', element) is None:
print "Not float"
Answer #2:
Python method to check for float:
def isfloat(value):
try:
float(value)
return True
except ValueError:
return False
Don’t get bit by the goblins hiding in the float boat! DO UNIT TESTING!
What is, and is not a float may surprise you:
Command to parse Is it a float? Comment
-------------------------------------- --------------- ------------
print(isfloat("")) False
print(isfloat("1234567")) True
print(isfloat("NaN")) True nan is also float
print(isfloat("NaNananana BATMAN")) False
print(isfloat("123.456")) True
print(isfloat("123.E4")) True
print(isfloat(".1")) True
print(isfloat("1,234")) False
print(isfloat("NULL")) False case insensitive
print(isfloat(",1")) False
print(isfloat("123.EE4")) False
print(isfloat("6.523537535629999e-07")) True
print(isfloat("6e777777")) True This is same as Inf
print(isfloat("-iNF")) True
print(isfloat("1.797693e+308")) True
print(isfloat("infinity")) True
print(isfloat("infinity and BEYOND")) False
print(isfloat("12.34.56")) False Two dots not allowed.
print(isfloat("#56")) False
print(isfloat("56%")) False
print(isfloat("0E0")) True
print(isfloat("x86E0")) False
print(isfloat("86-5")) False
print(isfloat("True")) False Boolean is not a float.
print(isfloat(True)) True Boolean is a float
print(isfloat("+1e1^5")) False
print(isfloat("+1e1")) True
print(isfloat("+1e1.3")) False
print(isfloat("+1.3P1")) False
print(isfloat("-+1")) False
print(isfloat("(1)")) False brackets not interpreted
Answer #3:
'1.43'.replace('.','',1).isdigit()
which will return true
only if there is one or no ‘.’ in the string of digits.
'1.4.3'.replace('.','',1).isdigit()
will return false
'1.ww'.replace('.','',1).isdigit()
will return false
Answer #4:
TL;DR:
- If your input is mostly strings that can be converted to floats, the
try: except:
method is the best native Python method. - If your input is mostly strings that cannot be converted to floats, regular expressions or the partition method will be better.
- If you are 1) unsure of your input or need more speed and 2) don’t mind and can install a third-party C-extension, fastnumbers works very well.
There is another method available via a third-party module called fastnumbers (disclosure, I am the author); it provides a function called isfloat. I have taken the unittest example outlined by Jacob Gabrielson in this answer, but added the fastnumbers.isfloat
method. I should also note that Jacob’s example did not do justice to the regex option because most of the time in that example was spent in global lookups because of the dot operator… I have modified that function to give a fairer comparison to try: except:
.
def is_float_try(str):
try:
float(str)
return True
except ValueError:
return False
import re
_float_regexp = re.compile(r"^[-+]?(?:b[0-9]+(?:.[0-9]*)?|.[0-9]+b)(?:[eE][-+]?[0-9]+b)?$").match
def is_float_re(str):
return True if _float_regexp(str) else False
def is_float_partition(element):
partition=element.partition('.')
if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and partition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
return True
else:
return False
from fastnumbers import isfloat
if __name__ == '__main__':
import unittest
import timeit
class ConvertTests(unittest.TestCase):
def test_re_perf(self):
print
print 're sad:', timeit.Timer('ttest.is_float_re("12.2x")', "import ttest").timeit()
print 're happy:', timeit.Timer('ttest.is_float_re("12.2")', "import ttest").timeit()
def test_try_perf(self):
print
print 'try sad:', timeit.Timer('ttest.is_float_try("12.2x")', "import ttest").timeit()
print 'try happy:', timeit.Timer('ttest.is_float_try("12.2")', "import ttest").timeit()
def test_fn_perf(self):
print
print 'fn sad:', timeit.Timer('ttest.isfloat("12.2x")', "import ttest").timeit()
print 'fn happy:', timeit.Timer('ttest.isfloat("12.2")', "import ttest").timeit()
def test_part_perf(self):
print
print 'part sad:', timeit.Timer('ttest.is_float_partition("12.2x")', "import ttest").timeit()
print 'part happy:', timeit.Timer('ttest.is_float_partition("12.2")', "import ttest").timeit()
unittest.main()
On my machine, the output is:
fn sad: 0.220988988876
fn happy: 0.212214946747
.
part sad: 1.2219619751
part happy: 0.754667043686
.
re sad: 1.50515985489
re happy: 1.01107215881
.
try sad: 2.40243887901
try happy: 0.425730228424
.
----------------------------------------------------------------------
Ran 4 tests in 7.761s
OK
As you can see, regex is actually not as bad as it originally seemed, and if you have a real need for speed, the fastnumbers
method is quite good.
Answer #5:
If you cared about performance (and I’m not suggesting you should), the try-based approach is the clear winner (compared with your partition-based approach or the regexp approach), as long as you don’t expect a lot of invalid strings, in which case it’s potentially slower (presumably due to the cost of exception handling).
Again, I’m not suggesting you care about performance, just giving you the data in case you’re doing this 10 billion times a second, or something. Also, the partition-based code doesn’t handle at least one valid string.
$ ./floatstr.py F.. partition sad: 3.1102449894 partition happy: 2.09208488464 .. re sad: 7.76906108856 re happy: 7.09421992302 .. try sad: 12.1525540352 try happy: 1.44165301323 . ====================================================================== FAIL: test_partition (__main__.ConvertTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "./floatstr.py", line 48, in test_partition self.failUnless(is_float_partition("20e2")) AssertionError ---------------------------------------------------------------------- Ran 8 tests in 33.670s FAILED (failures=1)
Here’s the code (Python 2.6, regexp taken from John Gietzen’s answer):
def is_float_try(str):
try:
float(str)
return True
except ValueError:
return False
import re
_float_regexp = re.compile(r"^[-+]?(?:b[0-9]+(?:.[0-9]*)?|.[0-9]+b)(?:[eE][-+]?[0-9]+b)?$")
def is_float_re(str):
return re.match(_float_regexp, str)
def is_float_partition(element):
partition=element.partition('.')
if (partition[0].isdigit() and partition[1]=='.' and partition[2].isdigit()) or (partition[0]=='' and partition[1]=='.' and pa
rtition[2].isdigit()) or (partition[0].isdigit() and partition[1]=='.' and partition[2]==''):
return True
if __name__ == '__main__':
import unittest
import timeit
class ConvertTests(unittest.TestCase):
def test_re(self):
self.failUnless(is_float_re("20e2"))
def test_try(self):
self.failUnless(is_float_try("20e2"))
def test_re_perf(self):
print
print 're sad:', timeit.Timer('floatstr.is_float_re("12.2x")', "import floatstr").timeit()
print 're happy:', timeit.Timer('floatstr.is_float_re("12.2")', "import floatstr").timeit()
def test_try_perf(self):
print
print 'try sad:', timeit.Timer('floatstr.is_float_try("12.2x")', "import floatstr").timeit()
print 'try happy:', timeit.Timer('floatstr.is_float_try("12.2")', "import floatstr").timeit()
def test_partition_perf(self):
print
print 'partition sad:', timeit.Timer('floatstr.is_float_partition("12.2x")', "import floatstr").timeit()
print 'partition happy:', timeit.Timer('floatstr.is_float_partition("12.2")', "import floatstr").timeit()
def test_partition(self):
self.failUnless(is_float_partition("20e2"))
def test_partition2(self):
self.failUnless(is_float_partition(".2"))
def test_partition3(self):
self.failIf(is_float_partition("1234x.2"))
unittest.main()
Answer #6:
Just for variety here is another method to do it.
>>> all([i.isnumeric() for i in '1.2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2'.split('.',1)])
True
>>> all([i.isnumeric() for i in '2.f'.split('.',1)])
False
Edit: Im sure it will not hold up to all cases of float though especially when there is an exponent. To solve that it looks like this. This will return True only val is a float and False for int but is probably less performant than regex.
>>> def isfloat(val):
... return all([ [any([i.isnumeric(), i in ['.','e']]) for i in val], len(val.split('.')) == 2] )
...
>>> isfloat('1')
False
>>> isfloat('1.2')
True
>>> isfloat('1.2e3')
True
>>> isfloat('12e3')
False
Answer #7:
Simplified version of the function is_digit(str)
, which suffices in most cases (doesn’t consider exponential notation and “NaN” value):
def is_digit(str):
return str.lstrip('-').replace('.', '').isdigit()
Answer #8:
This regex will check for scientific floating point numbers:
^[-+]?(?:b[0-9]+(?:.[0-9]*)?|.[0-9]+b)(?:[eE][-+]?[0-9]+b)?$
However, I believe that your best bet is to use the parser in a try.