# Evaluating a mathematical expression in a string

Posted on

Solving problem is about exposing yourself to as many situations as possible like Evaluating a mathematical expression in a string and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Evaluating a mathematical expression in a string, which can be followed any time. Take easy to follow this discuss.

Evaluating a mathematical expression in a string
``````stringExp = "2^4"
intVal = int(stringExp)      # Expected value: 16
``````

This returns the following error:

``````Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int()
with base 10: '2^4'
``````

I know that `eval` can work around this, but isn’t there a better and – more importantly – safer method to evaluate a mathematical expression that is being stored in a string?

Pyparsing can be used to parse mathematical expressions. In particular, fourFn.py
shows how to parse basic arithmetic expressions. Below, I’ve rewrapped fourFn into a numeric parser class for easier reuse.

``````from __future__ import division
from pyparsing import (Literal, CaselessLiteral, Word, Combine, Group, Optional,
ZeroOrMore, Forward, nums, alphas, oneOf)
import math
import operator
__author__ = 'Paul McGuire'
__version__ = '\$Revision: 0.0 \$'
__date__ = '\$Date: 2009-03-20 \$'
__source__ = '''http://pyparsing.wikispaces.com/file/view/fourFn.py
http://pyparsing.wikispaces.com/message/view/home/15549426
'''
__note__ = '''
All I've done is rewrap Paul McGuire's fourFn.py as a class, so I can use it
more easily in other places.
'''
class NumericStringParser(object):
'''
Most of this code comes from the fourFn.py pyparsing example
'''
def pushFirst(self, strg, loc, toks):
self.exprStack.append(toks[0])
def pushUMinus(self, strg, loc, toks):
if toks and toks[0] == '-':
self.exprStack.append('unary -')
def __init__(self):
"""
expop   :: '^'
multop  :: '*' | '/'
integer :: ['+' | '-'] '0'..'9'+
atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
factor  :: atom [ expop factor ]*
term    :: factor [ multop factor ]*
expr    :: term [ addop term ]*
"""
point = Literal(".")
e = CaselessLiteral("E")
fnumber = Combine(Word("+-" + nums, nums) +
Optional(point + Optional(Word(nums))) +
Optional(e + Word("+-" + nums, nums)))
ident = Word(alphas, alphas + nums + "_\$")
plus = Literal("+")
minus = Literal("-")
mult = Literal("*")
div = Literal("/")
lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
multop = mult | div
expop = Literal("^")
pi = CaselessLiteral("PI")
expr = Forward()
atom = ((Optional(oneOf("- +")) +
(ident + lpar + expr + rpar | pi | e | fnumber).setParseAction(self.pushFirst))
| Optional(oneOf("- +")) + Group(lpar + expr + rpar)
).setParseAction(self.pushUMinus)
# by defining exponentiation as "atom [ ^ factor ]..." instead of
# "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
# that is, 2^3^2 = 2^(3^2), not (2^3)^2.
factor = Forward()
factor << atom +
ZeroOrMore((expop + factor).setParseAction(self.pushFirst))
term = factor +
ZeroOrMore((multop + factor).setParseAction(self.pushFirst))
expr << term +
# expr <<  general_term
self.bnf = expr
# map operator symbols to corresponding arithmetic operations
epsilon = 1e-12
"-": operator.sub,
"*": operator.mul,
"/": operator.truediv,
"^": operator.pow}
self.fn = {"sin": math.sin,
"cos": math.cos,
"tan": math.tan,
"exp": math.exp,
"abs": abs,
"trunc": lambda a: int(a),
"round": round,
"sgn": lambda a: abs(a) > epsilon and cmp(a, 0) or 0}
def evaluateStack(self, s):
op = s.pop()
if op == 'unary -':
return -self.evaluateStack(s)
if op in "+-*/^":
op2 = self.evaluateStack(s)
op1 = self.evaluateStack(s)
return self.opn[op](op1, op2)
elif op == "PI":
return math.pi  # 3.1415926535
elif op == "E":
return math.e  # 2.718281828
elif op in self.fn:
return self.fn[op](self.evaluateStack(s))
elif op[0].isalpha():
return 0
else:
return float(op)
def eval(self, num_string, parseAll=True):
self.exprStack = []
results = self.bnf.parseString(num_string, parseAll)
val = self.evaluateStack(self.exprStack[:])
return val
``````

You can use it like this

``````nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result)
# 16.0
result = nsp.eval('exp(2^4)')
print(result)
# 8886110.520507872
``````

## `eval` is evil

``````eval("__import__('os').remove('important file')") # arbitrary commands
eval("9**9**9**9**9**9**9**9", {'__builtins__': None}) # CPU, memory
``````

Note: even if you use set `__builtins__` to `None` it still might be possible to break out using introspection:

``````eval('(1).__class__.__bases__[0].__subclasses__()', {'__builtins__': None})
``````

## Evaluate arithmetic expression using `ast`

``````import ast
import operator as op
# supported operators
ast.Div: op.truediv, ast.Pow: op.pow, ast.BitXor: op.xor,
ast.USub: op.neg}
def eval_expr(expr):
"""
>>> eval_expr('2^6')
4
>>> eval_expr('2**6')
64
>>> eval_expr('1 + 2*3**(4^5) / (6 + -7)')
-5.0
"""
return eval_(ast.parse(expr, mode='eval').body)
def eval_(node):
if isinstance(node, ast.Num): # <number>
return node.n
elif isinstance(node, ast.BinOp): # <left> <operator> <right>
return operators[type(node.op)](eval_(node.left), eval_(node.right))
elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
return operators[type(node.op)](eval_(node.operand))
else:
raise TypeError(node)
``````

You can easily limit allowed range for each operation or any intermediate result, e.g., to limit input arguments for `a**b`:

``````def power(a, b):
if any(abs(n) > 100 for n in [a, b]):
raise ValueError((a,b))
return op.pow(a, b)
operators[ast.Pow] = power
``````

Or to limit magnitude of intermediate results:

``````import functools
def limit(max_=None):
"""Return decorator that limits allowed returned values."""
def decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
ret = func(*args, **kwargs)
try:
mag = abs(ret)
except TypeError:
pass # not applicable
else:
if mag > max_:
raise ValueError(ret)
return ret
return wrapper
return decorator
eval_ = limit(max_=10**100)(eval_)
``````

### Example

``````>>> evil = "__import__('os').remove('important file')"
>>> eval_expr(evil) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:
>>> eval_expr("9**9")
387420489
>>> eval_expr("9**9**9**9**9**9**9**9") #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:
``````

Some safer alternatives to `eval()` and `sympy.sympify().evalf()`*:

*SymPy `sympify` is also unsafe according to the following warning from the documentation.

Warning: Note that this function uses `eval`, and thus shouldn’t be used on unsanitized input.

Okay, so the problem with eval is that it can escape its sandbox too easily, even if you get rid of `__builtins__`. All the methods for escaping the sandbox come down to using `getattr` or `object.__getattribute__` (via the `.` operator) to obtain a reference to some dangerous object via some allowed object (`''.__class__.__bases__[0].__subclasses__` or similar). `getattr` is eliminated by setting `__builtins__` to `None`. `object.__getattribute__` is the difficult one, since it cannot simply be removed, both because `object` is immutable and because removing it would break everything. However, `__getattribute__` is only accessible via the `.` operator, so purging that from your input is sufficient to ensure eval cannot escape its sandbox.
In processing formulas, the only valid use of a decimal is when it is preceded or followed by `[0-9]`, so we just remove all other instances of `.`.

``````import re
inp = re.sub(r".(?![0-9])","", inp)
val = eval(inp, {'__builtins__':None})
``````

Note that while python normally treats `1 + 1.` as `1 + 1.0`, this will remove the trailing `.` and leave you with `1 + 1`. You could add `)`,, and `EOF` to the list of things allowed to follow `.`, but why bother?

You can use the ast module and write a NodeVisitor that verifies that the type of each node is part of a whitelist.

``````import ast, math
locals =  {key: value for (key,value) in vars(math).items() if key[0] != '_'}
locals.update({"abs": abs, "complex": complex, "min": min, "max": max, "pow": pow, "round": round})
class Visitor(ast.NodeVisitor):
def visit(self, node):
if not isinstance(node, self.whitelist):
raise ValueError(node)
return super().visit(node)
ast.Mult, ast.Div, ast.Pow, ast.BitOr, ast.BitAnd, ast.BitXor, ast.USub, ast.UAdd, ast.FloorDiv, ast.Mod,
ast.LShift, ast.RShift, ast.Invert, ast.Call, ast.Name)
def evaluate(expr, locals = {}):
if any(elem in expr for elem in 'n#') : raise ValueError(expr)
try:
node = ast.parse(expr.strip(), mode='eval')
Visitor().visit(node)
return eval(compile(node, "<string>", "eval"), {'__builtins__': None}, locals)
except Exception: raise ValueError(expr)
``````

Because it works via a whitelist rather than a blacklist, it is safe. The only functions and variables it can access are those you explicitly give it access to. I populated a dict with math-related functions so you can easily provide access to those if you want, but you have to explicitly use it.

If the string attempts to call functions that haven’t been provided, or invoke any methods, an exception will be raised, and it will not be executed.

Because this uses Python’s built in parser and evaluator, it also inherits Python’s precedence and promotion rules as well.

``````>>> evaluate("7 + 9 * (2 << 2)")
79
>>> evaluate("6 // 2 + 0.0")
3.0
``````

The above code has only been tested on Python 3.

If desired, you can add a timeout decorator on this function.

The reason `eval` and `exec` are so dangerous is that the default `compile` function will generate bytecode for any valid python expression, and the default `eval` or `exec` will execute any valid python bytecode. All the answers to date have focused on restricting the bytecode that can be generated (by sanitizing input) or building your own domain-specific-language using the AST.

Instead, you can easily create a simple `eval` function that is incapable of doing anything nefarious and can easily have runtime checks on memory or time used. Of course, if it is simple math, than there is a shortcut.

``````c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]
``````

The way this works is simple, any constant mathematic expression is safely evaluated during compilation and stored as a constant. The code object returned by compile consists of `d`, which is the bytecode for `LOAD_CONST`, followed by the number of the constant to load (usually the last one in the list), followed by `S`, which is the bytecode for `RETURN_VALUE`. If this shortcut doesn’t work, it means that the user input isn’t a constant expression (contains a variable or function call or similar).

This also opens the door to some more sophisticated input formats. For example:

``````stringExp = "1 + cos(2)"
``````

This requires actually evaluating the bytecode, which is still quite simple. Python bytecode is a stack oriented language, so everything is a simple matter of `TOS=stack.pop(); op(TOS); stack.put(TOS)` or similar. The key is to only implement the opcodes that are safe (loading/storing values, math operations, returning values) and not unsafe ones (attribute lookup). If you want the user to be able to call functions (the whole reason not to use the shortcut above), simple make your implementation of `CALL_FUNCTION` only allow functions in a ‘safe’ list.

``````from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator
globs = {'sin':sin, 'cos':cos}
safe = globs.values()
stack = LifoQueue()
class BINARY(object):
def __init__(self, operator):
self.op=operator
def __call__(self, context):
stack.put(self.op(stack.get(),stack.get()))
class UNARY(object):
def __init__(self, operator):
self.op=operator
def __call__(self, context):
stack.put(self.op(stack.get()))
def CALL_FUNCTION(context, arg):
argc = arg[0]+arg[1]*256
args = [stack.get() for i in range(argc)]
func = stack.get()
if func not in safe:
raise TypeError("Function %r now allowed"%func)
stack.put(func(*args))
cons = arg[0]+arg[1]*256
stack.put(context['code'].co_consts[cons])
name_num = arg[0]+arg[1]*256
name = context['code'].co_names[name_num]
if name in context['locals']:
stack.put(context['locals'][name])
else:
stack.put(context['globals'][name])
def RETURN_VALUE(context):
return stack.get()
opfuncs = {
opmap['UNARY_INVERT']: UNARY(operator.invert),
opmap['CALL_FUNCTION']: CALL_FUNCTION,
opmap['RETURN_VALUE']: RETURN_VALUE,
}
def VMeval(c):
context = dict(locals={}, globals=globs, code=c)
bci = iter(c.co_code)
for bytecode in bci:
func = opfuncs[ord(bytecode)]
if func.func_code.co_argcount==1:
ret = func(context)
else:
args = ord(bci.next()), ord(bci.next())
ret = func(context, args)
if ret:
return ret
def evaluate(expr):
return VMeval(compile(expr, 'userinput', 'eval'))
``````

Obviously, the real version of this would be a bit longer (there are 119 opcodes, 24 of which are math related). Adding `STORE_FAST` and a couple others would allow for input like `'x=5;return x+x` or similar, trivially easily. It can even be used to execute user-created functions, so long as the user created functions are themselves executed via VMeval (don’t make them callable!!! or they could get used as a callback somewhere). Handling loops requires support for the `goto` bytecodes, which means changing from a `for` iterator to `while` and maintaining a pointer to the current instruction, but isn’t too hard. For resistance to DOS, the main loop should check how much time has passed since the start of the calculation, and certain operators should deny input over some reasonable limit (`BINARY_POWER` being the most obvious).

While this approach is somewhat longer than a simple grammar parser for simple expressions (see above about just grabbing the compiled constant), it extends easily to more complicated input, and doesn’t require dealing with grammar (`compile` take anything arbitrarily complicated and reduces it to a sequence of simple instructions).

I think I would use `eval()`, but would first check to make sure the string is a valid mathematical expression, as opposed to something malicious. You could use a regex for the validation.

`eval()` also takes additional arguments which you can use to restrict the namespace it operates in for greater security.

``````>>> import sympy
Very cool indeed! A `from sympy import *` brings in a lot more function support, such as trig functions, special functions, etc., but I’ve avoided that here to show what’s coming from where.