JavaScript parser in Python [closed]

Posted on

Question :

JavaScript parser in Python [closed]

There is a JavaScript parser at least in C and Java (Mozilla), in JavaScript (Mozilla again) and Ruby. Is there any currently out there for Python?

I don’t need a JavaScript interpreter, per se, just a parser that’s up to ECMA-262 standards.

A quick google search revealed no immediate answers, so I’m asking the SO community.

Asked By: Claudiu

||

Answer #1:

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

The ANTLR site provides many grammars , including one for JavaScript.

As it happens, there is a Python API available – so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).

Answered By: gimel

Answer #2:

Nowadays, there is at least one better tool, called slimit:

SlimIt is a JavaScript minifier written in Python. It compiles
JavaScript into more compact code so that it downloads and runs
faster.

SlimIt also provides a library that includes a JavaScript parser,
lexer, pretty printer and a tree visitor.

Demo:

Imagine we have the following javascript code:

$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});

And now we need to get email, phone and name values from the data object.

The idea here would be to instantiate a slimit parser, visit all nodes, filter all assignments and put them into the dictionary:

from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


data = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

parser = Parser()
tree = parser.parse(data)
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
          for node in nodevisitor.visit(tree)
          if isinstance(node, ast.Assign)}

print fields

It prints:

{'name': "'XYZ'", 
 'url': "'http://www.example.com'", 
 'type': '"POST"', 
 'phone': "'9999999999'", 
 'data': '', 
 'email': "'abc@g.com'"}
Answered By: alecxe

Answer #3:

I have translated esprima.js to Python:

https://github.com/PiotrDabkowski/pyjsparser

It’s a manual translation so its very fast, takes about 1 second to parse angular.js file (so 100k characters per second). It supports whole ECMAScript 5.1 and parts of version 6 – for example Arrow functions, const, let.

Alternatively you can use automated translation of newer version of esprima to python which works great and supports whole JavaScript 6!

Answered By: Piotr Dabkowski

Answer #4:

As pib mentioned, pynarcissus is a Javascript tokenizer written in Python. It seems to have some rough edges but so far has been working well for what I want to accomplish.

Updated: Took another crack at pynarcissus and below is a working direction for using PyNarcissus in a visitor pattern like system. Unfortunately my current client bought the next iteration of my experiments and have decided not to make it public source. A cleaner version of the code below is on gist here

from pynarcissus import jsparser
from collections import defaultdict

class Visitor(object):

    CHILD_ATTRS = ['thenPart', 'elsePart', 'expression', 'body', 'initializer']

def __init__(self, filepath):
    self.filepath = filepath
    #List of functions by line # and set of names
    self.functions = defaultdict(set)
    with open(filepath) as myFile:
        self.source = myFile.read()

    self.root = jsparser.parse(self.source, self.filepath)
    self.visit(self.root)


def look4Childen(self, node):
    for attr in self.CHILD_ATTRS:
        child = getattr(node, attr, None)
        if child:
            self.visit(child)

def visit_NOOP(self, node):
    pass

def visit_FUNCTION(self, node):
    # Named functions
    if node.type == "FUNCTION" and getattr(node, "name", None):
        print str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.end]


def visit_IDENTIFIER(self, node):
    # Anonymous functions declared with var name = function() {};
    try:
        if node.type == "IDENTIFIER" and hasattr(node, "initializer") and node.initializer.type == "FUNCTION":
            print str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.initializer.end]
    except Exception as e:
        pass

def visit_PROPERTY_INIT(self, node):

    # Anonymous functions declared as a property of an object
    try:
        if node.type == "PROPERTY_INIT" and node[1].type == "FUNCTION":
            print str(node.lineno) + " | function " + node[0].value + " | " + self.source[node.start:node[1].end]
    except Exception as e:
        pass


def visit(self, root):

    call = lambda n: getattr(self, "visit_%s" % n.type, self.visit_NOOP)(n)
    call(root)
    self.look4Childen(root)
    for node in root:
        self.visit(node)

filepath = r"C:UsersdwardDropboxjuggernaut2juggernautparsertestdatajasmine.js"
outerspace = Visitor(filepath)
Answered By: David

Answer #5:

You can try python-spidermonkey
It is a wrapper over spidermonkey which is codename for Mozilla’s C implementation of javascript.

Answered By: swamy

Leave a Reply

Your email address will not be published.