François Wouts

The developer happiness engineer

I write about software engineering, developer productivity and my journey from working at Google to becoming an indie dev.

Parsing JavaScript in JavaScript

August 2017

I recently started playing around with the TypeScript library. In case you’re not familiar: the TypeScript language is a superset of JavaScript that adds optional typing. It’s very similar to Flow. But that’s not important here.

What’s interesting is that the TypeScript library is itself a JavaScript library, and it includes a parser (text → AST) and a printer (AST → text). It can parse not only TypeScript, but also plain JavaScript, JSX and even Flow. An AST, or Abstract Syntax Tree, is a tree representation of the structure of your source code.

Let me show you a simple example:

const ts = require("typescript");

let sourceCode = `
console.log("Hello, World!");
`;

// Parse the code.
let tsSourceFile = ts.createSourceFile(
  __filename,
  sourceCode,
  ts.ScriptTarget.Latest
);

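// Print the parsed Abstract Syntax Tree (AST).
console.dir(tsSourceFile.statements, { depth: null });

/*
Output (simplified; the numeric "kind" values differ between TypeScript versions):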
[
  {
    "kind": 210,  // ExpressionStatement
    "expression": {
      "kind": 181,  // CallExpression
      "expression": {
        "kind": 179,  // PropertyAccessExpression
        "expression": {
          "text": "console"
        },
        "name": {
          "text": "log"
        }
      },
      "arguments": [
        {
          "kind": 9,  // StringLiteral
          "text": "Hello, World!"
        }
      ]
    }
  }
]
*/
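Those numeric kinds are hard to read, so here’s a minimal sketch of how you could print the tree with readable names instead. It relies on `ts.forEachChild` to walk the children and on the `ts.SyntaxKind` enum to turn a number back into a name. The `describeTree` helper and the `example.js` file name are my own inventions for illustration, not part of the library.

```typescript
import * as ts from "typescript";

// Hypothetical helper: collect one indented line per AST node, using the
// SyntaxKind name instead of the raw number.
function describeTree(node: ts.Node, depth = 0, lines: string[] = []): string[] {
  lines.push("  ".repeat(depth) + ts.SyntaxKind[node.kind]);
  ts.forEachChild(node, (child) => {
    describeTree(child, depth + 1, lines);
  });
  return lines;
}

const sourceFile = ts.createSourceFile(
  "example.js", // made-up file name; nothing is read from disk
  'console.log("Hello, World!");',
  ts.ScriptTarget.Latest
);

// One line per node, indented by depth in the tree.
const treeLines = describeTree(sourceFile);
console.log(treeLines.join("\n"));
```

A few kinds share their number with an alias (markers such as `FirstLiteralToken`), so the reverse lookup can occasionally return the alias rather than the name you expect.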

Now, the interesting part is that you can change the Abstract Syntax Tree and reprint the code. For example:

const ts = require("typescript");

let sourceCode = `
console.log("Hello, World!");
`;

// Parse the code.
let tsSourceFile = ts.createSourceFile(
  __filename,
  sourceCode,
  ts.ScriptTarget.Latest
);

tsSourceFile.statements[0].expression.arguments[0].text = "Changed text";

// Print the modified source code.
console.log(ts.createPrinter().printFile(tsSourceFile));

/*
This will output:
console.log("Changed text");
*/
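Mutating nodes in place works for the example above, but the library also offers a transformation API that builds a new tree instead. Here’s a rough sketch of the same change written that way, assuming TypeScript 4.0 or later (where the `ts.factory` helpers are available). The `replaceStrings` name is mine, and this version replaces every string literal in the file, which is fine for a one-line script but probably too blunt for real code.

```typescript
import * as ts from "typescript";

// Hypothetical transformer: swap every string literal for a new one.
const replaceStrings: ts.TransformerFactory<ts.SourceFile> = (context) => {
  const visit = (node: ts.Node): ts.Node => {
    if (ts.isStringLiteral(node)) {
      return ts.factory.createStringLiteral("Changed text");
    }
    // Not a string literal: keep the node, but visit its children.
    return ts.visitEachChild(node, visit, context);
  };
  return (sourceFile) => ts.visitEachChild(sourceFile, visit, context);
};

const original = ts.createSourceFile(
  "example.js", // made-up file name; nothing is read from disk
  'console.log("Hello, World!");',
  ts.ScriptTarget.Latest
);

// Run the transformer, then print the resulting tree back to text.
const result = ts.transform(original, [replaceStrings]);
const rewritten = ts.createPrinter().printFile(result.transformed[0]);
result.dispose();

console.log(rewritten);
```

The original tree is left untouched, which makes it easier to compare the code before and after the change.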

Or you can have fun and push it a level further, by parsing the script’s own code and asking it to rewrite itself. I’m not sure why you’d do that, but hey, why not?

import * as fs from "fs";
import * as ts from "typescript";

let sourceCode = fs.readFileSync(__filename, "utf8");
let tsSourceFile = ts.createSourceFile(
  __filename,
  sourceCode,
  ts.ScriptTarget.Latest
);
for (let statement of tsSourceFile.statements) {
  // This will be removed, and replaced with:
  // console.log("Hello, World!");
  if (ts.isForOfStatement(statement)) {
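    let forOfStatement = statement;
    // Note: these create* helpers are the 2017 API. Since TypeScript 4.0 they
    // live under ts.factory (e.g. ts.factory.createExpressionStatement,
    // ts.factory.createCallExpression).
    forOfStatement.statement = ts.createStatement(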
      ts.createCall(
        ts.createPropertyAccess(ts.createIdentifier("console"), "log"),
        [],
        [ts.createLiteral("Hello, World!")]
      )
    );
  }
}

fs.writeFileSync(
  __filename,
  ts.createPrinter().printFile(tsSourceFile),
  "utf8"
);

Clone this GitHub repo if you’d like to try it.

While these examples aren’t very useful on their own, the same approach lets you do some powerful things. For example, you could extract the list of import statements from a large JS codebase and generate a graph of dependencies between JS files and npm modules. Or you could refactor your entire codebase automatically by detecting a given code pattern and replacing it with another.
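To give a feel for the import extraction idea, here’s a small sketch that lists the modules imported by a single file. The `listImports` function and the sample source are made up for illustration. A real dependency graph would also need to walk the file system and resolve relative paths, which I’m leaving out.

```typescript
import * as ts from "typescript";

// Hypothetical helper: return the module specifiers of all top-level
// `import` declarations found in the given source code.
function listImports(fileName: string, sourceCode: string): string[] {
  const sourceFile = ts.createSourceFile(
    fileName,
    sourceCode,
    ts.ScriptTarget.Latest
  );
  const modules: string[] = [];
  for (const statement of sourceFile.statements) {
    if (
      ts.isImportDeclaration(statement) &&
      ts.isStringLiteral(statement.moduleSpecifier)
    ) {
      modules.push(statement.moduleSpecifier.text);
    }
  }
  return modules;
}

const imports = listImports(
  "example.ts", // made-up file name; nothing is read from disk
  'import * as fs from "fs";\nimport { join } from "./utils/path";\nconsole.log(fs, join);'
);
console.log(imports); // [ 'fs', './utils/path' ]
```

This only looks at top-level `import` declarations. Calls to `require(...)`, dynamic `import(...)` expressions and `export ... from` statements would need their own checks.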

That’s it for today. If you liked this post, you may also like my previous post about parsing your own language with ANTLR4.

Thanks for reading, have a nice day!