Parsing JavaScript in JavaScript
I recently started playing around with the TypeScript library. In case you’re not familiar: the TypeScript language is a superset of JavaScript that adds optional typing. It’s very similar to Flow. But that’s not important here.
What’s interesting is that the TypeScript library is a JavaScript library that includes a parser (text → AST) and a printer (AST → text). It’s able to parse not only TypeScript, but also plain JavaScript, JSX and, to a degree, Flow. In case the term is new: an AST (Abstract Syntax Tree) is a tree representation of the structure of your code.
Let me show you a simple example:
const ts = require("typescript");
let sourceCode = `
console.log("Hello, World!");
`;
// Parse the code.
let tsSourceFile = ts.createSourceFile(
  __filename,
  sourceCode,
  ts.ScriptTarget.Latest
);
// Print the parsed Abstract Syntax Tree (AST).
console.log(JSON.stringify(tsSourceFile.statements, null, 2));
/*
Output (abridged to the interesting fields; the numeric "kind"
values differ between TypeScript versions):
[
  {
    "kind": 210, // ExpressionStatement
    "expression": {
      "kind": 181, // CallExpression
      "expression": {
        "kind": 179, // PropertyAccessExpression
        "expression": {
          "text": "console"
        },
        "name": {
          "text": "log"
        }
      },
      "arguments": [
        {
          "kind": 9, // StringLiteral
          "text": "Hello, World!"
        }
      ]
    }
  }
]
*/
Now, the interesting part is that you can change the Abstract Syntax Tree and reprint the code. For example:
const ts = require("typescript");
let sourceCode = `
console.log("Hello, World!");
`;
// Parse the code.
let tsSourceFile = ts.createSourceFile(
  __filename,
  sourceCode,
  ts.ScriptTarget.Latest
);
// Change the text of the string literal passed to console.log.
tsSourceFile.statements[0].expression.arguments[0].text = "Changed text";
// Print the modified source code.
console.log(ts.createPrinter().printFile(tsSourceFile));
/*
This will output:
console.log("Changed text");
*/
Or you can have fun and push it a level further, by parsing the script’s own code and asking it to rewrite itself. I’m not sure why you’d do that, but hey, why not?
import * as fs from "fs";
import * as ts from "typescript";
let sourceCode = fs.readFileSync(__filename, "utf8");
let tsSourceFile = ts.createSourceFile(
  __filename,
  sourceCode,
  ts.ScriptTarget.Latest
);
for (let statement of tsSourceFile.statements) {
  // This will be removed, and replaced with:
  // console.log("Hello, World!");
  if (ts.isForOfStatement(statement)) {
    let forOfStatement = statement;
    // Note: ts.createStatement, ts.createCall, ts.createPropertyAccess and
    // ts.createLiteral are deprecated since TypeScript 4.0 in favor of ts.factory.
    forOfStatement.statement = ts.createStatement(
      ts.createCall(
        ts.createPropertyAccess(ts.createIdentifier("console"), "log"),
        [],
        [ts.createLiteral("Hello, World!")]
      )
    );
  }
}
fs.writeFileSync(
  __filename,
  ts.createPrinter().printFile(tsSourceFile),
  "utf8"
);
Clone this GitHub repo if you’d like to try it.
While these examples aren’t very useful, you can do some powerful things. For example, you could extract a list of import statements from a large JS codebase and generate a graph of dependencies between JS files and NPM modules (like this). Or you could refactor your entire codebase automatically by detecting a given code pattern and replacing it with another.
That’s it for today. If you liked this post, you may also like my previous post about parsing your own language with ANTLR4.
Thanks for reading, have a nice day!