Back to home

Writing a code analyzer in TypeScript

The automated mentoring support on exercism is driven by code analyzers. Explore how you can build an analyzer from scratch.

  • Written by

    Derk-Jan Karrenbeld
    This is the main author for this article.
  • Published on


    updated:
  • Languages

    We've used these programming languages.
    1. TypeScript
  • License

    CC BY-NC-SA 4.0
    This license applies to the textual content of the article. The images might have their own license.

Exercism is an online platform designed to help you improve your coding skills through practice and mentorship.

Exercism provides you with thousands of exercises spread across numerous language tracks. Once you start a language track you are presented with a core set of exercises to complete. Each one is a fun and interesting challenge designed to teach you a little more about the features of a language.

At the moment of writing, I'm a maintainer of the JavaScript and TypeScript track and recently we've been working on automating a part of the experience.

This article will introduce you to AST parsing and walking using ESTree compatible tools. It specifically looks at certain token types, most commonly found in JavaScript and TypeScript code.

It teaches you how to explore these trees yourself and refers to code samples and actual production implementations.

When reading this article, think about your own JavaScript and TypeScript code. Once you understand how the browser (and tooling, like eslint) parses your code, you might better understand how the language is defined and constructed.

🚧 The links to GitHub are all master links, meaning the contents may change between writing this article and you clicking them. However, in order to make sure the code samples make sense, the analyzer repository links refer to a specific commit (9ff332b). This means that the code you see might not be the same as what is used today.

Table of Contents

Photo called 'Feel the freedom' in Dungeness, United Kingdom, displaying a red Volkswagen Samba parked near brown house.
Photo by Zoltan Tasi on Unsplash

πŸ“ The exercise

For this article, I'm going to write the analyzer for the gigasecond exercise, for both the TypeScript Γ‘nd JavaScript track. The description is a mere two lines:

Given a moment, determine the moment that would be after a gigasecond has passed.

A gigasecond is 10^9 (1,000,000,000) seconds.

The canonical data hints at the code I need to write, but luckily the exercise is implemented in both the JavaScript and TypeScript tracks.

JavaScript implementation

The JavaScript implementation expects us to write a named export gigasecond which returns a Date that is a gigasecond past the input Date.

// Test case from the JavaScript track
test('tells a gigasecond anniversary since midnight', () => {
  const gs = gigasecond(new Date(Date.UTC(2015, 8, 14)));
  const expectedDate = new Date(Date.UTC(2047, 4, 23, 1, 46, 40));
  expect(gs).toEqual(expectedDate);
});

TypeScript implementation

The TypeScript implementation expects us to write a default export Gigasecond which is a class that has a date() function which returns a Date that is a gigasecond past the constructor Date.

// Test case from the TypeScript track
it('tells a gigasecond anniversary since midnight', () => {
  const gs = new Gigasecond(new Date(Date.UTC(2015, 8, 14)));
  const expectedDate = new Date(Date.UTC(2047, 4, 23, 1, 46, 40));
  expect(gs.date()).toEqual(expectedDate);
});

πŸ’― Optimal solutions

Before tackling how to write an analyzer for these two implementations, I first have to establish what the optimal solutions are. If I know the intended code result, I can try to recognise that and work from there.

JavaScript solution

The implementation in JavaScript is straightforward. It uses the Date constructor together with Date#getTime and a constant to generate a the appropiate result.

const GIGASECOND_IN_MS = 10 ** 9 * 1000;

export function gigasecond(date) {
  return new Date(date.getTime() + GIGASECOND_IN_MS);
}

It is vital to note the perculiarities here:

  • Optimal is extracting the GIGASECOND_IN_MS value as a top-level constant
  • The constant's value ((10 ** 9) * 1000) can optimally be written in many equally valid forms. Writing the number out, however, is deemed a smell. All the following SHOULD be recognised as optimal:
    • 10 ** 12
    • 1e9 * 1e3
    • 1e12,
    • Math.pow(10, 9) * 1000
    • Math.pow(10, 12)
  • Date#valueOf is not optimal. It is marked as "This method is usually called internally by JavaScript and not explicitly in code.", even though it's functionally equivalent.
  • Finally, Date.parse(date) is not a good candidate as it's supposed to work with strings only. The reason it returns the same value as getTime when given a date, is because that date is coerced to a string and then parsed.

TypeScript solution

The TypeScript implementation expects a class as default export, and has a method date(). The algorithm is exactly the same as in the JavaScript solution, but it requires type annotations.

const GIGASECOND_IN_MS = 10 ** 9 * 1000;

export default class Gigasecond {
  private readonly futureDate: Date;

  constructor(date: Readonly<Date>) {
    this.futureDate = new Date(date.getTime() + GIGASECOND_IN_MS);
  }

  date(): Date {
    return this.futureDate;
  }
}

Apart from the variations and rules as described earlier for JavaScript, the calculation may be done either in the constructor (as shown above) or in the date function. In that last case, it will look as follows:

const GIGASECOND_IN_MS = 10 ** 9 * 1000;

export default class Gigasecond {
  constructor(private readonly date: Date) {}

  date(): Date {
    return new Date(date.getTime() + GIGASECOND_IN_MS);
  }
}

πŸ‘©πŸ½β€πŸ’» Analysing the code

Now it's time to actually write the analyzer. We'll focus on the JavaScript implementation first. Because there are already JavaScript analyzers running in the wild and that work is open source, this example will use the utilities and base class from the javascript-analyzer repository.

πŸ’¬ Abstract Syntax Trees

The JavaScript Analyzer will be working the Abstract Syntax Tree (AST) of the code solutions. There are other ways to write an analyzer, but for the sake of this article, AST parsing is the way to go.

yarn add @typescript-eslint/parser @typescript-eslint/typescript-estree

The TypeScript ESLint team has built a great parser that outputs an ESTree, a specced format of tokens and information about the input code. It can work with both JavaScript and TypeScript and is therefore great for our use case. I prefer working with this type of tree, because of the spec which allows for interoptability with other tools.

The parser deals with the eslint configuration and then invokes the typescript-estree package which compiles the code using TypeScript and transforms the result to match the ESTree. You can head on over to AST Explorer and try this out yourself by pasting the example code from above into the input field and selecting @typescript-eslint/parser. ⚠ Note: The version of the parser here is usually not up-to-date with the latest parser.

πŸƒπŸ½β€πŸ’¨ Running the parser

Now that the packages are in place, let's parse the input code.

import { parse, TSESTreeOptions } from '@typescript-eslint/typescript-estree';

const options: TSESTreeOptions = {
  comment: false,
  jsx: false,
};

const program = parse(source, options);
// => Program({ body: [...], sourceType: "module", tokens: [...] })

This gives us the same output as you see in AST Explorer, with at the root a Program and its body. We won't need the other fields, but tokens is interesting. It lists the parsed tokens from the source, as it was building the tree.

πŸ’Ž You can find this in parsers/AstParser.ts

πŸ”Ž Finding the main entrypoint

We're looking for a function called gigasecond. We know the following things:

  • It must be exported
  • Its name must be gigasecond

The input code declares a function, like in the optimal solution above, so the tree holds a FunctionDeclaration with an Identifier:

export function gigasecond(date) {
  // ...
}
{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "gigasecond"
  },
  generator: false,
  expression: false,
  async: false,
  params: [ ... ],
  // The body is a block { ... }
  body: {
    type: "BlockStatement",
    body: [ ... ]
  }
}

This is something it can search for. The most common way to search in an AST is by walking that tree. You start at some node (usually the root / program) and visit each item.

We know that our parsed EStree is compatible with eslint and that eslint, just like prettier can recognise (and transform) code.

# Not a dev dependency!
yarn add eslint
import { TSESTree } from '@typescript-eslint/typescript-estree';
import { traverse } from 'eslint/lib/util/traverser';

traverse(program, {
  enter(node: TSESTree.Node) {
    // ...
  },
});

When writing this code, TypeScript will complain that there are no types for this library, which is unfortunately still true at moment of writing. However, you can copy this declarations.d.ts I wrote in order to get type completion.

The enter method will be called for each node in the program. Inside the enter block, your in the "TraverserContext" which exposes two methods:

  • this.skip(): skips the node from further traversal, meaning it will not visit any other keys (and therefore children) of the current node;
  • this.break(): completely stop traversing.

Finding the entrypoint is now straightforward.

import { TSESTree, AST_NODE_TYPES } from '@typescript-eslint/typescript-estree';

let entry: TSESTree.FunctionDeclaration | undefined = undefined;

traverse(program, {
  enter(node: TSESTree.Node) {
    switch (node.type) {
      // function name() {}
      case AST_NODE_TYPES.FunctionDeclaration:
        if (node.id && node.id.name === 'gigasecond') {
          entry = node;
          this.break();
        }
        break;
    }
  },
});

entry;
// => FunctionDeclaration({
//      id: { type: "Identifier", name: "gigasecond" }, ...
//    })

Unfortunately the walker above only finds FunctionDeclaration and fails on equivalent code usign an ArrowFunctionExpression or FunctionExpression. More on that later.

πŸ’Ž You can find this in analyzers/utils/extract_main_method.ts

πŸ”Ž Finding the top-level constant

The code finds the first of the two components. Now it also needs to find the second. A top-level const, but the name is not known.

const GIGASECOND_IN_MS = 10 ** 9 * 1000;
{
  type: "VariableDeclaration",
  declarations: [
    {
      type: "VariableDeclarator",
      id: {
        type: "Identifier",
        name: "GIGASECOND_IN_MS"
      },
      init: {
        type: "BinaryExpression",
        operator: "*",
        // Left is another BinaryExpression with **
        left: { ... },
        // Right is a Literal
        right: { ... }
      }
    }
  ],
  kind: "const"
}

Nothing here is particularly helpful. Given the variations of data it needs to accept, I can't rely on init being a certain type. The name is also not fixed as it's not exported and therefore not tested.

However, there are a few constraints that will help here:

  • It must be a top-level constant
  • It can not be named gigasecond
  • In an optimal solution, there is really only one top-level constant that is not the entry,
type FoundConst = { kind: TSESTree.VariableDeclaration['kind'] }
  & TSESTree.VariableDeclarator

let bigNumber: FoundConst | undefined = undefined

traverse(program, {
  enter(node: TSESTree.Node) {
    switch (node.type) {

      // const NAME = ...
      case AST_NODE_TYPES.VariableDeclaration:
        const declaration = node.declarations.find(
          (declaration) => declaration.id && declaration.id.name !== 'gigasecond')
        )

        if (declaration) {
          bigNumber = { kind: node.kind, ...declaration }
          this.break()
        }

        break;

      default:
        // This doesn't declare a variable, so skip the node
        this.skip()
    }
  }
})

Later, I can check bigNumber['kind'] and make sure it's const, or otherwise attach a message that const is preferred.

πŸ’Ž You can find this in analyzers/utils/find_top_level_constants.ts

The algorithm

Now that I found the entry point, I can figure out what the name of the argument is (date). Because I also know the top-level constant, I know what the constant name is GIGASECOND_IN_MS.

new Date(...)

Nothing too fancy here. It's a new expression with an expression as the first argument.

{
  type: "NewExpression",
  callee: {
    type: "Identifier",
    name: "Date"
  },
  arguments: [ ... ]
}
let newDate: TSESTree.NewExpression | undefined = undefined;

traverse(program, {
  enter(node: TSESTree.Node) {
    switch (node.type) {
      // new Date(...)
      case AST_NODE_TYPES.NewExpression:
        if (
          node.callee.type === AST_NODE_TYPES.Identifier &&
          node.callee.name === 'Date'
        ) {
          newDate = node;
          this.break();
        }
        break;

      default:
        // This doesn't declare a variable, so skip the node
        this.skip();
    }
  },
});

πŸ’Ž You can find this in analyzers/utils/find_new_expression.ts

The inner expression is of type BinaryExpression. In EStree compatible output, and operator with two components (such as +, -, *) is a binary expression, whereas those with one component (such as ~ and !) are unary expressions.

date.getTime() + GIGASECOND_IN_MS;
{
  type: "BinaryExpression",
  operator: "+",
  left: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: {
        type: "Identifier",
        name: "date"
      },
      property: {
        type: "Identifier",
        name: "getTime"
      }
    },
    arguments: []
  },
  right: {
    type: "Identifier",
    name: "GIGASECOND_IN_MS"
  }
}

Quite a few things we've already seen and also a few new types. Let's look at those.

Properties of objects

When the parser encounters a object property accessor (object.property), it is parsed as a MemberExpression. Depending on the way of writing, the property is either an Identifier or a Literal.

date.getTime;
// ^       ^
// object  property
// |       |
// |       identifier (name = getTime)
// identifier (name = date)

date['getTime'];
// ^       ^
// object  property
// |       |
// |       literal (value = getTime)
// identifier (name = date)

"Executing" a property

If there are parentheses behind the MemberExpression, the entire expression is parsed as a child of a CallExpression. This is also the case for parentheses following an identifier.

date.getTime();
// ---------| ^ |
// ^        | argument(s) of call expression
// member expression
//              |
// -------------|
// call expression

gigasecond(INPUT);
// ------|   ^   |
// ^     | argument of call expression
// identifier    |
//               |
// --------------|
// call expression

Matching the identifiers

There are two identifiers provided by the source code that I need to find and match against:

  • the gigasecond first argument (used in arg.getTime())
  • the top-level constant (used in time + CONSTANT)
const argumentName = entry.id.name;
// => "gigasecond"
const constantName = bigNumber.id.name;
// => "GIGASECOND_IN_MS"

let optimalExpression: boolean = false;

// NOTE: passing in the newDate as root, so this is a subtree traversal!
traverse(newDate, {
  enter(node: TSESTree.Node) {
    switch (node.type) {
      // new Date(x.z() + y)
      case AST_NODE_TYPES.BinaryExpression:
        this.break();

        if (node.operator !== '+') {
          optimalExpression = false;
          return;
        }

        // This allows the order to be reversed
        const leftType = node.left.type;
        const constSide =
          leftType === AST_NODE_TYPES.Identifier ? node.left : node.right;
        const expressionSide =
          leftType === AST_NODE_TYPES.CallExpression ? node.left : node.right;

        if (constSide === expressionSide) {
          // throw new Error("not optimal! this is not x.z() + y")
          optimalExpression = false;
          return;
        }

        if (constSide.id.name !== constantName) {
          optimalExpression = false;
          return;
        }

        const { object, property } = expressionSide.callee;
        optimalExpression =
          object.type === AST_NODE_TYPES.Identifier &&
          object.name === argumentName &&
          ((property.type === AST_NODE_TYPES.Identifier &&
            property.name === 'getTime') ||
            (property.type === AST_NODE_TYPES.Literal &&
              property.value === 'getTime'));

        break;
    }
  },
});

πŸ’Ž You can find this in:

βœ… Automated Mentoring

When all these pieces are put together, it is the analyzer for gigasecond. There are a few more things to check:

  • Is the bigNumber.kind equal to "const"? If not, add a comment
  • Is the value of GIGASECOND_IN_MS using one of the comprehensions? If not, add a comment.
  • Is there only one argument to gigasecond? Make sure it's not a ...splat argument, and that it has no value = "default".
  • Is gigasecond actually exported? Is the export inline? if not, add a comment.

Since the first one was mentioned (kind equality check) and the second one is very similar to the inner expression of the new Date(...) call, I've left out how to implement these. You can check the gigasecond analyzer source code if you need some inspiration. The third one is testing the entry for parameters.

As for exports, these are handled by πŸ’Ž extract_export, but I'll show you the gist of it.

πŸ“¦ Testing exports

Exports in JavaScript and TypeScript basically come in three types. Those using a core language feature (read: use a keyword) are the easiest:

export const inline = {}
export default class InlineClass {}
export default defaultSpecifier
export { specifier }
export { specifier as renamed }

The default export have their own token type ExportDefaultDeclaration.

{
  type: "ExportDefaultDeclaration",
  declaration: {
    // ...
  }
}

Those without a default modifier are of type ExportNamedDeclaration.

{
  type: "ExportNamedDeclaration",
  declaration: {
    // ...
  }
}

The declaration property is where it gets slightly tricky. The inline export statements, regardless if they're default or not, are followed by the same token types as if they did not have the export keyword similarly to how writing parentheses wraps an expression in a CallExpression.

Inline exports

This means that the first example is a VariableDeclaration with a single VariableDeclaractor: the id is an Identifier with name = "inline" and the init is an ObjectExpression. Similarly the second example is a ClassDeclaration with as id an Identifier with name = "InlineClass".

Specifier exports

The third has a declaration of type Identifier with name = "defaultSpecifier". This is similar to inline exports

The fourth and fifth however, do not have a declaration property. Instead, they have a specifiers property with, in this case, one item:

{
  type: "ExportSpecifier",
  local: {
    type: "Identifier",
    name: "specifier"
  }
  exported: {
    type: "Identifier",
    name: "specifier" // or "renamed"
  }
}

Use the local property to determine what is exported (what the internal name is) and the exported property how it's imported (what the exported name is).

CommonJS exports

Finally there are those exports that don't use a keyword but instead use the (as far as I'm concerned) defunct module.exports.

module.exports = singleExport;
module.exports = { specifier };
module.exports.default = defaultExport;
module.exports.renamed = specifier;

Since these don't use keywords, they're interpreted as ExpressionStatements, as they're AssignmentExpressions. Here is a quick overview table of the important properties and representations:

expression type prop value
module.exports.renamed = specifier AssignmentExpression
operator "="
Identifier right "specifier"
MemberExpression left πŸ”½ module.exports.renamed πŸ”½
module.exports.renamed MemberExpression
Identifier property "renamed"
MemberExpression object πŸ”½ module.exports πŸ”½
module.exports MemberExpression
Identifier property "exports"
Identifier object "module"

There is also the variant of using the object['accessor'], where accessor is not an Identifier but a Literal, but otherwise that is the same.

πŸ”€ Testing variations

As mentioned before, there are many ways to write functions in JavaScript and TypeScript. In the source code for the analyzer there is a utility method πŸ’Ž extract_main_method. It can detect the following variations:

function name() {}
// FunctionDeclaration

const name = () => {};
// ArrowFunctionExpression

const name = function () {};
// FunctionExpression

export default {
  name: () => {},
};
// ExportDefaultDeclaration + ObjectExpression + (Arrow)FunctionExpression

And the TypeScript specific variants (but they work on both)

class Foo {
  name() {}
}
// MethodDefinition + FunctionExpression

class Foo {
  static name = () => {};
  static name = function name() {};
}
// ClassProperty + (Arrow)FunctionExpression

Walking TypeScript Trees

As you've noticed, all we did so far is checking the JavaScript code and shown how that is parsed and walked. In order to adapt the solution so that TypeScript code can be parsed with it, there is one thing to change in the walker and a few extra properties to test for.

πŸ”‘ Visitor Keys

When the traverser is walking the tree, it decides which nodes to "walk πŸšΆπŸ½β€" based on a set of keys called visitor keys. Since TypeScript is a superset of JavaScript it has the same keys and then some.

import { visitorKeys } from '@typescript-eslint/parser/dist/visitor-keys';

traverse(root, {
  enter(node: Node) {
    // ...
  },

  visitorKeys,
});

If you look at the source of that file, you'll see that it actually imports the eslint visitor keys (in order to visit all the JavaScript keys) and adds the specific TypeScript keys.

πŸ“– Type annotations

These are interesting.

class Gigasecond {
  constructor(private readonly date: Date) {}
}
{
  type: "ClassDeclaration",
  id: {
    type: "Identifier",
    name: "Gigasecond"
  },
  // everything inside the class { body }
  body: {
    type: "ClassBody",
    body: [
      {
        // a constructor is a regular method definition...
        type: "MethodDefinition",
        key: {
          type: "Identifier",
          name: "constructor"
        },
        value: {
          type: "FunctionExpression",
          params: [{ /*...*/ }],
          generator: false,
          expression: false,
          async: false,
          body: { /*...*/ }
        }
        computed: false,
        static: false,  // (typescript static keyword)
        // ... but with a special kind
        kind: 'constructor'
      }
    ]
  }
}

The above has no special properties over the JavaScript equivalent, but that's because there are no type annotations in the source except for in the params of the constructor:

{
  type: "Identifier",
  name: "date",
  // : ...
  typeAnnotation: {
    type: "TSTypeAnnotation",
    typeAnnotation: {
      // Readonly
      type: "TSTypeReference",
      typeName: {
        type: "Identifier"
        name: "Readonly"
      },
      // <...>
      typeParameters: {
        type: "TSTypeParameterInstantiation",
        // Each type between the < brackets >
        params: [
          {
            type: "TSTypeReference",
            typeName: {
              type: "Identifier",
              name: "Date"
            }
          }
        ]
      }
    }
  }
}

A few key observations:

  • The type annotation has its own visitor key typeAnnotation,
  • All the TypeScript nodes start with TS,
  • A generic type is just a TSTypeReference with both a typeName and one or multiple typeParameters.
  • Stripping types is almost as easy as removing the typeAnnotation keys, which is almost exactly what babel's preset-typescript does.

Class properties

In TypeScript you can annotate class properties with keywords such as private and readonly. Additionally, they can have a type (which is Date in this example).

class Gigasecond {
  private readonly futureDate: Date;
}
{
  type: "ClassProperty",
  key: {
    type: "Identifier",
    name: "futureDate"
  },
  computed: false,
  static: false,            // static keyword
  readonly: true,           // readonly keyword
  accessibility: "private", // private, protected, public keywords
  typeAnnotation: {
    type: "TSTypeAnnotation",
    typeAnnotation: {
      type: "TSTypeReference",
      typeName: {
        type: "Identifier",
        name: "Date"
      }
    }
  }
}

The TypeScript keywords private and readonly modify the ClassProperty directly, but the type is again on typeAnnotation. If the type annotation is left out the source code (read: implicit any), the typeAnnotation key is not present on the AST.

↩ Return types

The final type we'll look at for now are function return types. Most other type annotations are just variations on this and the ones mentioned before.

class Gigasecond {
  date(): Date {
    // ...
  }
}
{
  type: "MethodDefinition",
  key: {
    type: "Identifier",
    name: "date"
  },
  value: {
    type: "FunctionExpression",
    generator: false,
    expression: false,
    async: false,
    body: { /*...*/ },
    params: [],
    returnType: {
      type: "TSTypeAnnotation",
      typeAnnotation: {
        type: "TSTypeReference",
        typeName: {
          type: "Identifier",
          name: "Date"
        }
      }
    }
  },
  computed: false,
  static: false, // static keyword
  kind: "method"
}

As you might have noticed, the typeAnnotation is not on the MethodDefinition. That is because a method definition on a class is actually binding a function expression (): Date { ... } to the identifier date.

On the FunctionExpression you can find the previously not encountered type annotation returnType. Its structure is the same as typeAnnotations for ClassProperty.

Conclusion

Interpreting code as an Abstract Syntax Tree and seeking out certain properties is a lot of tree walking; because there is a specification for the format of the output of certain AST Parsers, you can write tooling yourself.

The contents of this article, in a different format, is being used to automatically approve the gigasecond exercise, given that the student has provided an exact variation of the optimal solution. There is enough surface to hook into the findings of the analyzer to provide meaningful commentary, should the student not have provided an optimal solution.

Photo of Erasmusbrug, Rotterdam, The Netherlands, displaying gray concrete bridge near buildings.
Photo by Miles Vanderlooven on Unsplash

Reference

Analyzer reference

Exercism repositories

Packages