Integrating ANTLR4 into C++ Projects to Parse JavaScript with CMake on Linux

Here are the steps required to make an existing CMake-based C++ project compile with ANTLR4's JavaScript grammar on Linux.

1. Generate JavaScriptLexer.{h,cpp} and JavaScriptParser.{h,cpp}

These files will be added to your project to parse JavaScript.

  1. Download the ANTLR4 Java binary: Visit ANTLR’s download page and download the latest .jar file. Example: antlr-4.13.2-complete.jar.

  2. Get the grammar files: Download JavaScriptLexer.g4 and JavaScriptParser.g4 from the official ANTLR grammars repo.

  3. Modify the grammar files following the instructions at https://github.com/antlr/grammars-v4/tree/master/javascript/javascript/Cpp#c-port. This is needed because the grammar files are written generically for all target languages and need a few tweaks for C++.

  4. Run the ANTLR tool: Use the following command to generate C++ source files from the grammar files. Ensure java is installed on your machine.

    java -jar antlr-4.13.2-complete.jar -Dlanguage=Cpp JavaScriptLexer.g4 JavaScriptParser.g4
    
  5. Add generated files to your project: Move the generated files into your project’s source directory. Your project structure should look like this:

     <root directory for the project>
         |-CMakeLists.txt
         |-src
            |-main.cpp
            |-antlr4
               |-JavaScriptLexer.h
               |-JavaScriptLexer.cpp
               |-JavaScriptParser.h
               |-JavaScriptParser.cpp
               |-...
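
Before moving on, it can help to confirm that the four generated files listed above really are in place. The following is only a minimal sketch, not part of ANTLR or of the original setup: check_generated is a hypothetical helper name, and the default directory src/antlr4 is assumed from the layout above. (ANTLR's -o option can also write the generated files into a chosen directory directly, instead of moving them by hand.)

```shell
#!/bin/sh
# Hypothetical helper (not part of ANTLR): verify that the generated
# lexer/parser sources exist in the given directory.
check_generated() {
  dir="${1:-src/antlr4}"   # assumed location, taken from the layout above
  missing=0
  for f in JavaScriptLexer.h JavaScriptLexer.cpp \
           JavaScriptParser.h JavaScriptParser.cpp; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when all four files are present, 1 otherwise.
  return "$missing"
}
```

After sourcing the snippet, run check_generated from the project root (or pass another directory as the first argument); it lists any missing file on stderr and returns a non-zero status.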
    

2. Clone https://github.com/antlr/grammars-v4.git into your project

This is needed because grammars-v4 contains the base classes (JavaScriptLexerBase and JavaScriptParserBase) that the generated lexer and parser depend on.

git submodule add https://github.com/antlr/grammars-v4.git third_party/grammars-v4

3. Clone https://github.com/antlr/antlr4.git

This is needed because the generated lexer and parser depend on the ANTLR runtime, which can be compiled as a static or dynamic library.

git submodule add https://github.com/antlr/antlr4.git third_party/antlr4
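
Note that a plain git clone of your project does not fetch submodules, so a fresh checkout needs git submodule update --init --recursive before CMake can find these directories. As a rough sketch, the check below looks for one file from each submodule that this guide relies on. check_submodules is a hypothetical helper name, and the paths assume the third_party/ locations used in the two commands above.

```shell
#!/bin/sh
# Hypothetical helper: confirm both submodules are populated under the
# given project root (defaults to the current directory).
check_submodules() {
  root="${1:-.}"
  rc=0
  for p in third_party/antlr4/runtime/Cpp/CMakeLists.txt \
           third_party/grammars-v4/javascript/javascript/Cpp/JavaScriptLexerBase.cpp; do
    if [ ! -e "$root/$p" ]; then
      echo "not found: $root/$p (try: git submodule update --init --recursive)" >&2
      rc=1
    fi
  done
  # Returns 0 when both paths exist, 1 otherwise.
  return "$rc"
}
```

Running check_submodules from the project root returns a non-zero status and prints a hint if either submodule directory is still empty.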

4. Modify the CMakeLists.txt

Now that the setup is done, we just need to link everything together.

Assume the project structure now looks like this:

<root directory for the project>
    |-CMakeLists.txt
    |-third_party
       |-antlr4
       |-grammars-v4
    |-src
       |-main.cpp
       |-antlr4
          |-JavaScriptLexer.h
          |-JavaScriptLexer.cpp
          |-JavaScriptParser.h
          |-JavaScriptParser.cpp
          |-...

Let’s modify the CMakeLists file.

  1. Add add_subdirectory(third_party/antlr4/runtime/Cpp). This line tells CMake to also process the CMakeLists.txt inside that subdirectory, which in effect compiles the runtime as part of your build.

  2. Make sure these files are added to the source list in CMakeLists.txt:

     third_party/grammars-v4/javascript/javascript/Cpp/JavaScriptLexerBase.cpp
     third_party/grammars-v4/javascript/javascript/Cpp/JavaScriptParserBase.cpp
    
     src/antlr4/JavaScriptLexer.cpp
     src/antlr4/JavaScriptParser.cpp
    
  3. Make sure the include directories are set up correctly:

    target_include_directories(
      MyProject
      #
      # Your other includes
      #
      PRIVATE src/antlr4
      PRIVATE third_party/antlr4/runtime/Cpp/runtime/src
      PRIVATE third_party/grammars-v4/javascript/javascript/Cpp
    )
    
  4. Make sure the static runtime library is linked by adding this line:

    target_link_libraries(MyProject PRIVATE antlr4_static)

After all those steps, our CMakeLists.txt should look like this:

cmake_minimum_required(VERSION 3.10)

project(MyProject VERSION 0.1 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

add_subdirectory(third_party/antlr4/runtime/Cpp)

add_executable(
    MyProject
    src/main.cpp

    third_party/grammars-v4/javascript/javascript/Cpp/JavaScriptLexerBase.cpp
    third_party/grammars-v4/javascript/javascript/Cpp/JavaScriptParserBase.cpp

    src/antlr4/JavaScriptLexer.cpp
    src/antlr4/JavaScriptParser.cpp
)

target_include_directories(
    MyProject
    PRIVATE src/antlr4
    PRIVATE third_party/antlr4/runtime/Cpp/runtime/src
    PRIVATE third_party/grammars-v4/javascript/javascript/Cpp
)
target_link_libraries(MyProject PRIVATE antlr4_static)

and we are good to go :)

Assume src/main.cpp looks like this:


/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=8 sts=2 et sw=2 tw=80: */
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

#include "antlr4-runtime.h"
#include "JavaScriptLexer.h"
#include "JavaScriptParser.h"
#include <iostream>
#include <string>

using namespace antlr4;
int main(int argc, char *argv[]) {
  std::string code = "let x = 5;";

  // Set up ANTLR input stream and lexer
  ANTLRInputStream input(code);
  JavaScriptLexer lexer(&input);

  // Tokenize the input
  CommonTokenStream tokens(&lexer);

  // Set up the parser and parse the token stream
  JavaScriptParser parser(&tokens);
  tree::ParseTree *tree = parser.program(); // Start rule

  std::cout << "Parsed: " << tree->toStringTree(&parser) << std::endl;
}

The usual CMake build process should then work, for example:

mkdir build && cd build && cmake .. && make
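
Once the build finishes, a quick smoke test is to run the executable and check that its output begins with the "Parsed: " prefix printed by main.cpp. The sketch below is one way to do that. smoke_test is a hypothetical helper name, and the default path ./MyProject assumes you are still inside the build directory and that the executable keeps the name given in add_executable.

```shell
#!/bin/sh
# Hypothetical smoke test: run the built executable and check that its
# output starts with the "Parsed: " prefix printed by main.cpp.
smoke_test() {
  bin="${1:-./MyProject}"   # assumed path when run from the build directory
  out="$("$bin")" || return 1
  case "$out" in
    "Parsed: "*) echo "ok" ;;
    *) echo "unexpected output: $out" >&2; return 1 ;;
  esac
}
```

Calling smoke_test with no arguments from the build directory prints ok when the program ran and produced a parse tree line, and returns a non-zero status otherwise.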