Bachelor projects

Extensible LaTeX parser


The LaTeX typesetting system lets you write documents with structural mark-up and renders them to viewable or printable formats that include styled text. LaTeX has many extensions that provide mark-up for the different kinds of elements you may want to present in a document. For example, there are extensions that lay out source code examples with syntax highlighting, and extensions that let you define diagrams and graphs textually, which are then rendered as images in the document. Each extension may add its own syntax to LaTeX and should be visualized differently in the editor, but current editing environments for LaTeX documents offer only rudimentary support for writing documents that use such extensions.


The core syntax of LaTeX is relatively simple but can be extended in two ways.

  1. All mark-up commands start with a “\” character followed by the name of the command and some arguments. The format of the arguments can depend on the concrete command used.
  2. The special commands “\begin{environment-name}” and “\end{environment-name}” enclose text passages whose format depends on the specified environment.
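
The two syntactic forms above can be illustrated with a minimal scanner sketch in Java. It is an illustration only, not the parser required by the assignment: the names TokenKind, Token and LatexScanner are hypothetical, command names are assumed to consist of letters only, and arguments are deliberately left as plain text because their format depends on the concrete command.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token kinds for a minimal LaTeX scanner. */
enum TokenKind { COMMAND, BEGIN_ENV, END_ENV, TEXT }

/** A token: its kind plus a command name, environment name, or raw text. */
final class Token {
    final TokenKind kind;
    final String value;

    Token(TokenKind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    @Override
    public String toString() {
        return kind + "(" + value + ")";
    }
}

final class LatexScanner {
    /** Splits the input into commands, environment delimiters and text. */
    static List<Token> scan(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c != '\\') {
                text.append(c);
                i++;
                continue;
            }
            // Read the command name (assumption: letters only).
            int j = i + 1;
            while (j < input.length() && Character.isLetter(input.charAt(j))) {
                j++;
            }
            if (j == i + 1) {
                // Control symbol such as \% or \\: keep it as plain text.
                text.append(c);
                if (j < input.length()) {
                    text.append(input.charAt(j));
                }
                i = Math.min(j + 1, input.length());
                continue;
            }
            String name = input.substring(i + 1, j);
            flush(text, tokens);
            boolean delimiter = name.equals("begin") || name.equals("end");
            if (delimiter && j < input.length() && input.charAt(j) == '{') {
                int close = input.indexOf('}', j);
                if (close >= 0) {
                    TokenKind kind = name.equals("begin")
                            ? TokenKind.BEGIN_ENV : TokenKind.END_ENV;
                    tokens.add(new Token(kind, input.substring(j + 1, close)));
                    i = close + 1;
                    continue;
                }
            }
            tokens.add(new Token(TokenKind.COMMAND, name));
            i = j;
        }
        flush(text, tokens);
        return tokens;
    }

    /** Emits the pending text, if any, as a TEXT token. */
    private static void flush(StringBuilder text, List<Token> tokens) {
        if (text.length() > 0) {
            tokens.add(new Token(TokenKind.TEXT, text.toString()));
            text.setLength(0);
        }
    }
}
```

Scanning “\section{Intro} text \begin{lstlisting}x\end{lstlisting}” with this sketch yields a COMMAND token for “section”, a TEXT token holding the unparsed argument and the following text, and BEGIN_ENV and END_ENV tokens around the TEXT token “x”. A real parser would hand the argument text and the environment body to the code responsible for that command or environment.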

In this assignment you will develop an Eclipse plug-in containing an extensible parser for LaTeX documents. The parser should be extensible with new commands and new environments, and it should be possible to add extensions dynamically. The Eclipse plug-in mechanism should be used to make the parser extensible. The output of the parser should be an abstract syntax tree of the parsed document, to which each parser extension can contribute its own abstract syntax tree nodes.
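
One possible shape for such an extension mechanism is sketched below in plain Java. All names (AstNode, EnvironmentExtension, ExtensibleParser, ListingExtension and the node classes) are hypothetical and chosen for illustration; the sketch uses an in-memory map so that it runs on its own. In an actual Eclipse plug-in the extensions would presumably be discovered through an extension point rather than registered by hand.

```java
import java.util.HashMap;
import java.util.Map;

/** Base type for all abstract syntax tree nodes (hypothetical). */
interface AstNode {
    String describe();
}

/** Fallback node for environments that no extension claims. */
final class UnknownEnvironmentNode implements AstNode {
    final String name;
    final String body;

    UnknownEnvironmentNode(String name, String body) {
        this.name = name;
        this.body = body;
    }

    @Override
    public String describe() {
        return "unknown(" + name + ")";
    }
}

/** Contract a parser extension would implement (hypothetical). */
interface EnvironmentExtension {
    String environmentName();

    AstNode parseBody(String body);
}

/** Core parser part that delegates environment bodies to extensions. */
final class ExtensibleParser {
    private final Map<String, EnvironmentExtension> extensions = new HashMap<>();

    /** Extensions can be added at any time, also after parsing has started. */
    void register(EnvironmentExtension extension) {
        extensions.put(extension.environmentName(), extension);
    }

    AstNode parseEnvironment(String name, String body) {
        EnvironmentExtension extension = extensions.get(name);
        if (extension == null) {
            // Unrecognized environment: keep the body verbatim, do not fail.
            return new UnknownEnvironmentNode(name, body);
        }
        return extension.parseBody(body);
    }
}

/** Node type contributed by the example extension below. */
final class ListingNode implements AstNode {
    final String code;

    ListingNode(String code) {
        this.code = code;
    }

    @Override
    public String describe() {
        return "listing(" + code.length() + " chars)";
    }
}

/** Example extension for source code listings. */
final class ListingExtension implements EnvironmentExtension {
    @Override
    public String environmentName() {
        return "lstlisting";
    }

    @Override
    public AstNode parseBody(String body) {
        return new ListingNode(body);
    }
}
```

Before ListingExtension is registered, parseEnvironment("lstlisting", ...) returns an UnknownEnvironmentNode; afterwards it returns a ListingNode. This keeps the core parser independent of any particular package while still producing a complete tree for documents that use unknown environments.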

Furthermore, editing tools should be developed that act on the provided abstract syntax tree, e.g., to apply syntax highlighting or to show the outline of the document.
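
As a rough illustration of a tool that works on the tree, the following Java sketch collects an indented outline from section nodes. The classes DocNode, SectionNode and OutlineBuilder are hypothetical, and the tree is assumed to store sectioning commands as nodes with a numeric level (1 for a section, 2 for a subsection); an actual outline view would additionally need source positions for navigation.

```java
import java.util.ArrayList;
import java.util.List;

/** Generic tree node (hypothetical). */
class DocNode {
    final List<DocNode> children = new ArrayList<>();

    /** Appends a child and returns this node to allow chaining. */
    DocNode add(DocNode child) {
        children.add(child);
        return this;
    }
}

/** Node for a sectioning command; level 1 = section, 2 = subsection. */
final class SectionNode extends DocNode {
    final int level;
    final String title;

    SectionNode(int level, String title) {
        this.level = level;
        this.title = title;
    }
}

final class OutlineBuilder {
    /** Returns one line per section, indented by two spaces per level. */
    static List<String> outline(DocNode root) {
        List<String> lines = new ArrayList<>();
        collect(root, lines);
        return lines;
    }

    /** Depth-first, pre-order traversal so sections keep document order. */
    private static void collect(DocNode node, List<String> lines) {
        if (node instanceof SectionNode) {
            SectionNode section = (SectionNode) node;
            lines.add("  ".repeat(section.level - 1) + section.title);
        }
        for (DocNode child : node.children) {
            collect(child, lines);
        }
    }
}
```

Nodes that are not sections are simply skipped, so the same traversal keeps working when parser extensions contribute node types the outline tool does not know about.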

List of Tasks

  • Make a survey of popular extensions to the LaTeX language (e.g., the packages listings, pgf, prettyref) and determine how the LaTeX parser needs to be extended to support them
  • Develop a parser for the LaTeX core language that is, for example, able to ignore unrecognized commands or environments
  • Insert extension points into this parser
  • Develop extensions for a few selected LaTeX packages
  • Develop tools like syntax highlighting or an outline view based on the abstract syntax tree provided by the parser
  • If possible, make the parser work incrementally


  • Sebastian Erdweg, Tillmann Rendel, Christian Kästner and Klaus Ostermann, SugarJ: Library-based Syntactic Language Extensibility. In Proceedings of OOPSLA, ACM, 2011
  • Eclipse Extension Points and Extensions:
  • Developing LaTeX packages:


Christoph Bockisch