Advanced Parsing

Editing

In applications like text editors, you often need to re-parse a file after its source code has changed. Tree-sitter is designed to support this use case efficiently through incremental parsing.

The Two-Step Process

Edit the syntax tree

First, you must edit the syntax tree to adjust the ranges of its nodes so they stay in sync with the code.

typedef struct {
  uint32_t start_byte;
  uint32_t old_end_byte;
  uint32_t new_end_byte;
  TSPoint start_point;
  TSPoint old_end_point;
  TSPoint new_end_point;
} TSInputEdit;

void ts_tree_edit(TSTree *, const TSInputEdit *);
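Filling in a TSInputEdit by hand is error-prone, mostly because of the points. The sketch below computes every field from a text replacement. It assumes the document is an in-memory UTF-8 string and that columns are counted in bytes, as TSPoint columns are. Point and InputEdit are hypothetical stand-ins with the same field names as TSPoint and TSInputEdit, so the helpers compile without the library. point_at and make_edit are illustrative helpers, not Tree-sitter API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for TSPoint and TSInputEdit (same field names). */
typedef struct { uint32_t row; uint32_t column; } Point;
typedef struct {
  uint32_t start_byte, old_end_byte, new_end_byte;
  Point start_point, old_end_point, new_end_point;
} InputEdit;

/* Row and column of a byte offset; columns count bytes, not characters. */
static Point point_at(const char *text, uint32_t byte) {
  Point p = {0, 0};
  for (uint32_t i = 0; i < byte; i++) {
    if (text[i] == '\n') { p.row++; p.column = 0; }
    else p.column++;
  }
  return p;
}

/* Describe replacing old_len bytes at start_byte of old_text with `inserted`. */
static InputEdit make_edit(const char *old_text, uint32_t start_byte,
                           uint32_t old_len, const char *inserted) {
  assert(start_byte + old_len <= strlen(old_text));
  InputEdit e;
  e.start_byte = start_byte;
  e.old_end_byte = start_byte + old_len;
  e.new_end_byte = start_byte + (uint32_t)strlen(inserted);
  e.start_point = point_at(old_text, start_byte);
  e.old_end_point = point_at(old_text, e.old_end_byte);
  /* The new end is the start point advanced over the inserted text. */
  e.new_end_point = e.start_point;
  for (const char *c = inserted; *c; c++) {
    if (*c == '\n') { e.new_end_point.row++; e.new_end_point.column = 0; }
    else e.new_end_point.column++;
  }
  return e;
}
```

For instance, make_edit("console.log(42);", 12, 0, "hello") produces the values written out by hand in the complete example. Rescanning from the start of the text keeps the sketch short. An editor that already tracks line starts would compute the points directly.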

Re-parse with the old tree

Then call ts_parser_parse again, or a convenience wrapper such as ts_parser_parse_string, passing in the edited old tree. This creates a new tree that internally shares structure with the old tree.

TSTree *new_tree = ts_parser_parse_string(
  parser,
  old_tree,  // Pass the edited tree here
  new_source_code,
  strlen(new_source_code)
);
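The new source text should be the old text with exactly the change described by the TSInputEdit applied. Otherwise the reused parts of the tree will not line up with the text. The sketch below shows a minimal way to produce that text, assuming a NUL-terminated in-memory buffer. splice_text is a hypothetical helper, not part of Tree-sitter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Build the new document by replacing old_len bytes at start_byte with
   `inserted`. The caller frees the result; returns NULL if malloc fails. */
static char *splice_text(const char *old_text, uint32_t start_byte,
                         uint32_t old_len, const char *inserted) {
  size_t total = strlen(old_text);
  assert(start_byte + old_len <= total);
  size_t ins = strlen(inserted);
  char *out = malloc(total - old_len + ins + 1);
  if (out == NULL) return NULL;
  memcpy(out, old_text, start_byte);            /* text before the edit */
  memcpy(out + start_byte, inserted, ins);      /* replacement text */
  memcpy(out + start_byte + ins,                /* text after the edit, plus NUL */
         old_text + start_byte + old_len,
         total - start_byte - old_len + 1);
  return out;
}
```

To keep the text and the edit in agreement, call it with the same start byte, old length, and inserted text that describe the edit. For example, splice_text("console.log(42);", 12, 0, "hello") returns "console.log(hello42);".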

Complete Example

#include <string.h>
#include <tree_sitter/api.h>

const TSLanguage *tree_sitter_javascript(void);

int main() {
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_javascript());
  
  // Parse original code
  const char *original = "console.log(42);";
  TSTree *tree = ts_parser_parse_string(
    parser, NULL, original, strlen(original)
  );
  
  // User types "hello" before the number 42
  // Position: console.log(|42);
  //           byte offset: 12
  TSInputEdit edit = {
    .start_byte = 12,
    .old_end_byte = 12,
    .new_end_byte = 17,  // Added 5 characters
    .start_point = {0, 12},
    .old_end_point = {0, 12},
    .new_end_point = {0, 17}
  };
  
  // Edit the tree
  ts_tree_edit(tree, &edit);
  
  // Re-parse with the edited tree
  const char *new_source = "console.log(hello42);";
  TSTree *new_tree = ts_parser_parse_string(
    parser, tree, new_source, strlen(new_source)
  );
  
  // Clean up
  ts_tree_delete(tree);
  ts_tree_delete(new_tree);
  ts_parser_delete(parser);
  return 0;
}

The same example in Rust:

use tree_sitter::{Parser, InputEdit, Point};

let mut parser = Parser::new();
parser.set_language(&tree_sitter_javascript::language()).unwrap();

// Parse original code
let original = "console.log(42);";
let mut tree = parser.parse(original, None).unwrap();

// User types "hello" before the number 42
let edit = InputEdit {
    start_byte: 12,
    old_end_byte: 12,
    new_end_byte: 17,
    start_position: Point { row: 0, column: 12 },
    old_end_position: Point { row: 0, column: 12 },
    new_end_position: Point { row: 0, column: 17 },
};

// Edit the tree
tree.edit(&edit);

// Re-parse with the edited tree
let new_source = "console.log(hello42);";
let new_tree = parser.parse(new_source, Some(&tree)).unwrap();

The same example in JavaScript, using the Node.js bindings:

const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');

const parser = new Parser();
parser.setLanguage(JavaScript);

// Parse original code
const original = 'console.log(42);';
let tree = parser.parse(original);

// User types "hello" before the number 42
tree.edit({
  startIndex: 12,
  oldEndIndex: 12,
  newEndIndex: 17,
  startPosition: { row: 0, column: 12 },
  oldEndPosition: { row: 0, column: 12 },
  newEndPosition: { row: 0, column: 17 }
});

// Re-parse with the edited tree
const newSource = 'console.log(hello42);';
const newTree = parser.parse(newSource, tree);

Editing Stored Nodes

When you edit a syntax tree, the positions of its nodes will change. If you have stored any TSNode instances outside of the TSTree, you must update them separately:

void ts_node_edit(TSNode *, const TSInputEdit *);

The ts_node_edit function is only needed when you have retrieved TSNode instances before editing the tree and want to continue using those specific instances afterward. Often, you’ll just re-fetch nodes from the edited tree, in which case ts_node_edit is not needed.
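Updating a stored node amounts to shifting its start position so that it refers to the same text after the edit. The sketch below shows that arithmetic for one position. It is modeled on our understanding of how ts_node_edit treats a node's start, where positions inside the replaced text snap to the end of the new text. That behavior is an assumption, not a copy of the library. Point and InputEdit mirror TSPoint and TSInputEdit, and point_sub, point_add, and shift_position are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for TSPoint and TSInputEdit (same field names). */
typedef struct { uint32_t row; uint32_t column; } Point;
typedef struct {
  uint32_t start_byte, old_end_byte, new_end_byte;
  Point start_point, old_end_point, new_end_point;
} InputEdit;

/* a - b, where a is at or after b: a relative offset. */
static Point point_sub(Point a, Point b) {
  if (a.row > b.row) return (Point){a.row - b.row, a.column};
  return (Point){0, a.column - b.column};
}

/* Apply a relative offset `delta` to point a. */
static Point point_add(Point a, Point delta) {
  if (delta.row > 0) return (Point){a.row + delta.row, delta.column};
  return (Point){a.row, a.column + delta.column};
}

/* Shift one stored position so it refers to the same text after the edit. */
static void shift_position(uint32_t *byte, Point *point, const InputEdit *edit) {
  assert(edit->start_byte <= edit->old_end_byte);
  if (*byte >= edit->old_end_byte) {
    /* At or after the end of the edited span: move by the size change. */
    *point = point_add(edit->new_end_point, point_sub(*point, edit->old_end_point));
    *byte = edit->new_end_byte + (*byte - edit->old_end_byte);
  } else if (*byte > edit->start_byte) {
    /* Inside the replaced text: snap to the end of the new text. */
    *byte = edit->new_end_byte;
    *point = edit->new_end_point;
  }
  /* Before the edit: unchanged. */
}
```

With the "hello" edit from the complete example, a ")" stored at byte 14 moves to byte 19, which is its offset in "console.log(hello42);".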

Multi-language Documents

Sometimes, different parts of a file may be written in different languages. For example, templating languages like EJS and ERB allow you to generate HTML by writing a mixture of HTML and another language like JavaScript or Ruby. Tree-sitter handles these types of documents by allowing you to create a syntax tree based on the text in certain ranges of a file.

Setting Included Ranges

typedef struct {
  TSPoint start_point;
  TSPoint end_point;
  uint32_t start_byte;
  uint32_t end_byte;
} TSRange;

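bool ts_parser_set_included_ranges(
  TSParser *self,
  const TSRange *ranges,
  uint32_t range_count
);

The function returns false if the given ranges are not in document order or if they overlap, and in that case it leaves the parser's current ranges unchanged. Passing a range_count of zero resets the parser to include the entire document.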
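If you build ranges yourself, it can help to check them before calling ts_parser_set_included_ranges, which expects them in document order and non-overlapping. The sketch below shows a minimal version of such a check, comparing ranges by byte offset. Range mirrors TSRange and ranges_are_valid is a hypothetical helper. Rejecting a range whose end precedes its start is this sketch's own extra condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TSRange (same fields, same order). */
typedef struct { uint32_t row; uint32_t column; } Point;
typedef struct {
  Point start_point;
  Point end_point;
  uint32_t start_byte;
  uint32_t end_byte;
} Range;

/* True if every range is well-formed and ends at or before the next one starts. */
static bool ranges_are_valid(const Range *ranges, uint32_t count) {
  assert(ranges != NULL || count == 0);
  for (uint32_t i = 0; i < count; i++) {
    if (ranges[i].start_byte > ranges[i].end_byte) return false;
    if (i + 1 < count && ranges[i].end_byte > ranges[i + 1].start_byte) return false;
  }
  return true;
}
```

Adjacent ranges, where one ends exactly where the next begins, pass the check.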

Example: Parsing ERB

Consider this ERB document:

<ul>
  <% people.each do |person| %>
    <li><%= person.name %></li>
  <% end %>
</ul>

Conceptually, it can be represented by three syntax trees with overlapping ranges:

An ERB syntax tree
A Ruby syntax tree (for the <% %> blocks)
An HTML syntax tree (for the content outside the blocks)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <tree_sitter/api.h>

const TSLanguage *tree_sitter_embedded_template(void);
const TSLanguage *tree_sitter_html(void);
const TSLanguage *tree_sitter_ruby(void);

#define MAX_RANGES 10

int main(int argc, char **argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s '<erb source>'\n", argv[0]);
    return 1;
  }
  const char *text = argv[1];
  unsigned len = strlen(text);

  // Parse the entire text as ERB
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_embedded_template());
  TSTree *erb_tree = ts_parser_parse_string(parser, NULL, text, len);
  TSNode erb_root_node = ts_tree_root_node(erb_tree);

  // Find the ranges of HTML content and Ruby code
  // (fixed-size arrays keep the example short; extra ranges are skipped)
  TSRange html_ranges[MAX_RANGES];
  TSRange ruby_ranges[MAX_RANGES];
  unsigned html_range_count = 0;
  unsigned ruby_range_count = 0;
  unsigned child_count = ts_node_child_count(erb_root_node);

  for (unsigned i = 0; i < child_count; i++) {
    TSNode node = ts_node_child(erb_root_node, i);
    if (strcmp(ts_node_type(node), "content") == 0) {
      if (html_range_count == MAX_RANGES) continue;
      html_ranges[html_range_count++] = (TSRange) {
        ts_node_start_point(node),
        ts_node_end_point(node),
        ts_node_start_byte(node),
        ts_node_end_byte(node),
      };
    } else {
      // Other children are directives such as <% %> and <%= %>;
      // their first named child, if present, spans the embedded code
      TSNode code_node = ts_node_named_child(node, 0);
      if (ts_node_is_null(code_node) || ruby_range_count == MAX_RANGES) continue;
      ruby_ranges[ruby_range_count++] = (TSRange) {
        ts_node_start_point(code_node),
        ts_node_end_point(code_node),
        ts_node_start_byte(code_node),
        ts_node_end_byte(code_node),
      };
    }
  }

  // Parse the HTML
  ts_parser_set_language(parser, tree_sitter_html());
  ts_parser_set_included_ranges(parser, html_ranges, html_range_count);
  TSTree *html_tree = ts_parser_parse_string(parser, NULL, text, len);

  // Parse the Ruby
  ts_parser_set_language(parser, tree_sitter_ruby());
  ts_parser_set_included_ranges(parser, ruby_ranges, ruby_range_count);
  TSTree *ruby_tree = ts_parser_parse_string(parser, NULL, text, len);

  // Print all three trees (ts_node_string returns a malloc'd string)
  char *erb_sexp = ts_node_string(erb_root_node);
  char *html_sexp = ts_node_string(ts_tree_root_node(html_tree));
  char *ruby_sexp = ts_node_string(ts_tree_root_node(ruby_tree));
  printf("ERB: %s\n", erb_sexp);
  printf("HTML: %s\n", html_sexp);
  printf("Ruby: %s\n", ruby_sexp);

  // Clean up
  free(erb_sexp);
  free(html_sexp);
  free(ruby_sexp);
  ts_tree_delete(erb_tree);
  ts_tree_delete(html_tree);
  ts_tree_delete(ruby_tree);
  ts_parser_delete(parser);
  return 0;
}

The same approach in Rust, where text holds the ERB source as a &str:

use tree_sitter::{Parser, Range};

// Parse the entire text as ERB
let mut parser = Parser::new();
parser.set_language(&tree_sitter_embedded_template::language()).unwrap();
let erb_tree = parser.parse(text, None).unwrap();

// Find the ranges of HTML content and Ruby code
let mut html_ranges = Vec::new();
let mut ruby_ranges = Vec::new();

let mut cursor = erb_tree.walk();
for child in erb_tree.root_node().children(&mut cursor) {
    if child.kind() == "content" {
        html_ranges.push(Range {
            start_byte: child.start_byte(),
            end_byte: child.end_byte(),
            start_point: child.start_position(),
            end_point: child.end_position(),
        });
    } else if let Some(code_node) = child.named_child(0) {
        ruby_ranges.push(Range {
            start_byte: code_node.start_byte(),
            end_byte: code_node.end_byte(),
            start_point: code_node.start_position(),
            end_point: code_node.end_position(),
        });
    }
}

// Parse the HTML
parser.set_language(&tree_sitter_html::language()).unwrap();
parser.set_included_ranges(&html_ranges).unwrap();
let html_tree = parser.parse(text, None).unwrap();

// Parse the Ruby
parser.set_language(&tree_sitter_ruby::language()).unwrap();
parser.set_included_ranges(&ruby_ranges).unwrap();
let ruby_tree = parser.parse(text, None).unwrap();

This API gives you a lot of flexibility in how languages are composed. Tree-sitter does not mediate the interactions between languages; you’re free to do that using arbitrary application-specific logic.
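Because the application decides how languages are combined, the ranges do not have to come from a grammar at all. As a hedged illustration, the sketch below swaps the ERB parse for a naive string scan that splits a template at <% and %> delimiters. split_erb and ByteRange are hypothetical. The scan ignores edge cases that the embedded-template grammar handles, and real TSRange values would also need start and end points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-only range; a real TSRange also carries points. */
typedef struct { uint32_t start_byte; uint32_t end_byte; } ByteRange;

/* Naively split ERB-like text into HTML ranges (outside the tags) and code
   ranges (between "<%" or "<%=" and "%>"). Does not handle "%>" inside Ruby
   strings, "<%%" escapes, or "<%#" comments (treated as code).
   Returns false if an output array fills up. */
static bool split_erb(const char *text,
                      ByteRange *html, uint32_t *html_count,
                      ByteRange *code, uint32_t *code_count,
                      uint32_t capacity) {
  assert(text != NULL);
  uint32_t len = (uint32_t)strlen(text);
  uint32_t pos = 0;
  *html_count = 0;
  *code_count = 0;
  while (pos < len) {
    const char *open_tag = strstr(text + pos, "<%");
    uint32_t open_at = open_tag ? (uint32_t)(open_tag - text) : len;
    if (open_at > pos) {  /* HTML before the next tag */
      if (*html_count == capacity) return false;
      html[(*html_count)++] = (ByteRange){pos, open_at};
    }
    if (open_tag == NULL) break;
    uint32_t code_start = open_at + 2;
    if (text[code_start] == '=') code_start++;  /* output tag "<%=" */
    const char *close_tag = strstr(text + code_start, "%>");
    uint32_t code_end = close_tag ? (uint32_t)(close_tag - text) : len;
    if (*code_count == capacity) return false;
    code[(*code_count)++] = (ByteRange){code_start, code_end};
    pos = close_tag ? code_end + 2 : len;
  }
  return true;
}
```

Each ByteRange would still need its start_point and end_point filled in, for example by counting newlines up to each offset, before it could become a TSRange.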

Concurrency

Tree-sitter supports multi-threaded use cases by making syntax trees very cheap to copy.

TSTree *ts_tree_copy(const TSTree *);

Internally, copying a syntax tree just entails incrementing an atomic reference count. Conceptually, it provides you a new tree which you can freely query, edit, reparse, or delete on a new thread while continuing to use the original tree on a different thread.
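The reference-counting idea behind cheap copies can be sketched with C11 atomics. This sketch illustrates only shared ownership, not Tree-sitter's internal data structures. Shared and its functions are hypothetical, and a "copy" here is just another handle to the same object, which is freed when the last handle is deleted. As with the atomic count described above, the atomic operations let different threads drop their handles independently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object standing in for a tree's immutable data. */
typedef struct {
  atomic_uint ref_count;
  int payload;
} Shared;

static Shared *shared_new(int payload) {
  Shared *s = malloc(sizeof *s);
  if (s == NULL) return NULL;
  atomic_init(&s->ref_count, 1);
  s->payload = payload;
  return s;
}

/* "Copying" only bumps the count; both handles see the same data. */
static Shared *shared_copy(Shared *s) {
  atomic_fetch_add_explicit(&s->ref_count, 1, memory_order_relaxed);
  return s;
}

/* Drop one handle; returns 1 if this call freed the object. */
static int shared_delete(Shared *s) {
  unsigned prev = atomic_fetch_sub_explicit(&s->ref_count, 1, memory_order_acq_rel);
  assert(prev > 0);  /* catches deleting more handles than were created */
  if (prev == 1) { free(s); return 1; }
  return 0;
}
```

The relaxed increment and acquire-release decrement follow the usual pattern for atomic reference counts. The ordering on the decrement ensures that whichever thread frees the object observes the writes other threads made before dropping their handles.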

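#include <pthread.h>
#include <string.h>
#include <tree_sitter/api.h>

const TSLanguage *tree_sitter_javascript(void);

// Runs on the worker thread, which owns (and must delete) its copy
void *analyze_tree(void *tree_ptr) {
  TSTree *tree = (TSTree *)tree_ptr;
  TSNode root = ts_tree_root_node(tree);
  (void)root;  // Perform analysis...
  ts_tree_delete(tree);
  return NULL;
}

int main(void) {
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_javascript());
  const char *source = "console.log(42);";
  TSTree *tree = ts_parser_parse_string(parser, NULL, source, strlen(source));

  // Create a copy for use in another thread
  TSTree *tree_copy = ts_tree_copy(tree);

  pthread_t thread;
  if (pthread_create(&thread, NULL, analyze_tree, tree_copy) != 0) {
    ts_tree_delete(tree_copy);  // the thread never started, so free its copy here
  } else {
    // Continue using the original tree while the thread runs
    TSNode root = ts_tree_root_node(tree);
    (void)root;  // ...
    pthread_join(thread, NULL);
  }

  ts_tree_delete(tree);
  ts_parser_delete(parser);
  return 0;
}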

In Rust, cloning a Tree calls ts_tree_copy:

use std::thread;
use tree_sitter::Parser;

let mut parser = Parser::new();
// ... setup and parse ...
let tree = parser.parse(source, None).unwrap();

// Clone the tree for use in another thread
let tree_copy = tree.clone();

let handle = thread::spawn(move || {
    let root = tree_copy.root_node();
    // Perform analysis...
});

// Continue using the original tree
let root = tree.root_node();
// ...

handle.join().unwrap();

JavaScript runs on a single thread, but you can use worker threads. Tree objects cannot be transferred to a worker, so send the source code instead and re-parse it there:

const { Worker } = require('worker_threads');

const tree = parser.parse(sourceCode);

// Send the source code; the worker re-parses it with its own parser
const worker = new Worker('./analyze-worker.js', {
  workerData: { sourceCode }
});

// Continue using the tree in the main thread
const root = tree.rootNode;

Individual TSTree instances are not thread safe. You must copy a tree if you want to use it on multiple threads simultaneously.

Getting Changed Ranges

When re-parsing after an edit, you can determine which parts of the tree have changed:

TSRange *ts_tree_get_changed_ranges(
  const TSTree *old_tree,
  const TSTree *new_tree,
  uint32_t *length
);

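This function returns an array of ranges whose syntactic structure has changed between the old and new trees. For the comparison to be meaningful, old_tree must be the edited tree that you passed to ts_parser_parse, and new_tree must be the tree that call returned. The returned array is allocated using malloc and must be freed by the caller, and its length is written to *length.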

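#include <stdio.h>
#include <stdlib.h>
#include <tree_sitter/api.h>

// old_tree: the edited tree that was passed to ts_parser_parse
// new_tree: the tree that ts_parser_parse returned
void print_changed_ranges(const TSTree *old_tree, const TSTree *new_tree) {
  uint32_t range_count;
  TSRange *ranges = ts_tree_get_changed_ranges(
    old_tree,
    new_tree,
    &range_count
  );

  for (uint32_t i = 0; i < range_count; i++) {
    printf("Changed: %u-%u\n",
           ranges[i].start_byte,
           ranges[i].end_byte);
  }

  free(ranges);  // allocated with malloc by Tree-sitter
}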

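In Rust:

use tree_sitter::Tree;

// old_tree: the edited tree that was passed to Parser::parse
// new_tree: the tree that Parser::parse returned
fn print_changed_ranges(old_tree: &Tree, new_tree: &Tree) {
    for range in old_tree.changed_ranges(new_tree) {
        println!("Changed: {}-{}",
                 range.start_byte,
                 range.end_byte);
    }
}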

The returned ranges indicate areas where the hierarchical structure of syntax nodes (from root to leaf) has changed. Characters outside these ranges have identical ancestor nodes in both trees. The ranges may be slightly larger than the exact changed areas, but Tree-sitter attempts to make them as small as possible.
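Characters outside the returned ranges keep identical ancestors, so an application such as a syntax highlighter can limit rework to bytes inside them. The sketch below shows that test. It assumes the caller holds the array and count returned by ts_tree_get_changed_ranges, and it treats each range as half-open, [start_byte, end_byte). Range mirrors TSRange and byte_changed is a hypothetical helper. A linear scan avoids relying on any particular ordering of the ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TSRange (same fields, same order). */
typedef struct { uint32_t row; uint32_t column; } Point;
typedef struct {
  Point start_point;
  Point end_point;
  uint32_t start_byte;
  uint32_t end_byte;
} Range;

/* True if `byte` lies inside any changed range [start_byte, end_byte). */
static bool byte_changed(const Range *ranges, uint32_t count, uint32_t byte) {
  assert(ranges != NULL || count == 0);
  for (uint32_t i = 0; i < count; i++) {
    if (byte >= ranges[i].start_byte && byte < ranges[i].end_byte) return true;
  }
  return false;
}
```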

Next Steps

Walking Trees

Pattern Matching