Editing

In applications like text editors, you often need to re-parse a file after its source code has changed. Tree-sitter is designed to support this use case efficiently through incremental parsing.

The Two-Step Process

1. Edit the syntax tree

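First, edit the syntax tree so that the ranges of its nodes stay in sync with the code. Describe the change with a TSInputEdit, which gives the start, old end, and new end of the edited region as both byte offsets and row/column points, and pass it to ts_tree_edit: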
typedef struct {
  uint32_t start_byte;
  uint32_t old_end_byte;
  uint32_t new_end_byte;
  TSPoint start_point;
  TSPoint old_end_point;
  TSPoint new_end_point;
} TSInputEdit;

void ts_tree_edit(TSTree *, const TSInputEdit *);
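
Filling in the six fields by hand is error-prone, so here is a minimal sketch of how they could be derived for a pure insertion. The `Point` and `InputEdit` structs are local stand-ins that mirror the fields of TSPoint and TSInputEdit so the block compiles without the Tree-sitter headers, and `point_at` and `insertion_edit` are hypothetical helpers, not part of the library. Columns are counted in bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: local stand-ins mirroring TSPoint and TSInputEdit. In real code,
   include <tree_sitter/api.h> and use the library types instead. */
typedef struct { uint32_t row; uint32_t column; } Point;

typedef struct {
  uint32_t start_byte, old_end_byte, new_end_byte;
  Point start_point, old_end_point, new_end_point;
} InputEdit;

/* Hypothetical helper: row and byte column of `byte_offset` within `text`. */
static Point point_at(const char *text, uint32_t byte_offset) {
  assert(text != NULL && byte_offset <= strlen(text));
  Point p = {0, 0};
  for (uint32_t i = 0; i < byte_offset; i++) {
    if (text[i] == '\n') {
      p.row++;        /* a newline starts the next row */
      p.column = 0;
    } else {
      p.column++;
    }
  }
  return p;
}

/* Hypothetical helper: describe inserting `inserted_len` bytes at `offset`.
   `old_text` is the document before the insertion, `new_text` after it. */
static InputEdit insertion_edit(const char *old_text, const char *new_text,
                                uint32_t offset, uint32_t inserted_len) {
  InputEdit e;
  e.start_byte = offset;
  e.old_end_byte = offset;                 /* nothing was removed */
  e.new_end_byte = offset + inserted_len;
  e.start_point = point_at(old_text, e.start_byte);
  e.old_end_point = e.start_point;
  e.new_end_point = point_at(new_text, e.new_end_byte);
  return e;
}
```

With the library types, the resulting values would be copied into a TSInputEdit and passed to ts_tree_edit.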

2. Re-parse with the old tree

Then, call ts_parser_parse (or a convenience wrapper such as ts_parser_parse_string) again, passing in the edited old tree. This creates a new tree that internally shares structure with the old tree.
TSTree *new_tree = ts_parser_parse_string(
  parser,
  old_tree,  // Pass the edited tree here
  new_source_code,
  strlen(new_source_code)
);

Complete Example

#include <string.h>
#include <tree_sitter/api.h>

const TSLanguage *tree_sitter_javascript(void);

int main() {
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_javascript());
  
  // Parse original code
  const char *original = "console.log(42);";
  TSTree *tree = ts_parser_parse_string(
    parser, NULL, original, strlen(original)
  );
  
  // User types "hello" before the number 42
  // Position: console.log(|42);
  //           byte offset: 12
  TSInputEdit edit = {
    .start_byte = 12,
    .old_end_byte = 12,
    .new_end_byte = 17,  // Added 5 characters
    .start_point = {0, 12},
    .old_end_point = {0, 12},
    .new_end_point = {0, 17}
  };
  
  // Edit the tree
  ts_tree_edit(tree, &edit);
  
  // Re-parse with the edited tree
  const char *new_source = "console.log(hello42);";
  TSTree *new_tree = ts_parser_parse_string(
    parser, tree, new_source, strlen(new_source)
  );
  
  // Clean up
  ts_tree_delete(tree);
  ts_tree_delete(new_tree);
  ts_parser_delete(parser);
  return 0;
}
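
The example above hard-codes the edit. When an application only has the old and new text, one common approach (an assumption here, not something Tree-sitter provides) is to derive the byte fields from the longest common prefix and suffix. The sketch below uses a hypothetical `ByteEdit` struct and `diff_edit` helper; the row/column points are left out for brevity and would still need to be computed before calling ts_tree_edit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: holds only the byte fields of a TSInputEdit. */
typedef struct {
  uint32_t start_byte, old_end_byte, new_end_byte;
} ByteEdit;

/* Hypothetical helper: describe the difference between two NUL-terminated
   texts as a single replaced region. */
static ByteEdit diff_edit(const char *old_text, const char *new_text) {
  assert(old_text != NULL && new_text != NULL);
  size_t old_len = strlen(old_text);
  size_t new_len = strlen(new_text);

  /* Length of the shared prefix. */
  size_t prefix = 0;
  while (prefix < old_len && prefix < new_len &&
         old_text[prefix] == new_text[prefix]) {
    prefix++;
  }

  /* Length of the shared suffix, never overlapping the prefix. */
  size_t suffix = 0;
  while (suffix < old_len - prefix && suffix < new_len - prefix &&
         old_text[old_len - 1 - suffix] == new_text[new_len - 1 - suffix]) {
    suffix++;
  }

  ByteEdit e;
  e.start_byte = (uint32_t)prefix;
  e.old_end_byte = (uint32_t)(old_len - suffix);
  e.new_end_byte = (uint32_t)(new_len - suffix);
  return e;
}
```

For the two strings in the example, this yields the same byte values as the hand-written edit (12, 12, 17). Editors that already know the exact change from a keystroke or a buffer event can skip the diff entirely.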

Editing Stored Nodes

When you edit a syntax tree, the positions of its nodes change. If you have stored any TSNode instances outside of the TSTree, you must update their positions separately, using the same TSInputEdit value:
void ts_node_edit(TSNode *, const TSInputEdit *);
The ts_node_edit function is only needed when you have retrieved TSNode instances before editing the tree and want to continue using those specific instances afterward. Often, you’ll just re-fetch nodes from the edited tree, in which case ts_node_edit is not needed.

Multi-language Documents

Sometimes, different parts of a file may be written in different languages. For example, templating languages like EJS and ERB allow you to generate HTML by writing a mixture of HTML and another language like JavaScript or Ruby. Tree-sitter handles these types of documents by allowing you to create a syntax tree based on the text in certain ranges of a file.

Setting Included Ranges

typedef struct {
  TSPoint start_point;
  TSPoint end_point;
  uint32_t start_byte;
  uint32_t end_byte;
} TSRange;

bool ts_parser_set_included_ranges(
  TSParser *self,
  const TSRange *ranges,
  uint32_t range_count
);

Example: Parsing ERB

Consider this ERB document:
<ul>
  <% people.each do |person| %>
    <li><%= person.name %></li>
  <% end %>
</ul>
Conceptually, it can be represented by three syntax trees with overlapping ranges:
  • An ERB syntax tree
  • A Ruby syntax tree (for the <% %> blocks)
  • An HTML syntax tree (for the content outside the blocks)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <tree_sitter/api.h>

const TSLanguage *tree_sitter_embedded_template(void);
const TSLanguage *tree_sitter_html(void);
const TSLanguage *tree_sitter_ruby(void);

// Capacity of the fixed-size range buffers used in this example
#define MAX_RANGES 10

int main(int argc, const char **argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: pass the ERB source text as the first argument\n");
    return 1;
  }
  const char *text = argv[1];
  uint32_t len = (uint32_t)strlen(text);

  // Parse the entire text as ERB
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_embedded_template());
  TSTree *erb_tree = ts_parser_parse_string(parser, NULL, text, len);
  TSNode erb_root_node = ts_tree_root_node(erb_tree);

  // Find the ranges of HTML content and Ruby code
  TSRange html_ranges[MAX_RANGES];
  TSRange ruby_ranges[MAX_RANGES];
  unsigned html_range_count = 0;
  unsigned ruby_range_count = 0;
  unsigned child_count = ts_node_child_count(erb_root_node);

  for (unsigned i = 0; i < child_count; i++) {
    TSNode node = ts_node_child(erb_root_node, i);
    if (strcmp(ts_node_type(node), "content") == 0) {
      // Skip extra ranges rather than overflow the buffer
      if (html_range_count == MAX_RANGES) continue;
      html_ranges[html_range_count++] = (TSRange) {
        ts_node_start_point(node),
        ts_node_end_point(node),
        ts_node_start_byte(node),
        ts_node_end_byte(node),
      };
    } else {
      // Skip extra ranges rather than overflow the buffer
      if (ruby_range_count == MAX_RANGES) continue;
      TSNode code_node = ts_node_named_child(node, 0);
      ruby_ranges[ruby_range_count++] = (TSRange) {
        ts_node_start_point(code_node),
        ts_node_end_point(code_node),
        ts_node_start_byte(code_node),
        ts_node_end_byte(code_node),
      };
    }
  }

  // Parse the HTML
  ts_parser_set_language(parser, tree_sitter_html());
  ts_parser_set_included_ranges(parser, html_ranges, html_range_count);
  TSTree *html_tree = ts_parser_parse_string(parser, NULL, text, len);

  // Parse the Ruby
  ts_parser_set_language(parser, tree_sitter_ruby());
  ts_parser_set_included_ranges(parser, ruby_ranges, ruby_range_count);
  TSTree *ruby_tree = ts_parser_parse_string(parser, NULL, text, len);

  // Print all three trees
  char *erb_sexp = ts_node_string(erb_root_node);
  char *html_sexp = ts_node_string(ts_tree_root_node(html_tree));
  char *ruby_sexp = ts_node_string(ts_tree_root_node(ruby_tree));
  printf("ERB: %s\n", erb_sexp);
  printf("HTML: %s\n", html_sexp);
  printf("Ruby: %s\n", ruby_sexp);

  // Clean up: the strings from ts_node_string are heap-allocated
  free(erb_sexp);
  free(html_sexp);
  free(ruby_sexp);
  ts_tree_delete(erb_tree);
  ts_tree_delete(html_tree);
  ts_tree_delete(ruby_tree);
  ts_parser_delete(parser);
  return 0;
}
This API allows for great flexibility in how languages can be composed. Tree-sitter is not responsible for mediating the interactions between languages — you’re free to do that using arbitrary application-specific logic.

Concurrency

Tree-sitter supports multi-threaded use cases by making syntax trees very cheap to copy.
TSTree *ts_tree_copy(const TSTree *);
Internally, copying a syntax tree just entails incrementing an atomic reference count. Conceptually, it provides you a new tree which you can freely query, edit, reparse, or delete on a new thread while continuing to use the original tree on a different thread.
#include <pthread.h>
#include <string.h>
#include <tree_sitter/api.h>

const TSLanguage *tree_sitter_javascript(void);

void *analyze_tree(void *tree_ptr) {
  TSTree *tree = (TSTree *)tree_ptr;
  TSNode root = ts_tree_root_node(tree);
  (void)root;  // Perform analysis...
  ts_tree_delete(tree);  // This thread owns the copy, so it deletes it
  return NULL;
}

int main(void) {
  // Setup and parse (mirrors the JavaScript example earlier on this page)
  TSParser *parser = ts_parser_new();
  ts_parser_set_language(parser, tree_sitter_javascript());
  const char *source = "console.log(42);";
  uint32_t len = (uint32_t)strlen(source);
  TSTree *tree = ts_parser_parse_string(parser, NULL, source, len);

  // Create a copy for use in another thread
  TSTree *tree_copy = ts_tree_copy(tree);

  pthread_t thread;
  pthread_create(&thread, NULL, analyze_tree, tree_copy);

  // Continue using the original tree
  TSNode root = ts_tree_root_node(tree);
  (void)root;
  // ...

  pthread_join(thread, NULL);
  ts_tree_delete(tree);
  ts_parser_delete(parser);
  return 0;
}
Individual TSTree instances are not thread safe. You must copy a tree if you want to use it on multiple threads simultaneously.
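
The statement that copying only increments an atomic reference count can be illustrated with a generic C11 sketch. This is not Tree-sitter's implementation: `SharedData`, `Handle`, and the three `handle_*` functions are hypothetical, and the `payload` field merely stands in for the shared tree data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumption: heap data shared by all copies of a handle. */
typedef struct {
  atomic_uint ref_count;
  int payload;  /* stands in for the shared tree structure */
} SharedData;

/* Assumption: each thread holds its own Handle value. */
typedef struct {
  SharedData *data;
} Handle;

static Handle handle_new(int payload) {
  SharedData *d = malloc(sizeof *d);
  assert(d != NULL);
  atomic_init(&d->ref_count, 1);
  d->payload = payload;
  return (Handle){d};
}

/* Copying is cheap: bump the counter and share the same data. */
static Handle handle_copy(Handle h) {
  atomic_fetch_add(&h.data->ref_count, 1);
  return (Handle){h.data};
}

/* The last handle to be deleted frees the shared data. */
static void handle_delete(Handle h) {
  if (atomic_fetch_sub(&h.data->ref_count, 1) == 1) {
    free(h.data);
  }
}
```

Because the counter is atomic, handles can be copied and deleted from different threads without a lock, while the data they share stays alive until the last handle is gone.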

Getting Changed Ranges

When re-parsing after an edit, you can determine which parts of the tree have changed:
TSRange *ts_tree_get_changed_ranges(
  const TSTree *old_tree,
  const TSTree *new_tree,
  uint32_t *length
);
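This function returns an array of ranges whose syntactic structure has changed between the old and new trees. For the comparison to work correctly, the old tree must already have been edited with ts_tree_edit so that its ranges line up with the new tree. The returned array is allocated using malloc, and the caller must release it with free.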
TSTree *old_tree = /* ... */;
TSTree *new_tree = /* ... */;

uint32_t range_count;
TSRange *ranges = ts_tree_get_changed_ranges(
  old_tree,
  new_tree,
  &range_count
);

for (uint32_t i = 0; i < range_count; i++) {
  printf("Changed: %u-%u\n",
         ranges[i].start_byte,
         ranges[i].end_byte);
}

free(ranges);
The returned ranges indicate areas where the hierarchical structure of syntax nodes (from root to leaf) has changed. Characters outside these ranges have identical ancestor nodes in both trees. The ranges may be slightly larger than the exact changed areas, but Tree-sitter attempts to make them as small as possible.
