Basic Parsing

Providing the Code

Tree-sitter provides two main functions for parsing source code: one for simple strings and one for custom data structures.

Parsing Strings

The simplest way to parse source code is using ts_parser_parse_string:

TSTree *ts_parser_parse_string(
  TSParser *self,
  const TSTree *old_tree,
  const char *string,
  uint32_t length
);

self (TSParser*, required): The parser instance.

old_tree (const TSTree*): A previous syntax tree for incremental parsing. Pass NULL for the initial parse.

string (const char*, required): The UTF-8 encoded source code to parse.

length (uint32_t, required): The length of the string in bytes.

Parsing Custom Data Structures

You may want to parse source code that’s stored in a custom data structure, like a piece table or a rope. In this case, use the more general ts_parser_parse function:

TSTree *ts_parser_parse(
  TSParser *self,
  const TSTree *old_tree,
  TSInput input
);

The TSInput structure lets you provide your own function for reading text:

typedef struct {
  void *payload;
  const char *(*read)(
    void *payload,
    uint32_t byte_offset,
    TSPoint position,
    uint32_t *bytes_read
  );
  TSInputEncoding encoding;
  TSDecodeFunction decode;
} TSInput;

Define a read callback

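Your callback receives a byte offset and a position. It should return a pointer to the text starting at that position and write the number of bytes available at that pointer to bytes_read. Writing 0 to bytes_read signals the end of the document.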

const char *read_callback(
  void *payload,
  uint32_t byte_offset,
  TSPoint position,
  uint32_t *bytes_read
) {
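  MyDocument *doc = (MyDocument *)payload;
  // Return a chunk of text from your data structure. my_document_get_chunk
  // is a hypothetical helper that writes the chunk's length to *bytes_read.
  return my_document_get_chunk(doc, byte_offset, bytes_read);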
}

Create the TSInput structure

TSInput input = {
  .payload = my_document,
  .read = read_callback,
  .encoding = TSInputEncodingUTF8,
  .decode = NULL
};

Parse the document

TSTree *tree = ts_parser_parse(parser, NULL, input);

Custom Text Encoding

If your text uses an encoding other than UTF-8 or UTF-16, you can provide a custom decode function:

typedef uint32_t (*TSDecodeFunction)(
  const uint8_t *string,
  uint32_t length,
  int32_t *code_point
);

The encoding field must be set to TSInputEncodingCustom for the decode function to be called.

The function should:

Read bytes from string (up to length bytes)
Write the decoded code point to code_point
Return the number of bytes consumed

Syntax Nodes

Tree-sitter provides a DOM-style interface for inspecting syntax trees.

Node Type

A syntax node’s type is a string that indicates which grammar rule the node represents:

const char *ts_node_type(TSNode);

TSNode node = ts_tree_root_node(tree);
printf("Node type: %s\n", ts_node_type(node));

let node = tree.root_node();
println!("Node type: {}", node.kind());

const node = tree.rootNode;
console.log('Node type:', node.type);

Node Position

Nodes store their position in the source code both in raw bytes and row/column coordinates:

typedef struct {
  uint32_t row;
  uint32_t column;
} TSPoint;

uint32_t ts_node_start_byte(TSNode);
uint32_t ts_node_end_byte(TSNode);
TSPoint ts_node_start_point(TSNode);
TSPoint ts_node_end_point(TSNode);

In a TSPoint, rows and columns are zero-based. The row field represents the number of newlines before a given position, while column represents the number of bytes between the position and the beginning of the line. A newline is considered to be a single line feed (\n) character.

TSNode node = ts_tree_root_node(tree);
uint32_t start_byte = ts_node_start_byte(node);
TSPoint start_point = ts_node_start_point(node);
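// The values are uint32_t, so print them with PRIu32 (from <inttypes.h>), not %d
printf("Start: byte %" PRIu32 ", row %" PRIu32 ", column %" PRIu32 "\n",
       start_byte, start_point.row, start_point.column);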

let node = tree.root_node();
let start_byte = node.start_byte();
let start_point = node.start_position();
println!("Start: byte {}, row {}, column {}",
         start_byte, start_point.row, start_point.column);

const node = tree.rootNode;
console.log('Start:', {
  byte: node.startIndex,
  row: node.startPosition.row,
  column: node.startPosition.column
});

Retrieving Nodes

Root Node

Every tree has a root node:

TSNode ts_tree_root_node(const TSTree *);

Child Nodes

Once you have a node, you can access its children:

uint32_t ts_node_child_count(TSNode);
TSNode ts_node_child(TSNode, uint32_t);

Siblings and Parent

You can also navigate to siblings and parent nodes:

TSNode ts_node_next_sibling(TSNode);
TSNode ts_node_prev_sibling(TSNode);
TSNode ts_node_parent(TSNode);

These methods may return a null node to indicate that no such node exists (e.g., no next sibling). Always check if a node is null using ts_node_is_null(TSNode).

bool ts_node_is_null(TSNode);

Named vs Anonymous Nodes

Tree-sitter produces concrete syntax trees — trees that contain nodes for every individual token in the source code, including things like commas and parentheses. This is important for use-cases like syntax highlighting. However, some types of code analysis are easier with an abstract syntax tree — a tree in which less important details have been removed. Tree-sitter supports both by making a distinction between named and anonymous nodes.

Example

Consider this grammar rule:

if_statement: $ => seq("if", "(", $._expression, ")", $._statement);

A syntax node representing an if_statement would have 5 children:

The condition expression (named)
The body statement (named)
The if, (, and ) tokens (anonymous)

The expression and statement are marked as named nodes because they have explicit names in the grammar. The if, (, and ) nodes are anonymous because they are represented as simple strings.

Checking if a Node is Named

bool ts_node_is_named(TSNode);

Named-Only Navigation

You can skip over anonymous nodes by using the _named_ variants:

TSNode ts_node_named_child(TSNode, uint32_t);
uint32_t ts_node_named_child_count(TSNode);
TSNode ts_node_next_named_sibling(TSNode);
TSNode ts_node_prev_named_sibling(TSNode);

Using the _named_ methods makes the syntax tree function much like an abstract syntax tree.

Node Field Names

Many grammars assign unique field names to particular child nodes to make them easier to analyze.

Accessing Children by Field Name

TSNode ts_node_child_by_field_name(
  TSNode self,
  const char *field_name,
  uint32_t field_name_length
);

TSNode if_node = /* ... */;
TSNode condition = ts_node_child_by_field_name(
  if_node,
  "condition",
  strlen("condition")
);

let if_node = /* ... */;
let condition = if_node.child_by_field_name("condition");

const ifNode = /* ... */;
const condition = ifNode.childForFieldName('condition');

Using Field IDs

Fields also have numeric IDs that you can use to avoid repeated string comparisons:

uint32_t ts_language_field_count(const TSLanguage *);
const char *ts_language_field_name_for_id(const TSLanguage *, TSFieldId);
TSFieldId ts_language_field_id_for_name(const TSLanguage *, const char *, uint32_t);

TSNode ts_node_child_by_field_id(TSNode, TSFieldId);

// Get field ID once
TSFieldId condition_field = ts_language_field_id_for_name(
  language,
  "condition",
  strlen("condition")
);

// Reuse the field ID for multiple lookups
TSNode condition1 = ts_node_child_by_field_id(if_node1, condition_field);
TSNode condition2 = ts_node_child_by_field_id(if_node2, condition_field);

// Field IDs are handled automatically by the Rust API
let condition = if_node.child_by_field_name("condition");

// Field access is optimized automatically in JavaScript
const condition = ifNode.childForFieldName('condition');


Next Steps

Advanced Parsing

Walking Trees
