Providing the Code
Tree-sitter provides two main functions for parsing source code: one for simple strings and one for custom data structures.
Parsing Strings
The simplest way to parse source code is using ts_parser_parse_string, which takes the following arguments:
- The parser instance
- A previous syntax tree for incremental parsing (pass NULL for the initial parse)
- The UTF-8 encoded source code to parse
- The length of the string in bytes
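Because the length argument is measured in bytes, it can differ from the number of characters a user sees whenever the source contains non-ASCII text. The following is a minimal sketch of that difference; the helper names utf8_code_point_count and source_length_in_bytes are hypothetical and are not part of Tree-sitter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
 * Assumes `text` is valid, NUL-terminated UTF-8. */
size_t utf8_code_point_count(const char *text) {
  assert(text != NULL);
  size_t count = 0;
  for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++) {
    if ((*p & 0xC0) != 0x80) count++;
  }
  return count;
}

/* The value to pass as the length argument: bytes, not characters.
 * Assumes the text is shorter than 4 GiB. */
uint32_t source_length_in_bytes(const char *text) {
  assert(text != NULL);
  return (uint32_t)strlen(text);
}
```

For a string such as "café" encoded as UTF-8, the byte length is 5 while the code point count is 4, and the byte length is the one the parser needs.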
Parsing Custom Data Structures
You may want to parse source code that’s stored in a custom data structure, like a piece table or a rope. In this case, use the more general ts_parser_parse function.
The TSInput structure lets you provide your own function for reading text. To use it, define a read callback: your callback receives a byte offset and a position, and should return a pointer to the text starting at that position.
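As an illustration of the read callback, here is a minimal sketch for text stored as a list of chunks. It is written to compile on its own, so Point is a stand-in for TSPoint, and ChunkedText and chunked_read are hypothetical names. The callback has the same shape as the read member of TSInput: it takes a payload pointer, a byte offset, a position, and an output parameter for the number of bytes available at the returned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for tree-sitter's TSPoint so the sketch compiles on its own. */
typedef struct {
  uint32_t row;
  uint32_t column;
} Point;

/* Hypothetical document storage: the text is split into chunks. */
typedef struct {
  const char *const *chunks;   /* array of chunk pointers */
  const uint32_t *chunk_sizes; /* size of each chunk in bytes */
  uint32_t chunk_count;
} ChunkedText;

/* Return a pointer to the text that starts at byte_index and report how
 * many contiguous bytes are available there. Reporting zero bytes signals
 * the end of the document. */
const char *chunked_read(void *payload, uint32_t byte_index, Point position,
                         uint32_t *bytes_read) {
  (void)position; /* this storage is indexed by byte offset only */
  assert(payload != NULL && bytes_read != NULL);
  const ChunkedText *text = payload;
  uint32_t chunk_start = 0;
  for (uint32_t i = 0; i < text->chunk_count; i++) {
    uint32_t size = text->chunk_sizes[i];
    if (byte_index < chunk_start + size) {
      uint32_t offset = byte_index - chunk_start;
      *bytes_read = size - offset;
      return text->chunks[i] + offset;
    }
    chunk_start += size;
  }
  *bytes_read = 0;
  return "";
}
```

With the real library, the same function would take a TSPoint instead of Point and be stored in a TSInput together with the payload pointer and the encoding before calling ts_parser_parse. Returning only the rest of the current chunk keeps the callback simple, since the parser calls it again for the next offset.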
Custom Text Encoding
If your text uses an encoding other than UTF-8 or UTF-16, you can provide a custom decode function. The function should:
- Read bytes from string (up to length bytes)
- Write the decoded code point to code_point
- Return the number of bytes consumed
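The three steps above can be sketched for ISO-8859-1 (Latin-1), where every byte maps directly to the Unicode code point with the same value. The function name decode_latin1 is hypothetical, and the parameter types are an assumption based on the parameter names described above, so check the decode function typedef in your copy of the Tree-sitter header before using it. How the function is registered with the parser is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one ISO-8859-1 (Latin-1) character.
 * Each byte is its own code point, so at most one byte is consumed. */
uint32_t decode_latin1(const uint8_t *string, uint32_t length,
                       int32_t *code_point) {
  assert(code_point != NULL);
  if (string == NULL || length == 0) {
    *code_point = -1; /* nothing to decode; -1 is this sketch's own marker */
    return 0;
  }
  *code_point = (int32_t)string[0]; /* write the decoded code point */
  return 1;                         /* number of bytes consumed */
}
```

A multi-byte encoding would follow the same pattern but inspect up to length bytes before deciding how many to consume.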
Syntax Nodes
Tree-sitter provides a DOM-style interface for inspecting syntax trees.
Node Type
A syntax node’s type is a string that indicates which grammar rule the node represents. You can read it with ts_node_type.
Node Position
Nodes store their position in the source code both in raw bytes and in row/column coordinates. The byte offsets are available through ts_node_start_byte and ts_node_end_byte, and the row/column positions through ts_node_start_point and ts_node_end_point, which return TSPoint values.
In a TSPoint, rows and columns are zero-based. The row field represents the number of newlines before a given position, while column represents the number of bytes between the position and the beginning of the line. A newline is considered to be a single line feed (\n) character.
Retrieving Nodes
Root Node
Every tree has a root node, which you can retrieve with ts_tree_root_node.
Child Nodes
Once you have a node, you can access its children with ts_node_child_count and ts_node_child.
Siblings and Parent
You can also navigate to sibling and parent nodes with ts_node_next_sibling, ts_node_prev_sibling, and ts_node_parent.
Named vs Anonymous Nodes
Tree-sitter produces concrete syntax trees: trees that contain nodes for every individual token in the source code, including things like commas and parentheses. This is important for use cases like syntax highlighting. However, some types of code analysis are easier with an abstract syntax tree, in which less important details have been removed. Tree-sitter supports both by making a distinction between named and anonymous nodes.
Example
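Consider a grammar rule for an if statement that is built from the literal strings "if", "(", and ")" together with a condition expression and a body statement. A syntax node representing an if_statement would have 5 children:
- The condition expression (named)
- The body statement (named)
- The if, (, and ) tokens (anonymous)
The if, (, and ) nodes are anonymous because they are represented in the grammar as simple strings.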
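Checking if a Node is Named
You can check whether a node is named with ts_node_is_named.
Named-Only Navigation
You can skip over anonymous nodes by using the _named_ variants of the navigation functions, such as ts_node_named_child, ts_node_named_child_count, ts_node_next_named_sibling, and ts_node_prev_named_sibling.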
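To make the effect of named-only navigation concrete, here is a toy model of the idea. It does not use Tree-sitter itself: ToyNode, toy_named_child_count, and toy_named_child are hypothetical stand-ins that only mimic how a named-only lookup skips anonymous children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a syntax node; not tree-sitter's TSNode. */
typedef struct ToyNode {
  const char *type;
  bool is_named;
  const struct ToyNode *children;
  uint32_t child_count;
} ToyNode;

/* Count only the named children of a node. */
uint32_t toy_named_child_count(const ToyNode *node) {
  assert(node != NULL);
  uint32_t count = 0;
  for (uint32_t i = 0; i < node->child_count; i++) {
    if (node->children[i].is_named) count++;
  }
  return count;
}

/* Return the index-th named child, skipping anonymous ones.
 * Returns NULL when there is no such child. */
const ToyNode *toy_named_child(const ToyNode *node, uint32_t index) {
  assert(node != NULL);
  for (uint32_t i = 0; i < node->child_count; i++) {
    if (!node->children[i].is_named) continue;
    if (index == 0) return &node->children[i];
    index--;
  }
  return NULL;
}
```

For a node modeling the if_statement described earlier, with five children of which only the expression and the statement are named, the named child count is 2 and named child 0 is the expression, even though it is not the first child overall.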
Node Field Names
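Many grammars assign unique field names to particular child nodes to make them easier to analyze.
Accessing Children by Field Name
You can look up a child by its field name with ts_node_child_by_field_name.
Using Field IDs
Fields also have numeric IDs that you can use to avoid repeated string comparisons. You can convert between field names and IDs with ts_language_field_id_for_name and ts_language_field_name_for_id, and look up a child by ID with ts_node_child_by_field_id.
Next Steps
- Advanced Parsing: Learn about incremental parsing and editing
- Walking Trees: Use tree cursors for efficient traversal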