forest

v1.5.44
Published: May 23, 2024 License: MIT Imports: 286 Imported by: 0

README

a Go 🌳 Sitter Forest

Where a Gopher wanders around and meets lots of 🌳 Sitters...

First of all, giving credits where they are due:

This repository started as a fork of @smacker's go-tree-sitter repo. I then realized I did not want to also maintain the bindings library itself in the same project (i.e. the code in the root of that repo, which exposes the sitter.Language type & co.); I just want a (big) collection of all the tree-sitter parsers I can add.

So here it is: it started with the parsers and the automation from the above-mentioned repo, then added a bunch more parsers on top and updated the automation (to support more parsers and to automatically update the PARSERS.md file, git tags, etc.).

See PARSERS.md for the list of supported parsers. The end goal is (at least) parity with nvim_treesitter.

For contributing (or just to see how the automation works) see CONTRIBUTING.md.

Naming Conventions

The language name used is the same as the TreeSitter language name (lower case, underscores instead of spaces) and the same as the query folder name in nvim_treesitter.

This keeps things simple and consistent.

In rare cases, the Go package name differs from the language name:

  • go actually has the package name Go because package go does not go well in Go (pun intended) but otherwise the language name remains "go";
  • func language, same problem as above, so package name is actually FunC (but everything else is func as normal: folder, language name, etc.).

Also, some language names are not self-explanatory (acronyms, for instance). In those cases, an altName field is populated, e.g. the requirements language has an altName of Pip requirements, query has Tree-Sitter Query Language, and so on. Search grammars.json for your grammar of interest.
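The naming rules above can be sketched in a few lines of Go. This is only an illustration: languageName and packageName are hypothetical helpers that are not part of this repo, and the normalization handles only letter case and spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// packageNameOverrides lists the two exceptions named above, where the Go
// package name differs from the language name.
var packageNameOverrides = map[string]string{
	"go":   "Go",
	"func": "FunC",
}

// languageName normalizes a display name to the convention described above:
// lower case, with underscores instead of spaces. Hypothetical helper.
func languageName(display string) string {
	return strings.ReplaceAll(strings.ToLower(strings.TrimSpace(display)), " ", "_")
}

// packageName returns the Go package name to expect for a language name.
// Hypothetical helper.
func packageName(lang string) string {
	if override, ok := packageNameOverrides[lang]; ok {
		return override
	}
	return lang
}

func main() {
	for _, display := range []string{"Go", "FunC", "Git Config", "risor"} {
		lang := languageName(display)
		fmt.Printf("%-10s -> language %q, package %s\n", display, lang, packageName(lang))
	}
}
```

Everything other than the package name (folder, language name, import path) keeps using the plain language name.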

Usage

See the README in go-tree-sitter-bare, as well as the example_*.go files in this repo.

This repo only gives you the GetLanguage() function; you still use the sibling repo for all your interactions with the tree.

You can use the parsers in this repo in 3 main ways:

1. Standalone

You can use the parsers one (or more) at a time, as you'd use any other Go package:

package main

import (
	"context"
	"fmt"

	"github.com/alexaandru/go-sitter-forest/risor"
	sitter "github.com/alexaandru/go-tree-sitter-bare"
)

func main() {
	content := []byte("print('It works!')\n")
	node, err := sitter.ParseCtx(context.TODO(), content, risor.GetLanguage())
	if err != nil {
		panic(err)
	}

	// Do something interesting with the parsed tree...
	fmt.Println(node)
}
2. In Bulk

If (and only IF) you want to use ALL (or most of) the parsers (beware, your binary size will be huge, as in 200MB+ huge) then you can use the root (forest) package:

package main

import (
	"context"
	"fmt"

	forest "github.com/alexaandru/go-sitter-forest"
	sitter "github.com/alexaandru/go-tree-sitter-bare"
)

func main() {
	content := []byte("print('It works!')\n")
	parser := sitter.NewParser()
	parser.SetLanguage(forest.GetLanguage("risor")())

	tree, err := parser.ParseCtx(context.TODO(), nil, content)
	if err != nil {
		panic(err)
	}

	// Do something interesting with the parsed tree...
	fmt.Println(tree.RootNode())
}

This way you can fetch and use any of the parsers dynamically, without having to import them manually. You should rarely need this, though, unless you're writing a text editor or something similar.
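To make the "constructor by name" shape of the bulk API more concrete, here is a minimal, stdlib-only sketch of a lookup table. It is not the forest package's actual implementation: Language is a stand-in for sitter.Language, the registry entries are made up, and returning nil for an unknown name is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Language is a stand-in for sitter.Language so the sketch compiles on its own.
type Language struct{ name string }

// registry maps a language name to its constructor (hypothetical entries).
var registry = map[string]func() *Language{
	"lua":   func() *Language { return &Language{name: "lua"} },
	"risor": func() *Language { return &Language{name: "risor"} },
}

// getLanguage mirrors the shape of forest.GetLanguage: it returns a
// constructor, not a language. Returning nil for an unknown name is an
// assumption of this sketch.
func getLanguage(lang string) func() *Language {
	return registry[lang]
}

// supportedLanguages returns the registered names in sorted order.
func supportedLanguages() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	for _, name := range supportedLanguages() {
		if newLang := getLanguage(name); newLang != nil {
			fmt.Println(name, "->", newLang().name)
		}
	}
	if getLanguage("klingon") == nil {
		fmt.Println("klingon is not registered")
	}
}
```

Because the lookup hands back a function, nothing is constructed until the caller invokes it, which is why the example above writes forest.GetLanguage("risor")() with the trailing call.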

3. As a Plugin

A third way is to use the included Plugins.make makefile, which allows easy creation of any and all plugins. Simply copy it to your repo, and then you can run make -f Plugins.make plugin-risor, etc., or use the plugin-all target, which creates all the plugins. It is convenient, but not the most space-efficient option: with all parsers built into the binary the size is ~200MB, whereas all parsers built as plugins took ~1650MB the last time I built them all (which, granted, was several versions ago, before upgrading to TreeSitter v0.22.1).

Then you can selectively use them in your app using the plugins mechanism.

IMPORTANT: You MUST use -trimpath when building your app, when using plugins (the Plugins.make file already includes it, but the app that uses them also needs it).
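On the consuming side, the standard library's plugin package is one way to load such a file. The sketch below is an assumption-heavy illustration, not this repo's documented workflow: the ./risor.so path and the GetLanguage symbol name are guesses, and lookupSymbol is a hypothetical helper.

```go
package main

import (
	"fmt"
	"plugin"
)

// lookupSymbol opens a Go plugin file and looks up one exported symbol.
// Hypothetical helper built on the standard library's plugin package.
func lookupSymbol(path, name string) (plugin.Symbol, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}

	sym, err := p.Lookup(name)
	if err != nil {
		return nil, fmt.Errorf("lookup %s in %s: %w", name, path, err)
	}

	return sym, nil
}

func main() {
	// Both the file name and the symbol name are assumptions; check what
	// Plugins.make actually produces before relying on them.
	sym, err := lookupSymbol("./risor.so", "GetLanguage")
	if err != nil {
		fmt.Println("plugin not loaded:", err)
		return
	}

	// A real app would type-assert sym to the exported function type
	// before calling it.
	fmt.Printf("loaded symbol of type %T\n", sym)
}
```

Go plugins must be built with the same toolchain and compatible flags as the host app, which is consistent with the -trimpath requirement above.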

Info

Each individual parser (as well as the bulk loader) offers an Info() function which can be used to retrieve information about a parser. It exposes its entry from grammars.json either raw (as a string holding the JSON encoded entry) or as an object (only available in bulk mode).

The returned Grammar type implements Stringer so it should give a nice summary when printed (to screen or logs, etc.).
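As a rough illustration of the raw-versus-object distinction, the sketch below decodes a JSON entry into a struct that implements Stringer. The Grammar type here is a trimmed-down stand-in, not the real grammar.Grammar: the "language" JSON key and the String() format are assumptions, and only altName is taken from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Grammar is a hypothetical stand-in for the real grammar.Grammar type.
// Only altName comes from the README; the "language" key is an assumption.
type Grammar struct {
	Language string `json:"language"`
	AltName  string `json:"altName"`
}

// String implements fmt.Stringer with a short, made-up summary format.
func (g *Grammar) String() string {
	if g.AltName != "" {
		return fmt.Sprintf("%s (%s)", g.Language, g.AltName)
	}
	return g.Language
}

// parseGrammar decodes a raw JSON entry, such as the string a standalone
// parser's Info() is described as returning, into a Grammar value.
func parseGrammar(raw string) (*Grammar, error) {
	var g Grammar
	if err := json.Unmarshal([]byte(raw), &g); err != nil {
		return nil, fmt.Errorf("decode grammar entry: %w", err)
	}
	return &g, nil
}

func main() {
	raw := `{"language":"requirements","altName":"Pip requirements"}`

	g, err := parseGrammar(raw)
	if err != nil {
		panic(err)
	}

	// fmt picks up the String method because *Grammar implements Stringer.
	fmt.Println(g)
}
```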

Parser Code Changes

For transparency, any and all changes made to the parsers' files (and, to be clear, that means ALL the files coming from the parsers, not just parser.c) are documented below.

ALL changes are fully automated (any exceptions are noted below) and no change is ever made manually, so inspecting the automation should give you a clear picture of everything that is done to the code. The changes are:

  • the include paths are rewritten to use a flat structure (i.e. "tree_sitter/parser.h" becomes "parser.h"); this is needed so that the files are part of the same package, and it also makes the automation simpler;
  • for unison, the scanner file includes maybe.c, which causes cgo to include the file twice and throw a duplicate symbols error. The solution chosen was to copy the content of the included file into the scanner file and set the included file to zero bytes; this way all the code is in one file and compilation is possible;
  • for parsers that include a tag.h file: the TAG_TYPES_BY_TAG_NAME variable clashes between them (when those parsers are all included into one app). The solution chosen was to rename the variable by adding the _<lang> suffix, i.e., we currently have:
    • TAG_TYPES_BY_TAG_NAME_astro;
    • TAG_TYPES_BY_TAG_NAME_html;
    • TAG_TYPES_BY_TAG_NAME_svelte;
    • TAG_TYPES_BY_TAG_NAME_vue;
  • for parsers that define serialize(), deserialize(), scan() (and a few others), e.g. org, beancount, html & a few others: the offending identifiers are renamed by appending the _<lang> suffix to them (e.g. serialize -> serialize_org, etc.); see the putFile() function in internal/automation/main.go for details;
  • some parsers' grammar.js files were not yet updated to work with the latest TreeSitter, in which case we hot patch them before regenerating the parser; see the replMap in the downloadGrammar() function;
  • EXCEPTION MANUAL CHANGE: poe_filter/parser.c is currently invalid upon generation (it is missing a right paren at line 6216, col 50; I added it manually).
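Two of the rewrites listed above (flattening include paths and suffixing clashing identifiers) can be sketched with the standard library alone. This is not the code from putFile(); the function names, the identifier list and the regular expression are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// clashing matches a few of the identifiers mentioned above as whole words,
// so that e.g. "scanner" or "foo_serialize" are left alone. The exact list
// used by the real automation is not reproduced here.
var clashing = regexp.MustCompile(`\b(serialize|deserialize|scan)\b`)

// flattenIncludes rewrites the nested include path to the flat one.
func flattenIncludes(src string) string {
	return strings.ReplaceAll(src, `"tree_sitter/parser.h"`, `"parser.h"`)
}

// suffixIdentifiers appends _<lang> to every clashing identifier.
func suffixIdentifiers(src, lang string) string {
	return clashing.ReplaceAllString(src, "${1}_"+lang)
}

func main() {
	src := strings.Join([]string{
		`#include "tree_sitter/parser.h"`,
		`static unsigned serialize(void *payload, char *buffer) { return 0; }`,
		`static void deserialize(void *payload, const char *buffer) {}`,
	}, "\n")

	fmt.Println(suffixIdentifiers(flattenIncludes(src), "org"))
}
```

The word-boundary anchors matter: without them, deserialize would be hit twice (once as itself and once through the serialize it contains), and longer identifiers ending in one of these names would be altered too.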

Versions & Status

We are currently aligned with TreeSitter v0.22.2: go-tree-sitter-bare v1.1.1 uses v0.22.2 (and this project uses that latest version of go-tree-sitter-bare) and the included package.json (which is used for regenerating grammars) is using the same version.

As for the parsers in this repo:

  • almost all of the parsers now regenerate the parser.{c,h} files using the latest tree-sitter (they are marked with a heavy checkmark in PARSERS.md and they do NOT have the skip flag set in grammars.json); this is the preferred way going forward;
  • a few parsers could not yet be regenerated locally. They can still be used just fine, but they will use whatever parser.{c,h} files they have in the upstream repo, which may or may not have been compiled with the latest tree-sitter version (most likely not, or we'd also be able to regenerate them). In general they should work too though, that's how this project started after all, with downloaded files only.

So there it is, we try to converge towards using the same tree-sitter version everywhere, and keeping up with it too.

TODO

  • need to update the parsers automation to create a Go module for a new parser automatically;
  • need to be able to auto-delete files deleted remotely (i.e. if a scanner.c or whatever is deleted from the source repo, we should also be deleting it locally).

Documentation

Functions

func GetLanguage added in v1.5.11

func GetLanguage(lang string) func() *sitter.Language

GetLanguage returns the corresponding TS language function for lang. The language name must follow the TS convention (lowercase, underscores instead of spaces).

Example
package main

import (
	"context"
	"fmt"

	"github.com/alexaandru/go-sitter-forest/lua"
	sitter "github.com/alexaandru/go-tree-sitter-bare"
)

func main() {
	content := []byte("print('It works!')\n")
	node, err := sitter.ParseCtx(context.TODO(), content, lua.GetLanguage())
	if err != nil {
		panic(err)
	}

	// Do something interesting with the parsed tree...
	fmt.Println(node)
}
Output:

(chunk (function_call name: (identifier) arguments: (arguments (string content: (string_content)))))

func Info

func Info(lang string) *grammar.Grammar
Example

This is still an example for GetLanguage, but I cannot have two ExampleGetLanguage in the same package.

package main

import (
	"context"
	"fmt"

	forest "github.com/alexaandru/go-sitter-forest"
	sitter "github.com/alexaandru/go-tree-sitter-bare"
)

func main() {
	content := []byte("print('It works!')")
	parser := sitter.NewParser()
	parser.SetLanguage(forest.GetLanguage("lua")())

	tree, err := parser.ParseCtx(context.TODO(), nil, content)
	if err != nil {
		panic(err)
	}

	// Do something interesting with the parsed tree...
	fmt.Println(tree.RootNode())
}
Output:

(chunk (function_call name: (identifier) arguments: (arguments (string content: (string_content)))))

func SupportedLanguages added in v1.5.11

func SupportedLanguages() []string

Directories

Path Synopsis
ada module
agda module
angular module
apex module
arduino module
asm module
astro module
authzed module
awk module
bash module
bass module
beancount module
bibtex module
bicep module
bitbake module
blueprint module
c module
c_sharp module
cairo module
calc module
capnp module
cel module
chatito module
clojure module
cmake module
comment module
commonlisp module
cooklang module
corn module
cpon module
cpp module
crystal module
css module
csv module
cuda module
cue module
d module
dart module
devicetree module
dhall module
diff module
disassembly module
djot module
dockerfile module
dot module
doxygen module
dtd module
earthfile module
ebnf module
eds module
eex module
elixir module
elm module
elsa module
elvish module
erlang module
facility module
faust module
fennel module
fidl module
firrtl module
fish module
foam module
forth module
fortran module
fsh module
func module
fusion module
gdscript module
gdshader module
git_config module
git_rebase module
gitcommit module
gitignore module
gleam module
glimmer module
glsl module
gn module
gnuplot module
go module
gomod module
gosum module
gotmpl module
gowork module
gpg module
graphql module
groovy module
gstlaunch module
hack module
hare module
haskell module
hcl module
heex module
helm module
hjson module
hlsl module
hlsplaylist module
hocon module
hoon module
html module
htmldjango module
http module
hurl module
hyprlang module
idl module
ini module
inko module
internal
ispc module
janet_simple module
java module
javascript module
jq module
jsdoc module
json module
json5 module
jsonc module
jsonnet module
julia module
just module
kconfig module
kdl module
kotlin module
koto module
kusto module
lalrpop module
latex module
ledger module
leo module
linkerscript module
liquid module
liquidsoap module
llvm module
lua module
luadoc module
luap module
luau module
m68k module
make module
markdown module
matlab module
menhir module
mermaid module
meson module
mlir module
muttrc module
nasm module
nickel module
nim module
ninja module
nix module
norg module
nqc module
objc module
objdump module
ocaml module
ocamllex module
odin module
org module
pascal module
passwd module
pem module
perl module
php module
php_only module
phpdoc module
pioasm module
po module
pod module
poe_filter module
pony module
printf module
prisma module
promql module
properties module
proto module
prql module
psv module
pug module
puppet module
purescript module
pymanifest module
python module
ql module
qmldir module
qmljs module
query module
r module
racket module
rasi module
rbs module
re2c module
readline module
regex module
rego module
requirements module
risor module
rnoweb module
robot module
roc module
ron module
ruby module
rust module
scala module
scfg module
scheme module
scss module
slang module
slint module
smali module
smithy module
snakemake module
solidity module
soql module
sosl module
sourcepawn module
sparql module
sql module
sqlite module
squirrel module
ssh_config module
starlark module
strace module
styled module
surface module
svelte module
swift module
sxhkdrc module
systemtap module
t32 module
tablegen module
tact module
tcl module
teal module
templ module
textproto module
thrift module
tiger module
tlaplus module
tmux module
todotxt module
toml module
tsv module
tsx module
turtle module
twig module
typescript module
typespec module
typoscript module
typst module
udev module
ungrammar module
unison module
usd module
uxntal module
v module
vala module
vento module
verilog module
vhs module
vim module
vimdoc module
vue module
wgsl module
wgsl_bevy module
wing module
wit module
xcompose module
xml module
yaml module
yang module
yuck module
zathurarc module
zig module
