tokenscript --from english --to vectorized-tokens
Write English,
get vectorized-tokens.
the pre-tokeniser tokeniser you never knew you needed
Your LLM already has a tokeniser. Tokenscript is the pre-tokeniser tokeniser — it tokenises your English before your tokeniser tokenises it.
Is this necessary? No. Is it load-bearing in any pipeline? Also no. Will it look great in your next arXiv preprint? Absolutely.
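For the preprint appendix, a minimal sketch of what the pre-tokeniser tokeniser could look like. Everything here is hypothetical: tokenscript ships no code, and pre_tokenise and your_tokeniser are made-up stand-ins for whatever tokeniser your LLM already has.

    # hypothetical sketch only; tokenscript is not a real package.
    import re

    def pre_tokenise(text: str) -> list[str]:
        # tokenise the English before the tokeniser tokenises it:
        # split into word chunks and punctuation chunks
        return re.findall(r"\w+|[^\w\s]", text)

    def your_tokeniser(chunk: str) -> list[int]:
        # stand-in for the tokeniser you already have;
        # here it just maps each chunk to its utf-8 byte values
        return list(chunk.encode("utf-8"))

    if __name__ == "__main__":
        english = "Write English, get vectorized-tokens."
        chunks = pre_tokenise(english)
        tokens = [t for chunk in chunks for t in your_tokeniser(chunk)]
        print(chunks)   # the pre-tokenised tokens
        print(tokens)   # the tokenised pre-tokenised tokens

Note that the output is identical to skipping the pre-tokeniser pass entirely, which is the point.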
no spam. no product. possibly no tokens.