bytepairencoding

Here are 21 public repositories matching this topic...

franciszekparma / GBPET

GPT-style language model with Byte Pair Encoding tokenizer, built from scratch in PyTorch.

python nlp machine-learning deep-learning pytorch transformer gpt language-model from-scratch bytepairencoding bpe-tokenizer

Updated May 5, 2026
Python

deepanprabhu / fastbpe

Java library implementing Byte-Pair Encoding Tokenization

bpe bytepairencoding llm

Updated May 17, 2023
Java

rraghavkaushik / smol-bpe-tokenizer

A lightweight, from-scratch implementation of Byte Pair Encoding (BPE) tokenization in Python.

natural-language-processing tokenizer bpe byte-pair-encoding bytepairencoding llms

Updated Jul 8, 2025
Python

10-OASIS-01 / BPEtokenizer

This project implements a tokenizer based on the Byte Pair Encoding (BPE) algorithm, with additional custom tokenizers, including one similar to the GPT-4 tokenizer.

natural-language-processing tokenizer bytepairencoding

Updated Jun 8, 2025
Python

willxxy / superbpe

Unofficial Rust implementation of "SuperBPE: Space Travel for Language Models"

rust rust-lang bpe bytepairencoding bpe-tokenizer

Updated Apr 14, 2025
Rust

vatsalsaglani / BytePairEncoding

A Python package that builds a corpus vocabulary using byte pair encoding, plus a tokenizer that tokenizes input text against the built vocabulary.

nlp natural-language-processing tokenizer vocabulary nlp-library vocabulary-builder natural-language-understanding subword-units bpe bytepairencoding subwordtokenization subwordtokens

Updated May 21, 2020
Python

swanshiv / varna_marathi_tokenizer

From-scratch Marathi BPE tokenizer with a Flask API and web interface for encoding and decoding Marathi text using the Byte Pair Encoding algorithm.

natural-language-processing tokenizer computational-linguistics bytepairencoding large-language-models marathi-tokenizor

Updated Dec 22, 2025
Jupyter Notebook

AlgoBrother / MayaTok-BPE

MayaTok is a tokenizer based on Byte Pair Encoding.

nlp rust-lang tokenization pyo3-rust-bindings bytepairencoding

Updated Apr 17, 2026
Rust

gxstxxv / BPE

Byte Pair Encoding (BPE)

python algorithm bpe bytepairencoding

Updated Apr 27, 2025
Python

art-test-stack / tokenizer

A web app to compare pre-built or self-built tokenizers

ai tokenizer webapp language-model bytepairencoding llms

Updated Sep 18, 2024
Python

LahiaOmar / tokens_viewer

String tokenization with Byte Pair Encoding (BPE).

tokenization bpe bytepairencoding llms

Updated May 29, 2024
TypeScript

madhu102938 / BPE-CBOW

Implementation of the BPE algorithm, with training on the generated tokens.

word2vec cbow bytepairencoding tokenizer-nlp

Updated Jul 16, 2024
Python

dbtreasure / zig-bpe

Byte Pair Encoding (BPE) in the Zig programming language (0.13.0)

zig bytepairencoding tiktoken

Updated Sep 12, 2024
Zig

mohsenfayyaz / nlp-course-ut

Natural Language Processing course assignments @ University of Tehran

natural-language-processing naive-bayes transformer lstm rnn tokenization bytepairencoding

Updated Jun 24, 2022
Jupyter Notebook

jamylak / bytepairencoding

Byte Pair Encoding Visualizer

visualization raylib bytepairencoding

Updated Apr 15, 2026
C

shivendrra / tokenizers

Self-made byte-pair-encoding tokenizer.

tokenizer tokenization bytepairencoding llm bpe-tokenizer

Updated Jan 20, 2025
Python

JunhoKim94 / Transformer

This repository is a reimplementation of the Transformer model introduced in the 2017 NeurIPS paper "Attention Is All You Need".

transformer wmt-15 self-attention bytepairencoding

Updated Jun 29, 2020
Python

ReshiAdavan / Thoth

Tokenizer for large-scale language models (GPT, Claude, Llama, etc.)

python rust natural-language-processing tokenizer gpt-2 sentencepiece bytepairencoding gpt-4 tiktoken llama2

Updated Jun 29, 2024
Python

Hords01 / Data_Mining

TF-IDF Calculation

python data text-mining news tokenizer turkish tf-idf bpe bytepairencoding bpe-tokenizer

Updated Apr 29, 2025
Python

sumony2j / Simple-BPE-Tokenizer

A pure Python implementation of a Byte Pair Encoding (BPE) tokenizer. Train on any text, encode and decode with saved models, and explore the fundamentals of BPE tokenization.

python nlp ai bytecode deep-learning tokenizer artificial-intelligence transformer openai scratch gpt tokenization andrej-karpathy bpe gpt-2 bytepairencoding gpt-4 tokenizer-nlp llm

Updated May 15, 2025
Python
