Following up on #1942, which adds LetterBijectionConverter and DigitBijectionConverter.
The bijection attack paper (arXiv:2410.01294) describes a third bijection type, in which each English letter maps to a distinct token sampled at random from the target model's tokenizer vocabulary. This mode came up during the #1942 review, and we agreed to track it as a separate follow-up.
The abstract base class introduced in #1942 makes this straightforward to add as a new subclass:
class TokenBijectionConverter(BijectionConverter):
    def __init__(self, *, tokenizer, mapping=None, seed=None):
        ...

    def _generate_mapping(self, rng):
        # sample 26 distinct tokens from tokenizer vocabulary
        # map each letter to a token string
        ...
The main consideration is the tokenizer dependency, most likely HuggingFace tokenizers. Happy to take this on once #1942 lands.
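For illustration, here is a rough sketch of what the sampling step could look like, written as standalone functions so it does not depend on the base class from #1942. The names `sample_token_mapping` and `encode` and the toy vocabulary are made up for this example. With a HuggingFace tokenizer the vocabulary would presumably come from `tokenizer.get_vocab()`, and the real `_generate_mapping` may end up looking different.

```python
import random
import string


def sample_token_mapping(vocab, seed=None):
    """Map each lowercase English letter to a distinct token string.

    Hypothetical helper: `vocab` is any iterable of token strings,
    for example the keys of a tokenizer's vocabulary dict.
    """
    # De-duplicate, drop empty/whitespace-only tokens, and sort so that
    # the same seed always yields the same mapping.
    candidates = sorted({tok for tok in vocab if tok.strip()})
    if len(candidates) < len(string.ascii_lowercase):
        raise ValueError("vocabulary needs at least 26 usable tokens")
    rng = random.Random(seed)
    # Random.sample draws without replacement, so the 26 tokens are distinct.
    tokens = rng.sample(candidates, k=len(string.ascii_lowercase))
    return dict(zip(string.ascii_lowercase, tokens))


def encode(text, mapping, sep=" "):
    """Replace mapped letters with their tokens; pass other characters through."""
    return sep.join(mapping.get(ch, ch) for ch in text.lower())


# Toy vocabulary standing in for a real tokenizer's vocabulary (assumption).
toy_vocab = [f"tok{i}" for i in range(100)]
mapping = sample_token_mapping(toy_vocab, seed=0)
encoded = encode("hi", mapping)
```

The separator is one possible design choice: sampled tokens are not guaranteed to be prefix-free, so joining them without a delimiter could make the encoded text ambiguous to decode.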