FEAT: Add TokenBijectionConverter

Follow-up to #1942, which added LetterBijectionConverter and DigitBijectionConverter.

The bijection attack paper (arXiv:2410.01294) describes a third bijection type in which each English letter maps to a distinct, randomly sampled token from the target model's tokenizer vocabulary. This mode came up during the #1942 review, where it was agreed to track it as a separate follow-up.

The abstract base class introduced in #1942 makes this straightforward to add as a new subclass:

class TokenBijectionConverter(BijectionConverter):
    def __init__(self, *, tokenizer, mapping=None, seed=None):
        ...

    def _generate_mapping(self, rng):
        # sample 26 distinct tokens from tokenizer vocabulary
        # map each letter to a token string
        ...

The main consideration is the tokenizer dependency, most likely Hugging Face tokenizers. Happy to take this on once #1942 lands.
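To make the sampling step concrete, here is a minimal standalone sketch of what `_generate_mapping` could do. It assumes nothing about the `BijectionConverter` base class from #1942: `sample_token_mapping` is a hypothetical helper, the vocabulary is passed in as a plain iterable of token strings, and the filter that drops empty, non-printable, and whitespace-containing tokens is my own assumption rather than something taken from the paper. With a Hugging Face tokenizer, the keys of `tokenizer.get_vocab()` would be one possible source for `vocab`.

```python
import random
import string


def sample_token_mapping(vocab, rng):
    """Map each lowercase ASCII letter to a distinct token drawn from vocab.

    vocab: iterable of token strings (hypothetical input, e.g. vocabulary keys).
    rng:   a random.Random instance, so a seed makes the mapping reproducible.
    """
    # Assumed filter: keep tokens that are non-empty, printable, and contain
    # no whitespace, so every mapped token is visible in a prompt.
    candidates = sorted(
        {
            tok
            for tok in vocab
            if tok and tok.isprintable() and not any(ch.isspace() for ch in tok)
        }
    )  # sorted so the result depends only on the seed, not on set ordering

    letters = string.ascii_lowercase
    if len(candidates) < len(letters):
        raise ValueError(
            f"need at least {len(letters)} usable tokens, got {len(candidates)}"
        )

    # Sampling without replacement guarantees the 26 tokens are distinct,
    # which is what makes the letter-to-token map a bijection onto its image.
    tokens = rng.sample(candidates, len(letters))
    return dict(zip(letters, tokens))


# Toy vocabulary standing in for a real tokenizer's vocabulary.
toy_vocab = [f"tok{i}" for i in range(100)] + ["", " ", "a b"]
mapping = sample_token_mapping(toy_vocab, random.Random(0))
inverse = {tok: letter for letter, tok in mapping.items()}
```

Taking the `rng` as an argument mirrors the `_generate_mapping(self, rng)` signature above and keeps the tokenizer dependency out of the sampling logic, so this part can be unit-tested with a toy vocabulary. Real vocabularies often contain byte-level or subword marker characters, so the filter would likely need tuning per tokenizer.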

FEAT: Add TokenBijectionConverter #2023
