Wrong and unsuppressable print when instantiating BPE #1913

I am running Python code of the following form:

```python
from transformers import PreTrainedTokenizerFast
from tokenizers import Tokenizer
from tokenizers.models import BPE

vocab = {"a": 5, "b": 6, "ab": 7}
merges = [("a","b")]

backend_of_backend_of_backend = BPE(vocab=vocab, merges=merges, dropout=None)
backend_of_backend            = Tokenizer(model=backend_of_backend_of_backend)
backend                       = PreTrainedTokenizerFast(tokenizer_object=backend_of_backend)
```

The line `BPE(vocab=vocab, merges=merges, dropout=None)` has nothing to do with serialisation. Yet when I run it, the following unwanted message is printed to my console:
```
The OrderedVocab you are attempting to save contains holes for indices [0, 1, 2, 3, 4], your vocabulary could be corrupted!
```
It seems to come from

https://github.com/huggingface/tokenizers/blob/f7db48f532b3d4e3c65732cf745fe62863cbe5fa/tokenizers/src/models/mod.rs#L53-L56
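
For reference, the indices listed in the message can be reproduced with a small sketch. It assumes that a "hole" is any ID between 0 and the largest ID in the vocabulary that no token maps to, which is what the reported list suggests. `find_holes` is a hypothetical helper written for illustration, not part of `tokenizers`.

```python
def find_holes(vocab):
    """Return every ID in 0..max(vocab.values()) that no token uses.

    Assumption: this mirrors what the message calls "holes"; it is an
    illustration, not the library's actual implementation.
    """
    if not vocab:
        return []
    used = set(vocab.values())
    return [i for i in range(max(used) + 1) if i not in used]


vocab = {"a": 5, "b": 6, "ab": 7}
print(find_holes(vocab))  # [0, 1, 2, 3, 4]
```

Under that assumption, a vocabulary whose IDs start at 0 and have no gaps would presumably not trigger the message.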

Not only is the message wrong (I am not trying to **save** anything), but it also cannot be suppressed by redirecting `stdout` and `stderr` in Python.
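
As a possible stopgap, here is a minimal sketch of redirecting at the file-descriptor level. Reassigning `sys.stdout` or using `contextlib.redirect_stdout` only swaps the Python-level stream object, whereas native code such as Rust's `println!` writes to the process's standard output directly. The helper name `redirect_fd` is hypothetical, and the sketch assumes a POSIX-like system and that the message really goes through file descriptor 1.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_fd(fd=1, target_path=os.devnull):
    """Temporarily point an OS-level file descriptor (1 = stdout) at target_path."""
    sys.stdout.flush()      # push out anything Python has buffered first
    saved_fd = os.dup(fd)   # keep a copy of the original descriptor
    target_fd = os.open(target_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.dup2(target_fd, fd)  # fd now writes to target_path
        yield
    finally:
        sys.stdout.flush()
        os.dup2(saved_fd, fd)   # restore the original destination
        os.close(saved_fd)
        os.close(target_fd)


# os.write(1, ...) stands in for a write coming from native code.
with redirect_fd(1):
    os.write(1, b"this line is discarded\n")
```

In the snippet from this issue, the `BPE(vocab=vocab, merges=merges, dropout=None)` call would go inside the `with redirect_fd(1):` block. This hides everything written to that descriptor while the block runs, so it is a blunt workaround rather than a fix.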

`println!` does not belong in low-level code, so at the very least we need a way to disable it. Beyond that, what is this message even for, given that it talks about **saving** when we are **loading** a tokenizer?

Referenced lines from `tokenizers/src/models/mod.rs` (L53-L56):

```rust
if !holes.is_empty() {
    warn!("The OrderedVocab you are attempting to save contains holes for indices {holes:?}, your vocabulary could be corrupted!");
    println!("The OrderedVocab you are attempting to save contains holes for indices {holes:?}, your vocabulary could be corrupted!");
}
```
