
Why use bytes instead of str for paths (Windows)? #7

@lyksdu


def preprocess_and_write(params: Tuple[bytes, bytes, PrepConfig, str], bpe_data: Optional[BpeData] = None):

I am working with this repository on Windows.

I find that when a path contains Unicode characters, such as the Chinese directory name in "./文档/", to_repr.py seems to encode the path string to bytes, and this causes an exception.

A UTF-8 bytes path such as b'\xe6\x96\x87\xe6\xa1\xa3.py' stands for "文档.py", but on Windows it is interpreted as a nested folder path, and Python's built-in os.path.basename does not recognize it as a single file name. Writing the metadata to a file then raises a FileOrDirNotExist exception.

As a workaround, I changed the path to str, which avoids the exception, but I don't know whether this has any other side effects.
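
To make the symptom concrete, here is a minimal sketch of one way a bytes path could end up looking like nested folders on Windows. It assumes (this is a guess, not confirmed from the repository code) that the bytes path is turned into text via its repr somewhere before the metadata is written. The helper `to_text_path` is hypothetical and only illustrates the str workaround. `ntpath` is used so that Windows path rules apply on any OS.

```python
import ntpath  # Windows path semantics, importable on every platform

# UTF-8 bytes for the file name "文档.py"
raw = "文档.py".encode("utf-8")

# Assumption: the bytes path gets converted to text through repr()/str(),
# which yields literal backslash escapes instead of the real characters.
as_repr = repr(raw)

def to_text_path(path, encoding="utf-8"):
    """Hypothetical helper: decode a bytes path to str, pass str through."""
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path

# Each literal backslash is a path separator on Windows, so basename
# only sees the fragment after the last one.
repr_basename = ntpath.basename(as_repr)

# Decoding first keeps the file name in one piece.
text_basename = ntpath.basename(to_text_path(raw))

print(repr_basename)
print(text_basename)
```

If this is what happens, decoding the bytes before any string formatting (rather than switching every signature to str) might be the smaller change, but that depends on why the repository uses bytes in the first place.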
