Open
Description
codeprep/codeprep/pipeline/to_repr.py
Line 60 in f5a35b6
def preprocess_and_write(params: Tuple[bytes, bytes, PrepConfig, str], bpe_data: Optional[BpeData] = None):
I am working with this repository on Windows.
When I use Unicode characters such as Chinese in a path, for example "./文档/", to_repr.py appears to encode the path string to bytes, and this causes an exception.
UTF-8 bytes like b'\xe6\x96\x87\xe6\xa1\xa3.py', which stand for "文档.py", are treated on Windows as a nested folder path, and the Python built-in function os.path.basename does not recognize the file name. When the metadata is written to a file, this raises a FileOrDirNotExist exception.
For now I changed the path to str to avoid this exception, but I don't know whether that has any other side effects.
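A minimal sketch of the workaround, under one assumption that I have not confirmed in the codeprep source: the failure comes from a UTF-8 bytes path being turned into text through its repr, which contains backslashes (the Windows path separator). `to_text_path` is a hypothetical helper, not part of codeprep, and `ntpath` is used only so the Windows splitting rules can be reproduced on any OS.

```python
import ntpath  # Windows flavour of os.path, importable on every platform


def to_text_path(path):
    """Hypothetical helper: return a str path, decoding UTF-8 bytes if needed."""
    if isinstance(path, bytes):
        return path.decode("utf-8")
    return path


name = "文档.py"
encoded = name.encode("utf-8")  # b'\xe6\x96\x87\xe6\xa1\xa3.py'

# Assumed failure mode: str() of a bytes object yields its repr, and every
# "\x.." escape in that repr looks like a directory separator on Windows.
as_text = str(encoded)
broken_basename = ntpath.basename(as_text)

# Decoding back to str before any os.path call keeps the file name intact.
fixed_basename = ntpath.basename(to_text_path(encoded))

print(repr(broken_basename))
print(repr(fixed_basename))
```

If the assumption holds, keeping paths as str from end to end (or decoding them before they reach os.path.basename) should avoid the problem. Whether other parts of the pipeline rely on the bytes form is something I could not check.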