fix: SentenceChunker.__init__() got an unexpected keyword argument 'tokenizer_or_token_counter' #578

@JianlongCao

Pre-submission checklist

  • I have searched existing issues and this hasn't been mentioned before
  • I have read the project documentation and confirmed this issue doesn't already exist
  • This issue is specific to MemOS and not a general software issue

Bug Description

chonkie 1.4.0 introduced a compatibility issue: its API and parameter changes cause memory initialization to fail with the following error:
SentenceChunker.__init__() got an unexpected keyword argument 'tokenizer_or_token_counter'
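One possible workaround is to adapt the keyword arguments to whatever the installed constructor accepts before calling it. The sketch below is only an illustration: `adapt_kwargs`, `OldStyleChunker`, and `NewStyleChunker` are hypothetical names (the two classes are stand-ins, not chonkie's real classes), and the replacement keyword `tokenizer` is an assumption that should be checked against the installed chonkie release.

```python
import inspect


def adapt_kwargs(cls, config, renames):
    """Return a copy of `config` whose keys match what `cls.__init__` accepts.

    `renames` maps an old keyword name to its replacement. A key is renamed
    only when the constructor lacks the old name and has the new one.
    """
    params = inspect.signature(cls.__init__).parameters
    adapted = dict(config)  # copy, so the caller's config is left untouched
    for old, new in renames.items():
        if old in adapted and old not in params and new in params:
            adapted[new] = adapted.pop(old)
    return adapted


# Hypothetical stand-ins for the two constructor signatures.
class OldStyleChunker:
    def __init__(self, tokenizer_or_token_counter="gpt2", chunk_size=512):
        self.tokenizer = tokenizer_or_token_counter
        self.chunk_size = chunk_size


class NewStyleChunker:
    def __init__(self, tokenizer="gpt2", chunk_size=512):
        self.tokenizer = tokenizer
        self.chunk_size = chunk_size


config = {"tokenizer_or_token_counter": "gpt2", "chunk_size": 512}
renames = {"tokenizer_or_token_counter": "tokenizer"}  # assumed new name

# The same config now constructs either variant without a TypeError.
old_chunker = OldStyleChunker(**adapt_kwargs(OldStyleChunker, config, renames))
new_chunker = NewStyleChunker(**adapt_kwargs(NewStyleChunker, config, renames))
```

Inspecting the signature instead of comparing version numbers keeps the shim working on both sides of the rename, at the cost of relying on the constructor declaring its parameters explicitly.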

How to Reproduce

  1. Install the latest memos package with pip.
  2. Run with the ollama backend, using a config like:
{
  "llm": {
    "backend": "ollama",
    "config": {
      "model_name_or_path": "llama3.3:70b",
      "temperature": 0.0,
      "remove_think_prefix": true,
      "max_tokens": 8192,
      "api_base": "http://localhost:11434"
    }
  },
  "embedder": {
    "backend": "ollama",
    "config": {
      "model_name_or_path": "nomic-embed-text:latest",
      "api_base": "http://localhost:11434"
    }
  },
  "chunker": {
    "backend": "sentence",
    "config": {
      "tokenizer_or_token_counter": "gpt2",
      "chunk_size": 512,
      "chunk_overlap": 128,
      "min_sentences_per_chunk": 1
    }
  }
}

Note: removing tokenizer_or_token_counter from the config does not avoid the error, because the latest code adds this field anyway.
  3. Run.

Environment

  • python 3.12
  • memos pip package 1.1.3
  • linux
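The environment list above does not include the installed chonkie version. A small check like the following could confirm whether an environment falls in the affected range. It is a sketch under stated assumptions: `parse_version`, `is_affected`, and `installed_chonkie_version` are hypothetical helpers, and the threshold of 1.4.0 is taken from the bug description; whether later chonkie releases behave the same way is not established here.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release parts of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version_text, first_bad=(1, 4, 0)):
    """True if the version is at or above the first release reported as failing."""
    return parse_version(version_text) >= first_bad


def installed_chonkie_version():
    """Return the installed chonkie version string, or None if it is absent."""
    try:
        return metadata.version("chonkie")
    except metadata.PackageNotFoundError:
        return None


version = installed_chonkie_version()
if version is None:
    print("chonkie is not installed")
else:
    print(f"chonkie {version}, affected: {is_affected(version)}")
```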

Additional Context

No response

Willingness to Implement

  • I'm willing to implement this myself
  • I would like someone else to implement this

Metadata

    Labels

    bug (Something isn't working), pending (Pending items to be addressed)
