Setting response=false in SemanticVAD causes the model to not respond #435
-
Overview
After recent changes (8.8.7), the code that activates SemanticVAD breaks real-time conversation. When Response is set to
To Reproduce
Please check this PR for example #433
Replies: 4 comments
-
This is by design. The model is not supposed to respond unless you explicitly request a response: await session.SendAsync(new CreateResponseRequest(), destroyCancellationToken);
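For context, here is a rough sketch of what this looks like at the wire level, assuming the library wraps the OpenAI Realtime API events. With create_response set to false in the turn detection settings, the server still detects the end of a turn but does not start a response, so the client has to send a response.create event itself (which is presumably what CreateResponseRequest above maps to). The class and method names below are hypothetical and not part of the library, and the field placement follows the beta-style session.update shape, which may differ between API versions.

```csharp
using System.Text.Json;

// Hypothetical helper for illustration only; not part of the library.
public static class RealtimeEvents
{
    // Builds a session.update event that enables semantic VAD.
    // Assumption: turn_detection sits directly under session (beta-style shape).
    public static string SessionUpdateJson(bool createResponse) =>
        JsonSerializer.Serialize(new
        {
            type = "session.update",
            session = new
            {
                turn_detection = new
                {
                    type = "semantic_vad",
                    // false: the server will not auto-generate a response
                    // when it detects the end of a user turn.
                    create_response = createResponse,
                    // true: user speech still interrupts an in-progress response.
                    interrupt_response = true
                }
            }
        });

    // Builds the event the client must send to request a response manually.
    public static string ResponseCreateJson() =>
        JsonSerializer.Serialize(new { type = "response.create" });
}
```

In other words, create_response only controls whether a response starts automatically after a detected turn; it does not change what the model says once a response is requested.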
-
I'll leave this issue open so we can talk through the use case a bit more and make sure we handle your concern appropriately. I went ahead and closed the PR, but I'll look at it some more and see if there's a more appropriate way to do what you're trying to do.
-
The use case is that the model sometimes replies with a wall of text, and we need the ability to interrupt it. I have tried SemanticVAD, and it is good enough. However, after the user asks it to "stop", the model replies with a confirmation like "okay, I understand that you want me to stop, but in case you need anything, you can always ask me... blah blah...". That makes the user interrupt again to try to stop it. Then the model repeats itself with a similar phrase, the user asks it to stop again, and you get an echo that frustrates users. Getting to my question: I thought that setting the response flag to false would instruct the model not to respond to user interruptions. And when response is set to true, will it always confirm the user's commands? Am I right?
-
You could make this a function tool call where you update the session config with
No, this is only for triggering responses from the model after the server VAD has detected that voice activity has stopped.
Yes, a new response will be generated after voice activity has been detected and has ceased. As for interruptions, it just tells the model that the user is talking over it and that it should stop talking. It is up to you to stop the audio playback and to send an event with the timestamp at which playback stopped, so that the model's transcription is truncated properly.
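To make the interruption handling a bit more concrete, here is a minimal sketch of the bookkeeping involved, assuming the underlying OpenAI Realtime API, where the client reports how much audio was actually played via a conversation.item.truncate event. The helper class, its method names, and the 24 kHz default (the usual rate for pcm16 output) are assumptions for illustration. The library presumably exposes a typed request for this, but I have not verified its name, so the sketch builds the raw event instead.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for illustration only; not part of the library.
public static class PlaybackTruncation
{
    // Converts the number of PCM samples actually played into milliseconds.
    // Assumption: mono audio at 24 kHz unless another rate is passed in.
    public static int AudioEndMs(long samplesPlayed, int sampleRate = 24000)
    {
        if (samplesPlayed < 0) throw new ArgumentOutOfRangeException(nameof(samplesPlayed));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        return (int)(samplesPlayed * 1000L / sampleRate);
    }

    // Builds the raw conversation.item.truncate client event.
    // itemId is the ID of the assistant item whose audio was cut off.
    public static string TruncateEventJson(string itemId, int audioEndMs, int contentIndex = 0) =>
        JsonSerializer.Serialize(new
        {
            type = "conversation.item.truncate",
            item_id = itemId,
            content_index = contentIndex,
            audio_end_ms = audioEndMs
        });
}
```

A typical flow would be: when the server signals that the user started speaking, stop the audio source, read how many samples were played for the current assistant item, and send the truncate event with PlaybackTruncation.AudioEndMs(samplesPlayed) so the conversation history only contains what the user actually heard.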