
Could this be made to work per token with the CoCa model? #7

@Thomas2419

I saw in the scripts that you already support CoCa models. It seems like a natural extension to have the model generate a caption while using the attention maps to show which parts of the image contributed to generating each token. Is this on the radar, or is it a method that could be integrated?
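To make the request concrete, here is a minimal sketch of the post-processing step only, under stated assumptions: it assumes you have already captured the caption decoder's cross-attention weights as a nested list laid out as `attn[layer][head][text_token][image_patch]`, and that the image patches form a `grid_h x grid_w` grid. The function name `token_heatmaps` and that layout are hypothetical, not part of any existing script. It averages each generated token's attention over layers and heads, min-max normalises it, and reshapes it into a per-token heatmap.

```python
from typing import List

# Hypothetical layout (an assumption, not a real API):
# attn[layer][head][text_token][image_patch] holds the cross-attention
# weight that each generated caption token puts on each image patch.
Attention = List[List[List[List[float]]]]


def token_heatmaps(attn: Attention, grid_h: int, grid_w: int) -> List[List[List[float]]]:
    """Return one grid_h x grid_w heatmap (values in [0, 1]) per text token."""
    n_layers = len(attn)
    n_heads = len(attn[0])
    n_tokens = len(attn[0][0])
    n_patches = len(attn[0][0][0])
    if n_patches != grid_h * grid_w:
        raise ValueError("number of image patches must equal grid_h * grid_w")

    heatmaps = []
    for t in range(n_tokens):
        # Average this token's attention over every layer and head.
        mean = [0.0] * n_patches
        for layer in attn:
            for head in layer:
                for p, w in enumerate(head[t]):
                    mean[p] += w / (n_layers * n_heads)
        # Min-max normalise to [0, 1] so maps are comparable across tokens;
        # a perfectly flat map becomes all zeros.
        lo, hi = min(mean), max(mean)
        span = hi - lo
        norm = [(v - lo) / span if span > 0 else 0.0 for v in mean]
        # Reshape the flat patch vector into a row-major grid.
        heatmaps.append([norm[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)])
    return heatmaps


if __name__ == "__main__":
    # Toy example: 1 layer, 2 heads, 1 token, 4 patches on a 2x2 grid.
    attn = [[[[0.7, 0.1, 0.1, 0.1]], [[0.5, 0.3, 0.1, 0.1]]]]
    for row in token_heatmaps(attn, 2, 2)[0]:
        print(row)
```

Averaging over layers and heads is only one simple choice; attention rollout or per-layer maps are alternatives. How to pull these weights out of a particular CoCa implementation (for example, by registering forward hooks on its cross-attention layers) is not shown here and would depend on that codebase.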
