Instance segmentor first using sam model to get all obj's mask of the input image. Second using clip model to classify each mask with both image features and your ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results