_id,doi,title
14962,10.48550/arXiv.2309.09858,Unsupervised open-vocabulary object localization in videos
