Abstract
Automatically generating meaningful descriptions for images has recently emerged as an important area of research. In this direction, a nearest-neighbour based generative phrase prediction model (PPM) proposed by (Gupta et al. 2012) was shown to achieve state-of-the-art results on the PASCAL sentence dataset, thanks to the simultaneous use of three different sources of information (i.e., visual clues, corpus statistics and available descriptions). However, they do not utilize semantic similarities among the phrases, which could help in relating semantically similar phrases during phrase relevance prediction. In this paper, we extend their model by considering inter-phrase semantic similarities. To compute the similarity between two phrases, we consider similarities among their constituent words determined using WordNet. We also re-formulate their objective function for parameter learning by penalizing each pair of phrases unevenly, in a manner similar to that in structured prediction. Various automatic and human evaluations are performed to demonstrate the advantage of our "semantic phrase prediction model" (SPPM) over PPM.
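The word-to-phrase aggregation described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: `word_sim` stands in for a WordNet-based word similarity (e.g., path or Lin similarity), and a toy lookup table is used here so the example runs without external corpora; the max-matching-then-average aggregation is one common choice and is assumed for illustration.

```python
# Hedged sketch of inter-phrase similarity from word similarities.
# TOY_WORD_SIM is a hypothetical stand-in for WordNet scores.
TOY_WORD_SIM = {
    ("dog", "puppy"): 0.9,
    ("run", "sprint"): 0.8,
}

def word_sim(w1, w2):
    """Stand-in for a symmetric WordNet word similarity in [0, 1]."""
    if w1 == w2:
        return 1.0
    return max(TOY_WORD_SIM.get((w1, w2), 0.0),
               TOY_WORD_SIM.get((w2, w1), 0.0))

def phrase_sim(p1, p2):
    """Similarity of two phrases (lists of words): each word in p1
    is matched to its most similar word in p2, then scores are
    averaged (one plausible aggregation, assumed for illustration)."""
    if not p1 or not p2:
        return 0.0
    best = [max(word_sim(w1, w2) for w2 in p2) for w1 in p1]
    return sum(best) / len(best)

print(phrase_sim(["dog", "run"], ["puppy", "sprint"]))  # 0.85
```

In a real implementation, `word_sim` would be backed by WordNet (e.g., via NLTK's `wordnet` corpus reader), and the resulting phrase similarities would weight the contribution of semantically related phrases during relevance prediction.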