Abstract: Identifying temporal boundaries in untrimmed videos is crucial for temporal video grounding. With the emergence of multimodal large language models (MLLMs), recent studies ...