Abstract: Identifying temporal boundaries in untrimmed videos is crucial for temporal video grounding. With the emergence of multimodal large language models (MLLMs), recent studies ...