Long-Video Understanding · Multimodal Agents

Multimodal agent research for long-form video understanding

I am Meng Qianke, a master's student at Hangzhou Dianzi University. My work centers on memory organization, video condensation, multi-step reasoning, and evaluation harnesses for long-form video understanding, aiming to make multimodal agents more reliable over long temporal contexts.


Current focus: VideoARM, progressive video condensation, and long-video experiment loops

VideoARM

Hierarchical memory reasoning

Video Condensation

Progressive video condensation

Harness

Long-video experiment tooling

Research Focus

Long-Video Understanding, Multimodal Agents, and 3D Visual Grounding

My research focuses on multimodal large models and agent systems, especially long-form video understanding, video QA, hierarchical memory, MLLM-agent reasoning, and 2D-3D visual grounding. Recent outputs include CVPR 2026 and ICME 2026 papers on long-form video understanding and an arXiv preprint on 3D visual grounding.

Long-Video Understanding

Study video QA, event condensation, temporal memory, and multi-step reasoning for understanding long-duration video content.

MLLM Agent Systems

Build multimodal agents with tool use, memory management, planning, and environment interaction for complex vision-language tasks and research workflows.

3D Visual Grounding

Investigate 2D-3D mappings, zero-shot 3D visual grounding, and cross-view consistency for open-world spatial semantic understanding.

Research Output


Paper · 2026
VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding (CVPR 2026)
A hierarchical-memory and agentic-reasoning framework for long-form video understanding, accepted to CVPR 2026. The work focuses on information compression, memory organization, and multi-step reasoning for long-video QA.
Long-Form Video Understanding · Agentic Reasoning · Hierarchical Memory · Video QA
arXiv · GitHub
Paper · 2026
Progressive Video Condensation with MLLM Agent for Long-form Video Understanding (ICME 2026)
A study on progressive video condensation and MLLM-agent reasoning for long-form video understanding, accepted to ICME 2026.
Long-Form Video Understanding · MLLM Agent · Video Condensation · Multimodal
arXiv
Preprint · 2026
Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding
A robust zero-shot 3D visual grounding method based on multiple consistent 2D-3D mappings. I am the third author; the paper is available on arXiv.
3D Visual Grounding · 2D-3D Mapping · Zero-Shot · Multimodal
arXiv
Contest · 2025
Aesthetic Feature Modeling of Classical Jiangnan Gardens (National First Prize, China Graduate Mathematical Modeling Contest)
A mathematical modeling study of aesthetic features and spatial layout patterns in classical Jiangnan gardens, awarded National First Prize in the China Graduate Mathematical Modeling Contest.
Mathematical Modeling · Aesthetic Analysis · Spatial Modeling

Featured Projects


VideoARM
GitHub
A long-form video understanding research project on hierarchical memory, agentic reasoning, and long-video QA, corresponding to the CVPR 2026 paper.
Long Video · MLLM Agent · Research
LongVideo Exploration
An active exploration line for long-video understanding, focusing on coarse event modeling, visual human evaluation, video memory, and multi-agent experiment loops.
Video Understanding · Evaluation · Agent
VideoARM-MCP
An MCP service-wrapper exploration for connecting VideoARM-style long-video understanding capabilities to general agent workflows.
MCP · Agent Tooling · Video
DingTalk GPU Monitor
GitHub
A lightweight monitor for NVIDIA GPU utilization and memory that runs without admin privileges and sends alert notifications through DingTalk.
Shell · DevOps · Monitoring

Experience

Master's Student · Hangzhou Dianzi University
• Computer Technology
• Media Intelligence Lab (MIL)
• Research interests: multimodal large models, agent systems, long-video understanding, and video QA
Sep 2024 - Present
Undergraduate Student · Henan University
• Computer Science and Technology
• Bachelor of Engineering
Sep 2020 - Jun 2024