Multimodal AI Research
Hi, I'm Meng Qianke
Focusing on multimodal large models, agents, and video question answering, with an emphasis on vision-language generative intelligence.
Research Focus
Multimodal Understanding & Generation
Researching multimodal large models, agents, and video question answering, with a focus on vision-language generative intelligence, spanning cross-modal understanding, autonomous reasoning, and long-form video analysis.
Multimodal Large Models
Investigating robust alignment mechanisms across vision, language, and speech, combining contrastive learning with instruction tuning to enhance cross-modal semantic understanding and generation.
Agent Systems
Developing autonomous agents powered by large models, focusing on tool invocation, reasoning, planning, and environmental interaction for automated task execution.
Video Question Answering
Designing temporal modeling and keyframe selection strategies for long-video understanding, integrating parameter-efficient tuning and few-shot learning to improve QA generalization and real-time performance.
Research Output
Paper · 2024
VideoARM: Agentic Reasoning over Hierarchical Memory for Long-Form Video Understanding (Under review at CVPR)
Proposes an agentic reasoning framework with hierarchical memory for long-form video understanding, achieving significant improvements on multiple long-form video QA benchmarks.
Video Understanding · Agents · Multimodal · Long Video
Contest · 2024
Aesthetic Feature Modeling of Classical Jiangnan Gardens (National First Prize, China Graduate Mathematical Modeling Contest)
Mathematical modeling approach to analyze aesthetic features and spatial layout patterns of classical Jiangnan gardens.
Mathematical Modeling · Multimodal · Aesthetic Analysis
Featured Projects
Food Agent
An AI agent for food and beverage R&D teams, supporting end-to-end workflow from ideation to formulation, experimentation, compliance, and evaluation.
Python · AI Agent · FoodTech
DingTalk GPU Monitor
A Bash script for monitoring NVIDIA GPU utilization and memory without admin privileges, with DingTalk alerts.
Shell · DevOps · Monitoring
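The monitor's core idea can be sketched in a few lines of Bash. This is a minimal illustration, not the project's actual script: it assumes a standard `nvidia-smi` installation (whose `--query-gpu`/`--format=csv` flags require no admin privileges) and a placeholder DingTalk robot webhook URL, and alerts when a GPU's memory usage exceeds a threshold.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an unprivileged GPU monitor with DingTalk alerts.
# WEBHOOK is a placeholder; a real DingTalk robot uses a token-specific URL.
WEBHOOK="https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"
THRESHOLD=90   # alert when GPU memory usage exceeds this percentage

# Emit one "index, util, mem_used_MiB, mem_total_MiB" line per GPU.
query_gpus() {
  nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
             --format=csv,noheader,nounits
}

# Integer memory-usage percent from used/total MiB.
mem_pct() { echo $(( 100 * $1 / $2 )); }

# Post a plain-text message to the DingTalk robot webhook.
notify() {
  curl -s -H 'Content-Type: application/json' \
       -d "{\"msgtype\":\"text\",\"text\":{\"content\":\"$1\"}}" \
       "$WEBHOOK" > /dev/null
}

# Only poll when nvidia-smi is actually available on this host.
if command -v nvidia-smi > /dev/null; then
  query_gpus | while IFS=', ' read -r idx util used total; do
    pct=$(mem_pct "$used" "$total")
    if [ "$pct" -gt "$THRESHOLD" ]; then
      notify "GPU $idx memory at ${pct}% (util ${util}%)"
    fi
  done
fi
```

In practice a script like this would be run from `cron` or a `watch` loop, so no daemon or root access is needed on shared lab servers.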
Experience
● 2024 – Present

Graduate Student (Master's) · Hangzhou Dianzi University
• Computer Technology
• Research focus on multimodal large models

● 2020 – 2024

Undergraduate (Bachelor's) · Henan University
• Computer Science and Technology
• Bachelor of Engineering
Get in touch
Let's connect around multimodal research, collaborations, or personal projects.
Direct Contact
Scan to Connect