Xu Pan    潘旭
Incoming Research Associate at Nanyang Technological University, Singapore

Hello World!


To Shape the Future of Intelligence Beyond the Digital World.

Hi, I am Xu Pan, an incoming Research Associate at the Perception and Embodied Intelligence (PINE) Lab, supervised by Prof. Ziwei Wang, within the School of Electrical and Electronic Engineering (EEE), Nanyang Technological University (NTU). My research focuses on embodied intelligence, with an emphasis on spatial representations that enable generalizable perception-action coupling for embodied agents.

I study how structure-aware representations can support robust interaction, enabling agents to generalize across diverse environments, viewpoints, and embodiments, with a particular interest in agent-centric policy learning and spatial reasoning.

Previously, I received my B.Eng. and M.Sc. degrees from Wuhan University, where I worked at the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS). I also interned at Baidu and served as a remote Research Assistant at the Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), working with Dr. Xingrui Yu.

More broadly, I aim to develop scalable and generalizable learning frameworks that advance spatial intelligence and enable embodied systems to operate reliably in complex real-world settings.

News

  • Jul 2025
    Attended the 2025 Annual Academic Conference on Photogrammetry and Remote Sensing (CSGPC) in Kunming, China.
  • Dec 2024
    Began a research internship at Baidu in Shenzhen, supervised by Dr. Yan Zhang, exploring frontier text-to-image and text-to-video generation.
  • Jul 2024
    Began collaboration on the SCoDe project under the guidance of Dr. Zimin Xia.
  • Sep 2023
    Enrolled in the Master's program at LIESMARS, Wuhan University, through recommended (exam-exempt) admission, under the supervision of Prof. Xianwei Zheng.

Experiences

Acknowledgements:
I’m grateful to my collaborators and mentors for their guidance and support, especially
Prof. Ziwei Wang, Dr. Xingrui Yu (A*STAR), Dr. Zimin Xia (EPFL), Dr. Yan Zhang (Baidu), Prof. Xianwei Zheng (WHU), Prof. Hanjiang Xiong (WHU),
and my colleagues/peers including
Zhenglin Wan (NUS), Jiashen Huang (NTU), Ziqong Lu (HKU), Qiyuan Ma (WHU), Jintao Zhang (WHU), Chenyu Zhao (WHU), He Chen (WHU),
and others I’ve had the pleasure to work with.

Selected Publications


SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching Vision-Language-Action Models

Xu Pan, Zhenglin Wan, Xingrui Yu*, Xianwei Zheng, Youkai Ke, Ming Sun, Rui Wang, Ziwei Wang, Ivor Tsang

Under Review · 2026
Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation, but reinforcement learning (RL) fine-tuning often degrades generalization under spatial distribution shifts. We analyze flow-matching VLA policies and identify the collapse of spatial inductive bias as a key factor limiting robust transfer. To address this, we propose SA-VLA, which explicitly grounds VLA policies in spatial structure by integrating implicit spatial representations, spatially-aware step-level dense rewards, and SCAN, a spatially-conditioned exploration strategy tailored for flow-matching policies. This principled alignment mitigates policy over-specialization and preserves zero-shot generalization to more complex tasks. Experiments on challenging multi-object and cluttered benchmarks demonstrate that SA-VLA enables stable RL fine-tuning and substantially more robust, transferable behaviors.
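
To make the step-level dense reward idea concrete, here is a minimal Python sketch of one plausible form: a spatial-progress term plus a sparse success bonus. The function name, inputs, and weighting are my own illustrative assumptions, not the paper's implementation.

    import numpy as np

    def spatial_step_reward(ee_pos, target_pos, prev_dist, success, w_align=0.1):
        """Hypothetical step-level dense reward: rewards spatial progress
        (how much closer the end-effector moved to the target this step)
        plus a sparse bonus on task success. Weights are illustrative."""
        dist = float(np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos)))
        progress = prev_dist - dist               # > 0 when moving closer
        reward = w_align * progress + (1.0 if success else 0.0)
        return reward, dist                       # thread dist back in as prev_dist

In an RL loop, the returned dist is fed back as prev_dist at the next step, so the dense term telescopes to the overall spatial progress across the episode.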

Scale-aware Co-visible Region Detection for Image Matching

Xu Pan, Zimin Xia, Xianwei Zheng*

ISPRS Journal of Photogrammetry and Remote Sensing · JCR Q1 · IF 12.2 · 2025
Matching images with significant scale differences remains a persistent challenge in photogrammetry and remote sensing: the scale discrepancy degrades appearance consistency and introduces uncertainty in keypoint localization. Existing methods address scale variation through scale pyramids or scale-aware training, yet matching under drastic scale differences remains an open problem. We tackle it by detecting co-visible regions between image pairs and propose SCoDe (Scale-aware Co-visible region Detector), which both identifies co-visible regions and aligns their scales for robust, hierarchical point correspondence matching. Specifically, SCoDe employs a novel Scale Head Attention mechanism to map and correlate features across multiple scale subspaces, and uses a learnable query to aggregate scale-aware information from both images for co-visible region detection. Correspondences can then be established in a coarse-to-fine hierarchy, mitigating semantic and localization uncertainties. Extensive experiments on three challenging datasets demonstrate that SCoDe outperforms state-of-the-art methods, improving the precision of a modern local feature matcher by 8.41%. Notably, SCoDe shows a clear advantage when handling images with drastic scale variations.
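
One way to picture the Scale Head Attention idea is as attention heads that each attend over key/value features pooled to a different scale, with a learnable query aggregating the scale-aware context. The PyTorch sketch below is a reading of that description under stated assumptions; the class and parameter names are hypothetical, not SCoDe's code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScaleHeadAttention(nn.Module):
        """Sketch: one attention head per scale subspace. Each head attends
        over features average-pooled at its scale; a learnable query
        aggregates the resulting scale-aware context into one vector."""
        def __init__(self, dim, scales=(1, 2, 4)):
            super().__init__()
            self.scales = scales
            self.query = nn.Parameter(torch.randn(1, 1, dim))
            self.heads = nn.ModuleList(
                nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
                for _ in scales
            )
            self.proj = nn.Linear(dim * len(scales), dim)

        def forward(self, feat):                       # feat: (B, C, H, W)
            b = feat.shape[0]
            q = self.query.expand(b, -1, -1)           # shared learnable query
            ctx = []
            for s, head in zip(self.scales, self.heads):
                kv = F.avg_pool2d(feat, s).flatten(2).transpose(1, 2)  # (B, N_s, C)
                out, _ = head(q, kv, kv)               # attend within scale s
                ctx.append(out)
            return self.proj(torch.cat(ctx, dim=-1)).squeeze(1)       # (B, C)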

SAMatcher: Co-Visibility Modeling with Segment Anything for Robust Feature Matching

Xu Pan, Qiyuan Ma, Mingyue Dong, He Chen, Wei Ji, Xianwei Zheng*

IEEE Transactions on Image Processing (Under Review) · JCR Q1 · IF 13.7 · 2026
Reliable correspondence estimation is a fundamental problem in image processing, underpinning a wide range of applications such as Structure from Motion, visual localization, and image registration. While recent learning-based approaches have substantially improved the representation capability of local features, most methods still operate primarily at the pixel or patch level. As a result, they lack explicit mechanisms to model regions that are jointly visible across views, leading to brittle behavior when spatial support, semantic context, or visibility patterns vary between images. We propose SAMatcher, a novel feature matching framework that formulates correspondence estimation through explicit co-visibility modeling. Rather than directly establishing point-wise correspondences from local appearance, SAMatcher first predicts consistent co-visible region masks and bounding boxes within a shared cross-view representation space, serving as structured priors to guide and regularize matching. The framework builds upon the Segment Anything Model (SAM) and introduces a symmetric cross-view interaction mechanism that treats paired images as interacting token sequences, enabling bidirectional semantic alignment and the discovery of jointly supported regions. To jointly optimize region segmentation and geometric localization, we introduce a unified supervision scheme that combines point-sampled mask learning with box regression and mask-box consistency constraints, enforcing cross-view coherence during training. Extensive experiments on challenging benchmarks demonstrate that SAMatcher significantly improves robustness under large-scale geometric and viewpoint variations. These results suggest that monocular visual foundation models can be systematically extended to multi-view correspondence estimation through explicit co-visibility modeling, providing a new perspective on structured representation learning for image matching.
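
The symmetric cross-view interaction can be illustrated as bidirectional cross-attention between the two images' token sequences, with a single shared attention module so both directions are treated identically. The sketch below uses assumed names to convey the idea; it is not SAMatcher's actual code.

    import torch
    import torch.nn as nn

    class SymmetricCrossView(nn.Module):
        """Sketch: tokens of each image attend to the other image's tokens.
        A single shared attention module keeps the interaction symmetric."""
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, tok_a, tok_b):               # (B, Na, C), (B, Nb, C)
            a2b, _ = self.attn(tok_a, tok_b, tok_b)    # image A queries image B
            b2a, _ = self.attn(tok_b, tok_a, tok_a)    # image B queries image A
            return self.norm(tok_a + a2b), self.norm(tok_b + b2a)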

Projects

SA-VLA

2026

A research project on robust RL adaptation of flow-matching–based VLA models for robotic manipulation, focusing on generalization under distribution shifts in challenging benchmarks.

Vision-Language-Action Model Robotic Manipulation Flow-Matching Reinforcement Learning

Co-visibility Guided Image Matching

2025

A research project on robust image matching in robot vision, photogrammetry and remote sensing, using explicit co-visibility modeling to handle extreme scale and viewpoint variations.

Co-visibility Image Matching 3D Vision Segmentation Photogrammetry SCoDe SAMatcher

GNDAS

2022

GNDAS (Global Natural Disaster Assessment System) is a web-based geographic information system application for analyzing and assessing natural disasters.

Natural Disasters Geographic Information System (GIS)

I2RSI

2022

I2RSI (Intelligent Interpretation of Remote Sensing Images) is a web-based application for remote sensing image interpretation, powered by the Baidu PaddlePaddle deep learning framework.

Remote Sensing Interpretation Deep Learning

Honors & Awards

  • Outstanding Graduating Graduate Student

    Wuhan University

    2026.04
  • Outstanding Graduate Student

    Wuhan University

    2025.11
  • Graduate Academic Excellence Scholarship

    Wuhan University

    2025.11
  • Graduate Academic Excellence Scholarship

    Wuhan University

    2024.11
  • Outstanding Student Leader

    Wuhan University

    2024.10
  • Outstanding Student Club Leader

    Wuhan University

    2024.08
  • Active Contributor to Social Activities

    School of Remote Sensing and Information Engineering

    2023.05
  • Wuhan University Class C Scholarship

    Wuhan University

    2022.10
  • National Second Prize

    China Software Cup College Student Software Design Competition

    2022.08
  • Honorable Mention

    Mathematical Contest in Modeling

    2022.03
  • Third Prize

    Asia and Pacific Mathematical Contest in Modeling

    2021.12
  • First Prize in Hubei Division

    China Undergraduate Mathematical Contest in Modeling

    2021.10
  • Outstanding Student

    Wuhan University

    2021.09
  • Bronze Medal

    China Collegiate Algorithm Design & Programming Challenge Contest

    2021.03
  • Second Prize in Final

    Translation & Interpreting Contest of Hubei Province

    2020.05

Academic Service

Personal Philosophy

I follow Stoic philosophy. Life is a joyful ascent: a true mountaineer delights in the climb itself, not just the summit.

Thou sufferest this justly: for thou choosest rather to become good to-morrow than to be good to-day.
— Marcus Aurelius, Meditations 8.22

I also resonate with the spirit of Slow Science.

We live in an age tyrannized by efficiency, outcomes, and speed, to the point that nothing lasts and nothing leaves a deep impression. In the midst of noisy bubbles and short-lived hype, I hope to take time to think carefully, to doubt, to refine, and to do research that is genuinely meaningful and worth remembering.