| [1] |
YAN Qiwei, DENG Chubo, LIU Chenglong, et al. ReCon1M: A large-scale benchmark dataset for relation comprehension in remote sensing imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 4507022. doi: 10.1109/TGRS.2025.3589986.
|
| [2] |
LEE J, MIYANISHI T, KURITA S, et al. CityNav: A large-scale dataset for real-world aerial navigation[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Honolulu, The United States of America, 2025: 5912–5922. (查阅网上资料, 未找到本条文献信息, 请确认).
|
| [3] |
LIU Chenyang, ZHAO Rui, CHEN Hao, et al. Remote sensing image change captioning with dual-branch transformers: A new method and a large scale dataset[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5633520. doi: 10.1109/TGRS.2022.3218921.
|
| [4] |
XIA Guisong, BAI Xiang, DING Jian, et al. DOTA: A large-scale dataset for object detection in aerial images[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3974–3983. doi: 10.1109/cvpr.2018.00418.
|
| [5] |
LI Ke, WAN Gang, CHENG Gong, et al. Object detection in optical remote sensing images: A survey and a new benchmark[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2020, 159: 296–307. doi: 10.1016/j.isprsjprs.2019.11.023.
|
| [6] |
LI Yuxuan, LI Xiang, LI Weijie, et al. SARDet-100K: Towards open-source benchmark and toolkit for large-scale SAR object detection[C]. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 4079.
|
| [7] |
SUO Jiashun, WANG Tianyi, ZHANG Xingzhou, et al. HIT-UAV: A high-altitude infrared thermal dataset for Unmanned Aerial Vehicle-based object detection[J]. Scientific Data, 2023, 10(1): 227. doi: 10.1038/s41597-023-02066-6.
|
| [8] |
李春柳, 王水根. 红外海上船舶数据集[EB/OL]. https://openai.raytrontek.com/apply/Sea_shipping.html, 2021. (查阅网上资料,未找到对应的英文翻译信息,请确认补充).
|
| [9] |
刘晴, 徐召飞, 金荣璐, 等. 红外安防数据库[EB/OL]. https://openai.raytrontek.com/apply/Infrared_security.html, 2021. (查阅网上资料,未找到对应的英文翻译信息,请确认补充).
|
| [10] |
刘晴, 徐召飞, 王水根. 红外航拍人车检测数据集[EB/OL]. http://openai.raytrontek.com/apply/Aerial_mancar.html, 2021. (查阅网上资料,未找到对应的英文翻译信息,请确认补充).
|
| [11] |
李钢强, 王建生, 王水根. 双光车载场景数据库[EB/OL]. http://openai.raytrontek.com/apply/Double_light_vehicle.html, 2021. (查阅网上资料,未找到对应的英文翻译信息,请确认补充).
|
| [12] |
李永富, 赵显, 刘兆军, 等. 远海(10–12km)船舶的目标检测数据集[EB/OL]. http://www.core.sdu.edu.cn/info/1133/2174.htm, 2020.(查阅网上资料,未找到对应的英文翻译信息,请确认补充).
|
| [13] |
XIA Guisong, HU Jingwen, HU Fan, et al. AID: A benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965–3981. doi: 10.1109/TGRS.2017.2685945.
|
| [14] |
CHENG Gong, HAN Junwei, and LU Xiaoqiang. Remote sensing image scene classification: Benchmark and state of the art[J]. Proceedings of the IEEE, 2017, 105(10): 1865–1883. doi: 10.1109/JPROC.2017.2675998.
|
| [15] |
GRATTAFIORI A, DUBEY A, JAUHRI A, et al. The llama 3 herd of models[J]. arXiv: 2407.21783, 2024. doi: 10.48550/arXiv.2407.21783.(查阅网上资料,不确定文献类型及格式是否正确,请确认).
|
| [16] |
LOBRY S, MARCOS D, MURRAY J, et al. RSVQA: Visual question answering for remote sensing data[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(12): 8555–8566. doi: 10.1109/TGRS.2020.2988782.
|
| [17] |
CHEN Jun, ZHU Deyao, SHEN Xiaoqian, et al. MiniGPT-v2: Large language model as a unified interface for vision-language multi-task learning[J]. arXiv: 2310.09478, 2023. doi: 10.48550/arXiv.2310.09478.(查阅网上资料,不确定文献类型及格式是否正确,请确认).
|
| [18] |
WU Zhiyu, CHEN Xiaokang, PAN Zizheng, et al. Deepseek-vl2: Mixture-of-experts vision-language models for advanced multimodal understanding[J]. arXiv: 2412.10302, 2024. doi: 10.48550/arXiv.2412.10302.(查阅网上资料,不确定文献类型及格式是否正确,请确认).
|
| [19] |
PENG Zhiliang, WANG Wenhui, DONG Li, et al. Kosmos-2: Grounding multimodal large language models to the world[J]. arXiv: 2306.14824, 2023. doi: 10.48550/arXiv.2306.14824.(查阅网上资料,不确定文献类型及格式是否正确,请确认).
|
| [20] |
CHEN Keqin, ZHANG Zhao, ZENG Weili, et al. Shikra: Unleashing multimodal LLM's referential dialogue magic[J]. arXiv: 2306.15195, 2023. doi: 10.48550/arXiv.2306.15195.(查阅网上资料,不确定文献类型及格式是否正确,请确认).
|
| [21] |
HU Huiyang, WANG Peijin, FENG Yingchao, et al. RingMo-Agent: A unified remote sensing foundation model for multi-platform and multi-modal reasoning[J]. arXiv: 2507.20776, 2025. doi: 10.48550/arXiv.2507.20776.(查阅网上资料,不确定文献类型及格式是否正确,请确认).
|
| [22] |
ANDERSON P, WU Q, TENEY D, et al. Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3674–3683. doi: 10.1109/CVPR.2018.00387.
|
| [23] |
LIU Shuobo, ZHANG Hongsheng, QI Yuankai, et al. AeriaLVLN: Vision-and-language navigation for UAVs[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 15338–15348. doi: 10.1109/ICCV51070.2023.01411.
|
| [24] |
WANG Peijin, HU Huiyang, TONG Boyuan, et al. RingMoGPT: A unified remote sensing foundation model for vision, language, and grounded tasks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5611320. doi: 10.1109/TGRS.2024.3510833.
|
| [25] |
CHENG Qimin, HUANG Haiyan, XU Yuan, et al. NWPU-captions dataset and MLCA-net for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5629419. doi: 10.1109/TGRS.2022.3201474.
|
| [26] |
LU Xiaoqiang, WANG Binqiang, ZHENG Xiangtao, et al. Exploring models and data for remote sensing image caption generation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(4): 2183–2195. doi: 10.1109/TGRS.2017.2776321.
|
| [27] |
QU Bo, LI Xuelong, TAO Dacheng, et al. Deep semantic understanding of high resolution remote sensing image[C]. 2016 International Conference on Computer, Information and Telecommunication Systems (CITS), Kunming, China, 2016: 1–5. doi: 10.1109/CITS.2016.7546397.
|
| [28] |
LI Kun, VOSSELMAN G, and YANG M Y. HRVQA: A visual question answering benchmark for high-resolution aerial images[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2024, 214: 65–81. doi: 10.1016/j.isprsjprs.2024.06.002.
|
| [29] |
SUN Xian, WANG Peijin, LU Wanxuan, et al. RingMo: A remote sensing foundation model with masked image modeling[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5612822. doi: 10.1109/TGRS.2022.3194732.
|
| [30] |
HU Huiyang, WANG Peijin, BI Hanbo, et al. RS-vHeat: Heat conduction guided efficient remote sensing foundation model[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Honolulu, The United States of America, 2025: 9876–9887. (查阅网上资料, 未找到本条文献信息, 请确认).
|
| [31] |
CHANG Hao, WANG Peijin, DIAO Wenhui, et al. Remote sensing change detection with bitemporal and differential feature interactive perception[J]. IEEE Transactions on Image Processing, 2024, 33: 4543–4555. doi: 10.1109/TIP.2024.3424335.
|
| [32] |
LIU Haotian, LI Chunyuan, LI Yuheng, et al. Improved baselines with visual instruction tuning[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 26286–26296. doi: 10.1109/CVPR52733.2024.02484.
|
| [33] |
HU Yuan, YUAN Jianlong, WEN Congcong, et al. RSGPT: A remote sensing vision language model and benchmark[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2025, 224: 272–286. doi: 10.1016/j.isprsjprs.2025.03.028.
|
| [34] |
LUO Junwei, PANG Zhen, ZHANG Yongjun, et al. SkySenseGPT: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding[J]. arXiv: 2406.10100, 2024. doi: 10.48550/arXiv.2406.10100.
|
| [35] |
LI Xiang, DING Jian, and MOHAMED E. VRSBench: A versatile vision-language benchmark dataset for remote sensing image understanding[C]. Proceedings of the 38th International Conference on Neural Information Processing Systems, Vancouver, Canada, 2024: 106.
|
| [36] |
SHI Qian, HE Da, LIU Zhengyu, et al. Globe230k: A benchmark dense-pixel annotation dataset for global land cover mapping[J]. Journal of Remote Sensing, 2023, 3: 0078. doi: 10.34133/remotesensing.0078.
|
| [37] |
HU Fengming, XU Feng, WANG R, et al. Conceptual study and performance analysis of tandem multi-antenna spaceborne SAR interferometry[J]. Journal of Remote Sensing, 2024, 4: 0137. doi: 10.34133/remotesensing.0137.
|
| [38] |
MEI Shaohui, LIAN Jiawei, WANG Xiaofei, et al. A comprehensive study on the robustness of deep learning-based image classification and object detection in remote sensing: Surveying and benchmarking[J]. Journal of Remote Sensing, 2024, 4: 0219. doi: 10.34133/remotesensing.0219.
|