Rajitha de Silva · Jacob Swindell · Jonathan Cox · Marija Popović · Cesar Cadena · Cyrill Stachniss · Riccardo Polvara
📄 Paper | 🎬 Video | 🌐 Website | 🌱 Dataset | 🧠 Model
We present Keypoint Semantic Integration (KSI) — a lightweight method that enhances keypoint descriptors with semantic context to reduce perceptual aliasing in visually repetitive outdoor environments such as vineyards.
By embedding instance-level semantic information (e.g., trunks, poles, buildings) into keypoint descriptors, KSI significantly improves feature matching, pose estimation, and visual localisation across months and seasons.
It integrates seamlessly with classical (SIFT, ORB) and learned (SuperPoint, R2D2, SFD2) descriptors, using existing matchers like SuperGlue or LightGlue without retraining.
Robust robot navigation in outdoor environments requires accurate perception systems capable of handling visual challenges such as repetitive structures and changing appearances. Visual feature matching is crucial to vision-based pipelines but remains particularly challenging in natural outdoor settings due to perceptual aliasing. We address this issue in vineyards, where repetitive vine trunks and other natural elements generate ambiguous descriptors that hinder reliable feature matching. We hypothesise that semantic information tied to keypoint positions can alleviate perceptual aliasing by enhancing keypoint descriptor distinctiveness. To this end, we introduce a keypoint semantic integration technique that improves the descriptors in semantically meaningful regions within the image, enabling more accurate differentiation even among visually similar local features. We validate this approach in two vineyard perception tasks: (i) relative pose estimation and (ii) visual localisation. Our method improves matching accuracy across all tested keypoint types and descriptors, demonstrating its effectiveness over multiple months in challenging vineyard conditions.
KSI operates as a plug-and-play enhancement over existing keypoint pipelines:
- Panoptic Segmentation – Uses YOLOv9 to segment vineyard-relevant classes (trunks, poles, buildings, etc.) from RGB images.
- Semantic Encoding – Each instance mask is encoded via a lightweight autoencoder to produce a compact semantic embedding.
- Descriptor Fusion – The semantic embedding is added to the corresponding keypoint descriptor and L2-normalised.
- Matching – Enhanced descriptors are matched with existing matchers such as SuperGlue or LightGlue, without retraining.
The result is a semantics-aware matching pipeline that maintains compatibility with standard SLAM and localisation systems.
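The descriptor-fusion step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the function name, array shapes, and the convention of a zero embedding for keypoints outside any instance mask are assumptions for the example.

```python
import numpy as np

def fuse_descriptors(descriptors, semantic_embeddings):
    """Add per-keypoint semantic embeddings to descriptors, then L2-normalise.

    descriptors:         (N, D) array of keypoint descriptors
                         (e.g. D=256 for SuperPoint)
    semantic_embeddings: (N, D) array with one compact embedding per keypoint;
                         assumed zero for keypoints outside any instance mask,
                         so those descriptors are only renormalised
    """
    fused = descriptors + semantic_embeddings            # element-wise fusion
    norms = np.linalg.norm(fused, axis=1, keepdims=True)
    return fused / np.clip(norms, 1e-12, None)           # unit-length output

# Toy usage: two keypoints with 4-D descriptors; the second keypoint
# falls outside every instance mask, so its embedding is zero.
desc = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0]])
sem  = np.array([[0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 0.0]])
fused = fuse_descriptors(desc, sem)
```

Because the output stays L2-normalised and keeps the original descriptor dimensionality, the fused descriptors can be passed to a matcher such as SuperGlue or LightGlue in place of the raw ones.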

Figure 1. Overview of the KSI pipeline integrating semantic embeddings into keypoint descriptors.

Figure 2. KSI enhances descriptor distinctiveness in repetitive vineyard scenes across seasons.

Figure 3. Vineyard loop used for evaluation — trunks and buildings provide stable semantics year-round.

Figure 4. KSI generalises to woodland environments, improving tree-based feature matching.
We introduce Semantic Bacchus Long-Term (SemanticBLT) — a multi-season dataset of vineyard images with panoptic segmentation for six classes (buildings, pipes, poles, robots, trunks, vehicles).
It extends the Bacchus Long-Term (BLT) dataset with semantic annotations, enabling perception research in repetitive natural scenes.
If you use this work, please cite:
@ARTICLE{11230833,
author={de Silva, Rajitha and Swindell, Jacob and Cox, Jonathan and Popović, Marija and Cadena, Cesar and Stachniss, Cyrill and Polvara, Riccardo},
journal={IEEE Robotics and Automation Letters},
title={Keypoint Semantic Integration for Improved Feature Matching in Outdoor Agricultural Environments},
year={2025},
volume={},
number={},
pages={1-8},
keywords={Semantics;Feature extraction;Visualization;Robots;Accuracy;Robot kinematics;Pipelines;Training;Standards;Shape},
doi={10.1109/LRA.2025.3629991}}

Note: This work extends the original SuperGlue Pretrained Network codebase by Magic Leap, introducing a semantic integration module that augments descriptor matching with contextual embeddings.