
Title
Unified Multi-Task Learning vs. Decoupled Transformer-based Perception: A Comparative Analysis
Author(s)
Keywords
Autonomous driving perception
Multi-task learning
Real-time object detection
Vision Transformers
Drivable area segmentation
UNESCO classification
1203 Computer Science
Publication date
2026-03-02
Series / No.
ACM conference proceedings
Abstract
Efficient environmental perception is a cornerstone of Advanced Driver Assistance Systems (ADAS) and autonomous driving. A persistent architectural dilemma in this domain is whether to employ unified Multi-Task Learning (MTL) frameworks, which optimize computation through shared backbones, or modular multi-model pipelines, which prioritize task-specific accuracy. This paper presents a comparative analysis of these two paradigms for joint object detection and drivable area estimation. Specifically, we evaluate YOLOPX, a representative anchor-free MTL architecture, against a decoupled multi-model system that integrates RT-DETRv2 for vehicle detection and the lightweight YOLO11n-seg for drivable area segmentation on the BDD100K benchmark under identical hardware conditions. The results show that, although the MTL YOLOPX model achieves higher throughput, the decoupled system delivers substantially better detection performance, particularly in the stricter mAP50:95 metric, while preserving competitive segmentation quality and maintaining real-time latency suitable for edge deployment. These findings suggest that modular designs, rather than monolithic MTL models, can offer a more favorable balance between safety-critical detection accuracy and computational efficiency for next-generation intelligent vehicles.
URI
Appears in collections
Files in this item
Name:
Size:
4.628 MB
Format:
Adobe PDF
