Unified Multi-Task Learning vs. Decoupled Transformer-based Perception: A Comparative Analysis

Caño Pascual, Pablo; Fran Abadía, Pablo; Valdes-Ramirez, Danilo; González Briones, Alfonso; Barrio Val, Pablo

dc.contributor.author	Caño Pascual, Pablo
dc.contributor.author	Fran Abadía, Pablo
dc.contributor.author	Valdes-Ramirez, Danilo
dc.contributor.author	González Briones, Alfonso
dc.contributor.author	Barrio Val, Pablo
dc.date.accessioned	2026-06-10T07:59:53Z
dc.date.available	2026-06-10T07:59:53Z
dc.date.issued	2026-03-02
dc.identifier.uri	http://hdl.handle.net/10366/171782
dc.description.abstract	[EN]Efficient environmental perception is a cornerstone of Advanced Driver Assistance Systems (ADAS) and autonomous driving. A persistent architectural dilemma in this domain is whether to employ unified Multi-Task Learning (MTL) frameworks, which reduce computation by sharing a backbone across tasks, or modular multi-model pipelines, which prioritize task-specific accuracy. This paper presents a comparative analysis of these two paradigms for joint object detection and drivable area estimation. Specifically, we evaluate YOLOPX, a representative anchor-free MTL architecture, against a decoupled multi-model system that combines RT-DETRv2 for vehicle detection with the lightweight YOLO11n-seg for drivable area segmentation; both systems are evaluated on the BDD100K benchmark under identical hardware conditions. The results show that, although the MTL YOLOPX model achieves higher throughput, the decoupled system delivers substantially better detection performance, particularly on the stricter mAP50:95 metric, while preserving competitive segmentation quality and maintaining real-time latency suitable for edge deployment. These findings suggest that modular designs, rather than monolithic MTL models, can offer a more favorable balance between safety-critical detection accuracy and computational efficiency for next-generation intelligent vehicles.	es_ES
dc.description.sponsorship	Project “A catalyst for EuropeaN ClOUd Services in the era of data spaces, high-performance and edge computing (NOUS)”, Grant Agreement Number 101135927. Funded by the European Union.	es_ES
dc.format.mimetype	application/pdf
dc.language.iso	eng	es_ES
dc.relation.ispartofseries	ACM conference proceedings;
dc.rights	Attribution 4.0 International	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	es_ES
dc.subject	Autonomous driving perception	es_ES
dc.subject	Multi-task learning	es_ES
dc.subject	Real-time object detection	es_ES
dc.subject	Vision Transformers	es_ES
dc.subject	Drivable area segmentation	es_ES
dc.title	Unified Multi-Task Learning vs. Decoupled Transformer-based Perception: A Comparative Analysis	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.subject.unesco	1203 Ciencia de los ordenadores	es_ES
dc.relation.projectID	European Union 101135927	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.journal.title	International conference on Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) solutions for Europe’s Next-Gen Cloud Infrastructure	es_ES
dc.type.hasVersion	info:eu-repo/semantics/draft	es_ES