Show simple item record

dc.contributor.author: Negre, Pablo
dc.contributor.author: Alonso Rincón, Ricardo Serafín
dc.contributor.author: Prieto Tejedor, Javier
dc.contributor.author: García García, Óscar
dc.date.accessioned: 2026-02-09T09:54:44Z
dc.date.available: 2026-02-09T09:54:44Z
dc.date.issued: 2026-02-03
dc.identifier.citation: Negre, P., Alonso, R.S., Prieto, J. et al. Video violence detection using pre-trained VGG19 combined with manual logic, LSTM layers and Bi-LSTM layers. Appl Intell 56, 72 (2026). https://doi.org/10.1007/s10489-026-07122-3
dc.identifier.issn: 0924-669X
dc.identifier.uri: http://hdl.handle.net/10366/169634
dc.description.abstract: [EN] Video violence detection using artificial intelligence plays a key role in public safety applications. Although convolutional and recurrent neural networks are widely adopted for this task, the actual contribution of temporal modeling over strong frame-level representations remains insufficiently analyzed. This work provides a systematic study of video violence detection models under a unified experimental framework. We investigate whether violence can be reliably detected from individual frames without explicit temporal modeling, evaluate the effectiveness of combining CNNs with LSTM and Bi-LSTM layers, and analyze the impact of architectural and hyperparameter choices, including neuron configuration and backbone selection (VGG-16 vs. VGG-19). Experiments are conducted on three widely used benchmark datasets. Our results show that frame-level analysis using a pre-trained VGG-19 network, combined with a simple aggregation strategy, achieves competitive performance, reaching 95% accuracy on Hockey Fights and 96% on Violent Flow. While Bi-LSTM layers can provide moderate improvements of up to 4% over standard LSTM models in certain datasets, these gains are not consistent across all scenarios. Furthermore, variations in hyperparameter configurations do not systematically lead to improved performance. Overall, this study highlights that increased architectural complexity does not always translate into better results and that, in several cases, simple frame-based approaches can rival more complex temporal models. These findings provide practical insights into the cost–benefit trade-off of temporal modeling for video-based violence detection.
dc.language.iso: eng
dc.publisher: Springer Nature
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Convolutional Neural Networks (CNN)
dc.subject: Video violence detection
dc.subject: Physical aggression
dc.subject: Manual feature
dc.subject: Long Short Term Memory (LSTM)
dc.title: Video violence detection using pre-trained VGG19 combined with manual logic, LSTM layers and Bi-LSTM layers
dc.type: info:eu-repo/semantics/article
dc.relation.publishversion: https://doi.org/10.1007/s10489-026-07122-3
dc.subject.unesco: 1203.04 Artificial Intelligence
dc.identifier.doi: 10.1007/s10489-026-07122-3
dc.relation.projectID: info:eu-repo/grantAgreement/EC/HORIZON/101120726
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.identifier.essn: 1573-7497
dc.journal.title: Applied Intelligence
dc.volume.number: 56
dc.issue.number: 3
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion
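The abstract mentions that frame-level VGG-19 predictions are combined with a "simple aggregation strategy" to produce a video-level decision. The record does not specify the paper's exact manual logic, so the sketch below is only an illustrative assumption: per-frame violence probabilities (from any frame classifier) are aggregated with a threshold-and-fraction rule, where both the frame threshold and the fraction of frames required are hypothetical parameters.

```python
# Illustrative sketch of frame-level aggregation, NOT the paper's exact rule.
# frame_probs: per-frame violence probabilities from a frame classifier
# (e.g. a VGG-19-based head). The 0.5 threshold and 0.2 fraction are
# assumptions chosen for the example.

def aggregate_frame_scores(frame_probs, frame_threshold=0.5, video_fraction=0.2):
    """Label a video as violent if at least `video_fraction` of its frames
    score at or above `frame_threshold`."""
    if not frame_probs:
        return False
    violent_frames = sum(1 for p in frame_probs if p >= frame_threshold)
    return violent_frames / len(frame_probs) >= video_fraction

# Example: 3 of 10 frames exceed the threshold -> 30% >= 20% -> violent.
scores = [0.1, 0.2, 0.9, 0.05, 0.8, 0.3, 0.1, 0.7, 0.2, 0.1]
print(aggregate_frame_scores(scores))  # True
```

A rule like this needs no temporal model at all, which is the point the abstract makes: strong per-frame representations plus simple aggregation can rival LSTM/Bi-LSTM pipelines on some benchmarks.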


File(s) in this item


This item appears in the following collection(s)


Attribution-NonCommercial-NoDerivatives 4.0 International
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International