RECOMMENDED README FILE FOR GREDOS_USAL

This readme.txt file was generated on 20260302 by Andrés Cardona-Mendoza and Ana-Belén Gil González

----------------------------------
GENERAL INFORMATION
----------------------------------

1. Title of Dataset: Labeled Patches Dataset for Semi-supervised YOLO Training on Cervical Cytology WSI

2. Authors:

   ✓ Name: Andrés Cardona-Mendoza
   ✓ Institution: Universidad de Salamanca - BISITE Group / Universidad El Bosque - SavIA Lab - INMUBO
   ✓ Email: afcardonam@usal.es
   ✓ ORCID: 0000-0002-6697-5471

   ✓ Name: Ana-Belén Gil-González
   ✓ Institution: Universidad de Salamanca - BISITE Group
   ✓ Email: abg@usal.es
   ✓ ORCID: 0000-0001-7235-6151

   ✓ Name: Sandra Janeth Perdomo Lara
   ✓ Institution: Universidad El Bosque - SavIA Lab - INMUBO
   ✓ Email: perdomosandraj@unbosque.edu.co
   ✓ ORCID: 0000-0002-4429-3760

   ✓ Name: Lautaro Rossi Labianca
   ✓ Institution: Universidad de Salamanca - BISITE Group
   ✓ Email: lross98@usal.es
   ✓ ORCID: 0009-0000-7562-7586

   ✓ Name: Sandra Marcos Recio
   ✓ Institution: Universidad de Salamanca - BISITE Group
   ✓ Email: sandra_marcos@usal.es
   ✓ ORCID: 0009-0009-3379-7304

   ✓ Name: Andrés Barrero Bueno
   ✓ Institution: AIR Institute
   ✓ Email: abarrero@air-institute.com
   ✓ ORCID: 0009-0008-1330-3429

-------------------
DESCRIPTION
-------------------

1. Dataset language: English

2. Abstract: This dataset contains labeled image patches extracted from Colombian Whole Slide Images (WSIs) of conventional Papanicolaou (Pap) tests. It supports the training and validation of object detection models (e.g., YOLO) for automated cervical cytology diagnosis. The patches (640×640 px JPGs) were extracted and labeled using a semi-automated pipeline that combines manual annotation in QuPath with automated patch extraction and YOLO label generation via Groovy and Python scripts. A web-based expert validation interface was used to ensure label accuracy.

3. Keywords: Cervical cytology, Papanicolaou test, YOLO, Whole Slide Image, object detection, patch dataset, digital pathology, label validation

4. Date of data collection: 2023-07 to 2025-02

5. Date of data publication on repository: 2026-03

6. Funding: This work was supported by the Secretaría Distrital de Salud de Bogotá (Colombia) and the ATENEA Agency (grant number 368-2022), in collaboration with the Spanish Ministry for Digital Transformation and Civil Service through the University-Enterprise Chair Call (Cátedras ENIA 2022) (Grant TSI-100933-2023-1), co-funded by the European Union NextGenerationEU/PRTR.

7. Geographic location of data collection: Colombia, Bogotá D.C. and 11 branches of Liga Colombiana Contra el Cáncer

8. Recommended citation for this dataset: Cardona-Mendoza, A., Gil-González, A.B., et al. (2025). Labeled Patches Dataset for Semi-supervised YOLO Training on Cervical Cytology WSI. GREDOS Repository, University of Salamanca.

---------------------------------------------------------
SHARING/ACCESS/CONTEXT INFORMATION
---------------------------------------------------------

1. Usage Licenses/restrictions placed on the data: Creative Commons BY-NC-ND 4.0. No public access to raw images. The dataset is restricted to research collaboration under the Cátedra DemIA framework.

2. Related publications: Andrés, CM., Ana-Belén, GG., Hortua, H.J., Lautaro, R.L., Sandra, PL. (2026). Prototype of a Comprehensive System for Automated Generation and Expert Validation of Labeled Patches on Papanicolaou Test WSI Images for Semi-supervised Training of YOLO Models in Automated Cervical Cytology Diagnosis. In: Fdez-Riverola, F., et al. Practical Applications of Computational Biology and Bioinformatics, 19th International Conference (PACBB 2025). PACBB2025 2025. Lecture Notes in Networks and Systems, vol 1720. Springer, Cham. https://doi.org/10.1007/978-3-032-10634-6_1

3. Dataset DOI: PENDING

--------------------------------
DATA & FILE OVERVIEW
--------------------------------

1. File List:

   ✓ JPG patches (640×640 px)
   ✓ YOLO-format labels (.txt)
   ✓ Visually validated patches (.jpg with bounding boxes)
   ✓ Metadata snapshots (.csv)
   ✓ Logs of processed annotations (.txt)

2. Relationship between files, if important: Each patch has a corresponding label file. Patches and labels are grouped by class. Visually validated images are derived from each patch-label pair.

3. File format:

   ✓ Images: .jpg
   ✓ Labels: .txt (YOLO format)
   ✓ Logs/Snapshots: .csv, .txt

-----------------------------------------------
METHODOLOGICAL INFORMATION
-----------------------------------------------

1. Instrument- or software-specific information:

   ✓ QuPath for manual annotations
   ✓ Custom Groovy script in QuPath for patch and label generation
   ✓ Python script (`verificar_labels.py`) with OpenCV and NumPy for validation
   ✓ Web-based review tool built in Flask with SQLite for expert evaluation

2. Quality-assurance procedures:

   ✓ Manual review of annotations by senior cytologists
   ✓ Visual validation script to check YOLO label accuracy
   ✓ Rejected annotations are rerouted for relabeling

3. Author contact information:

   Ana Belén Gil González
   abg@usal.es
   0000-0001-7235-6151
   Universidad de Salamanca - BISITE Group

   Andrés Cardona-Mendoza
   afcardonam@usal.es
   0000-0002-6697-5471
   Universidad de Salamanca - BISITE Group / Universidad El Bosque - SavIA Lab - INMUBO
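
Note on reading the YOLO-format labels: this README does not reproduce the contents of `verificar_labels.py`, so the following is only a minimal sketch of how a label file might be read and sanity-checked. It assumes the common YOLO text layout (one object per line: `class_id x_center y_center width height`, with coordinates normalized to [0, 1]) and the 640×640 px patch size stated above. The function name `parse_yolo_labels` and the constant `PATCH_SIZE` are hypothetical and are not part of the dataset or its scripts.

```python
from typing import List, Tuple

PATCH_SIZE = 640  # patch side length in pixels, as stated in this README

# (class_id, x_min, y_min, x_max, y_max) in pixel coordinates
Box = Tuple[int, float, float, float, float]


def parse_yolo_labels(text: str, size: int = PATCH_SIZE) -> List[Box]:
    """Convert YOLO label lines into pixel-coordinate boxes.

    Hypothetical helper. Assumes each non-empty line reads
    'class_id x_center y_center width height' with normalized values.
    Raises ValueError for malformed or out-of-range lines.
    """
    boxes: List[Box] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        class_id = int(fields[0])
        xc, yc, w, h = (float(v) for v in fields[1:])
        if not all(0.0 <= v <= 1.0 for v in (xc, yc, w, h)):
            raise ValueError(f"line {line_no}: value outside [0, 1]")
        # Convert center/size (normalized) to corner coordinates (pixels).
        x_min = (xc - w / 2) * size
        y_min = (yc - h / 2) * size
        x_max = (xc + w / 2) * size
        y_max = (yc + h / 2) * size
        boxes.append((class_id, x_min, y_min, x_max, y_max))
    return boxes


if __name__ == "__main__":
    # Illustrative label content, not taken from the dataset.
    sample = "0 0.5 0.5 0.25 0.5\n"
    for box in parse_yolo_labels(sample):
        print(box)
```

The resulting corner coordinates could then be drawn onto the matching patch (for example with OpenCV's `cv2.rectangle`) to produce an overlay similar to the visually validated images, although the exact procedure used by the authors may differ.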