RECOMMENDED README FILE FOR GREDOS_USAL


This readme.txt file was generated on 2026-02-26 by Ana Belén Gil González


----------------------------------
GENERAL INFORMATION
----------------------------------
1. Title of Dataset: 
Georeferenced Integrated Forest and Carbon Dataset for Spain (1986-2022)


2. Authors:


✓ Name: Maider Araceli Urbon Jimenez
✓ Institution: Universidad de Salamanca - BISITE Group
✓ Email: murbon001@usal.es
✓ ORCID: 0009-0002-6308-4398


✓ Name: Jaime Gabriel Vegas 
✓ Institution: Universidad de Salamanca - BISITE Group
✓ Email: JaimeGabrielVegas@usal.es
✓ ORCID: 0009-0006-8923-5405


✓ Name: Ana-Belén Gil-González  
✓ Institution: Universidad de Salamanca - BISITE Group  
✓ Email: abg@usal.es  
✓ ORCID: 0000-0001-7235-6151  


✓ Name: Ana de Luis Reboredo
✓ Institution: Universidad de Salamanca - BISITE Group
✓ Email: adeluis@usal.es
✓ ORCID: 0000-0001-5354-9054


✓ Name: Belén Pérez Lancho
✓ Institution: Universidad de Salamanca - BISITE Group
✓ Email: lancho@usal.es
✓ ORCID: 0000-0002-0934-8316
-------------------
DESCRIPTION
-------------------
1. Dataset language:
Spanish


2. Abstract: 
This dataset contains an integrated forest, climate and satellite-derived database for Spain, built from Spanish National Forest Inventory, Copernicus and NASA data. It characterizes the evolution of Spanish forests through plots distributed across the national territory that have been inventoried on four occasions from 1986 to the present.


3. Keywords: 
Forest database, Spain, georeferenced dataset, National Forest Inventory, Copernicus, NASA, remote sensing, climate data, vegetation indices, forest carbon, biomass expansion factors, carbon stock.


4. Date of data collection: 
2025-06 to 2025-09 


5. Date of data publication on repository: 
2026-02


6. Funding (Information about funding sources that supported the collection of the data): 
This work was supported by the International Chair Project on Trustworthy Artificial Intelligence and the Demographic Challenge within the framework of the National Artificial Intelligence Strategy (ENIA). Reference: TSI-100933-2023-0001. Funded by the Secretary of State for Digitalization and Artificial Intelligence and by the European Union (NextGenerationEU).


7. Geographic location of data collection <latitude, longitude, or city/region, Country, continent as appropriate>: 
Spain (national coverage, georeferenced across the entire Spanish territory)


8. Recommended citation for this dataset: 
Urbon-Jimenez, M.A., Gabriel-Vegas, J.G., Gil-González, A.-B., de Luis-Reboredo, A., Pérez-Lancho, B. (2025). Georeferenced Integrated Forest and Carbon Dataset for Spain (1986–2017). GREDOS Repository, University of Salamanca.


---------------------------------------------------------
SHARING/ACCESS/CONTEXT INFORMATION
---------------------------------------------------------
1. Usage Licenses/restrictions placed on the data (please indicate if different data files have different usage licenses). (Creative Commons 4.0 BY-NC-ND or similar)
Usage license
This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users are free to share, adapt and reuse the data for any purpose, provided that appropriate credit is given.
The dataset integrates and derives from multiple open-access data sources. While the integrated dataset is distributed under CC BY 4.0, users must additionally cite the original data sources listed below, in accordance with their respective attribution requirements.


Spanish National Forest Inventory (IFN)
Ministerio para la Transición Ecológica y el Reto Demográfico (MITECO) (2025). Spanish National Forest Inventory (Inventario Forestal Nacional). Government of Spain. Available at: https://www.miteco.gob.es/es/biodiversidad/temas/inventarios-nacionales/inventario-forestal-nacional.html


NASADEM Merged Digital Elevation Model (Global 1 arc second V001)
Provided by NASA JPL and archived at the NASA Land Processes Distributed Active Archive Center (LP DAAC). Elevation data were accessed and processed via Google Earth Engine. https://doi.org/10.5067/MEaSUREs/NASADEM/NASADEM_HGT.001


ERA5-Land climate data (Copernicus / ECMWF)
Muñoz Sabater, J. (2019). ERA5-Land hourly data from 1950 to present. Copernicus Climate Change Service (C3S) Climate Data Store. https://doi.org/10.24381/cds.e2161bac 


Landsat 5
U.S. Geological Survey (USGS). Landsat 5 Thematic Mapper Level-2, Collection 2, Tier 1. Accessed via Google Earth Engine. Available at: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LT05_C02_T1_L2


Landsat 7
U.S. Geological Survey (USGS). Landsat 7 Enhanced Thematic Mapper Plus Level-2, Collection 2, Tier 1. Accessed via Google Earth Engine. Available at: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LE07_C02_T1_L2


2. Related publications: 
In preparation

3. Dataset DOI: XXXX


--------------------------------
DATA & FILE OVERVIEW
--------------------------------


1. File List: 
The dataset consists of a single relational database organized into a set of core tables, catalog tables and a metadata table. The database implements a normalized schema designed to represent spatially explicit forest inventory, climatic and satellite-derived information for the Spanish territory.


2. Relationship between files, if important:
The database follows a normalized relational model that ensures referential integrity, internal consistency and analytical traceability. All primary keys (PK) are explicitly defined and uniquely identify records at each hierarchical level, while all foreign keys (FK) enforce valid relationships between tables and prevent orphan records.
Core tables
* parcelas
Represents the fundamental spatial unit of the database. Each record corresponds to a georeferenced sampling plot and acts as the root entity from which all other information is linked.
* parcela_inventario
Describes the state of each spatial unit within a specific forest inventory cycle. This table links parcels to inventories and provides the contextual framework for all subsequent forest, environmental and carbon-related information.
* parcela_inventario_especie
Captures the presence and characterization of species within each parcel and inventory. This table establishes the species-level granularity of the database and connects parcel-level information with taxonomic entities.
* parcela_especie_arbol
This table represents individual overstory trees identified within each parcel and species in the Fourth National Forest Inventory. Each record corresponds to a single tree and stores detailed tree-level attributes. This table provides the finest observational resolution of the forest inventory, enabling tree-level analyses of forest structure, productivity and carbon stocks.
* parcela_inventario_especie_cd
Represents the finest aggregated structural level of the forest inventory, describing tree populations by parcel, inventory, species and diameter class. It supports detailed structural, volumetric and carbon accounting analyses.
* parcela_inventario_estacion
Stores aggregated environmental, climatic and satellite-derived information at the parcel–inventory-season level. This table links biophysical and climate descriptors with forest inventory observations in a consistent spatial and temporal framework.
* especies and grupos
Define the taxonomic structure of the database. Species are linked to higher-level functional or taxonomic groups, enabling analyses at multiple biological aggregation levels.
Catalog tables
All categorical attributes in the core tables reference dedicated catalog tables (cat_*). These tables define controlled vocabularies and ensure terminological consistency across the database. Each catalog table contains the complete set of admissible values for a given categorical domain, which are referenced through foreign keys in the core tables.
Metadata table (meta_variables)
Provides a centralized metadata layer that documents the structure and semantics of the database. This table records the type, units, description, source, calculation method, expected range, special values and standard of each variable across all tables.
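
The parent-child relationships described above can be verified after the tables are imported into any SQL engine. The following is a minimal Python sketch using only the standard library (sqlite3). The column name id_parcela, the column inventario and the sample rows are hypothetical placeholders; the actual key names are those defined in schema.sql.

```python
import sqlite3


def find_orphans(conn, child, fk, parent, pk):
    """Return child foreign-key values with no matching row in the parent table."""
    # Identifiers are interpolated into the SQL text, so pass trusted names only.
    query = (
        f'SELECT DISTINCT c."{fk}" FROM "{child}" AS c '
        f'LEFT JOIN "{parent}" AS p ON c."{fk}" = p."{pk}" '
        f'WHERE c."{fk}" IS NOT NULL AND p."{pk}" IS NULL '
        f'ORDER BY c."{fk}"'
    )
    return [row[0] for row in conn.execute(query)]


# Toy data: plot 9 appears in parcela_inventario but not in parcelas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcelas (id_parcela INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE parcela_inventario (id_parcela INTEGER, inventario INTEGER)"
)
conn.executemany("INSERT INTO parcelas VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO parcela_inventario VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (9, 4)],
)

print(find_orphans(conn, "parcela_inventario", "id_parcela", "parcelas", "id_parcela"))
# prints [9]
```

An empty list for every child/parent pair is what the enforced foreign keys should guarantee; a non-empty result would point to a problem in the import rather than in the published schema.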


3. File format:
The database is implemented as a relational SQL database using the InnoDB storage engine with UTF-8 character encoding. The schema is compatible with standard relational database management systems and is designed to be queried and extended through common analytical environments.
The data model and contents are distributed through a set of structured files intended to facilitate inspection, documentation, and reuse:
   * meta_variables.xlsx: Excel file containing the meta-variables table, which documents each variable in the database, including its definition, units, description, source, calculation method, and expected range.
   * catalogos.xlsx: Excel file containing one worksheet per categorical variable catalogue (e.g. status, origin, texture, treatment). Each worksheet corresponds exactly to a categorical reference table in the database and preserves its identifiers and labels.
   * nucleo_csv/: Folder containing the core data tables of the database, with one CSV file per table (e.g. especies.csv, grupos.csv, parcelas.csv, parcela_inventario.csv…). Each file corresponds to a single relational table, preserving the original table name, column structure, and field definitions, and allowing the complete datasets to be distributed without the row limitations imposed by spreadsheet formats.
   * schema.sql: SQL file containing the complete database schema (DDL), defining tables, primary keys, relationships, and constraints. This file allows the database structure to be recreated in a compatible SQL environment without requiring access to the original database instance.
Together, these files provide a transparent and portable representation of the database structure, metadata, reference catalogues, and core datasets, enabling reproducibility and integration in external analytical workflows.
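
For a quick inspection without a MySQL/MariaDB server, the CSV files could be loaded into an in-memory SQLite database. The sketch below is one possible approach, not part of the distributed workflow. It assumes comma-delimited UTF-8 files with a header row, and the function name load_csv_folder is illustrative. Columns are created without types, so all values are stored as text; schema.sql remains the authoritative definition of types and constraints.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_folder(folder, conn):
    """Create one untyped table per CSV file and return {table: row_count}."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        table = path.stem  # e.g. parcelas.csv -> table "parcelas"
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader)  # first row holds the column names
            columns = ", ".join(f'"{name}"' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
            # Stream the remaining rows straight from the file.
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
        counts[table] = conn.execute(
            f'SELECT COUNT(*) FROM "{table}"'
        ).fetchone()[0]
    conn.commit()
    return counts
```

A call such as load_csv_folder("nucleo_csv", sqlite3.connect(":memory:")) would then return the number of rows loaded per table, which can be compared against the row counts of the original files.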


-----------------------------------------------
METHODOLOGICAL INFORMATION
----------------------------------------------
1. Instrument- or software-specific information needed to interpret/reproduce the data, please indicate their location:
The dataset was generated using a combination of relational database technologies and geospatial data processing platforms. Data integration, processing and harmonization were performed using Python (SQLAlchemy) and Google Earth Engine for the extraction and aggregation of satellite-derived and climatic variables.
The relational structure follows a standard SQL schema compatible with MySQL/MariaDB systems. No proprietary software is required to access or query the database.
The dataset is physically hosted at the Supercomputing Center of Castilla y León (SCAYLE), while the dataset description, metadata and persistent access information are provided through the GREDOS institutional repository of the University of Salamanca. Reproduction of the data derivation workflows requires access to Google Earth Engine and the original open-access source datasets referenced in the documentation.


2. Describe any quality-assurance procedures performed on the data:
Multiple quality-assurance procedures were applied throughout the data integration process. Referential integrity was enforced through a fully normalized relational schema with explicitly defined primary and foreign keys, preventing orphan records and invalid relationships.
Controlled vocabularies were implemented through dedicated catalog tables to ensure semantic consistency across categorical variables. Spatial and temporal consistency checks were applied during the integration of inventory, climatic and satellite-derived information.
All variables are documented in a centralized metadata table describing definitions, units, calculation methods, data sources and expected value ranges, enabling full analytical traceability and reproducibility. The dataset is curated and disseminated following institutional data management practices of the University of Salamanca.
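
The expected ranges and special values recorded in meta_variables can also be used for a simple plausibility check on any numeric variable. The sketch below is a hedged illustration only: the function name out_of_range, the sentinel value -9999 and the sample values are hypothetical, and the bounds must be taken from the corresponding meta_variables entry. The range of -1 to 1 is the theoretical range of a normalized vegetation index such as NDVI.

```python
def out_of_range(values, lower, upper, special=()):
    """Return (index, value) pairs that fall outside the closed range [lower, upper]."""
    flagged = []
    for index, value in enumerate(values):
        if value is None or value in special:
            continue  # missing value or documented special value, not a violation
        if not lower <= value <= upper:
            flagged.append((index, value))
    return flagged


# Hypothetical NDVI-like values; -9999 stands in for a documented special value.
sample = [0.2, 1.4, -9999, None, -0.1]
print(out_of_range(sample, -1.0, 1.0, special=(-9999,)))
# prints [(1, 1.4)]
```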


3. Author contact information:


✓ Name: Ana-Belén Gil-González  
✓ Institution: Universidad de Salamanca - BISITE Group  
✓ Email: abg@usal.es  
✓ ORCID: 0000-0001-7235-6151