Tethys RDR manages research-relevant entities and their metadata using a relational database management system (PostgreSQL). The database model includes specific data structures, tables, and relationships designed to handle complex scientific data, including metadata and data provenance. To ensure data integrity, the database model includes data constraints and validation rules, access controls that limit who can make changes to the data, timestamps, and checksums such as MD5 and SHA-512 (see R14).
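As a minimal sketch of how such fixity information could be computed at ingest, assuming Python and a hypothetical storage path (the actual Tethys implementation is not shown in this document):

```python
import hashlib
from datetime import datetime, timezone

def compute_fixity(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute MD5 and SHA-512 checksums of a file in chunks."""
    md5, sha512 = hashlib.md5(), hashlib.sha512()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha512.update(chunk)
    # A timestamp is stored alongside the checksums, as described above.
    return {
        "md5": md5.hexdigest(),
        "sha512": sha512.hexdigest(),
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical usage at ingest time:
# fixity = compute_fixity("/buckets/tethys/dataset-123/data.nc")
```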
To comply with international metadata standards such as Dublin Core, DataCite and ISO 19139, the relevant information is delivered directly from the database and mapped on the fly via XSLT transformation. By mapping Tethys RDR metadata to these international standards, other systems can more easily exchange and integrate metadata, which in turn facilitates the discovery, access, and reuse of data and other resources (e.g. OGC CSW metadata harvesting (ISO 19139) from the Tethys RDR OAI-PMH endpoint).
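A minimal sketch of such an on-the-fly XSLT mapping, assuming Python with lxml and hypothetical stylesheet file names (one per target standard):

```python
from lxml import etree

def map_to_standard(source_xml: bytes, stylesheet_path: str) -> bytes:
    """Apply an XSLT stylesheet to the repository's internal metadata XML."""
    transform = etree.XSLT(etree.parse(stylesheet_path))
    result = transform(etree.fromstring(source_xml))
    return etree.tostring(result, pretty_print=True,
                          xml_declaration=True, encoding="UTF-8")

# Hypothetical stylesheets, one per target standard:
# datacite_xml = map_to_standard(internal_xml, "xslt/internal-to-datacite.xsl")
# iso19139_xml = map_to_standard(internal_xml, "xslt/internal-to-iso19139.xsl")
```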
The repository uses DataCite Fabrica services to assign Digital Object Identifiers (DOIs) to ensure the accessibility and authenticity of the data. DOIs are not assigned automatically after submission, but only after the repository's staff (editors, reviewers) have checked the data publication for completeness and correctness. In addition to the data itself, a typical Tethys RDR dataset always contains metadata and, if necessary, a detailed description, e.g. in the form of a PDF file containing information about the data structure, suitability, geometry, citations, and provenance.
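DataCite Fabrica is the web front end of the DataCite REST API; the following is a hedged sketch of how a DOI could be registered programmatically after curation, with a placeholder prefix and credentials, not the actual Tethys workflow:

```python
import requests

DATACITE_API = "https://api.datacite.org/dois"

def register_doi(doi: str, landing_page: str, metadata: dict, auth: tuple) -> dict:
    """Register a findable DOI via the DataCite REST API (JSON:API payload)."""
    payload = {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,          # e.g. "10.xxxx/tethys.123" (placeholder prefix)
                "event": "publish",  # "publish" makes the DOI findable
                "url": landing_page, # landing page of the curated dataset
                **metadata,          # titles, creators, publicationYear, ...
            },
        }
    }
    resp = requests.post(DATACITE_API, json=payload,
                         headers={"Content-Type": "application/vnd.api+json"},
                         auth=auth)  # repository account credentials
    resp.raise_for_status()
    return resp.json()
```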
Tethys RDR offers a user-friendly interface that simplifies metadata entry and record uploading. The use of input masks, help menus, suggestions, and automatic validation makes data entry and uploading more efficient while reducing the likelihood of errors.
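A minimal sketch of the kind of automatic validation mentioned above, with hypothetical required fields (the actual Tethys input rules are not listed in this document):

```python
# Hypothetical required fields for a submission record.
REQUIRED_FIELDS = ("title", "creators", "publication_year", "license")

def validate_submission(record: dict) -> list[str]:
    """Return human-readable validation errors (empty list means valid)."""
    errors = [f"missing field: {field}"
              for field in REQUIRED_FIELDS if not record.get(field)]
    year = record.get("publication_year")
    if isinstance(year, int) and not 1900 <= year <= 2100:
        errors.append("publication_year out of plausible range")
    return errors

# validate_submission({"title": "Borehole logs"}) reports the three
# remaining missing fields.
```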
After publication, submitters cannot update their data files or metadata without the involvement of Tethys RDR staff. The curation process ensures that any updates made to a dataset are properly reviewed, approved, and documented, thereby maintaining the accuracy and integrity of the repository's contents. Major changes to the file(s), title, or creators result in a new version that cannot be published without undergoing another curation process. The new and the previous dataset are cross-referenced in their respective descriptive metadata. To conveniently track the different versions of a dataset, the Tethys RDR frontend also provides a drop-down menu for viewing and selecting all available versions of the dataset. The original DOI of the published dataset always leads to the latest version.
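In DataCite metadata, such cross-references are expressed as relatedIdentifier entries; a sketch of how the two versions could be linked (the relation types IsNewVersionOf and IsPreviousVersionOf are part of the DataCite schema):

```python
def cross_reference(new_doi: str, previous_doi: str) -> tuple[dict, dict]:
    """Build DataCite relatedIdentifier entries linking two dataset versions."""
    link_to_previous = {  # goes into the new version's metadata
        "relatedIdentifier": previous_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsNewVersionOf",
    }
    link_to_new = {  # goes into the previous version's metadata
        "relatedIdentifier": new_doi,
        "relatedIdentifierType": "DOI",
        "relationType": "IsPreviousVersionOf",
    }
    return link_to_previous, link_to_new
```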
The actual data objects in Tethys are stored as data series (a series of data points in numerical, date/time, string or binary form). Each data entry in a data series refers to metadata about the object (a simplified sketch follows this list):
type of data point (numeric, date, string, binary file)
responsible scientist (PI)
methodology
for binary files, also hashes, file size, and the absolute location in the bucket store
for numerical data, also format information such as significant digits
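A simplified sketch of such a data series entry, assuming Python dataclasses with hypothetical field names (the real schema lives in PostgreSQL):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class PointType(Enum):
    NUMERIC = "numeric"
    DATE = "date"
    STRING = "string"
    BINARY = "binary"

@dataclass
class SeriesEntry:
    point_type: PointType
    responsible_pi: str                    # responsible scientist (PI)
    methodology: str
    # Only for binary files:
    sha512: Optional[str] = None
    file_size: Optional[int] = None
    bucket_location: Optional[str] = None  # absolute location in bucket store
    # Only for numerical data:
    significant_digits: Optional[int] = None
```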
Tethys's database schema is continuously adapted to new metadata standards. During this process, the already existing metadata of datasets are adapted and extended according to the new standards. Great care is taken not to introduce incompatible changes to an object's metadata.
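One common way to keep such schema changes compatible is to make them purely additive; a hedged sketch with hypothetical table and column names:

```python
# Hypothetical, additive-only migration: the new column is nullable, so
# existing metadata rows remain valid while they are gradually extended.
MIGRATION_SQL = """
ALTER TABLE dataset_metadata
    ADD COLUMN IF NOT EXISTS funder_identifier TEXT;

UPDATE dataset_metadata
   SET funder_identifier = legacy_funder      -- backfill from the old field
 WHERE funder_identifier IS NULL
   AND legacy_funder IS NOT NULL;
"""
```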
Not all of Tethys's data is provided in tabular form. Some datasets are only available in compact, community-specific binary formats such as NetCDF, GeoPackage, or static images. Long-term preservation of those formats is a complex problem, so Tethys has established format rules that apply before data in binary formats is accepted. At the moment, any of the following formats is accepted for archival. Where possible, uncompressed formats are preferred:
images: JPEG, PNG, TIFF
documents: PDF/A (preferred), ODF, OOXML
media containers:
NetCDF, preferably following the Climate and Forecast (CF) Metadata Conventions; in all other cases, detailed documentation is required
GeoPackage, preferably used for geospatial data
This list is not exhaustive; Tethys may add further formats. If any of these formats become deprecated or are replaced by later standards, Tethys will do its best to transform the data to modern replacements while still keeping the original data available.
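A minimal sketch of such an allowlist check at ingest time, assuming Python; the extensions for the ODF and OOXML families are illustrative, and a real check would also inspect the file contents (magic bytes), not just the name:

```python
from pathlib import Path

# Hypothetical allowlist derived from the formats listed above.
ACCEPTED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".tif", ".tiff",  # images
    ".pdf",                                    # PDF/A (preferred)
    ".odt", ".ods", ".odp",                    # ODF family
    ".docx", ".xlsx", ".pptx",                 # OOXML family
    ".nc",                                     # NetCDF
    ".gpkg",                                   # GeoPackage
}

def is_accepted_format(filename: str) -> bool:
    """Return True if the file extension is on the archival allowlist."""
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS
```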
In coordination with scientific communities, Tethys has developed documentation on how to harmonize metadata and data for archival. These documents also describe how long-term preservation is handled (where applicable).
To ensure physical access to archived data, the computer center of Geosphere Austria maintains the proper functioning of hardware and software systems, including data backup and the migration of data from outdated media. Geosphere Austria has implemented the following technical and organizational measures (TOM):
Fire and smoke detection systems and fire extinguisher
Server room monitoring of temperature and humidity
Server room air-conditioning
UPS system and emergency diesel generators
RAID system / hard disk mirroring in virtualization environment
Storage of backup media in physically separated, secure locations
Backup concept and existence of an emergency plan
Backup monitoring and reporting, regular checksum validation (a sketch follows this list)
Gitea is used for the documentation of all systems
User permission management
Email checking with anti-virus software
Network firewall
Intrusion Detection Systems
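A minimal sketch of the regular checksum validation mentioned above, assuming Python and a stored SHA-512 value per file (see R14):

```python
import hashlib

def verify_fixity(path: str, expected_sha512: str) -> bool:
    """Recompute a file's SHA-512 and compare it with the stored value."""
    sha512 = hashlib.sha512()
    with open(path, "rb") as f:
        while chunk := f.read(1 << 20):
            sha512.update(chunk)
    return sha512.hexdigest() == expected_sha512

# Hypothetical periodic job: iterate over all archived files and report
# any mismatch, which would trigger a restore from backup media.
```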
A transfer of custody can be managed by reducing Tethys to a file-based repository. In this case, a file-based copy of all datasets, including any binary object files, would be created and made available by Geosphere Austria. In any case, the host institutions guarantee that the data and metadata remain available for at least 10 more years after the formal decommissioning of Tethys.
For legal reasons (e.g., copyright law or Art. 17 GDPR), copyright owners, data subjects, or authorities may request that published datasets and their contents be deleted permanently. In this case, a tombstone page linked to the DOI name of the dataset is created, informing potential users about the deletion.
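With DataCite, such a tombstone can be put in place by updating the registered DOI's target URL, so the DOI itself keeps resolving; a hedged sketch using the DataCite REST API:

```python
import requests

def point_doi_to_tombstone(doi: str, tombstone_url: str, auth: tuple) -> None:
    """Update a registered DOI's target URL to the tombstone page."""
    payload = {"data": {"type": "dois", "attributes": {"url": tombstone_url}}}
    resp = requests.put(f"https://api.datacite.org/dois/{doi}", json=payload,
                        headers={"Content-Type": "application/vnd.api+json"},
                        auth=auth)
    resp.raise_for_status()
```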