IUPAC FAIRSpec GitHub Pages

IUPAC FAIRSpec GitHub Project pages

Welcome to the GitHub project web pages for IUPAC Project 2019-031-1-024, Development of a Standard for FAIR Data Management for Spectroscopic Data. At this site we highlight our development of IUPAC FAIRSpec Finding Aids.

See the publications page for published and submitted papers.

CDX/CDXML specification

But first: 352 pages of the CDX/CDXML specification, retrieved from multiple snapshots on the Internet Archive's Wayback Machine. (Note that some images were not retrievable.) This specification is important because it underlies one of the most widely used and accessible formats in the chemistry community for communicating structural information. The specification is detailed and extensive. It has been implemented in Jmol, allowing an extractor to use CDX and CDXML files as a basis for creating value-added structure representations such as molecular formulas, InChI strings, SMILES, and MOL-format descriptions, which can be used for both display and validation. The pages are also available as a ZIP file.
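Before an extractor can do anything with a ChemDraw file, it has to tell the binary CDX form from the XML-based CDXML form. The Java sketch below is a hypothetical helper, not Jmol's code. It assumes, as we read the specification, that binary CDX files begin with the ASCII signature `VjCD0100`, and otherwise simply looks for a `<CDXML` root element in the first bytes of the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not Jmol code): tell binary CDX from XML-based CDXML. */
final class ChemDrawSniffer {

    // Assumed 8-byte ASCII signature at the start of every binary CDX file.
    private static final byte[] CDX_MAGIC = "VjCD0100".getBytes(StandardCharsets.US_ASCII);

    /** Classifies the first bytes of a file as "CDX", "CDXML", or "unknown". */
    static String sniff(byte[] head) {
        if (head.length >= CDX_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, CDX_MAGIC.length), CDX_MAGIC)) {
            return "CDX";
        }
        // CDXML is ordinary XML whose root element is <CDXML>; this only works
        // if the buffer is long enough to reach the root element.
        String text = new String(head, StandardCharsets.UTF_8);
        return text.contains("<CDXML") ? "CDXML" : "unknown";
    }

    public static void main(String[] args) {
        byte[] cdx = "VjCD0100".getBytes(StandardCharsets.US_ASCII);
        byte[] cdxml = "<?xml version=\"1.0\"?><CDXML></CDXML>".getBytes(StandardCharsets.UTF_8);
        System.out.println(sniff(cdx));   // CDX
        System.out.println(sniff(cdxml)); // CDXML
    }
}
```

Sniffing content rather than trusting the file extension matters for supporting-information collections, where file names are not always reliable.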

NOTE

It is important to note that, while the two earliest demonstrations (2021 and 2023) use their own specialized landing pages, all of the other demonstrations use identical HTML for their landing pages. Only the finding aids differ.

DEMOS

Demo 2025.7 nmRxiv-pb

This example was created with a relatively simple configuration (wrapped here for readability):


     {"FAIRSpec.extractor.object":"{id=IFD.property.fairspec.compound.id::*}.zip|
            {IFD.representation.dataobject.fairspec.nmr.vendor_dataset::
                {IFD.property.dataobject.id::*/*}}/"
     }

The repository collection consists of four ZIP files containing fourteen spectra; there are no structures. The configuration line tells the extractor to create data objects by looking for NMR datasets in directories within these ZIP files. The resulting IUPAC FAIRSpec Data Collection includes extracted PNG "thumbnail" images along with Bruker dataset ZIP files that can be unzipped and read back into TopSpin.
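One way to picture what the configuration pattern asks for is as a pair of regular expressions: one for the ZIP file name and one for the directory path inside it. The Java sketch below is only a hypothetical reading of the pattern, not the extractor's actual parser. It assumes that each `*` matches a single path segment, that `|` separates the ZIP file name from the path inside the ZIP, and that the trailing `/` marks a directory; the file and entry names used are made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reading of the Demo 2025.7 extractor pattern (assumptions in comments). */
final class ExtractorPatternSketch {

    /** Compound ID from the ZIP name, data-object ID from the entry path. */
    static final class DatasetMatch {
        final String compoundId;
        final String dataObjectId;

        DatasetMatch(String compoundId, String dataObjectId) {
            this.compoundId = compoundId;
            this.dataObjectId = dataObjectId;
        }
    }

    // Assumed reading of "{...compound.id::*}.zip": the ZIP name minus ".zip" is the compound ID.
    private static final Pattern ZIP_NAME = Pattern.compile("([^/]+)\\.zip");

    // Assumed reading of "{...dataobject.id::*/*}/": a directory exactly two levels deep
    // inside the ZIP is one vendor (Bruker) dataset, and its path is the data-object ID.
    private static final Pattern DATASET_DIR = Pattern.compile("([^/]+/[^/]+)/");

    static Optional<DatasetMatch> match(String zipFileName, String zipEntryName) {
        Matcher zip = ZIP_NAME.matcher(zipFileName);
        Matcher dir = DATASET_DIR.matcher(zipEntryName);
        if (zip.matches() && dir.matches()) {
            return Optional.of(new DatasetMatch(zip.group(1), dir.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up names, for illustration only.
        match("cmpd-1.zip", "sample-A/10/").ifPresent(m ->
                System.out.println(m.compoundId + " -> " + m.dataObjectId)); // cmpd-1 -> sample-A/10
        System.out.println(match("cmpd-1.zip", "sample-A/10/fid").isPresent()); // false
    }
}
```

Files inside a dataset directory (such as `fid`) do not match on their own; under this reading they belong to the data object identified by their parent directory.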

Demo 2025.6 examples2/v6-crawler2

This third crawler demonstration combines crawling of DataCite metadata records with extraction of the files themselves. The DataCite DOI records are followed as before, but unlike the v5 example (Demo 2025.5), here all digital objects are also downloaded into a temporary directory so that their metadata can be extracted. The command line used was:


       java -jar ICLDOICrawler.jar "10.14469/hpc/10386" "c:/temp/iupac/crawler" -insitu -extractSpecProperties

The process creates an especially metadata-rich IUPAC FAIRSpec Finding Aid for a repository, in effect converting the repository from a FAIRSpec-ready data collection to an IUPAC FAIRSpec Data Collection.

The landing page and associated files are completely portable. You can work with them locally by downloading and unzipping v6-crawler2/10.14469_hpc_10386.zip.

Demo 2025.5 examples2/icl-10386-crawl-only

A second demonstration of what can be done by "crawling" metadata-rich DataCite records.

The demonstration focuses on the interconnected DataCite metadata records of a highly curated collection at the high-performance computing repository at Imperial College London. This collection contains data relating to 57 compounds associated with the article "Syntheses and Characterization of Main Group, Transition Metal, Lanthanide, and Actinide Complexes of Bidentate Acylpyrazolone Ligands" by Thomas Mies, Andrew J. P. White, Henry S. Rzepa, Luciano Barluzzi, Mohit Devgan, Richard A. Layfield, and Anthony G. M. Barrett.

The IUPAC FAIRSpec Finding Aid accesses 354 DOI-referenced pages in the repository backed by DataCite metadata records pointing to 1354 distinct digital items, including 244 CDXML drawings, 209 MOL files, 146 complete Bruker NMR datasets, 144 JCAMP-DX files, and 375 PNG images.

The sample landing page for the Finding Aid uses only the information in the DataCite metadata records retrieved from DataCite by ICLDOICrawler.java, which was initiated using only the DOI string for the main repository page. The command line used was:

	java -jar ICLDOICrawler.jar "10.14469/hpc/10386" "c:/temp/iupac/crawler" -insitu
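Starting from one DOI and reaching 354 pages amounts to a graph traversal over DOI relations. The following Java sketch shows a minimal breadth-first version; it is not ICLDOICrawler's code. The lookup function is a hypothetical placeholder for a real DataCite query (for example, reading a record's related identifiers from the DataCite REST API), and the DOIs in `main` are made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Minimal breadth-first crawl over DOI relations (a sketch, not ICLDOICrawler). */
final class DoiCrawlSketch {

    /**
     * @param rootDoi     the single starting DOI
     * @param relatedDois hypothetical lookup returning the DOIs a record points to
     * @return every DOI reached, in the order first visited
     */
    static Set<String> crawl(String rootDoi, Function<String, List<String>> relatedDois) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(rootDoi.toLowerCase(Locale.ROOT)); // DOI names are case-insensitive
        while (!queue.isEmpty()) {
            String doi = queue.poll();
            if (!visited.add(doi)) {
                continue; // already reached by another path; this also breaks cycles
            }
            for (String next : relatedDois.apply(doi)) {
                queue.add(next.toLowerCase(Locale.ROOT));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up DOIs standing in for DataCite records.
        Map<String, List<String>> relations = Map.of(
                "10.0000/root", List.of("10.0000/a", "10.0000/b"),
                "10.0000/a", List.of("10.0000/b", "10.0000/ROOT"));
        System.out.println(crawl("10.0000/ROOT",
                doi -> relations.getOrDefault(doi, List.of())));
        // [10.0000/root, 10.0000/a, 10.0000/b]
    }
}
```

Normalizing case matters in practice: the same repository DOI appears in this document both as "10.14469/HPC/14635" and in lowercase form.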

For this proof of concept, the only files downloaded from the repository were PNG images (the -insitu flag). Internalizing all PNG images as data URIs within the IUPAC FAIRSpec Finding Aid allows the page to load and display rapidly, without accessing the repository itself for images. The cost is a somewhat larger JSON document (17 MB), still considerably smaller than the 75.7 GB of data in this repository collection.
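A data URI carries the image bytes inline as base64 text, which is why the Finding Aid grows: base64 turns every 3 bytes into 4 characters. The Java sketch below illustrates both points with standard library calls only; it is an illustration, not the crawler's own code.

```java
import java.util.Base64;

/** Sketch: embedding an image in JSON as a data URI, and the size cost of doing so. */
final class DataUriSketch {

    /** Builds an RFC 2397 data URI with base64-encoded content. */
    static String toDataUri(String mediaType, byte[] content) {
        return "data:" + mediaType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    /** Length of padded base64 output for n input bytes: 4 * ceil(n / 3). */
    static long base64Length(long n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        // The 8-byte PNG file signature stands in for a real image here.
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(toDataUri("image/png", pngSignature)); // data:image/png;base64,iVBORw0KGgo=
        System.out.println(base64Length(3_000_000)); // 4000000
    }
}
```

So roughly 12.75 MB of PNG data would become about 17 MB of embedded text, which is the order of growth to expect when images are internalized this way.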

The landing page uses JME-SwingJS to create SMARTS substructure searches and a minimal implementation of Jmol-SwingJS to carry out the SMILES-string searching. Because many of the compounds are inorganic, not all compounds in this collection have fully validated SMILES strings. This does not, however, prevent SMARTS searching for the metal center, ligands, or associated solvents. In addition, the page provides access to nmrdb for optional NMR spectrum prediction.

Demo 2025.4 examples2/index.htm?url=icl-14635.json

This example combines the remote-access idea of Demo 2025.2 with a crawler-based IUPAC FAIRSpec Finding Aid (from Demo 2025.3). Unlike the previous url= example, here there is no stand-alone IUPAC FAIRSpec Data Collection. Instead, the repository itself serves as the IUPAC FAIRSpec Data Collection and is called directly for all referenced digital objects. The finding aid referring to the collection can be located anywhere on the web.

Demo 2025.3 examples2/icl-14635

This "metadata crawler" demonstration illustrates how we can take a single DOI, "10.14469/HPC/14635", and produce a complete IUPAC FAIRSpec Finding Aid simply by following metadata stored at DataCite. The data set is from Imperial College London. The only calls to the repository itself in creating the Finding Aid were HTTP HEAD requests to obtain information about each data item: preferred file name, media type (such as "image/png"), and file size in bytes. The program used to create the IUPAC FAIRSpec Finding Aid was DOICrawler.java, subclassed as ICLDOICrawler to handle a few idiosyncrasies of that particular repository. This page displays more slowly than the other demos: images were not downloaded during metadata processing, so displaying structure images requires repeated access to the repository while the page is being built.
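An HTTP HEAD request returns only the response headers, so file name, media type, and size can be collected without downloading any data. The Java sketch below uses the standard `java.net.http` client; it is an illustration, not DOICrawler's code. It assumes the server reports the preferred file name in a `Content-Disposition` header of the simple `filename="..."` form and the size in `Content-Length`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: collect name, media type, and size for one data item via HTTP HEAD. */
final class HeadInfoSketch {

    // Simple filename="..." parameter only; RFC 6266 "filename*=" forms are not handled.
    private static final Pattern FILENAME = Pattern.compile("filename=\"?([^\";]+)\"?");

    /** Sends a HEAD request (no body is downloaded) and returns the response headers. */
    static HttpHeaders head(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).headers();
    }

    static String fileName(HttpHeaders h) {
        Matcher m = FILENAME.matcher(h.firstValue("Content-Disposition").orElse(""));
        return m.find() ? m.group(1) : null;
    }

    static String mediaType(HttpHeaders h) {
        return h.firstValue("Content-Type").orElse(null);
    }

    /** Size in bytes, or -1 if the server did not report Content-Length. */
    static long size(HttpHeaders h) {
        return h.firstValueAsLong("Content-Length").orElse(-1L);
    }

    public static void main(String[] args) throws Exception {
        HttpHeaders h = head(args[0]); // URL of one data item, given on the command line
        System.out.println(fileName(h) + " | " + mediaType(h) + " | " + size(h));
    }
}
```

Keeping header parsing separate from the network call makes the parsing testable offline against headers built with `HttpHeaders.of`.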

Demo 2025.2 examples2/index.htm?url=[path to finding aid]

This example demonstrates a generic web page that loads an IUPAC FAIRSpec Finding Aid from a URL on this server, specified using the ?url= query parameter.
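The landing page does this in JavaScript; purely to illustrate the logic, the Java sketch below reads the `url` query parameter and resolves it against the page's own address, so a bare file name such as `icl-14635.json` is found next to `index.htm`, while a full (percent-encoded) URL is used as-is. The host `example.org` and the class name are placeholders.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Sketch of ?url= handling: find the query value and resolve it against the page URL. */
final class UrlParamSketch {

    /** Returns the decoded value of the named query parameter, or null if absent. */
    static String queryParam(URI page, String name) {
        String query = page.getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (URLDecoder.decode(key, StandardCharsets.UTF_8).equals(name)) {
                return eq < 0 ? "" : URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    /** Location of the finding aid: relative values resolve against the page's directory. */
    static URI findingAidLocation(URI page) {
        String target = queryParam(page, "url");
        return target == null ? null : page.resolve(target);
    }

    public static void main(String[] args) {
        URI page = URI.create("https://example.org/examples2/index.htm?url=icl-14635.json");
        System.out.println(findingAidLocation(page)); // https://example.org/examples2/icl-14635.json
    }
}
```

When the finding aid sits on a different host, a browser will only let the page fetch it if that server permits cross-origin (CORS) requests.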

Demo 2025.1 examples2/v6-acs

A much richer demonstration involving advanced finding aids created using ExtractorTestACS.java. The extractor generated fourteen web pages (thirteen from a set of supporting information datasets from the ACS FAIRData pilot study, and one from a repository at Cambridge University). The pages include the capability to search for properties of compounds, structures, and spectra. This demonstration illustrates the IUPAC FAIRSpec Metadata Object Model Specification, version 0.1.0 (2025.08.15).

Demo 2024.1 examples2/v6-stolaf

This demonstration features the addition of an IFD_METADATA file within a Bruker dataset to automatically generate IFDSample objects and display spectra by sample as well as by compound. The data are from a summer organic chemistry lab at St. Olaf College, where undergraduate students were assigned the task of determining the structure of an unknown compound. They used the St. Olaf Bruker Avance 400 NMR instrument, with pre-assigned slots in a 120-position BACS autosampler, working through the remote-access, web-based OleNMR system, which interfaces directly with IconNMR. The system required entry of a sample ID. Later in the semester, students were also encouraged to provide a proposed structure, drawn using JSME, also within the OleNMR interface. OleNMR automatically added the sample ID and structure (as "structure.mol") to the primary Bruker dataset directory.

The IFD_METADATA file in this case is just a single line, for example:


    sample_id=A5-Ex.6A-230613

Metadata extraction and construction of an IUPAC FAIRSpec Data Collection and associated IUPAC FAIRSpec Finding Aid were carried out based on a simple configuration file (IFD_extract.json) that indicated, among other things, the source of the originating sample identifiers:


     {"FAIRSpec.extractor.related_metadata" : "IFD_METADATA"},
     {"FAIRSpec.extractor.related_metadata_map" : {"sample_id":"IFD.property.dataobject.originating_sample_id"}},

Thus, these two files, IFD_METADATA and structure.mol, provided all the additional metadata needed for the extractor program (ExtractorTestSTO.jar) to create the IUPAC FAIRSpec Data Collection, the IUPAC FAIRSpec Finding Aid, and the HTML landing page for this demonstration.
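The two configuration entries amount to a two-step lookup: read `key=value` pairs from IFD_METADATA, then rename each key according to `related_metadata_map`. The Java sketch below shows that idea; it is not ExtractorTestSTO's code. It assumes IFD_METADATA follows simple `key=value` lines, so `java.util.Properties` can parse it (note that `Properties` also treats `:` as a separator and interprets backslash escapes).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Sketch: map IFD_METADATA key=value lines onto IFD property names. */
final class IfdMetadataSketch {

    /**
     * @param ifdMetadataText     contents of an IFD_METADATA file (assumed key=value lines)
     * @param relatedMetadataMap  file key -> IFD property, as in related_metadata_map
     * @return IFD property -> value, for keys present in the file
     */
    static Map<String, String> apply(String ifdMetadataText, Map<String, String> relatedMetadataMap) {
        Properties raw = new Properties();
        try {
            raw.load(new StringReader(ifdMetadataText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : relatedMetadataMap.entrySet()) {
            String value = raw.getProperty(e.getKey());
            if (value != null) {
                out.put(e.getValue(), value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> map = Map.of("sample_id", "IFD.property.dataobject.originating_sample_id");
        System.out.println(apply("sample_id=A5-Ex.6A-230613\n", map));
        // {IFD.property.dataobject.originating_sample_id=A5-Ex.6A-230613}
    }
}
```

Keeping the mapping in configuration rather than code means a different lab could use its own key names in IFD_METADATA without changing the extractor.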

Demo 2023.1 examples/v4-acs

A more sophisticated demonstration allowing for substructure and text searching, as well as the creation of predicted spectra based on SMILES strings. This demonstration illustrates how metadata can be extracted from supporting information ZIP files in a variety of formats.

The page provides substructure searching using about 400 Java classes transpiled into JavaScript with the Eclipse-based java2script transpiler and running in SwingJS, a JavaScript implementation of Java 8. These JavaScript (né Java) classes provide substructure searching for the page using SMILES strings created from the MOL, CDX, and CDXML files associated with the spectra in the collection. Input uses a hybrid JME (Java Molecular Editor)/OCL (OpenChemLib) structure-drawing and analysis interface; matching is carried out using a small Jmol-derived SMILES-processing package, also transpiled from Java to JavaScript. The finding aids in this demo use an early (now deprecated) alpha version of the specification, version 0.0.5.

Demo 2021.1 examples/v1-demo

Our first demonstration of a very early prototypical finding aid. It was generated from a set of supporting information data sets from the ACS FAIRData pilot study (2020). This demo just gave an idea of what a minimal extraction of metadata might look like in JSON format.