IUPAC FAIRSpec GitHub Project pages

Welcome to the GitHub project web pages for IUPAC Project 2019-031-1-024, Development of a Standard for FAIR Data Management for Spectroscopic Data. At this site we highlight our development of IUPAC FAIRSpec Finding Aids.

CDX/CDXML specification

But first, here are 352 pages, retrieved from multiple points on the Wayback Machine, that preserve the CDX/CDXML specification. (Note that some images were not retrievable.) This specification is important because it is the basis for one of the most widely used and accessible formats in the chemistry community for communicating structural information. The specification is detailed and extensive, and it has been implemented in Jmol, allowing an extractor to use it as a basis for creating value-added structure representations such as molecular formulas, InChI strings, SMILES, and MOL-format descriptions, which can be used for both display and validation. The pages are also available as a ZIP file.

examples/examples2/icl-14635

This "metadata crawler" demonstration illustrates how we can take a single DOI, " ", and produce a complete IUPAC FAIRSpec Finding Aid simply by following metadata at DataCite. The data set is from Imperial College London. The only calls to the actual repository in creating the Finding Aid were HTTP HEAD requests for information about each data item: its preferred file name, media type (such as "image/png"), and number of bytes. The program used to create the IUPAC FAIRSpec Finding Aid was DOICrawler.java, subclassed as ICLDOICrawler in order to handle a few idiosyncrasies of that repository.
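As a rough illustration of the HEAD-only step described above, the following Java sketch pulls those three values out of a response's header map. The class HeadInfo and its method fromHeaders are hypothetical names invented for this example and are not taken from DOICrawler.java. The sketch also assumes that the repository reports the preferred file name in a Content-Disposition header, which is not stated above, and it handles only the plain filename= form of that header.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for the three values read from a HEAD response. */
final class HeadInfo {
    final String fileName;   // from Content-Disposition, or null if absent
    final String mediaType;  // from Content-Type without parameters, or null if absent
    final long byteLength;   // from Content-Length, or -1 if absent or unparseable

    // Matches filename=value or filename="value"; the extended filename*= form is ignored.
    private static final Pattern FILENAME =
            Pattern.compile("filename\\s*=\\s*\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    HeadInfo(String fileName, String mediaType, long byteLength) {
        this.fileName = fileName;
        this.mediaType = mediaType;
        this.byteLength = byteLength;
    }

    /** Builds a HeadInfo from a header map (header name to list of values). */
    static HeadInfo fromHeaders(Map<String, List<String>> headers) {
        String fileName = null;
        String disposition = first(headers, "Content-Disposition");
        if (disposition != null) {
            Matcher m = FILENAME.matcher(disposition);
            if (m.find()) {
                fileName = m.group(1).trim();
            }
        }

        String mediaType = null;
        String contentType = first(headers, "Content-Type");
        if (contentType != null) {
            // Drop parameters such as "; charset=utf-8".
            mediaType = contentType.split(";", 2)[0].trim();
        }

        long byteLength = -1;
        String length = first(headers, "Content-Length");
        if (length != null) {
            try {
                byteLength = Long.parseLong(length.trim());
            } catch (NumberFormatException e) {
                byteLength = -1; // leave as "unknown"
            }
        }
        return new HeadInfo(fileName, mediaType, byteLength);
    }

    /** Case-insensitive lookup of the first value of a header; tolerates a null key. */
    private static String first(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String key = e.getKey();
            List<String> values = e.getValue();
            if (key != null && key.equalsIgnoreCase(name)
                    && values != null && !values.isEmpty()) {
                return values.get(0);
            }
        }
        return null;
    }
}
```

The header map has the shape returned by HttpURLConnection.getHeaderFields() after setRequestMethod("HEAD"), so a caller can obtain these values without transferring the data item itself.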

The resultant landing page (the link above) utilizes about 400 Java classes transpiled into JavaScript using the Eclipse java2script transpiler, running the SwingJS implementation of Java 8. These JavaScript (né Java) classes provide substructure searching for the page. Input uses a hybrid JME (Java Molecular Editor)/OCL (OpenChemLib) structure-drawing interface; matching is carried out using a small Jmol-derived SMILES processing package, also transpiled to JavaScript from Java.

examples/v1-demo

Our first demonstration of a prototypical finding aid (2021), giving an idea of what a minimal extraction of metadata might look like in JSON format.

examples/v4-acs

A more sophisticated demonstration involving finding aids generated from a set of fourteen supporting information data sets from the ACS FAIRData pilot study (2023). This demonstration illustrates how supporting information ZIP files in a variety of formats can be extracted for metadata.

examples/v5-icl-repository-DOI-crawl

This demonstration highlights what can be done by "crawling" DataCite metadata records and by searching compounds using substructure searches. The demonstration focuses on the interconnected DataCite metadata records of a highly curated collection at the high-performance computing repository at Imperial College London. This collection contains data relating to 60 compounds associated with the article Syntheses and Characterization of Main Group, Transition Metal, Lanthanide, and Actinide Complexes of Bidentate Acylpyrazolone Ligands, by Thomas Mies, Andrew J. P. White, Henry S. Rzepa, Luciano Barluzzi, Mohit Devgan, Richard A. Layfield, and Anthony G. M. Barrett.

The IUPAC FAIRSpec Finding Aid accesses 323 DOI-referenced pages in the repository backed by DataCite metadata records pointing to 1843 distinct digital items, including 390 CDXML drawings, 294 MOL files, 140 complete Bruker NMR datasets, 144 JCAMP-DX files, and 328 PNG images.

The sample landing page for the Finding Aid uses only the information in the DataCite metadata records retrieved from DataCite by DOICrawler.java, which was initiated using only the DOI string for the main repository page, "10.14469/hpc/10386". Thus, for this proof-of-concept, no files from the repository were actually downloaded in the creation of the Finding Aid or the generation of the landing page itself.
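The idea of starting from one DOI and following linked metadata records can be sketched as a breadth-first walk. This is only a minimal sketch under stated assumptions, not the logic of DOICrawler.java: the class DoiCrawlSketch, its methods, and the relatedDois lookup are hypothetical names, and the lookup stands in for whatever code fetches a DataCite record and lists the DOIs it references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical breadth-first crawl over DOI-linked metadata records. */
final class DoiCrawlSketch {

    /**
     * Visits every DOI reachable from rootDoi exactly once.
     *
     * @param rootDoi     the starting DOI string
     * @param relatedDois assumed lookup returning the DOIs referenced by a
     *                    record's metadata (may return null for "none")
     * @return the visited DOIs in the order they were first discovered
     */
    static List<String> crawl(String rootDoi, Function<String, List<String>> relatedDois) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();

        String root = normalize(rootDoi);
        visited.add(root);
        queue.add(root);

        while (!queue.isEmpty()) {
            String doi = queue.remove();
            List<String> related = relatedDois.apply(doi);
            if (related == null) {
                continue; // leaf record: nothing further to follow
            }
            for (String next : related) {
                String n = normalize(next);
                // Set.add returns false for a DOI already seen, which breaks cycles.
                if (visited.add(n)) {
                    queue.add(n);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    /** DOIs are case-insensitive, so compare them in a single case. */
    static String normalize(String doi) {
        return doi.trim().toLowerCase(Locale.ROOT);
    }
}
```

A call such as DoiCrawlSketch.crawl("10.14469/hpc/10386", lookup) would return the root DOI followed by every record discovered through it. The visited set ensures that records referencing each other, or their parent collection, are processed only once.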

The landing page uses JSME to create SMARTS substructure searches and JSmol to do the SMILES-string searching. Because of the inorganic nature of the compounds, not all compounds in this collection have fully validated SMILES strings, but this does not prevent SMARTS searching of the metal center, ligands, or associated solvents. In addition, the page provides access to nmrdb for optional NMR spectrum prediction.

View the IUPAC FAIRSpec Finding Aid

Additional output from DOICrawler.java:

ifd-fileURLMap.txt
crawler.log
devnotes.txt

IFDExtractor.jar (beta)

This demonstration illustrates how IFDExtractor can create an IUPAC FAIRSpec Finding Aid and an associated example landing page from a structured set of information that defines the key aspects of a data collection. Now in beta testing, IFDExtractor.java requires a small configuration file to do its job. We are currently working on providing a handful of templates for this file; right now it has to be constructed by hand for each dataset being processed.

The command line looks like this:

java -jar IFDExtractor.jar IFD.extract-test.json

and produces this page.