Welcome to the GitHub project web pages for 
	
IUPAC Project 2019-031-1-024, Development of a Standard for FAIR Data Management for Spectroscopic Data. 
	At this site we highlight our development of IUPAC FAIRSpec Finding Aids.
	
	
See 
publications for published and submitted papers.
	
	
	But first, here are 352 pages retrieved from multiple points on the Wayback Machine, 
	preserving the CDX/CDXML specification. (Note that some images were not retrievable.)
	This specification is important, as it is the basis for one of the most
	widely used and accessible formats in the chemistry community for 
	communicating structural information. The specification is detailed and 
	extensive, and has been implemented in Jmol, allowing an extractor
	to use it as a basis for creating value-added structure representations such as 
	molecular formulas, InChI strings, SMILES, and MOL-format descriptions, 
	which can be used both for display and validation. The pages are also available
	as a 
ZIP file.
	
	
NOTE
	While the first two demonstrations, from 2021 and 2023, use their own specialized landing pages,
	
all of the other demonstrations use identical HTML for their landing page. Only the finding aids are different.
	
    
DEMOS
	
    
    
    This example was created with a relatively simple 
    
configuration (wrapped here for readability):
    
     {"FAIRSpec.extractor.object":"{id=IFD.property.fairspec.compound.id::*}.zip|
            {IFD.representation.dataobject.fairspec.nmr.vendor_dataset::
                {IFD.property.dataobject.id::*/*}}/"
     }
     
    The repository collection consists of a set of four ZIP files containing fourteen spectra; 
    there are no structures.
    The configuration line tells the extractor to create data objects by looking 
    for NMR data in directories within this set of ZIP files. 
The IUPAC FAIRSpec Data Collection created
    includes extracted PNG "thumbnail" images along with Bruker dataset zip files that can be unzipped and read back into TopSpin. 
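
As a rough illustration of how the wrapped configuration above relates to valid JSON, the following Python sketch rejoins the wrapped lines and loads the result. Reading the "|" as the boundary between the ZIP-file pattern and the path pattern inside the ZIP file is an assumption made for this sketch, and the names WRAPPED and unwrap are hypothetical; none of this is the extractor's own code.

```python
import json

# The configuration exactly as displayed on this page (wrapped for readability).
WRAPPED = '''
 {"FAIRSpec.extractor.object":"{id=IFD.property.fairspec.compound.id::*}.zip|
        {IFD.representation.dataobject.fairspec.nmr.vendor_dataset::
            {IFD.property.dataobject.id::*/*}}/"
 }
'''


def unwrap(text):
    """Rejoin the wrapped lines; JSON strings cannot contain raw newlines."""
    return "".join(line.strip() for line in text.splitlines())


config = json.loads(unwrap(WRAPPED))
pattern = config["FAIRSpec.extractor.object"]

# Assumption: "|" separates the archive pattern from the pattern inside it.
archive_part, _, inner_part = pattern.partition("|")
print(archive_part)
print(inner_part)
```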
     
   
    
    
    This third demonstration combines crawling of DataCite metadata records with extraction of the files. 
    The DataCite DOI records are followed, but unlike the v5 example, here we also download all digital objects 
    into a temporary directory in order to extract their metadata.
    The command line used was
    
       java -jar ICLDOICrawler.jar "10.14469/hpc/10386" "c:/temp/iupac/crawler" -insitu -extractSpecProperties
   
    The process creates an especially metadata-rich IUPAC FAIRSpec Finding Aid 
    for a repository, in effect converting the repository from a FAIRSpec-ready data collection 
    to an IUPAC FAIRSpec Data Collection.
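
    To give a feel for what following DataCite DOI records involves, here is a minimal Python sketch. It is not ICLDOICrawler; it assumes the public DataCite REST API endpoint and its JSON layout (data.attributes.relatedIdentifiers), and the sample record, including the child DOI, is hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the public DataCite REST API serves one JSON record per DOI here.
DATACITE_API = "https://api.datacite.org/dois/"


def record_url(doi):
    """Build the metadata URL for a DOI string."""
    return DATACITE_API + quote(doi, safe="/")


def fetch_record(doi):
    """Download and decode one metadata record (network access required)."""
    with urlopen(record_url(doi), timeout=30) as resp:
        return json.load(resp)


def related_dois(record):
    """List (relationType, DOI) pairs that a crawler could visit next."""
    attrs = record.get("data", {}).get("attributes", {})
    found = []
    for rel in attrs.get("relatedIdentifiers") or []:
        if rel.get("relatedIdentifierType") == "DOI":
            found.append((rel.get("relationType"), rel.get("relatedIdentifier")))
    return found


# Hypothetical record fragment, shaped like a DataCite response.
sample = {"data": {"attributes": {"relatedIdentifiers": [
    {"relationType": "HasPart", "relatedIdentifierType": "DOI",
     "relatedIdentifier": "10.1234/example-child"},
    {"relationType": "References", "relatedIdentifierType": "URL",
     "relatedIdentifier": "https://example.org/page"},
]}}}
print(related_dois(sample))
```

    A full crawl would call fetch_record on each DOI returned by related_dois, keeping track of the DOIs already visited.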
    
    The landing page and associated files are completely portable. You can work with them locally
    by downloading and unzipping 
v6-crawler2/10.14469_hpc_10386.zip.
   
 
	
    
	A second demonstration highlights what can be done 
	by "crawling" metadata-rich DataCite records. 
    
    	
	The demonstration focuses on the interconnected DataCite 
	metadata records of a 
highly curated collection at 
	the high-performance computing repository at 
	Imperial College London. This collection contains 
	data relating to 57 compounds associated with the
	article 
Syntheses and Characterization of Main Group, Transition Metal, Lanthanide, 
	and Actinide Complexes of Bidentate Acylpyrazolone Ligands, 
	by Thomas Mies, Andrew J. P. White, Henry S. Rzepa, 
	Luciano Barluzzi, Mohit Devgan, Richard A. Layfield, 
	and Anthony G. M. Barrett.
	
    The 
IUPAC FAIRSpec Finding Aid 
	accesses 354 DOI-referenced pages in the repository 
	backed by DataCite metadata records pointing to 1354 distinct digital
	items, including 244 CDXML drawings, 209 MOL files, 
	146 complete Bruker NMR datasets, 144 JCAMP-DX files,
	and 375 PNG images.  
	
    The sample landing page for the Finding Aid 
	uses only the information in the DataCite metadata records retrieved from 
	DataCite by 
ICLDOICrawler.java,
	which was initiated using only the DOI string for the main repository page. The command line used was: 
	
	java -jar ICLDOICrawler.jar "10.14469/hpc/10386" "c:/temp/iupac/crawler" -insitu
	For this proof-of-concept, the only files downloaded from the repository
	were PNG files (the -insitu flag).
	    The internalization of all PNG images as data URIs within the IUPAC FAIRSpec Finding Aid allows 
	    the page to load and display rapidly, without directly accessing the 
	    repository itself for images.
    This creates a somewhat larger JSON document (17 MB) -- still considerably
    smaller than the 75.7 GB of data in this repository collection. 
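
    The idea of internalizing an image as a data URI can be sketched in a few lines of Python. This is only an illustration of the general technique, not the crawler's code; the function name png_data_uri is hypothetical, and the eight-byte PNG signature stands in for a real image file.

```python
import base64


def png_data_uri(png_bytes):
    """Encode raw PNG bytes as a data URI that can be stored in a JSON string."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# Stand-in for a real image: the eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
uri = png_data_uri(PNG_SIGNATURE)
print(uri)
```

    Base64 encoding adds roughly one third to the size of each image, which is consistent with the finding aid growing when images are embedded.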
	
	
	The landing page uses 
JME-SwingJS 
	to create 
SMARTS 
	substructure searches and a minimal implementation of 
Jmol-SwingJS 
	to do the SMILES-string searching. Not all compounds in this collection
	have fully validated 
SMILES strings 
	due to the inorganic nature of the 
	compounds. But this does not prevent SMARTS searching of the metal center, ligands, or 
	associated solvents. In addition, the page provides access to 
	
nmrdb for optional NMR spectrum prediction.
	
    
    This example combines the remote-access idea of Demo 2025.2 with a crawler-based IUPAC FAIRSpec Finding Aid (from Demo 2025.3). Unlike the previous url= example,
    here there is no stand-alone IUPAC FAIRSpec Data Collection. Instead, the repository itself serves as the IUPAC FAIRSpec Data Collection 
    and is called directly for all referenced digital objects. The 
    finding aid referring to the collection can be located anywhere on the web. 
    
    
    
    This "metadata crawler"  demonstration illustrates how we 
    can take a single DOI, "10.14469/HPC/14635", and produce a 
    complete IUPAC FAIRSpec Finding Aid simply by following metadata stored at DataCite.
    The data set is from Imperial College London. 
    The only calls to the actual repository in creating the Finding Aid were
    to extract HTTPS HEAD information about each data item. This included preferred file name, 
    media type (such as "image/png"), and file size in bytes. 
    The program used 
    to create the IUPAC FAIRSpec Finding Aid was DOICrawler.java, subclassed as ICLDOICrawler 
    in order to handle a few idiosyncrasies 
    of that particular repository. This page displays more slowly than the other demos because 
    images are not downloaded during metadata processing, and display of structural images 
    thus requires repeated access to the repository during page creation.  
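
    As an illustration of the kind of HEAD request described above, the Python sketch below reads the file name, media type, and size from response headers without downloading the file. It is not DOICrawler.java; the function names and the sample header values are hypothetical, and which headers a given repository actually returns is an assumption.

```python
from email.message import Message
from urllib.request import Request, urlopen


def summarize_headers(headers):
    """Pull file name, media type, and size out of HTTP response headers."""
    msg = Message()
    for key, value in headers.items():
        msg[key] = value
    size = msg.get("Content-Length")
    return {
        "file_name": msg.get_filename(),         # from Content-Disposition
        "media_type": msg.get_content_type(),    # from Content-Type
        "size_bytes": int(size) if size is not None else None,
    }


def head_info(url):
    """Issue a HEAD request, so no file content is transferred."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=30) as resp:
        return summarize_headers(resp.headers)


# Hypothetical headers of the sort a repository might return for a PNG item.
info = summarize_headers({
    "Content-Type": "image/png",
    "Content-Length": "20480",
    "Content-Disposition": 'attachment; filename="compound1.png"',
})
print(info)
```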
        
    
    
    
    This example demonstrates a generic web page loading an IUPAC FAIRSpec Finding Aid 
    from the URL on this server using the ?url= query.
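
    A link of this kind can be built as in the following Python sketch. Both URLs are hypothetical placeholders, not the addresses used by this site; the only point is that the finding aid's URL must be percent-encoded when passed as the url query value.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def landing_page_link(page_url, finding_aid_url):
    """Append a percent-encoded ?url= query naming the finding aid to load."""
    return page_url + "?" + urlencode({"url": finding_aid_url})


# Hypothetical locations of the generic page and of a finding aid.
page = "https://example.org/demo/index.html"
finding_aid = "https://example.org/data/IFD.findingaid.json"
link = landing_page_link(page, finding_aid)
print(link)
```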
    
    
    
    A much richer demonstration involving advanced finding aids created using
    
ExtractorTestACS.java.
    The extractor generated fourteen web pages (thirteen from a set of supporting information datasets from the ACS FAIRData pilot study, and one added from a repository at Cambridge University).
    It includes the capability to 
    search for properties of compounds, structures, and spectra. 
    This demonstration illustrates the 
IUPAC FAIRSpec Metadata Object Model Specification, version 0.1.0 (2025.08.15). 
    
    
    
    
    This demonstration features the addition of an IFD_METADATA file within a Bruker dataset to automatically 
    generate IFDSample objects and display spectra by sample as well as by compound. 
        The data are from a summer organic chemistry lab at St. Olaf College.
        Undergraduate students were assigned the task of determining the structure of an unknown compound. 
        They used the St. Olaf Bruker Avance 400 NMR instrument, with pre-assigned slots in a 120-position BACS autosampler.  
        The interface was the remote-access web-based 
OleNMR system, which interfaces
        directly with IconNMR. 
        The system required entry of a sample ID. 
        Students were also encouraged (later in the semester) to provide a proposed structure, drawn 
        using JSME, also within the OleNMR interface. Sample ID and structure (as "structure.mol") were 
        automatically added to the primary Bruker dataset directory by OleNMR. 
    
    The IFD_METADATA file in this case is just a single line, for example:
    
    sample_id=A5-Ex.6A-230613
    
    
    
Metadata extraction and construction
    of an IUPAC FAIRSpec Data Collection and associated IUPAC FAIRSpec Finding Aid were carried out 
    based on a simple configuration file (
IFD_extract.json)
    that indicated, among other things,  
    the source of the originating sample identifiers:
    
     {"FAIRSpec.extractor.related_metadata" : "IFD_METADATA"},
     {"FAIRSpec.extractor.related_metadata_map" : {"sample_id":"IFD.property.dataobject.originating_sample_id"}},
    
    
    Thus, these two files, IFD_METADATA and structure.mol, provided all the additional bits of metadata 
    needed for the extractor program (
ExtractorTestSTO.jar) 
    to create the 
IUPAC FAIRSpec Data Collection, 
IUPAC FAIRSpec Finding Aid, and 
HTML landing page for this demonstration.
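
    The way the one-line IFD_METADATA file and the related_metadata_map entry fit together can be sketched in Python as follows. This is a guess at the general idea, assuming simple key=value lines; it is not the extractor's implementation, and the function names are hypothetical.

```python
def parse_ifd_metadata(text):
    """Read key=value lines, skipping blank lines and lines without '='."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def apply_map(pairs, key_map):
    """Rename the keys that appear in the map to IFD property names."""
    return {key_map[k]: v for k, v in pairs.items() if k in key_map}


# The example line and the map shown in the configuration above.
metadata_text = "sample_id=A5-Ex.6A-230613\n"
metadata_map = {"sample_id": "IFD.property.dataobject.originating_sample_id"}

properties = apply_map(parse_ifd_metadata(metadata_text), metadata_map)
print(properties)
```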
    
    
    
    A more sophisticated demonstration allowing for substructure and text searching, as well as the creation of predicted spectra based on SMILES strings.
    This demonstration illustrates how supporting information ZIP files in a variety of formats can be 
    extracted for metadata. 
    
       The page (the link above) provides substructure searching by 
       utilizing about 400 Java classes transpiled into JavaScript using the Eclipse-based 
java2script transpiler, 
       running the JavaScript SwingJS implementation of Java 8. These JavaScript (né Java) classes provide substructure searching for the page using SMILES strings created from MOL, CDX, and CDXML files associated with the spectra in the collection. 
    Input uses a hybrid JME (
Java Molecular Editor)/OCL (
OpenChemLib) structure-drawing and analysis interface; 
    matching is carried
    out using a small Jmol-derived SMILES processing package, also transpiled to JavaScript from Java. 
    The format of the finding aids in this demo is an early (now deprecated) alpha version of the specification, version 0.0.5.
    
    
    
    
    Our first demonstration of a very early prototypical finding aid. 
    It was generated from a set of supporting information data sets from the 
ACS FAIRData pilot study (2020).
    This demo just gave an idea of what a minimal extraction of metadata might look like in JSON format.