Accessing the IUPAC Gold Book API in Python

Step 1: Import needed Python packages

Python has a lot of functionality that can be added to a script with the ‘import’ statement.

import requests                             # package to get data from a URL
import json                                 # package to read/write/display JSON formatted data
import re                                   # package to use regular expression (regex) searching

Step 2: Add a Python function

This function removes HTML tags from text. It uses a regular expression to detect the tags. For example, bold text on a page is written as <b>I am surrounded by HTML tags</b> in the page code.

# Source: https://medium.com/@jorlugaqui/how-to-strip-html-tags-from-a-string-in-python-7cb81a2bbf44
def remove_html_tags(text):                 # a 'def' is a (defined) function that can be called later
    clean = re.compile('<.*?>')             # sets up a regular expression to search with
    return re.sub(clean, '', text)          # removes the matches to the regular expression
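
As a quick check of the function above, the sketch below applies it to the raw title that the Gold Book returns for C01093 (visible in the Step 5 output) and to a string without any tags. The function is repeated so the block runs on its own; the variable names are only for illustration.

```python
import re

def remove_html_tags(text):
    """Strip anything that looks like an HTML tag, as in the function above."""
    return re.sub(r'<.*?>', '', text)

raw_title = '<i>cis</i>-<i>trans</i> isomers'   # raw title of term C01093
plain_title = remove_html_tags(raw_title)       # tags removed, text kept
print(plain_title)

untouched = remove_html_tags('isomer')          # text without tags passes through unchanged
print(untouched)
```

Note that the pattern treats any text between a '<' and the next '>' as a tag, so a title containing a literal '<' followed later by '>' would lose that span as well.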

Step 3: Download a JSON file

Download data for all the IUPAC Recommended Terms currently available. Although the download is large (804 kB), it is better to get all the data at once than to call the API on every pass through a loop. This makes the ‘for’ loop in Step 4 much faster.

allpath = "https://goldbook.iupac.org/terms/index/all/json"  # URL to download all terms from the IUPAC Gold Book API
reqdata = requests.get(allpath)                              # download file in JSON
terms = json.loads(reqdata.content)                          # convert JSON to a Python dictionary
print(str(len(terms['terms']['list'])) + ' terms')           # print the number of terms in the list
7036 terms
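
The download above assumes that the request succeeds. A slightly more defensive variant, sketched here, sets a timeout and raises an error on an HTTP error status before parsing the body. The helper name `fetch_json` and the 30-second timeout are illustrative choices, not part of the Gold Book API or of this recipe.

```python
import requests

def fetch_json(url, timeout=30):
    """Download `url` and return the parsed JSON body.

    `fetch_json` and the default timeout are illustrative assumptions.
    """
    response = requests.get(url, timeout=timeout)   # stop waiting if the server does not answer in time
    response.raise_for_status()                     # raise requests.HTTPError on 4xx/5xx status codes
    return response.json()                          # parse the JSON body into Python objects
```

With this helper, the download in Step 3 could be written as `terms = fetch_json(allpath)`.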

Step 4: Search for a term

Here we search the list of recommended terms and, if the term is present, get its code. The function above ‘normalizes’ the titles of the Gold Book entries by removing the HTML markup, so that they can match the term we are looking for. (Note: not all term titles contain HTML.)

searchterm = "cis-trans isomers"                            # the term to be found
searchcode = None                                           # empty variable to contain the searchcode
rawtitle = None                                             # empty variable to contain the raw title string
for code, term in terms['terms']['list'].items():           # iterate over each term in the list (code (str), term (obj))
    cleaned = remove_html_tags(term['title'])               # remove any HTML formatting in the title
    if cleaned == searchterm:                               # check if the term matches the one we want
        searchcode = code                                   # if it does, get the code for the term
        rawtitle = term['title']                            # save the raw title so we can see it below
        break                                               # we have found the term, so we can get out of the for loop
print(rawtitle)                                             # raw title of the term, including any HTML (if found)
print(searchcode)                                           # IUPAC Gold Book term code (if found)
<i>cis</i>-<i>trans</i> isomers
C01093
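
If the search term is not in the list, the loop above leaves `searchcode` as `None`, and Step 5 then fails with a `TypeError` when it tries to insert `None` into the URL. One way to make that case explicit is to wrap the search in a function that returns `None` when nothing matches. The sketch below also ignores letter case, which the original loop does not do. The function name `find_term_code` and the two-entry sample list are hypothetical; only the C01093 entry is taken from the output shown in this recipe.

```python
import re

def remove_html_tags(text):
    """Same tag-stripping helper as in Step 2."""
    return re.sub(r'<.*?>', '', text)

def find_term_code(term_list, searchterm):
    """Return (code, raw_title) for the first matching title, or None if no title matches."""
    wanted = searchterm.strip().lower()
    for code, term in term_list.items():
        if remove_html_tags(term['title']).strip().lower() == wanted:
            return code, term['title']
    return None

# Hypothetical stand-in for terms['terms']['list'];
# 'X00000' is a made-up placeholder, C01093 comes from the recipe output.
sample_list = {
    'X00000': {'title': 'placeholder term'},
    'C01093': {'title': '<i>cis</i>-<i>trans</i> isomers'},
}

match = find_term_code(sample_list, 'Cis-Trans Isomers')   # matches despite different letter case
missing = find_term_code(sample_list, 'no such term')      # nothing matches, so None is returned
print(match)
print(missing)
```

Checking the result for `None` before moving on to Step 5 avoids requesting a URL for a term that was never found.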

Step 5: Use the term code to retrieve its definition

Generate the URL for the term, download its data, and print the term, its code, and its definition.

path = "https://goldbook.iupac.org/terms/view/**/json"      # URL path to the IUPAC Gold Book API for a term
reqdata = requests.get(path.replace("**", searchcode))      # request data from the Gold Book server
jsondata = json.loads(reqdata.content)                      # get the downloaded JSON
print(jsondata)                                             # print out all the downloaded data, so we can 'see' its structure and know how to get the definition
{'term': {'id': '01093', 'doi': '10.1351/goldbook.C01093', 'code': 'C01093', 'status': 'current', 'longtitle': 'IUPAC Gold Book - cis-trans isomers', 'title': '<i>cis</i>-<i>trans</i> isomers', 'termversion': '2.3.3', 'lastupdated': '2014-02-24', 'definitions': [{'id': '1', 'text': 'Stereoisomeric olefins or cycloalkanes (or hetero-analogues) which differ in the positions of atoms (or groups) relative to a reference plane: in the cis-isomer the atoms are on the same side, in the trans-isomer they are on opposite sides. [image: molecular structures showing cis/trans isomerism]', 'chemicals': [{'type': 'chemimage', 'title': 'molecular structures showing cis/trans isomerism', 'file': 'https://goldbook.iupac.org/img/inline/C01093.png'}], 'links': [{'title': 'Stereoisomeric', 'type': 'internal', 'url': 'https://goldbook.iupac.org/terms/view/S05983'}, {'title': 'olefins', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/O04281'}, {'title': 'cycloalkanes', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/C01497'}, {'title': 'isomer', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/I03289'}, {'title': 'trans', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/C01092'}], 'sources': ["PAC, 1996, 68, 2193. 
'Basic terminology of stereochemistry (IUPAC Recommendations 1996)' on page 2204 (https://doi.org/10.1351/pac199668122193)"]}], 'referencedin': [{'title': 'Wikipedia - Cis-trans izomerie (cs)', 'url': 'https://cs.wikipedia.org/wiki/Cis-trans_izomerie'}, {'title': 'Wikipedia - Cis-trans izoméria (sk)', 'url': 'https://sk.wikipedia.org/wiki/Cis-trans_izoméria'}, {'title': 'Wikipedia - Cis–trans isomerism (en)', 'url': 'https://en.wikipedia.org/wiki/Cis–trans_isomerism'}, {'title': 'Wikipedia - Isomeria (it)', 'url': 'https://it.wikipedia.org/wiki/Isomeria'}, {'title': 'Wikipedia - Isomeria cis-trans (it)', 'url': 'https://it.wikipedia.org/wiki/Isomeria_cis-trans'}, {'title': 'Wikipedia - Isomeria geométrica (pt)', 'url': 'https://pt.wikipedia.org/wiki/Isomeria_geométrica'}, {'title': 'Wikipedia - Isomería cis-trans (es)', 'url': 'https://es.wikipedia.org/wiki/Isomería_cis-trans'}, {'title': 'Wikipedia - Talk:Isomer (en)', 'url': 'https://en.wikipedia.org/wiki/Talk:Isomer'}, {'title': 'Wikipedia - Talk:Stereoisomerism (en)', 'url': 'https://en.wikipedia.org/wiki/Talk:Stereoisomerism'}, {'title': 'Wikipedia - Цис-транс изомеризъм (bg)', 'url': 'https://bg.wikipedia.org/wiki/Цис-транс_изомеризъм'}, {'title': 'Wikipedia - Цис–транс ізомерія (uk)', 'url': 'https://uk.wikipedia.org/wiki/Цис–транс_ізомерія'}, {'title': 'Wikipedia - ایزومری سیس–ترانس (fa)', 'url': 'https://fa.wikipedia.org/wiki/ایزومری_سیس–ترانس'}, {'title': 'Wikipedia - 顺反异构 (zh)', 'url': 'https://zh.wikipedia.org/wiki/顺反异构'}], 'links': {'html': 'https://goldbook.iupac.org/terms/view/C01093/html', 'json': 'https://goldbook.iupac.org/terms/view/C01093/json', 'xml': 'https://goldbook.iupac.org/terms/view/C01093/xml', 'plain': 'https://goldbook.iupac.org/terms/view/C01093/plain', 'pdf': 'https://goldbook.iupac.org/terms/view/C01093/pdf'}, 'citation': "Citation: '<i>cis</i>-<i>trans</i> isomers' in IUPAC Compendium of Chemical Terminology, 3rd ed. International Union of Pure and Applied Chemistry; 2006. 
Online version 3.0.1, 2019. 10.1351/goldbook.C01093", 'license': 'The IUPAC Gold Book is licensed under Creative Commons Attribution-ShareAlike CC BY-SA 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/) for individual terms.', 'collection': 'If you are interested in licensing the Gold Book for commercial use, please contact the IUPAC Executive Director at executivedirector@iupac.org .', 'disclaimer': 'The International Union of Pure and Applied Chemistry (IUPAC) is continuously reviewing and, where needed, updating terms in the Compendium of Chemical Terminology (the IUPAC Gold Book). Users of these terms are encouraged to include the version of a term with its use and to check regularly for updates to term definitions that you are using.', 'accessed': '2024-03-15T20:24:43+00:00'}}
print(searchterm + " (" + searchcode + ")")                 # print the title and Gold Book term code
print(jsondata['term']['definitions'][0]['text'])           # extract out and print the definition of the term (compare to above)
cis-trans isomers (C01093)
Stereoisomeric olefins or cycloalkanes (or hetero-analogues) which differ in the positions of atoms (or groups) relative to a reference plane: in the cis-isomer the atoms are on the same side, in the trans-isomer they are on opposite sides. [image: molecular structures showing cis/trans isomerism]
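
In the output above, 'definitions' is a list, so indexing with `[0]` returns only its first entry. Assuming that other terms may carry several definitions, or none (this recipe only shows a term with one), the sketch below collects every definition text and returns an empty list when the expected keys are missing. The function name `get_definitions` is hypothetical, and the sample record is a shortened copy of the JSON printed above.

```python
def get_definitions(term_json):
    """Return the text of every definition in a Gold Book term record."""
    definitions = term_json.get('term', {}).get('definitions', [])
    return [d['text'] for d in definitions if 'text' in d]

# Shortened copy of the record printed above (definition text abbreviated)
sample_record = {
    'term': {
        'code': 'C01093',
        'title': '<i>cis</i>-<i>trans</i> isomers',
        'definitions': [
            {'id': '1', 'text': 'Stereoisomeric olefins or cycloalkanes (abbreviated here)'},
        ],
    }
}

texts = get_definitions(sample_record)
for number, text in enumerate(texts, start=1):   # number the definitions from 1
    print(str(number) + ': ' + text)

print(get_definitions({}))                       # a record without the expected keys gives []
```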

Step 6: Try other terms

Change the value of the ‘searchterm’ variable in Step 4, then rerun Steps 4 and 5.
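
When trying many terms in a row, rerunning the loop each time repeats the same work. A possible alternative, sketched here, builds a dictionary from tag-free title to term code once and then looks each term up directly. The function name `build_title_index` and the sample list are hypothetical; in the recipe the real input would be `terms['terms']['list']`.

```python
import re

def build_title_index(term_list):
    """Map each tag-free title to its term code."""
    tag = re.compile(r'<.*?>')
    return {tag.sub('', term['title']): code for code, term in term_list.items()}

# Hypothetical stand-in for terms['terms']['list'];
# 'X00000' is a made-up placeholder, C01093 comes from the recipe output.
sample_list = {
    'X00000': {'title': 'placeholder term'},
    'C01093': {'title': '<i>cis</i>-<i>trans</i> isomers'},
}

title_index = build_title_index(sample_list)
for name in ['cis-trans isomers', 'not in the list']:
    print(name, '->', title_index.get(name))     # .get returns None for unknown titles
```

If two terms share the same title, the dictionary keeps the last one, whereas the loop in Step 4 stops at the first.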