Accessing the IUPAC Gold Book API in Python#

About this interactive icons recipe

Step 1: Import needed Python packages#

Python has a lot of functionality that can be imported using the ‘import’ function

import requests                             # package to get data from a URL
import json                                 # package to read/write/display JSON formatted data
import re                                   # package to use regular expression (regex) searching

Step 2: Add a Python function#

This function removes HTML tags from textual data. It uses regular expressions to detect HTML tags (e.g., I am surrounded by HTML tags is really <b>I am surrounded by HTML tags</b> in the page code).

# Source: https://medium.com/@jorlugaqui/how-to-strip-html-tags-from-a-string-in-python-7cb81a2bbf44
def remove_html_tags(text):                 # a 'def' is a (defined) function that can be called later
    clean = re.compile('<.*?>')             # sets up a regular expression to search with
    return re.sub(clean, '', text)          # removes the matches to the regular expression

Step 3: Download a JSON file#

Download data for all the IUPAC Recommended Terms currently available. Even though the amount of data that we download here is big (804 kB), it is better to get the data all at once rather than call the API every time in a loop. This makes the ‘for’ loop in Step 4 much faster.

allpath = "https://goldbook.iupac.org/terms/index/all/json"  # URL to the IUPAC Gold Book API down
reqdata = requests.get(allpath)                              # download file in JSON
terms = json.loads(reqdata.content)                          # convert JSON to a Python dictionary
print(str(len(terms['terms']['list'])) + ' terms')           # print the number of terms in the list
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Cell In[3], line 3
      1 allpath = "https://goldbook.iupac.org/terms/index/all/json"  # URL to the IUPAC Gold Book API down
      2 reqdata = requests.get(allpath)                              # download file in JSON
----> 3 terms = json.loads(reqdata.content)                          # convert JSON to a Python dictionary
      4 print(str(len(terms['terms']['list'])) + ' terms')           # print the number of terms in the list

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:
    348     cls = JSONDecoder

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Step 4: Search for a term#

Here we search the recommended term list and if present get the terms code. We use the function above to ‘normalize’ the text of the titles from the Gold Book entries, by removing the HTML markup, so they match the term we are looking for. (Note: not all term titles have HTML in them)

searchterm = "cis-trans isomers"                            # the term to be found
searchcode = None                                           # empty variable to contain the searchcode
rawtitle = None                                             # empty variable to contain the raw title string
for code, term in terms['terms']['list'].items():           # iterate over each term in the list (code (str), term (obj))
    cleaned = remove_html_tags(term['title'])               # remove any HTML formatting in the title
    if cleaned == searchterm:                               # check if the term matches the one we want
        searchcode = code                                   # if it does, get the code for the term
        rawtitle = term['title']                            # saw the raw title so we can see it below
        break                                               # we have found the term, so we can get out of the for loop
print(rawtitle)                                             # IUPAC Gold Book term code (if found)
print(searchcode)                                           # IUPAC Gold Book term code (if found)
<i>cis</i>-<i>trans</i> isomers
C01093

Step 5: Use the term code to retrieve its definition#

Generate a URL to get data about a term, print out the term, its code and its definition

path = "https://goldbook.iupac.org/terms/view/**/json"      # URL path to the IUPAC Gold Book API for a term
reqdata = requests.get(path.replace("**", searchcode))      # request data from the Gold Book server
jsondata = json.loads(reqdata.content)                      # get the downloaded JSON
print(jsondata)                                             # print out all the downloaded data, so we can 'see' its structure and know how to get the definition
{'term': {'id': '01093', 'doi': '10.1351/goldbook.C01093', 'code': 'C01093', 'status': 'current', 'longtitle': 'IUPAC Gold Book - cis-trans isomers', 'title': '<i>cis</i>-<i>trans</i> isomers', 'version': '2.3.3', 'lastupdated': '2014-02-24', 'definitions': [{'id': '1', 'text': 'Stereoisomeric olefins or cycloalkanes (or hetero-analogues) which differ in the positions of atoms (or groups) relative to a reference plane: in the cis-isomer the atoms are on the same side, in the trans-isomer they are on opposite sides. [image: molecular structures showing cis/trans isomerism]', 'chemicals': [{'type': 'chemimage', 'title': 'molecular structures showing cis/trans isomerism', 'file': 'https://goldbook.iupac.org/img/inline/C01093.png'}], 'links': [{'title': 'Stereoisomeric', 'type': 'internal', 'url': 'https://goldbook.iupac.org/terms/view/S05983'}, {'title': 'olefins', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/O04281'}, {'title': 'cycloalkanes', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/C01497'}, {'title': 'isomer', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/I03289'}, {'title': 'trans', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/C01092'}], 'sources': ["PAC, 1996, 68, 2193. 'Basic terminology of stereochemistry (IUPAC Recommendations 1996)' on page 2204 (https://doi.org/10.1351/pac199668122193)"]}], 'referencedin': [{'title': 'Wikipedia - Cis-trans izomerie (cs)', 'url': 'https://cs.wikipedia.org/wiki/Cis-trans_izomerie'}, {'title': 'Wikipedia - Cis-trans izoméria (sk)', 'url': 'https://sk.wikipedia.org/wiki/Cis-trans_izoméria'}, {'title': 'Wikipedia - Cis–trans isomerism (en)', 'url': 'https://en.wikipedia.org/wiki/Cis–trans_isomerism'}, {'title': 'Wikipedia - Isomeria (it)', 'url': 'https://it.wikipedia.org/wiki/Isomeria'}, {'title': 'Wikipedia - Isomeria cis-trans (it)', 'url': 'https://it.wikipedia.org/wiki/Isomeria_cis-trans'}, {'title': 'Wikipedia - Isomeria geométrica (pt)', 'url': 'https://pt.wikipedia.org/wiki/Isomeria_geométrica'}, {'title': 'Wikipedia - Isomería cis-trans (es)', 'url': 'https://es.wikipedia.org/wiki/Isomería_cis-trans'}, {'title': 'Wikipedia - Talk:Isomer (en)', 'url': 'https://en.wikipedia.org/wiki/Talk:Isomer'}, {'title': 'Wikipedia - Talk:Stereoisomerism (en)', 'url': 'https://en.wikipedia.org/wiki/Talk:Stereoisomerism'}, {'title': 'Wikipedia - Цис–транс ізомерія (uk)', 'url': 'https://uk.wikipedia.org/wiki/Цис–транс_ізомерія'}, {'title': 'Wikipedia - ایزومری سیس–ترانس (fa)', 'url': 'https://fa.wikipedia.org/wiki/ایزومری_سیس–ترانس'}, {'title': 'Wikipedia - 顺反异构 (zh)', 'url': 'https://zh.wikipedia.org/wiki/顺反异构'}], 'links': {'html': 'https://goldbook.iupac.org/terms/view/C01093/html', 'json': 'https://goldbook.iupac.org/terms/view/C01093/json', 'xml': 'https://goldbook.iupac.org/terms/view/C01093/xml', 'plain': 'https://goldbook.iupac.org/terms/view/C01093/plain', 'pdf': 'https://goldbook.iupac.org/terms/view/C01093/pdf'}, 'citeas': 'IUPAC. Compendium of Chemical Terminology, 2nd ed. (the "Gold Book"). Compiled by A. D. McNaught and A. Wilkinson. Blackwell Scientific Publications, Oxford (1997). Online version (2019-) created by S. J. Chalk. ISBN 0-9678550-9-8. https://doi.org/10.1351/goldbook.', 'license': 'Licensed under Creative Commons Attribution-NoDerivatives (CC BY-NC-ND) 4.0 International (https://creativecommons.org/licenses/by-nc-nd/4.0/)', 'collection reuse': 'For parties interested in reusing the entire IUPAC Gold Book please contact IUPAC here https://www.cognitoforms.com/IUPAC1/ContactInformationForm', 'disclaimer': 'The International Union of Pure and Applied Chemistry (IUPAC) is continuously reviewing and, where needed, updating terms in the Compendium of Chemical Terminology (the IUPAC Gold Book). Users of these terms are encouraged to include the version of a term with its use and to check regularly for updates to term definitions that you are using.', 'accessed': '2023-03-02T15:58:17+00:00'}}
print(searchterm + " (" + searchcode + ")")                 # print the title and Gold Book term code
print(jsondata['term']['definitions'][0]['text'])           # extract out and print the definition of the term (compare to above)
cis-trans isomers (C01093)
Stereoisomeric olefins or cycloalkanes (or hetero-analogues) which differ in the positions of atoms (or groups) relative to a reference plane: in the cis-isomer the atoms are on the same side, in the trans-isomer they are on opposite sides. [image: molecular structures showing cis/trans isomerism]

Step 6: Try other terms#

Change the value of the ‘searchterm’ variable above and rerun steps 4 and 5