Accessing the IUPAC Gold Book API in Python
About this interactive recipe
Author: Stuart Chalk
Reviewer: Sam Munday
Topics: The IUPAC Gold Book, APIs, JSON
Format: Interactive Jupyter Notebook (Python)
Scenarios: Retrieve the definition of a chemical concept via code
Skills: You should be familiar with
Learning outcomes: After completing this example you should understand:
Python functions (‘def’ code blocks)
How to write Python code to request data from a URL (typically an API)
How to use a Python variable to call an API and download data
Citation: ‘Accessing the IUPAC Gold Book API in Python’, Stuart Chalk, The IUPAC FAIR Chemistry Cookbook, Contributed: 2023-02-28 https://w3id.org/ifcc/IFCC003.
Reuse: This notebook is made available under a CC-BY-4.0 license.
Step 1: Import needed Python packages
Python has a lot of functionality that can be brought into a notebook using the ‘import’ statement.
import requests # package to get data from a URL
import json # package to read/write/display JSON formatted data
import re # package to use regular expression (regex) searching
Step 2: Add a Python function
This function removes HTML tags from textual data. It uses a regular expression to detect HTML tags (e.g., the bold text “I am surrounded by HTML tags” is really <b>I am surrounded by HTML tags</b> in the page code).
# Source: https://medium.com/@jorlugaqui/how-to-strip-html-tags-from-a-string-in-python-7cb81a2bbf44
def remove_html_tags(text): # a 'def' is a (defined) function that can be called later
    clean = re.compile('<.*?>') # sets up a regular expression to search with
    return re.sub(clean, '', text) # removes the matches to the regular expression
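To see what the function does in practice, here is a small, self-contained check. The first input is the raw title that appears in the Step 4 output; the second is an illustrative string (not from the Gold Book) showing a limitation of this simple pattern.

```python
import re

def remove_html_tags(text):
    """Strip anything that looks like an HTML tag, as in Step 2."""
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)

# A Gold Book title with italic markup (taken from the Step 4 output)
raw_title = '<i>cis</i>-<i>trans</i> isomers'
print(remove_html_tags(raw_title))  # → cis-trans isomers

# Caveat: the pattern cannot tell a real tag from a literal '<' ... '>' pair,
# so everything between the two symbols is removed as well
print(remove_html_tags('a < b and c > d'))  # → a  d
```

For titles that only carry simple markup such as italics this is sufficient; text that may contain literal angle brackets would need a proper HTML parser instead.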
Step 3: Download a JSON file
Download data for all the IUPAC Recommended Terms currently available. Although the download is fairly large (804 kB), it is better to fetch all the data once than to call the API on every pass through a loop. This makes the ‘for’ loop in Step 4 much faster.
allpath = "https://goldbook.iupac.org/terms/index/all/json" # URL of the IUPAC Gold Book API listing of all terms
reqdata = requests.get(allpath) # download file in JSON
terms = json.loads(reqdata.content) # convert JSON to a Python dictionary
print(str(len(terms['terms']['list'])) + ' terms') # print the number of terms in the list
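Note that json.loads raises a JSONDecodeError when the server returns something other than JSON, for example an empty body or an HTML error page. The following is a minimal sketch of a more defensive download, not part of the original recipe: the helper names parse_json_body and fetch_json, and the 30-second timeout, are assumptions chosen for illustration.

```python
import json

def parse_json_body(body, source="response"):
    """Decode a JSON body, reporting what was actually received on failure."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        preview = body[:80]  # first part of the body, to show what came back
        raise ValueError(
            f"{source} did not contain JSON; it starts with {preview!r}"
        ) from err

def fetch_json(url, timeout=30):
    """Download a URL and decode it as JSON (hypothetical helper)."""
    import requests  # imported here so parse_json_body works without the package
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an HTTPError on 4xx/5xx status codes
    return parse_json_body(response.content, source=url)

# The decoder on its own, with a valid body and with an HTML error page
print(parse_json_body(b'{"terms": {"list": {}}}'))
try:
    parse_json_body(b"<html>Service unavailable</html>", source="example")
except ValueError as err:
    print(err)
```

With these helpers, the download in this step could be written as terms = fetch_json(allpath), so that a failed request reports the status code or the unexpected content rather than a bare decoding error.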
Step 4: Search for a term
Here we search the recommended term list and, if the term is present, get the term’s code. We use the function above to ‘normalize’ the titles of the Gold Book entries by removing the HTML markup, so that they match the term we are looking for. (Note: not all term titles contain HTML.)
searchterm = "cis-trans isomers" # the term to be found
searchcode = None # empty variable to contain the searchcode
rawtitle = None # empty variable to contain the raw title string
for code, term in terms['terms']['list'].items(): # iterate over each term in the list (code (str), term (obj))
    cleaned = remove_html_tags(term['title']) # remove any HTML formatting in the title
    if cleaned == searchterm: # check if the term matches the one we want
        searchcode = code # if it does, get the code for the term
        rawtitle = term['title'] # save the raw title so we can see it below
        break # we have found the term, so we can get out of the for loop
print(rawtitle) # raw title, including any HTML markup (None if not found)
print(searchcode) # IUPAC Gold Book term code (None if not found)
<i>cis</i>-<i>trans</i> isomers
C01093
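The search loop can also be wrapped in a function so that it is easy to reuse. The sketch below is one possible arrangement, not the recipe's own code: find_term_code and its ignore_case option are hypothetical names, and sample_list is a small stand-in for terms['terms']['list'] whose second entry is made up.

```python
import re

def remove_html_tags(text):
    """Strip HTML tags, as in Step 2."""
    return re.sub(re.compile('<.*?>'), '', text)

def find_term_code(term_list, searchterm, ignore_case=False):
    """Return (code, raw title) for the first matching title, or (None, None)."""
    wanted = searchterm.lower() if ignore_case else searchterm
    for code, term in term_list.items():
        cleaned = remove_html_tags(term['title'])
        if ignore_case:
            cleaned = cleaned.lower()
        if cleaned == wanted:
            return code, term['title']
    return None, None  # nothing matched

# Stand-in for terms['terms']['list']; the 'X00000' entry is made up
sample_list = {
    'C01093': {'title': '<i>cis</i>-<i>trans</i> isomers'},
    'X00000': {'title': 'made-up term'},
}

print(find_term_code(sample_list, 'cis-trans isomers'))
print(find_term_code(sample_list, 'Cis-Trans Isomers', ignore_case=True))
print(find_term_code(sample_list, 'not in the list'))
```

Returning (None, None) for a missing term makes it straightforward to check the result before building the URL in the next step.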
Step 5: Use the term code to retrieve its definition
Generate a URL to get data about a term, then print out the term, its code, and its definition.
path = "https://goldbook.iupac.org/terms/view/**/json" # URL path to the IUPAC Gold Book API for a term
reqdata = requests.get(path.replace("**", searchcode)) # request data from the Gold Book server
jsondata = json.loads(reqdata.content) # get the downloaded JSON
print(jsondata) # print out all the downloaded data, so we can 'see' its structure and know how to get the definition
{'term': {'id': '01093', 'doi': '10.1351/goldbook.C01093', 'code': 'C01093', 'status': 'current', 'longtitle': 'IUPAC Gold Book - cis-trans isomers', 'title': '<i>cis</i>-<i>trans</i> isomers', 'version': '2.3.3', 'lastupdated': '2014-02-24', 'definitions': [{'id': '1', 'text': 'Stereoisomeric olefins or cycloalkanes (or hetero-analogues) which differ in the positions of atoms (or groups) relative to a reference plane: in the cis-isomer the atoms are on the same side, in the trans-isomer they are on opposite sides. [image: molecular structures showing cis/trans isomerism]', 'chemicals': [{'type': 'chemimage', 'title': 'molecular structures showing cis/trans isomerism', 'file': 'https://goldbook.iupac.org/img/inline/C01093.png'}], 'links': [{'title': 'Stereoisomeric', 'type': 'internal', 'url': 'https://goldbook.iupac.org/terms/view/S05983'}, {'title': 'olefins', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/O04281'}, {'title': 'cycloalkanes', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/C01497'}, {'title': 'isomer', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/I03289'}, {'title': 'trans', 'type': 'goldify', 'url': 'https://goldbook.iupac.org/terms/view/C01092'}], 'sources': ["PAC, 1996, 68, 2193. 
'Basic terminology of stereochemistry (IUPAC Recommendations 1996)' on page 2204 (https://doi.org/10.1351/pac199668122193)"]}], 'referencedin': [{'title': 'Wikipedia - Cis-trans izomerie (cs)', 'url': 'https://cs.wikipedia.org/wiki/Cis-trans_izomerie'}, {'title': 'Wikipedia - Cis-trans izoméria (sk)', 'url': 'https://sk.wikipedia.org/wiki/Cis-trans_izoméria'}, {'title': 'Wikipedia - Cis–trans isomerism (en)', 'url': 'https://en.wikipedia.org/wiki/Cis–trans_isomerism'}, {'title': 'Wikipedia - Isomeria (it)', 'url': 'https://it.wikipedia.org/wiki/Isomeria'}, {'title': 'Wikipedia - Isomeria cis-trans (it)', 'url': 'https://it.wikipedia.org/wiki/Isomeria_cis-trans'}, {'title': 'Wikipedia - Isomeria geométrica (pt)', 'url': 'https://pt.wikipedia.org/wiki/Isomeria_geométrica'}, {'title': 'Wikipedia - Isomería cis-trans (es)', 'url': 'https://es.wikipedia.org/wiki/Isomería_cis-trans'}, {'title': 'Wikipedia - Talk:Isomer (en)', 'url': 'https://en.wikipedia.org/wiki/Talk:Isomer'}, {'title': 'Wikipedia - Talk:Stereoisomerism (en)', 'url': 'https://en.wikipedia.org/wiki/Talk:Stereoisomerism'}, {'title': 'Wikipedia - Цис–транс ізомерія (uk)', 'url': 'https://uk.wikipedia.org/wiki/Цис–транс_ізомерія'}, {'title': 'Wikipedia - ایزومری سیس–ترانس (fa)', 'url': 'https://fa.wikipedia.org/wiki/ایزومری_سیس–ترانس'}, {'title': 'Wikipedia - 顺反异构 (zh)', 'url': 'https://zh.wikipedia.org/wiki/顺反异构'}], 'links': {'html': 'https://goldbook.iupac.org/terms/view/C01093/html', 'json': 'https://goldbook.iupac.org/terms/view/C01093/json', 'xml': 'https://goldbook.iupac.org/terms/view/C01093/xml', 'plain': 'https://goldbook.iupac.org/terms/view/C01093/plain', 'pdf': 'https://goldbook.iupac.org/terms/view/C01093/pdf'}, 'citeas': 'IUPAC. Compendium of Chemical Terminology, 2nd ed. (the "Gold Book"). Compiled by A. D. McNaught and A. Wilkinson. Blackwell Scientific Publications, Oxford (1997). Online version (2019-) created by S. J. Chalk. ISBN 0-9678550-9-8. 
https://doi.org/10.1351/goldbook.', 'license': 'Licensed under Creative Commons Attribution-NoDerivatives (CC BY-NC-ND) 4.0 International (https://creativecommons.org/licenses/by-nc-nd/4.0/)', 'collection reuse': 'For parties interested in reusing the entire IUPAC Gold Book please contact IUPAC here https://www.cognitoforms.com/IUPAC1/ContactInformationForm', 'disclaimer': 'The International Union of Pure and Applied Chemistry (IUPAC) is continuously reviewing and, where needed, updating terms in the Compendium of Chemical Terminology (the IUPAC Gold Book). Users of these terms are encouraged to include the version of a term with its use and to check regularly for updates to term definitions that you are using.', 'accessed': '2023-03-02T15:58:17+00:00'}}
print(searchterm + " (" + searchcode + ")") # print the title and Gold Book term code
print(jsondata['term']['definitions'][0]['text']) # extract out and print the definition of the term (compare to above)
cis-trans isomers (C01093)
Stereoisomeric olefins or cycloalkanes (or hetero-analogues) which differ in the positions of atoms (or groups) relative to a reference plane: in the cis-isomer the atoms are on the same side, in the trans-isomer they are on opposite sides. [image: molecular structures showing cis/trans isomerism]
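A term record can hold more than one definition, each with its own sources, so it can be useful to collect them all rather than index [0] directly. The following sketch assumes the structure printed above; term_url and extract_definitions are hypothetical helper names, and the sample record is a heavily shortened copy of the C01093 output.

```python
def term_url(code):
    """Build the JSON URL for a Gold Book term code."""
    return f"https://goldbook.iupac.org/terms/view/{code}/json"

def extract_definitions(jsondata):
    """Return a list of (definition text, sources) pairs; empty if none are present."""
    term = jsondata.get('term', {})
    return [(d.get('text', ''), d.get('sources', []))
            for d in term.get('definitions', [])]

# Shortened copy of the record printed above (text and source truncated here)
sample = {
    'term': {
        'code': 'C01093',
        'definitions': [
            {'id': '1',
             'text': 'Stereoisomeric olefins or cycloalkanes (or hetero-analogues) ...',
             'sources': ['PAC, 1996, 68, 2193.']},
        ],
    }
}

print(term_url('C01093'))
for text, sources in extract_definitions(sample):
    print(text)
    print('Sources:', '; '.join(sources))
```

Using .get with defaults means a record without a 'definitions' key yields an empty list instead of raising a KeyError.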
Step 6: Try other terms
Change the value of the ‘searchterm’ variable in Step 4 and rerun Steps 4 and 5.
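When trying many terms, one option is to build a lookup table once and query it repeatedly instead of rescanning the list each time. This is a sketch under assumptions: build_title_index is a hypothetical helper, matching is made case-insensitive by lower-casing, and sample_list stands in for terms['terms']['list'] with a made-up second entry.

```python
import re

def remove_html_tags(text):
    """Strip HTML tags, as in Step 2."""
    return re.sub(re.compile('<.*?>'), '', text)

def build_title_index(term_list):
    """Map each cleaned, lower-cased title to its term code."""
    return {remove_html_tags(term['title']).lower(): code
            for code, term in term_list.items()}

# Stand-in for terms['terms']['list']; the 'X00000' entry is made up
sample_list = {
    'C01093': {'title': '<i>cis</i>-<i>trans</i> isomers'},
    'X00000': {'title': 'made-up term'},
}

index = build_title_index(sample_list)
for searchterm in ['cis-trans isomers', 'Made-up term', 'missing term']:
    # .get returns None when the term is not in the index
    print(searchterm, '->', index.get(searchterm.lower()))
```

If two entries clean to the same title, the later one overwrites the earlier one in the index, so this approach suits quick exploration rather than exhaustive matching.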