CAS Common Chemistry API

About this interactive recipe
  • Author: Vincent Scalfani

  • Reviewer: Stuart Chalk

  • Topics: How to interact with the CAS Common Chemistry API using Python.

  • Attribution: Adapted from the MIT-licensed University of Alabama Scholarly API Cookbook. Use of the CAS Common Chemistry API requires registration: https://www.cas.org/services/commonchemistry-api. Example data shown is credited to CAS Common Chemistry, which is licensed under the CC BY-NC 4.0 license.

  • Format: Interactive Jupyter Notebook (Python)

  • Scenarios: You are searching for identifiers and general properties of common chemical substances.

  • Skills: You should be familiar with

  • Learning outcomes: After completing this example you should understand:

    • What kind of data is available through the CAS Common Chemistry API

    • How to interact with the CAS Common Chemistry API using Python

  • Citation: ‘CAS Common Chemistry API’, Vincent Scalfani, The IUPAC FAIR Chemistry Cookbook, Contributed: 2024-02-14 https://w3id.org/ifcc/IFCC011.

  • Reuse: This notebook is made available under a CC-BY-4.0 license.

1. Common Chemistry Record Detail Retrieval

Information about substances in CAS Common Chemistry can be retrieved using the /detail API and a CAS RN identifier:

Import libraries

import requests
from pprint import pprint

Setup API parameters

detail_base_url = "https://commonchemistry.cas.org/api/detail?"
casrn1 = "10094-36-7" # ethyl cyclohexanepropionate
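The query string can also be assembled with the standard library instead of string concatenation, which keeps the parameter name and value explicit and takes care of any escaping. A minimal sketch, assuming a hypothetical helper named `build_detail_url` and a base URL written without the trailing `?`:

```python
from urllib.parse import urlencode

# Base endpoint without the trailing "?" (an assumption of this sketch)
DETAIL_ENDPOINT = "https://commonchemistry.cas.org/api/detail"


def build_detail_url(cas_rn: str) -> str:
    """Return the /detail request URL for one CAS RN (hypothetical helper)."""
    return DETAIL_ENDPOINT + "?" + urlencode({"cas_rn": cas_rn})


url = build_detail_url("10094-36-7")
print(url)
# https://commonchemistry.cas.org/api/detail?cas_rn=10094-36-7
```

When using requests directly, passing `params={"cas_rn": casrn1}` to `requests.get` builds the same query string.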

Request data from CAS Common Chemistry Detail API

casrn1_data = requests.get(detail_base_url + "cas_rn=" + casrn1).json()
pprint(casrn1_data)
{'canonicalSmile': 'O=C(OCC)CCC1CCCCC1',
 'experimentalProperties': [{'name': 'Boiling Point',
                             'property': '105-113 °C @ Press: 17 Torr',
                             'sourceNumber': 1}],
 'hasMolfile': True,
 'images': ['<svg width="215" viewBox="0 0 215 101" style="fill-opacity:1; '
            'color-rendering:auto; color-interpolation:auto; '
            'text-rendering:auto; stroke:black; stroke-linecap:square; '
            'stroke-miterlimit:10; shape-rendering:auto; stroke-opacity:1; '
            'fill:black; stroke-dasharray:none; font-weight:normal; '
            "stroke-width:1; font-family:'Open Sans'; font-style:normal; "
            'stroke-linejoin:miter; font-size:12; stroke-dashoffset:0; '
            'image-rendering:auto;" height="101" class="cas-substance-image" '
            'xmlns:xlink="http://www.w3.org/1999/xlink" '
            'xmlns="http://www.w3.org/2000/svg"><svg '
            'class="cas-substance-single-component"><rect y="0" x="0" '
            'width="215" stroke="none" ry="7" rx="7" height="101" fill="white" '
            'class="cas-substance-group"/><svg y="0" x="0" width="215" '
            'viewBox="0 0 215 101" style="fill:black;" height="101" '
            'class="cas-substance-single-component-image"><svg><g><g '
            'transform="translate(107,49)" '
            'style="text-rendering:geometricPrecision; '
            'color-rendering:optimizeQuality; color-interpolation:linearRGB; '
            'stroke-linecap:butt; image-rendering:optimizeQuality;"><line '
            'y2="0" y1="15" x2="0" x1="-25.98" style="fill:none;"/><line '
            'y2="0" y1="15" x2="-51.963" x1="-25.98" style="fill:none;"/><line '
            'y2="15" y1="0" x2="25.98" x1="0" style="fill:none;"/><line '
            'y2="3.1886" y1="15" x2="46.4398" x1="25.98" '
            'style="fill:none;"/><line y2="38.5234" y1="13.9896" x2="24.23" '
            'x1="24.23" style="fill:none;"/><line y2="38.5234" y1="13.9897" '
            'x2="27.73" x1="27.73" style="fill:none;"/><line y2="15" '
            'y1="3.1786" x2="77.943" x1="57.4684" style="fill:none;"/><line '
            'y2="0" y1="15" x2="103.923" x1="77.943" style="fill:none;"/><line '
            'y2="15" y1="0" x2="-77.943" x1="-51.963" '
            'style="fill:none;"/><line y2="-30" y1="0" x2="-51.963" '
            'x1="-51.963" style="fill:none;"/><line y2="0" y1="15" '
            'x2="-103.923" x1="-77.943" style="fill:none;"/><line y2="-45" '
            'y1="-30" x2="-77.943" x1="-51.963" style="fill:none;"/><line '
            'y2="-30" y1="0" x2="-103.923" x1="-103.923" '
            'style="fill:none;"/><line y2="-30" y1="-45" x2="-103.923" '
            'x1="-77.943" style="fill:none;"/><path style="fill:none; '
            'stroke-miterlimit:5;" d="M-25.547 14.75 L-25.98 15 L-26.413 '
            '14.75"/><path style="fill:none; stroke-miterlimit:5;" d="M-0.433 '
            '0.25 L0 0 L0.433 0.25"/><path style="fill:none; '
            'stroke-miterlimit:5;" d="M25.547 14.75 L25.98 15 L26.413 '
            '14.75"/></g><g transform="translate(107,49)" '
            'style="stroke-linecap:butt; fill:rgb(230,0,0); '
            'text-rendering:geometricPrecision; '
            'color-rendering:optimizeQuality; image-rendering:optimizeQuality; '
            "font-family:'Open Sans'; stroke:rgb(230,0,0); "
            'color-interpolation:linearRGB; stroke-miterlimit:5;"><path '
            'style="stroke:none;" d="M55.9005 -0.0703 Q55.9005 1.9922 54.8614 '
            '3.1719 Q53.8224 4.3516 51.9786 4.3516 Q50.088 4.3516 49.0568 '
            '3.1875 Q48.0255 2.0234 48.0255 -0.0859 Q48.0255 -2.1797 49.0568 '
            '-3.3281 Q50.088 -4.4766 51.9786 -4.4766 Q53.838 -4.4766 54.8693 '
            '-3.3047 Q55.9005 -2.1328 55.9005 -0.0703 ZM49.0724 -0.0703 '
            'Q49.0724 1.6641 49.8146 2.5703 Q50.5568 3.4766 51.9786 3.4766 '
            'Q53.4005 3.4766 54.1271 2.5781 Q54.8536 1.6797 54.8536 -0.0703 '
            'Q54.8536 -1.8047 54.1271 -2.6953 Q53.4005 -3.5859 51.9786 -3.5859 '
            'Q50.5568 -3.5859 49.8146 -2.6875 Q49.0724 -1.7891 49.0724 -0.0703 '
            'Z"/><path style="stroke:none;" d="M29.9175 44.9297 Q29.9175 '
            '46.9922 28.8784 48.1719 Q27.8394 49.3516 25.9956 49.3516 Q24.105 '
            '49.3516 23.0737 48.1875 Q22.0425 47.0234 22.0425 44.9141 Q22.0425 '
            '42.8203 23.0737 41.6719 Q24.105 40.5234 25.9956 40.5234 Q27.855 '
            '40.5234 28.8862 41.6953 Q29.9175 42.8672 29.9175 44.9297 '
            'ZM23.0894 44.9297 Q23.0894 46.6641 23.8316 47.5703 Q24.5737 '
            '48.4766 25.9956 48.4766 Q27.4175 48.4766 28.1441 47.5781 Q28.8706 '
            '46.6797 28.8706 44.9297 Q28.8706 43.1953 28.1441 42.3047 Q27.4175 '
            '41.4141 25.9956 41.4141 Q24.5737 41.4141 23.8316 42.3125 Q23.0894 '
            '43.2109 23.0894 44.9297 Z"/><path style="fill:none; '
            'stroke:black;" d="M77.51 14.75 L77.943 15 L78.376 14.75"/><path '
            'style="fill:none; stroke:black;" d="M-77.51 14.75 L-77.943 15 '
            'L-78.376 14.75"/><path style="fill:none; stroke:black;" '
            'd="M-51.963 -29.5 L-51.963 -30 L-52.396 -30.25"/><path '
            'style="fill:none; stroke:black;" d="M-103.49 0.25 L-103.923 0 '
            'L-103.923 -0.5"/><path style="fill:none; stroke:black;" '
            'd="M-77.51 -44.75 L-77.943 -45 L-78.376 -44.75"/><path '
            'style="fill:none; stroke:black;" d="M-103.923 -29.5 L-103.923 -30 '
            'L-103.49 -30.25"/></g></g></svg></svg></svg></svg>'],
 'inchi': 'InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3',
 'inchiKey': 'InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N',
 'molecularFormula': 'C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>',
 'molecularMass': '184.28',
 'name': 'Ethyl cyclohexanepropionate',
 'propertyCitations': [{'docUri': 'document/pt/document/22252593',
                        'source': 'De Benneville, Peter L.; Journal of the '
                                  'American Chemical Society, (1940), 62, '
                                  '283-7, CAplus',
                        'sourceNumber': 1}],
 'replacedRns': [],
 'rn': '10094-36-7',
 'smile': 'C(CC(OCC)=O)C1CCCCC1',
 'synonyms': ['Cyclohexanepropanoic acid, ethyl ester',
              'Cyclohexanepropionic acid, ethyl ester',
              'Ethyl cyclohexanepropionate',
              'Ethyl cyclohexylpropanoate',
              'Ethyl 3-cyclohexylpropionate',
              'Ethyl 3-cyclohexylpropanoate',
              '3-Cyclohexylpropionic acid ethyl ester',
              'NSC 71463',
              'Ethyl 3-cyclohexanepropionate'],
 'uri': 'substance/pt/10094367'}

Display the Molecule Drawing

# get svg image text; the response above stores drawings in a list under "images"
svg_string1 = casrn1_data["images"][0]

# display the molecule
from IPython.display import SVG
SVG(svg_string1)

Select some specific data

# Get Experimental Properties
casrn1_data["experimentalProperties"][0]
{'name': 'Boiling Point',
 'property': '105-113 °C @ Press: 17 Torr',
 'sourceNumber': 1}
# Get Boiling Point property
casrn1_data["experimentalProperties"][0]["property"]
'105-113 °C @ Press: 17 Torr'
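The property value is returned as free text, so numeric work needs a parsing step. The sketch below assumes the layout seen in this one example ("low-high °C @ Press: N Torr"); other records may use different units or wording, in which case the hypothetical `parse_boiling_point` helper returns None rather than guessing.

```python
import re

# Pattern inferred from the single example string above (an assumption)
BP_PATTERN = re.compile(
    r"^(?P<low>-?\d+(?:\.\d+)?)"        # single value or lower bound
    r"(?:-(?P<high>\d+(?:\.\d+)?))?"    # optional upper bound
    r"\s*°C"
    r"(?:\s*@\s*Press:\s*(?P<press>\d+(?:\.\d+)?)\s*Torr)?$"
)


def parse_boiling_point(text):
    """Parse a boiling point string into numbers, or return None."""
    match = BP_PATTERN.match(text.strip())
    if match is None:
        return None
    low = float(match["low"])
    high = float(match["high"]) if match["high"] else low
    press = float(match["press"]) if match["press"] else None
    return {"low_C": low, "high_C": high, "pressure_Torr": press}


bp = parse_boiling_point("105-113 °C @ Press: 17 Torr")
print(bp)
# {'low_C': 105.0, 'high_C': 113.0, 'pressure_Torr': 17.0}
```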
# Get InChIKey
casrn1_data["inchiKey"]
'InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N'
# Get Canonical SMILES
casrn1_data["canonicalSmile"]
'O=C(OCC)CCC1CCCCC1'
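Several fields in the record shown above carry presentation markup: the molecular formula contains `<sub>` tags, the InChIKey has an "InChIKey=" prefix, and the molecular mass is a string. A small clean-up sketch, using the values copied from that record; `strip_tags` is a hypothetical helper, and `str.removeprefix` assumes Python 3.9 or later.

```python
import re


def strip_tags(text: str) -> str:
    """Remove simple HTML tags such as <sub> or <em> from a string."""
    return re.sub(r"</?[A-Za-z][^>]*>", "", text)


# Values copied from the record printed earlier
formula = strip_tags("C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>")
inchikey = "InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N".removeprefix("InChIKey=")
mass = float("184.28")

print(formula, inchikey, mass)
# C11H20O2 NRVPMFHPHGBQLP-UHFFFAOYSA-N 184.28
```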

2. Common Chemistry API record detail retrieval in a loop

Import libraries

import requests
from pprint import pprint
from time import sleep

Setup API parameters

detail_base_url = "https://commonchemistry.cas.org/api/detail?"
casrn_list = ["10094-36-7", "10031-92-2", "10199-61-8", "10036-21-2", "1019020-13-3"]
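Before sending a batch of requests, it can help to catch mistyped identifiers locally. A CAS RN ends in a check digit: the remaining digits are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the final digit. The sketch below applies that rule; `is_valid_casrn` is a hypothetical helper, not part of the API.

```python
import re


def is_valid_casrn(cas_rn: str) -> bool:
    """Check the format and check digit of a CAS RN string."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


casrn_list = ["10094-36-7", "10031-92-2", "10199-61-8", "10036-21-2", "1019020-13-3"]
checks = [is_valid_casrn(rn) for rn in casrn_list]
print(checks)
# [True, True, True, True, True]
```

A passing check only means the string is well formed; whether the substance is present in CAS Common Chemistry is still determined by the API response.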

Request data for each CAS RN and save to a list

casrn_data = []
for casrn in casrn_list:
    casrn_data.append(requests.get(detail_base_url + "cas_rn=" + casrn).json())
    sleep(1) # add a delay between API calls
casrn_data[0:2] # view first 2
[{'uri': 'substance/pt/10094367',
  'rn': '10094-36-7',
  'name': 'Ethyl cyclohexanepropionate',
  'image': '<svg width="228.6" viewBox="0 0 7620 3716" text-rendering="auto" stroke-width="1" stroke-opacity="1" stroke-miterlimit="10" stroke-linejoin="miter" stroke-linecap="square" stroke-dashoffset="0" stroke-dasharray="none" stroke="black" shape-rendering="auto" image-rendering="auto" height="111.48" font-weight="normal" font-style="normal" font-size="12" font-family="\'Dialog\'" fill-opacity="1" fill="black" color-rendering="auto" color-interpolation="auto" xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect y="0" x="0" width="7620" stroke="none" height="3716"/></g><g transform="translate(32866,32758)" text-rendering="geometricPrecision" stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line y2="-30850" y1="-31419" x2="-30792" x1="-31777" fill="none"/><line y2="-29715" y1="-30850" x2="-30792" x1="-30792" fill="none"/><line y2="-31419" y1="-30850" x2="-31777" x1="-32762" fill="none"/><line y2="-29146" y1="-29715" x2="-31777" x1="-30792" fill="none"/><line y2="-30850" y1="-29715" x2="-32762" x1="-32762" fill="none"/><line y2="-29715" y1="-29146" x2="-32762" x1="-31777" fill="none"/><line y2="-31376" y1="-30850" x2="-29885" x1="-30792" fill="none"/><line y2="-30850" y1="-31376" x2="-28978" x1="-29885" fill="none"/><line y2="-31376" y1="-30850" x2="-28071" x1="-28978" fill="none"/><line y2="-30960" y1="-31376" x2="-27352" x1="-28071" fill="none"/><line y2="-31376" y1="-30960" x2="-26257" x1="-26976" fill="none"/><line y2="-30850" y1="-31376" x2="-25350" x1="-26257" fill="none"/><line y2="-32202" y1="-31376" x2="-28140" x1="-28140" fill="none"/><line y2="-32202" y1="-31376" x2="-28002" x1="-28002" fill="none"/><text y="-30671" xml:space="preserve" x="-27317" stroke="none" font-size="433.3333" font-family="sans-serif">O</text><text y="-32242" xml:space="preserve" x="-28224" stroke="none" font-size="433.3333" font-family="sans-serif">O</text></g></g></svg>',
  'inchi': 'InChI=1S/C11H20O2/c1-2-13-11(12)9-8-10-6-4-3-5-7-10/h10H,2-9H2,1H3',
  'inchiKey': 'InChIKey=NRVPMFHPHGBQLP-UHFFFAOYSA-N',
  'smile': 'C(CC(OCC)=O)C1CCCCC1',
  'canonicalSmile': 'O=C(OCC)CCC1CCCCC1',
  'molecularFormula': 'C<sub>11</sub>H<sub>20</sub>O<sub>2</sub>',
  'molecularMass': '184.28',
  'experimentalProperties': [{'name': 'Boiling Point',
    'property': '105-113 °C @ Press: 17 Torr',
    'sourceNumber': 1}],
  'propertyCitations': [{'docUri': 'document/pt/document/22252593',
    'sourceNumber': 1,
    'source': 'De Benneville, Peter L.; Journal of the American Chemical Society, (1940), 62, 283-7, CAplus'}],
  'synonyms': ['Cyclohexanepropanoic acid, ethyl ester',
   'Cyclohexanepropionic acid, ethyl ester',
   'Ethyl cyclohexanepropionate',
   'Ethyl cyclohexylpropanoate',
   'Ethyl 3-cyclohexylpropionate',
   'Ethyl 3-cyclohexylpropanoate',
   '3-Cyclohexylpropionic acid ethyl ester',
   'NSC 71463',
   'Ethyl 3-cyclohexanepropionate'],
  'replacedRns': [],
  'hasMolfile': True},
 {'uri': 'substance/pt/10031922',
  'rn': '10031-92-2',
  'name': 'Ethyl 2-nonynoate',
  'image': '<svg width="318.24" viewBox="0 0 10608 2283" text-rendering="auto" stroke-width="1" stroke-opacity="1" stroke-miterlimit="10" stroke-linejoin="miter" stroke-linecap="square" stroke-dashoffset="0" stroke-dasharray="none" stroke="black" shape-rendering="auto" image-rendering="auto" height="68.49" font-weight="normal" font-style="normal" font-size="12" font-family="\'Dialog\'" fill-opacity="1" fill="black" color-rendering="auto" color-interpolation="auto" xmlns="http://www.w3.org/2000/svg"><g><g stroke="white" fill="white"><rect y="0" x="0" width="10608" stroke="none" height="2283"/></g><g transform="translate(32866,32758)" text-rendering="geometricPrecision" stroke-width="44" stroke-linejoin="round" stroke-linecap="round"><line y2="-31899" y1="-31899" x2="-26132" x1="-27178" fill="none"/><line y2="-31988" y1="-31988" x2="-26132" x1="-27178" fill="none"/><line y2="-31809" y1="-31809" x2="-26132" x1="-27178" fill="none"/><line y2="-31899" y1="-31899" x2="-28227" x1="-27178" fill="none"/><line y2="-31376" y1="-31899" x2="-29134" x1="-28227" fill="none"/><line y2="-31899" y1="-31376" x2="-30041" x1="-29134" fill="none"/><line y2="-31376" y1="-31899" x2="-30948" x1="-30041" fill="none"/><line y2="-31899" y1="-31376" x2="-31855" x1="-30948" fill="none"/><line y2="-31376" y1="-31899" x2="-32762" x1="-31855" fill="none"/><line y2="-31899" y1="-31899" x2="-25084" x1="-26132" fill="none"/><line y2="-32315" y1="-31899" x2="-24364" x1="-25084" fill="none"/><line y2="-31899" y1="-32315" x2="-23270" x1="-23989" fill="none"/><line y2="-32422" y1="-31899" x2="-22362" x1="-23270" fill="none"/><line y2="-31070" y1="-31899" x2="-25014" x1="-25014" fill="none"/><line y2="-31070" y1="-31899" x2="-25153" x1="-25153" fill="none"/><text y="-32242" xml:space="preserve" x="-24330" stroke="none" font-size="433.3333" font-family="sans-serif">O</text><text y="-30671" xml:space="preserve" x="-25237" stroke="none" font-size="433.3333" font-family="sans-serif">O</text></g></g></svg>',
  'inchi': 'InChI=1S/C11H18O2/c1-3-5-6-7-8-9-10-11(12)13-4-2/h3-8H2,1-2H3',
  'inchiKey': 'InChIKey=BFZNMUGAZYAMTG-UHFFFAOYSA-N',
  'smile': 'C(C#CCCCCCC)(OCC)=O',
  'canonicalSmile': 'O=C(C#CCCCCCC)OCC',
  'molecularFormula': 'C<sub>11</sub>H<sub>18</sub>O<sub>2</sub>',
  'molecularMass': '182.26',
  'experimentalProperties': [],
  'propertyCitations': [],
  'synonyms': ['2-Nonynoic acid, ethyl ester',
   'Ethyl 2-nonynoate',
   'NSC 190985'],
  'replacedRns': [],
  'hasMolfile': True}]
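The loop above stops at the first request that fails. One possible variation, sketched below, records failures per CAS RN and carries on. To keep the example runnable without network access, the HTTP call is passed in as a function: `fetch_details`, `get_json`, and `fake_get_json` are all hypothetical names, and the stand-in getter only imitates a response.

```python
from time import sleep


def fetch_details(casrn_list, get_json, delay=1.0):
    """Request each CAS RN in turn; return (records, failures)."""
    base = "https://commonchemistry.cas.org/api/detail?cas_rn="
    records, failures = [], {}
    for i, casrn in enumerate(casrn_list):
        if i:
            sleep(delay)  # pause between API calls, as in the loop above
        try:
            records.append(get_json(base + casrn))
        except Exception as err:  # keep going and note what went wrong
            failures[casrn] = str(err)
    return records, failures


def fake_get_json(url):
    """Stand-in for a real HTTP call (demonstration only)."""
    casrn = url.split("cas_rn=")[1]
    if casrn == "0000-00-0":
        raise ValueError("no record found")
    return {"rn": casrn}


records, failures = fetch_details(["10094-36-7", "0000-00-0"], fake_get_json, delay=0)
print(records)   # [{'rn': '10094-36-7'}]
print(failures)  # {'0000-00-0': 'no record found'}
```

With requests, a real getter could call `requests.get(url, timeout=10)`, then `raise_for_status()` on the response, and return its `.json()`.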

Display Molecule Drawings

from IPython.display import SVG, display

# get svg image text for each record
# (the records above hold one drawing under "image"; if a response instead
# returns a list under "images", use record["images"][0])
svg_strings = [record["image"] for record in casrn_data]

# display the molecules
for svg_string in svg_strings:
    display(SVG(svg_string))
[Output: five SVG structure drawings, one for each CAS RN in casrn_list]

Select some specific data

# Get canonical SMILES
cansmiles = [record["canonicalSmile"] for record in casrn_data]
print(cansmiles)
['O=C(OCC)CCC1CCCCC1', 'O=C(C#CCCCCCC)OCC', 'O=C(OCC)CN1N=CC=C1', 'O=C(OCC)C1=CC=CC(=C1)CCC(=O)OCC', 'N=C(OCC)C1=CCCCC1']
# Get synonyms
synonyms_list = [record["synonyms"] for record in casrn_data]
pprint(synonyms_list)
[['Cyclohexanepropanoic acid, ethyl ester',
  'Cyclohexanepropionic acid, ethyl ester',
  'Ethyl cyclohexanepropionate',
  'Ethyl cyclohexylpropanoate',
  'Ethyl 3-cyclohexylpropionate',
  'Ethyl 3-cyclohexylpropanoate',
  '3-Cyclohexylpropionic acid ethyl ester',
  'NSC 71463',
  'Ethyl 3-cyclohexanepropionate'],
 ['2-Nonynoic acid, ethyl ester', 'Ethyl 2-nonynoate', 'NSC 190985'],
 ['1<em>H</em>-Pyrazole-1-acetic acid, ethyl ester',
  'Pyrazole-1-acetic acid, ethyl ester',
  'Ethyl 1<em>H</em>-pyrazole-1-acetate',
  'Ethyl 1-pyrazoleacetate',
  'Ethyl 2-(1<em>H</em>-pyrazol-1-yl)acetate'],
 ['Benzenepropanoic acid, 3-(ethoxycarbonyl)-, ethyl ester',
  'Hydrocinnamic acid, <em>m</em>-carboxy-, diethyl ester',
  'Ethyl 3-(ethoxycarbonyl)benzenepropanoate'],
 ['1-Cyclohexene-1-carboximidic acid, ethyl ester',
  'Ethyl 1-cyclohexene-1-carboximidate']]
# Transform synonym "list of lists" to a flat list
synonyms_flat = []
for sublist in synonyms_list:
    for synonym in sublist:
        synonyms_flat.append(synonym)
pprint(synonyms_flat)
['Cyclohexanepropanoic acid, ethyl ester',
 'Cyclohexanepropionic acid, ethyl ester',
 'Ethyl cyclohexanepropionate',
 'Ethyl cyclohexylpropanoate',
 'Ethyl 3-cyclohexylpropionate',
 'Ethyl 3-cyclohexylpropanoate',
 '3-Cyclohexylpropionic acid ethyl ester',
 'NSC 71463',
 'Ethyl 3-cyclohexanepropionate',
 '2-Nonynoic acid, ethyl ester',
 'Ethyl 2-nonynoate',
 'NSC 190985',
 '1<em>H</em>-Pyrazole-1-acetic acid, ethyl ester',
 'Pyrazole-1-acetic acid, ethyl ester',
 'Ethyl 1<em>H</em>-pyrazole-1-acetate',
 'Ethyl 1-pyrazoleacetate',
 'Ethyl 2-(1<em>H</em>-pyrazol-1-yl)acetate',
 'Benzenepropanoic acid, 3-(ethoxycarbonyl)-, ethyl ester',
 'Hydrocinnamic acid, <em>m</em>-carboxy-, diethyl ester',
 'Ethyl 3-(ethoxycarbonyl)benzenepropanoate',
 '1-Cyclohexene-1-carboximidic acid, ethyl ester',
 'Ethyl 1-cyclohexene-1-carboximidate']
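A flat list no longer records which substance each synonym belongs to. The sketch below shows a compact way to flatten with `itertools.chain` and, alongside it, a reverse lookup from synonym to CAS RN. The `records` list is an abbreviated sample of the data shown above (two synonyms per substance), and lowercasing the keys is an assumption made here for case-insensitive lookup.

```python
from itertools import chain

# Abbreviated sample of the records retrieved above
records = [
    {"rn": "10094-36-7", "synonyms": ["Ethyl cyclohexanepropionate", "NSC 71463"]},
    {"rn": "10031-92-2", "synonyms": ["Ethyl 2-nonynoate", "NSC 190985"]},
]

# Same result as the nested loop, in one expression
synonyms_flat = list(chain.from_iterable(r["synonyms"] for r in records))

# Reverse lookup: lowercased synonym -> CAS RN
synonym_to_rn = {syn.lower(): r["rn"] for r in records for syn in r["synonyms"]}

print(synonyms_flat)
print(synonym_to_rn["nsc 190985"])
# 10031-92-2
```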