Accessing PubChem through PUG-REST: Part III

About this interactive recipe
  • Author(s): Sunghwan Kim

  • Reviewer: Samuel Munday

  • Topic(s): How to retrieve chemical data using chemical identifiers.

  • Format: Interactive Jupyter Notebook (Python)

  • Scenario: You need to access and potentially download chemical data.

  • Skills: You should be familiar with:

  • Learning outcomes:

    • How to access PubChem chemical data using chemical identifiers

    • How to search PubChem using 2-D and 3-D molecular similarity

    • How to search PubChem using substructures and superstructures

  • Citation: ‘Accessing PubChem through PUG-REST - Part III’, Sunghwan Kim, The IUPAC FAIR Chemistry Cookbook, Contributed: 2023-02-28 https://w3id.org/ifcc/IFCC008.

  • Reuse: This notebook is made available under a CC-BY-4.0 license.

import requests
import time
import io
import csv
from IPython.display import Image, display

1. Using a SMILES or InChI string as an input query

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
print(requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/" + smiles + "/cids/txt").text.strip())
2244

Some SMILES strings contain characters that are not compatible with the PUG-REST request URL syntax. For example, isomeric SMILES uses the “/” character (forward slash) to represent the E/Z (cis/trans) stereochemistry of a molecule. However, because the “/” character also separates the segments of the URL path, placing such a SMILES string in the path as an input structure will result in an error.

smiles = "CC(C)C1=NC(=NC(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)N(C)S(=O)(=O)C"
print(requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/" + smiles + "/cids/txt").text.strip())
Status: 400
Code: PUGREST.BadRequest
Message: Unable to standardize the given structure - perhaps some special characters need to be escaped or data packed in a MIME form?
Detail: error: 
Detail: status: 400
Detail: output: Caught ncbi::CException: Standardization failed
Detail: Output Log:
Detail: Record 1: Warning: Cactvs Ensemble cannot be created from input string
Detail: Record 1: Error: Unable to convert input into a compound object
Detail: 
Detail:
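
The failure above can be illustrated without contacting PubChem. The following is a rough sketch, assuming the server separates path segments on “/” in the way the text describes (the actual server-side parsing is not shown in this recipe). The helper name trailing_segments is hypothetical and only counts the pieces that follow the ".../compound/smiles/" prefix.

```python
# Illustration only: no request is sent to PubChem.
PREFIX = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/"


def trailing_segments(smiles):
    """Return the path pieces after the prefix when the SMILES is put in the path.

    Hypothetical helper: it mimics a naive split on "/" and is not
    PubChem's real parser.
    """
    url = PREFIX + smiles + "/cids/txt"
    return url[len(PREFIX):].split("/")


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
rosuvastatin = "CC(C)C1=NC(=NC(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)N(C)S(=O)(=O)C"

# Without "/" in the SMILES: <input>, "cids", "txt"
print(len(trailing_segments(aspirin)), trailing_segments(aspirin)[0])

# With "/" in the SMILES, the input is cut into several pieces and the
# first piece is only a fragment of the intended structure.
print(len(trailing_segments(rosuvastatin)), trailing_segments(rosuvastatin)[0])
```

In this sketch the first SMILES stays in one piece, while the second is broken at each “/”, so the piece read as the input structure is no longer a valid SMILES string.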

To circumvent this issue, the SMILES input should be provided in one of the following two ways:

  1. as a URL parameter

  2. in the HTTP request body (using the HTTP POST method).

smiles = "CC(C)C1=NC(=NC(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)N(C)S(=O)(=O)C"

# As a URL parameter (params= lets requests percent-encode the SMILES string)
print(requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt", params={'smiles': smiles}).text.strip())

# In the HTTP request body (using HTTP POST)
print(requests.post("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt", data={'smiles': smiles}).text.strip())
446157
446157
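
When a SMILES string is passed as a URL parameter, characters such as “#” (triple bond) or “+” (positive charge) also have a meaning of their own in a URL, so it may be safer to percent-encode the value before it is appended. The following is a minimal sketch using only the Python standard library. The helper name build_smiles_url is hypothetical, and nothing is sent to PubChem here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt"


def build_smiles_url(smiles):
    """Return the request URL with the SMILES percent-encoded as a query parameter.

    Hypothetical helper for illustration; it only builds the URL string.
    """
    return ENDPOINT + "?" + urlencode({"smiles": smiles})


def decode_smiles(url):
    """Recover the SMILES value from a URL built by build_smiles_url."""
    return parse_qs(urlsplit(url).query)["smiles"][0]


smiles = "CC(C)C1=NC(=NC(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)N(C)S(=O)(=O)C"
url = build_smiles_url(smiles)
print(url)

# The encoded query contains no raw "/" and decodes back to the input.
print("/" in url.split("?", 1)[1])
print(decode_smiles(url) == smiles)
```

The resulting URL can be passed to requests.get() as is. The round trip through decode_smiles is only a check that the encoding does not alter the structure.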

InChI encodes chemical structure information in multiple layers and sublayers, separated by the “/” character. For this reason, InChI strings should also be provided as a URL parameter or in the HTTP request body (using HTTP POST).

inchi = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

# With the request URL : WILL NOT WORK
#print(requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchi/" + inchi + "/cids/txt").text.strip())

# As a URL parameter (params= lets requests percent-encode the InChI string)
print(requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchi/cids/txt", params={'inchi': inchi}).text.strip())

# In the HTTP request body (using HTTP POST)
print(requests.post("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchi/cids/txt", data={'inchi': inchi}).text.strip())
2244
2244
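
To see where the InChI string travels in the POST variant, the request can be assembled without sending it. The following is a minimal sketch that uses the standard-library urllib instead of requests, purely so that it runs offline. The helper name build_inchi_post is hypothetical. It assumes the form-encoded body format (application/x-www-form-urlencoded) that requests produces when a dict is passed as data.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

ENDPOINT = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchi/cids/txt"


def build_inchi_post(inchi):
    """Build (but do not send) a POST request carrying the InChI in its body.

    Hypothetical helper for illustration only.
    """
    body = urlencode({"inchi": inchi}).encode("ascii")
    return Request(ENDPOINT, data=body, method="POST")


inchi = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
req = build_inchi_post(inchi)

print(req.get_method())   # HTTP method of the prepared request
print(req.full_url)       # the URL itself contains no InChI
print(req.data)           # the percent-encoded InChI sits in the body

# Decoding the body gives back the original InChI string.
print(parse_qs(req.data.decode("ascii"))["inchi"][0] == inchi)
```

Because the identifier is not part of the URL path, its “/” separators cannot be mistaken for path separators.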