Accessing the Chemical Elements Data from PubChem

PubChem provides the Periodic Table (https://pubchem.ncbi.nlm.nih.gov/periodic-table/) and Element Pages to help users navigate the abundant chemical element data available within PubChem. More detailed information on these services is provided in the following paper:

PubChem Periodic Table and Element Pages: Improving Access to Information on Chemical Elements from Authoritative Sources
Kim et al., Chemistry Teacher International, 2021, 3(1):57-65.
DOI: 10.1515/cti-2020-0006

The data underlying the PubChem Periodic Table and Element Pages are freely available for download. This Jupyter notebook demonstrates how to download the properties of chemical elements and visualize the periodicity of those properties. The data used in this notebook can be downloaded from the PubChem Periodic Table (https://pubchem.ncbi.nlm.nih.gov/periodic-table/), by clicking the DOWNLOAD button and then the CSV button.

PubChem-Periodic-Table

Alternatively, the data can be downloaded using a Python script, as demonstrated below. Here, we use pandas, a Python data analysis library, which can load the data directly from the web into a data frame.

import pandas as pd

url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/periodictable/CSV"
df = pd.read_csv(url)

df

	AtomicNumber	Symbol	Name	AtomicMass	CPKHexColor	ElectronConfiguration	Electronegativity	AtomicRadius	IonizationEnergy	ElectronAffinity	OxidationStates	StandardState	MeltingPoint	BoilingPoint	Density	GroupBlock	YearDiscovered
0	1	H	Hydrogen	1.008000	FFFFFF	1s1	2.20	120.0	13.598	0.754	+1, -1	Gas	13.81	20.28	0.000090	Nonmetal	1766
1	2	He	Helium	4.002600	D9FFFF	1s2	NaN	140.0	24.587	NaN	0	Gas	0.95	4.22	0.000179	Noble gas	1868
2	3	Li	Lithium	7.000000	CC80FF	[He]2s1	0.98	182.0	5.392	0.618	+1	Solid	453.65	1615.00	0.534000	Alkali metal	1817
3	4	Be	Beryllium	9.012183	C2FF00	[He]2s2	1.57	153.0	9.323	NaN	+2	Solid	1560.00	2744.00	1.850000	Alkaline earth metal	1798
4	5	B	Boron	10.810000	FFB5B5	[He]2s2 2p1	2.04	192.0	8.298	0.277	+3	Solid	2348.00	4273.00	2.370000	Metalloid	1808
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
113	114	Fl	Flerovium	290.192000	NaN	[Rn]7s2 7p2 5f14 6d10 (predicted)	NaN	NaN	NaN	NaN	6, 4,2, 1, 0	Expected to be a Solid	NaN	NaN	NaN	Post-transition metal	1998
114	115	Mc	Moscovium	290.196000	NaN	[Rn]7s2 7p3 5f14 6d10 (predicted)	NaN	NaN	NaN	NaN	3, 1	Expected to be a Solid	NaN	NaN	NaN	Post-transition metal	2003
115	116	Lv	Livermorium	293.205000	NaN	[Rn]7s2 7p4 5f14 6d10 (predicted)	NaN	NaN	NaN	NaN	+4, +2, -2	Expected to be a Solid	NaN	NaN	NaN	Post-transition metal	2000
116	117	Ts	Tennessine	294.211000	NaN	[Rn]7s2 7p5 5f14 6d10 (predicted)	NaN	NaN	NaN	NaN	+5, +3, +1, -1	Expected to be a Solid	NaN	NaN	NaN	Halogen	2010
117	118	Og	Oganesson	295.216000	NaN	[Rn]7s2 7p6 5f14 6d10 (predicted)	NaN	NaN	NaN	NaN	+6, +4, +2, +1, 0, -1	Expected to be a Gas	NaN	NaN	NaN	Noble gas	2006

118 rows × 17 columns
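Loading from the URL requires network access every time the notebook runs. A minimal sketch of one way to avoid that is to keep a local copy of the CSV and reuse it on later runs. The helper name `load_periodic_table` and the default file name `periodic_table.csv` are assumptions made for illustration; only the URL comes from the text above.

```python
from pathlib import Path

import pandas as pd

PUBCHEM_CSV_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/periodictable/CSV"


def load_periodic_table(cache_path="periodic_table.csv", url=PUBCHEM_CSV_URL):
    """Load the element table from a local copy, downloading it once if missing.

    Hypothetical helper: the name and the default cache path are illustrative.
    """
    cache = Path(cache_path)
    if not cache.exists():
        # First run: fetch the CSV from PubChem and keep a local copy.
        pd.read_csv(url).to_csv(cache, index=False)
    # Later runs read the local copy and need no network access.
    return pd.read_csv(cache)
```

With this helper, `df = load_periodic_table()` would stand in for the `pd.read_csv(url)` call. Deleting the cached file forces a fresh download.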

df.shape      # Check the dimension of the data frame (the number of rows, the number of columns)

(118, 17)

df.columns    # Check the available information.

Index(['AtomicNumber', 'Symbol', 'Name', 'AtomicMass', 'CPKHexColor',
       'ElectronConfiguration', 'Electronegativity', 'AtomicRadius',
       'IonizationEnergy', 'ElectronAffinity', 'OxidationStates',
       'StandardState', 'MeltingPoint', 'BoilingPoint', 'Density',
       'GroupBlock', 'YearDiscovered'],
      dtype='object')

In some charts generated in this notebook, data will be color-coded using the information on the period of an element (i.e., the row on the periodic table where an element appears). However, as shown in the above code cell, this information is not available in the data frame. Therefore, we add it to the data frame.

Period = [None] * 118

for i in range(len(Period)):

    an = i + 1             # Atomic number

    if 1 <= an <= 2:
        Period[i] = "Period 1"
    elif 3 <= an <= 10:
        Period[i] = "Period 2"
    elif 11 <= an <= 18:
        Period[i] = "Period 3"
    elif 19 <= an <= 36:
        Period[i] = "Period 4"
    elif 37 <= an <= 54:
        Period[i] = "Period 5"
    elif 55 <= an <= 86:
        Period[i] = "Period 6"
    elif 87 <= an <= 118:
        Period[i] = "Period 7"

df["Period"] = Period
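The same period assignment can be written as a lookup against the atomic number of the last element in each period. The following is a sketch of that alternative, not the notebook's own method; the names `PERIOD_ENDS` and `period_of` are hypothetical.

```python
from bisect import bisect_left

# Atomic number of the last element in each period (periods 1 through 7).
PERIOD_ENDS = [2, 10, 18, 36, 54, 86, 118]


def period_of(atomic_number):
    """Return a label such as "Period 3" for an atomic number from 1 to 118."""
    if not 1 <= atomic_number <= PERIOD_ENDS[-1]:
        raise ValueError(f"atomic number out of range: {atomic_number}")
    # bisect_left gives the index of the first period whose last element
    # has an atomic number greater than or equal to the one supplied.
    return f"Period {bisect_left(PERIOD_ENDS, atomic_number) + 1}"


# Labels for all 118 elements, in order of atomic number.
labels = [period_of(an) for an in range(1, 119)]
print(labels[0], labels[2], labels[117])
```

Because the function takes the atomic number itself, it could also be applied to the data frame with `df["AtomicNumber"].map(period_of)`, which does not rely on the rows being sorted.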

Now let’s import matplotlib and seaborn for data visualization.

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")

1. Atomic Mass

ax=sns.scatterplot(x=df.AtomicNumber, y=df.AtomicMass, color='Blue')
ax.set(xlabel='Atomic Number')
ax.set(ylabel='Atomic Mass')

[Text(0, 0.5, 'Atomic Mass')]

../_images/381b5ae99a754693ec8bdc710d2d1fe3e6c4890774ed9119836593d33dadbc78.png

2. Ionization Energy, Electron Affinity & Electronegativity

Now we want to look into the periodicity of the ionization energy, electron affinity, and electronegativity among the chemical elements. These properties are not known for some elements, especially for elements heavier than Lr (Atomic Number=103), as identified in the following code cell.

df[(df.IonizationEnergy.isnull()) & (df.ElectronAffinity.isnull()) & (df.Electronegativity.isnull())][['AtomicNumber', 'Symbol','IonizationEnergy','ElectronAffinity','Electronegativity']]

	AtomicNumber	Symbol	IonizationEnergy	ElectronAffinity	Electronegativity
103	104	Rf	NaN	NaN	NaN
104	105	Db	NaN	NaN	NaN
105	106	Sg	NaN	NaN	NaN
106	107	Bh	NaN	NaN	NaN
107	108	Hs	NaN	NaN	NaN
108	109	Mt	NaN	NaN	NaN
109	110	Ds	NaN	NaN	NaN
110	111	Rg	NaN	NaN	NaN
111	112	Cn	NaN	NaN	NaN
112	113	Nh	NaN	NaN	NaN
113	114	Fl	NaN	NaN	NaN
114	115	Mc	NaN	NaN	NaN
115	116	Lv	NaN	NaN	NaN
116	117	Ts	NaN	NaN	NaN
117	118	Og	NaN	NaN	NaN

Therefore, let’s make a new data frame that excludes the elements lacking all three values: ionization energy, electron affinity, and electronegativity.

df_tmp = df.dropna(axis=0, how='all', subset=['IonizationEnergy','ElectronAffinity','Electronegativity'])

df_tmp.shape

(103, 18)
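The `how='all'` argument matters here: a row is dropped only when every column in `subset` is missing, so an element such as helium, which has an ionization energy but no electron affinity or electronegativity value, is kept. The sketch below illustrates the difference from `how='any'` on a three-row toy frame whose values are copied from the tables above; the frame name `toy` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Three rows copied from the outputs above: H has all three values,
# He has only the ionization energy, and Rf has none of them.
toy = pd.DataFrame({
    "Symbol": ["H", "He", "Rf"],
    "IonizationEnergy": [13.598, 24.587, np.nan],
    "ElectronAffinity": [0.754, np.nan, np.nan],
    "Electronegativity": [2.20, np.nan, np.nan],
})
cols = ["IonizationEnergy", "ElectronAffinity", "Electronegativity"]

# how="all": drop a row only if all three properties are missing.
kept_all = toy.dropna(axis=0, how="all", subset=cols)

# how="any": drop a row if at least one property is missing.
kept_any = toy.dropna(axis=0, how="any", subset=cols)

print(kept_all["Symbol"].tolist())   # ['H', 'He']
print(kept_any["Symbol"].tolist())   # ['H']
```

The bar charts that follow simply leave a gap where a single value is missing, which is why the more permissive `how='all'` is sufficient.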

2.1. Ionization Energy

plt.figure(figsize=(30,5))
ax=sns.barplot(data=df_tmp, x="Symbol", y="IonizationEnergy", hue="Period", dodge=False)
ax.set(xlabel="Atomic Symbol", ylabel="Ionization Energy (eV)")
plt.legend(loc='upper right');

../_images/a6b27a0d7f1a5ba34d583f842adbf371eff3a1b234242e3bda0750acf0983267.png

2.2. Electron Affinity

plt.figure(figsize=(30,5))
ax=sns.barplot(data=df_tmp, x="Symbol", y="ElectronAffinity", hue="Period", dodge=False)
ax.set(xlabel="Atomic Symbol", ylabel="Electron Affinity (eV)")
plt.legend(loc='upper right');

../_images/37c673fbbe83695b93b63e68f1a6596b1a6a73e42015d9b3c9dc3af61d79fff6.png

2.3. Electronegativity

plt.figure(figsize=(30,5))
ax=sns.barplot(data=df_tmp, x="Symbol", y="Electronegativity", hue="Period", dodge=False)
ax.set(xlabel="Atomic Symbol", ylabel="Electronegativity")
plt.legend(loc='upper right');

../_images/1ae86a19810bd5e88053c53d3a23b9674b20900d55712881d7fcb0f5d0206f60.png

2.4. Comparison between Ionization Energy, Electron Affinity, and Electronegativity

fig, axs = plt.subplots(ncols=3,figsize=(20,6))

sns.scatterplot(data=df_tmp, x="ElectronAffinity", y="IonizationEnergy", hue="Period", ax=axs[0])
axs[0].set(xlabel="Electron Affinity (eV)", ylabel="Ionization Energy (eV)")

sns.scatterplot(data=df_tmp, x="Electronegativity", y="IonizationEnergy", hue="Period", ax=axs[1])
axs[1].set(xlabel="Electronegativity", ylabel="Ionization Energy (eV)")

sns.scatterplot(data=df_tmp, x="Electronegativity", y="ElectronAffinity", hue="Period", ax=axs[2])
axs[2].set(xlabel="Electronegativity", ylabel="Electron Affinity (eV)");

../_images/9d7789d0308797c4fd5e99c1599568230b3f7d71aa20c3047f9f4c26d533d56c.png

3. Melting Point and Boiling Point

Now let’s visualize the melting and boiling points of chemical elements. These data are not available for all elements, so we want to exclude those elements without the necessary data.

df[(df.MeltingPoint.isnull()) & (df.BoilingPoint.isnull())][['AtomicNumber', 'Symbol','MeltingPoint','BoilingPoint']]

	AtomicNumber	Symbol	MeltingPoint	BoilingPoint
103	104	Rf	NaN	NaN
104	105	Db	NaN	NaN
105	106	Sg	NaN	NaN
106	107	Bh	NaN	NaN
107	108	Hs	NaN	NaN
108	109	Mt	NaN	NaN
109	110	Ds	NaN	NaN
110	111	Rg	NaN	NaN
111	112	Cn	NaN	NaN
112	113	Nh	NaN	NaN
113	114	Fl	NaN	NaN
114	115	Mc	NaN	NaN
115	116	Lv	NaN	NaN
116	117	Ts	NaN	NaN
117	118	Og	NaN	NaN

The elements with an atomic number greater than 103 do not have melting and boiling point data. These are the same elements that df_tmp already excludes, so we reuse the df_tmp data frame, which was used previously to create the charts for the ionization energy, electron affinity, and electronegativity.

3.1. Melting Point

plt.figure(figsize=(25,5))
ax=sns.barplot(data=df_tmp, x="Symbol", y="MeltingPoint", hue="Period", dodge=False)
ax.set(xlabel="Atomic Symbol", ylabel="Melting Point (K)")
plt.legend(loc='upper right');

../_images/6b26533b9d82ae3280524c1f878db3a15c72dc43a093657425c6a6cea1178b8b.png

3.2. Boiling Point

plt.figure(figsize=(25,5))
ax=sns.barplot(data=df_tmp, x="Symbol", y="BoilingPoint", hue="Period", dodge=False)   # Assign to ax so the labels below apply to this chart.
ax.set(xlabel="Atomic Symbol", ylabel="Boiling Point (K)")
plt.legend(loc='upper right');

../_images/2de2e22bcdbfc6fa12fdb19eb86f1d5059f86e4f761f16a9969d46b55c3795f1.png

3.3. Comparison between Melting and Boiling Points

plt.figure(figsize=(6,6))
ax=sns.scatterplot(data=df, x="MeltingPoint", y="BoilingPoint", hue="Period")
ax.set(xlabel="Melting Point (K)", ylabel="Boiling Point (K)");   # Labels match x="MeltingPoint", y="BoilingPoint".

../_images/71d7a3bd0b736d77b54155497ba9b8b534fd685916e01df1f26e95fce57686ea.png
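The relationship in the scatter plot can also be summarized with a single number. As a rough sketch, the Pearson correlation between the two columns can be computed with pandas; the five-row frame `sample` below is an assumption for illustration, with values copied from the first rows of the table shown earlier, so the result says nothing about the full data set.

```python
import pandas as pd

# Melting and boiling points (K) of the first five elements,
# copied from the data frame output near the top of the notebook.
sample = pd.DataFrame({
    "Symbol": ["H", "He", "Li", "Be", "B"],
    "MeltingPoint": [13.81, 0.95, 453.65, 1560.00, 2348.00],
    "BoilingPoint": [20.28, 4.22, 1615.00, 2744.00, 4273.00],
})

# Pearson correlation coefficient between the two columns.
# Series.corr ignores pairs in which either value is missing.
r = sample["MeltingPoint"].corr(sample["BoilingPoint"])

print(f"Pearson r = {r:.3f}")
```

Applied to the full data, the same call would be `df["MeltingPoint"].corr(df["BoilingPoint"])`.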

4. Density

Now let’s look into the density of chemical elements. The density data used here are measured at STP (standard temperature and pressure, 0 °C and 1 atm).

plt.figure(figsize=(25,5))
ax=sns.barplot(data=df_tmp, x="Symbol", y="Density", hue="Period", dodge=False)   # Use df_tmp to exclude the elements without relevant data.
ax.set(xlabel="Atomic Symbol", ylabel="Density ($g/cm^{3}$)");

../_images/cbf1f4812e19ced097507bfd8aa6acabfa44d5d8347fd4fc1d7f5daa97eb28f9.png

Some elements have very small density values, less than 0.5 g/cm³. This is also shown in the following histogram.

import numpy as np

bins=np.arange(0,25,0.5).tolist()    # bin edges from 0 to 24.5 in steps of 0.5 (the stop value 25 is excluded)
ax = sns.histplot(data=df_tmp, x='Density', bins=bins)
ax.set(xlabel="Density ($g/cm^{3}$)");

../_images/b18037148779b807661a28bc546c93c7bc2bc85af1d9a004f066197a1c2c8cd7.png

df_tmp[ df_tmp.Density < 0.5 ][['AtomicNumber','Symbol','Density','StandardState']]

	AtomicNumber	Symbol	Density	StandardState
0	1	H	0.000090	Gas
1	2	He	0.000179	Gas
6	7	N	0.001251	Gas
7	8	O	0.001429	Gas
8	9	F	0.001696	Gas
9	10	Ne	0.000900	Gas
16	17	Cl	0.003214	Gas
17	18	Ar	0.001784	Gas
35	36	Kr	0.003733	Gas
53	54	Xe	0.005887	Gas
85	86	Rn	0.009730	Gas

All of these elements with densities below 0.5 g/cm³ are gases at STP.
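The first histogram bin can be checked numerically. The sketch below rebuilds the same bin edges and counts the eleven gas densities listed above with `np.histogram`, used here as a stand-in for the binning that the plot performs; the variable names are assumptions for illustration.

```python
import numpy as np

# Densities (g/cm^3) of the eleven gaseous elements listed in the table above.
gas_densities = [0.000090, 0.000179, 0.001251, 0.001429, 0.001696, 0.000900,
                 0.003214, 0.001784, 0.003733, 0.005887, 0.009730]

# Same edges as in the histogram cell: 0.0, 0.5, ..., 24.5.
# np.arange excludes the stop value, so 25 itself is not an edge.
edges = np.arange(0, 25, 0.5)

# Count how many densities fall between each pair of consecutive edges.
counts, _ = np.histogram(gas_densities, bins=edges)

print(len(edges), edges[-1])          # number of edges and the last edge
print(counts[0], counts[1:].sum())    # count in the first bin, then in all others
```

All eleven values land in the first bin (0 to 0.5 g/cm³), which is why the gases appear as a single bar at the left edge of the histogram.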
