Getting Started with PubChem PUG REST — PubChem Property API

When to use this API

Use this API when you need molecular property data (formula, weight, SMILES, IUPAC name) for a known chemical compound. PubChem's PUG REST API turns a name, CID, SMILES string, or InChIKey into a structured property record in a single unauthenticated GET. It is the right tool when you have an identifier and want standard structural descriptors; it is the wrong tool for bioassay data, spectra, or toxicity reports, which live in PubChem's other APIs. Name lookups are unusually forgiving: the API resolves synonyms, trade names, and systematic names to the canonical compound record.

Getting molecular properties by compound name

"What's the molecular formula and IUPAC name of caffeine?"

The name-based path is the most natural entry point. Properties go comma-separated in the URL path, and the output format (JSON) is also in the path — no query parameters needed.

curl "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/caffeine/property/MolecularFormula,MolecularWeight,IUPACName/JSON" | head -c 10000

{
  "PropertyTable": {
    "Properties": [
      {
        "CID": 2519,
        "MolecularFormula": "C8H10N4O2",
        "MolecularWeight": "194.19",
        "IUPACName": "1,3,7-trimethylpurine-2,6-dione"
      }
    ]
  }
}

The IUPAC name "1,3,7-trimethylpurine-2,6-dione" is not how anyone orders coffee, but it tells you something the common name hides: caffeine is a methylated purine, built on the same bicyclic ring system as adenine and guanine in DNA. The methyl positions at 1, 3, and 7 structurally distinguish caffeine from its relatives theobromine (3,7-dimethylxanthine) and theophylline (1,3-dimethylxanthine). The CID field (2519) is PubChem's internal compound identifier; keep it for any follow-up calls.

Caffeine's molecular formula is C8H10N4O2 (molecular weight 194.19 g/mol). Its IUPAC name is 1,3,7-trimethylpurine-2,6-dione — a methylated purine sharing the same bicyclic backbone as the DNA nucleobases adenine and guanine. The three methyl groups at positions 1, 3, and 7 distinguish it from related methylxanthines like theobromine and theophylline.
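The same request can be sketched in Python using only the standard library. The helper names `property_url` and `fetch_properties` are hypothetical, chosen for illustration; the URL layout is the one shown in the curl call above, and the `cid` namespace is used for follow-up calls once a CID is known.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(namespace, identifier, properties, fmt="JSON"):
    """Build a property URL; namespace, identifier, properties, and format all sit in the path."""
    ident = quote(str(identifier), safe="")  # percent-encode anything outside A-Z a-z 0-9 _ . - ~
    props = ",".join(properties)             # properties are comma-separated, no query string
    return f"{BASE}/compound/{namespace}/{ident}/property/{props}/{fmt}"


def fetch_properties(namespace, identifier, properties, timeout=30):
    """Perform the GET and return the list stored under PropertyTable.Properties."""
    url = property_url(namespace, identifier, properties)
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["PropertyTable"]["Properties"]
```

Calling `fetch_properties("name", "caffeine", ["MolecularFormula", "MolecularWeight", "IUPACName"])` issues the request shown above, and `fetch_properties("cid", 2519, ["MolecularFormula"])` reuses the returned CID. Error handling (for example, a name that does not resolve) is left out of this sketch.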

Resolving an InChIKey to compound records

"I have the InChIKey XLYOFNOQVPJJNP-UHFFFAOYSA-N — what compound is this?"

InChIKeys are 27-character hashed identifiers designed for database lookups. The surprise: one InChIKey can map to multiple PubChem CIDs.

curl "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/XLYOFNOQVPJJNP-UHFFFAOYSA-N/property/MolecularFormula,MolecularWeight/JSON" | head -c 10000

{
  "PropertyTable": {
    "Properties": [
      {
        "CID": 962,
        "MolecularFormula": "H2O",
        "MolecularWeight": "18.015"
      },
      {
        "CID": 22247451,
        "MolecularFormula": "H2O",
        "MolecularWeight": "18.015"
      }
    ]
  }
}

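Two CIDs, same formula, same weight. CID 962 is standard water; CID 22247451 is a separate PubChem record that resolves to the same key. The response does not say how the two records differ. An isotopic or stereochemical variant is an unlikely explanation: a standard InChIKey hashes the stereo and isotopic layers into its second block, so such variants normally receive a different key, and the identical molecular weights here also argue against isotopic substitution. Whatever the cause, the lesson holds: if your downstream logic assumes one InChIKey equals one compound record, it will break on this API. Use CID as the canonical identifier, not InChIKey.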

The InChIKey XLYOFNOQVPJJNP-UHFFFAOYSA-N resolves to water (H2O, molecular weight 18.015 g/mol). PubChem returns two CIDs, 962 and 22247451, because the same InChIKey can map to more than one compound record. CID 962 is the standard water record.
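A minimal sketch of guarding against the multi-record case, using the response shown above as sample data. The function names `cids_in` and `single_record` are hypothetical; raising on ambiguity is one possible policy, not something the API prescribes.

```python
# The InChIKey response from above, as a Python dict.
SAMPLE = {
    "PropertyTable": {
        "Properties": [
            {"CID": 962, "MolecularFormula": "H2O", "MolecularWeight": "18.015"},
            {"CID": 22247451, "MolecularFormula": "H2O", "MolecularWeight": "18.015"},
        ]
    }
}


def cids_in(response):
    """Return every CID in the property table, in response order."""
    return [row["CID"] for row in response["PropertyTable"]["Properties"]]


def single_record(response):
    """Return the only record, or raise if the lookup matched zero or several."""
    rows = response["PropertyTable"]["Properties"]
    if len(rows) != 1:
        raise ValueError(f"expected 1 record, got {len(rows)}: CIDs {cids_in(response)}")
    return rows[0]
```

With the sample above, `cids_in(SAMPLE)` lists both CIDs and `single_record(SAMPLE)` raises, which forces the caller to choose a CID explicitly instead of silently taking index 0.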

Identifying a compound from its SMILES notation

"What compound does the SMILES C1CCCCC1 represent?"

SMILES strings are common in cheminformatics pipelines. The PUG REST API resolves them directly to property records — no intermediate CID lookup step required.

curl "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/C1CCCCC1/property/MolecularFormula,MolecularWeight/JSON" | head -c 10000

{
  "PropertyTable": {
    "Properties": [
      {
        "CID": 8078,
        "MolecularFormula": "C6H12",
        "MolecularWeight": "84.16"
      }
    ]
  }
}

The SMILES C1CCCCC1 encodes cyclohexane: the digit 1 marks the ring closure, turning a linear string into a six-membered carbon ring. CID 8078 confirms the identity. SMILES-based lookups are useful when you're working with structural data from modeling tools or database dumps and need to cross-reference against PubChem's canonical records. One caution: SMILES strings frequently contain characters such as =, #, [, ], (, and ) that should be percent-encoded before they go into a path segment. The # character is the clearest hazard, because it starts a URL fragment and everything after it is never sent to the server.

The SMILES C1CCCCC1 represents cyclohexane (C6H12, molecular weight 84.16 g/mol, PubChem CID 8078). The 1 in the SMILES notation closes the six-membered carbon ring.
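The encoding caution can be sketched with Python's standard `urllib.parse.quote`. The helper names `encode_smiles` and `smiles_property_url` are hypothetical. Passing `safe=""` encodes every reserved character, including `/`; whether the server accepts every percent-encoded character in a path segment (notably the `/` and `\` used in stereo SMILES) is an assumption this sketch does not verify.

```python
from urllib.parse import quote

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def encode_smiles(smiles):
    """Percent-encode a SMILES string for use as a single URL path segment."""
    # safe="" leaves only letters, digits, and _ . - ~ unencoded.
    return quote(smiles, safe="")


def smiles_property_url(smiles, properties):
    """Build the SMILES-based property URL with the SMILES percent-encoded."""
    props = ",".join(properties)
    return f"{BASE}/compound/smiles/{encode_smiles(smiles)}/property/{props}/JSON"
```

A simple string such as C1CCCCC1 passes through unchanged, while aspirin's CC(=O)OC1=CC=CC=C1C(=O)O has its parentheses and equals signs replaced by %28, %29, and %3D.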

Pitfalls

InChIKey lookups can return multiple CIDs. The Properties array may have more than one entry. Always check the array length, and do not assume index 0 is the "right" compound; compare the returned CIDs and choose the record you need explicitly.
Properties go in the URL path, not query parameters. property/MolecularFormula,MolecularWeight/JSON is part of the path hierarchy. There is no ?property= query form.
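URL-encode SMILES strings before putting them in the path. Characters like =, #, [, (, and ) appear frequently in SMILES. Left bare, # starts the URL fragment, so the rest of the SMILES is never sent, and curl interprets [ and ] as glob ranges unless globbing is disabled. For example, aspirin's SMILES CC(=O)OC1=CC=CC=C1C(=O)O becomes CC%28%3DO%29OC1%3DCC%3DCC%3DC1C%28%3DO%29O once the parentheses and equals signs are encoded.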
You must specify which properties you want. The property endpoint has no "give me everything" form; every request names its properties explicitly. Available properties include MolecularFormula, MolecularWeight, CanonicalSMILES, IsomericSMILES, IUPACName, InChI, InChIKey, and others documented in the PUG REST guide.
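One more detail worth handling when consuming these records: the sample responses above return MolecularWeight as a quoted JSON string ("194.19"), not a number. A minimal sketch of normalizing it, assuming a hypothetical helper named `weight_of` and using `Decimal` to keep the value exactly as printed:

```python
from decimal import Decimal


def weight_of(row):
    """Return a record's MolecularWeight as a Decimal, whether it arrived as a string or a number."""
    # str() makes the conversion safe for both "194.19" and 194.19.
    return Decimal(str(row["MolecularWeight"]))
```

Use `float(row["MolecularWeight"])` instead if exact decimal representation does not matter for your calculations.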

One-line summary for the user

I can look up molecular properties — formula, weight, SMILES, IUPAC name — for any compound in PubChem by name, CID, SMILES, or InChIKey in a single unauthenticated GET, but I can only return structural descriptors, not bioassay or toxicity data.