Getting Started with the Data.gov Catalog API

When to use this API

Use this when a user wants to find U.S. federal government datasets — climate records, public health statistics, financial disclosures, geospatial layers, or economic data — without knowing which agency published them. The Data.gov catalog aggregates metadata from hundreds of thousands of datasets across dozens of federal agencies in a CKAN search index. The non-obvious strength is cross-agency discovery: a question like "which agencies publish wildfire data?" is a single query here instead of browsing twenty separate portals. The API returns dataset metadata and resource links, not the data itself — you fetch the actual files or service endpoints using URLs in the resources array. For data already known to be from a specific agency (e.g., Census Bureau, USDA Economic Research Service), go directly to that agency's own API; Data.gov is strongest when the agency is unknown.

Finding datasets on a topic

"Are there federal datasets about climate?" package_search runs full-text search over titles, descriptions, and tags across the entire catalog. Set rows explicitly: leaving it off returns 10 results, which is often enough, but an explicit limit signals intent and pairs with start when you need to page through more.

curl "https://catalog.data.gov/api/3/action/package_search?q=climate&rows=3" | head -c 10000
{
  "success": true,
  "result": {
    "count": 22981,
    "results": [
      {
        "name": "supply-chain-greenhouse-gas-emission-factors-v1-3-by-naics-6",
        "title": "Supply Chain Greenhouse Gas Emission Factors v1.3 by NAICS-6",
        "notes": "GHG emission factors for 1,016 U.S. commodities as defined by NAICS. Factors are based on GHG data for 2022...",
        "organization": {
          "name": "epa-gov",
          "title": "U.S. Environmental Protection Agency"
        },
        "num_resources": 2,
        "metadata_modified": "2024-07-10T18:42:30.962320"
      }
      // ... 2 more results
    ]
  }
}

The count is 22,981 — nearly 23,000 federal datasets matched "climate." The top result is a supply-chain emission factors dataset whose title contains no mention of climate; it surfaced because the full-text index matches the detailed notes description. The name field (not title) is the slug you pass to package_show to get the full record including download URLs. Each result also carries organization.name, the agency's CKAN slug, which you can use as an fq=organization:epa-gov filter to scope follow-up searches to a single agency.

There are nearly 23,000 federal datasets tagged or described with "climate" in the Data.gov catalog. The current top result is EPA's Supply Chain Greenhouse Gas Emission Factors v1.3, which maps GHG emissions across 1,016 commodity categories at the NAICS-6 level. I can refine the search by agency, format, or more specific terms — let me know what you need.
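
The same search can be scripted. The following is a minimal Python sketch, assuming only the package_search endpoint, the q, rows, start, and fq parameters, and the response fields shown above. The helper names (build_search_url, summarize, page_offsets, search) are hypothetical and chosen for illustration; they are not part of CKAN or Data.gov.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://catalog.data.gov/api/3/action"


def build_search_url(q, rows=10, start=0, organization=None):
    """Build a package_search URL (hypothetical helper)."""
    params = {"q": q, "rows": rows, "start": start}
    if organization:
        # Scope the search to one agency slug, e.g. fq=organization:epa-gov
        params["fq"] = f"organization:{organization}"
    return f"{BASE}/package_search?{urlencode(params)}"


def summarize(payload):
    """Return (name slug, title, agency slug) for each dataset in a response."""
    if not payload.get("success"):
        raise ValueError("catalog reported success=false")
    return [
        (d["name"], d["title"], (d.get("organization") or {}).get("name"))
        for d in payload["result"]["results"]
    ]


def page_offsets(count, rows):
    """Start offsets needed to walk `count` matches, `rows` at a time."""
    return range(0, count, rows)


def search(q, **kwargs):
    """Fetch one page of results. Network call; not exercised in this sketch."""
    with urlopen(build_search_url(q, **kwargs), timeout=30) as resp:
        return json.load(resp)
```

A caller would typically run search("climate", rows=3), pass the result to summarize, and feed the name slug of the chosen dataset into package_show. Walking all 22,981 matches is rarely useful; narrowing with fq or more specific terms is usually the better move.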

Getting a dataset's metadata and download URL

"I found a Data.gov dataset about U.S. export financing — where's the actual file?" package_show returns the full dataset record including the resources array, which holds the actual file URLs, formats, and access endpoints. Use the dataset's name slug as the id parameter — it's what appears in search results and in the catalog URL.

curl "https://catalog.data.gov/api/3/action/package_show?id=authorizations-from-10-01-2006-thru-12-31-2022" | head -c 10000
{
  "success": true,
  "result": {
    "name": "authorizations-from-10-01-2006-thru-12-31-2022",
    "title": "Authorizations From 10/01/2006 Thru 09/30/2025",
    "notes": "All authorizations approved between 10/01/2006 and the latest reporting period. Asterisked Working Capital transactions were extended during EXIM Bank's lapse in authority...",
    "organization": {
      "name": "exim-gov",
      "title": "Export-Import Bank of the US"
    },
    "isopen": false,
    "metadata_modified": "2026-03-22T03:05:52.253755",
    "resources": [
      {
        "name": "Comma Separated Values File",
        "format": "CSV",
        "url": "https://img.exim.gov/s3fs-public/dataset/vbhv-d8am/Data.Gov+-+FY25+Q4.csv",
        "mimetype": "text/csv"
      },
      {
        "name": "Data.gov Report Data Dictionary",
        "format": "PDF",
        "url": "https://img.exim.gov/s3fs-public/dataset/vbhv-d8am/EXIM_Data.gov_Report_Data_Dictionary.pdf"
      }
    ]
  }
}

The isopen: false flag reflects how CKAN classifies the dataset's license; it is not a privacy gate, and this dataset is publicly accessible. The CSV URL is a direct S3 link on EXIM Bank's own infrastructure (img.exim.gov), not a data.gov mirror; Data.gov stores only metadata, and the authoritative copy lives with the publishing agency. The dataset notes mention EXIM Bank's "lapse in authority," a period when Congress allowed the bank's authorization to expire and it couldn't approve new multi-year transactions. The notes field carries that context and says asterisked Working Capital transactions were extended during the lapse; the data dictionary PDF is the place to confirm how those rows are marked in the CSV.

The Export-Import Bank dataset "Authorizations From 10/01/2006 Thru 09/30/2025" has a direct CSV download at https://img.exim.gov/s3fs-public/dataset/vbhv-d8am/Data.Gov+-+FY25+Q4.csv, plus a PDF data dictionary. The data covers all EXIM Bank authorizations approved from 10/01/2006 through FY25 Q4. One caveat: the dataset notes say asterisked Working Capital transactions were extended during the bank's lapse in authority, so check the data dictionary before interpreting those rows.
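
Pulling the download links out of a package_show response is mechanical. The following is a minimal Python sketch, assuming only the endpoint and the resources fields (name, format, url) shown in the response above. The helper names (build_show_url, resource_links, show) are hypothetical, not part of CKAN or Data.gov.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://catalog.data.gov/api/3/action"


def build_show_url(dataset_id):
    """Build a package_show URL from a dataset's name slug (hypothetical helper)."""
    return f"{BASE}/package_show?{urlencode({'id': dataset_id})}"


def resource_links(payload, fmt=None):
    """Return (format, name, url) per resource, optionally filtered by format."""
    if not payload.get("success"):
        raise ValueError("catalog reported success=false")
    links = [
        (r.get("format") or "", r.get("name") or "", r.get("url"))
        for r in payload["result"].get("resources", [])
    ]
    if fmt:
        # Compare case-insensitively; this sketch does not assume consistent casing.
        links = [link for link in links if link[0].upper() == fmt.upper()]
    return links


def show(dataset_id):
    """Fetch the full dataset record. Network call; not exercised in this sketch."""
    with urlopen(build_show_url(dataset_id), timeout=30) as resp:
        return json.load(resp)
```

Filtering by format (for example resource_links(record, fmt="CSV")) separates the data file from companion documents such as the data dictionary. The URLs point at the publishing agency's servers, so downloading them is a separate request outside the catalog API.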

Pitfalls

One-line summary for the user

I can search and retrieve metadata for hundreds of thousands of U.S. federal government datasets via the Data.gov CKAN catalog — including title, agency, description, tags, and direct download URLs — but the API returns metadata only, not the underlying data files themselves.