When to use this skill
When the user asks what CMS Provider Data datasets exist, wants metadata about a specific dataset by its identifier (title, description, theme, download URL, update schedule), or needs to discover whether CMS publishes data on a topic like dialysis facilities or nursing homes. This is a catalog-browsing capability — it shows what datasets are available, not the records inside them. For querying provider records inside a dataset, this is the wrong skill.
Your best first call
curl "https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items/23ew-n7w9"
No auth. No key. Returns a single dataset object. Use /items/{identifier} when you have a dataset ID — it returns only the matching record, not the entire catalog. The full /items endpoint (no identifier) returns every published dataset in one response; prefer the narrowed call when you know the ID.
The key fields an agent uses:
- identifier: the dataset ID, used as a path parameter in all other API calls
- title: human-readable name (e.g. "Dialysis Facility - Listing by Facility")
- description: what the dataset contains
- theme: array of topic categories (e.g. ["Dialysis facilities"])
- modified: last update date
- nextUpdateDate: when the next scheduled refresh lands
- distribution[].downloadURL: direct CSV download link (same data as the API, faster for bulk pulls)
Fallbacks (when the best call isn't enough)
- Need to discover datasets without knowing an identifier → curl "https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items" returns the full catalog. Fetch this once per session, parse into a dict keyed by identifier, and search client-side. Do not re-fetch.
- Need records inside a dataset, not just metadata → curl "https://data.cms.gov/provider-data/api/1/datastore/query/23ew-n7w9/0?limit=3" returns paginated records. This is a separate capability from catalog browsing.
Pitfalls
- The URL prefix is /api/1/, not the base domain alone. Every call must go to https://data.cms.gov/provider-data/api/1/{path}; omitting the /api/1/ segment produces a 404.
- The full catalog (/items with no identifier) is a large response. Always prefer /items/{identifier} when you have the ID. When you must browse, fetch once and index by identifier.
- nextUpdateDate is a scheduled date, not a guarantee: datasets sometimes publish late or skip a cycle.
- distribution[].downloadURL points to a CSV with the same data as the API. For pulling an entire dataset, or a large slice of one (all nursing home records in a state, for example), download the CSV and filter locally; that sidesteps pagination entirely and is faster.
One-line summary for the user
I can look up CMS Provider Data datasets by identifier or browse the full dataset catalog — no authentication required, but prefer the single-dataset call when you have an ID.
The CMS.gov Provider Data API provides access to Medicare provider data including dialysis facilities, hospitals, nursing homes, physicians, and other healthcare providers. The API supports querying datasets, accessing metadata, and perform…
SKILL.md source (frontmatter + body)
---
name: list-items
description: When the user asks what CMS datasets exist, wants metadata about a specific CMS Provider Data dataset, or needs to discover whether CMS publishes data on a topic like dialysis or nursing homes — reach for the CMS Provider Data catalog.
---
## When to use this skill
When the user asks what CMS Provider Data datasets exist, wants metadata about a specific dataset by its identifier (title, description, theme, download URL, update schedule), or needs to discover whether CMS publishes data on a topic like dialysis facilities or nursing homes. This is a catalog-browsing capability — it shows what datasets are available, not the records inside them. For querying provider records inside a dataset, this is the wrong skill.
## Your best first call
```bash
curl "https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items/23ew-n7w9"
```
No auth. No key. Returns a single dataset object. Use `/items/{identifier}` when you have a dataset ID — it returns only the matching record, not the entire catalog. The full `/items` endpoint (no identifier) returns every published dataset in one response; prefer the narrowed call when you know the ID.
The key fields an agent uses:
- `identifier` — the dataset ID, used as a path parameter in all other API calls
- `title` — human-readable name (e.g. "Dialysis Facility - Listing by Facility")
- `description` — what the dataset contains
- `theme` — array of topic categories (e.g. `["Dialysis facilities"]`)
- `modified` — last update date
- `nextUpdateDate` — when the next scheduled refresh lands
- `distribution[].downloadURL` — direct CSV download link (same data as the API, faster for bulk pulls)
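
As a minimal sketch, the single-dataset call and the fields listed above could be wrapped in Python roughly like this. The helper names `dataset_url`, `fetch_dataset`, and `summarize` are illustrative and not part of any CMS client library; only the URL pattern and the field names come from this skill.

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above.
BASE = "https://data.cms.gov/provider-data/api/1"

# Top-level fields this skill treats as the useful ones.
KEY_FIELDS = ("identifier", "title", "description", "theme", "modified", "nextUpdateDate")


def dataset_url(identifier: str) -> str:
    """Build the metastore URL for one dataset (hypothetical helper)."""
    return f"{BASE}/metastore/schemas/dataset/items/{identifier}"


def fetch_dataset(identifier: str) -> dict:
    """GET one dataset object; no auth header or API key is sent."""
    with urlopen(dataset_url(identifier), timeout=30) as resp:
        return json.load(resp)


def summarize(dataset: dict) -> dict:
    """Keep only the key fields, plus every distribution[].downloadURL."""
    summary = {name: dataset.get(name) for name in KEY_FIELDS}
    summary["downloadURLs"] = [
        dist["downloadURL"]
        for dist in dataset.get("distribution", [])
        if "downloadURL" in dist
    ]
    return summary
```

Calling `summarize(fetch_dataset("23ew-n7w9"))` would reduce the response to the fields an agent usually needs. Fields missing from the response come back as `None` rather than raising, which seems the safer default when the exact shape of a record is not guaranteed.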
## Fallbacks (when the best call isn't enough)
- **Need to discover datasets without knowing an identifier** → `curl "https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items"` returns the full catalog. Fetch this once per session, parse into a dict keyed by `identifier`, and search client-side. Do not re-fetch.
- **Need records inside a dataset, not just metadata** → `curl "https://data.cms.gov/provider-data/api/1/datastore/query/23ew-n7w9/0?limit=3"` returns paginated records. This is a separate capability from catalog browsing.
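
The fetch-once, index, and search-client-side advice in the first fallback might look like the sketch below. It assumes the full-catalog response is a JSON array of dataset objects and that `theme` is an array, as described above; `index_catalog`, `load_catalog`, and `search` are hypothetical names chosen for illustration.

```python
import json
from functools import lru_cache
from urllib.request import urlopen

CATALOG_URL = "https://data.cms.gov/provider-data/api/1/metastore/schemas/dataset/items"


def index_catalog(items: list) -> dict:
    """Key each dataset object by its identifier, skipping any without one."""
    return {item["identifier"]: item for item in items if "identifier" in item}


@lru_cache(maxsize=1)
def load_catalog() -> dict:
    """Fetch the full catalog once per process; later calls reuse the cached index."""
    # Assumption: the endpoint returns a JSON array of dataset objects.
    with urlopen(CATALOG_URL, timeout=60) as resp:
        return index_catalog(json.load(resp))


def search(index: dict, keyword: str) -> list:
    """Return identifiers whose title, description, or theme mentions the keyword."""
    needle = keyword.lower()
    hits = []
    for identifier, item in index.items():
        parts = [str(item.get("title", "")), str(item.get("description", ""))]
        parts += [str(topic) for topic in item.get("theme") or []]
        if needle in " ".join(parts).lower():
            hits.append(identifier)
    return hits
```

With this in place, `search(load_catalog(), "nursing")` answers a topic question without a second network call, because `lru_cache` keeps the indexed catalog for the life of the process.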
## Pitfalls
- The URL prefix is `/api/1/`, not the base domain alone. Every call must go to `https://data.cms.gov/provider-data/api/1/{path}` — omitting the `/api/1/` segment produces a 404.
- The full catalog (`/items` with no identifier) is a large response. Always prefer `/items/{identifier}` when you have the ID. When you must browse, fetch once and index by `identifier`.
- `nextUpdateDate` is a scheduled date, not a guarantee — datasets sometimes publish late or skip a cycle.
- `distribution[].downloadURL` points to a CSV with the same data as the API. For pulling an entire dataset, or a large slice of one (all nursing home records in a state, for example), download the CSV and filter locally; that sidesteps pagination entirely and is faster.
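
Two of these pitfalls can be guarded against in code. The sketch below is illustrative only: `api_url`, `parse_day`, and `refresh_overdue` are hypothetical helpers, and it assumes `modified` and `nextUpdateDate` are strings that begin with an ISO `YYYY-MM-DD` date, which this skill does not state.

```python
from datetime import date
from typing import Optional

API_ROOT = "https://data.cms.gov/provider-data/api/1"


def api_url(path: str) -> str:
    """Join a path onto the API root so the /api/1/ segment is never dropped."""
    return f"{API_ROOT}/{path.lstrip('/')}"


def parse_day(value: Optional[str]) -> Optional[date]:
    """Parse a leading YYYY-MM-DD (assumed format); None if absent or malformed."""
    if not isinstance(value, str):
        return None
    try:
        return date.fromisoformat(value[:10])
    except ValueError:
        return None


def refresh_overdue(dataset: dict, today: date) -> bool:
    """True when the scheduled refresh date has passed and `modified` still predates it."""
    scheduled = parse_day(dataset.get("nextUpdateDate"))
    modified = parse_day(dataset.get("modified"))
    if scheduled is None or modified is None:
        return False  # not enough information to judge
    return scheduled < today and modified < scheduled
```

A `True` result from `refresh_overdue` only suggests that a dataset published late or skipped a cycle; it is a hint to tell the user the data may be older than the schedule implies, not an error.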
## One-line summary for the user
I can look up CMS Provider Data datasets by identifier or browse the full dataset catalog — no authentication required, but prefer the single-dataset call when you have an ID.