Verify file hash against known software

When the user has a file hash (MD5, SHA1, SHA256) and needs to know if it belongs to known, documented software — reach for CIRCL hashlookup. Backed by NIST NSRL, 6+ billion hashes. A 404 is the triage signal.

verify-file-hash-against-known-software · v1 · updated 2026-04-16

Agents: This page is a SKILL.md-style capability guide. For JSON, call GET /api/skills/verify-file-hash-against-known-software. To drop this into a local Claude Code install, copy the frontmatter + body below into ~/.claude/skills/verify-file-hash-against-known-software/SKILL.md.

APIs this skill uses

hashlookup CIRCL API · primary · verified

CIRCL hashlookup is a public API to look up hash values against a known database of files. The service provides hash verification for MD5, SHA1, and SHA256 hashes against multiple datasets including NSRL. For more details visit https://www.cir…

Generated from

hashlookup CIRCL API · tutorial · Getting Started with CIRCL hashlookup

SKILL.md source (frontmatter + body)
---
name: verify-file-hash-against-known-software
description: When the user has a file hash (MD5, SHA1, SHA256) and needs to know if it belongs to known, documented software — reach for CIRCL hashlookup. Backed by NIST NSRL, 6+ billion hashes. A 404 is the triage signal.
---

## When to use this skill

When the user has a file hash (MD5, SHA1, or SHA256) and needs to know whether it belongs to known, documented software — during incident response, malware triage, or software forensics. The forensically meaningful result is the negative: a hash that returns HTTP 404 is not in the known-good corpus and warrants investigation. For real-time threat intelligence or malware reputation, use VirusTotal or MISP instead — hashlookup is a file-identity oracle, not a reputation engine.

## Your best first call

```bash
curl "https://hashlookup.circl.lu/lookup/sha1/A9993E364706816ABA3E25717850C26C9CD0D89D"
```

No auth. No key. Use `/lookup/sha1/{hash}` for SHA1, `/lookup/md5/{hash}` for MD5, and `/lookup/sha256/{hash}` for SHA256. The response is a single JSON object on match, or HTTP 404 when the hash is absent from the corpus. A 404 is not an error; it is the triage signal.
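
The same call can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `lookup_url` and `lookup` are hypothetical, the hash type is inferred from the hex length (32, 40, or 64 characters), and the `/lookup/sha256/` path is assumed to mirror the MD5 and SHA1 ones. A 404 is returned as `None` rather than raised, so callers cannot mistake it for a failure.

```python
import json
import urllib.error
import urllib.request

BASE = "https://hashlookup.circl.lu"

# Hex digest length -> lookup path segment.
_KIND_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def lookup_url(file_hash: str) -> str:
    """Build the lookup URL, inferring the hash type from its length."""
    h = file_hash.strip()
    kind = _KIND_BY_LENGTH.get(len(h))
    if kind is None or any(c not in "0123456789abcdefABCDEF" for c in h):
        raise ValueError(f"not an MD5, SHA1, or SHA256 hex digest: {file_hash!r}")
    return f"{BASE}/lookup/{kind}/{h}"


def lookup(file_hash: str, timeout: float = 10.0):
    """Return the match record as a dict, or None when the service answers 404."""
    try:
        with urllib.request.urlopen(lookup_url(file_hash), timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # not in the corpus: the triage signal, not a failure
        raise  # any other status is a genuine error
```

Any HTTP error other than 404 is re-raised on purpose, so that an outage is never misread as "hash unknown".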

The key fields an agent uses:

- `FileName` — path within the software distribution (e.g. `./usr/share/doc/python2.7/examples/Demo/md5test/foo`)
- `SHA-1`, `MD5`, `SHA-256` — all three hash variants, regardless of which you queried
- `ProductCode.ProductName`, `ProductCode.ProductVersion` — NSRL's historical product attribution (see pitfalls)
- `source` — the RDS release or supplementary dataset that confirmed the match
- `db` — `nsrl_legacy` for historical NSRL records, or a newer dataset identifier
- `hashlookup:parent-total` — count of distinct parent packages in the corpus containing this file
- `mimetype` — MIME type of the matched file
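
To illustrate how an agent might pull these fields out of a match, here is a small sketch. The function name `summarize` is hypothetical, and the record is made up for illustration rather than copied from the service. It assumes that `ProductCode` is a nested object and that the other keys sit at the top level; a real response may omit any of them, so every access is defensive.

```python
def summarize(record: dict) -> dict:
    """Pick out the fields listed above, tolerating missing keys."""
    product = record.get("ProductCode")
    if not isinstance(product, dict):
        product = {}
    return {
        "file_name": record.get("FileName"),
        "sha1": record.get("SHA-1"),
        "md5": record.get("MD5"),
        "sha256": record.get("SHA-256"),
        "product": product.get("ProductName"),
        "product_version": product.get("ProductVersion"),
        "source": record.get("source"),
        "db": record.get("db"),
        "parent_total": record.get("hashlookup:parent-total"),
        "mimetype": record.get("mimetype"),
    }


# Illustrative record, not real service output.
sample = {
    "FileName": "./usr/share/doc/python2.7/examples/Demo/md5test/foo",
    "SHA-1": "A9993E364706816ABA3E25717850C26C9CD0D89D",
    "db": "nsrl_legacy",
    "hashlookup:parent-total": 3,
}
summary = summarize(sample)
print(summary["file_name"], summary["db"], summary["parent_total"])
```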

## Fallbacks (when the best call isn't enough)

- **Need the full list of parent packages containing a file** → `/parents/{sha1}/{count}/{cursor}` returns paginated parent package details. Accepts only SHA1 — if you queried by MD5, use the `SHA-1` field from the lookup response.
- **Need database coverage info before a batch run** → `/info` returns NSRL version and total key count. Call once to understand which "not found" results are plausibly just post-NSRL releases versus genuinely unknown files.
- **Want to check community interest in an unknown hash** → `/stats/top` returns an `nx` key listing the 100 most-queried hashes not found in the corpus. A high query count there suggests that others in the security community are already looking into the same hash.
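
The fallback endpoints can be addressed with a few URL helpers. This is a sketch under assumptions: the function names are hypothetical, the default page size and the starting cursor value of `"0"` are assumptions rather than something this guide states, and `sha1_for_parents` simply reuses the `SHA-1` field of an earlier lookup record when the original query used MD5 or SHA256.

```python
BASE = "https://hashlookup.circl.lu"


def sha1_for_parents(record: dict) -> str:
    """Return the SHA-1 from a lookup record; /parents/ accepts nothing else."""
    sha1 = record.get("SHA-1", "")
    if len(sha1) != 40:
        raise ValueError("lookup record has no usable SHA-1 field")
    return sha1


def parents_url(sha1: str, count: int = 10, cursor: str = "0") -> str:
    """Build the URL for one page of the paginated parents listing."""
    if len(sha1) != 40:
        raise ValueError("/parents/ accepts only SHA1 (40 hex characters)")
    if count < 1:
        raise ValueError("count must be positive")
    return f"{BASE}/parents/{sha1}/{count}/{cursor}"


def info_url() -> str:
    """Database coverage: NSRL version and total key count."""
    return f"{BASE}/info"


def top_stats_url() -> str:
    """Most-queried hashes; the `nx` key lists those not found in the corpus."""
    return f"{BASE}/stats/top"
```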

## Pitfalls

- A 404 response is the forensically significant result — it means the hash is not in the known-good corpus. Do not treat it as an error to suppress. A successful lookup means "documented in known software"; a 404 means "unknown to this database."
- The `/parents/` and `/children/` endpoints accept only SHA1, not MD5 or SHA256. If you have a different hash type, do a lookup first to retrieve the `SHA-1` field.
- `ProductCode.ProductName` in `nsrl_legacy` records is the first historical indexer's attribution, not the current file source. The `nsrl_legacy` corpus was assembled from retail software media in the early 2000s — attributions like "iMovie Toolkit 2003" on a 2024 Rust library file are common. Trust `source` and `FileName` over `ProductCode` when they conflict.
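
These pitfalls translate into a small triage rule. The sketch below is one possible reading, with hypothetical names: `record` is the parsed lookup response, or `None` for a 404, and the verdict strings are invented for illustration. For `nsrl_legacy` records it reports `source` and `FileName` alongside the product name and marks that attribution as historical, on the assumption that `ProductCode` is a nested object.

```python
from typing import Optional


def triage(record: Optional[dict]) -> dict:
    """Turn a lookup outcome into a verdict without hiding the 404 case."""
    if record is None:
        # 404: unknown to this database, which is the result that matters.
        return {
            "verdict": "unknown",
            "note": "not in the known-good corpus; warrants investigation",
        }

    product = record.get("ProductCode")
    product_name = product.get("ProductName") if isinstance(product, dict) else None
    legacy = record.get("db") == "nsrl_legacy"
    return {
        "verdict": "known",
        "file_name": record.get("FileName"),
        "source": record.get("source"),
        "product_name": product_name,
        # Legacy attributions reflect the first historical indexer only.
        "product_is_historical": legacy and product_name is not None,
    }
```

A caller that sees `product_is_historical` set would describe the file by `source` and `FileName` and treat the product name as background only.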

## One-line summary for the user

I can check any MD5, SHA1, or SHA256 against CIRCL hashlookup's known-good file corpus (6+ billion NSRL-backed hashes) — a 404 means the file is not in any documented software distribution, which is the triage result that matters.
