Data Types & Breach Categories
Data Types at a Glance
SpyCloud classifies each dataset into one of three main data types:
BREACH
Data taken from a known organization or domain, including exfiltrated employee or customer information – often structured, curated, and cleaned.
MALWARE
Data harvested from infostealer-infected machines, including credentials, browser fingerprints, session cookies, and web behavior logs.
COMBOLIST
Lists of credential pairs (email:password or username:password), often leaked or traded on forums and not tied to a single known breach source.
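As a rough illustration, the three data types can be modeled as an enumeration in client code. This is a minimal sketch: the `DataType` class and the `parse_data_type` helper are hypothetical names introduced here, and the string values are assumed to match the labels shown above, which this page does not confirm.

```python
from enum import Enum


class DataType(Enum):
    """The three data types named above; the string values are assumed labels."""

    BREACH = "BREACH"
    MALWARE = "MALWARE"
    COMBOLIST = "COMBOLIST"


def parse_data_type(label: str) -> DataType:
    """Map a free-form label to a DataType, ignoring case and surrounding whitespace.

    Raises ValueError for labels outside the three known types.
    """
    return DataType(label.strip().upper())
```

Normalizing the label before lookup lets values such as `" malware "` or `"Combolist"` resolve to the same member, while anything unexpected fails loudly instead of being silently miscategorized.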
🧠 Breach Categories
SpyCloud further classifies breach datasets into sub-categories using two fields:
- breach_main_category
- breach_category
These categories provide added context for the type of exposure, behavior, or source.
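One way to use the two fields is to summarize a batch of records by category pair. The sketch below assumes each record is a plain dictionary carrying `breach_main_category` and `breach_category` as string keys; the sample values are illustrative guesses, not real SpyCloud output, and `count_by_category` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_category(records: Iterable[Mapping[str, str]]) -> Counter:
    """Count records per (breach_main_category, breach_category) pair.

    Records missing either field are counted under "unknown" for that field.
    """
    counts: Counter = Counter()
    for record in records:
        key = (
            record.get("breach_main_category", "unknown"),
            record.get("breach_category", "unknown"),
        )
        counts[key] += 1
    return counts


# Illustrative records only; the field values are assumptions.
sample = [
    {"breach_main_category": "breach", "breach_category": "exfiltrated"},
    {"breach_main_category": "breach", "breach_category": "phished"},
    {"breach_main_category": "breach", "breach_category": "exfiltrated"},
    {"breach_category": "scraped"},  # main category missing on purpose
]
summary = count_by_category(sample)
```

Keying on the pair rather than on a single field keeps the sub-category tied to its main category, so identical sub-category labels under different main categories are not merged.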
🗃️ Combolist
Credential pairs (username:password or email:password) found on paste sites and forums, often without attribution to a specific breach. Many contain credentials recycled from older exposures or breached datasets.
🔓 Exfiltrated
Data exfiltrated by threat actors from an identifiable organization, often including customer records, user databases, or internal employee information. These datasets typically have a known breach name and metadata.
🌍 Exposed
Data left publicly accessible in open or misconfigured data stores (e.g., open S3 buckets, FTP servers). The exposure is unintentional, but the data can contain sensitive credentials or personal data.
🎣 Phished
Credentials harvested through phishing campaigns, kits, or spoofed login portals. The data is often limited in structure but high-fidelity in intent.
🧹 Scraped
Usernames, emails, and account metadata scraped from public websites (e.g., social media, forums). This data is not stolen directly but aggregated at scale.
🪟 Malware
Data captured from infected machines, including:
- Login credentials
- Cookies
- Autofill data
- Browsing history
- Wallet credentials
- Device fingerprints
This data is collected using infostealer malware such as RedLine, Raccoon, and Vidar. It is the richest and most behaviorally complete dataset type in the SpyCloud corpus.
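To make the list of captured artifacts concrete, the following sketch reports which artifact kinds are present in a single malware record. Every key in `ARTIFACT_KEYS` is a hypothetical field name chosen for illustration, as is the `present_artifacts` helper; actual record layouts may differ.

```python
from typing import Any, Dict, List

# Hypothetical record keys, one per artifact kind listed above.
ARTIFACT_KEYS: Dict[str, str] = {
    "credentials": "login credentials",
    "cookies": "cookies",
    "autofill": "autofill data",
    "browsing_history": "browsing history",
    "wallets": "wallet credentials",
    "device_fingerprint": "device fingerprints",
}


def present_artifacts(record: Dict[str, Any]) -> List[str]:
    """Return the labels of artifact kinds that have non-empty values in the record."""
    return [label for key, label in ARTIFACT_KEYS.items() if record.get(key)]


# Illustrative record; all values are made up.
example = {
    "credentials": [{"username": "user@example.com", "password": "hunter2"}],
    "cookies": [],  # empty, so not reported
    "device_fingerprint": {"os": "Windows 10"},
}
found = present_artifacts(example)
```

Treating empty lists and missing keys the same way keeps the summary focused on artifacts that actually carry data.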