Open Source · MIT License · R57 Labs

cryptid

Black-box cryptographic hash assessment toolkit.
Empirically detect statistical weaknesses in hash functions — no math degree required.

cryptid — bash
$ python cryptid.py test -a jenkins -n 5000 --level standard
Generating 5,000 test vectors...
Phase 1: Statistical Suite
────────────────────────────────────────
Bit correlation            clean (signal: 0.000)
Output entropy             clean (signal: 0.000)
Avalanche effect           clean (signal: 0.000)
Meta-learner               P(weakened) = 0.171
Phase 2: Differential Profile
────────────────────────────────────────
Single-bit differential    SIGNAL (signal: 1.000)
────────────────────────────────────────
Verdict: WARN — Potential anomalies in: differential
24 Algorithms Tested
7 Detection Methods
93% Detection Accuracy
0 False Positives on Crypto
<5s Quick Scan Time

How do you know a hash function does what it claims?

In 2013, leaked documents indicated that the NSA had inserted a kleptographic backdoor into the Dual EC DRBG random number generator — a NIST standard. For decades before that, the CIA secretly owned Crypto AG and sold deliberately weakened cipher machines to governments worldwide. History demonstrates that cryptographic standards can be deliberately weakened, and that such weaknesses can go undetected for years.

If a hardware security module, a firmware update, or a third-party library implements SHA-256, how do you verify the implementation is faithful? Mathematical proofs are inaccessible to most engineers. Existing test suites check output randomness but miss structural weaknesses that only appear when you look at how the output changes — not just what it looks like.

cryptid treats hash functions as black boxes and applies seven independent statistical and differential tests to detect anomalies that no single method would catch alone.

🔍 Statistical Bias · Detectable
↔️ Weak Differential Structure · Detectable
🔗 Slow Internal Diffusion · Detectable
🚪 Kleptographic Backdoor · Out of Scope
🔑 Reduced Keyspace · Out of Scope

Seven tests. Three tiers. One verdict.

No single statistical test catches all weakness types. cryptid runs a layered battery — weaknesses invisible to aggregate analysis are caught by differential tests, and weaknesses invisible to both are caught by linear approximation.

01
Statistical Suite
6 tests · aggregate output analysis
Bit Correlation · Output Entropy · Avalanche Effect · Byte Frequency · Mutual Information · Multi-byte Interaction

Asks: "Does the output look random?" A trained logistic regression meta-classifier combines all six signals into a single probability score. Trained on 288 synthetic variants, achieving 93% detection accuracy with zero false positives on real cryptographic primitives.

+ standard
02
Differential Profile
4 tests · how output changes
Single-bit Differential Matrix · Byte-level Differential · Hamming Distance Profile · Differential Bit Independence

Asks: "Does the output change randomly?" Empirical implementation of the Strict Avalanche Criterion and Bit Independence Criterion. Catches CRC32 and Jenkins OAT — both of which pass all aggregate statistical tests.

+ full
03
Extended + Linear
3 tests + 6-level linear approximation · structural analysis
Near-collision Frequency · Sequence Correlation · Cycle Detection · Linear Approximation (6 levels)

Asks: "Is there any exploitable algebraic or structural pattern?" Inspired by Matsui's linear cryptanalysis. Tests whether any multi-bit XOR combination of input bits correlates with output bits at six levels of mask complexity.

Level      Tests Run                            Approx. Time (5K samples)   Best For
quick      Statistical suite + meta-learner     ~5s                         CI/CD pipeline, every commit
standard   + Differential profile               ~10s                        Default — good coverage for most audits
full       + Extended + Linear approximation    ~40s                        Release gate, comprehensive audit

What 24 algorithms revealed

All 15 cryptographic algorithms tested — including SHA-256, AES-128, SM3, BLAKE2b, and ChaCha20 — produced output statistically indistinguishable from a perfect random oracle at 100,000 samples. The non-cryptographic hashes told a different story.

CASE STUDY

Jenkins OAT: The Hidden Weakness

Jenkins one-at-a-time is widely used in hash tables. It passes every aggregate statistical test — bit correlation, entropy, avalanche, frequency, mutual information. The meta-learner classifies it as clean (P = 0.15). But differential profile analysis tells a different story.

Its single-bit differential matrix shows significant non-uniformity (signal strength: 1.000), and sequential outputs exhibit high autocorrelation. The structural reason: Jenkins' shift-and-add construction lacks a strong final mixing pass, leaving position-dependent differential structure in the last input bytes.
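
For reference, the canonical Jenkins one-at-a-time function is only a few lines; its finalization is three quick operations, far less mixing than the final rounds of a cryptographic compression function.

# jenkins_oat.py — the canonical one-at-a-time hash, shown for reference
def jenkins_oat(data: bytes) -> int:
    h = 0
    for byte in data:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF   # brief finalization: three operations,
    h ^= h >> 11                      # leaving late input bytes under-mixed
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h

print(hex(jenkins_oat(b'hello world')))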

Statistical Suite: PASS · Differential Profile: WARN
VALIDATION

CRC32: Algebraic, Not Statistical

CRC32's XOR-linearity is algebraic — not statistical. The aggregate suite classifies it as clean (P = 0.11). But differential and sequence correlation tests detect it immediately, confirming that different methods catch different categories of weakness.
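
That linearity is easy to see directly: for equal-length messages, CRC32 satisfies crc(a ⊕ b) = crc(a) ⊕ crc(b) ⊕ crc(0…0), a relation no aggregate statistic will notice.

# crc_linearity.py — demonstrate CRC32's XOR-linearity with the standard library
import os, zlib

a = os.urandom(32)
b = os.urandom(32)
xored = bytes(x ^ y for x, y in zip(a, b))
zeros = bytes(32)                      # all-zero message of the same length

lhs = zlib.crc32(xored)
rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(zeros)
print(lhs == rhs)                      # True for any equal-length a and b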

Statistical Suite: PASS · Differential: FAIL
WHITE-BOX

SHA-3 vs. SHA-1: Diffusion Rates

SHA-3's sponge construction achieves full 1600-bit state diffusion in just 3 of 24 rounds. SHA-1 requires 20 of 80 rounds to reach 90% diffusion. SM3 achieves nearly double SHA-256's early-round diffusion rate — a deliberate design choice by its authors.

SHA-3: 3 rounds to full diffusion · SHA-1: 20 rounds to 90%
POSITIVE RESULT

MurmurHash3 & SipHash: Clean

Neither claims cryptographic security, yet both are completely clean across all seven detection methods — indistinguishable from random oracle controls. Statistical quality is a design choice, not an inherent property of complexity.

All 7 methods: PASS

Independent Verification

These visualizations were generated by re-implementing the paper's core methodology from scratch rather than by running the tool itself; the results independently confirm the paper's findings.

Detection signal strength across algorithms — aggregate vs. differential methods
Single-bit differential matrices: SHA-256 (uniform) vs. Jenkins OAT (non-uniform)

Up and running in 30 seconds

Core functionality uses only the Python standard library; optional dependencies are needed solely for the block cipher tests.

1
Clone the repo
git clone https://github.com/r57-labs/cryptid.git && cd cryptid
2
Run a quick scan on a built-in algorithm
python cryptid.py test -a sha256 -n 5000 --level quick
3
Test your own implementation (JSONL file)
python cryptid.py test -i my_hashes.jsonl --level standard
4
Test an external tool directly
python cryptid.py test --command "openssl dgst -sha256 -hex" -n 2000 --level quick

Input Format (JSONL)

{"plaintext": "hello world", "hash": "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"}

One JSON object per line. Supports plaintext, input, or message as the input field; hash, output, or digest as the output field.
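
If your implementation is a Python function rather than a CLI, a few lines are enough to produce a compatible file (hashlib's sha256 stands in here for the code under test):

# make_jsonl.py — write a cryptid-compatible JSONL file from any Python-callable hash
import hashlib, json, secrets

with open('my_hashes.jsonl', 'w') as f:
    for _ in range(5000):
        plaintext = secrets.token_hex(16)                         # random printable input
        digest = hashlib.sha256(plaintext.encode()).hexdigest()   # swap in your own implementation
        f.write(json.dumps({'plaintext': plaintext, 'hash': digest}) + '\n')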

Testing real hardware & external implementations

cryptid doesn't care how a hash was produced. Any implementation that can accept an input and return a hex digest can be tested — HSMs, firmware, vendor libraries, or custom hardware.

🔒
Hardware Security Module (HSM)
PKCS#11 / vendor CLI

Write a thin wrapper that calls your HSM's PKCS#11 interface or vendor CLI, then collect outputs into a JSONL file for batch testing.

# wrapper.py — hashes cryptid-generated vectors on the HSM via the python-pkcs11 library
import pkcs11, json, sys

lib = pkcs11.lib('/usr/lib/libCryptoki2_64.so')
token = lib.get_token(token_label='MyHSM')
with token.open(user_pin='1234') as session:
    for line in sys.stdin:                        # one JSON object per line (JSONL)
        record = json.loads(line)
        data = record['plaintext'].encode()       # assumes a UTF-8 plaintext field; adapt if your vectors are hex-encoded
        digest = session.digest(data, mechanism=pkcs11.Mechanism.SHA256)
        print(json.dumps({'plaintext': record['plaintext'], 'hash': digest.hex()}))
python cryptid.py generate -a sha256 -n 10000 -o inputs.jsonl
python wrapper.py < inputs.jsonl > hsm_output.jsonl
# Note: -i (file) mode runs the statistical suite only; differential/extended tests require --command or -a
python cryptid.py test -i hsm_output.jsonl --level full
📟
Firmware / Embedded Device
Serial / USB / JTAG

Script the device over serial or USB, send hex-encoded inputs, capture hex outputs. The compare command lets you diff device output against a known-good reference.

# serial_wrapper.py — sends hex-encoded inputs to the device over serial, captures hex digests
import serial, json, sys

dev = serial.Serial('/dev/ttyUSB0', 115200, timeout=5)   # timeout so a hung device can't stall the run
for line in sys.stdin:                                    # one hex-encoded input per line
    dev.write((line.strip() + '\n').encode())
    result = dev.readline().decode().strip()              # device replies with a hex digest
    print(json.dumps({'plaintext': line.strip(), 'hash': result}))
# Compare device output against OpenSSL reference
python cryptid.py compare \
  --target device_output.jsonl \
  --reference openssl_output.jsonl
⚙️
Third-Party Library or CLI Tool
--command mode

If the implementation has a CLI that accepts input on stdin and prints a hex digest, use --command mode directly. cryptid handles both batch-capable tools and single-invocation tools automatically.

# Test any CLI tool that reads stdin, prints hex
python cryptid.py test \
  --command "openssl dgst -sha256 -hex" \
  -n 2000 --level quick
# Note: --level standard/full with --command spawns many subprocesses and is slow.
# For deep analysis, use the JSONL workflow instead (generate → wrapper → test -i).

# Or a custom binary
python cryptid.py test \
  --command "./vendor_hash_tool --algo sha256" \
  -n 5000 --level full
💡
The Compare Workflow
Diff any two implementations

The compare command runs both a statistical analysis and a byte-level diff of two JSONL datasets. Use it to verify that a hardware implementation matches a software reference on the same inputs — any deviation is flagged immediately.

# Generate reference vectors from OpenSSL
python cryptid.py generate -a sha256 -n 10000 \
  -o reference.jsonl

# Run same inputs through your device, then compare
python cryptid.py compare \
  --target device.jsonl \
  --reference reference.jsonl

Any output mismatch on shared inputs is reported as a percentage. A faithful implementation should show 100% match and PASS on both statistical suites.
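
As a back-of-the-envelope version of that check (not the tool's own implementation), the match percentage is just a join on shared inputs:

# match_rate.py — rough illustration of the byte-level diff behind compare
import json

def match_rate(target_path, reference_path):
    with open(reference_path) as f:
        ref = {r['plaintext']: r['hash'] for r in map(json.loads, f)}
    shared = matched = 0
    with open(target_path) as f:
        for rec in map(json.loads, f):
            if rec['plaintext'] in ref:
                shared += 1
                matched += rec['hash'] == ref[rec['plaintext']]
    return 100.0 * matched / shared if shared else 0.0

print(f"match: {match_rate('device.jsonl', 'reference.jsonl'):.1f}%")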

What happens when you build outside your comfort zone?

Here's something we want to be upfront about: the lead developer on this project is not a cryptographer. He's not a mathematician. The depth of the methodology in this toolkit — the differential profile analysis, the linear approximation testing, the double-blind calibration protocol — goes well beyond his prior expertise.

That's the point.

We built cryptid as a deliberate test of our AI Native development methodology — a set of techniques for leveraging modern AI to produce work that exceeds what you could accomplish alone. The goal isn't to replace domain expertise. It's to understand how far you can push AI-assisted development into unfamiliar territory, and where the limits are.

We think the results speak for themselves — but we also want to hear from people who actually know this field. If you're a cryptographer, a mathematician, or a security researcher and you see something wrong (or something interesting), we genuinely want to know.

"The toolkit's layered approach provides defense in depth: weaknesses invisible to aggregate statistical analysis are caught by differential and sequence correlation tests. No single method suffices."
— cryptid whitepaper, R57 Labs, April 2026
🎓
AI Native Training
Learn the techniques we used to build this — and apply them to your own projects outside your comfort zone.
Explore the curriculum →