Open Source · MIT License · R57 Labs

cryptid

Black-box cryptographic hash assessment toolkit.
Empirically detect statistical weaknesses in hash functions — no math degree required.

cryptid — bash
$ python cryptid.py test -a jenkins -n 5000 --level standard
Generating 5,000 test vectors...
Phase 1: Statistical Suite
────────────────────────────────────────
Bit correlation            clean (signal: 0.000)
Output entropy             clean (signal: 0.000)
Avalanche effect           clean (signal: 0.000)
Meta-learner               P(weakened) = 0.171
Phase 2: Differential Profile
────────────────────────────────────────
Single-bit differential    SIGNAL (signal: 1.000)
────────────────────────────────────────
Verdict: WARN — Potential anomalies in: differential
24 Algorithms Tested
7 Detection Methods
93% Detection Accuracy
0 False Positives on Crypto
<5s Quick Scan Time

How do you know a hash function does what it claims?

In 2013, leaked documents indicated that the NSA had inserted a kleptographic backdoor into the Dual EC DRBG random number generator — a NIST standard. For decades before that, the CIA secretly owned Crypto AG and sold deliberately weakened cipher machines to governments worldwide. History demonstrates that cryptographic standards can be deliberately weakened, and that such weaknesses can go undetected for years.

If a hardware security module, a firmware update, or a third-party library implements SHA-256, how do you verify the implementation is faithful? Mathematical proofs are inaccessible to most engineers. Existing test suites check output randomness but miss structural weaknesses that only appear when you look at how the output changes — not just what it looks like.

cryptid treats hash functions as black boxes and applies seven independent statistical and differential tests to detect anomalies that no single method would catch alone.

🔍 Statistical Bias · Detectable
↔️ Weak Differential Structure · Detectable
🔗 Slow Internal Diffusion · Detectable
🚪 Kleptographic Backdoor · Out of Scope
🔑 Reduced Keyspace · Out of Scope

Seven tests. Three tiers. One verdict.

No single statistical test catches all weakness types. cryptid runs a layered battery — weaknesses invisible to aggregate analysis are caught by differential tests, and weaknesses invisible to both are caught by linear approximation.

01
Statistical Suite
6 tests · aggregate output analysis
Bit Correlation · Output Entropy · Avalanche Effect · Byte Frequency · Mutual Information · Multi-byte Interaction

Asks: "Does the output look random?" A trained logistic regression meta-classifier combines all six signals into a single probability score. Trained on 288 synthetic variants, achieving 93% detection accuracy with zero false positives on real cryptographic primitives.

+ standard
02
Differential Profile
4 tests · how output changes
Single-bit Differential Matrix · Byte-level Differential · Hamming Distance Profile · Differential Bit Independence

Asks: "Does the output change randomly?" Empirical implementation of the Strict Avalanche Criterion and Bit Independence Criterion. Catches CRC32 and Jenkins OAT — both of which pass all aggregate statistical tests.

+ full
03
Extended + Linear
3 tests + 6-level linear approximation · structural analysis
Near-collision Frequency · Sequence Correlation · Cycle Detection · Linear Approximation (6 levels)

Asks: "Is there any exploitable algebraic or structural pattern?" Inspired by Matsui's linear cryptanalysis. Tests whether any multi-bit XOR combination of input bits correlates with output bits at six levels of mask complexity.

Level      Tests Run                            Approx. Time (5K samples)   Best For
quick      Statistical suite + meta-learner     ~5s                         CI/CD pipeline, every commit
standard   + Differential profile               ~10s                        Default — good coverage for most audits
full       + Extended + Linear approximation    ~40s                        Release gate, comprehensive audit

What 24 algorithms revealed

All 15 cryptographic algorithms tested — including SHA-256, AES-128, SM3, BLAKE2b, and ChaCha20 — produced output statistically indistinguishable from a perfect random oracle at 100,000 samples. The non-cryptographic hashes told a different story.

CASE STUDY

Jenkins OAT: The Hidden Weakness

Jenkins one-at-a-time is widely used in hash tables. It passes every aggregate statistical test — bit correlation, entropy, avalanche, frequency, mutual information. The meta-learner classifies it as clean (P = 0.15). But differential profile analysis tells a different story.

Its single-bit differential matrix shows significant non-uniformity (signal strength: 1.000), and sequential outputs exhibit high autocorrelation. The structural reason: Jenkins' shift-and-add construction lacks a strong final mixing pass, leaving position-dependent differential structure in the last input bytes.
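
For reference, the canonical Jenkins one-at-a-time function is only a few lines; its finalization is three quick operations, far less mixing than the final rounds of a cryptographic compression function.

# jenkins_oat.py — the canonical one-at-a-time hash, shown for reference
def jenkins_oat(data: bytes) -> int:
    h = 0
    for byte in data:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF   # brief finalization: three operations,
    h ^= h >> 11                      # leaving late input bytes under-mixed
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h

print(hex(jenkins_oat(b'hello world')))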

Statistical Suite: PASS · Differential Profile: WARN
VALIDATION

CRC32: Algebraic, Not Statistical

CRC32's XOR-linearity is algebraic — not statistical. The aggregate suite classifies it as clean (P = 0.11). But differential and sequence correlation tests detect it immediately, confirming that different methods catch different categories of weakness.
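
That linearity is easy to see directly: for equal-length messages, CRC32 satisfies crc(a ⊕ b) = crc(a) ⊕ crc(b) ⊕ crc(0…0), a relation no aggregate statistic will notice.

# crc_linearity.py — demonstrate CRC32's XOR-linearity with the standard library
import os, zlib

a = os.urandom(32)
b = os.urandom(32)
xored = bytes(x ^ y for x, y in zip(a, b))
zeros = bytes(32)                      # all-zero message of the same length

lhs = zlib.crc32(xored)
rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(zeros)
print(lhs == rhs)                      # True for any equal-length a and b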

Statistical Suite: PASS · Differential: FAIL
WHITE-BOX

SHA-3 vs. SHA-1: Diffusion Rates

SHA-3's sponge construction achieves full 1600-bit state diffusion in just 3 of 24 rounds. SHA-1 requires 20 of 80 rounds to reach 90% diffusion. SM3 achieves nearly double SHA-256's early-round diffusion rate — a deliberate design choice by its authors.

SHA-3: 3 rounds to full diffusion · SHA-1: 20 rounds to 90%
POSITIVE RESULT

MurmurHash3 & SipHash: Clean

Neither claims cryptographic security, yet both are completely clean across all seven detection methods — indistinguishable from random oracle controls. Statistical quality is a design choice, not an inherent property of complexity.

All 7 methods: PASS

Independent Verification

These visualizations were generated by re-implementing the paper's core methodology from scratch rather than by running the tool itself; the results independently confirm the paper's findings.

Detection signal strength across algorithms — aggregate vs. differential methods
Single-bit differential matrices: SHA-256 (uniform) vs. Jenkins OAT (non-uniform)

Up and running in 30 seconds

Core functionality uses only the Python standard library; optional dependencies are needed solely for the block cipher tests.

1
Clone the repo
git clone https://github.com/r57-labs/cryptid.git && cd cryptid
2
Run a quick scan on a built-in algorithm
python cryptid.py test -a sha256 -n 5000 --level quick
3
Test your own implementation (JSONL file)
python cryptid.py test -i my_hashes.jsonl --level standard
4
Test an external tool directly
python cryptid.py test --command "openssl dgst -sha256 -hex" -n 2000 --level quick

Input Format (JSONL)

{"plaintext": "hello world", "hash": "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"}

One JSON object per line. Supports plaintext, input, or message as the input field; hash, output, or digest as the output field.
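
If your implementation is a Python function rather than a CLI, a few lines are enough to produce a compatible file (hashlib's sha256 stands in here for the code under test):

# make_jsonl.py — write a cryptid-compatible JSONL file from any Python-callable hash
import hashlib, json, secrets

with open('my_hashes.jsonl', 'w') as f:
    for _ in range(5000):
        plaintext = secrets.token_hex(16)                         # random printable input
        digest = hashlib.sha256(plaintext.encode()).hexdigest()   # swap in your own implementation
        f.write(json.dumps({'plaintext': plaintext, 'hash': digest}) + '\n')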

Testing real hardware & external implementations

cryptid doesn't care how a hash was produced. Any implementation that can accept an input and return a hex digest can be tested — HSMs, firmware, vendor libraries, or custom hardware.

🔒
Hardware Security Module (HSM)
PKCS#11 / vendor CLI

Write a thin wrapper that calls your HSM's PKCS#11 interface or vendor CLI, then collect outputs into a JSONL file for batch testing.

# wrapper.py — hashes cryptid-generated vectors on the HSM via the python-pkcs11 library
import pkcs11, json, sys

lib = pkcs11.lib('/usr/lib/libCryptoki2_64.so')
token = lib.get_token(token_label='MyHSM')
with token.open(user_pin='1234') as session:
    for line in sys.stdin:                        # one JSON object per line (JSONL)
        record = json.loads(line)
        data = record['plaintext'].encode()       # assumes a UTF-8 plaintext field; adapt if your vectors are hex-encoded
        digest = session.digest(data, mechanism=pkcs11.Mechanism.SHA256)
        print(json.dumps({'plaintext': record['plaintext'], 'hash': digest.hex()}))
python cryptid.py generate -a sha256 -n 10000 -o inputs.jsonl
python wrapper.py < inputs.jsonl > hsm_output.jsonl
# Note: -i (file) mode runs the statistical suite only; differential/extended tests require --command or -a
python cryptid.py test -i hsm_output.jsonl --level full
📟
Firmware / Embedded Device
Serial / USB / JTAG

Script the device over serial or USB, send hex-encoded inputs, capture hex outputs. The compare command lets you diff device output against a known-good reference.

# serial_wrapper.py — sends hex-encoded inputs to the device over serial, captures hex digests
import serial, json, sys

dev = serial.Serial('/dev/ttyUSB0', 115200, timeout=5)   # timeout so a hung device can't stall the run
for line in sys.stdin:                                    # one hex-encoded input per line
    dev.write((line.strip() + '\n').encode())
    result = dev.readline().decode().strip()              # device replies with a hex digest
    print(json.dumps({'plaintext': line.strip(), 'hash': result}))
# Compare device output against OpenSSL reference
python cryptid.py compare \
  --target device_output.jsonl \
  --reference openssl_output.jsonl
⚙️
Third-Party Library or CLI Tool
--command mode

If the implementation has a CLI that accepts input on stdin and prints a hex digest, use --command mode directly. cryptid handles both batch-capable tools and single-invocation tools automatically.

# Test any CLI tool that reads stdin, prints hex
python cryptid.py test \
  --command "openssl dgst -sha256 -hex" \
  -n 2000 --level quick
# Note: --level standard/full with --command spawns many subprocesses and is slow.
# For deep analysis, use the JSONL workflow instead (generate → wrapper → test -i).

# Or a custom binary
python cryptid.py test \
  --command "./vendor_hash_tool --algo sha256" \
  -n 5000 --level full
💡
The Compare Workflow
Diff any two implementations

The compare command runs both a statistical analysis and a byte-level diff of two JSONL datasets. Use it to verify that a hardware implementation matches a software reference on the same inputs — any deviation is flagged immediately.

# Generate reference vectors from OpenSSL
python cryptid.py generate -a sha256 -n 10000 \
  -o reference.jsonl

# Run same inputs through your device, then compare
python cryptid.py compare \
  --target device.jsonl \
  --reference reference.jsonl

Any output mismatch on shared inputs is reported as a percentage. A faithful implementation should show 100% match and PASS on both statistical suites.
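
As a back-of-the-envelope version of that check (not the tool's own implementation), the match percentage is just a join on shared inputs:

# match_rate.py — rough illustration of the byte-level diff behind compare
import json

def match_rate(target_path, reference_path):
    with open(reference_path) as f:
        ref = {r['plaintext']: r['hash'] for r in map(json.loads, f)}
    shared = matched = 0
    with open(target_path) as f:
        for rec in map(json.loads, f):
            if rec['plaintext'] in ref:
                shared += 1
                matched += rec['hash'] == ref[rec['plaintext']]
    return 100.0 * matched / shared if shared else 0.0

print(f"match: {match_rate('device.jsonl', 'reference.jsonl'):.1f}%")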

What happens when you build outside your comfort zone?

Here's something we want to be upfront about: the lead developer on this project is not a cryptographer. He's not a mathematician. The depth of the methodology in this toolkit — the differential profile analysis, the linear approximation testing, the double-blind calibration protocol — goes well beyond his prior expertise.

That's the point.

We built cryptid as a deliberate test of our AI Native development methodology — a set of techniques for leveraging modern AI to produce work that exceeds what you could accomplish alone. The goal isn't to replace domain expertise. It's to understand how far you can push AI-assisted development into unfamiliar territory, and where the limits are.

We think the results speak for themselves — but we also want to hear from people who actually know this field. If you're a cryptographer, a mathematician, or a security researcher and you see something wrong (or something interesting), we genuinely want to know.

"The toolkit's layered approach provides defense in depth: weaknesses invisible to aggregate statistical analysis are caught by differential and sequence correlation tests. No single method suffices."
— cryptid whitepaper, R57 Labs, April 2026
🎓
AI Native Training
Learn the techniques we used to build this — and apply them to your own projects outside your comfort zone.
Explore the curriculum →