
Kernel-Style Machine Learning

Rapid prototyping and automation for open source ML R&D using Linux kernel development methodologies

knlp Papers

FIM-Guided Research

All projects are built on the same signal: the diagonal Fisher Information, E[g²] ≈ Adam's exp_avg_sq. Squisher (2025) proves the equivalence. Higher FIM trace means parameters are doing more work.
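The equivalence is worth seeing concretely. Adam's exp_avg_sq is an exponential moving average of squared gradients, which converges to E[g²], the diagonal of the empirical Fisher. A minimal numpy sketch (toy gradients, not from any of the models above):

```python
import numpy as np

def fim_diag_ema(grads, beta2=0.999):
    """EMA of squared gradients -- exactly what Adam stores in exp_avg_sq.
    With bias correction, this is an unbiased estimate of E[g^2], the
    diagonal of the empirical Fisher Information Matrix."""
    v = np.zeros_like(grads[0])
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g * g
    return v / (1.0 - beta2 ** len(grads))  # Adam-style bias correction

# two toy parameters with gradient std 1 and 3 -> E[g^2] of ~1 and ~9
rng = np.random.default_rng(0)
grads = [rng.normal(0.0, [1.0, 3.0], size=2) for _ in range(5000)]
v_hat = fim_diag_ema(grads)
```

The parameter with larger gradients accumulates a proportionally larger exp_avg_sq, which is the "doing more work" signal every project below reuses.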

Adam State-Based Pruning

bitter7 achieves 15.6% better perplexity than the magnitude-pruning baseline (37.28 vs 44.15 PPL) by leveraging Adam's exp_avg_sq (≈ FIM diagonal) for importance scoring.

importance = |w| × (exp_avg_sq + ε)^0.25
B200x4
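A minimal sketch of the scoring rule above, applied to a pruning mask. The formula is from the blurb; the thresholding helper and the 50% sparsity example are illustrative, not bitter7's actual pipeline:

```python
import numpy as np

def bitter7_importance(weight, exp_avg_sq, eps=1e-8):
    """Per-parameter importance = |w| * (exp_avg_sq + eps)^0.25.
    The fourth root tempers the FIM signal so that weight magnitude
    and curvature both contribute to the score."""
    return np.abs(weight) * (exp_avg_sq + eps) ** 0.25

def prune_mask(weight, exp_avg_sq, sparsity=0.5):
    """Keep the top (1 - sparsity) fraction of parameters by importance
    (illustrative global-threshold pruning, not the paper's recipe)."""
    score = bitter7_importance(weight, exp_avg_sq)
    k = int(score.size * sparsity)
    threshold = np.partition(score.ravel(), k)[k]
    return score >= threshold

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
v = rng.exponential(size=(64, 64))  # stand-in for Adam's exp_avg_sq
mask = prune_mask(w, v, sparsity=0.5)
```

Because exp_avg_sq already sits in the optimizer state, the score costs one elementwise pass and no extra forward/backward passes.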

FIM-Guided Quantization

Diagonal Fisher identifies critical tensors for precision allocation. Upgrading 4 layers from Q3_K to Q6_K achieves 1.26% better perplexity at only 1.8% size increase.

Mobile
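The selection step can be sketched as ranking tensors by their accumulated squared-gradient mass (a per-tensor FIM trace proxy) and upgrading the top few. Layer names, sizes, and the ranking helper here are illustrative, not from the actual model:

```python
import numpy as np

def rank_tensors_for_upgrade(named_exp_avg_sq, n_upgrade=4):
    """Rank tensors by total squared-gradient mass (sum of exp_avg_sq
    per tensor) and return the top-n names to move to a higher-precision
    quant, e.g. Q3_K -> Q6_K."""
    scores = {name: float(v.sum()) for name, v in named_exp_avg_sq.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_upgrade]

# toy optimizer state: later blocks carry more FIM mass by construction
state = {f"blk.{i}.ffn_down": np.full(16, i + 1.0) for i in range(8)}
upgrade = rank_tensors_for_upgrade(state, n_upgrade=4)
# -> ['blk.7.ffn_down', 'blk.6.ffn_down', 'blk.5.ffn_down', 'blk.4.ffn_down']
```

Keeping the upgrade set small is what holds the size increase to ~1.8% while the precision goes where the Fisher mass is.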

KVSplice

FIM-guided compression adds ~20% extra compression on top of MLA (7.2× vs 6×), achieving 11% better perplexity (63 vs 71 PPL) and +1 HellaSwag over the MLA baseline.

B200x4

Reciprocal Attention

Learned Q@K.T ↔ K@Q.T alternation achieves 82% better perplexity than baseline (50.5 vs 282 PPL) and outperforms Qwen's SDPA G1 gate by 77%. FIM trace guides layer selection.

B200x4
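One plausible reading of the alternation, sketched below under stated assumptions: mix the standard attention output with one where query/key roles are swapped, under a scalar gate. In the real method the gate is learned and FIM trace picks which layers get the reciprocal path; the scalar `gate` and the 50/50 mix here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reciprocal_attention(Q, K, V, gate):
    """Sketch of Q@K.T <-> K@Q.T alternation: blend the standard
    direction with the role-swapped direction. `gate` in [0, 1] is a
    stand-in for the learned per-layer gate."""
    d = Q.shape[-1]
    std = softmax(Q @ K.T / np.sqrt(d)) @ V  # rows attend as queries
    rec = softmax(K @ Q.T / np.sqrt(d)) @ V  # roles swapped: keys attend
    return gate * std + (1.0 - gate) * rec

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
out = reciprocal_attention(Q, K, V, gate=0.5)
```

At `gate=1.0` this reduces exactly to standard scaled dot-product attention, so the reciprocal path is a strict superset of the baseline.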

Memory Tiering

FIM-ranked parameter placement across GPU, CPU, and storage tiers. exp_avg_sq ranking determines which tensors stay in fast memory.
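A greedy placement sketch, assuming tensors carry a FIM rank (mean exp_avg_sq) and a size: fill the fast tier first, then spill down. Tier names, budgets, and the greedy policy are illustrative:

```python
def assign_tiers(tensors, gpu_budget, cpu_budget):
    """tensors: name -> (fim_rank, n_params). Sort by FIM rank and fill
    GPU, then CPU; everything else spills to storage. Budgets are in
    parameter counts (an illustrative policy, not the actual allocator)."""
    placement, used_gpu, used_cpu = {}, 0, 0
    for name in sorted(tensors, key=lambda n: tensors[n][0], reverse=True):
        _, n = tensors[name]
        if used_gpu + n <= gpu_budget:
            placement[name] = "gpu"
            used_gpu += n
        elif used_cpu + n <= cpu_budget:
            placement[name] = "cpu"
            used_cpu += n
        else:
            placement[name] = "storage"
    return placement

# toy tensors: (fim_rank, n_params); budgets fit one tensor per tier
placement = assign_tiers({"a": (9.0, 4), "b": (5.0, 4), "c": (1.0, 4)},
                         gpu_budget=4, cpu_budget=4)
# -> {'a': 'gpu', 'b': 'cpu', 'c': 'storage'}
```

The point is that the same exp_avg_sq ranking used for pruning and quantization doubles as a priority order for memory placement.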

FIM-Guided GNN Fraud Detection

Reciprocal Attention transfers from transformers to GNNs. FIM-guided RA achieves +7% F1 on DGraphFin by applying RA selectively to uncertain nodes (bottom 33% by FIM trace). Page-aware batching provides 4× better I/O locality.

GNN
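The node-selection step can be sketched as a simple bottom-fraction cut on per-node FIM trace, matching the "bottom 33%" rule above. The helper name and toy traces are illustrative:

```python
import numpy as np

def select_uncertain_nodes(node_fim_trace, frac=1 / 3):
    """Return indices of the bottom `frac` of nodes by per-node FIM
    trace -- the 'uncertain' nodes that get reciprocal attention.
    A sketch of the selection rule, not the paper's batching logic."""
    k = max(1, round(len(node_fim_trace) * frac))
    return np.argsort(node_fim_trace)[:k]

traces = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.3])
selected = select_uncertain_nodes(traces)  # bottom third: nodes 3 and 1
```

Applying RA only where the Fisher signal is weak keeps the extra compute off the majority of nodes that the base GNN already handles confidently.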

Unified FIM Framework

All applications leverage the same underlying signal: E[g²]

| Application | FIM Signal | Result |
|---|---|---|
| bitter7 pruning | exp_avg_sq^0.25 | 15.6% better PPL |
| Mobile quantization | Σg² per tensor | 1.26% better PPL |
| KVSplice layers | FIM trace | 11% better PPL |
| Reciprocal Attention | FIM trace | 82% better PPL |
| Memory Tiering | exp_avg_sq rank | Optimal placement |
| GNN Fraud Detection | FIM trace → nodes | +7% F1 |

Decode & Bandwidth

Autoregressive decode is dominated by memory traffic. These explainers establish the structural and empirical foundations behind the BPA line of work: RGSA → BPA → fused KV quantization.

Documentation