Reading ≠ Learning
Four architectures for AI that knows itself.
Watch the difference.
Same prompt. Two models. One forgets. One remembers.
> Is processPayment() thread-safe?
> Set up auth for this Express app (session 2)
1. Typed Attention
Current attention knows how much things are related. Not how.
Typed: labeled relationships
Current AI
0.73
"These are related."
How? No idea.
Typed Attention
REQUIRES
"B breaks if A changes."
Actionable. Propagates.
2. Activation Thresholds
Your brain doesn't fire every neuron on every input. Why does attention fire every connection?
Current: everything fires. O(n²)
Thresholds: only what matters. O(n×k)
Current AI
O(n²)
Every token looks at every token. Always.
Brute force. Opaque.
With Thresholds
O(n×k)
Only relevant connections fire.
And you can see which ones.
3. Self-Modifying Inference
Weights are frozen at inference. The model is identical before and after your conversation.
Current: every session starts blank
Trace: knowledge accumulates
Current AI
Session 1: You correct a mistake.
Session 2: Same mistake.
Session 3: Same mistake.
Corrections evaporate with context.
With Trace Layer
Session 1: You correct a mistake. Trace written.
Session 2: Model reads its own trace. Already knows.
Session 3: Builds on accumulated history.
Processing transforms the processor.
4. Self-Knowledge Loss
Current AI optimizes for one thing: predict the right token. It has zero concept of why.
Current: uniform confidence. No attribution.
Self-knowledge: confidence varies. Citations link.
Current loss function
loss = predict_right_token()
// that's it.
// no concept of "why"
Self-knowledge loss
loss = predict_right_token()
     + α × can_you_cite_why()
// unexplainable = expensive
What does confidence look like when a model actually knows what it knows?
Current AI
Same flat bar. No differentiation. It doesn't know what it doesn't know.
With Self-Knowledge
It knows callback safety is weak. That's the point.
Current AI
"Yes, it's thread-safe."
Maybe right. Maybe hallucination. No way to tell from the outside.
With Self-Knowledge
"Yes — lock on line 47 guards shared state. Callback on line 62 runs on main thread. Confidence: 0.91"
Every answer carries its own proof.
The Transparent Box
Attends with typed relationships
Fires only when relevant
Transforms through its own processing
Knows why it says what it says
Not a black box you hope is right.
A transparent box that can't not show its work.
Now — you.
You scrolled through four ideas.
Now pick the one that won't leave you alone.
Four starting points. Pick one.
Each of these is a prototype you can build on consumer hardware this weekend.
Path 1
Type your attention
Current attention computes a number between every pair of tokens. That number says how much two things are related. It says nothing about how.
Take any attention implementation. Add edge type labels — even just two: REQUIRES and USES. Now the model doesn't just know that auth.py and stripe.py are related. It knows one requires the other.
Run the same prompts with and without typed edges. Does the model behave differently when it knows relationship types? If yes — that's signal. If no — you've still built something nobody else has tested at this granularity.
Needs: any transformer, ~100 lines of code
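A minimal NumPy sketch of the idea, assuming just two edge types with hand-set biases. In a real prototype the biases would be learned embeddings; the values, names, and the two-token demo below are illustrative assumptions, not a reference implementation.

```python
import numpy as np

# Hypothetical edge types. In practice these biases would be learned;
# the values here are assumptions for illustration.
REQUIRES, USES, NONE = 0, 1, 2
EDGE_BIAS = np.array([2.0, 1.0, 0.0])

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def typed_attention(Q, K, V, edge_types):
    """Scaled dot-product attention plus a per-edge-type bias.

    edge_types[i, j] labels the relationship from token i to token j;
    its bias shifts the attention logit before softmax.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # standard attention logits
    scores = scores + EDGE_BIAS[edge_types]   # typed relationships shift them
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

# Two tokens: token 0 REQUIRES token 1.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(2, 4))
types = np.full((2, 2), NONE)
types[0, 1] = REQUIRES
out, weights = typed_attention(Q, K, V, types)
```

Running the same inputs with every edge set to NONE gives the untyped baseline; the difference in the weight matrix is exactly the signal the experiment is looking for.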
Path 2
Add a threshold
In standard attention, every token attends to every other token. Most of those connections carry almost no signal — but they still cost compute. And you can't see which ones mattered.
Add a minimum activation threshold to any attention layer. Connections below the threshold don't fire. Log which connections survive the cut.
Two things happen. First: efficiency — O(n²) drops toward O(n×k) where k is the number of connections that actually matter. Second, and more important: interpretability. You can now see exactly which connections the model used. The ones that fire are the ones that mattered. The rest were noise.
Needs: one attention layer, a threshold value, a logger
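A sketch of the threshold in NumPy. The cutoff tau is an assumed hyperparameter, and the per-row maximum is always kept so no token loses all of its connections; both choices are illustrative, not prescriptive.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def thresholded_attention(Q, K, V, tau=0.2):
    """Zero attention weights below tau, renormalize, and log survivors."""
    w = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)
    # Keep the strongest connection per row so no token is left with nothing.
    keep = (w >= tau) | (w == w.max(axis=-1, keepdims=True))
    w = np.where(keep, w, 0.0)             # connections below tau don't fire
    w = w / w.sum(axis=-1, keepdims=True)  # renormalize the survivors
    fired = np.argwhere(w > 0)             # the log: which (i, j) pairs mattered
    return w @ V, w, fired

rng = np.random.default_rng(1)
Q = K = V = rng.normal(size=(6, 4))
out, w, fired = thresholded_attention(Q, K, V)
```

The `fired` array is the interpretability payoff: a list of exactly which token pairs the layer actually used on this input.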
Path 3
Write a trace
Every AI session starts from zero. You correct a mistake in session 1. Session 2 makes the same mistake. The model read your correction. It didn't learn from it.
After each session, write what the model learned to a file — corrections, decisions, patterns discovered. Before the next session, prepend that trace to context.
Now measure: does it make the same mistake twice? Does it reference yesterday's corrections in today's answers? The trace is the simplest form of self-modifying inference — the model reading its own history and changing behavior because of it.
Needs: any LLM API, a JSON file, a diff checker
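A minimal trace layer in plain Python. The file path and the one-field entry schema are made-up conveniences for illustration; any persistent store and richer schema would do.

```python
import json
import os

TRACE = "trace.json"  # hypothetical location for the trace file

def read_trace(path=TRACE):
    """Load accumulated lessons; empty list on the very first session."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)

def append_trace(entry, path=TRACE):
    """After a session, persist what the model learned."""
    trace = read_trace(path)
    trace.append(entry)
    with open(path, "w") as f:
        json.dump(trace, f, indent=2)

def build_context(prompt, path=TRACE):
    """Before the next session, prepend the trace to the prompt."""
    lessons = "\n".join(f"- {e['lesson']}" for e in read_trace(path))
    if not lessons:
        return prompt
    return f"Corrections from earlier sessions:\n{lessons}\n\n{prompt}"
```

Session 1 calls `append_trace` after a correction; session 2 sends `build_context(prompt)` instead of the bare prompt. The diff checker then compares answers with and without the trace prepended.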
Path 4
Train for self-knowledge
Current models optimize for one thing: predict the right next token. They have no loss for knowing what they don't know. A model that's 30% confident about something says it with the same fluency as something it's 95% confident about.
Add a second prediction head that outputs a confidence score. Train on examples where you know ground truth — when the model is right, confidence should be high. When it's wrong, confidence should be low.
The loss becomes: predict_right_token() + α × can_you_cite_why(). Now the model is penalized for generating things it can't explain. Self-knowledge isn't a feature. It's a loss function.
Needs: a fine-tuning setup, labeled confidence data
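A NumPy sketch of the combined loss. Here the can_you_cite_why() term is stood in for by a calibration penalty on the confidence head: high confidence when wrong, or low confidence when right, is expensive. The function name, signature, and alpha value are assumptions for illustration.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max()  # numerical stability
    return x - np.log(np.exp(x).sum())

def self_knowledge_loss(token_logits, target, confidence, was_correct, alpha=0.5):
    """loss = predict_right_token() + alpha * calibration penalty.

    token_logits: (vocab,) logits from the normal LM head
    confidence:   scalar in (0, 1) from the second prediction head
    was_correct:  1.0 if ground truth confirmed the claim, else 0.0
    """
    ce = -log_softmax(token_logits)[target]  # predict the right token
    eps = 1e-9                               # guard against log(0)
    # Binary cross-entropy pushes confidence toward the ground truth.
    bce = -(was_correct * np.log(confidence + eps)
            + (1.0 - was_correct) * np.log(1.0 - confidence + eps))
    return ce + alpha * bce
```

Under this loss, being confidently wrong costs more than admitting uncertainty, which is exactly the gradient the second head needs in order to learn what the model doesn't know.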
What you actually need.
Not what you think you need. What you actually need.
A model
Qwen 2.5 7B, Llama 3.1 8B, Mistral 7B.
All free. All run on consumer GPUs.
8GB VRAM with QLoRA.
A method
LoRA/QLoRA for fine-tuning.
Hugging Face transformers + PEFT.
Standard Python. No custom CUDA.
A question
Not "how do I build this?"
But "what happens if I try?"
The question IS the method.
Everything on this page was built with Qwen 2.5 7B and a Claude Max subscription.
If you have a GPU and curiosity, you have enough.
What happens when someone actually does this?
If one person builds one typed attention prototype, we learn whether relationship labels carry signal through inference.
If ten people each build a different piece, the architecture starts assembling itself — not by coordination, but by convergence. Same question, different implementations, shared findings.
If any of it works — even partially, even messily — it means the current paradigm is leaving value on the table. Not because the models are bad. Because they don't know what they know.
And if it doesn't work?
Then you'll know why. You'll have the experiment, the data, the specific point of failure. That's more valuable than an opinion. That's engineering.
Three ways to engage.
Build it
Pick a path. Build a prototype. Share what happens — the failures are as valuable as the successes.
Break it
Tell us what's wrong. Which ideas are already solved? Which are provably impossible? Point to the papers we missed.
Extend it
See something we didn't? A fifth idea that connects to these four? A domain where this applies differently? That's how architectures grow.
"Which parts matter right now?"
That question started this project. The same question is now yours.
The architecture is open. The ideas are free. The models are free.
The only thing missing is what you do next.