Inference Pipeline

How a text prompt flows through the network, from client request to final response.

Pipeline sequence: Client request -> input MPC pair (shard 0) -> text shard 1 -> intermediate shards (2..N-2) -> text shard N-1 -> output MPC A/B capability path -> client receives sampled-token artifact in full_output_2pc mode.

Key insight: No single node in this pipeline sees both the original prompt and the final output. The MPC pair protects the embedding; subsequent shards only see intermediate hidden states.

Admission boundary: registry discovery feeds this pipeline only after identity + stake checks pass (including signed registration and ongoing eligibility revalidation in on-chain mode).

Model Sharding

Transformer models are split layer-by-layer across independent nodes.

Example split (30 layers): shard 0 serves embedding + layers 0-14 (as an MPC pair), shard 1 serves layers 15-29 plus LM head/sampling. Each shard can be run by a different operator.

The unfed-tools split command takes a HuggingFace model and produces a manifest (JSON) describing the shard boundaries, plus individual weight files in .safetensors or .pt format. Each shard is self-contained — a node only needs its own weight file and the manifest.
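The layer-range bookkeeping behind a split can be sketched in a few lines. This is a hypothetical helper, not the actual `unfed-tools split` implementation — it only illustrates how contiguous, non-overlapping half-open layer ranges might be assigned, with the embedding pinned to the first shard and the LM head to the last:

```python
def split_layers(num_layers: int, num_shards: int) -> list[dict]:
    """Divide transformer layers into contiguous, non-overlapping shard ranges."""
    base, extra = divmod(num_layers, num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + base + (1 if i < extra else 0)  # half-open [start, end)
        shards.append({
            "shard_index": i,
            "layer_start": start,
            "layer_end": end,
            "has_embedding": i == 0,            # shard 0 also owns the embedding
            "has_lm_head": i == num_shards - 1  # last shard owns LM head/sampling
        })
        start = end
    return shards

shards = split_layers(30, 2)
# shards[0] covers layers [0, 15); shards[1] covers [15, 30) plus the LM head
```

Note the half-open convention: `layer_end` is exclusive, which is why the manifest example later in this page records `layer_end: 15` for a shard that serves layers 0-14.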

Multi-Party Computation

How shard 0 protects the most sensitive step — turning words into numbers.

The Problem

The embedding layer converts raw token IDs (your actual words) into dense vectors. Whoever runs this layer can trivially reconstruct your prompt. This is the most privacy-critical part of inference.

The Solution

Shard 0 is served by an MPC pair — two independent nodes (A and B). Token IDs are secret-shared before the embedding lookup. Each node computes on its share; neither can reconstruct the original tokens.

MPC message flow: client sends encrypted token IDs to MPC A -> A splits them into shares and forwards share B to MPC B -> B returns its partial result to A -> A combines the partial results and forwards hidden states (not raw tokens) to shard 1.

Result: neither MPC node can reconstruct the original token IDs alone.
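The key observation making this work is that an embedding lookup is linear: it is a one-hot vector multiplied by the embedding table, so additive secret shares of the one-hot vector can be processed independently and summed. The toy below illustrates that property over a prime field; the real protocol (transport encryption, fixed-point encoding of weights, and so on) is considerably more involved:

```python
import random

P = 2**31 - 1  # prime modulus for the toy field

def share_one_hot(token_id: int, vocab_size: int):
    """Split the one-hot vector for token_id into two random-looking shares."""
    a = [random.randrange(P) for _ in range(vocab_size)]
    b = [(int(i == token_id) - a[i]) % P for i in range(vocab_size)]
    return a, b  # each share alone is uniformly random: it reveals nothing

def partial_embed(share, table):
    """Each MPC node computes share^T * E locally (linear, so shares compose)."""
    dim = len(table[0])
    return [sum(share[r] * table[r][d] for r in range(len(table))) % P
            for d in range(dim)]

vocab, dim = 8, 4
E = [[random.randrange(100) for _ in range(dim)] for _ in range(vocab)]
a, b = share_one_hot(token_id=5, vocab_size=vocab)
combined = [(x + y) % P for x, y in zip(partial_embed(a, E), partial_embed(b, E))]
assert combined == E[5]  # the sum of partial results is the embedding row
```

Neither share on its own correlates with the token ID; only the sum of the two partial results reconstructs the embedding row, and that reconstruction happens after the sensitive lookup.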

Onion Routing

Each request is encrypted in layers, one per node in the circuit.

1. Circuit Construction

The client discovers available nodes via the registry and builds a circuit. It fetches each node's public key (X25519) and constructs an onion-encrypted blob with one encryption layer per hop.

2. Layer Peeling

Each node decrypts its layer, reads the next-hop address, and forwards the remaining encrypted blob. Nodes only know their predecessor and successor — not the full circuit.
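The build-then-peel structure can be sketched as follows. This is a deliberately simplified toy — it uses an XOR keystream in place of the real X25519 key exchange and authenticated encryption — but it shows the essential shape: each layer, once decrypted, reveals only the next-hop address and an opaque inner blob:

```python
import base64, hashlib, json

def keystream(key: bytes, n: int) -> bytes:
    """Expand a key into n pseudorandom bytes (toy stand-in for a real cipher)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

def build_onion(circuit, payload: bytes) -> bytes:
    """Wrap payload once per hop, innermost layer first."""
    blob = payload
    for i in range(len(circuit) - 1, -1, -1):
        next_addr = circuit[i + 1][0] if i + 1 < len(circuit) else "DELIVER"
        layer = json.dumps({"next": next_addr,
                            "blob": base64.b64encode(blob).decode()}).encode()
        blob = xor(layer, circuit[i][1])
    return blob

def peel(blob: bytes, key: bytes):
    """One hop's view: decrypt its layer, learn only the next hop + inner blob."""
    layer = json.loads(xor(blob, key))
    return layer["next"], base64.b64decode(layer["blob"])

circuit = [("nodeA", b"key-a"), ("nodeB", b"key-b"), ("nodeC", b"key-c")]
blob = build_onion(circuit, b"hello")
for _, key in circuit:
    addr, blob = peel(blob, key)
# after the final peel: addr == "DELIVER", blob == b"hello"
```

Because each hop's key unlocks exactly one layer, `nodeB` learns that `nodeA` handed it the blob and that `nodeC` comes next, but nothing about the circuit's endpoints or the payload.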

Multimodal Vision Pipeline

How image understanding is added without breaking the privacy model.

Multimodal path: image -> vision shard (ViT encoder + connector) -> image embeddings; text prompt -> shard 0 MPC path; both streams merge at shard 0, then continue through the same text shards and output-privacy modes (off, decode_client_sample, full_output_2pc).

For vision-language models (SmolVLM, Qwen2-VL), a dedicated vision shard processes the image through a ViT encoder and connector. The resulting image embeddings are injected at shard 0 alongside the text tokens. From there, the pipeline is identical to text-only inference.
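The injection step at shard 0 amounts to splicing image embeddings into the text embedding sequence wherever image-placeholder tokens appear. The sketch below illustrates that merge; the placeholder ID and helper name are hypothetical, not taken from the actual runtime:

```python
IMAGE_TOKEN = -1  # hypothetical placeholder id marking image-patch positions

def merge_streams(text_ids, text_embeddings, image_embeddings):
    """Replace each placeholder position with the next vision-shard embedding."""
    img = iter(image_embeddings)
    return [next(img) if tid == IMAGE_TOKEN else text_embeddings[tid]
            for tid in text_ids]

text_embeddings = {7: [0.1, 0.2], 9: [0.3, 0.4]}   # token id -> embedding vector
image_embeddings = [[1.0, 1.0], [2.0, 2.0]]        # produced by the vision shard
seq = merge_streams([7, IMAGE_TOKEN, IMAGE_TOKEN, 9],
                    text_embeddings, image_embeddings)
# seq interleaves text and image embeddings in prompt order
```

From the perspective of shards 1 onward, `seq` is just a sequence of hidden-state vectors — which is why the rest of the pipeline needs no multimodal awareness.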

Privacy scope note: output-token privacy protections apply to both text-only and multimodal generation once the request reaches the final text shard. Vision-side MPC coverage continues to evolve in parallel.

Manifest System

A declarative JSON format that describes how a model is sharded.

{
  "model_id": "HuggingFaceTB/SmolVLM-256M-Instruct",
  "format_version": 2,
  "architecture": {
    "text": {
      "hidden_size": 1536,
      "num_layers": 30,
      "num_heads": 12,
      "vocab_size": 49152,
      ...
    }
  },
  "text_shards": [
    { "shard_index": 0, "layer_start": 0, "layer_end": 15,
      "has_embedding": true, "file": "text_shard_0.pt" },
    { "shard_index": 1, "layer_start": 15, "layer_end": 30,
      "has_lm_head": true, "file": "text_shard_1.pt" }
  ],
  "vision_shards": [
    { "shard_index": 0, "layer_start": 0, "layer_end": 12,
      "has_embeddings": true, "has_connector": true,
      "file": "vision_shard_0.pt" }
  ]
}

The v2 manifest format includes the full architecture config, enabling the generic runtime — nodes can serve any transformer model without model-specific code. The unfed-tools CLI handles inspection, splitting, verification, and publishing of manifests.
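The invariants a verification pass would enforce on such a manifest are easy to state. The check below is a hypothetical sketch of what `unfed-tools` verification might cover — contiguous, non-overlapping shard ranges that exactly span `num_layers`, with the embedding on the first shard and the LM head on the last — not its actual implementation:

```python
def check_text_shards(manifest: dict) -> None:
    """Validate text shard coverage against the declared architecture."""
    num_layers = manifest["architecture"]["text"]["num_layers"]
    shards = sorted(manifest["text_shards"], key=lambda s: s["shard_index"])
    expected_start = 0
    for s in shards:
        assert s["layer_start"] == expected_start, "gap or overlap between shards"
        assert s["layer_end"] > s["layer_start"], "empty shard"
        expected_start = s["layer_end"]  # layer_end is exclusive
    assert expected_start == num_layers, "shards do not cover all layers"
    assert shards[0].get("has_embedding"), "first shard must hold the embedding"
    assert shards[-1].get("has_lm_head"), "last shard must hold the LM head"

manifest = {
    "architecture": {"text": {"num_layers": 30}},
    "text_shards": [
        {"shard_index": 0, "layer_start": 0, "layer_end": 15,
         "has_embedding": True},
        {"shard_index": 1, "layer_start": 15, "layer_end": 30,
         "has_lm_head": True},
    ],
}
check_text_shards(manifest)  # passes silently for the example split
```

Running this against the SmolVLM manifest above would pass; dropping a shard or shifting a boundary by one layer trips the corresponding assertion.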