A token is not a commodity

6 min read · March 19, 2026

At GTC 26, one idea was everywhere: data centers are turning into token factories.

That may be true.

But it is also slightly absurd, because the token is becoming the industry's preferred unit of output before it has become a standardized economic unit.

A token is easy to count. It is much harder to compare. The same number of tokens can reflect very different underlying economics depending on the model, the hardware, the quantization, the serving configuration, the context length, and the infrastructure stack underneath.

The market has converged on a common language for output before it has agreed on a common economic meaning for that output.

Why This Matters

If compute is not comparable, it is hard to buy well, hard to price well, and hard to finance well.

For buyers and developers, a lower listed token price does not necessarily mean a lower task cost. One model may need fewer tokens to complete the same job. Another may look cheaper per token while relying on a more expensive deployment path. Caching, routing, quantization, and context handling can all change the economics without changing the abstraction at the surface.

For operators, the problem is different. If the unit being sold is not economically consistent, pricing becomes harder to explain and revenue harder to benchmark.

For investors and lenders, the issue is broader still. If compute cannot be compared cleanly, it becomes harder to forecast, underwrite, and finance as a coherent asset class.

So the claim here is narrow: a billed token is useful as an interface abstraction, but it is not yet a standardized economic unit for comparison, forecasting, underwriting, or settlement.

Why the Token Is Misleading

What the market calls token pricing is really a stack of different economic objects compressed into one label.

  • Anthropic prices Claude differently for input, output, cache operations, and long-context usage.
  • Parasail combines token-based serverless pricing with batch discounts, cache discounts, quantization modifiers, and dedicated deployments priced by GPU-hour.
  • Lambda and CoreWeave still expose much of the infrastructure layer directly through GPU-hour pricing.

Even before you compare providers, the market is already using multiple billing primitives for what people casually describe as the same thing: compute.

That variation shows up in three places.

First, at the model layer. A token produced by a small open model is not the same economic unit as a token produced by a larger reasoning model. Under the same benchmark shape, Lambda reports that OLMo Hybrid 7B on one H100 produces 1,066 output tokens per second, while Qwen3-Coder-Next requires four H100s to produce 1,810 output tokens per second.

Using published instance pricing, that implies a hardware-only output cost floor of roughly $0.98 per 1M output tokens for OLMo 7B on one H100, against roughly $2.32 per 1M output tokens for Qwen3-Coder-Next on four H100s.
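The cost-floor arithmetic above can be sketched directly. The function below is a minimal illustration, not any provider's billing logic; the GPU-hour price is an assumption backed out from the article's own figures (roughly $3.78 per H100-hour reproduces both quoted floors to within a cent), not a quoted list price.

```python
def cost_floor_per_million(gpus: int, gpu_hour_price: float, tokens_per_sec: float) -> float:
    """Hardware-only cost floor in $ per 1M output tokens.

    Time to emit 1M tokens, converted to GPU-hours, times the hourly rate.
    Ignores power, networking, labor, margin -- this is a floor, not a price.
    """
    hours_per_million = 1_000_000 / (tokens_per_sec * 3600)
    return gpus * gpu_hour_price * hours_per_million

# Assumed ~$3.78/H100-hour, inferred from the article's figures:
olmo_floor = cost_floor_per_million(1, 3.78, 1066)   # OLMo Hybrid 7B, one H100
qwen_floor = cost_floor_per_million(4, 3.78, 1810)   # Qwen3-Coder-Next, four H100s
print(f"OLMo 7B:          ${olmo_floor:.2f} per 1M output tokens")
print(f"Qwen3-Coder-Next: ${qwen_floor:.2f} per 1M output tokens")
```

Note that `gpu_hour_price` is an input, not a constant: hold the throughput numbers fixed and swap in a different infrastructure price, and the floor moves with it.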

Second, at the cloud layer. Hold model throughput constant and change only the infrastructure price, and token economics move materially. The same throughput assumption can imply very different token cost floors depending on who is selling the underlying GPU-hour.

Third, inside the vendors themselves. Anthropic separates input from output and applies different rates to long-context usage and cache operations. Parasail varies pricing by execution mode: serverless, batch, cache, and quantization all change the effective token cost. Dedicated deployments move back to GPU-hour pricing altogether.

So even within a single provider, the token is already condition-dependent.
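That condition-dependence can be made concrete with a toy model: one listed price, several execution modes, several effective prices. The base price and all multipliers below are invented for illustration; they are not Parasail's or Anthropic's actual rates.

```python
# Hypothetical example: one "token price" fans out into several effective
# prices depending on execution mode. All numbers are invented.
BASE_PRICE_PER_M = 0.50  # assumed list price, $ per 1M tokens, serverless

MODE_MULTIPLIERS = {
    "serverless":   1.00,  # on-demand list price
    "batch":        0.50,  # discounted, latency-tolerant jobs
    "cached_input": 0.10,  # reads served from a prompt cache
    "quantized":    0.70,  # lower-precision deployment of the same model
}

def effective_price(mode: str) -> float:
    """Effective $ per 1M tokens for a given execution mode."""
    return BASE_PRICE_PER_M * MODE_MULTIPLIERS[mode]

for mode in MODE_MULTIPLIERS:
    print(f"{mode:>12}: ${effective_price(mode):.3f} per 1M tokens")
```

Four modes, four different economic objects, one label: "a token."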

Where Standardization Actually Breaks

The root of the problem sits lower in the stack, at the operator layer.

Operators are not selling an abstract token. They are selling an infrastructure bundle: GPU capacity, power, cooling, networking, utilization, redundancy, labor, financing, and service guarantees.

Those inputs vary widely across deployments. One operator may be running a retrofitted air-cooled hall. Another may be building a liquid-ready AI facility. A third may be aggregating capacity across multiple sites with different power prices, cooling architectures, network topologies, and uptime commitments.

Even when two operators say they are selling the same GPU, they are often selling materially different economic objects.

The market often speaks as if the primitive unit is the token or the GPU-hour. For operators, the primitive unit is really an infrastructure bundle. Change any one of those inputs and the true cost of delivering inference changes with it.

That is why token pricing sits downstream of operator standardization.

A token is not a primary commodity. It is a derived billing unit that inherits variation from the infrastructure and execution choices beneath it.

In that sense, the token is not solving the standardization problem. It is concealing it.

What Internet Backyard and Parasail Are Doing

This is the gap Internet Backyard is closing.

Its premise is that the market needs financial standardization from bare metal up to tokenization. Before compute can be benchmarked, underwritten, or eventually traded with any seriousness, the industry needs a way to translate physical infrastructure into financial units the market can trust.

That is where the collaboration with Parasail fits in.

The partnership has two purposes.

The first is practical. Parasail and Internet Backyard are giving qualified, compute-intensive companies one billion free tokens so they can test real workloads at meaningful scale. At current pricing, roughly $500 corresponds to around one billion tokens on Parasail's mid-tier models, with deployments available in 5 minutes under standard conditions.

The second is more structural. By working with Parasail, Internet Backyard can observe compute consumption end to end, from bare metal and execution conditions through to user-facing token consumption.

That visibility makes it possible to connect what customers buy to what the underlying compute actually costs to deliver.

And that creates the basis for a different kind of pricing model.

Instead of treating token pricing as a self-contained abstraction, Parasail and Internet Backyard can begin to relate it to the underlying drivers: hardware, utilization, execution path, and customer consumption patterns.

The result should be pricing that is more legible for customers, more defensible for operators, and more grounded in the economics of the infrastructure itself.

The Proposal

The proposal is not to abandon token pricing. Tokens are useful as a billing and interface abstraction.

Nor is it to force every provider to publish identical numbers.

The proposal is narrower and more practical: build a financial translation layer beneath the token, so that infrastructure cost, model efficiency, and execution conditions can be understood in relation to the price a user ultimately sees.
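One minimal sketch of what such a translation layer might record, under stated assumptions: every field name and figure below is hypothetical, not Internet Backyard's or Parasail's actual schema, and real deployments would need many more inputs (power, cooling, financing, SLAs).

```python
from dataclasses import dataclass

@dataclass
class InferenceCostBasis:
    """Hypothetical record linking a billed token to its delivery cost."""
    gpus: int                     # accelerators behind the deployment
    gpu_hour_cost: float          # assumed all-in $ per GPU-hour
    utilization: float            # fraction of capacity serving paid traffic
    output_tokens_per_sec: float  # sustained throughput at full load

    def cost_per_million_tokens(self) -> float:
        """Delivered cost per 1M output tokens, scaled up by idle capacity."""
        hours_per_million = 1_000_000 / (self.output_tokens_per_sec * 3600)
        return self.gpus * self.gpu_hour_cost * hours_per_million / self.utilization

# Illustrative only: a 4-GPU deployment at 60% utilization.
basis = InferenceCostBasis(gpus=4, gpu_hour_cost=3.00, utilization=0.6,
                           output_tokens_per_sec=1800)
print(f"Delivered cost: ${basis.cost_per_million_tokens():.2f} per 1M tokens")
```

With a record like this behind each billed token, a listed price can be read against its cost basis rather than floating free of it, which is the translation the proposal asks for.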

Only then does a billed token begin to function as more than a convenient label.

Until that happens, the token will remain a useful interface abstraction, but a poor guide to underlying economic reality.

Running Gnome
From Vancouver to San Francisco. Backed by Jay Adelson, Ian Crosby, Geordie Rose, and top-tier VCs.