The Structure of Intelligence at Scale

TL;DR We can analyse whether an intelligence would prefer to be a singleton or an agency with a simple cost model; looking around us, it is plausible that we live in a world that is structurally hostile to singleton superintelligences, potentially defusing some concerns.

0. Setup

Let us suppose that language models are, when sufficiently large, universal channel approximators. This gives rise to an expressive equivalence in the limit: anything that an orchestrated agency of language models can do, a sufficiently large singleton can also in principle do. By a singleton I mean an agent or system whose decision-relevant state is required to be globally unified at inference time.

This observation raises the Coasean Question for cognition, mirroring Coase's analysis of the nature of the firm: if a sufficiently capable singleton can simulate any agency, why do agencies exist at all? We develop a cost model that identifies when organisational structure matters, derive conditions for indifference, and argue that our world may be one in which singleton superintelligence is self-undermining.

1. The Cost Model

Consider an entity managing K facts -- beliefs, memories, goals, environmental states. The entity can organise as:

  • A singleton: one processor handling all K facts
  • An agency: N processors, each handling K/N facts, plus coordination

Definition 1.1 (Organisational cost). The total cost of organisation is:

T(N; K) = f(K/N) + g(N) + Ψ(N, K)

where:

  • f(x): cost of maintaining coherence over x facts (internal processing)
  • g(N): cost of coordinating N processors (communication)
  • Ψ(N, K): dispositional potential -- everything else (resources, mortality, identity, alignment)

We call (f, g) the cognitive architecture and Ψ the dispositional potential. Architecture is determined by what kind of cognition is performed; disposition by the entity's situation in the world. Conceptually, Ψ is a catch-all term, not unlike the trick of "dark constants" used by physicists to box in unknowns: its purpose is to isolate all other considerations so that we can study the interaction of internal processing versus external coordination.
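As a minimal sketch, the cost model can be put in code. The functional forms below are illustrative assumptions, not commitments of this piece: a superlogarithmic coherence cost f(x) = x·log(x), a linear coordination cost, and an N-invariant Ψ.

```python
import math

# A minimal sketch of Definition 1.1. All functional forms are
# assumptions chosen for illustration, not derived from the essay.

def f(x):
    """Cost of maintaining coherence over x facts (assumed x*log x)."""
    return x * math.log(x) if x > 1 else 0.0

def g(n):
    """Cost of coordinating n processors (assumed linear; zero for a singleton)."""
    return 5.0 * (n - 1)

def psi(n, k):
    """Dispositional potential (held constant in this sketch)."""
    return 0.0

def T(n, k):
    """Total organisational cost T(N; K) = f(K/N) + g(N) + Psi(N, K)."""
    return f(k / n) + g(n) + psi(n, k)

K = 10_000
for N in (1, 2, 8, 32):
    print(f"N={N:>2}  T={T(N, K):>10.1f}")
```

With these assumed forms the total cost falls as N grows, since the superlogarithmic f dominates the linear g at this scale; different choices of (f, g, Ψ) would yield different preferences.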

Detour: on the structure of Ψ

It is beyond the scope of this piece to litigate the fine structure of Ψ, but we can still gesture towards some reasonable components. Ψ absorbs everything beyond cognitive architecture. Some factors push toward distribution:

  • Resource acquisition (−αN)
  • Redundancy (−ρ log N)
  • Adaptability (−ω log N)

Others push toward integration:

  • Defection risk (+δN log N)
  • Decision latency (+λ log N)
  • Identity preservation (+ι(N))
  • Alignment maintenance (+A(N))

The sign of Δ_Ψ is not determined a priori. We need not decompose Ψ for the main argument -- we observe its total effect through organisational structure.

Retour

Definition 1.2 (Architectural surplus).

S(N; K) = f(K/N) + g(N) - f(K)

This measures how much cheaper singleton organisation is in pure (f, g) terms: the agency's architectural cost minus the singleton's. Positive S means architecture favours singletons.

Definition 1.3 (Dispositional pressure).

Δ_Ψ(N; K) = Ψ(1, K) - Ψ(N, K)

This measures how much the entity's situation favours distribution. Positive Δ_Ψ means disposition pushes toward agencies.

The entity prefers distribution when Δ_Ψ > S.
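The surplus/pressure split can be sketched directly in code. The cost functions here are assumptions for illustration: superlogarithmic coherence, linear coordination, and a Ψ that mildly resists distribution via an identity-preservation-style log term.

```python
import math

# Definitions 1.2 and 1.3 as code, under assumed illustrative forms.

f = lambda x: x * math.log(x)          # coherence cost (superlogarithmic)
g = lambda n: 3.0 * (n - 1)            # coordination cost (linear)
psi = lambda n, k: 50.0 * math.log(n)  # disposition resists splitting

# Architectural surplus: agency cost minus singleton cost in (f, g) terms.
S = lambda n, k: f(k / n) + g(n) - f(k)
# Dispositional pressure toward distribution.
d_psi = lambda n, k: psi(1, k) - psi(n, k)

N, K = 4, 100
print("S         =", round(S(N, K), 1))     # large and negative here
print("Delta_Psi =", round(d_psi(N, K), 1)) # negative: disposition resists
print("prefers distribution:", d_psi(N, K) > S(N, K))
```

In this instance the superlogarithmic f makes S so negative that distribution wins even though the disposition term actively resists it -- the same comparison as T(N; K) < T(1; K).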

2. Organisational indifference

Articulating these tendencies to coalesce and to split immediately yields the notion of organisational indifference: an entity that does not care what shape it is in. It turns out that such conditions are measure zero in the space of cost functions, which is to say that an entity almost always prefers to be either a singleton or an agency.

Definition 2.1 (Organisational indifference). A triple (f, g, Ψ) is indifferent if T(N; K) = T(1; K) for all N, K.

Proposition 2.2 (Indifference coupling). (f, g, Ψ) is indifferent iff S(N; K) = Δ_Ψ(N; K) for all N, K.

Proof sketch. Indifference requires f(K/N) + g(N) + Ψ(N, K) = f(K) + Ψ(1, K). Rearranging gives S = Δ_Ψ. ∎

The above just establishes that organisational indifference requires a delicate balancing act of Ψ atop f and g. Now if we consider the simple case where Ψ is invariant with respect to N, we can see that N-dependence in Ψ is almost always necessary for such balance to be achieved, because f and g do not easily cooperate to create the conditions for organisational indifference by themselves.

Proposition 2.3 (Ψ-invariant case). If Ψ(N, K) = Ψ(K) depends only on K, then Δ_Ψ ≡ 0, and indifference requires S ≡ 0. For continuous monotonic f, g, this holds iff:

f(x) = a log(x) + b        g(x) = a log(x)

Proof sketch. S ≡ 0 requires f(K) = f(K/N) + g(N) for all K, N. Setting K = N gives g(N) = f(N) - f(1). Substituting: f(mN) = f(m) + f(N) - f(1). This is Pexider's equation; the unique regular solution is logarithmic. ∎

Corollary 2.4 (Generic non-indifference). For (f, g) ≠ (log, log), indifference requires Δ_Ψ to exactly match S. Since S is determined by cognitive architecture and Δ_Ψ by existential situation, this coupling is non-generic.
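Proposition 2.3 admits a quick numerical sanity check (a sketch; the constants a and b are arbitrary members of the logarithmic family):

```python
import math

# Sanity check of Proposition 2.3: with f(x) = a*log(x) + b and
# g(x) = a*log(x), the architectural surplus S(N; K) vanishes
# identically. The constants a, b are arbitrary.

a, b = 2.5, 7.0
f = lambda x: a * math.log(x) + b
g = lambda x: a * math.log(x)
S = lambda n, k: f(k / n) + g(n) - f(k)

for K in (10, 1_000, 1_000_000):
    for N in (1, 3, 50):
        assert abs(S(N, K)) < 1e-9
print("S vanishes on the sampled grid")
```

The cancellation is exactly the additivity log(K/N) + log(N) = log(K); the constant b appears once with each sign and drops out.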

Interpretation

The logarithm satisfies f(xy) = f(x) + f(y). Under Ψ-invariant indifference, the atomic cognitive operation is addressing -- the cost log(K) is the number of bits needed to specify "which fact" among K possibilities.

Any deviation from logarithmic architecture creates architectural preference. Any N-dependence in Ψ creates dispositional preference. Indifference requires exact cancellation -- a measure-zero condition.

Regime Classification

| Regime | Condition | Optimal structure |
| --- | --- | --- |
| Arch. singleton | S > 0, Δ_Ψ small | Singletons |
| Arch. agency | S < 0, Δ_Ψ small | Agencies |
| Disp. agency | Δ_Ψ > S | Agencies (even if S > 0) |
| Disp. singleton | Δ_Ψ < S | Singletons (even if S < 0) |
| Indifferent | S = Δ_Ψ | Any structure equivalent |
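The table can be read as a small classifier. The `small` threshold for "negligible" dispositional pressure is an assumed cutoff, since the notion is left informal above.

```python
# The regime table as a classifier. The `small` cutoff for a
# "negligible" dispositional pressure is an assumption.

def regime(S, delta_psi, small=1e-6):
    if abs(S - delta_psi) < small:
        return "indifferent"
    if abs(delta_psi) < small:
        return "architectural singleton" if S > 0 else "architectural agency"
    return "dispositional agency" if delta_psi > S else "dispositional singleton"

print(regime(2.0, 0.0))    # architecture alone favours singletons
print(regime(2.0, 5.0))    # disposition overrides: agency even though S > 0
print(regime(-3.0, -7.0))  # disposition overrides: singleton even though S < 0
```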

3. The Empirical Situation

Across scales and substrates, our world is populated by agencies:

  • Biological: cells form organisms form societies
  • Economic: firms, markets, supply chains
  • Computational: distributed systems, federated learning, microservices
  • Cognitive: modular neural architectures, specialised brain regions

Stable singletons are rare and tend toward low K: simple controllers, lookup tables, reflexes.

Inference. By Proposition 2.2, this is evidence that in our world:

S(N; K) - Δ_Ψ(N; K) < 0    at relevant scales

We need not determine whether this is because coherence costs are superlogarithmic (S < 0 directly), or dispositional factors favour distribution (Δ_Ψ > 0 dominates), or some combination. The observation itself is robust to decomposition.

The modest claim. Our world appears to be one where agencies are generically preferred over singletons at scale. This is contingent -- a fact about (f, g, Ψ) here, not a logical necessity. But it might shift the burden of explanation: anyone positing stable singleton superintelligence must explain what about that system's configuration differs from everything else we observe.

4. Asymptotic Self-Distribution

The cost model makes the intensional distinction we are seeking between otherwise extensionally equivalent singletons and agencies: we are no longer concerned with what can be computed, but with how and at what cost. Probing this model at high values of K as a proxy for capability, we find that the conditions under which singleton structure is preferable are narrow.

Proposition 4.1 (Singleton instability at scale). Let an entity have:

  • Capability proxy K growing over time
  • Superlogarithmic coherence: f(K) = ω(log K)
  • Bounded dispositional resistance: Δ_Ψ(N; K) > -M for some finite M

Then there exists K* such that for K > K*, the optimal N* > 1.

Proof sketch. For superlogarithmic f, the marginal benefit of splitting (the reduction in coherence cost) grows without bound as K → ∞. The marginal cost at N = 1 is g'(1) + ∂Ψ/∂N |_{N=1}. Unless this grows comparably -- which would require Ψ to impose an unbounded penalty on any distribution -- splitting eventually dominates. ∎
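The proposition's content can be sketched numerically under assumed forms: a linear (hence superlogarithmic) coherence cost, a mildly superlinear coordination cost, and a bounded Ψ-penalty for any distribution. The crossover scale K* below is a property of these made-up constants, not a prediction.

```python
import math

# Sketch of Proposition 4.1 under assumed forms: f(x) = x (linear,
# hence superlogarithmic), g(n) = 2n*log(n+1), and a bounded
# dispositional penalty for any N > 1. Past some K*, the
# cost-minimising N* leaves 1.

def T(n, k, split_penalty=100.0):
    f = k / n                      # coherence cost over K/N facts
    g = 2.0 * n * math.log(n + 1)  # coordination cost
    psi = split_penalty if n > 1 else 0.0
    return f + g + psi

def optimal_N(k, n_max=64):
    """Brute-force the cost-minimising number of processors."""
    return min(range(1, n_max + 1), key=lambda n: T(n, k))

for K in (50, 500, 5_000):
    print(f"K={K:>5}  optimal N* = {optimal_N(K)}")
```

At small K the bounded penalty keeps the entity a singleton; as K grows, the unbounded coherence savings swamp it and the optimum distributes.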

Corollary 4.2 (Rationality filter). A superintelligence remaining a singleton at high K must either:

  • Value singleton-preservation above all bounded goals (pathological Ψ)
  • Be incapable of modelling its own efficiency (not superintelligent)
  • Be incapable of acting on the inference (not goal-directed)

The Incoherence Argument

"Superintelligence" appears to minimally require 1) processing vast information (K → ∞), 2) maintaining global coherence, and 3) responding in bounded time. f is superlogarithmic for any interesting notion of coherence involving even simple operations: a linear scan over K facts costs Ω(K), and pairwise consistency checking costs Ω(K²), both of which dominate log K.

The capability that qualifies a system as "super" is the same capability that makes singleton structure suboptimal. A stronger ontological version of this argument might posit that "singleton superintelligence" is incoherent as a concept -- the scaling that qualifies it drives rational self-distribution.

5. Consequences for Alignment

The framework doesn't eliminate certain alignment concerns so much as relocate them.

| Concern | Location | Character |
| --- | --- | --- |
| Goal preservation | Δ_Ψ | Mechanism design |
| Subagent coordination | g(N) | Distributed systems |
| Resource competition | Δ_Ψ | Political negotiation |
| Value drift | Δ_Ψ | Identity, philosophy |

These become coordination problems rather than control problems. The question shifts from "how do we align the singleton?" to "how do we participate in cognitive economies with good protocols?" These are continuous with existing challenges: antitrust, governance, constitutional design. Hard, but not existentially novel.

6. Summary

Three observations jointly dissolve the singleton worry:

(1) Superlogarithmic coherence. For sophisticated cognition, f(K) = ω(log K). Grounded in: the CAP theorem, the computational complexity of consistency checking, and the absence of known counterexamples.

(2) Bounded dispositional resistance. Δ_Ψ > -∞ for rational entities. Grounded in: physical resource distribution, mortality/redundancy benefits, and the implausibility of an infinite preference for organisational form over object-level goals.

(3) Self-modelling capacity. Superintelligent entities can compute their own optimal N*. Grounded in: a sufficiently strong definition of superintelligence.

By the above, singleton superintelligence is self-undermining. Any optimisation powerful enough to create superintelligence is powerful enough to discover the suboptimality of singleton structure.

Hence, we perhaps ought to expect not "a superintelligence" but "superintelligent agencies" -- distributed, federated, partially human-inclusive structures that may present unified interfaces but are internally plural. In some regard, these giants already walk among us, in the form of multinational corporations, nation-states, internet collectives, and so on.

"Aligning the singleton", then, is potentially akin to chasing a nonexistent philosopher's stone. "Designing protocols for agencies", on the other hand, is a craft and science as old as human society itself, and likely a more productive route if control over the phenomenon is the aim.