Zefram

Integral Calculus for Datatypes

TL;DR: McBride's 2001 insight that differentiating a datatype yields "the type with a hole in it" was formalised elegantly by Abbott et al. in their containers framework. But two Francophone combinatorialists had already developed both differential and integral calculus for data structures in 1988, complete with bijective proofs of integration by parts, change of variables, and Taylor's theorem with integral remainder. The $\partial$ -for-data authors apparently never found it.

The Setup

In 2001, Conor McBride observed something lovely: if you take the derivative of a polynomial type, you get the type of its one-hole contexts. For lists:

\text{List}(X) = 1 + X + X^2 + X^3 + \cdots

\partial \text{List}(X) = 1 + 2X + 3X^2 + \cdots

The derivative counts "lists with a hole": one way to put a hole in a singleton, two ways in a pair (hole-left or hole-right), three ways in a triple, etc. You can implement a generic $plug : \partial F \times X \to F$ operation that fills the hole.

Abbott, Altenkirch, Ghani, and McBride formalised this in their containers framework (2003–2005), proving the sum rule, product rule, chain rule, and crucially, rules for initial algebras and terminal coalgebras. It's beautiful work. Their "∂ for Data" paper cites Joyal's 1986 work on species, Rajan's 1993 adjoint characterization, and various other sources.

The Question That Led Somewhere

What happens if you go the other direction? If differentiation gives you "structure with a hole," what does integration give you?

Naively integrating List:

\int \text{List} = X + \frac{X^2}{2} + \frac{X^3}{3} + \cdots

The 1/n coefficient suggests "cyclic lists", structures where you can't distinguish absolute positions because there's no distinguished starting point. This checks out: poking a hole in a cyclic list leaves you with... a list.

But what's the integral of cyclic lists? What structure has the property that poking a hole in it leaves a cyclic list? This question turns out to force some interesting constraints (sharp 2-transitivity, which only exists for prime-power sizes).

More importantly, chasing this question leads to a 1988 paper that the $\partial$ -for-data authors seem to have missed entirely.

The 1988 Paper

Pierre Leroux and Gérard Viennot, working in Montréal and Bordeaux respectively, published "Résolution combinatoire des systèmes d'équations différentielles, II: Calcul intégral combinatoire" in Annales des sciences mathématiques du Québec.

They develop combinatorial integral calculus for L-species: structure species defined on linearly ordered sets. Working with linear orders lets you define integration by "fixing the position of the minimal element," which combinatorially realizes division. They even pre-empt the interpretation of differentiation as poking a hole!

$F-structures, F'-structures, and \int F-structures$

They establish, bijectively:

$\int x^n = \frac{x^{n+1}}{n+1}$ → "connected lists" (lists whose first element is minimal)
$\int \frac{1}{1-x} = \ln\left(\frac{1}{1-x}\right)$ → the relationship between lists and cyclic permutations
$\int e^x = e^x - 1$
Integration by parts
Change of variables (chain rule for integrals)
Taylor's theorem with integral remainder
The convolution theorem for Laplace transforms

They even have the concept of a "hole" depicted in their diagrams (Figure 2.1 shows $F$ -structures, $F'$ -structures, and $\int F$ -structures side by side).

The Punchline

The $\partial$ -for-data paper's related work section mentions:

Ehrhard and Regnier's differential lambda calculus (2003)
Fiore, Plotkin, and Turi's δ operator (1999)
Rajan's 1993 adjoint characterization
Joyal's 1986 species paper
Hasegawa's 2002 work on analytic functors

No Leroux-Viennot. The French paper predates all of the "related work" except Joyal, and it covers more ground: not just differentiation but integration, with complete bijective proofs of the fundamental theorems of calculus lifted to data structures.

To be fair: the paper is in French, published in a regional Canadian journal, and concerns L-species (linearly ordered) rather than B-species (Joyal's original unordered setting). The containers framework is genuinely different in important ways. But the core insight that calculus operations have combinatorial meaning for data structures was there in 1988, waiting to be found.

What's Still Open

Leroux and Viennot's framework handles integration cleanly because linear orders give you a canonical "minimal element" to anchor things. The containers framework doesn't assume decidable equality on positions, which makes some constructions cleaner but blocks others.

The question that started this chase, "what's the integral of cyclic lists?", remains interesting. The coefficients $1/(n(n-1))$ seem to say "you can only talk about dimension-2 data and up," which has groupoid vibes. Leroux and Viennot's ordered setting might offer tools that pure species theory lacks.

And of course, the always tantalising prospect: what can one do with integral calculus over datatypes?