Applying the free energy principle: Does the FEP stand on its own?

  • Writer: Noumenal Labs
    Noumenal Labs
  • May 12
  • 7 min read

The tl;dr


  • In this blog post, we discuss the application of the free energy principle (FEP) to model real-world data-generating systems. Our main concern is whether the FEP stands on its own as a foundational principle in a purely unsupervised setting.


  • To answer this question, we begin by discussing key developments in the literature on the FEP. In particular, we review the recent shift from the earlier static formulations of the FEP to the more recent dynamic formulations, defined for entire paths or trajectories of systems in time. We focus on the definition of system boundaries in these formulations. 


  • We then discuss the core difficulty that comes with the introduction of non-static, non-stationary boundaries. Specifically, we discuss the problem of selecting the ‘best’ partition of a dynamical system in a purely data-driven fashion. We show that on its own, the FEP provides little in the way of explicit insight into how to select among the set of possible partitions. 


  • We argue that implicit in any information theoretic approach is a principle of parsimony implemented by the automatic Occam’s Razor effect of Bayesian inference. We show that this leads directly to a relatively simple dynamic Markov blanket detection algorithm, which selects the partition that is associated with the greatest simplification of the dynamics of the system as a whole. 

 

  • Satisfyingly, this additional explanatory principle is ultimately underwritten by the same information theoretic principles that gave rise to the FEP itself, namely, the minimization of surprise.



Markov blankets and their role in the FEP


First, let’s review the FEP. We refer the reader to our previous blog post, which presented an accessible high level overview of the FEP — what it is, what it is not, what utility it brings to the table, and whether it lives up to the hype. 


In a nutshell, the FEP is a mathematical principle that serves as the basis for a generalized modeling methodology. Generally speaking, principles provide us with an optimization objective and a methodology for modeling. The FEP enables us to generate equations that describe the time evolution of dynamical systems in generalized, information theoretic terms. In doing so, the FEP extends the probabilistic modeling methods of statistical physics, systems identification theory, and reinforcement learning to complex dynamical systems that can be partitioned into objects or sub-systems coupled and separated via boundaries. The upshot is a generalizable, scale-free, information-theoretic definition of objects and object types, which is very useful for probabilistic modeling.


The FEP literature starts from the common sense assumption that what we call “objects” — or “sub-systems” of an overall dynamical system — can be defined in terms of the structure and statistics of their boundary. This is because the boundary of an object is, by definition, the locus of interactions between it and its environment. This boundary-based definition of objects seems like a sensible starting point, because the statistics and time evolution (or dynamics) of the boundary fully summarize the interactions between object and environment — that is, the effects of the environment on that object (which we call “inputs”) and vice versa (labelled as “outputs”). Mathematically, the boundary of a system is precisely the union of its inputs and outputs. To proponents of embodied cognition and ecological psychology, this should seem not only useful, but also conceptually satisfying, since it arms us with an environment-specific, interaction-based definition of object and object type. 


In the FEP literature, this boundary is formalized as a Markov blanket — a mathematical construct borrowed from the field of probabilistic inference. Generically, the Markov blanket of a given variable is defined as the set of variables that, once given, renders it conditionally independent of all variables outside the blanket. The construct is important in machine learning because it provides the basis for efficient message passing algorithms for optimal inference of internal variables. Indeed, if we want to compute the value of a given variable, then all we need to know is the value of the variables in its Markov blanket.


More precisely, given a set of variables X, the Markov blanket of a subset Z ⊂ X (which we call the “internal” variables) is defined as the subset B ⊂ X such that Z is independent of the remaining “external” variables S = X \ (Z ∪ B) when conditioned on the blanket B. That is, we have Z ⊥ S | B. In other words, given the Markov blanket of some internal variables, internal and external variables are independent of each other. That is, we have p(S, Z | B) = p(S | B) * p(Z | B), which is the definition of conditional independence between Z and S. This is equivalent to saying that B is composed of all inputs and outputs between Z and S.
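This conditional-independence definition can be made concrete in the familiar setting of Bayesian networks, where the Markov blanket of a node consists of its parents, its children, and its children's other parents. Here is a minimal sketch; the toy graph and node names are ours, purely for illustration, not from the FEP literature.

```python
# Sketch: the Markov blanket of a node in a Bayesian network (a DAG).
# The blanket of a node is the union of its parents, its children, and
# the other parents of its children ("co-parents"). Graph and names are
# illustrative only.

def markov_blanket(node, edges):
    """edges: a set of (parent, child) pairs defining a DAG."""
    parents = {p for (p, c) in edges if c == node}
    children = {c for (p, c) in edges if p == node}
    co_parents = {p for (p, c) in edges if c in children and p != node}
    return parents | children | co_parents

# A toy network: env -> boundary -> internal -> output, noise -> output.
edges = {("env", "boundary"), ("boundary", "internal"),
         ("internal", "output"), ("noise", "output")}

# Conditioning on this set renders "internal" independent of the rest.
print(markov_blanket("internal", edges))  # {'boundary', 'output', 'noise'}
```

Given the blanket, the internal node is conditionally independent of everything else in the graph, which is exactly the factorization p(S, Z | B) = p(S | B) p(Z | B) above.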


Since the Markov blanket is, by construction, the union of inputs and outputs, then, appealing to the arguments above, it seems to provide a perfectly sensible definition of an object. Accordingly, Markov blankets have been used in the FEP literature to formally define the notion of an object, allowing us to mathematically describe the flow of information in the presence of a boundary between two or more sub-systems.



From static to dynamic formulations of the FEP 


We can summarize the recent history of developments in the FEP literature (circa 2012-2025) as a shift from static to increasingly dynamic formulations of the FEP. The FEP was originally formulated using what one might call “static Markov blankets”. This means that in early formulations of the FEP, the location of the blanket is fixed and the dynamics of the blanket are stationary. (Dynamics are descriptions of the time evolution of a system, and to say that a system is stationary means that its dynamics do not themselves change over time.) Indeed, in early formulations of the FEP, the dynamics of the system as a whole (i.e., object, boundary, and environment) were assumed to be stationary.


These assumptions, while mathematically convenient, placed severe limitations on the kinds of objects that could be described using this version of the FEP. When blankets are defined statically, we are limited to describing objects with boundaries that cannot move or that are tied to matter — meaning that one cannot model traveling waves or the transfer of matter and energy between object and environment. This is problematic because it undermines the supposed quasi-universal scope of the FEP. It precludes using the FEP to model things that have a life cycle, like a living creature, that exchange matter with their environment, or that pop in and out of existence, such as flames, lightning bolts, whirlpools, soap bubbles, etc. This means that most of the things that matter to human beings — other human and nonhuman animals, all sorts of living creatures, as well as weather patterns, social systems, and so on — cannot be modelled using this simplified formulation of the FEP.


This led to some naive criticism that the Markov blanket itself was too severely limiting a mathematical idealization for the purposes of modeling objects and object types (for instance, here and here) — and hence that the FEP was not as universal as its proponents claim it to be.


In response, proponents of the FEP moved to more dynamic formulations. The first move was to relax the stationarity constraint with respect to the global dynamics. Accordingly, from the end of the 2010s to present day, focus has been on formulating the FEP for paths or trajectories. That is, rather than associate the variables of our model with states of a system and the relationships that obtain between them, we use them to model paths or trajectories of a system through the space of all its possible states and configurations over time. This enabled practitioners to extend FEP theoretic modeling to systems that were not stationary or at steady state. 


There have been two main approaches to formulating the path-based FEP (which are arguably equivalent). The first was developed by Friston, Da Costa, and colleagues, using path integrals. The second, due to Sakthivadivel and to Beck and Ramstead, is based on the principle of maximum path entropy (also known as maximum caliber). Both approaches make use of a free energy functional for trajectories of the system, of the sort previously used to obtain information theoretic derivations of the equations of classical, statistical, and quantum mechanics.



Applying the FEP


The relevant mathematical object now becomes the statistics of a blanket path, which renders internal and external paths conditionally independent. This massively increased the set of physical phenomena that can be modeled within the FEP framework, effectively resolving issues regarding its domain of application.  


But this newfound flexibility came at a cost: too many partitions (arguably, all of them) now provide potential definitions of objects. That is, there are far too many blankets and little in the way of guidance when it comes to selecting the best partitioning of a given dynamical system. Indeed, a typical application of the FEP begins with a user-specified boundary, and the FEP is used to describe information flow in the presence of that boundary. But it does not directly provide us with any tools to determine which boundary is the relevant one in a purely data-driven manner. Thus, overcoming the issues with static Markov blankets had the side effect of making it clear that even dynamic definitions of Markov blankets are not, on their own, sufficient to yield sensible partitions of dynamical systems. At the end of the day, this is because one can choose any set of internal nodes and draw a blanket around it.
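One way to appreciate the scale of the problem: every node of a system can in principle be assigned to the internal set, the blanket, or the external set, so the number of candidate partitions grows exponentially with system size, before we have even checked which assignments actually satisfy the blanket condition. A toy sketch (node names illustrative):

```python
# Sketch: the combinatorial size of the partition-selection problem.
# Each of n nodes can be labeled internal, blanket, or external, giving
# 3**n candidate partitions to choose among.

from itertools import product

def candidate_partitions(nodes):
    """Enumerate every assignment of nodes to the three roles."""
    for labels in product(["internal", "blanket", "external"],
                          repeat=len(nodes)):
        yield dict(zip(nodes, labels))

nodes = ["a", "b", "c", "d"]
print(sum(1 for _ in candidate_partitions(nodes)))  # 81, i.e. 3**4
```

Even for a four-node system there are 81 candidate labelings; for realistically sized systems, exhaustive search is hopeless without an additional selection principle.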



Does the FEP stand on its own?


It seems, then, that we need to step outside the logic of the FEP in order to resolve this issue, suggesting that the FEP is either incomplete or of no pragmatic use. Yet if we take a closer look at the derivation of the FEP and the information theoretic principles upon which it is founded, we see that there is an implicit principle at play when the FEP formalism is applied to the system as a whole. This is because, implicit in any probabilistic or Bayesian approach, is the notion that the simplest model is the best model, as implemented by the automatic Occam’s razor effect of Bayesian inference. In other words, Bayesian approaches to probabilistic modelling naturally favor simpler models, i.e., models that describe the data with the fewest parameters. In this setting, fewer parameters means fewer partitions. This is known as the principle of parsimony — and because the FEP has its origins in information theory, it implicitly rests on this principle as well.
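The automatic Occam’s razor effect can be seen in a textbook instance of Bayesian model comparison. The coin-flip example below is our own illustration, not from the FEP literature: the marginal likelihood (model evidence) automatically penalizes the model with more free parameters unless the data demand the extra flexibility.

```python
# Sketch: the automatic Occam's razor of Bayesian model comparison.
# Compare a simple model (fair coin, no free parameters) with a flexible
# model (uniform prior over the bias p, one free parameter) via their
# marginal likelihoods for a specific sequence of n flips with k heads.

from math import factorial

def evidence_simple(k, n):
    # P(sequence | fair coin): every length-n sequence has probability 0.5**n
    return 0.5 ** n

def evidence_flexible(k, n):
    # P(sequence | uniform prior over p)
    # = integral over p of p**k * (1 - p)**(n - k) dp
    # = Beta(k + 1, n - k + 1) = k! (n - k)! / (n + 1)!
    return factorial(k) * factorial(n - k) / factorial(n + 1)

# Unremarkable data (5 heads in 10 flips): the simpler model wins.
print(evidence_simple(5, 10) > evidence_flexible(5, 10))    # True
# Extreme data (10 heads in 10 flips): the flexible model wins.
print(evidence_flexible(10, 10) > evidence_simple(10, 10))  # True
```

No explicit complexity penalty is coded anywhere: averaging the likelihood over the prior is what penalizes the flexible model, and the same logic favors partitions that simplify the description of the system.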


In practice, this means that we can treat the problem of partitioning dynamical systems via blankets as a simple problem of Bayesian modeling in the presence of dynamic boundaries, with the ultimate goal of reducing our uncertainty regarding future observations of the system as a whole. This harks back to the whole point of labeling a collection of particles as a thing: we do so because it reduces the entropy of our observations.
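This entropy-reduction intuition can be illustrated with a toy calculation of our own (the distribution is illustrative): for two strongly coupled variables, a joint description costs fewer bits than two independent descriptions, and the saving is exactly their mutual information.

```python
# Sketch: why grouping correlated variables into one "thing" pays off.
# For two binary components that almost always move together, the joint
# entropy is lower than the sum of the marginal entropies; the difference
# is the mutual information, i.e. the bits saved by joint modeling.

from math import log2

def entropy(dist):
    """Shannon entropy, in bits, of a probability table."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Two components that almost always move together (a candidate "object").
joint = {("up", "up"): 0.45, ("down", "down"): 0.45,
         ("up", "down"): 0.05, ("down", "up"): 0.05}

h_joint = entropy(joint)
px = {"up": 0.5, "down": 0.5}  # marginal of the first component
py = {"up": 0.5, "down": 0.5}  # marginal of the second component
h_independent = entropy(px) + entropy(py)

# Bits saved by treating the pair as a single object:
print(round(h_independent - h_joint, 3))  # 0.531
```

When the coupling vanishes, the saving drops to zero and nothing is gained by drawing a boundary around the pair, which is exactly the parsimony criterion at work.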


What is the implication of this for using the FEP? Is the FEP complete as a mathematical framework? Very nearly. Applying the FEP for the purposes of identifying objects and object types directly from data requires going back to its origins as an information-theory-based modeling framework.





Copyright 2025 Noumenal Labs, Inc.
