Rigorous bounds on the stationary distributions of the chemical master equation via mathematical programming

The stochastic dynamics of biochemical networks are usually modelled with the chemical master equation (CME). The stationary distributions of CMEs are seldom solvable analytically, and numerical methods typically produce estimates with uncontrolled errors. Here, we introduce mathematical programming approaches that yield approximations of these distributions with computable error bounds that enable the verification of their accuracy. First, we use semidefinite programming to compute increasingly tight upper and lower bounds on the moments of the stationary distributions for networks with rational propensities. Second, we use these moment bounds to formulate linear programs that yield convergent upper and lower bounds on the stationary distributions themselves, their marginals and stationary averages. The bounds obtained also provide a computational test for the uniqueness of the distribution. In the unique case, the bounds form an approximation of the stationary distribution with a computable bound on its error. In the non-unique case, our approach yields converging approximations of the ergodic distributions. We illustrate our methodology through several biochemical examples taken from the literature: Schlögl's model for a chemical bifurcation, a two-dimensional toggle switch, a model for bursty gene expression, and a dimerisation model with multiple stationary distributions.


I. INTRODUCTION
Cell-to-cell variability is pervasive in cell biology. A fundamental source of this variability is the fact that biochemical reactions inside cells often involve only a few molecules per cell [1][2][3]. Such reactions are key components in gene regulatory and signalling networks involved in cellular adaptation and cell fate decisions [4][5][6]. Mathematically, stochastic reaction networks are modelled using continuous-time Markov chains whose distributions satisfy the chemical master equation (CME). As accurate single-cell measurements become more widely available, it is crucial to develop reliable methods for the analysis of the CME that facilitate parameter inference 7,8, both for the identification of molecular mechanisms 9 and for the design of synthetic cellular circuits [10][11][12][13][14].
Significant effort has been devoted to investigating the stationary solutions of CMEs, which determine the long time behaviour of the stochastic process 15. While exact 16,17 Monte Carlo methods have been developed to sample from stationary solutions of some CMEs, analytical solutions are known only in a few special cases. In general, the CME is considered intractable 18 because, aside from systems with a finite state space, it consists of an infinite set of coupled equations.
An approach to circumvent the intractability of the full CME is to compute moments of its stationary solutions. However, moment computations are only exact for networks of unimolecular reactions; in all other cases, the equations of lower moments involve higher moments, leading to an infinite system of coupled equations that cannot be solved analytically. Moment closure schemes, usually requiring assumptions about the unknown solution, are thus employed to approximate the moments [19][20][21][22][23]. Yet few of these methods provide quantified approximation errors 20. Another approach is provided by mathematical programming techniques, which have been employed to compute bounds on the moments of Markov processes in various contexts. In such schemes, the finite set of moment equations is supplemented by moment inequalities and the moments are bounded by solving linear programs (LPs) 24,25 or semidefinite programs (SDPs) 26,27. Alternatively, when the CME has a unique stationary solution, several truncation-based schemes [28][29][30][31][32] have been proposed to approximate the solution, although in most cases they do not provide estimates of the error they introduce.

a) Electronic mail: juan.kuntz08@imperial.ac.uk. b) Electronic mail: p.thomas@imperial.ac.uk. c) Electronic mail (corresponding author): g.stan@imperial.ac.uk. d) Electronic mail (corresponding author): m.barahona@imperial.ac.uk.
Here, we present two different mathematical programming approaches that yield bounds on, and approximations of, the stationary solutions of the CME. Our first approach builds on our previous work 27,33,34, and uses semidefinite programming to obtain upper and lower bounds on the moments of stationary solutions of networks with polynomial and rational propensities. The scheme constrains the possible solutions of a truncated, underdetermined set of moment equations by appending semidefinite inequalities that are satisfied by all probability distributions on the state space. Independently of this work, similar approaches have recently been proposed for networks with polynomial propensities 33,[35][36][37].
Here we extend the mathematical framework to rational networks of interest in biochemistry, and we state rigorous, checkable mathematical conditions for the validity of the approach.
The second approach employs linear programming to obtain lower and upper bounds on stationary averages of the CME, using state space truncations guided by moment bounds computed with our first (SDP) approach. In the case of a unique stationary solution, we prove that the LP bounds converge to the true average as the truncation approaches the entire state space. Because stationary averages can be tailored to bound distributions, the LP bounds provide approximations that converge in total variation to the stationary solution and its marginals with a computable approximation error (see also Ref. 38 ). Additionally, the LP bounds provide a computational test for the uniqueness of the stationary solution, a prerequisite for most other approximation schemes. In the non-unique case, the scheme provides converging approximations of the ergodic distributions.
The paper is organised as follows. In Section II, we introduce definitions regarding stationary solutions of the CME. Section III presents the SDP method to bound moments: first, the conceptual framework is introduced analytically for a simple birth-death process that displays a chemical bifurcation, followed by the general computational approach for multi-species networks with rational propensities using semidefinite programming. Section IV presents the LP approach to bound and approximate entire stationary solutions, their averages and marginals: first, the mathematical framework is introduced through semi-analytical expressions for birth-death processes, followed by the general computational approach for rational networks using linear programming. In Section V, we apply the methods to three additional examples: a toggle switch, a model of bursty gene expression with negative feedback, and a model with multiple stationary solutions. We conclude with a discussion in Section VI. For completeness, Appendix A presents theoretical results linking stationary distributions of continuous-time chains and the stationary solutions of CMEs, and Appendix B presents a Foster-Lyapunov criterion that guarantees existence and finiteness of moments of the stationary distribution for the examples in the paper.

II. NOTATION AND DEFINITIONS
Stochastic biochemical kinetics under well-mixed conditions are usually described by a set of m reactions R_j involving n species S_1, S_2, ..., S_n:

Σ_{i=1}^n v^−_{ij} S_i → Σ_{i=1}^n v^+_{ij} S_i,   j = 1, ..., m,   (1)

where v^±_{ij} ∈ N denote the stoichiometric coefficients, and a_j : N^n → [0, ∞) is the propensity of reaction R_j.
Formally, the state of the system is described by the random variable X(t) = (X_1(t), ..., X_n(t)) ∈ N^n, a vector whose components give the number of molecules of each species at time t. The dynamical process is modelled with a minimal continuous-time Markov chain 39 with rate matrix Q = (q(x, y)) defined by

q(x, y) := Σ_{j=1}^m a_j(x) [1_{x+v_j}(y) − 1_x(y)],   (2)

where v_j := (v^+_{1j} − v^−_{1j}, ..., v^+_{nj} − v^−_{nj}) denotes the stoichiometric vector containing the net changes in molecule numbers produced by reaction R_j, and 1_y denotes the indicator function of state y, i.e., 1_y(x) := 1 if x = y and 1_y(x) := 0 otherwise. The state of the system takes values in a subset S ⊆ N^n, known as the state space, with (possibly infinite) cardinality |S|. The set S must be chosen such that

−q(x, x) = Σ_{y∈S, y≠x} q(x, y) < ∞   ∀x ∈ S,   (5)

in which case Q is said to be totally stable and conservative.
If the Markov chain cannot leave the state space in finite time, the matrix Q is said to be regular (see Appendix A). In this case, the collection of probabilities p_t(x) of observing the chain in state x at time t ≥ 0 is the only solution of the chemical master equation (CME)

dp_t(x)/dt = p_t Q(x),   p_0(x) = λ(x)   ∀x ∈ S,

where we define the vector p_t := (p_t(x))_{x∈S} and

p_t Q(x) := Σ_{y∈S} p_t(y) q(y, x).

Following standard convention, probability distributions and measures are defined throughout as row vectors. Any probability distribution π := (π(x))_{x∈S} that solves the stationary equation

πQ(x) = 0   ∀x ∈ S   (8)

is called a stationary solution of the CME. By definition, each stationary solution belongs to the space

ℓ¹ := { (ρ(x))_{x∈S} : Σ_{x∈S} |ρ(x)| < ∞ }

of absolutely summable real sequences indexed by states in S. The set of all stationary solutions forms a convex polytope in ℓ¹:

P := { π ∈ ℓ¹ : πQ = 0, π(S) = 1, π ≥ 0 },   (9)

where π ≥ 0 is shorthand for π(x) ≥ 0, ∀x ∈ S. For most networks of interest the stationary solutions determine the long-term behaviour of the chain (see Appendix A). In many cases, we will be interested in obtaining the π-average of a real-valued function f on S:

⟨f⟩_π := Σ_{x∈S} f(x) π(x).

For example, the k-th stationary moment is ⟨x^k⟩_π.
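As a concrete illustration of these definitions, the rate matrix (2) can be assembled directly from the stoichiometric changes and propensities. The sketch below does this for an assumed toy network (not taken from the paper): a linear birth-death process on a truncated state space, whose stationary solution is known to be Poisson, which lets us check that πQ = 0 holds numerically.

```python
import math

# Assumed toy network for illustration: 0 -> S with constant propensity k,
# and S -> 0 with propensity g*x, truncated to the finite state space {0,...,N}.
k, g, N = 2.0, 1.0, 30
states = range(N + 1)

def q(x, y):
    """Off-diagonal entries of the rate matrix Q, following eq. (2)."""
    rate = 0.0
    if y == x + 1 and x < N:   # birth reaction, stoichiometric change +1
        rate += k
    if y == x - 1:             # death reaction, stoichiometric change -1
        rate += g * x
    return rate

Q = [[q(x, y) for y in states] for x in states]
for x in states:
    # Conservative rate matrix: -q(x,x) equals the total outflow rate, eq. (5).
    Q[x][x] = -sum(Q[x][y] for y in states if y != x)

# This linear network has the known stationary solution Poisson(k/g),
# truncated and renormalised; verify that pi*Q = 0 numerically.
w = [(k / g) ** x / math.factorial(x) for x in states]
pi = [wx / sum(w) for wx in w]
residual = max(abs(sum(pi[x] * Q[x][y] for x in states)) for y in states)
print(residual < 1e-12)  # True
```

For nonlinear networks no such closed form is available, which is precisely what motivates the bounding schemes developed below.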
III. BOUNDING THE STATIONARY MOMENTS OF THE CME

As a first use of optimisation techniques, we present a systematic approach that yields bounds of increasing tightness on the stationary moments of reaction networks with polynomial or rational propensities, and we give rigorous sufficient conditions for its validity. The moment bounds obtained in this section will be used in conjunction with linear programming to bound the full stationary distributions of the CME in Sec. IV.
To motivate our optimisation approach, we first present the mathematical formulation through a simple example, for which explicit analytical expressions can be obtained (Sec. III A). For more complex systems, the approach can be implemented computationally in a systematic manner through a general semidefinite programming method (Sec. III B). Readers interested in the computational approach (and not the theory behind it) should skip Sec. III A and go directly to Sec. III B.
A. A simple analytical example: moment bounds for Schlögl's model

To illustrate the mathematical framework, consider the classic autocatalytic network with a single species S proposed by Schlögl 40 as a model for a chemical bifurcation:

2S → 3S,   3S → 2S,   ∅ → S,   S → ∅,   (10)

with mass-action propensities

a_1(x) = k_1 x(x − 1),   a_2(x) = k_2 x(x − 1)(x − 2),   (11)
a_3(x) = k_3,   a_4(x) = k_4 x,   (12)

where k_1, k_2, k_3, k_4 > 0 are rate constants. Here, n = 1 and the state space is S = N. The CME of this network has a unique stationary solution π and all of its moments are finite (see Appendix B). The reaction network, as encoded in the rate matrix Q, imposes certain relationships between the stationary moments ⟨x^k⟩_π. Such relations form an infinite system of coupled stationary moment equations.

FIG. 1. (a) Projection onto the y_1-y_2 plane of the set E_3 (29), an outer approximation of the set of stationary moment vectors. Black dots mark the upper and lower bounds on the first and second moments (30). (b) By appending further moment equations and inequalities, the outer approximations E_d (45) can be tightened systematically as we increase the order d. The boundaries of the sets E_d (lines in different colours) were computed explicitly by applying Mathematica's Reduce function to (45). The singleton set of stationary moment vectors (black dot) is always contained in E_d. The increasingly tighter lower and upper bounds on the moments (49) (coloured dots) are computed by solving the SDPs (47)-(48). Parameter values: k_1 = 1, k_2 = 1, k_3 = 0.8, k_4 = 1.
Except in particular instances, it is not possible to solve this coupled system exactly, and closure approximations are usually adopted by neglecting higher order moments.
An alternative approach is that of mathematical programming, which includes additional constraints in the form of inequalities that must be fulfilled by the moments of any probability distribution. Including such inequalities allows us to obtain feasible regions for the solutions of the system, and hence rigorous bounds for the moments. Increasing the number of inequalities considered restricts the feasible region further and makes the bounds tighter.
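To make the idea concrete, consider a hypothetical one-species network (an assumed example, simpler than Schlögl's model): production ∅ → S at rate k_b, and pair annihilation S + S → ∅ with mass-action propensity k_d x(x − 1). The first stationary moment equation gives ⟨x²⟩ = ⟨x⟩ + k_b/(2k_d), and appending the moment inequality ⟨x⟩² ≤ ⟨x²⟩ confines the mean to a bounded feasible interval:

```python
import math

# Hypothetical one-species network (assumed for illustration):
#   0 -> S at rate k_b,  and  S + S -> 0 with propensity k_d*x*(x-1).
# The first stationary moment equation <Qx> = 0 reads
#   k_b - 2*k_d*<x(x-1)> = 0   =>   <x^2> = <x> + k_b/(2*k_d),
# and the moment inequality <x>^2 <= <x^2> (a 2x2 moment-matrix condition)
# then bounds the feasible values of the mean m1 = <x>.
k_b, k_d = 10.0, 0.5
C = k_b / (2.0 * k_d)          # <x^2> - <x> = C from the moment equation

# Feasibility: m1^2 <= m1 + C  =>  m1^2 - m1 - C <= 0, so the quadratic
# formula yields an explicit upper bound on the stationary mean.
upper = (1.0 + math.sqrt(1.0 + 4.0 * C)) / 2.0
lower = 0.0                    # the trivial lower bound
print(lower, upper)
```

This is exactly the mechanism used below for Schlögl's model, only with more moment equations and larger semidefinite constraints.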
Let us consider the first moment equation (24) for (10). Supplementing it with moment inequalities yields a set (25) of candidate moment vectors, which we denote E_3. We say that E_3 is an outer approximation of the set of stationary moment vectors. Hence the stationary moments have the lower and upper bounds given in (26). Such bounds are usually handled computationally, but it is illustrative to obtain explicit expressions in this simple case. Combining (24) with the equalities in (25) (the other case is analogous), we use the quadratic formula to obtain an explicit bound, and hence E_3 can be rewritten equivalently as (29). Figure 1(a) shows the projection of E_3 onto the y_1-y_2 plane. From (29), it is clear that (1, 0, 0, b_1/b_4) ∈ E_3, and the lower bounds in (26) for the first two moments are trivial: (L^3_1, L^3_2) = (0, 0). The upper bounds, however, are not. Since r^+_2(x) > r^−_2(x) for all x > 0, the upper bounds in (26) are obtained from the northeasternmost intersection of y_1^2 and r^+_2(y_1): (U^3_1, U^3_2) = (r_4, r^+_2(r_4)), where r_4 is the rightmost root of the resulting polynomial equation. In summary, we obtain the bounds (30). As seen in Fig. 1(a), the analytical bounds based on E_3 are rough. However, we show in the following section how to obtain tighter bounds systematically by appending further moment equations and inequalities and solving the associated optimisations over higher order sets E_d.
B. The general approach: Bounding the moments of rational CMEs by solving semidefinite programs

The approach in the previous section can be applied to any reaction network (1) with n species and state space S, as long as the propensities of its m reactions are rational (or polynomial) functions, i.e., they can be rewritten as

a_j(x) = b_j(x)/s(x),   j = 1, ..., m,   (31)

where b_1, ..., b_m and s are polynomials on R^n and the common denominator s satisfies s(x) > 0, ∀x ∈ S.
To deal with multiple species, we use standard multi-index notation: x^α := x_1^{α_1} x_2^{α_2} ··· x_n^{α_n}, where α is the multi-index (α_1, α_2, ..., α_n) ∈ N^n and |α| := α_1 + α_2 + ··· + α_n is the degree of the monomial. Polynomial functions are expressed in terms of such monomials. For instance, the denominator of (31) is

s(x) = Σ_{|β| ≤ d_s} s_β x^β,   (32)

where d_s is the degree of s and we define the (column) vector of coefficients s := (s_β)_{|β| ≤ d_s} (33). We also define the stationary rational moments

z_α := ⟨x^α / s(x)⟩_π,   (34)

which are directly related to the raw moments:

⟨x^α⟩_π = Σ_{|β| ≤ d_s} s_β z_{α+β}.   (35)

The following checkable assumption is a sufficient condition for our general SDP approach to apply to generic reaction networks with rational propensities.
Assumption 1 (Order of the approximation and finiteness of moments). Recall that d_s is the degree of the denominator s in (31). Let us denote the order of the approximation by an integer d ≥ d_s, and let us compile the stationary rational moments (34) up to order d into the (column) vector z := (z_α)_{|α| ≤ d} of dimension #_d := (n+d choose n). We assume that all stationary solutions of the CME have finite rational moments up to order d + 1, i.e.,

z_α = ⟨x^α / s(x)⟩_π < ∞   ∀ |α| ≤ d + 1.

This requirement can be verified using a Foster-Lyapunov criterion as detailed in Appendix B.
In order to write down each α-moment equation, it is helpful to define the associated polynomial function

g_α(x) := Σ_{j=1}^m b_j(x) [(x + v_j)^α − x^α],   (38)

with degree d_{g_α} = |α| + d_b − 1, where d_b := max_j {d_{b_j}} is the maximum degree of the numerators in (31). We also define the (column) vector of polynomial coefficients g_α := (g_{α,β})_{|β| ≤ d_{g_α}}. The finiteness of moments guarantees that a subset of moment equations will hold, as stated in the following lemma.
Lemma 3 (The moment equations). If Assumption 1 is satisfied and π ∈ P, then the α-moment equation

g_α^T z = 0   (39)

holds for every α with |α| ≤ d − d_b + 1.

Proof. Consider the adjoint of (8),

⟨Qf⟩_π = 0,   (40)

which, by Fubini's theorem, is valid for any f such that ⟨|Qf|⟩_π < ∞. Setting f(x) := x^α in (40) and using (31), we get (39).

In addition to the moment equations (39), the moments z satisfy additional constraints. Firstly, since π is a probability distribution, we have

s^T z = ⟨s/s⟩_π = π(S) = 1.   (41)

Furthermore, the rational moments satisfy well-known semidefinite inequalities 34,42,43. Specifically, the localising matrices M^i_d(z) of the non-negativity constraints x_i ≥ 0, i = 1, ..., n, are positive semidefinite:

M^i_d(z) ⪰ 0,   i = 1, ..., n.   (42)

Similarly, it can be shown 42,43 that the moment matrix M^0_d(z) := (z_{α+β}) is positive semidefinite (43). Since (42) and (43) are linear matrix inequalities in z, they can be imposed alongside the moment equations (39) and the normalisation (41).
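As a small numerical check of these semidefinite conditions, consider an assumed example with polynomial propensities (so s ≡ 1 and the rational moments reduce to raw moments): the moment and localising matrices built from the moments of a Poisson distribution are indeed positive semidefinite.

```python
import math
import numpy as np

# Moments m_k of Poisson(lam), via the standard recurrence
#   m_{n+1} = lam * sum_k C(n,k) * m_k.
lam, d = 2.0, 2
m = [1.0]
for n in range(2 * d + 1):
    m.append(lam * sum(math.comb(n, j) * m[j] for j in range(n + 1)))

# Moment matrix with entries m_{i+j}, and the localising matrix of the
# support constraint x >= 0, with entries m_{i+j+1}.
M = np.array([[m[i + j] for j in range(d + 1)] for i in range(d + 1)])
L = np.array([[m[i + j + 1] for j in range(d)] for i in range(d)])
print(np.linalg.eigvalsh(M).min() >= 0, np.linalg.eigvalsh(L).min() >= 0)  # True True
```

Any candidate moment vector violating these inequalities cannot come from a probability distribution on N, which is exactly how they cut down the feasible set.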

Bounding the moments
We can then establish the following lemma regarding outer approximations of the set of rational moments.
Lemma 5 (Outer approximations of the set of rational moments). If Assumption 1 is satisfied and π ∈ P, the vector of rational moments z belongs to the spectrahedron

E_d := { y ∈ R^{#_d} : s^T y = 1,  g_α^T y = 0 ∀ |α| ≤ d − d_b + 1,  M^i_d(y) ⪰ 0, i = 0, 1, ..., n },   (45)

where d_b is the maximum degree of the numerators in (31), and s, g_α and M^i_d(y) are defined in (32), (38) and (42), respectively.
In summary, the vectors of stationary moments of order d are contained in a feasible set E d , defined by linear equalities and inequalities, which constitutes an outer approximation to the set of moment vectors. There are several implications of this lemma.
Firstly, the outer approximation property implies that extremal points of E_d provide bounds on the stationary moments. Specifically, the vector in E_d with the largest (resp. smallest) α-entry provides an upper (resp. lower) bound on the α-moment. For example, Fig. 1(b) shows the projection of E_d with increasing d onto the y_1-y_2 plane for Schlögl's model. The northeasternmost (resp. southwesternmost) vectors of these outer approximations yield upper (resp. lower) bounds on the first two moments of Schlögl's model.
Secondly, note that the moment matrix M^i_d(y) is a principal submatrix of M^i_{d+1}(y). Since a matrix is p.s.d. if and only if all of its principal submatrices are p.s.d., and E_{d+1} includes all moment equations in E_d, it follows that every vector in E_{d+1} (appropriately truncated) belongs to E_d. As a result, the outer approximations tighten around the set of stationary moment vectors, giving bounds of increasing quality as the order of the approximation d is increased, as seen in Fig. 1(b). These two observations are summarised in the following corollary.
Corollary 6 (Monotonic moment bounds). Suppose that Assumption 1 is satisfied and π ∈ P. If f is a polynomial of degree at most d with coefficient vector f, then

L^d_f := inf{ f^T y : y ∈ E_d },   (47)
U^d_f := sup{ f^T y : y ∈ E_d },   (48)

satisfy L^d_f ≤ ⟨f/s⟩_π ≤ U^d_f, and the bounds are monotonic in d: L^d_f ≤ L^{d+1}_f and U^{d+1}_f ≤ U^d_f.

Proof. Note that z ∈ E_d because of Lemma 5, and every (appropriately truncated) vector of E_{d+1} belongs to E_d.

Applying these results to ⟨x^α⟩_π, the α-moment of the CME, is straightforward. Let f(x) := s(x) x^α and choose d ≥ |α| + d_s to obtain the bounds

L^d_α ≤ ⟨x^α⟩_π ≤ U^d_α.   (49)

Corollary 6 establishes that outer approximations E_d of increasing order can be used to compute a monotonically increasing (resp. decreasing) sequence of lower (resp. upper) bounds for ⟨x^α⟩_π:

L^d_α ≤ L^{d+1}_α ≤ ··· ≤ ⟨x^α⟩_π ≤ ··· ≤ U^{d+1}_α ≤ U^d_α.

Remark 7 (The sequence of moment bounds is monotonic but may not converge). The monotonicity of the bounds does not imply that the gap between the bounds (U^d_α − L^d_α) will converge to zero as d → ∞. Although in our experience the bounds often converge numerically, there is no general guarantee, for several reasons. Firstly, the stationary solution may not be unique; in that case, the lower bounds are limited by the stationary solution with the smallest moment, while the upper bounds are limited by that with the largest moment. Even if the solution is unique, the bounds may not converge because the semidefinite conditions are tailored to distributions with support on the non-negative real orthant but not to distributions with support on discrete state spaces 44, for which more stringent conditions can be produced at a higher computational cost 42,44,45.

Computing moment bounds via semidefinite programming
Given a reaction network with rational propensities and a polynomial f of degree d_f, the moment bounds (47)-(48) are the extreme values of the linear functional y ↦ f^T y over the set E_d, which is defined by linear equalities and semidefinite inequalities. Hence, computing the bounds amounts to solving a semidefinite program (SDP), a convex optimisation problem for which efficient computational tools exist. Therefore, instead of ad hoc analytical manipulations, like those leading to (30), a general procedure can be implemented by constructing and solving the SDPs systematically, as follows:

1. Rewrite the reaction propensities in the form (31), removing all common factors and setting s to be the lowest common denominator.
2. Verify the existence of stationary solutions π and choose the order of the approximation d, an integer d ≥ d_f for which the stationary moments up to order d + 1 are finite (Assumption 1), using a Foster-Lyapunov criterion (Theorem 31 in Appendix B).
3. Compute the bounds L d f and U d f by solving the two SDPs (47)- (48). We set up the SDPs using the modelling package YALMIP 46 , and solve them using the multi-precision solver SDPA-GMP 47 with the interface mpYALMIP 48 . Examples of computation times are given in the figure captions.
SDPs involving high order moments can be numerically ill-conditioned 27,44,49. Although the origin of this numerical instability remains an open problem, our computations suggest that it could be the result of the rapid growth of moments, which leads to ill-conditioned moment matrices. Such disparity is problematic for standard double-precision SDP solvers, but we have mitigated it with the multiprecision solver SDPA-GMP 47, as in Ref. 49. Corollary 6 guarantees that the bounds will not loosen as we increase d, yet the bounds may stagnate (Remark 7). In this case, we recommend breaking the impasse by employing the LP approach of the next section (see Fig. 7(b)).
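The conditioning issue can be reproduced with a back-of-the-envelope computation (an assumed example; the distribution and parameters are illustrative, not taken from the paper): the condition number of the moment matrix of a Poisson distribution grows rapidly with the order d, which is why multi-precision arithmetic becomes necessary.

```python
import math
import numpy as np

# Moments of Poisson(lam) via the recurrence m_{n+1} = lam * sum_k C(n,k)*m_k.
lam = 5.0
m = [1.0]
for n in range(12):
    m.append(lam * sum(math.comb(n, j) * m[j] for j in range(n + 1)))

# Condition number of the moment matrix (m_{i+j}) for increasing order d:
# the moments grow quickly, so the spread of eigenvalues explodes.
conds = []
for d in range(1, 6):
    M = np.array([[m[i + j] for j in range(d + 1)] for i in range(d + 1)])
    conds.append(np.linalg.cond(M))
print(["%.1e" % cond for cond in conds])  # grows sharply with d
```

By eigenvalue interlacing, the condition number can only grow as d increases, so double-precision solvers eventually run out of accurate digits.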
As an example of this procedure, Fig. 2(a) shows how the computed upper and lower bounds (L d α , U d α ) for the first three stationary moments of Schlögl's model become tighter as we increase d, the order of the approximation. Fig. 2(b) combines the moment bounds to obtain bounds on commonly used statistics, e.g., variance, coefficient of variation, and skewness.

IV. BOUNDING AND APPROXIMATING THE STATIONARY SOLUTIONS OF THE CME
The semidefinite programming scheme in the previous section allows us to compute bounds on the stationary moments of the CME by constructing and optimising over outer approximations of the set of stationary moments. In this section, we go further and introduce a linear programming scheme that yields bounds on the full stationary solutions of the CME, their marginals and their averages by constructing and optimising over outer approximations of the set of stationary solutions of the CME. To do so, we use a moment bound obtained in the previous section. Note that the approximations in this section only involve linear inequalities (instead of semidefinite ones). Hence the optimisations to be solved are linear programs (instead of SDPs), a simpler subclass of convex optimisation problems for which mature, industrial solvers 52 are available.
As for the SDP scheme above, we introduce the mathematical approach through a simple semi-analytic example (i.e., birth-death processes) in Sec. IV A, and then present the computational framework for general CMEs in Sec. IV B. Readers interested in the computational implementation (and not the mathematical background) should skip Sec. IV A and go directly to Sec. IV B.

A. A simple example: bounding the stationary solution of birth-death processes
Birth-death processes are one-species reaction networks (n = 1) with state space S = N whose state x ∈ N changes by ±1 in each reaction:

∅ → S with propensity a^+(x),   S → ∅ with propensity a^−(x).   (50)

The specific birth-death process is defined by the functional form of the given propensities a^+(x) and a^−(x). The stationary equations πQ = 0 then read

a^−(1) π(1) = a^+(0) π(0),   (51)
a^+(x − 1) π(x − 1) + a^−(x + 1) π(x + 1) = [a^+(x) + a^−(x)] π(x),   x ≥ 1.   (52)

Assuming non-vanishing death rates,

a^−(x) > 0   ∀x > 0,   (53)

it is well known 53 that the unique stationary solution is

π(x) = γ(x) π(0),   γ(x) := Π_{y=1}^{x} a^+(y − 1)/a^−(y),   γ(0) := 1,   (54)

with π(0) given by the normalisation condition

π(0) = 1/γ(S),   (55)

where we have introduced the notation for sums over sets:

f(A) := Σ_{x∈A} f(x)   for A ⊆ S.   (56)

Hence birth-death processes have at most one stationary solution, which exists if and only if γ(S) is finite.
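Equations (54)-(55) are straightforward to evaluate numerically. The sketch below does so for Schlögl's model viewed as a birth-death process; note that the combined mass-action propensity forms, the parameter values and the truncation point used here are assumptions for illustration (the infinite sum γ(S) is approximated by truncating once γ(x) is negligible).

```python
# Assumed mass-action forms for Schlögl's propensities (for illustration):
#   a1(x) = k1*x*(x-1), a2(x) = k2*x*(x-1)*(x-2), a3(x) = k3, a4(x) = k4*x,
# giving combined birth rate a_plus = a1 + a3 and death rate a_minus = a2 + a4.
k1, k2, k3, k4 = 1.0, 1.0, 0.8, 1.0
a_plus = lambda x: k1 * x * (x - 1) + k3
a_minus = lambda x: k2 * x * (x - 1) * (x - 2) + k4 * x

# gamma(x) from (54); the sum gamma(S) in (55) is approximated by truncating
# at N, beyond which gamma(x) is negligible for these parameters.
N = 50
gamma = [1.0]
for x in range(1, N + 1):
    gamma.append(gamma[-1] * a_plus(x - 1) / a_minus(x))

Z = sum(gamma)
pi = [g / Z for g in gamma]   # approximate stationary solution via (54)-(55)
print(abs(sum(pi) - 1.0) < 1e-12, pi[-1] < 1e-12)  # True True
```

The truncation error committed here is uncontrolled; the point of the rest of this section is to replace such ad hoc truncations with rigorous two-sided bounds.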

Semi-analytical approach for bounds and approximation
For most birth-death processes, no closed-form expression for γ(S) is known and, consequently, the stationary solution cannot be computed exactly. However, we can obtain upper and lower bounds for the distribution, as follows.
Let us consider a state space truncation

S_r := { x ∈ S : x^α < r },   (57)

with size |S_r| = r^{1/α}, controlled by the parameters α ∈ Z_+ and r > 0. Let S^c_r denote its complement, i.e., the set of states outside of the truncation S_r.

Let us assume that we have available an upper bound on the stationary α-moment:

⟨x^α⟩_π ≤ c.   (58)

Note that for rational propensities, such a bound can be computed with the SDP scheme in Sec. III.

Upper bound: An easy upper bound on π(0) is obtained by truncating the sum in (55) to get

π(0) = 1/γ(S) ≤ 1/γ(S_r),   (59)

whence it follows that π(x) = γ(x) π(0) ≤ γ(x)/γ(S_r) =: u_r(x) for all x ∈ S_r.

Lower bound: Using (58) and Markov's inequality, we obtain a bound on the probability mass m_r outside of the truncation:

m_r := π(S^c_r) ≤ ⟨x^α⟩_π / r ≤ c/r =: ε_r.   (60)

We say that ε_r is a tail bound.

A lower bound on π(0) then follows from (54)-(60): since π(S_r) = γ(S_r) π(0) = 1 − m_r ≥ 1 − ε_r, we have

π(0) ≥ (1 − ε_r)/γ(S_r),   (61)

whence we obtain a lower bound for π(x):

π(x) = γ(x) π(0) ≥ (1 − ε_r) γ(x)/γ(S_r) =: l_r(x),   x ∈ S_r.   (62)

Convergent bounds: We have thus shown that

l_r(x) ≤ π(x) ≤ u_r(x)   ∀x ∈ S_r,   (63)

and it is easy to see that both bounds converge to the stationary solution as the size of the truncation grows: as r → ∞, both u_r(x) → π(x) and l_r(x) → π(x). This follows from (54)-(55) and ε_r → 0.
Approximating the distribution and the approximation error: Motivated by these facts, we define the two following measures (lower and upper bounds padded with zeros),

l_r(x) := (1 − ε_r) γ(x)/γ(S_r) for x ∈ S_r, and l_r(x) := 0 otherwise,   (64)
u_r(x) := γ(x)/γ(S_r) for x ∈ S_r, and u_r(x) := 0 otherwise,   (65)

and introduce them as approximations π̂ of the stationary solution, π̂ ≈ π, where π̂ = l_r or π̂ = u_r.
We quantify the approximation error of π̂ with the total variation norm:

||π − π̂|| := sup_{A ⊆ S} |π(A) − π̂(A)|,   (66)

where π(A) and π̂(A) are sums over sets, as defined in (56). For u_r and l_r, the approximation error can be characterised further. Using (59)-(65), we have

||π − l_r|| = ε_r   and   ||π − u_r|| = m_r ≤ ε_r.

Since ε_r = c/r → 0 as r → ∞, it thus follows that both l_r and u_r converge in total variation to π.
We summarise these findings in the following theorem.

Theorem 8 (Bounds and approximations of the stationary solution of birth-death processes). Consider any birth-death process (50) with non-vanishing death rates (53) and finite sum γ(S) (55), such that it has a unique stationary solution π (54). Suppose that π satisfies the moment bound (58) and let S_r ⊆ S be the truncation (57) of the state space controlled by the parameters r, α ∈ Z_+, with tail bound m_r = π(S^c_r) ≤ c/r = ε_r. Then the following hold:

(i) The values of the distribution over the truncation are bounded above and below:

l_r(x) ≤ π(x) ≤ u_r(x)   ∀x ∈ S_r.

(ii) The measures l_r = (l_r(x))_{x∈S} and u_r = (u_r(x))_{x∈S} defined in (64)-(65) approximate the solution with approximation errors ||π − l_r|| = ε_r and ||π − u_r|| = m_r.
(iii) The bounds vary monotonically with r:

l_r(x) ≤ l_{r'}(x) ≤ π(x) ≤ u_{r'}(x) ≤ u_r(x)   ∀x ∈ S_r,   r ≤ r',

and the sequences of approximations converge in total variation to π:

lim_{r→∞} ||π − l_r|| = lim_{r→∞} ||π − u_r|| = 0.

Proof. This follows from (59)-(68) and Corollary 6.
Remark 9. If Assumption 1 holds, then π satisfies the moment bound (58) with the constant c given by the SDP upper bound (48).

An application of Theorem 8: Schlögl's model

To illustrate our results, we apply Theorem 8 to compute bounds on the unique stationary solution of Schlögl's model (10)-(12). This model is a birth-death process for which an explicit analytical stationary solution can be obtained, thus allowing us to test the results directly without any simulations.
Through some analytical manipulations, the solution of Schlögl's model can be obtained explicitly (69) in terms of the generalised hypergeometric function 2F2, with c_1 := 1 − 4k_3/k_1 and c_2 a second constant determined by the rate constants. The stationary solution goes from being unimodal (black line, Fig. 3(a)) to bimodal (black line, Fig. 3(b)) depending on the parameter values, analogously to a bifurcation. In Fig. 3(a)-(b), we compare the approximations l_r and u_r given in Theorem 8 (colour shades) to the analytical solution (black lines), given by (54) and (69), in the unimodal and bimodal cases. As the size of the truncation (controlled by the parameter r) is increased, the bounds tighten around the analytical solution. In Fig. 3(c)-(d) we show that the approximation error tends to zero as the size of the truncation |S_r| = r^{1/α} is increased. In the unimodal case (Fig. 3(c)), the approximation error decreases rapidly when |S_r| is larger than the mode. Furthermore, when the truncation is sufficiently large, employing bounds on higher order moments (larger α) provides tighter tail bounds and smaller approximation errors. On the other hand, if the size of the truncation is smaller than the mode, using higher order moments does not necessarily improve the approximation error. A similar dependence of the approximation error is observed in the bimodal case (Fig. 3(d)), but the approximation error only decreases when the truncation size is larger than the second (larger) mode. This example shows how the ability to compute error bounds can reveal the presence of modes outside of the truncation.
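The error behaviour can also be checked on a toy model where everything is computable in closed form (an assumed example, not Schlögl's model: constant birth rate k and linear death rate g·x, with Poisson(k/g) stationary solution): the total-variation error of the padded lower bound l_r matches the tail bound ε_r = c/r and shrinks as the truncation grows.

```python
import math

k, g, c, alpha = 2.0, 1.0, 2.0, 1   # valid moment bound: <x>_pi = k/g = 2 <= c

def poisson_pmf(lam, nmax):
    """Poisson probabilities p(0..nmax) via the stable recurrence p(x) = p(x-1)*lam/x."""
    p = [math.exp(-lam)]
    for x in range(1, nmax + 1):
        p.append(p[-1] * lam / x)
    return p

pi = poisson_pmf(k / g, 200)
for r in [5.0, 10.0, 40.0]:
    R = math.ceil(r ** (1.0 / alpha))             # S_r = {0, ..., R-1}
    eps = c / r
    gamma = [(k / g) ** x / math.factorial(x) for x in range(R)]
    gSr = sum(gamma)
    l = [(1.0 - eps) * gx / gSr for gx in gamma]  # padded lower bound (64)
    # Since l <= pi everywhere, the total-variation error is attained at A = S:
    tv = sum(pi[x] - l[x] for x in range(R)) + sum(pi[R:])
    print(r, round(tv, 6), eps)
```

As r grows, the error decreases like c/r, in line with Theorem 8(ii)-(iii).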

Reformulation of the bounds as optimisations
The truncation method leading to Theorem 8 relies on the detailed-balance structure of birth-death processes. However, this semi-analytic method is not generalisable to arbitrary reaction networks. Instead, the bounds can be reformulated as an equivalent (and generalisable) optimisation problem, as follows.
Consider a truncation S_r (57) controlled by the parameters α, r ∈ Z_+ with tail bound m_r = π(S^c_r) ≤ c/r = ε_r.

Definition 10 (Restriction of π to S_r). The restriction of π to S_r is the measure

π|_r(x) := π(x)   ∀x ∈ S_r.   (70)

The restriction π|_r belongs to the convex polytope

P_r := { π_r ≥ 0 : π_r Q(x) = 0 ∀x ∈ N_r,  1 − ε_r ≤ π_r(S_r) ≤ 1,  Σ_{x∈S_r} x^α π_r(x) ≤ c },   (71)

where N_r := {x ∈ S_r : x < r^{1/α} − 1}. This follows directly from the definition of the restriction (70); the tail bound (60); the moment bound (58); and the fact that the stationary equations (51)-(52) with x < r^{1/α} − 1 only involve states inside of the truncation. From the definition of the polytope (71), we can show that the bounds of the stationary solution in Theorem 8(i) are recovered by optimising over P_r, as stated in the following theorem.
Theorem 11 (Bounds and LP formulation). The bounds l_r and u_r in (64)-(65) are obtained by optimising over the polytope P_r: for every x ∈ S_r,

l_r(x) = inf{ π_r(x) : π_r ∈ P_r }   and   u_r(x) = sup{ π_r(x) : π_r ∈ P_r }.

Proof. We only present the argument for the upper bounds; the proof for the lower bounds is analogous. A measure π_r satisfies the stationary equations in (71) if and only if π_r(x) = γ(x) π_r(0), ∀x ∈ S_r, where γ(x) is given in (54). Therefore, we have

π_r(x) = γ(x) π_r(0) ≤ γ(x)/γ(S_r) = u_r(x),   ∀x ∈ S_r,   (72)

which follows from π_r(0) = π_r(S_r)/γ(S_r) ≤ 1/γ(S_r). However, u_r clearly satisfies all constraints in (71), including the moment constraint, which follows from π(x) = γ(x)/γ(S) and the inequality

Σ_{x∈S_r} x^α γ(x)/γ(S_r) ≤ Σ_{x∈S} x^α γ(x)/γ(S) = ⟨x^α⟩_π ≤ c.

Hence u_r ∈ P_r and, together with (72), this completes the proof.
Importantly, the definition (71) only involves linear equations and inequalities. Therefore, optimising over the polytope amounts to solving a linear program, a class of optimisation problems for which powerful computational platforms and algorithms exist. Furthermore, this optimisation reformulation can be extended seamlessly to arbitrary networks, as expanded in the next section.
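A minimal sketch of this LP reformulation (on an assumed toy model, not from the paper: constant birth rate k and linear death rate g·x, whose stationary solution is Poisson(k/g)) solves the two linear programs with SciPy and recovers the closed-form bounds of Theorem 11.

```python
import math
import numpy as np
from scipy.optimize import linprog

k, g = 2.0, 1.0
c, r, alpha = 2.0, 10.0, 1          # moment bound <x^alpha> <= c; truncation parameter r
R = math.ceil(r ** (1.0 / alpha))   # S_r = {0, ..., R-1}
eps = c / r

# Stationary equations pi_r Q(x) = 0 for the interior states x = 0, ..., R-2.
A_eq = np.zeros((R - 1, R))
for x in range(R - 1):
    if x > 0:
        A_eq[x, x - 1] += k         # inflow via birth from state x-1
    A_eq[x, x] -= k + g * x         # outflow from state x
    A_eq[x, x + 1] += g * (x + 1)   # inflow via death from state x+1
b_eq = np.zeros(R - 1)

# Mass constraints 1 - eps <= pi_r(S_r) <= 1 and moment constraint sum_x x*pi_r(x) <= c.
A_ub = np.array([[1.0] * R, [-1.0] * R, [float(x) for x in range(R)]])
b_ub = np.array([1.0, -(1.0 - eps), c])

x0 = 3                              # bound the probability of state 3
obj = np.zeros(R); obj[x0] = 1.0
lo = linprog(obj, A_ub, b_ub, A_eq, b_eq, bounds=(0, None)).fun
hi = -linprog(-obj, A_ub, b_ub, A_eq, b_eq, bounds=(0, None)).fun

# Compare with the closed-form bounds u_r and l_r of Theorem 11.
gamma = [(k / g) ** x / math.factorial(x) for x in range(R)]
gSr = sum(gamma)
print(abs(hi - gamma[x0] / gSr) < 1e-6, abs(lo - (1 - eps) * gamma[x0] / gSr) < 1e-6)
```

For general networks the closed-form comparison is unavailable, but the LP itself carries over unchanged, which is the point of the next section.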

B. Generalisation to arbitrary networks via linear programming
We now generalise the optimisation approach to obtain bounds and approximations with controlled errors of the stationary solutions of arbitrary reaction networks.
To characterise the solutions of the CME, we choose a norm-like function w, which plays the same role as the moment function x^α in Section IV A 2: a function w : S → [0, ∞) is norm-like if it tends to infinity,

lim_{|x|→∞} w(x) = ∞,   (73)

and has finite sublevel sets:

S_r := { x ∈ S : w(x) < r },   |S_r| < ∞   ∀r > 0.   (74)

Furthermore, we require that the growth of w be dominated by the stationary solution π, so that its expectation with respect to π is finite. We summarise these requirements in the following checkable assumption.
Assumption 13 (Existence of CME solutions and moment bound). We assume that the CME has at least one stationary solution π, and that we have available a norm-like function w with sublevel sets S_r such that every stationary solution π satisfies

⟨w⟩_π = Σ_{x∈S} w(x) π(x) ≤ c,   (75)

where c is a known constant. This inequality can be thought of as a generalisation of (58), hence we refer to it as a moment bound.
The existence of the stationary solutions can be verified on a case by case basis using a Foster-Lyapunov criterion (e.g., Theorem 31 in App. B).
Regarding the moment bound, in the case of networks with rational propensities satisfying Assumption 1, w can be chosen to be any norm-like rational function with a numerator of degree d, and the bounding constant c can then be computed using the SDP approach of Sec. III. For general reaction networks, the moment bound can be obtained using Foster-Lyapunov criteria 41 (see Appendix B).
In analogy with Lyapunov theory, the sublevel sets (74) of w play an important role in characterising the stationary solutions of the CME. Specifically, we use the sublevel sets S r as our state space truncations, noting that (75) allows us to establish a bound (76) on the mass of the tail of the distribution outside of S r . Just as in the previous section, this choice yields a sequence of increasing truncations that approach the entire state space:

For each truncation, let N r denote the set of states x ∈ S r that cannot be reached in a single jump from outside of the truncation:

and the associated convex polytope:

which, analogously to (71), includes all the stationary equations that only involve states in S r . We then have the following lemma:

Lemma 14 (Outer approximations of P). Suppose that Assumption 13 holds and let π |r be the restriction of π to S r , as defined in (70). If π ∈ P, then π |r ∈ P r .
Proof. This follows directly from (76) and the fact that

The outer approximation property means that optimising over P r provides convergent bounds on the averages of functions f on the state space, as summarised in the following theorem.
Theorem 15 (Convergent bounds of stationary averages). Consider a reaction network (1) with state space S, rate matrix Q satisfying (2)-(5), and stationary solutions π forming the set P (9) and suppose that Assumption 13 holds.
If π ∈ P and f : S → R is any real-valued function, then we can bound its averages over the restrictions:

where π |r is the restriction of π to S r defined in (70) and the bounds are given by

l r f := inf{ f π r : π r ∈ P r }, u r f := sup{ f π r : π r ∈ P r }. (80)
If we have additional information on f , we have the following bounds on the full π-averages:

(iv) If the growth of f is strictly dominated by w as the size r of the truncations S r increases, i.e.,

then f is π-integrable for all π ∈ P and the sequences of lower bounds (l r f ) r∈Z+ and upper bounds (u r f ) r∈Z+ converge:

lim r→∞ l r f = l f := inf{ f π : π ∈ P}, lim r→∞ u r f = u f := sup{ f π : π ∈ P}.
Proof. (iii) is a consequence of (79), the moment bound (75), and the following inequality:

(iv) has two parts. First, the π-integrability of f follows from

Second, the convergence of the bounds follows from the fact that every subsequence of (l r f ) r∈Z+ has a converging subsequence (l r k f ) k∈Z+ with limit π(f ), where π ∈ P (see Remark 16 and Theorem 3.5 in Ref. 38). By definition, l f ≤ π(f ); taking limits in (81) then shows that l f ≥ π(f ), and the result follows. The case of the upper bounds is identical. For details, see Corollary 3.6(i) in Ref. 38.
Remark 16 (LP computation). The bounds l r f and u r f in (80) are obtained by solving two linear programs (LPs) with |S r | variables, |N r | equality constraints, and |S r | + 3 inequality constraints. LP solvers return the optimal value l r f (or u r f ) and an optimal point π * ,r such that f π * ,r = l r f (or f π * ,r = u r f ). The optimal points exist because the LPs are optimisations of a continuous function over a compact non-empty subset of R |S r | .
Theorem 15 provides a general framework to obtain bounds (80) that can be used as approximations of stationary averages f π with a quantifiable error given by (i)-(iii); furthermore, under the conditions in (iv), the approximations converge to f π as the truncations approach the entire state space S of the reaction network.

The case of a unique distribution: bounds and approximations
Throughout this section, we assume that the CME has a unique stationary solution π, i.e., that P = {π}. In this case, the results of Theorem 15 can be strengthened. In particular, the feasible points π r ∈ P r are good approximations of the stationary solution in the sense that they converge to π in weak*, as detailed in the following corollary.
Corollary 17 (Convergence of bounds and feasible points for a unique solution). Let us assume that the conditions of Theorem 15(iv) hold and, in addition, that P = {π} consists of a single stationary solution π. Then the upper and lower bounds (80) converge to the average: and any sequence of feasible points (π r ) r∈Z+ belonging to the outer approximations (P r ) r∈Z+ (i.e., π r ∈ P r , ∀r ∈ Z + ) converges to π in weak * : for any function g that satisfies (82).
Proof. This proof is similar to that of Theorem 15(iv). For full details, see Corollary 3.6(iii) in Ref. 38 .

Remark 18.
When w is norm-like, convergence in weak* implies convergence in total variation-see Appendix B of Ref. 38 for a proof.
Remark 19. Corollary 17 shows that, given a sufficiently large truncation, the feasible points π r provide arbitrarily accurate approximations to π, yet with no quantifiable bound on the approximation error ||π − π r ||.
Bounding and approximating a unique stationary solution with quantifiable error: Corollary 17 can still be used to obtain approximations of the distribution π itself with quantified errors. Specifically, we apply (85) repeatedly with specific choices of f , namely the indicator function of each point of the truncation. This allows us to compute bounds on each value π(x) of the distribution in S r .
Given a truncation S r , let us define the set of indicator functions {1 x } x∈Sr , one for every state in the truncation, where each 1 x is defined as in (3). For each function 1 x in the set (i.e., for each state in the truncation), we compute the bounds (79) by solving the LPs (80) with f = 1 x . Hence we obtain lower and upper bounds on 1 x π |r = π(x) for all states in the truncation:

l r x := l r 1 x = inf{π r (x) : π r ∈ P r }, u r x := u r 1 x = sup{π r (x) : π r ∈ P r }.
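As a concrete illustration of these pointwise LPs, the sketch below bounds π(0) for a hypothetical birth-death network, ∅ → X with rate k and X → ∅ with rate gx, whose stationary solution is Poisson(k/g). The moment constant c is computed exactly from the known law here, standing in for the SDP bound of Sec. III, and scipy.optimize.linprog is assumed to be available; this is an illustrative sketch, not one of the networks studied in this paper.

```python
import math
import numpy as np
from scipy.optimize import linprog

# Pointwise LP bounds on pi(0) for a hypothetical birth-death network:
# 0 -> X with rate k, X -> 0 with rate g*x; stationary law is Poisson(k/g).
k, g = 2.0, 1.0
lam = k / g
N = 10                                   # truncation S_r = {0, ..., N}
c = lam + lam**2                         # moment bound <x^2> (exact here;
                                         # in general supplied by the SDP step)
eps = c / (N + 1)**2                     # tail bound: pi(x > N) <= eps

n = N + 1
A_eq = np.zeros((N, n))                  # stationary equations at x = 0..N-1,
b_eq = np.zeros(N)                       # the states not reachable from outside
for x in range(N):
    if x >= 1:
        A_eq[x, x - 1] += k              # inflow by a birth from x-1
    A_eq[x, x] -= k + g * x              # outflow from x
    A_eq[x, x + 1] += g * (x + 1)        # inflow by a death from x+1

w = np.array([float(x) ** 2 for x in range(n)])
A_ub = np.vstack([np.ones(n), -np.ones(n), w])    # mass <= 1, mass >= 1 - eps,
b_ub = np.array([1.0, -(1.0 - eps), c])           # and the moment constraint

obj = np.zeros(n)
obj[0] = 1.0                             # objective: pi(0)
lo = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
hi = -linprog(-obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
assert lo <= math.exp(-lam) <= hi        # true pi(0) = e^{-lam} is bracketed
```

The restriction of the true stationary distribution is a feasible point of this LP, so the two optimal values necessarily bracket π(0); replacing the exact c with any valid SDP upper bound leaves the bracketing intact, merely loosening it.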
As in (64)-(65), we then collect these bounds, pad them with zeros, and define two approximations for π:

These approximations of π have controlled errors, as summarised in the following corollary.
(ii) As the truncation size r approaches infinity (and S r approaches S), the approximation u r converges pointwise to the unique π, and the approximation l r converges to π in weak* (86).
Proof. The bounds (92)-(93) follow directly from Theorem 15. The error (94) follows from (92)-(93) and the fact that the total variation norm of an unsigned measure is its mass. Similarly, (95) follows from (76), and Theorem 15(iv) shows that l r and u r converge pointwise to π. For any f satisfying (82) and any r, r ′ ∈ Z + , we have that

Using the pointwise convergence of l r and (84), we can pick r ′ such that the second sum is arbitrarily small and, subsequently, an r such that the first sum is arbitrarily small. Hence the weak* convergence of l r follows. See Theorem 4.1 in Ref. 38 for details.
Corollary 20 states that, for sufficiently large r, l r and u r are close to π. In contrast with the feasible points π r , we can answer the question "is r sufficiently large?" by evaluating the errors ε l r (94) and ε u r (95). Since lim r→∞ ε l r = 0, we will always find an approximation l r that verifiably meets any given error tolerance by increasing r.

Remark 21.
Although we have no proof that ε u r converges to zero (nor that u r itself converges to π in total variation), all the examples we have encountered in practice exhibit convergence of both u r and ε u r . To answer instead the question 'when is r too small?' (i.e., to establish how large the truncation S r must be to guarantee a given approximation error), the following proposition is of use.

Proposition 22 (Achievable approximation errors).
Under the same conditions as in Corollary 20, the errors of the approximations l r (90) and u r (91) cannot be made smaller than the tail bound or the mass of the tail, respectively, i.e.,

where m r = π(S c r ) ≤ ε r = c/r.

Proof. The inequality (98) follows directly from (95). For (97), recall that there exists at least one optimal point π * ,r such that π * ,r (x) = l r (x) for any x ∈ S r (Remark 16). It is straightforward to verify that [(1 − ε r )/π * ,r (S r )] π * ,r ∈ P r , which implies π * ,r (S r ) = 1 − ε r (due to the minimality of π * ,r (x)). Since l r bounds from below all feasible points of P r , it follows that l r (S r ) ≤ π * ,r (S r ) = 1 − ε r . Combined with (94), this gives (97).
In other words, the approximation error of the lower bounds is no smaller than the tail bound, whereas that of the upper bounds is no smaller than the tail mass.

Approximating marginal distributions:
For high-dimensional state spaces, we are often interested in marginal distributions rather than the full multivariate solution π defined on S. A marginalisation is associated with a partition of the state space into a collection of disjoint subsets:

and the marginal distribution is defined with respect to each subset:

Because {A i } i∈I is a partition of S, π̂ is a probability distribution on I. Typically, we are interested in marginalising the distribution of a reaction network with n species (and state space S = N n ) over a subset of species. For instance, if we are interested in the molecule counts of species k, we consider the following (infinite) set of subsets:

whose union trivially recovers the entire state space. Associated with this set {A i } i∈N we then have the marginal distribution

π̂(i) = π ({x ∈ N n : x k = i}) , ∀i ∈ N,

which, in this case, corresponds to the (univariate) distribution describing the molecule counts of the k th species.

The marginal distribution π̂ can also be bounded and approximated following a similar procedure to the one described above for the full distribution. Using the indicator functions 1 Ai as the functions f , we solve the analogous LPs:

for all the subsets A i that intersect with the truncation, i.e., I r = {i ∈ I : A i ∩ S r ≠ ∅}. As before, we construct two approximations by padding (101)-(102) with zeros:

l̂ r := (l̂ r (i)) i∈I , l̂ r (i) := l r i if i ∈ I r and l̂ r (i) := 0 if i ∉ I r , (103)

which are the analogues for the marginal distribution of the approximations (90)-(91) to the entire distribution, and have similar (but not identical) properties, as summarised in the following two corollaries.
Furthermore, as the truncation size is increased (r → ∞ and S r approaches S), l̂ r converges to π̂ in total variation.
Proof. This proof is analogous to that of Corollary 20, except that one needs to use Corollary 4.3 in Ref. 38 (with g(x, y) = 1 A y (x)) instead of Theorem 4.1 in Ref. 38.

Proof. The proof is analogous to that of Corollary 20.
Note that û r does provide a controlled approximation of the marginal distribution, as it is a pointwise convergent approximation to π̂ with a guaranteed, computable error bound ε̂ u r (106). However, using the fact that the probability mass of A i ∩ S c r is bounded by the mass of the tail m r (76), we have the following easy (but loose) upper bounds:

π̂(i) ≤ û r (i) + c/r, ∀i ∈ I r . (107)
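The marginalisation above amounts to summing π over the subsets A i = {x : x k = i}. A minimal sketch, using a hypothetical two-species distribution (not one computed by the LPs):

```python
# Marginalising a toy two-species distribution over the subsets
# A_i = {x : x_k = i}; the distribution below is hypothetical.
pi = {
    (0, 0): 0.10, (0, 1): 0.15, (0, 2): 0.05,
    (1, 0): 0.20, (1, 1): 0.25, (1, 2): 0.10,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.05,
}

def marginal(pi, k):
    """Marginal distribution of the molecule counts of species k (0-indexed)."""
    out = {}
    for x, p in pi.items():
        out[x[k]] = out.get(x[k], 0.0) + p   # add mass of A_i = {x : x_k = i}
    return out

m0 = marginal(pi, 0)                          # counts of the first species
assert abs(sum(m0.values()) - 1.0) < 1e-12    # still a probability distribution
assert abs(m0[1] - 0.55) < 1e-12              # 0.20 + 0.25 + 0.10
```

In the LP setting the same sums are not available directly, which is why the marginal bounds (101)-(102) optimise the averages of the indicator functions 1 Ai instead.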

Non-uniqueness, ergodic distributions and a uniqueness test
Theorem 15 shows that our LP optimisation over the polytopes P r yields bounds on the stationary averages even if there are multiple stationary solutions. In the non-unique case, however, the gap between the lower and upper bounds reflects the fact that the extreme values of π → f π over P can be achieved by different solutions in the polytope. Yet it is possible to characterise further the set of solutions and the extreme points in terms of the ergodic distributions of the CME, and to use this description to turn our LP approach into a test of uniqueness, as we show below.
To see how multiple stationary solutions of the CME can arise, consider the following simple reaction network with mass action kinetics. It is clear that its state space S = N 2 decomposes into three disjoint sets:

where C 1 and C 2 are closed communicating classes and T contains the remaining states. A set C ⊆ S is a closed communicating class 39 if the chain can transit between any pair of states in C but cannot leave C. Another common source of multiple closed communicating classes is the presence of conservation laws in reaction networks 54 . For instance, the reactions

conserve the quantity n = x 1 + 2x 2 . Hence there exists a different closed communicating class for every n ∈ N.
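The conservation law can be checked directly from the stoichiometry. The sketch below assumes the reactions are the dimerisation pair 2X 1 ⇌ X 2 , a stoichiometry consistent with the conserved quantity n = x 1 + 2x 2 :

```python
import itertools

# Net state changes (dx1, dx2) of the assumed reactions 2X1 -> X2 and
# X2 -> 2X1; both leave n = x1 + 2*x2 unchanged, so states with different
# n can never communicate (one closed class per value of n).
stoich = [(-2, +1), (+2, -1)]

def conserved(x):
    return x[0] + 2 * x[1]

for x1, x2 in itertools.product(range(10), repeat=2):
    for d1, d2 in stoich:
        y = (x1 + d1, x2 + d2)
        if y[0] >= 0 and y[1] >= 0:          # the jump is feasible
            assert conserved(y) == conserved((x1, x2))
```

Any left null vector of the stoichiometric matrix yields such a conserved quantity, and hence a candidate family of closed communicating classes.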
The closed communicating classes are intimately related to the stationary solutions, as summarised in the following theorem, which compiles facts broadly known in the literature.

Theorem 27 (Ergodic distributions and communicating classes 15 ). Consider a reaction network (1) with rate matrix Q satisfying (2)-(5), assume Q is regular, and decompose the state space as

where C j are closed communicating classes and T contains the remaining states.

(i) For each C j , there is at most one stationary solution π j supported on C j , i.e., with π j (C j ) = 1. Whenever it exists, π j is known as the ergodic distribution associated with C j .

(ii) Let J be the set of indices j of the ergodic distributions π j . The set of stationary solutions P (9) is the set of convex combinations of the ergodic distributions:

Proof. If Q is regular, the stationary solutions of the CME are the stationary distributions of the chain (Theorem 30 in Appendix A). Using this fact, (i) can be found in most books on continuous-time chains (e.g., Ref. 39, Th. 3.5.2), and (ii) is given in Ref. 15 (Th. 3.4).
Theorem 27 states that the ergodic distributions π j are mutually orthogonal and that they are the extreme points of the convex polytope P (9) of stationary solutions of the CME. Because P is contained in the non-negative orthant of ℓ 1 , it follows that each face of the non-negative orthant contains at most one of the ergodic distributions.
Using this fact, we obtain a computational test of the uniqueness of stationary solutions, as summarised in the following corollary.

Corollary 28 (A uniqueness test). If Assumption 13 holds and Q is regular, then
Proof. The proof is by contradiction. Suppose that P is not a singleton. Theorem 27(ii) implies that P contains two or more ergodic distributions π j , each associated with a different closed communicating class C j . Consider, e.g., a state x ∈ C 1 . Since the classes C j are disjoint, π j (x) = 0, ∀j ≠ 1, which contradicts the lower bound property of l r (Theorem 15(i)). Hence, P must be a singleton. The converse follows from the convergence of the bounds in Corollary 20(ii).
In other words, if π is unique, then the lower bound of any (and all) states in the support of π is non-zero for sufficiently large r. Conversely, finding a single non-zero lower bound for any x ∈ S provides a proof of uniqueness of the distribution. Hence, if there is more than one ergodic distribution, all the lower bounds are zero for all states in the state space S.
When π is unique, l r or u r are good approximations of the stationary solution. However, this is not so in the non-unique case. Indeed, it is easy to show that, in the non-unique case, the lower and upper bounds are always loose: Corollary 28 shows that the lower bounds are trivially zero everywhere (l r = 0), whereas Theorems 15(ii) and 27 imply that for large r, the mass of u r will be no smaller than the number of ergodic distributions: hence the upper bound is not tight. However, our LP framework can still be used to obtain approximations of the ergodic distributions by using sequences of feasible points π r . To this end, we require the following generalisation of Corollary 17.
Corollary 29 (Convergent approximations of ergodic distributions). Let f be any function that satisfies (82) (i.e., f is dominated by the norm-like function w as the size r of the truncations increases), and let (π * ,r ) r∈Z+ be a sequence of optimal points such that π * ,r ∈ P r and f π * ,r = u r f = sup{ f π r : π r ∈ P r }, for all r ∈ Z + .
(ii) If f is the indicator function of a state (or of a subset) that is contained in a closed communicating class C j with associated ergodic distribution π j , then the sequence of optimal points (π * ,r ) r∈Z+ converges to π j in weak* as r → ∞.
Proof. The proof of (i) is similar to that of Theorem 15(iv). See Corollary 3.6(ii) and Remark 3.7 in Ref. 38 for details. (ii) follows from (i), Theorem 27(ii), and the fact that indicator functions satisfy (82).
Corollary 29 provides a rationale for using our computational framework to obtain approximations of the ergodic distributions π j in the non-unique case. Importantly, we do not need to know a priori what the closed communicating classes are. Using the indicator function of a chosen state x, we obtain the sequence of optimal points π * ,r satisfying π * ,r (x) = u r (x). Should x belong to a closed communicating class C j with ergodic distribution π j , Corollary 29(ii) shows that π * ,r will converge to π j as r tends to infinity. Indeed, by looking at the states for which π * ,r (x) > 0, we can in principle deduce to which communicating class C j (if any) the state belongs. Once the class is known, we replace S with C j and proceed as in the unique case to obtain bounds on π j . See Section V C for an example of the application of this procedure.

Computational implementation and numerical considerations
Let us consider a given reaction network with rational propensities. To obtain approximations of its stationary solutions with controlled error smaller than a given tolerance, we proceed as follows:

1. Verify the existence of stationary solutions π and the finiteness of their moments (Assumption 1) using a Foster-Lyapunov criterion (Theorem 31 in App. B).
2. Choose a norm-like rational function w and define the truncations S r as the sublevel sets (74) controlled by r.
We have found it best to choose functions w that define truncations that cover most of the probability mass and that tend quickly to infinity, so that the size of the truncation grows slowly with r. For example, if we take w(x) = x α , then higher values of α induce smaller truncation sizes, |S r | ≈ r 1/α . To guide the selection of w, one can run the scheme with various w to gain information about the shape of the distribution.
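The trade-off between α and the truncation size can be seen already in one dimension, where S r = {x ∈ N : x α ≤ r}:

```python
# Size of the 1-D sublevel-set truncation S_r = {x in N : x**alpha <= r}
# for w(x) = x**alpha: |S_r| = floor(r**(1/alpha)) + 1, so larger alpha
# gives much smaller truncations for the same r (a toy 1-D illustration).
def truncation_size(r, alpha):
    return sum(1 for x in range(r + 2) if x**alpha <= r)

r = 10_000
for alpha in (1, 2, 4):
    print(alpha, truncation_size(r, alpha))
# alpha = 1 keeps 10001 states, alpha = 2 keeps 101, alpha = 4 keeps 11
```

The same effect carries over to multi-species truncations, where it compounds across dimensions, which is why steep w can make otherwise intractable LPs feasible.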
3. Use the SDP approach of Sec. III B to find a moment bound (75).

For large truncations, the coefficients in the constraints of the LPs span many orders of magnitude, leading to round-off errors in double-precision arithmetic and poor solver performance. One way to ameliorate this issue is to scale the decision variables; in particular, scaling π r (x) by −q(x, x) or w(x) often significantly improves solver performance.
6. Evaluate the error of the approximations l r and u r using (94) and (95), respectively. If the error is larger than the tolerance, we increase the truncation size r and return to the previous step.
7. In addition to approximating the full distribution, we can apply the above steps to compute other measures of interest by changing the LPs and associated errors in Steps 5-6:

• If we want to approximate a marginal distribution, we solve the LPs (101)-(102) and quantify the error using (105)-(106).
• If we are interested in a particular stationary average f π , we instead solve the LPs (80) and control the error using the bounds in Theorem 15(i)-(iii).
8. As the particular reaction network could have several stationary solutions, we check in Step 5 for non-trivial lower bounds, l r (x) > 0. If we find one such bound, the solution is unique (Corollary 28).
Otherwise, we investigate further the uniqueness question by increasing r and recomputing the lower bounds to examine the presence of communicating classes as discussed in Corollary 29.
Our computations were carried out on a desktop computer with a 3.5GHz processor and 16GB of RAM.

V. APPLICATION TO BIOLOGICAL EXAMPLES
We now present the application of the methodology to three examples. First, we showcase how to obtain tight bounds on the stationary solution (and marginals) of a two-dimensional toggle switch. Second, we consider a model of bursty gene expression with negative feedback, through which we explore the capabilities of our method to deal with promoter switching noise. Third, we demonstrate the application of our methods to the non-unique case with a dimerisation network. The code used to compute the approximations and bounds for this last example is available in Ref. 55.

A. A toggle switch
Toggle switches are common motifs in the genetic circuits underlying cell-fate decisions 16,56,57 . A simple such circuit consists of two mutually repressing genes 56 . In particular, we consider the asymmetric case with mutual repression modelled via Hill functions and dilution/degradation modelled via linear decay:

The state space of the CME is x ∈ S = N 2 with x = (x 1 , x 2 ), where x 1 and x 2 denote the numbers of proteins P 1 and P 2 , respectively, and the propensities of the reactions are:

where the k i > 0 are kinetic constants and θ > 0 is the dissociation constant of P 1 . We follow the steps detailed in Section IV B 3 to obtain bounds and approximations for this reaction network. First, we show that a stationary solution π exists and that all of the moments of every solution are finite using a Foster-Lyapunov criterion (App. B).
We pick the norm-like function

and compute the moment bound w π ≤ c := 4.48 × 10 8 by solving the SDP (48) with d := 10 and f := w (solver time: 3.6 minutes). We then solve the LPs (88)-(89) and compute the bounding approximations u r and l r of the stationary solution. The fact that the lower bounds l r are non-zero provides a proof of uniqueness of the stationary solution. Figure 4 shows the bounds for small (r = 45) and large (r = 75) state space truncations. The maximum absolute discrepancies are found near the modes; by increasing the size of the truncation, the upper and lower bounds become nearly indistinguishable, with the maximum discrepancy dropping below 10 −4 (Fig. 4(d)). Overall, the total approximation error, including the tail, is less than 2.6 × 10 −3 , as given by (94)-(95).
We have also used our method to obtain approximations of the marginal distributions of the numbers of proteins P 1 (x 1 ) and P 2 (x 2 ) (Sec. IV B 1). The results in Fig. 5(a) show that the bounds get tighter for truncations of increasing r (although, as discussed in Remark 26, the upper bound (107) remains loose). Note, however, that the approximation û r in Fig. 5(b) rapidly approaches the Gillespie numerical simulations. Fig. 5(c) shows that the errors of both l̂ r and û r can be made arbitrarily small by increasing the truncation size (Corollaries 24-25).
Finally, we apply the method to chart the change of the stationary solution as a function of a parameter. In particular, the dissociation constant θ of protein P 2 can be thought of as a bifurcation parameter: increasing θ allows for higher expression of protein P 1 . Fig. 6 presents the lower bounds l̂ r on the marginals of both proteins. At small values of θ, we observe a single population with high numbers of P 2 repressing P 1 . At large values of θ, the opposite happens: the population we observe has high numbers of P 1 repressing P 2 . For intermediate θ, we observe the coexistence of both populations. Indeed, we find that the modes of the marginal distributions are in good correspondence with the stable solutions of the deterministic steady-state rate equations.

B. Bursty gene expression with negative feedback
As a second example, consider a model of bursty production of a protein that negatively regulates its own expression. The model 58 involves a promoter that switches between active (G on ) and inactive (G off ) states, and the protein P it encodes. When the promoter is on, the protein is expressed in bursts of size b, a geometrically distributed random variable 59 with mean ⟨b⟩. The protein represses its own production by switching off the promoter:

The state space of the CME is x = (x 1 , x 2 ) ∈ S = {0, 1} × N, where x 1 ∈ {0, 1} is a binary variable describing the off/on state of the promoter and x 2 ∈ N represents the protein count. The propensities are

where the k i > 0 are reaction rate constants. In App. B we show that the network has a unique stationary solution π and that all of its moments are finite.
This example provides an interesting test case for SDP methods because the protein noise is particularly large: CV(x 2 ), the coefficient of variation of x 2 , grows 58 with the mean burst size ⟨b⟩. Therefore, we expect that obtaining tight bounds for the CV will entail the use of a large number of moment equations. To investigate the effect of such large noise on the efficacy of our SDP method, we compute the following bounds:

where we use (47)-(48) and we append the following equalities to our SDP (45):

Figure 7(a) shows how the bounds (115) get tighter as we increase the number of moment equations in our SDP calculations. As expected, for small mean burst sizes (⟨b⟩ = 1), the bounds become tight with 10 moment equations, but tightening the bounds becomes difficult when the burst size is larger (⟨b⟩ = 10, 100).
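The combination of moment bounds into CV bounds can be sketched by interval arithmetic: since CV = sqrt(⟨x 2 2 ⟩/⟨x 2 ⟩ 2 − 1) is decreasing in the mean and increasing in the second moment, valid moment intervals translate into a valid CV interval. This is a generic recipe, and the exact combination (115) may differ in detail; the input intervals below are hypothetical.

```python
import math

# Combine lower/upper bounds on the first two moments of x2 into bounds on
# CV(x2) = sqrt(<x2^2>/<x2>^2 - 1) by interval arithmetic (a generic sketch;
# the moment intervals here are hypothetical, e.g. from the SDP/LP stages).
def cv_bounds(m1_lo, m1_hi, m2_lo, m2_hi):
    # CV is decreasing in the mean and increasing in the second moment
    cv_hi = math.sqrt(max(m2_hi / m1_lo**2 - 1.0, 0.0))
    cv_lo = math.sqrt(max(m2_lo / m1_hi**2 - 1.0, 0.0))
    return cv_lo, cv_hi

lo, hi = cv_bounds(m1_lo=9.5, m1_hi=10.5, m2_lo=150.0, m2_hi=180.0)
assert lo <= hi
```

The max(…, 0) guards against moment intervals so loose that the variance lower bound turns negative, in which case the CV lower bound is simply zero.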
Computing tight bounds for large ⟨b⟩ with the naive SDP approach would thus require a prohibitive number of moment equations. However, we can apply the LP method of Sec. IV to overcome this limitation. To do this, we use the SDP to compute a (cheaper) loose upper bound on the sixth moment, which serves as the moment bound of Assumption 13 in Theorem 15, to obtain:

which we combine as in (115) to obtain much tighter bounds on CV(x 2 ). Figure 7(b) shows the convergence of these tight bounds for CV(x 2 ) with ⟨b⟩ = 100 as the truncation size is increased. These results exemplify the fact that it suffices to obtain loose SDP bounds on a higher order moment in order to obtain arbitrarily tight LP bounds on lower order moments (Theorem 15).
Finally, Figure 7(c) exemplifies another use of our capability to bound marginal distributions following the steps in Section IV B 3. In this case, we marginalise over the on/off promoter variable (x 1 ) and compute upper bounds û r and lower bounds l̂ r (visually indistinguishable in Fig. 7(c)) on π̂(x 2 ), the distribution of protein counts, for three different burst sizes, ⟨b⟩ = 1, 10, 100. As expected, the protein distribution widens considerably (yet still with light tails) as the burst size increases.

[Fig. 6 caption fragment: To compute these bounds, we set (…) In total, 3486 bounds were computed to obtain the full bifurcation diagram (θ increased in steps of 0.15): solver time = 62 minutes, an average of 1 second per bound. Approximation error: ε̂ l r ≤ 3 × 10 −3 , ∀θ. All parameters (other than θ) as in Fig. 4.]

C. Dimerisation network with multiple stationary distributions
To illustrate the use of our method on CMEs with multiple stationary solutions (see Section IV B 2), we consider the reversible dimerisation network with state space S = N and mass action kinetics:

with propensities a 1 (x) := k 1 and a 2 (x) := k 2 x 1 (x 1 − 1). The reactions preserve the parity of the number of molecules; hence the even numbers C 0 and the odd numbers C 1 are closed communicating classes. Furthermore, because the network obeys detailed balance 23 , it is straightforward to obtain analytical expressions for the two ergodic distributions:

where μ := k 1 /k 2 and the normalising constants are given by Z 0 := cosh(μ) and Z 1 := sinh(μ).
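These normalising constants can be checked numerically. The sketch below assumes the standard detailed-balance form π j (x) ∝ μ x /x! restricted to C j , an assumption consistent with the stated constants Z 0 = cosh(μ) and Z 1 = sinh(μ):

```python
import math

# Numerical check of the normalising constants of the two ergodic
# distributions, assuming pi_0(x) = mu**x / (x! * cosh(mu)) on the evens
# and pi_1(x) = mu**x / (x! * sinh(mu)) on the odds: these rely on the
# identities sum_{x even} mu^x/x! = cosh(mu), sum_{x odd} mu^x/x! = sinh(mu).
mu = 1.7                       # an arbitrary positive value for the check
even = sum(mu**x / math.factorial(x) for x in range(0, 60, 2))
odd = sum(mu**x / math.factorial(x) for x in range(1, 60, 2))
assert abs(even - math.cosh(mu)) < 1e-12     # Z_0
assert abs(odd - math.sinh(mu)) < 1e-12      # Z_1
assert abs(even + odd - math.exp(mu)) < 1e-12
```

With the constants verified, each π j sums to one over its class, as required of an ergodic distribution.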
To illustrate the use of our tools in such a non-unique case, suppose that we were not aware of the above facts and instead applied our computational procedure. First, we use a Foster-Lyapunov criterion and find that the rate matrix is regular, that at least one stationary solution exists, and that all of the moments of each stationary solution are finite. We then solve the LPs (88)-(89) to compute upper and lower bounds, u r and l r (Fig. 8(a)). Note that the lower bounds remain trapped at zero, indicating that the stationary solution is non-unique (as given by Corollary 28). Furthermore, we observe that the mass of the upper bounds approaches two as the truncation grows (i.e., u r (S r ) → 2 as r → ∞), hinting that there exist two ergodic distributions corresponding to two closed communicating classes (cf. (110)).
To identify the communicating classes, we note that the optimal points π * ,r 0 and π * ,r 1 of the linear programs

sup{π r (0) : π r ∈ P r }, sup{π r (1) : π r ∈ P r } (118)

approach distributions with support on the even and odd numbers, respectively (Fig. 8(b)). These observations indicate that the communicating classes are the even numbers C 0 and the odd numbers C 1 (Corollary 29).
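This identification can be confirmed by a direct reachability computation on a finite window of the state space, assuming jumps of ±2 molecules consistent with the parity conservation described above:

```python
# Reachability check for the dimerisation network on a finite window of N,
# assuming the jumps x -> x+2 (production) and x -> x-2 if x >= 2
# (dimerisation, with propensity k2*x*(x-1)). States reachable from 0 and
# from 1 never meet, confirming the closed classes C0 (even) and C1 (odd).
def reachable(start, max_state=100):
    seen, stack = set(), [start]
    while stack:
        x = stack.pop()
        if x in seen or x > max_state:
            continue
        seen.add(x)
        stack.append(x + 2)           # production: adds two molecules
        if x >= 2:
            stack.append(x - 2)       # consumption: removes two molecules
    return seen

C0, C1 = reachable(0), reachable(1)
assert C0 == set(range(0, 101, 2))    # even states
assert C1 == set(range(1, 101, 2))    # odd states
assert not (C0 & C1)                  # disjoint closed communicating classes
```

In general, such an exhaustive search is only practical on a truncation, which is precisely why the LP optimal points above are useful: they reveal the support of the classes without enumerating the reachability graph of the full state space.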
To verify this claim, we compute two further sets of upper and lower bounds, over C 0 and over C 1 separately. In this case, the lower bounds are non-zero, thus attesting to the uniqueness of each stationary distribution over its communicating class. Using r = 25 7 , the total variation error of the lower bounds (94) is approximately 8 × 10 −3 , while that of the upper bounds (95) is bounded above by 8 × 10 −3 . In fact, the actual errors of the upper bounds computed from the analytical expressions (117) are < 10 −4 , i.e., substantially smaller than the guaranteed bound.

VI. DISCUSSION
We have introduced two mathematical programming approaches that yield bounds on: (i) the stationary moments, and (ii) the stationary distributions of biochemical reaction networks. These statistical quantities typically satisfy infinite sets of coupled equations: (i) the stationary moment equations and (ii) the stationary CME. Both our approaches consider a subset of these equations and employ: (i) semidefinite programming and (ii) linear programming to bound the set of solutions. The bounds we obtain provide converging estimates of moments and probabilities with quantifiable errors.
Regarding our first method, which provides bounds on stationary moments: SDP-based procedures have recently been proposed, independently of our work, by several authors 33,35-37 . Our work differs from these in two ways. Firstly, our results apply to networks with both polynomial and rational propensities, a wider class of networks of interest in biochemistry beyond the mass action models considered in Refs. 35-37. Secondly, we give mathematically precise conditions for the validity of the method (Assumption 1) and we explain how these conditions can be verified in practice. To the best of our knowledge, our second approach, the LP bounding and approximation procedure for probability distributions, has not appeared in the CME literature (see Ref. 38 for a discussion of related methodologies in the optimisation literature). Importantly, both methods are tightly interlinked: our second method uses the SDP moment bounds to formulate the LPs that yield controlled approximations of stationary solutions and marginals.
For some CMEs, the SDP approach might need to include a large number of moments to obtain accurate estimates of low-order moments (see Figs. 2 and 7). Such large SDPs pose a computational challenge for larger networks: the number of moments # d is the binomial coefficient (n+d choose d), where n denotes the number of species in the network and d is the maximum moment order, and thus explodes combinatorially with the number of species. Similar costs are encountered when using moment-closure methods 21,22 . In contrast with moment-closure methods 19-22 , however, the proposed SDP method to bound moments yields approximations with quantified errors. Furthermore, we show that repeated applications of our SDP method yield upper (resp. lower) bounds that are monotonically decreasing (resp. increasing) as the number of moment equations and inequalities is increased (Theorem 5). Although, as mentioned in Sec. III B, there are reaction networks for which the bounds do not converge to the exact moments, they often converge in practice (Fig. 2 and other examples in Refs. 35-37,44,60). In addition, when tight SDP bounds prove computationally too expensive, our LP approach can be used to tighten bounds on moments of interest by employing a loose SDP bound on a higher moment (Fig. 7). Lastly, it is possible to obtain sharper SDP bounds for restricted state spaces 42-44,61 , but these refinements are beyond the scope of this paper.
As stated above, the LP approach produces convergent bounds on the stationary solutions (including their marginals or averages). To do so, it uses a moment bound obtained with the SDP method. It is worth remarking that, while we have limited ourselves to rational networks where the moment bound can be obtained using SDPs, the LP approach can be extended beyond rational propensities by using Foster-Lyapunov criteria 41 . If the CME has a unique solution, our LP method yields converging lower and upper bounds on this solution and easy-to-evaluate error bounds (Corollaries 20, 24 and 25). If the CME has multiple solutions, our method provides bounds over the set of possible solutions. Furthermore, the procedure can be adapted to infer the closed communicating classes and to compute converging approximations of (and bounds on) the corresponding ergodic distributions. Additionally, if we are unsure whether the stationary solution is unique, our method provides a uniqueness test (Corollary 28) that settles the question.
Although LP solvers are highly mature and scalable, our LP approach can present computational challenges. Firstly, as discussed in Sec. IV B 3 (Step 5), the LPs can become ill-conditioned if the truncation is large, although this issue is mitigated by scaling the variables and by ongoing improvements in LP solvers. Secondly, although the computational cost of solving an LP depends on the algorithm, the cost per bound is at least O(|S_r|), where |S_r| is the size of the truncation. For the purpose of computing the entire distribution, we need O(|S_r|) such bounds; hence the cost is at least O(|S_r|^2). If computing a marginal distribution in which k of the species remain, we need O(|S_r|^{k/n}) bounds with a cost of at least O(|S_r|^{1+k/n}). If only a stationary average is of interest, the cost is at least O(|S_r|), since only two bounds per average need to be computed. Note also that the truncation size typically grows combinatorially with the number of unbounded species, e.g., the number of states in a simplex truncation {x ∈ N^n : x_1 + · · · + x_n ≤ M} is the binomial coefficient (n+M choose n), where M is an upper cut-off for the species count. Hence the cost of the LPs suffers a combinatorial explosion with the number of species, as for all truncation-based methods [28][29][30][31][32]62 .
Several other truncation-based schemes have been proposed to approximate the stationary solutions of the CME [28][29][30]32 . In contrast with ours, those schemes typically assume that the CME has a unique stationary distribution, which has to be verified separately 63,64 . Perhaps most extensively studied is the truncation-and-augmentation (TA) scheme, originally proposed by Seneta 65 for discrete-time chains. Its continuous-time counterpart 28,29,62 converges in total variation for exponentially ergodic chains, monotone chains, and certain generalisations 28 . However, bounds on the TA approximation error can be conservative and often involve constants that are difficult to compute in practice 28,29,66-72 . Spieler et al. 30,31 overcame this issue by iterating the TA scheme and applying a tail bound derived from a Foster-Lyapunov criterion to bound the stationary distribution. Spieler's truncation-based scheme is thus the closest to ours. However, their scheme entails solving only systems of linear equations which, although cheaper to compute and simpler to implement than LPs, offer no guarantee of convergence and are only applicable in the unique case.
Another distinct feature of our method is that it enables the direct computation of bounds on the marginal distributions, without the need to compute bounds for each state of the joint distribution. Marginal distributions are of particular interest for the analysis of high-dimensional networks and for the inference of model parameters from single-cell data. Since our approach yields upper and lower bounds on the marginals, it can be used to bound the likelihood or likelihood ratios of experimental observations. This would be useful for extending the work of Ref. 73, avoiding the redistribution of error via water-filling methods and aiding the selection of truncation sizes sufficient for parameter identifiability. Similar bounds on acceptance ratios could be used in Metropolis-Hastings algorithms to extend the applicability of our method to Bayesian inference. We therefore expect that our approach will be valuable not only for estimating distributions, but also for estimating model parameters from noisy single-cell data, where accurate approximations with quantified errors are needed.

VII. ACKNOWLEDGEMENTS
We thank Justine Dattani and Diego Oyarzún for stimulating discussions, and Michela Ottobre and Jure Vogrinc for insights on the stability and long-term behaviour of continuous-time chains. JK gratefully acknowledges support through a BBSRC PhD studentship (BB/F017510/1); PT through a Fellowship of the Royal Commission for the Exhibition of 1851; GBS through the EPSRC Fellowship EP/M002187/1; and MB through the EPSRC grant EP/N014529/1 funding the EPSRC Centre for Mathematics of Precision Healthcare.

Appendix A: Continuous-time chains and their stationary distributions

In practice, the chain X = (X(t))_{t≥0} in Sec. II is often constructed by running the Gillespie algorithm 74-76 : one starts the chain from a state x sampled from an initial distribution λ := (λ(x))_{x∈S}. If q(x) := −q(x, x) in (4)-(5) is zero, the chain remains at x for all time. Otherwise, wait an exponentially distributed amount of time with mean 1/q(x), sample y ≠ x from the probability distribution (q(x, y)/q(x))_{y≠x}, and update the chain's state to y (we say that the chain jumps from x to y, and we call the time at which it does so the jump time). Repeat these steps starting from y instead of x. All random variables sampled must be independent of each other.
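The construction just described can be sketched in a few lines of Python; the birth-death propensities and parameters below are illustrative choices of ours, not taken from the text:

```python
import random

def gillespie_birth_death(birth, death, x0, t_end, rng):
    """Simulate a birth-death chain: x -> x+1 at rate birth(x),
    x -> x-1 at rate death(x). Returns the jump times and states."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        q = birth(x) + death(x)          # total exit rate q(x)
        if q == 0:                        # absorbing state: stay forever
            break
        t += rng.expovariate(q)           # holding time with mean 1/q(x)
        if t >= t_end:
            break
        # jump to y != x with probability q(x, y)/q(x)
        x = x + 1 if rng.random() < birth(x) / q else x - 1
        times.append(t)
        states.append(x)
    return times, states

rng = random.Random(0)
times, states = gillespie_birth_death(lambda x: 5.0, lambda x: 1.0 * x,
                                      x0=0, t_end=50.0, rng=rng)
```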
If T_n denotes the n-th jump time, then the limit T_∞ := lim_{n→∞} T_n is known as the explosion time of the chain, i.e., the first instant by which the chain has left every finite subset of the state space 77 (Sec. 2.3). If no such explosion occurs, then T_∞ = ∞, and we say that the chain is non-explosive:

P_λ({T_∞ = ∞}) = 1, (A1)

where P_λ denotes the probability measure underlying the chain (the subscript λ emphasises the fact that the starting state was sampled from the distribution λ). If (A1) holds for every probability distribution λ (λ(x) ≥ 0 ∀x ∈ S, λ(S) = 1), then the rate matrix Q is said to be regular.
We collectively refer to the probabilities (p_t(x))_{x∈S,t≥0} = (P_λ({X_t = x, t < T_∞}))_{x∈S,t≥0} of observing the process in the state x := (x_1, . . . , x_n) ∈ S at time t ≥ 0 as the time-varying law of the chain. The time-varying law is the minimal non-negative solution of the CME (6) (see Ref. 34 (Cor. 2.21)). A probability distribution π := {π(x)}_{x∈S} on S is said to be a stationary (or steady-state, or invariant) distribution of the chain if sampling the chain's starting position from π ensures that it will be distributed according to π for all time:

P_π({X_t = x, t < T_∞}) = π(x) ∀x ∈ S, t ≥ 0. (A2)

Summing both sides of (A2) over x ∈ S and taking the limit t → ∞, we find that the chain is non-explosive when its starting location is sampled from a stationary distribution: P_π({T_∞ = ∞}) = 1. (A3) Taking the time derivative of (A2), we find that stationary distributions are stationary solutions of the CME (6) (that is, they belong to the set (9)). The reverse direction is slightly more complicated: Theorem 30 (Theorem 2.41 of Ref. 34). Let X be a continuous-time chain with rate matrix Q satisfying (4)-(5). A probability distribution π on S is a stationary distribution of X if and only if it is a stationary solution of the CME and the chain is non-explosive when initialised with law π (i.e., (A3) holds).
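To make the relationship between stationary solutions of the CME and stationary distributions concrete, the following sketch (ours; it assumes numpy) solves πQ = 0 numerically on a truncation for a birth-death chain with constant birth rate and unit per-capita death rate, whose stationary distribution is known to be Poisson; the truncation size and rates are illustrative:

```python
import numpy as np
from math import exp, factorial

lam, N = 4.0, 60                      # birth rate and truncation cut-off
Q = np.zeros((N + 1, N + 1))          # rate matrix on {0, ..., N}
for x in range(N + 1):
    if x < N:
        Q[x, x + 1] = lam             # birth: x -> x + 1
    if x > 0:
        Q[x, x - 1] = float(x)        # death: x -> x - 1 at rate x
    Q[x, x] = -Q[x].sum()             # q(x) = -q(x, x)

# Stationary solution: pi Q = 0 with pi summing to one.  Append the
# normalisation constraint and solve the resulting least-squares system.
A = np.vstack([Q.T, np.ones(N + 1)])
b = np.zeros(N + 2); b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Up to (negligible) truncation error, the answer is Poisson(lam).
poisson = np.array([exp(-lam) * lam**x / factorial(x) for x in range(N + 1)])
```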
In particular, assuming that Q is regular, a probability distribution π is a stationary solution of the CME if and only if it is a stationary distribution of the chain. In other words, (9) is an analytical (as in non-probabilistic) linear programming characterisation 34,78 of the set of stationary distributions for regular Q. The non-explosivity requirement in Theorem 30 is crucial: a counterexample is the birth-death process (50) with a_+(x) := 2^{2x} and a_−(x) := 2^{2x}/2. In this case, the sum in (55) is finite, showing that the CME has a unique stationary solution π given by (54)-(55). However, Ref. 79 (Theorem 11) shows that the process is explosive for any initial distribution (including π), and it follows from (A3) that no stationary distribution exists.
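The normalisability claim in this counterexample is easy to verify directly: the product form of the birth-death stationary solution (as in (54)-(55)) gives successive ratios π(x+1)/π(x) = a_+(x)/a_−(x+1), which here equal 1/2, so the unnormalised weights decay geometrically and are summable. A minimal check:

```python
def a_plus(x):   # birth rate a+(x) = 2^(2x)
    return 2 ** (2 * x)

def a_minus(x):  # death rate a-(x) = 2^(2x)/2
    return 2 ** (2 * x) / 2

# Successive ratios pi(x+1)/pi(x) = a+(x)/a-(x+1) of the stationary solution.
ratios = [a_plus(x) / a_minus(x + 1) for x in range(20)]

# The unnormalised weights decay geometrically, so the sum in (55) is finite
# and the CME has a normalisable stationary solution -- even though the
# chain itself is explosive and so has no stationary distribution.
weights = [1.0]
for r in ratios:
    weights.append(weights[-1] * r)
```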
Stationary distributions are of interest because, if the chain is stable, then, regardless of the initial distribution λ, they determine 15 the chain's long-term behaviour: the time-varying law of the chain converges in total variation to a stationary distribution π,

lim_{t→∞} Σ_{x∈S} |p_t(x) − π(x)| = 0, (A4)

and, for P_λ-almost every path of the chain, the empirical distribution converges pointwise to a stationary distribution π,

lim_{T→∞} (1/T) ∫_0^T 1_x(X_t) dt = π(x) ∀x ∈ S. (A5)

In general, the stationary distributions featuring in (A4) and (A5) depend on the initial distribution λ and on the starting location X_0, respectively. If the chain starts in one of the closed communicating classes (defined in Sec. IV B 2), then it can never escape the class. The convergence of the empirical distribution in (A5) then implies that, in the stable case, there must exist at least one stationary distribution π_j per closed communicating class C_j and that this stationary distribution must have support contained in C_j (π_j(C_j) = 1). This distribution π_j is unique and, if the initial distribution λ has its mass contained in C_j, then 15 both the time-varying law and the empirical distribution converge to π_j (in the sense that the π featuring in both (A4) and (A5) is π_j). For this reason, π_j is known as an ergodic distribution of the chain. The definition of the set T := S \ ∪_j C_j featuring in the decomposition (108) implies that the chain visits any given state x in T at most finitely many times (in particular, 1_x(X_t) → 0 as t → ∞, P_λ-almost surely). It follows that the chain's paths must eventually either leave T or diverge to infinity. In the case of a stable chain, tending to infinity is not an option, and so the chain eventually enters one of the closed communicating classes. It then follows from (A2) that no stationary distribution π with π(x) > 0 exists for any state x in T.
Bringing this discussion together 15 , we have that the stationary distribution π in (A5) is the ergodic distribution π_j of the closed communicating class C_j that the chain's path eventually enters, while that in (A4) is a weighted combination of the ergodic distributions in which the weight given to π_j is the probability that the chain ever enters C_j. Theorem 27 in Sec. IV B 2 follows from these facts.

Appendix B: A Foster-Lyapunov criterion
In practice, verifying whether a chain is stable is done by applying a Foster-Lyapunov criterion. For our examples, we use the following well-known criterion:

Theorem 31 (Foster-Lyapunov criterion 15,80,81 ). If there exist constants K_1 ∈ R, K_2 > 0, and a norm-like function w (Definition 12) that satisfy

Qw(x) := Σ_{y∈S} q(x, y) w(y) ≤ K_1 − K_2 w(x) ∀x ∈ S,

then the following hold:

(i) The rate matrix Q is regular.
(ii) There exists at least one stationary distribution.
(iii) For each closed communicating class, there exists an ergodic distribution with support in that class.
(iv) Every stationary distribution π satisfies ⟨w⟩_π := Σ_{x∈S} w(x)π(x) ≤ K_1/K_2 < ∞; in particular, all moments bounded above by a multiple of w are finite.
(v) The stationary distributions determine the long-term behaviour of the chain: a) for every deterministic starting distribution, there exists a stationary distribution π satisfying (A4); b) for any starting distribution λ and for P_λ-almost every path, there exists a stationary distribution π satisfying (A5).
Proof. Part (i) is proven in Ref. 80.

a. Schlögl's model: In the case of Schlögl's model (10), fixing w(x) := x^{d−2} for an integer d > 2, we have

Qw(x) = −c x^d + g_{d−1}(x)

for some constant c > 0, where g_{d−1} is a polynomial of degree d − 1. Thus, for any K_2 > 0, the supremum

K_1 := sup_{x∈N} [Qw(x) + K_2 w(x)]

is finite. Taking d ≥ 3, Theorem 31 tells us that (10) has a regular rate matrix and at least one stationary distribution; that the moments ⟨x⟩_π, . . . , ⟨x^{d−2}⟩_π are finite for any stationary distribution π; and that the limits (A4)-(A5) hold. Since we can choose ever larger d, all moments are finite. Uniqueness of the distribution follows from (54)-(55).
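The drift condition of Theorem 31 can also be sanity-checked numerically. The sketch below uses Schlögl-type propensities (2X → 3X, 3X → 2X, ∅ → X, X → ∅) with illustrative rate constants of our choosing, not the parameters of Sec. V, and checks that Qw(x) + K_2 w(x) is bounded above over a large range of states for w(x) = x^{d−2}:

```python
def drift(x, w, props):
    """Qw(x) = sum over reactions of a_j(x) * (w(x + nu_j) - w(x))."""
    return sum(a(x) * (w(x + nu) - w(x)) for a, nu in props)

# Schlogl-type propensities with illustrative rate constants.
c1, c2, c3, c4 = 3.0, 0.5, 0.5, 1.0
props = [
    (lambda x: c1 * x * (x - 1), +1),            # 2X -> 3X
    (lambda x: c2 * x * (x - 1) * (x - 2), -1),  # 3X -> 2X
    (lambda x: c3, +1),                          # 0  -> X
    (lambda x: c4 * x, -1),                      # X  -> 0
]

d = 5
w = lambda x: x ** (d - 2)   # norm-like Lyapunov function w(x) = x^(d-2)
K2 = 1.0
vals = [drift(x, w, props) + K2 * w(x) for x in range(2000)]
K1 = max(vals)               # finite: the negative degree-d term dominates
```

The maximum is attained at small x (where the autocatalytic birth term still dominates); far out in the state space, Qw(x) + K_2 w(x) is strongly negative, which is what makes K_1 finite.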
b. Toggle switch: In the case of the toggle switch chain of Sec. V A, setting w(x) := (x_1 + x_2)^d, we have

Qw(x) = (a_1(x) + a_3(x))((x_1 + x_2 + 1)^d − (x_1 + x_2)^d) + (a_2(x) + a_4(x))((x_1 + x_2 − 1)^d − (x_1 + x_2)^d) ≤ −K_2 w(x) + g_{d−1}(x_1 + x_2)

for some K_2 > 0, where g_{d−1} is a polynomial of degree d − 1. For this reason, proceeding as we did above for Schlögl's model, we have that Q is regular, that the chain has a stationary distribution, and that all moments of each of the stationary distributions are finite. The non-trivial lower bounds in Fig. 4 and Corollary 28 show that it is unique.

c. Bursty gene model: For the bursty gene expression model of Sec. V B, let w(x) := x_2^d, let (p(k))_{k∈N} denote the geometric burst-size distribution, p(k) = (1 − p(0))^k p(0) with p(0) = 1/(1 + b), and let b_l denote its l-th moment. We then have

Qw(x) ≤ −K_2 w(x) + g_{d−1}(x_2)

for some K_2 > 0, where g_{d−1} is a polynomial of degree d − 1 whose coefficients involve the burst moments b_1, . . . , b_d (note that all moments of a geometric random variable are finite). Because the state space is {0, 1} × N, w is norm-like. Thus, proceeding as we did above for Schlögl's model, we have that Q is regular, that there exists at least one stationary distribution, and that each stationary distribution has all moments finite. For each set of parameter values, we solved LP (90) with x = (0, 0) and obtained a non-trivial lower bound on π((0, 0)). Corollary 28 then showed that the stationary distribution is unique.
d. Dimerisation network with multiple stationary distributions: For the network (116), choosing w(x) := x^{d−1} with integer d > 1, we have

Qw(x) ≤ −K_2 w(x) + g_{d−1}(x)

for some K_2 > 0, where g_{d−1} is a polynomial of degree d − 1. Proceeding as for Schlögl's model above, we have that Q is regular, that there exists at least one stationary distribution, and that each stationary distribution has all moments finite.