Finite-Dimensional Characterization of Optimal Control Laws Over an Infinite Horizon for Nonlinear Systems

Infinite-horizon optimal control problems for nonlinear systems are considered. Due to the nonlinear and intrinsically infinite-dimensional nature of the task, solving such optimal control problems is challenging. In this article, an exact finite-dimensional characterization of the optimal solution over the entire horizon is proposed. This is obtained via the (static) minimization of a suitably defined function of (projected) trajectories of the underlying Hamiltonian dynamics on a hypersphere of fixed radius. The result is achieved in the spirit of the so-called shooting methods by introducing, via simultaneous forward/backward propagation, an intermediate shooting point much closer to the origin, regardless of the actual initial state. A modified strategy allows one to determine an arbitrarily accurate approximate solution by means of standard gradient-descent algorithms over compact domains. Finally, to further increase the robustness of the control law, a receding-horizon architecture is envisioned by designing a sequence of shrinking hyperspheres. These aspects are illustrated by means of a benchmark numerical simulation.


Mario Sassano, Senior Member, IEEE, and Thulasi Mylvaganam, Senior Member, IEEE

Index Terms-Hamiltonian systems, nonlinear systems, optimal control, stability of nonlinear (NL) systems.

I. INTRODUCTION
INFINITE-HORIZON optimal control problems for nonlinear systems are considered herein. The class of problems involves determining feedback stabilizing control policies with the property that a certain cost functional is minimized along the trajectories of the resulting closed-loop system (see, for instance, [1]). Such problems are, in general, challenging nonconvex, infinite-dimensional optimization problems. The optimal policy can be determined by relying either on Pontryagin's minimum principle (PMP) [2] or on the dynamic programming (DP) method [3], [4]. A combination of the two methods may also be viable [5], [6], [7], [8]. In practice, both approaches possess specific drawbacks. The former implies that the optimal trajectory satisfies a certain ordinary differential equation, defined on an extended state space with the initial condition partially unknown. The latter, on the other hand, relies on the solution of a partial differential equation (PDE), namely the Hamilton-Jacobi-Bellman (HJB) equation, for which closed-form solutions are rarely feasible to obtain and for which numerical solutions also pose a significant challenge due to the computational complexity associated with solving nonlinear PDEs numerically. Various approaches have been explored to overcome the computational hurdles associated with solving both finite- and infinite-horizon optimal control problems, encompassing numerical methods [9], approaches based on the notion of viscosity solution [10], [11], or regional methods [12], to mention just a few. For the special case in which the dynamics are linear and the running cost is quadratic in the input and state, namely the well-known linear quadratic regulator (LQR), solutions are readily found via the algebraic Riccati equation (ARE). Consequently, methods based on state-dependent Riccati equations (SDREs) have emerged for nonlinear problems, where essentially an ARE is solved pointwise along the trajectory of the system. While the approach is relatively simple from a computational point of view, the resulting control strategies are, in general, suboptimal and characterized by a quality of approximation that is not easily quantifiable a priori (see, e.g., [13] and references therein). In [14] and [15], the solution of a single algebraic matrix equation is used to construct a dynamic control law, which satisfies, by construction, a partial differential inequality in some nonempty neighborhood including an equilibrium of the system. Consequently, the approach solves (exactly) a modified optimal control problem and yields an approximate (local) solution of the original optimal control problem, where the level of approximation can be explicitly quantified. Alternative approaches to obtain approximate solutions analytically can be found, for instance, in [16]. Therein, the HJB PDE is considered and, recognizing the relevance of a certain associated Hamiltonian system (which possesses a stable invariant manifold characterizing the solution of the HJB PDE), two approaches for obtaining approximate solutions are proposed. One approach relies on viewing the control input as a Hamiltonian perturbation and uses geometric tools to approximate the stable manifold associated with the aforementioned Hamiltonian system. Methods to compute invariant manifolds of dynamical systems are also relevant in this regard (see, e.g., [17], [18]). The second approach relies on expressing the dynamics of a system in terms of a linear and a nonlinear component and presents a sequence that provides an approximation of the flow of the Hamiltonian system on the stable manifold. A different approach, still utilizing the relationship between optimal control problems and their associated Hamiltonian systems, is taken in [19]. Therein, the LQR problem with input constraints is solved via a numerical procedure, which exploits the property that the optimal solution is associated with a specific trajectory of the associated Hamiltonian system. Approximate solutions to the constrained optimal control problems are then found via the generation of a "lookup table" created based on backward integration of the Hamiltonian dynamics from initial conditions in a neighborhood of the origin. Similarly to the approaches presented in [16] and [19], we utilize the Hamiltonian system associated with an optimal control problem to characterize exact and approximate solutions of the problem. Differently from the results in [16] and [19], however, the approach proposed herein ultimately requires only the solution of an (unconstrained) static optimization problem. It does not rely on approximating the stable invariant manifold of the Hamiltonian system and does not require any sequential process or lookup table. The Hamiltonian dynamics are in fact employed to design a certain finite-dimensional cost function that attains its minimal value at the intersection of the optimal trajectory with a hypersphere of fixed radius.
To be more specific, in this article, we consider a class of infinite-horizon optimal control problems characterized by the property that the underlying value function satisfies a certain differentiability condition. This condition is implied (locally) by equivalent properties of the data of the problem and, hence, is easily verifiable a priori (see [20]). Such differentiability conditions are instrumental in showing that a solution to the HJB PDE also satisfies the so-called Hamilton condition (sensitivity relations, see, e.g., [21]) associated with PMP. Namely, the optimal process is associated with a certain trajectory of the Hamiltonian system defined on the extended state/costate space, obtained via a Hamiltonian lifting (see, e.g., [16]). We utilize the above property to provide a finite-dimensional characterization of the solution of the problem via the formulation of a static minimization problem, defined in terms of trajectories of the underlying Hamiltonian system. More precisely, the solution of the optimal control problem is determined by first computing the intersection of the corresponding trajectory of the Hamiltonian system with a sufficiently small hypersphere around the origin. Consequently, the optimal process is determined by forward and backward propagating the flow of the Hamiltonian dynamics from this intermediate point, which is a minimizer of the aforementioned static cost function, involving the initial condition of the underlying plant. It is worth observing that, differently from existing methods that aim at providing a finite-dimensional characterization of the optimal policy, herein the results do not rely on any quantization arguments with respect to time or space. The proposed conditions instead provide an exact finite-dimensional description of the entire optimal trajectory, namely over the infinite horizon. In practice, the static minimization problem cannot, in general, be solved by using standard gradient-descent algorithms unless explicit expressions for the flow of the Hamiltonian dynamics and its sensitivity are available. Such expressions are rarely available except for specially structured classes of systems, such as linear systems. This limitation is, therefore, circumvented through the introduction of a hybrid architecture that comprises state variables whose evolution captures the propagation of the flow of the Hamiltonian dynamics and of its sensitivity to the initial condition. Finally, a receding-horizon implementation of the architecture is suggested by suitably generating a sequence of hyperspheres of decreasing radii. Building on the property that such a strategy provides not only a suboptimal policy over a restricted time interval but ideally the entire (infinite-horizon) optimal policy, the novel algorithm allows for particularly large moving windows without hindering the accuracy of the overall scheme. This is an interesting difference with respect to existing receding-horizon strategies that are based on the computation of a (relatively small) portion of the optimal strategy at each sampling instant.
The remainder of the article is organized as follows. The infinite-horizon optimal control problem and some preliminaries are recalled in Section II. A finite-dimensional characterization of optimal control laws is provided in Section III. Two results are provided therein: the first is the characterization of the stable (which yields the optimal control law) and the unstable invariant submanifolds of the underlying Hamiltonian system via a static minimization problem. Through this problem, the set containing the intersections between the invariant manifolds and a hypersphere of fixed radius is determined. Having determined these intersection points, the optimal control input can be readily obtained by backward/forward integration of the flow of the Hamiltonian system. A companion result yields a characterization of trajectories "close" to the stable and unstable submanifolds via a modified (saturated) static minimization problem. The latter lends itself efficiently to an algorithmic interpretation, which is provided in Section IV. Note that while the stable and unstable manifolds of the Hamiltonian system are crucial in the characterization of the optimal control laws, the proposed approach does not require determining the submanifolds themselves. Namely, the relationship between the Hamiltonian system, its stable/unstable invariant manifolds, and the optimal control law is exploited to determine the specific trajectory of the Hamiltonian system associated with the optimal control input (for a given initial condition) of an infinite-horizon optimal control problem. Finally, the results presented in this article are demonstrated by means of a benchmark example, before some concluding remarks are provided in Sections V and VI, respectively.

II. PROBLEM FORMULATION AND PRELIMINARIES
Consider a dynamical system with state x : R → R^n and control input u : R → R^m. In this article, we consider the infinite-horizon optimal control problem recalled below.
Problem 1: Determine, if it exists, a continuous function u : R_{≥0} → R^m minimizing the cost functional

Q_{x_0} := ∫_0^∞ ( q(x(t)) + (1/2) u(t)^T u(t) ) dt     (1a)

subject to

ẋ = f(x) + g(x)u,  x(0) = x_0     (1b)

for any x_0 ∈ X, where X is a nonempty neighborhood of the origin and where the positive definite function q : R^n → R_{≥0}, the vector field f : R^n → R^n, f(0) = 0, and the vector fields g_i : R^n → R^n, i = 1, ..., m, columns of g, are assumed to be sufficiently smooth functions in X.¹
The first term of (1a) quantifies a running cost on the state variable, while the second term represents a penalty on the control effort. The overall integral in (1a) is the cost functional to be minimized via the selection of the control input u, whereas the constraints (1b) represent the dynamics dictating the behavior of the system and the initial condition of the state of the system. Problem 1 is associated with a so-called value function V : X → R_{≥0}, defined as V(x_0) := min_u {Q_{x_0}}, for any initial condition x_0 ∈ X (see, for instance, [21]).
Remark 1: It has been shown in [20] that V locally inherits certain properties from the problem data. In particular, provided the requirements of Problem 1 hold, there exists a neighborhood X of the origin such that V ∈ C²(X).
Let A := ∇f|_{x=0} and B := g(0) describe the linearized dynamics, δẋ = Aδx + Bδu, of (1b) around the origin and let (1/2) x^T Q x, with Q := ∇²q(x)|_{x=0}, represent the corresponding quadratic approximation of the running cost q. The following assumption is instrumental for guaranteeing the existence of a solution to Problem 1, at least locally around the origin, as well as ensuring (local) asymptotic stability of the origin for the closed-loop optimal system.

Assumption 1: The pairs (A, B) and (A, Q) are stabilizable and observable, respectively.
Note that, as a consequence of Assumption 1, the linearized dynamics together with the quadratic approximation of the running cost admit the optimal solution u = −B^T P x, where P ∈ R^{n×n} denotes the (unique) positive definite solution of the ARE

0 = A^T P + P A + Q − P B B^T P.     (2)

It can be demonstrated by means of the DP approach (see, e.g., [23]) that the solution to Problem 1 is given by the feedback control law

u*(x) = −g(x)^T ∇V(x)^T     (3)

for any x ∈ X, with X a neighborhood of the origin. In particular, (3) is derived from the HJB PDE

0 = ∇V(x) f(x) + q(x) − (1/2) ∇V(x) g(x) g(x)^T ∇V(x)^T     (4)

for any x ∈ X, which may equivalently be written in terms of the minimized Hamiltonian function introduced below as

0 = H(x, ∇V(x)^T).     (5)

Moreover, such a function V coincides with the value function V, which possesses the properties mentioned in Remark 1. The DP approach provides sufficient conditions for optimality: a function V satisfying the HJB PDE (4) yields, via (3), a solution to the optimal control problem Q_{x_0} in (1) in terms of a feedback control.

¹The smoothness assumption on the data of the problem is a condition verifiable a priori, see, e.g., [20], [22].
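Numerically, the ARE (2) and the resulting locally optimal feedback for the linearized problem can be obtained with standard tools. A minimal sketch follows; the double-integrator matrices below are illustrative and not taken from the article.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative linearized data (not from the article): a double integrator.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)   # Hessian of the running cost at the origin
R = np.eye(1)   # unit input weight, matching the ARE 0 = A'P + PA + Q - PBB'P

# Unique positive definite solution of the ARE.
P = solve_continuous_are(A, B, Q, R)

# Locally optimal linearized feedback gain: u = -B'Px = -Kx.
K = B.T @ P

# The ARE residual should vanish (up to numerical precision).
residual = A.T @ P + P @ A + Q - P @ B @ B.T @ P
```

As expected from Assumption 1, the closed-loop matrix A − BK is Hurwitz, so the linearized feedback stabilizes the origin.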
Since DP provides the optimal solution in terms of a feedback policy, the DP approach yields (locally) information regarding the optimal solution and the minimal cost for any initial condition, hence slightly generalizing the requirements of Problem 1. However, to take advantage of these insights, a solution of the HJB PDE (4) or (5) is required. Typically, closed-form solutions of the HJB PDE are not available, and numerical approaches to solve the PDE are plagued by high computational demands. Thus, alternative design approaches based on PMP and the Hamilton conditions (see [6], [7], [21] for insightful discussions on the relation between PMP and DP) are often preferred in applications. The Hamilton conditions are trajectory-based and, therefore, computed only for specific initial states. Moreover, these requirements provide, in general, only necessary conditions of optimality. This alternative approach is summarized in the remainder of this section.
Let H : R^n × R^n → R denote the (minimized) Hamiltonian function associated with the optimal control problem (1), i.e.,

H(x, λ) := λ^T f(x) + q(x) − (1/2) λ^T g(x) g(x)^T λ     (6)

where λ denotes the costate. For a certain X ⊆ R^n, as implied by the data of the problem for x_0 ∈ X (see Remark 1), the optimal process x*, i.e., the trajectory of (1b) in a closed loop with the optimal control, evolves according to the dynamics (Hamilton conditions)

ẋ = ∇_λ H(x, λ)^T = f(x) − g(x) g(x)^T λ,  λ̇ = −∇_x H(x, λ)^T.     (7)

Let ϕ_H(t; x_0, λ_0) denote the flow of the Hamiltonian dynamics (7), which is assumed to be complete, at time t and from the initial condition (x_0, λ_0), and let π_x ∘ ϕ_H and π_λ ∘ ϕ_H denote the projections of the flow on the x components and on the λ components, respectively, of the state/costate space. Since the conditions in Assumption 1 imply that the system (1b) is locally exponentially stabilizable and detectable from the (virtual) output y = q(x), it follows, by [24, Sec. 6], that the Hamiltonian dynamics (7) possess a hyperbolic equilibrium point at (x, λ) = (0, 0) with n-dimensional global stable and unstable submanifolds, N_s and N_u, through the origin, which are invariant for the system (7). The interested reader is referred to the work in [25] for similar results in the case of nonlinear H∞ control problems, in which the arising Hamiltonian dynamics are structurally identical to those in (7). In addition, recalling that the stable and unstable submanifolds of a Hamiltonian system are Lagrangian (see, e.g., [25, Lemma 1]), one has that such submanifolds are locally described as the graphs of closed one-forms, namely N_s = graph(∇V_s) and N_u = graph(∇V_u), respectively, for some generating functions V_s : R^n → R and V_u : R^n → R that constitute smooth solutions to the HJB PDE (4). Finally, note that the stable (unstable, respectively) invariant submanifold is tangent at the origin to the n-dimensional subspace W_s (W_u, respectively) described by the eigenspace associated with the eigenvalues with negative (positive, respectively) real parts of the Hamiltonian matrix

[[A, −BB^T], [−Q, −A^T]].     (8)

As a consequence, the origin of the state space is a locally asymptotically stable equilibrium point of the (forward-time) closed-loop dynamics ẋ = f(x) − g(x)g(x)^T ∇V_s(x)^T, x(0) = x_0, and of the (backward-time) dynamics ẋ = −(f(x) − g(x)g(x)^T ∇V_u(x)^T), x(0) = x_0. Moreover, the optimal costate λ* : R → R^n in (7) satisfies λ*(t) = ∇V_s(x*(t))^T, for any t ≥ 0 (and, provided the problem satisfies the aforementioned conditions, under which the DP method provides necessary and sufficient conditions of optimality, V = V_s).
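The tangency of N_s at the origin to the stable eigenspace of the linearized Hamiltonian matrix can be checked numerically: the sketch below (again with illustrative matrices, not taken from the article) recovers the ARE solution P from that eigenspace, under the sign conventions stated above.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # illustrative linearization
B = np.array([[0.0], [1.0]])
Q = np.eye(2)

# Hamiltonian matrix of the linearized state/costate dynamics,
# consistent with xdot = Ax - BB'lam, lamdot = -Qx - A'lam.
Ham = np.block([[A, -B @ B.T],
                [-Q, -A.T]])

# Eigenvectors associated with eigenvalues having negative real part
# span the stable eigenspace W_s, to which N_s is tangent at the origin.
eigvals, eigvecs = np.linalg.eig(Ham)
stable = eigvecs[:, eigvals.real < 0]

n = A.shape[0]
X, Lam = stable[:n, :], stable[n:, :]
# W_s is the graph of lam = Px, hence P = Lam X^{-1}; the subspace is real,
# so the imaginary residue of complex-conjugate eigenvector pairs cancels.
P_from_manifold = np.real(Lam @ np.linalg.inv(X))

# Cross-check against the direct ARE solution.
P_are = solve_continuous_are(A, B, Q, np.eye(1))
```

This is exactly the classical eigenvector (Potter-type) construction of the ARE solution, here read as the local description of the stable Lagrangian submanifold.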

III. FINITE-DIMENSIONAL CHARACTERIZATION OF OPTIMAL CONTROL LAWS
To provide a concise statement of the following result, which yields a finite-dimensional, static characterization of the solution to Problem 1, consider a vector s = [s_1 s_2 . . . s_{2n−1}]^T ∈ R^{2n−1} and a scalar positive constant ε ∈ R_{>0}, and define the vector-valued function α : R^{2n−1} → R^{2n} according to

α(s) := ε [cos(s_1), sin(s_1) cos(s_2), . . . , sin(s_1) · · · sin(s_{2n−2}) cos(s_{2n−1}), sin(s_1) · · · sin(s_{2n−1})]^T.     (10)

The mapping α describes the Cartesian coordinates of points on the hypersphere of radius ε in R^{2n}, namely the set

S_ε := { z ∈ R^{2n} : ||z|| = ε }.

Casting the problem in terms of the angular coordinates s, rather than the original (x, λ) coordinates, is central to the construction of a static, unconstrained optimization problem characterizing the solution to Problem 1. Nonetheless, the specific selection of the parameterization in (10) may be replaced by alternative descriptions.
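For concreteness, one standard hyperspherical parameterization with the properties required of α is sketched below; the exact expression adopted in (10) may differ, since, as noted above, alternative descriptions are admissible.

```python
import numpy as np

def alpha(s, eps):
    """Map angles s in R^{d-1} to a point of norm eps on the sphere in R^d.

    Standard hyperspherical coordinates: component k is
    eps * sin(s_1) ... sin(s_{k-1}) * cos(s_k), and the last component is
    eps * sin(s_1) ... sin(s_{d-1}).
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    z = np.empty(s.size + 1)
    prod = 1.0
    for k in range(s.size):
        z[k] = prod * np.cos(s[k])
        prod *= np.sin(s[k])
    z[-1] = prod
    return eps * z
```

By construction, every image point satisfies ||alpha(s, eps)|| = eps, i.e., it lies on S_eps, for any choice of the angles.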
In the following, it is assumed that, whenever s consists of a set of (potentially infinitely many) values s = {s_i}_{i∈I⊆R}, the application of the function α(·) to the set s denotes in turn the set α(s) := {α(s_i)}_{i∈I}. Finally, let φ(x_0, λ_0) define the locus of points along a specific trajectory of the Hamiltonian dynamics (7), namely

φ(x_0, λ_0) := { z ∈ R^{2n} : z = ϕ_H(t; x_0, λ_0), t ∈ R }.

Recall that V_s : R^n → R and V_u : R^n → R locally describe the generating functions for the stable and unstable invariant submanifolds of (7), respectively. Consider the function

C_{x_0}(s, τ_1, τ_2) := || π_x ∘ ϕ_H(−τ_1; α(s)) − x_0 ||² + || ϕ_H(τ_2; α(s)) ||².     (11)

The first term of (11) quantifies the distance between the initial condition x_0 and the trajectory of the Hamiltonian dynamics (7) starting from the point α(s) and propagated in backward or forward time (for τ_1 positive or negative, respectively). The second term, instead, quantifies the distance between the origin and the trajectory of the Hamiltonian dynamics (7) starting from the point α(s) and propagated in forward or backward time (for τ_2 positive or negative, respectively). The following result shows that the Hamiltonian system (7) has the property that the minimum of C_{x_0} can be attained by two trajectories only, namely those associated with its stable and unstable submanifolds and passing through x_0. Moreover, such trajectories can be parameterized by the intersection point α(s) ∈ S_ε, together with the times τ_1 and τ_2. As a consequence, the solution to Problem 1 can be equivalently characterized by the static, unconstrained minimization of (11) with respect to (s, τ_1, τ_2). The above discussion suggests that the solution to Problem 1 can be constructed in two steps.
First, the minimizers of C_{x_0} (associated with the generating functions V_s and V_u) are computed (see (13)). This requires detecting the intersections, described by the point α(s) for some s, of two sets: 1) the loci of points of the trajectories of (7) that both converge to the origin and have a projection on the state space that contains x_0; and 2) the surface of S_ε. These intersections are identified via the static minimization of C_{x_0}.
Second, once the intersection α(s*) corresponding to the stable submanifold N_s has been obtained, the optimal control law is u*(t) = −g(x(t))^T λ(t), with (x(t), λ(t)) the trajectory of (7) corresponding to the initial condition (x(0), λ(0)) = ϕ_H(−τ*_1; α(s*)). By inspecting the latter step (i.e., a standard forward propagation of certain initial conditions), it is evident that the former (i.e., the characterization and the static minimization of C_{x_0}) constitutes the crucial aspect of the construction of the optimal control law. Hence, in what follows, attention is focused mainly on this first step. The above (intuitive) discussion is formalized in the following statements.
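The two-step construction above can be sketched for a scalar linear-quadratic instance, where the Hamiltonian flow can be integrated numerically. All data below (a = 0, b = 1, q(x) = x²/2, x_0 = 2, ε = 0.1) and the solver tolerances are illustrative, and the cost follows the verbal description of (11): a backward mismatch with x_0 plus a forward distance to the origin.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Scalar illustrative data: f(x) = a*x, g(x) = b, running cost q(x) = q0*x^2/2.
a, b, q0 = 0.0, 1.0, 1.0
x0, eps = 2.0, 0.1

def ham_rhs(t, z):
    # Hamilton conditions: xdot = a*x - b^2*lam, lamdot = -q0*x - a*lam.
    x, lam = z
    return [a * x - b * b * lam, -q0 * x - a * lam]

def flow(t, z0):
    # Flow phi_H(t; z0) of the Hamiltonian dynamics, integrated numerically.
    if t == 0.0:
        return np.asarray(z0, dtype=float)
    sol = solve_ivp(ham_rhs, (0.0, t), z0, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

def cost(s, tau1, tau2):
    # Static cost: backward-propagated state mismatch with x0, plus
    # forward-propagated distance of the whole state/costate pair to 0.
    z = eps * np.array([np.cos(s), np.sin(s)])        # alpha(s) on S_eps
    term1 = (flow(-tau1, z)[0] - x0) ** 2
    term2 = np.sum(flow(tau2, z) ** 2)
    return term1 + term2

# Step 1: here the stable manifold is lam = x (P = 1), so the optimal
# intersection with S_eps sits at s* = pi/4; tau1* brings it back to x0.
s_star = np.pi / 4
tau1_star = np.log(x0 * np.sqrt(2) / eps)
c_opt = cost(s_star, tau1_star, 8.0)

# A point of S_eps away from the stable/unstable trajectories scores worse.
c_bad = cost(np.pi / 2, tau1_star, 8.0)

# Step 2: propagate backward from alpha(s*) to recover (x(0), lam(0)),
# then read off u*(t) = -g(x)' lam(t) along the forward trajectory.
z_init = flow(-tau1_star, eps * np.array([np.cos(s_star), np.sin(s_star)]))
```

In a general nonlinear problem s*, τ*_1, and τ*_2 are of course not known in closed form; this is precisely the static minimization that the article proposes to carry out, and the sketch only illustrates that the cost vanishes (up to the truncation of τ_2) at the stable-manifold intersection.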
Theorem 1: Let x_0 ∈ X ⊆ R^n be given and consider the infinite-horizon optimal control problem Q_{x_0}. Suppose that V_s and V_u are C²(X). Fix a sufficiently small ε > 0 and suppose that Assumption 1 holds. Then

α(s*) = ( φ(x_0, ∇V_s(x_0)) ∪ φ(x_0, ∇V_u(x_0)) ) ∩ S_ε     (12)

where s* ∈ [0, 2π)^{2n−1} is such that there exists τ_1 ∈ R with the property that

lim_{τ_2 → τ̄_2} C_{x_0}(s*, τ_1, τ_2) = 0     (13)

with τ̄_2 equal to either −∞ or +∞.
Remark 2: Relying on the definition of the extended real line R̄, the intuition behind Theorem 1 can be captured by abusing the notation (i.e., letting a minimizer belong to R̄): the values s* are those that belong to the set of minimizers arg min_{(s, τ_1, τ_2) ∈ Ξ} C_{x_0}(s, τ_1, τ_2).

Proof: The claim is demonstrated by showing that the inclusions

( φ(x_0, ∇V_s(x_0)) ∪ φ(x_0, ∇V_u(x_0)) ) ∩ S_ε ⊆ α(s*)     (14)

and

α(s*) ⊆ ( φ(x_0, ∇V_s(x_0)) ∪ φ(x_0, ∇V_u(x_0)) ) ∩ S_ε     (15)

hold simultaneously. Consider first the inclusion (14), which is verified provided the cost function C_{x_0} attains its infimum value, relative to s in Ξ, at the intersection between the trajectory identified by φ(x_0, ∇V_s(x_0)) (similarly for V_u) and the ball of radius ε. The initial value problem defined by the dynamics (7) and a generic initial condition (x_0, λ_0) ∈ R^n × R^n admits a unique solution locally around the origin of the state/costate space, which is continuous with respect to time and such that the limit for t that tends to +∞ (−∞, respectively) is equal to zero for any initial condition in the stable N_s (unstable N_u, respectively) invariant submanifold of (7). Thus, it follows that the intersection of φ(x_0, ∇V_s(x_0)) (φ(x_0, ∇V_u(x_0)), respectively) with S_ε, for sufficiently small ε, contains a single point z_s ∈ R^{2n} (z_u ∈ R^{2n}, respectively) such that the infimum with respect to τ_2 in R̄ of the second term of C_{x_0} in (11), i.e., the limit (13), can be zero. Moreover, by definition of φ(·, ·), z_s is such that also the first term of C_{x_0} is zero for some τ_1 ∈ R, since z_s ∈ φ(x_0, ∇V_s(x_0)). Therefore, since C_{x_0} is clearly nonnegative and since z_s satisfies ||z_s|| = ε by definition of S_ε, it follows that z_s ∈ α(s*). An identical reasoning can be carried out for z_u, hence showing (14).
To prove the converse inclusion (15), note that the function ϕ_H(·; z_0) is continuous in R̄ and ϕ_H(±∞; z_0) = lim_{t→±∞} ϕ_H(t; z_0). Thus, it must be shown that there does not exist any point z ∈ R^{2n}, with norm equal to ε, other than z_s and z_u, at which C_{x_0}(z, τ_1, τ_2) = 0 for some τ_1 ∈ R and τ_2 ∈ R̄. To this end, suppose initially that there exists such a point z that does not belong to N_s ∪ N_u. Then, clearly inf_{τ_2 ∈ R̄} ||ϕ_H(τ_2; z)||² is strictly greater than zero, since the trajectory ensuing from z does not tend to the origin in either forward or backward time. If instead z ∈ N_s \ φ(x_0, ∇V_s(x_0)) (an identical discussion can be carried out for the case of the unstable invariant submanifold), then inf_{τ_1 ∈ R} ||π_x ∘ ϕ_H(−τ_1; z) − x_0||² is strictly greater than zero by definition of the set φ(x_0, ∇V_s(x_0)) and of the distance between such a set and the locus of points defined by ϕ_H(t; z) for any t ∈ R. The proof is concluded by recalling that φ(x_0, ∇V_s(x_0)) (and similarly for φ(x_0, ∇V_u(x_0))) contains a single point with norm equal to ε, namely z_s (z_u, respectively).
The proposed results seemingly share common ingredients with strategies that aim at a trajectory-based characterization of certain invariant manifolds for nonlinear systems, as well as with the class of so-called shooting methods, which possess a long history in the literature. Therefore, it is worth stressing the particular features of the method discussed herein, which significantly distinguish it from the two above-mentioned frameworks. First, differently from the former, the objective here is not to approximate the entire manifold but rather a single trajectory with certain properties. Differently from the latter, instead of considering only the forward propagation from a prescribed boundary condition, the desired trajectory is computed by determining its intersection with a hypersphere centered at the origin. In fact, the key idea consists of the simultaneous forward and backward propagation of trajectories from points lying on the surface of the hypersphere. As a consequence, one obtains a computationally amenable method that hinges upon the unconstrained minimization of the function (11). Furthermore, since the radius of the hypersphere can be selected arbitrarily small, differently from standard shooting methods over relatively long horizons, the knowledge of the linearized solution provides a valid initial guess for the nonlinear iterations (see the more detailed discussion in Remark 5). Finally, it is worth mentioning that, as a side effect of relying on a completely unconstrained minimization (namely, without even restricting the sign of the time variables τ_1 and τ_2 in (11)), the minimum of the equivalent cost function is obtained also at a trajectory that belongs to the unstable invariant manifold.
Remark 3: Theorem 1 establishes that the (static) minimization of the function C : Ξ → R_{≥0}, parameterized with respect to the initial state x_0 ∈ R^n, characterizes the intersection points between the trajectories associated with the solutions V_s and V_u of the HJB PDE, lying on the stable and unstable Lagrangian submanifolds of (7), respectively, and the hypersphere of radius ε in the state/costate space.
Remark 4: As noted in the proof of Theorem 1, for a given initial state x_0 there are (only) two initial conditions λ_0 such that the resulting trajectories of the Hamiltonian system (7) converge to the origin in either forward or backward time. The set α(s*) contains the two intersection points between these trajectories and the ball of radius ε. Equation (12) entails that the knowledge of the intersection point corresponding to the trajectory belonging to the stable submanifold, i.e., z_s, permits the computation of the optimal initial condition for the costate variable, λ_0 = ∇V_s(x_0)^T, by integrating the Hamiltonian dynamics (7) backward in time. The relation ϕ_H(−τ_1; α(s*)) ⊆ graph(∇V_s) ∪ graph(∇V_u) suggests that, by considering a certain number of suitably selected initial conditions x_{0,i} and by determining minimum points of C_{x_{0,i}}, it is possible to envision a strategy to reconstruct the stable and unstable invariant submanifolds, which would in turn yield the gradient of the value function V_s without explicitly solving the HJB PDE (4). The objective of this article is, however, different: rather than determining the invariant manifolds, the objective is to determine (or approximate) the trajectory of the Hamiltonian system corresponding to the optimal process, and thereby construct the optimal control law, for a specific initial condition of (1b).
Remark 5: The intuition of forward propagating the trajectories of the underlying Hamiltonian dynamics to determine the solution to an optimal control problem has been explored in the literature before, especially in the case of finite-horizon problems; see, e.g., [26] and references therein. Due to the intrinsic instability of the Hamiltonian dynamics, the so-called shooting methods necessarily require an accurate initial guess of the costate initial condition λ(0) or a (short) bounded time interval, even in the case of infinite-horizon problems. A novel feature of the method proposed herein is that the optimal state/costate trajectory is not approximated by considering only the forward propagation from certain initial conditions (as is common in the literature), but by characterizing the intersection of such a trajectory with a hypersphere, namely by forward and backward propagating the trajectory from an intermediate point that belongs to the surface of the hypersphere. Provided the radius of the hypersphere is selected sufficiently small, the intersection to be determined is (much) closer to the origin than x_0. Thus, a valid initial guess for the intermediate point on the boundary of the hypersphere may, indeed, be suggested by the linearized solution, i.e., by letting s(0) ∈ R^{2n−1} be such that

α(s(0)) := ε (x_0, P x_0) / ||(x_0, P x_0)||

i.e., the projection onto S_ε of the linearized state/costate pair. Such a strategy is employed and illustrated in the numerical simulations of Section V.
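The linearized initial guess of Remark 5 amounts to projecting the state/costate pair (x_0, P x_0) onto the hypersphere of radius ε; a minimal sketch follows, with scalar illustrative data (P = 1, x_0 = 2).

```python
import numpy as np

def initial_guess_point(x0, P, eps):
    """Project the linearized state/costate pair (x0, P x0) onto S_eps."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    z = np.concatenate([x0, P @ x0])
    return eps * z / np.linalg.norm(z)

# Scalar example: P = 1, x0 = 2 places the guess at angle pi/4 on S_eps,
# which matches the stable-manifold intersection of the scalar LQR example.
guess = initial_guess_point([2.0], np.array([[1.0]]), 0.1)
s0 = np.arctan2(guess[1], guess[0])
```

The corresponding angles s(0) can then be recovered from the Cartesian guess (here via arctan2 in the scalar case) and used to warm-start the static minimization.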

Remark 6:
The main objective of this article, namely to provide a finite-dimensional characterization of the solution to infinite-horizon optimal control problems, is similar in spirit to that of the well-known model predictive control (MPC) strategies; see, e.g., [27] and references therein. However, a few relevant differences should be highlighted. In MPC, the finite-dimensional characterization is obtained by considering a time quantization over a finite, typically short, time interval around the current value of the state: this allows one to pose a static optimization task with respect to the (finite) samples of the underlying control law. However, in continuous-time nonlinear systems (since portions of trajectories that are optimal over a finite horizon are not, in general, restrictions of the optimal control law over the entire horizon) such a strategy inevitably introduces a residual approximation error even in the nominal case of perfect knowledge of the plant and absence of noise or disturbances. Theorem 1 instead provides an exact finite-dimensional characterization of the optimal control law over the entire infinite horizon. Therefore, even in a receding-horizon implementation of the constructions proposed here, potentially in the presence of disturbances, the proposed strategy allows one to employ moving windows of length significantly larger than those of alternative receding-horizon strategies in general. This feature is illustrated via a comparative study in Section V.

Fig. 1. Graphical illustration of the statement of Theorem 1. The solid black circles constitute the set α(s*). The solid black line depicts the forward/backward flow of the optimal set along the Hamiltonian dynamics, which is such that C_{x_0}(s*, τ_1, τ_2) = 0. Any other trajectory of the Hamiltonian dynamics that intersects the set S_ε (depicted by the gray lines) is such that either the first or the second term of C_{x_0}(·, τ_1, τ_2) is strictly positive for any τ_1 ∈ R and τ_2 ∈ R̄.
The intuition behind Theorem 1 is illustrated for the scalar case, namely with n = 1, in Fig. 1, where the black circles constitute the set α(s*), namely the intersections of φ(x_0, ∇V_s(x_0)) and φ(x_0, ∇V_u(x_0)), which in the scalar case coincide with N_s and N_u, respectively, with the set S_ε. The solid black line represents the forward/backward flow of the optimal set along the Hamiltonian dynamics, which is such that C_{x_0}(s*, τ_1, τ_2) = 0. As shown in the proof of Theorem 1, any other trajectory of the Hamiltonian dynamics that intersects the set S_ε (represented by the gray lines) is such that either the first (for α(s_1)) or the second (for α(s_2)) term of C_{x_0}(·, τ_1, τ_2) is strictly positive for any τ_1 ∈ R and τ_2 ∈ R.
Remark 7: The discussion in the proof of Theorem 1 suggests two further implications on the values τ_1 and τ_2. First, it is evident that the infimum of C_{x_0} is obtained by considering the limit of τ_2 ∈ R to ±∞, as implied by (13). Moreover, one has that τ_1 τ_2 > 0, i.e., the infimum is obtained by simultaneous forward/backward evaluation of Hamiltonian trajectories.
Remark 7 motivates the result below, which allows one to approximate the solution of Q_{x_0} with an arbitrary degree of accuracy. Let V and W be two nonempty sets and define the Hausdorff distance between V and W, denoted d_H(V, W), as d_H(V, W) := max{ sup_{v∈V} inf_{w∈W} ‖v − w‖, sup_{w∈W} inf_{v∈V} ‖v − w‖ }, i.e., the largest among all the distances between the points in one set and the closest point in the other set.
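For finite point sets, the definition above reduces to a direct computation; a minimal sketch (the sets V and W below are illustrative):

```python
import numpy as np

def hausdorff(V, W):
    """Hausdorff distance between two finite point sets (rows are points):
    the largest distance from a point in one set to the closest point
    in the other set, taken in both directions."""
    D = np.linalg.norm(V[:, None, :] - W[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

V = np.array([[0.0, 0.0], [1.0, 0.0]])
W = np.array([[0.0, 0.0], [3.0, 0.0]])
print(hausdorff(V, W))  # 2.0
```

Here the point (3, 0) of W lies at distance 2 from its nearest point of V, which dominates all the other nearest-point distances.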
Theorem 2: Let x_0 ∈ X ⊆ R^n be given and consider the infinite-horizon optimal control problem Q_{x_0}. Suppose that V_s and V_u are C²(X). Fix ε > 0 sufficiently small, and suppose that Assumption 1 holds. Define the function C^{ε′}_{x_0} as in (18), where ε′ ∈ R_{>0} and ε′ < ε. Then, for any μ > 0 there exist θ_1, θ_2 ∈ R_{>0} such that the minimizers of C^{ε′}_{x_0} over the compact domain Θ, defined in terms of θ_1 and θ_2, approximate the exact solution to within μ in the Hausdorff distance, where³ (ŝ, τ̂_1, τ̂_2) := arg min_{(s,τ_1,τ_2)∈Θ} C^{ε′}_{x_0}(s, τ_1, τ_2). •

Proof: First, note that, by continuity, the function C^{ε′}_{x_0} converges pointwise to C_{x_0} for any (s, τ_1, τ_2) as ε′ tends to zero, where the latter function is precisely the one defined in (11). Then, let C_{x_0,θ} : Θ → R_{>0} be the restriction of the function C_{x_0} in (11) to the set Θ. Given μ > 0, select θ > 0 and μ_θ < μ such that the inequality (20) holds, where (s^θ, τ^θ_1, τ^θ_2) := arg min_{(s,τ_1,τ_2)∈Θ} C_{x_0,θ}(s, τ_1, τ_2), which exist by continuity of the function C_{x_0} with respect to its arguments. Therefore, consider a subsequence of indices {ε′_i}_{i∈N} such that lim_{i→∞} ε′_i = 0⁺ and define the corresponding sequence of functions {C^{ε′_i}_{x_0}}_{i∈N} with the underlying arguments restricted to the compact domain Θ. By the discussion above, the sequence {C^{ε′_i}_{x_0}}_{i∈N} is equicontinuous, each element being a continuous function. Hence, [28, Th. 7.10] implies also epi-convergence⁴ of {C^{ε′_i}_{x_0}}_{i∈N} to C_{x_0,θ}. Furthermore, the restrictions of {C^{ε′_i}_{x_0}}_{i∈N} to Θ are level-bounded, following from compactness of the corresponding supports, as well as proper and lower semicontinuous for any i ∈ N. Therefore, by relying on [28, Th. 7.33], it follows that the corresponding minimizers converge to those of C_{x_0,θ}, and the claim is shown by the definition of the Hausdorff distance and by the inequality (20).

Remark 8: Similarly to the interpretation in Remark 3 following the statement of Theorem 1, the rationale behind the formal claim of Theorem 2 can be further explained as follows. The arguments that attain the minimum of the function C^{ε′}_{x_0} characterize (via the intersection points represented in spherical coordinates ŝ) the set of all the trajectories of the Hamiltonian dynamics (7) that enter (in finite time) or are tangent to the set S_{ε′} := {(x, λ) ∈ R^n × R^n : ‖(x, λ)‖ = ε′} and whose projection onto the state component intersects x_0. The fundamental difference with respect to the characterization of the trajectories as in the statement of Theorem 1 lies in the fact that the minimum of C^{ε′}_{x_0} is achieved also at some |τ_i| < ∞, namely for certain (ŝ, τ̂_1, τ̂_2) belonging to the compact set Θ. These observations are illustrated for the scalar case in Fig. 2.

IV. ALGORITHMS FOR MINIMIZING THE HAMILTONIAN TRAJECTORY-BASED COST
The main objective of this section lies in presenting an algorithm that achieves the minimization of the trajectory-based (finite-dimensional) cost functions defined in (11) and especially (18). After providing a general statement outlining the design of such an algorithm, the discussion focuses on showing how to circumvent specific implementation issues arising from the use of standard minimization techniques. In the following statement, we introduce a continuous-time dynamical system whose (asymptotically stable) equilibrium corresponds to a minimizer of the static cost function (18). This dynamical system constitutes a central component of the algorithm presented in the following sections. To provide a concise statement of the result, consider the partial derivatives of the function (18) with respect to its arguments, provided in (22), shown at the bottom of the next page, and let B denote a ball of radius one.

Proposition 1: Let x_0 ∈ X ⊆ R^n be given. Fix ε > 0 and ε′ > 0 sufficiently small and any θ_1, θ_2 ∈ R_{>0}. Suppose in addition that Assumption 1 holds. Let γ > 0 and consider the dynamics (23), where, for t ≥ 0, η(0) ∈ arg min_{(s,τ_1,τ_2)∈Θ} C^{ε′}_{x_0}(s, τ_1, τ_2) + μB. •

Proof: First, note that the function C^{ε′}_{x_0} restricted to Θ is lower semicontinuous, since it is, in fact, continuous, proper, and level-bounded. It follows by [28, Th. 1.9] that the set (ŝ, τ̂_1, τ̂_2) := arg min_{(s,τ_1,τ_2)∈Θ} C^{ε′}_{x_0} is nonempty and compact; hence, the set α(ŝ) is compact. Moreover, by definition of the set (ŝ, τ̂_1, τ̂_2) (namely, the points in Θ at which C^{ε′}_{x_0}(s, τ_1, τ_2) = 0) and of (local) minimum, it follows that there exists a strictly positive constant μ_1 such that the right-hand side of (23) is well-defined in the same neighborhood. The remainder of the proof then follows directly from the result of Theorem 2 and standard Lyapunov stability arguments. To this end, note first that the function C^{ε′}_{x_0} is (locally) positive definite with respect to the set (ŝ, τ̂_1, τ̂_2) and for any η ∈ (ŝ, τ̂_1, τ̂_2) + μ_2 B, for some μ_2 ∈ R_{>0}; hence, it can be employed as a candidate Lyapunov function. Moreover, (locally) the dynamics (23) are such that Ċ^{ε′}_{x_0} is negative definite along solutions. The latter implies that, with μ := min{μ_1, μ_2} and by continuity of α(·), ξ(t) converges exponentially to ŝ.
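The gradient-flow mechanism underpinning Proposition 1 can be sketched on a toy cost. The quadratic stand-in for C^{ε′}_{x_0} and its minimizer below are illustrative assumptions, not the actual trajectory-based cost; the sketch only shows the forward-Euler integration of η̇ = −γ ∇C(η) converging to the minimizer.

```python
import numpy as np

# Toy stand-in for the cost C^{eps'}_{x0}: a smooth positive-definite
# function with a single (hypothetical) minimizer. The gradient flow
# eta_dot = -gamma * grad C(eta) of Proposition 1 is integrated with
# forward Euler.
target = np.array([0.3, -0.2, 1.0])      # illustrative minimizer

def grad_C(eta):
    return 2.0 * (eta - target)          # gradient of |eta - target|^2

gamma, dt = 0.1, 0.01
eta = np.array([1.0, 1.0, 0.0])          # eta(0) in a neighborhood
for _ in range(10000):
    eta = eta - dt * gamma * grad_C(eta)

print(np.round(eta, 4))  # approaches the minimizer [ 0.3 -0.2  1. ]
```

For this quadratic stand-in the distance to the minimizer contracts by a fixed factor per step, mirroring the exponential convergence claimed in the proposition.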
Remark 9: The intuition behind the statement of Proposition 1 suggests that each connected component of the set α(ŝ) is locally exponentially stable for the dynamics (23). It is worth observing that, for a fixed value of the parameter ε′ > 0, while clearly α(s*) ⊂ α(ŝ), there may instead be connected components of α(ŝ) that possess empty intersection with α(s*) (see Fig. 2 compared with the exact case depicted in Fig. 1 for a graphical intuition in the case of scalar systems). Furthermore, note that, by combining the claims of Proposition 1 with those of Theorem 2, one has that the trajectories of system (23) recover in practice the actual solution s*, provided ε′ is selected arbitrarily small and θ_i arbitrarily large.
Remark 10: The result of Proposition 1 entails that a single static minimization allows one to compute an essentially open-loop approximation of the entire optimal control policy. Note that the strategy ensures that the resulting trajectory enters the hypersphere S_{ε′} in finite time. Within the hypersphere, one could, for instance, implement the solution of the linearized problem, which induces a residual error of order O(ε′²) on the feedback policy and of order O(ε′³) on the cost of the optimal solution, namely on the value function. Thus, the resulting strategy (for the entire time horizon) can be made arbitrarily accurate through the selection of ε′. Toward the end of this section, we also provide an algorithm (see Algorithm 1) in which the static minimization problem is solved iteratively, yielding a receding-horizon architecture. The receding-horizon architecture has the additional benefit that it can be implemented online and yields an overall strategy with favorable robustness properties.
By inspecting (22), the dynamics (23) cannot be easily implemented, since the underlying vector field depends on the knowledge of the flow ϕ_H(t; ·) and of its sensitivity, namely the Jacobian matrix of the flow with respect to the initial condition, whose closed-form expressions are typically not available for nonlinear Hamiltonian dynamics (7). This computational issue can be circumvented via a hybrid implementation. In the following statement, standard notation is used to represent the hybrid time domain: (t, k), with t ∈ R_{≥0} and k ∈ N, denotes the continuous time parameter t along with the index k representing the discontinuous jumps (for reasons of space, the interested reader is referred to, e.g., [29] for further details on hybrid systems).
Proposition 2: Suppose that the assumptions of Proposition 1 hold and consider a hybrid system with flow and jump dynamics (25), (26), with the remaining data as specified in (29), shown at the bottom of the next page. Then, there exists τ_M > 0 with the property that, along the trajectories of (25), (26), the set α(ŝ) is locally exponentially stable. Proof: To begin with, let t_i denote the continuous time at which the ith jump occurs. Note that the flow dynamics (25f) are such that τ increases at the same rate as the underlying continuous-time component of the hybrid time domain until it reaches τ ≥ τ_M, at which point a jump occurs and its value is "reset" to zero; namely, τ behaves as a timer variable. Thus, regardless of the initial condition of the state of the hybrid system, the hybrid arcs exhibit a nonempty continuous-time interval with the property that t_{k+1} − t_k = τ_M, for any k > 0, followed by (periodic) jumps. Note also that η remains constant during flows. The dynamics (25b) and (26b) are such that χ_b(t_k + ρ, k) evaluates the backward flow for any ρ ≤ τ_M and any k > 0; in particular, χ_b(t_{k+1}, k) = ϕ_H(−σ_1(t_k, k), α(ξ(t_k, k))). Similarly, (25c) and (26c) are such that χ_f(t_{k+1}, k) = ϕ_H(σ_2(t_k, k), α(ξ(t_k, k))), for any k > 0.
Furthermore, by comparing the structure of (22) and of (29), it is clear that the rationale behind the dynamics (25d) and (25e) consists precisely in yielding the Jacobian matrix of the (unknown) flow of the Hamiltonian dynamics (7) with respect to the initial condition. To show this claim, let Ψ : R → R^{2n×2n} denote the sensitivity of the flow ϕ_H(t, z_0) with respect to z_0, namely Ψ(t) = ∇_{z_0} ϕ_H(t, z_0), where z_0 ∈ R^{2n} denotes the initial condition of the flow. Recalling that, by definition, the flow satisfies the fixed-point condition (31), it follows that the sensitivity satisfies the relation (32). Thus, it follows from (30) (noting also that σ_1 and σ_2 are constant during flows, namely for any t ∈ (t_k, t_{k+1}]) that the dynamics (25d) and (26d) are such that Ψ_b yields precisely the sensitivity [as given in (32)] of the flow ϕ_H(−σ_1, α(η)). Similarly, it can be shown that Ψ_f yields the sensitivity of the flow ϕ_H(σ_2, α(η)). As a consequence of the preceding discussion, one has that h(η, χ, Ψ) at (t_k, k) coincides with C_{x_0}(η) [with C_{x_0} as defined in (18)] and with its partial derivatives [as defined in (22)]. That is, the dynamics (25) and (26) are such that the sequence η(t_{k+1}, k + 1), for k ∈ Z_{>0}, realizes the discretization, via Euler's method, of the continuous dynamics (23). Recalling the main result of Proposition 1, namely that the set α(ŝ) is locally exponentially stable for the continuous dynamics (23), it follows that, for sufficiently small τ_M, the set α(ŝ) is locally exponentially stable for the discretized dynamics (33).
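The variational relation used in the proof, i.e., that the sensitivity obeys Ψ̇ = ∇F(ϕ_H)Ψ with Ψ(0) = I, can be checked numerically on a toy linear Hamiltonian system, for which the sensitivity is available in closed form. The matrix M below (the Hamiltonian matrix of scalar LQ data A = 0, B = Q = R = 1) is an illustrative assumption.

```python
import numpy as np

# Variational ("sensitivity") equation: for a flow z' = F(z), the Jacobian
# of the flow w.r.t. the initial condition satisfies Psi' = dF/dz(z(t)) Psi,
# Psi(0) = I. Toy linear Hamiltonian dynamics (scalar LQ, A=0, B=Q=R=1):
M = np.array([[0.0, -1.0], [-1.0, 0.0]])
F = lambda z: M @ z
jac_F = lambda z: M            # Jacobian of a linear field is constant

dt, T = 1e-4, 1.0
z0 = np.array([1.0, 0.5])
z, Psi = z0.copy(), np.eye(2)
for _ in range(int(round(T / dt))):   # joint forward-Euler integration
    z, Psi = z + dt * F(z), Psi + dt * jac_F(z) @ Psi

# For linear dynamics the sensitivity is exp(M T) = cosh(T) I + sinh(T) M.
exact = np.cosh(T) * np.eye(2) + np.sinh(T) * M
print(np.max(np.abs(Psi - exact)) < 1e-3)  # True
```

For nonlinear Hamiltonian dynamics no such closed form exists, which is precisely why (25d), (25e) integrate the sensitivity alongside the flow.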
Remark 11: The intuition behind the formal statement of Proposition 2 can be summarized as follows. The state τ of the hybrid system (25)-(26) essentially represents a "sampling time," and the system is such that, at jumps, χ and Ψ are "initialized" in a manner that ensures that, via the flow dynamics, χ_b and χ_f determine the backward and forward flows of the Hamiltonian system, whereas Ψ_b and Ψ_f determine the sensitivities of the backward and forward flows, respectively. Namely, it is possible to periodically determine the flows and their sensitivities, for which analytical expressions are, in general, not available, required to integrate the continuous dynamics (23) of Proposition 1. At jumps, this knowledge is then used to implement a discretized version of (23).
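The timer behavior described above can be mimicked with a few lines; the values of τ_M and of the integration step are illustrative.

```python
# Minimal sketch of the timer state tau of the hybrid system (25)-(26):
# tau flows with unit rate and is reset to zero whenever tau >= tau_M,
# so consecutive jumps are separated by (approximately) tau_M.
tau_M, dt = 0.5, 1e-3
tau, t, jump_times = 0.0, 0.0, []
for _ in range(2000):          # simulate 2 s of continuous time
    t += dt
    tau += dt
    if tau >= tau_M - 1e-12:   # jump condition: reset the timer
        jump_times.append(round(t, 3))
        tau = 0.0

print(jump_times)  # four jumps, separated by tau_M = 0.5 s
```

Between two consecutive resets the remaining states flow, and at each reset the accumulated flow/sensitivity information is used for one Euler step of (23).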
The previous discussions and formal statements are finally summarized in the following algorithm, which essentially outlines a receding-horizon implementation of the results of Propositions 1 and 2, in which the hyperspheres defined by ε and ε′ are allowed to shrink.
Algorithm 1: Receding-horizon implementation.
(2) … (25), (26), with ξ_c such that α(ξ_c) = ε[x_cᵀ, x_cᵀP]ᵀ/‖[x_cᵀ, x_cᵀP]‖, as in (16), with P denoting the positive definite solution of the ARE (2).
(3) Integrate the hybrid system (25), (26) in [0, ντ_M] and let σ̄_i = σ_i(ντ_M, ν), τ_κ = τ_{κ−1} + min{τ_rh, σ̄_1 + σ̄_2} and …
(4) Implement the control input u° in the system (1b) for τ_κ − τ_{κ−1} seconds, i.e., compute …

A few comments about the practical implementation of the algorithm above are necessary before considering a benchmark numerical simulation. To begin with, it is worth observing that the iterations may be terminated whenever x_c (which defines, via ξ_c in step (2), the intersection of the trajectory with the outer hypersphere of radius ε) is sufficiently small. In addition, in order to obtain that τ_κ = τ_{κ−1} + τ_rh for all κ ∈ N, namely such that the time instants at which the control law is updated coincide with the periodic pattern induced by the desired value τ_rh, one should select ε′ sufficiently small and ν sufficiently large. This aspect is further discussed also in the numerical simulations below.
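The overall receding-horizon structure can be sketched as a loop with shrinking radii. The state-contraction placeholder standing in for the hybrid minimization of steps (2)-(3), the shrink rule ε ← c_1 ε (with inner radius c_2 ε), and the termination threshold are all illustrative assumptions, not the actual algorithm.

```python
# Hypothetical skeleton of the receding-horizon loop of Algorithm 1:
# at each iteration a static minimization (steps (2)-(3)) would be solved
# and the input applied for tau_rh seconds; here `minimize_step` merely
# contracts the state to emulate the resulting closed-loop decay.
def minimize_step(x):
    return 0.5 * x

def receding_horizon(x0, eps, c1=0.5, c2=0.1, x_tol=1e-2, max_iter=20):
    x, radii = x0, []
    for _ in range(max_iter):
        if abs(x) < x_tol:             # terminate once x_c is small enough
            break
        radii.append((eps, c2 * eps))  # outer and inner hypersphere radii
        x = minimize_step(x)
        eps = c1 * eps                 # shrinking hyperspheres
    return x, radii

x_final, radii = receding_horizon(x0=1.0, eps=0.5)
print(len(radii), x_final)  # 7 0.0078125
```

The point of the sketch is only the control flow: repeated static minimizations over progressively smaller hyperspheres until the state is close to the origin.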

V. BENCHMARK EXAMPLE
To illustrate and validate the theory discussed in the previous sections, the following benchmark example for infinite-horizon optimal control problems in the presence of nonlinear dynamics is considered. In particular, the problem is borrowed from [30] and [31], where a detailed comparative analysis is carried out among several alternative techniques. Toward this end, consider the nonlinear system described by the equations

ẋ_1 = x_2,  ẋ_2 = x_1³ + u    (37)

with x(t) ∈ R² and u(t) ∈ R, which exhibits a cubic nonlinearity in the state variables, together with a cost function of the form (1a), with q(x) = xᵀx. As implied by [20] (see also Remark 1), the problem admits a locally C² value function. The graphs in Figs. 3 and 4 depict the outcome corresponding to the strategy obtained by implementing Algorithm 1 to solve the infinite-horizon optimal control problem defined by (37) and (1a) in the nominal setting, i.e., in the absence of any noise or disturbances. The results are obtained by selecting the configurable parameters of Algorithm 1 as τ_M = 0.01 s, τ_rh = 1 s, ν = 6 × 10⁵, c_1 = 0.5, c_2 = 0.1, τ_{1,0} = τ_{2,0} = 1, and γ = 0.1. The dashed vertical lines in Fig. 3, which are uniformly distributed over time, show that the selection of c_2, hence in turn of ε′ at each iteration, induces the property that τ_κ − τ_{κ−1} = τ_rh = 1 s for all κ. Moreover, it is worth observing that, by relying on the discussion of Remark 6, which highlights that at each execution of steps (2)-(4) of Algorithm 1 the entire (infinite-horizon) optimal solution is characterized provided ε′ is sufficiently small and ν is large, Algorithm 1 is capable of determining a solution with a cost lower than all the alternative techniques considered in the comparison provided in [31]. More precisely, J(u°) = 1.4828, even with a relatively long moving window, namely with one second between two consecutive updates of the approximate policy. The proposed strategy (and its performance) is explicitly compared with control laws designed on the basis of linear and nonlinear MPC strategies (see, e.g., [32]). In particular, within the framework of a linear (parameter-varying) MPC architecture, the dynamics (37) are linearized around the current value of the state, which is assumed constant during the prediction phase, and subsequently discretized via Euler's method with respect to the sampling time δT ∈ R_{>0}. To obtain a finite-dimensional characterization of the optimal control task, as is common in this setting, it is assumed that the control action is piecewise constant during the evolution of the prediction model: this choice allows one to pose the underlying optimal control problem, restricted to the moving window of length δT N_u, with N_u ∈ N denoting the number of prediction steps, as a static quadratic optimization task. Fig. 5 shows the time histories of the state variable x_1(t) of the system (37) for several selections of the parameters δT and N_u. In particular, the cost of the choice (δT, N_u) = (0.2, 5) (solid black line) is equal to 1.8389. It is worth observing that, for the above selection of parameters, the optimization is performed over a receding window of 1 s, whereas the control action is computed every 0.2 s, i.e., at a significantly higher rate than the 1 s of the control design method we propose. Furthermore, (δT, N_u) = (0.2, 10) (solid dark gray line) induces a cost of 2.9949, (δT, N_u) = (0.5, 4) (solid light gray line) induces a cost of 2.1528 and, finally, (δT, N_u) = (1, 2) (dash-dotted gray line) results in an unbounded trajectory of the closed-loop system. A similar comparison is performed also with respect to a nonlinear MPC scheme, with δT = 0.2 and N_u = 5, in which the optimization problem at each step is solved via the fmincon command in Matlab. A quadratic virtual weight on the terminal state of the moving window, described by x(t + N_u|t)ᵀ S x(t + N_u|t) with S = 10² I, is required to induce a bounded evolution of the resulting closed-loop system. The corresponding trajectory is described by the dashed gray line in Fig. 5, with associated cost equal to 1.8922. The objective of the second numerical simulation is instead to assess the behavior of Algorithm 1 in the presence of an exogenous disturbance signal affecting the dynamics (37). In particular, the disturbance is unmatched with the control input since it affects (linearly) the dynamics of the first state, namely ẋ_1 = x_2 + w, with w(t) defined as w(t) = 0.5 for t ∈ [0.4, 0.8], w(t) = −0.5
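The nominal benchmark (37) can be reproduced in a few lines with a simple stabilizing baseline. The feedback below, which cancels the cubic term and then applies the LQR gain of the linearization, is only an illustrative suboptimal law, not the policy produced by Algorithm 1; its cost therefore exceeds the reported optimal value J(u°) = 1.4828.

```python
import numpy as np

# Closed-loop simulation of the benchmark (37): x1' = x2, x2' = x1^3 + u,
# with running cost x' x + u^2 and x(0) = (1, 0) as in Fig. 5.
# Baseline law (an illustrative suboptimal choice, NOT Algorithm 1):
# cancel the cubic term, then apply the LQR gain of the linearization.
def u_baseline(x):
    return -x[0] ** 3 - x[0] - np.sqrt(3.0) * x[1]

dt, T = 1e-3, 20.0
x = np.array([1.0, 0.0])
J = 0.0
for _ in range(int(round(T / dt))):
    u = u_baseline(x)
    J += (x @ x + u ** 2) * dt          # Euler quadrature of the cost
    x = x + dt * np.array([x[1], x[0] ** 3 + u])

print(np.linalg.norm(x) < 1e-4, J > 1.4828)  # True True
```

The gap between this baseline cost and 1.4828 illustrates the margin that the exact characterization of the optimal law is able to recover.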

VI. CONCLUSION
Infinite-horizon optimal control problems for nonlinear systems have been studied. Their intrinsically infinite-dimensional nature renders this class of control problems particularly challenging to solve. The contribution of this article has been twofold. On one hand, we have provided a finite-dimensional characterization of the solution to such problems in terms of the set in which a certain function, which involves the trajectories of the associated Hamiltonian system and their sensitivity with respect to the initial condition, attains its minimum value.
Then, we have shown that a suitably adapted hybrid implementation of a standard gradient-descent algorithm permits the minimization of such functions without requiring explicit knowledge of the flow of the Hamiltonian system, which is seldom available in practice. This result is well suited to an algorithmic interpretation, which can be implemented in a receding-horizon fashion. The efficacy of the resulting algorithm is demonstrated by means of a benchmark infinite-horizon nonlinear optimal control problem.

Fig. 2. Graphical illustration of the statement of Theorem 2. The solid segments on the set S_{ε′} indicate the arguments which attain the minimum of C^{ε′}_{x_0}. These constitute intermediate points of all trajectories of the Hamiltonian system (7) that satisfy the initial condition x(0) = x_0 and that (in finite time) enter (depicted by the gray, dashed line) or are tangent (depicted by the gray, solid lines) to the set S_{ε′}.

Fig. 5. Time histories of the state x_1(t) of the system (37) with x(0) = (1, 0), in closed loop with MPC controllers for different values of the configuration parameters δT (sampling time) and N_u (length of the receding horizon).