2022-03-05

How to Tell Whether a Functional is Extremized

I happened to be thinking recently about how to tell when a functional is extremized. Examples in physics include minimizing the ground state energy of an electronic system expressed as an approximate density functional \( E[\rho] \) with respect to the electron density \( \rho \) or maximizing the relativistic proper time \( \tau \) of a classical particle with respect to a path through spacetime. Additionally, finding the points of stationary action that lead to the Euler-Lagrange equations of motion is often called "minimization of the action", but I can't recall ever having seen a proof that the action is truly minimized (as opposed to reaching a saddle point). This got me to think more about the conditions under which a functional is truly maximized or minimized as opposed to reaching a saddle point. Follow the jump to see more. I will frequently refer to concepts presented in a recent post (link here), including the relationships between functionals of vectors & functionals of functions. Additionally, for simplicity, all variables and functions will be real-valued.

Determining the Nature of a Stationary Point

For a function \( f \) of a single variable \( x \), the nature of a stationary point can be determined as follows. If \( x_{0} \) is a stationary point such that \( \frac{\mathrm{d}f}{\mathrm{d}x}\bigg|_{x_{0}} = 0 \), then the condition for it to correspond to a minimum (maximum) is that \( \frac{\mathrm{d}^{2} f}{\mathrm{d}x^{2}}\bigg|_{x_{0}} \) is positive (negative). If the second derivative vanishes, then the stationary point may be an inflection point (the single-variable counterpart of a saddle point), but it may still be a minimum or maximum: what matters is the lowest-order nonvanishing derivative at \( x_{0} \), which signals an extremum if its order is even and an inflection point if its order is odd. (An example is \( f(x) = x^{4} \), which clearly has a minimum at \( x_{0} = 0 \) but which has \( \frac{\mathrm{d}^{2} f}{\mathrm{d}x^{2}}\bigg|_{x_{0}} = 0 \), so one must use \( \frac{\mathrm{d}^{3} f}{\mathrm{d}x^{3}}\bigg|_{x_{0}} = 0 \) and \( \frac{\mathrm{d}^{4} f}{\mathrm{d}x^{4}}\bigg|_{x_{0}} > 0 \) to make the case.)
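
As a minimal sketch of the single-variable test, the following Python snippet classifies a stationary point of a polynomial by looking for the lowest-order nonvanishing derivative. The coefficient-list representation and the function names are illustrative assumptions, not anything standard.

```python
def derivative_coeffs(coeffs):
    """Differentiate a polynomial stored as [c0, c1, c2, ...], where c_k multiplies x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def classify_stationary_point(coeffs, x0, tol=1e-12):
    """Classify x0 using the lowest-order nonvanishing derivative of order >= 2."""
    d = derivative_coeffs(coeffs)
    if abs(evaluate(d, x0)) > tol:
        return "not stationary"
    order = 1
    while d:
        d = derivative_coeffs(d)
        order += 1
        value = evaluate(d, x0)
        if abs(value) > tol:
            if order % 2 == 1:
                return "inflection"  # odd order: neither a minimum nor a maximum
            return "minimum" if value > 0 else "maximum"
    return "undetermined"  # every derivative vanishes (constant polynomial)


print(classify_stationary_point([0, 0, 0, 0, 1], 0.0))  # f(x) = x**4
print(classify_stationary_point([0, 0, 0, 1], 0.0))     # f(x) = x**3
```

For \( f(x) = x^{4} \) the loop only stops at the fourth derivative, which is the situation described in the parenthetical example above.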

For a functional \( F \) of multiple variables, which may be collected as a vector \( \mathbf{v} \), the nature of a stationary point can be determined as follows. If \( \mathbf{w} \) is a stationary point such that \( \frac{\partial F}{\partial v_{i}}\bigg|_{\mathbf{w}} = 0 \) for every index \( i \), then the condition for it to correspond to a minimum (maximum) is that the [symmetric] Hessian matrix whose elements are given by \( \frac{\partial^{2} F}{\partial v_{i} \partial v_{j}}\bigg|_{\mathbf{w}} \) is positive definite (negative definite). If the matrix has some eigenvalues that are positive and others that are negative, then the stationary point is a saddle point. If the matrix has nonvanishing eigenvalues that are all the same sign but also has vanishing eigenvalues, then higher-order derivatives must be used to determine the nature of the stationary point, though writing this out explicitly is often much more cumbersome especially as the derivative of order \( n \) produces a "matrix" (perhaps more properly a tensor) with \( n \) indices.
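
The eigenvalue criterion for a finite-dimensional Hessian can be sketched in a few lines of Python with NumPy. The function name, the tolerance, and the example matrices are assumptions chosen for illustration; the Hessian is taken as already evaluated at the stationary point.

```python
import numpy as np


def classify_hessian(hessian, tol=1e-10):
    """Classify a stationary point from the eigenvalues of its symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    positive = int(np.sum(eigenvalues > tol))
    negative = int(np.sum(eigenvalues < -tol))
    zero = eigenvalues.size - positive - negative
    if positive and negative:
        return "saddle"
    if zero:
        return "inconclusive"  # higher-order derivatives would be needed
    return "minimum" if positive else "maximum"


# F = v1**2 + v2**2, F = v1**2 - v2**2, and F = v1**2 + v2**4, all at the origin
print(classify_hessian([[2.0, 0.0], [0.0, 2.0]]))
print(classify_hessian([[2.0, 0.0], [0.0, -2.0]]))
print(classify_hessian([[2.0, 0.0], [0.0, 0.0]]))
```

The third example has a vanishing eigenvalue with all others positive, so the second-order test alone cannot decide, even though that particular function does have a minimum at the origin.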

This suggests that for a functional \( F \) of a function \( g \), the nature of a stationary point can be determined as follows. If \( f \) is a stationary point such that \( \frac{\delta F}{\delta g(x)}\bigg|_{f} = 0 \) for every continuous index \( x \), then the condition for it to correspond to a minimum (maximum) is that the [symmetric] Hessian operator whose elements are given in position space by \( \frac{\delta^{2} F}{\delta g(x) \delta g(x')}\bigg|_{f} \) is positive definite (negative definite). In particular, using the definition \( K(x, x') = \frac{\delta^{2} F}{\delta g(x) \delta g(x')}\bigg|_{f} \), it is possible to perform a spectral decomposition into eigenvalues \( \lambda(\theta) \) and eigenfunctions \( u(x, \theta) \) (assumed to be indexed continuously by a variable \( \theta \) as the functions are assumed to be supported over the entire real line) such that \( K(x, x') = \int \lambda(\theta) u(x, \theta)u(x', \theta)~\mathrm{d}\theta \). Thus, if \( \lambda(\theta) > 0 \) for all \( \theta \), then the functional is minimized, while if \( \lambda(\theta) < 0 \) for all \( \theta \), then the functional is maximized. If \( \lambda(\theta) \) changes sign, then the functional has reached a saddle point, while if \( \lambda(\theta) \) vanishes for some \( \theta \) but otherwise never changes sign, then higher-order derivatives must be considered, presenting the same difficulties as for functionals of vectors.

How Likely is a Functional to Have a True Maximum or Minimum?

As I thought about it more, I recognized that as the dimension of a vector space grows, it seems unlikely that a stationary point of a functional would be a true maximum or minimum, as the addition of new dimensions would open new opportunities for flipping the sign of a new eigenvalue of the Hessian matrix. In the case where the dimensions correspond to physical spatial dimensions, there would have to be clear physical constraints specific to the problem that would allow one to argue that the directions along which an eigenvalue of the Hessian matrix could be negative can be neglected. However, as the number of dimensions increases without bound (which is equivalent to considering a functional of functions instead of a functional of finite-dimensional vectors, as both countably & uncountably infinite sets of basis vectors can be used to represent infinite-dimensional vector spaces), it becomes less common for these dimensions to represent physical spatial dimensions and more common for these dimensions to represent other quantities, like frequencies, energies, momenta/wavevectors, or other things like that. (This statement is meant to imply only correlation, not causation in any direction of the statement.) In such cases, it may be easier to argue on general physical grounds for the neglect of negative eigenvalues of the Hessian matrix if they correspond to unphysically high energies, momenta, frequencies, or things like that. With such neglect, it may be easier to then argue that the stationary point of the functional is, to the degree that the aforementioned approximation is justified, a true maximum or minimum.

Second Derivative of the Nonrelativistic Classical Action for a Single Degree of Freedom

The nonrelativistic classical action for a single degree of freedom is typically given by \[ S[x; t] = \int_{0}^{t} L(t', x, \dot{x})~\mathrm{d}t' \] and its first derivative \[ \frac{\delta S}{\delta x(t)} = \frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot{x}} \] yields the Euler-Lagrange equations of motion when set to vanish. Before proceeding, it is helpful to consider a few points of notation. Here, the functions \( f(x) \) are replaced by \( x(t) \), as the time coordinate \( t \) is the continuous index, and the trajectory \( x(t) \) is the relevant function.

To derive the second derivative, it may be better to define a new functional \( J[x; t] = \frac{\delta S}{\delta x(t)} \) that depends on the trajectory \( x \) and is parameterized by \( t \). This means \( \frac{\delta^{2} S}{\delta x(t) \delta x(t')} = \frac{\delta J[x; t]}{\delta x(t')} \). From above, it can be seen that \[ J[x; t] = \int_{-\infty}^{\infty} f(t, t'', x, \dot{x})~\mathrm{d}t'' \] where \[ f(t, t'', x, \dot{x}) = \left(\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot{x}}\right)\bigg|_{x(t'')} \delta(t - t'') \] has been defined to look similar in an abstract sense to \( L \) itself. This is strongly suggestive that the same Euler-Lagrange equations can be applied to \( f(t, t'', x, \dot{x}) \) which means \[ \frac{\delta^{2} S[x; t]}{\delta x(t) \delta x(t')} = \frac{\partial f}{\partial x(t)}\delta(t - t') + \frac{\partial f}{\partial \dot{x}(t)} \frac{\mathrm{d}}{\mathrm{d}t} \delta(t - t') \] should hold.

In general, this may be quite nasty to expand. This is because total time derivatives appear in various places, and the existence of the Dirac delta function without an integral means that switching the position of the total time derivative operator becomes quite tricky.

Second Derivative of a Typical Newtonian Action for a Single Degree of Freedom

Things become much simpler in the case that \[ L(t, x, \dot{x}) = \frac{m\dot{x}^{2}}{2} - V(x) \] is used. In this case, the first derivative is \[ \frac{\delta S}{\delta x(t)} = -\frac{\partial V}{\partial x(t)} - m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}} x(t) \] which is the usual Newtonian equation of motion when the first derivative vanishes. The rules of functional differentiation yield \[ \frac{\delta^{2} S}{\delta x(t) \delta x(t')} = -\frac{\partial^{2} V}{\partial x(t)^{2}} \delta(t - t') - m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}} \delta(t - t') \] as the second derivative of the action.

The second derivative can be justified as follows, using the analogy to functionals of finite-dimensional vectors. If the function \( x(t) \) is replaced by the vector \( u_{i} \) with discrete indices \( i \) and the derivative operator \( \frac{\mathrm{d}}{\mathrm{d}t} \) is replaced by the antisymmetric matrix with elements \( D_{ij} \), then the first variation is \[ \frac{\partial S}{\partial u_{i}} = -\frac{\partial V}{\partial x}\bigg|_{u_{i}} - m\sum_{k,l} D_{ik} D_{kl} u_{l} \] in this discrete analogy. The second variation is therefore \[ \frac{\partial^{2} S}{\partial u_{i} \partial u_{j}} = -\frac{\partial^{2} V}{\partial x^{2}}\bigg|_{u_{i}}\delta_{ij} - m\sum_{k,l} D_{ik} D_{kl} \delta_{lj} \] which looks analogous to the above continuum expression.
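
The discrete analogy can be checked numerically. The sketch below makes several assumptions that are not in the text above: the action is replaced by a Riemann sum with fixed endpoints, the product \( \sum_{k} D_{ik} D_{kj} \) is represented by the standard second-difference matrix, and the quartic potential is just a hypothetical test case. It compares a brute-force finite-difference Hessian of the discrete action with the closed form \( -V''(u_{i})\delta_{ij} - m (D^{2})_{ij} \) (times the time step).

```python
import numpy as np

m, dt, n = 1.0, 0.1, 6        # mass, time step, number of interior points
u_start, u_end = 0.0, 1.0     # fixed endpoints of the discrete trajectory


def potential(x):
    return 0.25 * x**4        # hypothetical test potential with V''(x) = 3 x**2


def potential_second_derivative(x):
    return 3.0 * x**2


def discrete_action(u):
    """Riemann-sum action for interior points u with fixed endpoints."""
    path = np.concatenate(([u_start], u, [u_end]))
    kinetic = 0.5 * m * np.sum(np.diff(path) ** 2) / dt
    return kinetic - dt * np.sum(potential(u))


def analytic_hessian(u):
    """dt * (-V''(u_i) delta_ij - m * second-difference matrix)."""
    second_difference = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / dt**2
    return dt * (-np.diag(potential_second_derivative(u)) - m * second_difference)


def numerical_hessian(func, u, eps=1e-3):
    """Central finite-difference Hessian of a scalar function of a vector."""
    size = u.size
    hessian = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            e_i = np.zeros(size)
            e_j = np.zeros(size)
            e_i[i] = eps
            e_j[j] = eps
            hessian[i, j] = (func(u + e_i + e_j) - func(u + e_i - e_j)
                             - func(u - e_i + e_j) + func(u - e_i - e_j)) / (4.0 * eps**2)
    return hessian


u = np.linspace(u_start, u_end, n + 2)[1:-1]   # any trial trajectory will do here
hessian_matches = np.allclose(numerical_hessian(discrete_action, u),
                              analytic_hessian(u), atol=1e-5)
print(hessian_matches)
```

The overall factor of `dt` is the discrete counterpart of the Dirac delta function: dividing the matrix by `dt` recovers the kernel that appears in the continuum expression.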

It may be worth noting that the continuum expression looks sort of like a nonrelativistic single-particle quantum Hamiltonian acting on the wavefunction of a particle localized at a single point in space \( x_{0} \), which in position space looks like \( V(x)\delta(x - x_{0}) - \frac{\hbar^{2}}{2m} \frac{\partial^{2}}{\partial x^{2}} \delta(x - x_{0}) \). When going from the quantum case to the classical case, the role of position \( x \) is replaced by time \( t \), the role of the wavefunction is replaced by an identity operator \( \delta(t - t') \) in the time domain, the potential energy \( V(x) \) is replaced by its negative second derivative \( -\frac{\partial^{2} V}{\partial x^{2}} \) evaluated along the trajectory \( x(t) \), and the kinetic energy of the wavefunction in position space given by the operator \( -\frac{\hbar^{2}}{2m} \frac{\partial^{2}}{\partial x^{2}} \) is replaced by the acceleration term \( -m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}} \) (which ultimately comes from the same kinetic energy in the classical Lagrangian or Hamiltonian). However, I would be wary of trying to ascribe any deeper meaning to this analogy, though it could be fruitful in showing how quantum & classical intuitions can overlap.

Returning to the larger point, the second derivative of the action is essentially a linear operator \( -\frac{\partial^{2} V}{\partial x^{2}}\bigg|_{x(t)} - m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}} \) when evaluated for a given trajectory \( x(t) \). The negative second derivative operator with respect to time is positive-definite. However, the term set by the effective "spring constant" along the trajectory, \( -\frac{\partial^{2} V}{\partial x^{2}}\bigg|_{x(t)} \), is not guaranteed to be positive: it is negative wherever \( \frac{\partial^{2} V}{\partial x^{2}} > 0 \). Depending on the competition between it and the negative second derivative operator with respect to time, the overall operator might be indefinite, meaning that the corresponding trajectory is a saddle point. The following few sections will discuss specific solvable examples.

Particle Experiencing a Uniform Force

Given the coordinate \( x \) and the potential \[ V(x) = -F_{0} x \] the second derivative of the potential vanishes regardless of \( F_{0} \). This yields \[ \frac{\delta^{2} S}{\delta x(t) \delta x(t')} = -m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}} \delta(t - t') \] in which the second derivative is not only independent of the force \( F_{0} \) but also of the trajectory \( x(t) \). This means the operator is always positive-definite, so making the action stationary really does mean minimizing it. The operator is positive-definite by virtue of the fact that it can be represented as \[ \frac{\delta^{2} S}{\delta x(t) \delta x(t')} = \int_{-\infty}^{\infty} m\omega^{2} \exp(-\mathrm{i}\omega t) \exp(\mathrm{i}\omega t')~\frac{\mathrm{d}\omega}{2\pi} \] in the Fourier basis (though technically this violates the earlier promise of working with only real-valued vector spaces); the eigenvalues are \[ \lambda(\omega) = m\omega^{2} \] which are only zero for a single value of the index \( \omega \) (namely \( \omega = 0 \) in this case).
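
A quick way to see the positivity without Fourier transforms is to evaluate the quadratic form directly. The sketch below assumes a finite time interval with variations \( \eta(t) \) that vanish at both endpoints (an assumption that differs from the whole-real-line setting above), in which case integration by parts turns \( \int \eta \left(-m\ddot{\eta}\right)~\mathrm{d}t \) into \( m\int \dot{\eta}^{2}~\mathrm{d}t \). The function name and the random sampling are illustrative choices.

```python
import random


def second_variation_uniform_force(eta, dt, mass=1.0):
    """Discrete version of m * integral(eta_dot**2) for a variation eta that is zero at both ends."""
    path = [0.0] + list(eta) + [0.0]
    return mass * sum((b - a) ** 2 for a, b in zip(path, path[1:])) / dt


random.seed(0)
dt = 0.01
samples = [
    second_variation_uniform_force([random.uniform(-1.0, 1.0) for _ in range(99)], dt)
    for _ in range(200)
]
all_positive = all(value > 0.0 for value in samples)
print(all_positive)
```

Because the result is a sum of squares, it can only vanish when the variation is identically zero, which is the discrete statement that the stationary trajectory is a minimum under these boundary conditions.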

Particle Experiencing a Harmonic Force

The potential for a particle experiencing both a harmonic restoring force and an external harmonic drive can be written as \[ V(x) = \frac{k}{2} x^{2} - F_{0} x\cos(\omega_{\mathrm{D}} t) \] where \( F_{0} \) is the amplitude of the drive and \( \omega_{\mathrm{D}} \) is its frequency. The second derivative of the action becomes \[ \frac{\delta^{2} S}{\delta x(t) \delta x(t')} = \left(-k - m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}}\right) \delta(t - t') \] for this system; this is independent of the properties of the drive and of the trajectory \( x(t) \) and only depends on the spring constant of the restoring force. The definiteness of this operator can be seen by expanding it as \[ \frac{\delta^{2} S}{\delta x(t) \delta x(t')} = \int_{-\infty}^{\infty} (m\omega^{2} - k) \exp(-\mathrm{i}\omega t) \exp(\mathrm{i}\omega t')~\frac{\mathrm{d}\omega}{2\pi} \] in the Fourier basis (though technically this violates the earlier promise of working with only real-valued vector spaces). The eigenvalues are \[ \lambda(\omega) = m\omega^{2} - k \] which are strictly positive only for \( |\omega| > \sqrt{k/m} \) (meaning the magnitude of the frequency index must exceed the natural frequency from the restoring force, independent of the frequency of the driving force) and strictly negative for \( |\omega| < \sqrt{k/m} \). Because eigenvalues of both signs occur, any physical trajectory for this particle corresponds to a saddle point of the action.
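
To make the sign change concrete, the following sketch discretizes the operator \( -k - m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}} \) and counts its negative eigenvalues. It assumes a finite time interval of duration `total_time` with variations that vanish at both endpoints, which is not the whole-real-line setting used above, and the parameter values \( k = m = 1 \) are arbitrary.

```python
import numpy as np


def count_negative_modes(total_time, k=1.0, mass=1.0, n=500):
    """Count negative eigenvalues of -k - m d^2/dt^2 on (0, total_time) with fixed endpoints."""
    dt = total_time / (n + 1)
    second_difference = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / dt**2
    hessian = -k * np.eye(n) - mass * second_difference
    return int(np.sum(np.linalg.eigvalsh(hessian) < 0.0))


for total_time in (2.0, 4.0, 10.0):
    print(total_time, count_negative_modes(total_time))
```

With these assumed parameters the counts come out as 0, 1, and 3: negative modes appear once the interval is longer than half the natural period \( \pi\sqrt{m/k} \), which is the finite-interval counterpart of the low-frequency modes with \( |\omega| < \sqrt{k/m} \) carrying negative eigenvalues.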

An Analogy to the Quantum Harmonic Oscillator

It might be interesting to try to reverse-engineer a potential such that the second derivative of the action gives an operator with the same structure as that of a quantum harmonic oscillator. In particular, this means constructing a potential such that \[ \frac{\delta^{2} S}{\delta x(t) \delta x(t')} = \left(k\omega_{0}^{2} t^{2} - m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}}\right) \delta(t - t') \] must hold (and this is independent of the trajectory). This can be obtained from \[ V(x) = -\frac{k\omega_{0}^{2}}{2} x^{2} t^{2} \] which is a negative quadratic potential (effectively a potential promoting exponential growth or decay instead of oscillation) whose magnitude increases quadratically with time. The equations of motion are given by \[ m\ddot{x} = k\omega_{0}^{2} xt^{2} \] and its solutions are given in terms of parabolic cylinder functions; interestingly, for different parabolic cylinder function index values, these parabolic cylinder functions are related to Hermite polynomials, which are involved in the position-space representation of the quantum harmonic oscillator Hamiltonian eigenvectors, but those parameter values are not relevant to this particular problem.
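
As a rough numerical illustration of the analogy, the operator can be rescaled with \( t = s\,(m/(k\omega_{0}^{2}))^{1/4} \) so that it becomes \( \omega_{0}\sqrt{mk} \left(s^{2} - \frac{\mathrm{d}^{2}}{\mathrm{d}s^{2}}\right) \). The sketch below discretizes the dimensionless operator on a finite box; the box size, grid, and hard-wall boundary conditions are assumptions made purely for the numerics.

```python
import numpy as np

half_width, n = 8.0, 800
h = 2.0 * half_width / (n + 1)
s = -half_width + h * np.arange(1, n + 1)   # dimensionless time grid

# -d^2/ds^2 + s^2 with eigenfunctions forced to vanish at the edges of the box
operator = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2 + np.diag(s**2)
lowest = np.linalg.eigvalsh(operator)[:4]
print(lowest)
```

The lowest eigenvalues land close to 1, 3, 5, and 7, the familiar evenly spaced oscillator ladder, so in this constructed example the eigenvalues of the second derivative of the action are all positive.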

Particle in an Exponential Potential

It might be more instructive to consider a potential which yields analytically solvable equations of motion while also yielding an action whose second derivative depends nontrivially on the trajectory. An example of this could be the exponential potential \[ V(x) = V_{0} \exp(x/x_{0}) \] for some energy scale \( V_{0} \) and length scale \( x_{0} \); it is also helpful to define the force scale \( F_{0} \equiv \frac{V_{0}}{x_{0}} \) and the "spring constant" scale \( k_{0} \equiv \frac{V_{0}}{x_{0}^{2}} \) so that \( F = -F_{0} \exp(x/x_{0}) \) and \( k = k_{0} \exp(x/x_{0}) \). The equations of motion are \[ m\ddot{x} = -F_{0} \exp(x/x_{0}) \] and defining the acceleration scale \( a_{0} = \frac{F_{0}}{m} \) and separating variables with \( v \equiv \dot{x} \) yields \( v~\mathrm{d}v = -a_{0}\exp(x/x_{0})~\mathrm{d}x \). If the initial conditions are taken to be \( x(0) = 0 \) and \( \dot{x}(0) = 0 \) then integrating both sides yields \( \frac{\dot{x}^{2}}{2} = a_{0} x_{0} (1 - \exp(x/x_{0})) \) (which can also be derived from the statement that the total energy is \( V_{0} \)). This can be further integrated with the initial conditions to yield \( x(t) = x_{0}\ln\left(1 - \tanh^{2}\left(\frac{t}{\sqrt{2}t_{0}}\right)\right) \) (where \( t_{0} = \sqrt{\frac{x_{0}}{a_{0}}} \) is a time scale to make the units work correctly). This yields \[ \frac{\delta^{2} S}{\delta x(t) \delta x(t')}\bigg|_{x} = \left(-k_{0} \left(1 - \tanh^{2}\left(\frac{t}{\sqrt{2}t_{0}}\right)\right) - m\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}}\right)\delta(t - t') \] as the second derivative of the action evaluated for the physical trajectory. The equation for finding the eigenvalues in dimensionless form is therefore \[ -\frac{1}{2} \frac{\mathrm{d}^{2} \psi}{\mathrm{d}\tau^{2}} - \operatorname{sech}^{2} (\tau) \psi = \lambda\psi \] where \( \tau = t/(\sqrt{2} t_{0}) \), \( \lambda \) is the eigenvalue, and \( \psi \) is the eigenfunction. 
This actually corresponds to a solvable one-dimensional quantum mechanical problem (a Pöschl-Teller potential well); any \( \lambda > 0 \) is allowed, but there is also a single discrete bound state corresponding to \( \lambda = -\frac{1}{2} \) (in units of \( k_{0} \)), with eigenfunction \( \psi(\tau) = \operatorname{sech}(\tau) \). Thus, the action technically reaches a saddle point for the physical trajectory.
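
Both the closed-form trajectory and the bound state can be sanity-checked numerically. The sketch below works in units where \( a_{0} = x_{0} = 1 \) (so \( t_{0} = 1 \)), and the grid, box size, and hard-wall boundary conditions are assumptions made for the numerics rather than part of the derivation.

```python
import numpy as np


def x_analytic(t):
    """Trajectory x(t) = ln(1 - tanh^2(t / sqrt(2))) in units with a0 = x0 = 1."""
    return np.log(1.0 - np.tanh(t / np.sqrt(2.0)) ** 2)


# Residual of the equation of motion x'' = -exp(x), using a central difference for x''
t = np.linspace(0.0, 3.0, 31)
step = 1e-4
acceleration = (x_analytic(t + step) - 2.0 * x_analytic(t) + x_analytic(t - step)) / step**2
residual = float(np.max(np.abs(acceleration + np.exp(x_analytic(t)))))

# Dimensionless eigenvalue problem -(1/2) psi'' - sech^2(tau) psi = lambda psi on [-10, 10]
half_width, n = 10.0, 400
h = 2.0 * half_width / (n + 1)
tau = -half_width + h * np.arange(1, n + 1)
kinetic = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / (2.0 * h**2)
operator = kinetic - np.diag(1.0 / np.cosh(tau) ** 2)

eigenvalues = np.linalg.eigvalsh(operator)   # sorted in ascending order
negative_count = int(np.sum(eigenvalues < 0.0))
print(residual, negative_count, eigenvalues[0])
```

The residual stays tiny, and the spectrum shows exactly one negative eigenvalue, close to \( -\frac{1}{2} \), with all remaining eigenvalues positive in the box.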