## 2013-12-23

### Gibbs Entropy and Two-Level Systems

Today, I was browsing through the MIT news page when I saw this article about how two mathematicians claim to have disproved the notion of negative temperature. My heart sank, because one of the coolest things I remembered learning in 8.044 — Statistical Physics I was the notion of negative temperature existing, being hotter than hot, and being experimentally realizable. I also became confused when the article referred to Gibbs entropy, because the definition I thought was being used for Gibbs entropy was $S = -\sum_j p_j \ln(p_j)$ which is exactly equivalent to the Boltzmann entropy $S = \ln(\Omega)$ where $p_j = \frac{1}{\Omega}$ in the microcanonical ensemble. I figured this would mean that the Gibbs entropy would exactly reproduce negative temperature results in systems with bounded energies such as two-level systems. I wasn't able to read the most recent paper as discussed in the news article, because it is behind a paywall, but I was able to read this article by the same authors, which appears to lay the foundational ideas behind the most recent paper. It seems like on my end, the misconception appears to hinge on what one would call the Gibbs entropy. The formula $S = \ln(\Phi)$ appears to be the correct one for the Gibbs entropy, where $\Phi$ is the total number of states with energy not greater than $E$ and $\Omega = \frac{d\Phi}{dE}$ is the number of states with energy exactly equal to $E$ quantum mechanically (or the number of states with energy within a sufficiently small neighborhood of $E$ in the classical limit). With this in mind, follow the jump to see how this might work for a two-level system and explore the other implications of this new definition of statistical entropy. (UPDATE: Note that in all of this, $k_B = 1$.)
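As a quick check of the equivalence I had in mind (with $k_B = 1$ and the $\Omega$ microstates at energy $E$ all equally likely), the sum collapses term by term:

$$
S = -\sum_{j=1}^{\Omega} p_j \ln(p_j) = -\sum_{j=1}^{\Omega} \frac{1}{\Omega} \ln\left(\frac{1}{\Omega}\right) = \Omega \cdot \frac{1}{\Omega} \ln(\Omega) = \ln(\Omega)
$$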

A two-level system is defined as follows: consider a system of $N$ identical particles fixed at distinct sites on a grid (so that they can be told apart by position), each of which can have an energy of 0 ("off") or $\epsilon$ ("on"). Consider that $N$ is fixed and that the total energy $E$ may be directly varied by the experimenter. This means that $\Omega = \binom{N}{\frac{E}{\epsilon}}$ is the number of states with energy exactly equal to $E$, because the number of particles in the "on" state is $\frac{E}{\epsilon}$, and counting the ways to choose which sites are "on" gives the binomial coefficient. The only other restriction is that $0 \leq E \leq N\epsilon$. Given this, it is easy to see that $\Omega$ plotted against $E$ takes the value 1 at $E = 0$ and at $E = N\epsilon$, and reaches a very tall maximum with a very narrow peak region around $E = \frac{N\epsilon}{2}$.
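To make the counting concrete, here is a minimal sketch in Python. The system size `N = 20` and the helper name `omega` are my own illustrative choices; energies are measured in units of $\epsilon$, so `n_on` stands for $\frac{E}{\epsilon}$.

```python
from math import comb

def omega(N: int, n_on: int) -> int:
    """Number of microstates with exactly n_on sites 'on' (energy E = n_on * eps)."""
    return comb(N, n_on)

N = 20  # illustrative system size (an assumption for this sketch)
counts = [omega(N, n) for n in range(N + 1)]

# Omega at E = 0, at E = N*eps/2, and at E = N*eps
print(counts[0], counts[N // 2], counts[N])  # prints: 1 184756 1
```

Even at this tiny $N$, the peak at half filling is already five orders of magnitude above the endpoints, and it sharpens further as $N$ grows.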

Traditionally, with the Boltzmann entropy, we would say $S = \ln(\Omega)$ which looks sort of like a parabola with zeros at $E = 0$ and $E = N\epsilon$ and a maximum at $E = \frac{N\epsilon}{2}$, so $\frac{1}{T} = \frac{\partial S}{\partial E}$ is positive for $E < \frac{N\epsilon}{2}$, zero at $E = \frac{N\epsilon}{2}$, and is negative for $E > \frac{N\epsilon}{2}$, and $T$ displays the same behavior in sign but of course diverges with both signs when $\frac{1}{T}$ crosses 0. Essentially, the standard interpretation of this is that temperature is a measure of what the probability distribution of states looks like as a function of energy, so positive temperature says that lower energies are more likely, while negative temperature says that higher energies are more likely (i.e. population inversion). This also means that a negative temperature system gains entropy by losing energy to its surroundings, so a negative temperature system is infinitely hotter than a hot positive temperature system (because it will spontaneously give off heat to any positive temperature system).
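The sign change of the Boltzmann temperature can be checked numerically. This is only a sketch: `boltzmann_beta` is a hypothetical helper that estimates $\frac{1}{T} = \frac{\partial S}{\partial E}$ by a centered finite difference of $\ln(\Omega)$, with $k_B = 1$, $\epsilon = 1$, and $N = 100$ chosen arbitrarily.

```python
from math import comb, log

def boltzmann_beta(N: int, n: int, eps: float = 1.0) -> float:
    """Centered finite-difference estimate of 1/T = dS/dE with S = ln(Omega), E = n*eps."""
    s_plus = log(comb(N, n + 1))
    s_minus = log(comb(N, n - 1))
    return (s_plus - s_minus) / (2 * eps)

N = 100  # illustrative system size (an assumption for this sketch)
for n in (10, 50, 90):
    # below half filling, at half filling, above half filling
    print(n, boltzmann_beta(N, n))
```

Below half filling the estimate is positive, at exactly half filling it vanishes by the symmetry of the binomial coefficient, and above half filling it is negative.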

Now let us try the same system with the Gibbs entropy. The state space volume $\Phi = \sum_{k = 0}^{\frac{E}{\epsilon}} \binom{N}{k}$ which in the continuum limit for large $N$ and/or small $\epsilon$ is $\Phi = \int_{0}^{\frac{E}{\epsilon}} \binom{N}{x} dx$ gives the total number of states where the number of particles that are "on" is not greater than (but not necessarily exactly equal to) $\frac{E}{\epsilon}$. If $\Omega$ looks sort of like a Gaussian, then $\Phi$ looks sort of like the integral of a Gaussian which is the 'S'-shaped error function. With the Gaussian's mean shifted to $E = \frac{N\epsilon}{2}$ and $\Phi$ normalized by the total number of states $2^N$, this function starts from something close to 0 at $E = 0$, crosses 0.5 at $E = \frac{N\epsilon}{2}$, and comes close to 1 at $E = N\epsilon$. In any case, it is monotonically increasing in $E$. This means that $S = \ln(\Phi)$ will also always increase with $E$, and so $\frac{1}{T} = \frac{\partial S}{\partial E}$ will always be positive and so will $T$. No negative temperature needs to be considered for this system anymore. Does this make sense? Now we are saying the correct measure of states is the cumulative number of states under a certain energy, and with that logic the total number of states under a certain energy should never decrease as that upper energy bound increases. The entropy should then always increase with energy, and temperature will always be positive, meaning that it always takes more energy to increase the number of states counted for the system (I know that sounds a little redundant/circular).
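The same kind of numerical check works for the Gibbs entropy. In this sketch, `phi` and `gibbs_beta` are hypothetical helper names, $k_B = \epsilon = 1$, and $N = 100$ is arbitrary. The difference of logarithms is evaluated through `log1p` of an exact integer ratio, because near $E = N\epsilon$ the cumulative count changes by far less than double precision can resolve in $\ln(\Phi)$ directly.

```python
from math import comb, log1p

def phi(N: int, n: int) -> int:
    """Cumulative number of microstates with at most n sites 'on'."""
    return sum(comb(N, k) for k in range(n + 1))

def gibbs_beta(N: int, n: int, eps: float = 1.0) -> float:
    """Centered finite-difference estimate of 1/T = dS/dE with S = ln(Phi), E = n*eps.

    ln(hi) - ln(lo) is computed as log1p((hi - lo) / lo) to stay accurate
    when hi and lo are huge and nearly equal.
    """
    lo, hi = phi(N, n - 1), phi(N, n + 1)
    return log1p((hi - lo) / lo) / (2 * eps)

N = 100  # illustrative system size (an assumption for this sketch)
betas = [gibbs_beta(N, n) for n in range(1, N)]
print(all(b > 0 for b in betas))  # the Gibbs inverse temperature never changes sign
```

The inverse temperature stays positive everywhere, but above half filling it becomes extremely small, so the Gibbs temperature there is positive and enormous rather than negative.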

Does any of this really work though? The authors of the paper set out the satisfaction of thermodynamic relations $dS = \frac{dE}{T} - \sum_j \frac{J_j}{T} dX_j$ as the ultimate goal, and show that their use of Gibbs entropy and the associated definition of temperature satisfy this at all scales. It makes some sense because a laser is an example of a system that exhibits population inversion, yet if I stick a thermometer inside a laser in its inverted state (i.e. prior to relaxation), I most certainly will not measure a negative temperature. That is because, in the words of Wikipedia (because I would not say it better), "this is not the macroscopic temperature of the material, but instead the temperature of only very specific degrees of freedom, that are isolated [emphasis mine] from others and do not exchange energy by virtue of the equipartition theorem". So that's cool. Gibbs temperature really does seem like a better measure of thermodynamic temperature under a larger range of conditions, even if Boltzmann temperature gives a more obvious clue to the statistics of the system through the sign change.

But there's another test that temperature has to pass, and that is the zeroth law of thermodynamics from the statistical perspective. From the 8.044 (Statistical Physics I) notes written by the professor of that class that year, the whole point of the microcanonical ensemble is that states of equal energy are equally likely, so if the system is isolated with a fixed energy $E$, then all accessible states of energy $E$ have the same probability $\frac{1}{\Omega}$. The paper does not dispute this and in fact puts this out there (in quantum mechanical density operator form) as a definition for the microcanonical ensemble. Moreover, if the system is partitioned by a wall into systems 1 & 2 such that the wall allows for the exchange of energy but not particles, then the probability that system 1 has energy $E_1$ (and system 2 has energy $E - E_1$) is $p(E_1) = \frac{\Omega_1 (E_1) \Omega_2 (E - E_1)}{\Omega (E)}$, which again follows from all microstates of the combined system at total energy $E$ being equally probable. This probability is maximized as a function of $E_1$ at the same point that $\ln(p(E_1))$ is maximized over $E_1$, and that happens when $\frac{\partial \ln(\Omega_1)}{\partial E_1} = \frac{\partial \ln(\Omega_2)}{\partial (E - E_1)}$ which exactly recovers equality of Boltzmann temperatures rather than Gibbs temperatures, because this says $S = \ln(\Omega)$ rather than $S = \ln(\Phi)$ if $\frac{1}{T} = \frac{\partial S}{\partial E}$ in both cases. There has to be an inconsistency here. Is the notion of maximal probability giving thermodynamic equilibrium now not consistent with the notion of Gibbs temperature? Is the way I partitioned the system and counted states/probabilities inconsistent with Gibbs temperature? Or is the equality of Boltzmann temperatures derivable from a similar relation equating Gibbs temperatures? 
Does that last statement lead to the idea that the probabilities of microstates can be expressed in terms of $\Phi$ rather than $\Omega$, and that maximizing such probabilities can lead to $\frac{\partial \ln(\Phi_1)}{\partial E_1} = \frac{\partial \ln(\Phi_2)}{\partial (E - E_1)}$ or something like that? I'm wondering if this issue falls into one of the warnings issued by the authors about how neither Boltzmann nor Gibbs entropies accurately reproduce Shannon entropy in the microcanonical ensemble, and so Gibbs entropy is not expected to reproduce probability maximization at thermal equilibrium. That would then seem to violate a tenet put forth right at the beginning of the paper about the form and meaning of the microcanonical density matrix.
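The partition argument above can be probed numerically with two two-level subsystems sharing a fixed number of energy quanta. Everything specific here is an assumption made for illustration: the sizes `N1 = 40` and `N2 = 160`, the total `n_total = 150` (chosen in the inverted regime, where the two definitions differ most), the helper names, and the use of centered finite differences with $k_B = \epsilon = 1$.

```python
from math import comb, log, log1p

def phi(N: int, n: int) -> int:
    """Cumulative number of microstates with at most n sites 'on'."""
    return sum(comb(N, k) for k in range(n + 1))

def boltzmann_beta(N: int, n: int) -> float:
    """Centered difference of ln(Omega) with respect to n (eps = 1)."""
    return (log(comb(N, n + 1)) - log(comb(N, n - 1))) / 2

def gibbs_beta(N: int, n: int) -> float:
    """Centered difference of ln(Phi) with respect to n, via a stable log1p form."""
    lo, hi = phi(N, n - 1), phi(N, n + 1)
    return log1p((hi - lo) / lo) / 2

N1, N2, n_total = 40, 160, 150  # hypothetical subsystem sizes and shared quanta

def weight(n1: int) -> int:
    """Unnormalized p(E_1): Omega_1(n1) * Omega_2(n_total - n1)."""
    return comb(N1, n1) * comb(N2, n_total - n1)

# Most probable split of the energy between the two subsystems
n1_star = max(range(N1 + 1), key=weight)
n2_star = n_total - n1_star

beta_b1, beta_b2 = boltzmann_beta(N1, n1_star), boltzmann_beta(N2, n2_star)
beta_g1, beta_g2 = gibbs_beta(N1, n1_star), gibbs_beta(N2, n2_star)
print(n1_star, n2_star)
print("Boltzmann 1/T:", beta_b1, beta_b2)
print("Gibbs 1/T:    ", beta_g1, beta_g2)
```

At the most probable split, the two Boltzmann inverse temperatures agree to within a finite-size correction of a few percent, while the two Gibbs inverse temperatures differ by many orders of magnitude. That is at least consistent with my worry that probability maximization picks out equality of Boltzmann rather than Gibbs temperatures.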

As I look at this paper more, though, I'm beginning to realize that while the paper is mathematically sound, there are significant sections of it that don't make a whole lot of physical sense. Look carefully at the examples of the quantum harmonic oscillator or the particle in a box. The math is perfectly correct. However, if I look closely at the energies, these are single-particle energies, yet the total number of states ($n$ for a particle in a box because $n \geq 1$, $n + 1$ for an oscillator as $n \geq 0$) is calculated from these single-particle energies. Thermodynamics and statistical mechanics require large particle numbers $N$ for these analyses to be meaningful. One of the key lessons I remember learning twice over sophomore year (once informally in 8.223 (Classical Mechanics II) when I proposed a naïve final project idea combining classical analysis with thermodynamics, and again formally in lecture in 8.044 (Statistical Physics I)) was that thermodynamics and associated observables like temperature only make physical sense when a very large number of particles is considered; otherwise, it is in principle and in practice possible to evolve a system of $N$ particles for small $N$ exactly, and the "temperature of a single particle" does not make sense. If the authors were serious about considering a system statistically, the total energy would in fact be the sum of single-particle energies, one indexed for each particle. Doing so would present none of the issues that supposedly appear when using the Boltzmann entropy, because in the cases of both the harmonic oscillator and the particle in a box, the energies are unbounded from above. Sure, it's possible to make extensive quantities intensive by considering the energy per particle or the heat capacity per particle as a function of intensive quantities like temperature, but it is physically nonsense to consider temperature or heat capacity for a single particle without considering an ensemble of copies. 
It is merely an accident of the math that the heat capacities calculated from the Gibbs entropy coincide with the well-known results derived from using Boltzmann entropy and properly extensive systems. Meanwhile, the Boltzmann entropy results for single particles (not for ensemble systems taken per particle) yield temperatures that look wacky exactly because thermodynamics doesn't work for small numbers of particles. In that way, I would actually argue that the Boltzmann temperature is more informative than the Gibbs temperature, because the Gibbs temperature gives a false sense that considering a single particle thermodynamically or statistically is acceptable.

Those are the most egregious issues. There are a few others as well. In the paper's discussion of the ideal gas, nothing mathematically unsound was done. However, let us examine the claim that $Nd = 1$ yields a negative temperature and $Nd = 2$ yields infinite temperature. If $N$ and $d$ are integers, then $Nd = 1$ can only occur when $N = 1$ and $d = 1$. This essentially says that a single free particle has a negative temperature, but didn't I say earlier that temperature is not physically well-defined for a single particle? Yeah, it looks like that is coming back to haunt this argument. If $Nd = 2$ then $N = 2$ and $d = 1$ or $N = 1$ and $d = 2$; the latter case fails for the same reason as the previous case ($Nd = 1$), while the former fails for essentially the same reason (two free particles can be exactly solved, so temperature as a derived statistical quantity is still meaningless).
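For reference, here is where those two cases come from, as I understand the standard power-law counting for a classical ideal gas of $N$ particles in $d$ dimensions (with $k_B = 1$; the labels $T_G$ and $T_B$ for the Gibbs and Boltzmann temperatures are my own shorthand):

$$
\Phi(E) \propto E^{\frac{Nd}{2}}, \qquad \Omega(E) = \frac{d\Phi}{dE} \propto E^{\frac{Nd}{2} - 1}
$$

$$
\frac{1}{T_G} = \frac{\partial \ln(\Phi)}{\partial E} = \frac{Nd}{2E}, \qquad \frac{1}{T_B} = \frac{\partial \ln(\Omega)}{\partial E} = \frac{1}{E} \left(\frac{Nd}{2} - 1\right)
$$

So $Nd = 1$ gives $\frac{1}{T_B} = -\frac{1}{2E} < 0$ and $Nd = 2$ gives $\frac{1}{T_B} = 0$, while for $Nd \gg 1$ the two temperatures become indistinguishable.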

The next discussion in the paper is one of a similar system to the two-state system discussed earlier in this post, where the energies are quantized and bounded on both sides. There, using the Gibbs entropy leads to a temperature which increases with energy, while using the Boltzmann entropy leads to the negative temperature issue for higher energies (corresponding to population inversion). The good thing about this example in the paper is that the energy is properly extensive by being a sum of single-particle energies for all $N$ particles (rather than being a single-particle energy alone). However, given all the issues I have raised earlier, while I do welcome the notion that a temperature could be defined such that it is positive for all energies so that it sidesteps the issue of Boltzmann entropy considering only the isolated degrees of freedom, I am not really convinced that the Gibbs temperature is the proper definition that coincides with what a thermometer would measure when it is stuck inside a population-inverted system.

There are a bunch of myths and facts in the paper after that. The one that I have the biggest issue with is the assertion that the Boltzmann and Gibbs entropies do not reproduce the Shannon entropy in the microcanonical ensemble. This is patently false for the Boltzmann entropy both for the two-level system ($p_j = \frac{1}{\Omega}$) and for the classical ideal gas, because in the former case I showed at the very beginning how the Boltzmann and Shannon entropies are equal in the microcanonical ensemble, while in the latter case the Boltzmann H-function is identical to a continuous Shannon entropy (perhaps modulo an overall sign), using the phase space probability density $\rho$ derived from the BBGKY hierarchy. (Quantum mechanically, the entropy to be considered is the von Neumann entropy, which is essentially the operator version of the H-function or Shannon entropy, using the density matrix rather than the phase space probability density.) Clearly, things will change using the integrated $\Phi$ rather than $\Omega$, so in fact the Boltzmann entropy does reproduce the Shannon entropy, while the Gibbs entropy does not. Moreover, while I am forgetting a lot of the details at the moment, I would recommend reading the second and third chapters of Kardar's book "Statistical Physics of Particles" (I know I'm somewhat biased because I took his class 8.333 — Statistical Mechanics I) to really appreciate the connections between thermodynamic (Boltzmann) and information (Shannon) entropies, showing that the Gibbs entropy is in fact less desirable for this reason among the others mentioned above.

Overall, the mistake the authors make is missing the forest for the trees. They want to be able to create a definition of entropy that simultaneously gives thermodynamic results and is valid for systems of arbitrary size. What they fail to see is that thermodynamics is by definition only valid for systems of very large size and is fully empirical, because in its strictest form classical thermodynamics essentially denies the existence of particles and can thereby only consider bulk materials and bulk properties. To remedy this, statistical mechanics accounts for particles, but it can still only deal with large numbers of particles if it is to reproduce thermodynamic results (hence $N \gg 1$ being the thermodynamic limit); if $N = O(1)$, then what is left is deterministic evolution of a state consisting of a handful of particles which can be determined exactly, so if a statistical treatment is desired even in this small limit, it certainly will not be able to reproduce thermodynamic results like the differential equation for entropy in terms of other variables. This means that the seemingly nice-looking Gibbs results on single-particle systems are really nonsense (while the Boltzmann results make clear that such consideration would lead to nonsense anyway), and this calls into question the validity of other Gibbs results like for the quantized bounded energy system of $N$ particles. Moreover, the decoupling of $p_j = \frac{1}{\Omega}$ from $S = \ln(\Phi)$ throws into question the other relations the Gibbs results make between thermodynamics and microstate statistics, whereas the Boltzmann entropy $S = \ln(\Omega)$ has no such issue. That said, I can't find fault with the argument that $\langle a \rangle = \mathrm{trace}(a\rho)$ implies the Gibbs entropy rather than the Boltzmann entropy must be correct; I really need to think harder about this, because this does seem to be a blow to Boltzmann entropy if it is indeed true. 
In conclusion, I think the authors have a bit more to learn about appropriate limits in thermodynamics and statistical mechanics, and I have a bit more to learn about Gibbs entropy and consistency between quantum expectation values & thermodynamic averages.

Happy Festivus everyone!

(UPDATE: There were a few clarifications I wanted to make earlier, along with a few other things that I thought about.

The first regards the consistency relation that $\langle a \rangle = \mathrm{trace}(a\rho)$ implies that the Gibbs entropy is more correct than the Boltzmann entropy. I still cannot find fault with this argument. Furthermore, I tried it the other way, starting from the definition of the Boltzmann entropy and trying to arrive at the definition of a density matrix expectation value, but I could not do it like I could for the Gibbs entropy. The second regards the interpretation of temperature. If $\frac{1}{T} = \frac{\partial S}{\partial E}$, I could say that the inverse temperature measures the fractional increase in the number of available states after an infinitesimal addition of energy, and so energy flows from an object with higher positive temperature to one with lower positive temperature because more states become available to the combined system in the process (even if fewer states are available to the hotter body). This works well for traditional systems like classical ideal gases. For a two-level system in which the object at negative temperature comes in contact with an object at positive temperature, using the Boltzmann definitions, the number of states available to the system is always higher if energy flows from the negative temperature object to the positive temperature one, because in this case both subsystems gain available states through that energy transfer. The problem I have with the argument that in a bounded quantized system the higher energy levels must be distinguishable somehow from the lower energy levels in the counting of states (which would motivate using $\Phi$ rather than $\Omega$) is that the microcanonical rather than the canonical ensemble is used, and if all states at a given energy are equally likely and that energy is fixed, then in fact there is no reason to treat the different energies differently. 
When the authors talk about experimentalists being able to distinguish between high and low energies, that experimentalist is most likely working in a canonical ensemble (because a microcanonical ensemble is empirically very difficult to maintain), in which the different energies are treated differently by virtue of the temperature (rather than the total energy) being the independently controlled variable. This rather undercuts the authors' other arguments about how certain facets of the Boltzmann entropy are not applicable because they only pertain to the canonical ensemble. Moreover, because the energy is fixed, there is no good reason to consider all energies below that one when counting the number of states, because that seems to be reasoning more reminiscent of the canonical ensemble (unless the end goal is $\Omega = \frac{d\Phi}{dE}$ within the microcanonical formulation).
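The state-counting argument for energy flow made earlier in this update (both subsystems gain states when a quantum moves from the negative temperature object to the positive temperature one) is easy to check with the Boltzmann counting. The sizes and fillings below are arbitrary illustrative choices, with energies in units of $\epsilon$.

```python
from math import comb

N = 50                   # sites in each two-level subsystem (illustrative)
hot_n, cold_n = 40, 10   # 'hot' is population-inverted (n > N/2); 'cold' is not

# Microstate counts (Omega_hot, Omega_cold) before and after one quantum
# of energy moves from the inverted subsystem to the ordinary one.
before = (comb(N, hot_n), comb(N, cold_n))
after = (comb(N, hot_n - 1), comb(N, cold_n + 1))

print(after[0] > before[0], after[1] > before[1])
```

Both counts go up, so the product $\Omega_1 \Omega_2$ for the combined system necessarily goes up as well.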

Next, I wanted to clarify a bit about the Boltzmann entropy being an information entropy especially for a classical ideal gas. Solving the BBGKY hierarchy may not be exactly doable. One approximation is the Boltzmann (yes, lots of things were named after him) collisional approximation, in which the phase space density is approximated to be the same before and after a collision and unknowable during the collision. This literally throws information away, so information entropy is gained. This information entropy increase is also exactly the thermodynamic entropy increase that is observed when an ideal gas confined to one part of a container is suddenly (not quasistatically) allowed to enter the other region of the container. Furthermore, this is a fully microcanonical system. So why again does the Boltzmann entropy not correspond to information entropy?
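As a worked version of that free expansion example (a sketch assuming a classical ideal gas with fixed $E$ and $N$, $k_B = 1$, and initial and final volumes that I am calling $V_i$ and $V_f$), the number of microstates at fixed energy scales as $\Omega \propto V^N$, so

$$
\Delta S = \ln\left(\frac{\Omega_f}{\Omega_i}\right) = N \ln\left(\frac{V_f}{V_i}\right)
$$

which is $N \ln(2)$ if the volume doubles. That is exactly one bit of lost information per particle (which half of the container it is in), expressed in natural units.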

After that, I wanted to discuss a little bit about the Carnot efficiency for negative temperature. I do agree with the authors' statements that population inversion and relaxation in a two-level system at negative temperature would be sudden and not quasistatic, meaning that temperature would be less well-defined, and the efficiency would not be the same as that of a Carnot engine. That said, there is still use in negative temperature. One application of a two-level system is a laser. In stimulated emission, more photons exit than enter. This would certainly seem to produce an efficiency larger than 1, and negative temperature makes this clear from that perspective. On the other hand, the energy required to achieve the population inversion must have come from somewhere. Moreover, if a two-level system comes in contact with something at positive temperature and is allowed to relax, the two-level system is no longer isolated, energy is no longer conserved within the two-level system by construction, and the microcanonical ensemble is no longer correct to begin with. Thus, while I think it certainly is beneficial to caution against blindly using negative temperature in various formulas, I also think the authors' concerns are a bit overblown, especially as the physical interpretations of negative Boltzmann temperature are quite well-established at this point.

Finally, I wanted to conclude this update by reiterating some of the points I made in the original conclusion. More is different. Statistical mechanics depends on there being large numbers of degrees of freedom to work correctly, exactly because statistics itself depends on large sample sizes to be meaningful. Statistical mechanics is heavily based on probability & statistics, so blindly using such tools on small numbers of degrees of freedom will inevitably lead to problems. This should put to rest the notion that Gibbs entropy is somehow better because it can account for the thermodynamics of small particle numbers where Boltzmann entropy cannot; that statement is nonsense because thermodynamics and statistical mechanics are not physically or empirically meaningful for small particle numbers in the first place. I do think the paper raises important points about the consistency of Boltzmann entropy in the microcanonical ensemble with quantum statistical expectation values along with the dangers of relying too heavily on negative temperature to bring obvious physical predictions. Beyond that, though, I think the paper's suggestion of using Gibbs entropy rather than Boltzmann entropy has a lot of issues, and I remain unconvinced of its supposed merits.

I think I will have access to the newer paper when I get back to MIT after break. Also, I very much welcome you to criticize, point out flaws in, reject, or otherwise comment on my arguments. If enough differences arise between now and when I get to read the paper or after I read the paper (i.e. if it turns out that the arguments in this older paper are made much stronger in the newer paper), then I will post a follow-up to this. Until then, this is basically what I have to say about this.

(UPDATE: Oops, it looks like I had forgotten to add another thing. This also means this second update section will be updated as needed until and unless a follow-up post happens. What I had forgotten to add was that the reason why the heat capacities coincidentally look like what they might from Boltzmann entropy/equipartition is that $\Omega$ follows a power law dependence in $E$, so $\Phi$ will as well, and taking the limit of large $N$ would erase any differences between the two approaches. (This is also why the argument that $\frac{Nd}{2}$ is different from $\frac{Nd}{2} - 1$ is nonsense, because statistical mechanics only works in the limit that those two quantities are essentially the same.) If there were a more complicated power series dependence, the Gibbs entropy would fail to reproduce the familiar results.))