I feel that the process of using statistical ensembles to find properties of thermal systems is not rigorous enough. There are some operations that need to be defined precisely. Also, the approach is not general enough. Currently, the only commonly used statistical ensembles are the microcanonical ensemble, the canonical ensemble, and the grand canonical ensemble, but there are actually other possible ensembles that are potentially useful. Therefore, I feel it necessary to attempt a mathematical formulation.

Mathematical tools and notations

Suppose $(\Omega,\sigma(\Omega),P)$ is a probability space. Suppose $W$ is an affine space. For a map $f:\Omega\to W$, we define the $P$-expectation of $f$ as

$$\mathrm E_P\!\left[f\right]\coloneqq\int_{x\in\Omega}\left(f(x)-e_0\right)\mathrm dP(x)+e_0,$$

where $e_0\in W$ is arbitrary. Here the integral is a Pettis integral. The expectation is defined whenever the Pettis integral is defined, and it is then well-defined in the sense that it is independent of the $e_0$ we choose.
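As a quick check of the claimed independence: if $e_0'=e_0+v$ for some $v\in\vec W$, then, by the linearity of the Pettis integral and $P(\Omega)=1$,

$$\int_{x\in\Omega}\left(f(x)-e_0-v\right)\mathrm dP(x)+e_0+v=\int_{x\in\Omega}\left(f(x)-e_0\right)\mathrm dP(x)-v+e_0+v=\mathrm E_P\!\left[f\right].$$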


Suppose $X,Y$ are Polish spaces. Suppose $(Y,\sigma(Y),\mu)$ and $(X,\sigma(X),\nu)$ are measure spaces, where $\mu$ and $\nu$ are σ-finite Borel measures. Suppose $\pi:Y\to X$ is a measurable map so that

$$\forall A\in\sigma(X):\nu(A)=0\Rightarrow\mu\!\left(\pi^{-1}\!\left(A\right)\right)=0.$$

Then, for each $x\in X$, there exists a Borel measure $\mu_x$ on the measurable subspace $\left(\pi^{-1}(x),\sigma\!\left(\pi^{-1}(x)\right)\right)$, such that for any integrable function $f$ on $Y$,

$$\int_{y\in Y}f\!\left(y\right)\mathrm d\mu(y)=\int_{x\in X}\mathrm d\nu(x)\int_{y\in\pi^{-1}(x)}f\!\left(y\right)\mathrm d\mu_x(y).$$


Proof. Because $\mu$ is σ-finite, we have a countable covering of $Y$ by pairwise disjoint measurable sets of finite $\mu$-measure, denoted as $\left\{Y_i\right\}$. Each $Y_i$ automatically inherits the σ-algebra from $Y$, and $\left(Y_i,\sigma\!\left(Y_i\right),\mu\right)$ is a measure space.

Define $\pi_i:Y_i\to X$ as the restriction of $\pi$ to $Y_i$; then $\pi_i$ is automatically a measurable map from $Y_i$ to $X$, and for any $x\in X$,

$$\pi^{-1}(x)=\bigcup_i\pi_i^{-1}(x),$$

and the terms in the union are pairwise disjoint.

Let $\nu_i$ be a measure on $X$ defined as

$$\nu_i(A)\coloneqq\mu\!\left(\pi_i^{-1}\!\left(A\right)\right).$$

This is a measure because $\pi_i$ is a measurable map. According to the disintegration theorem, for each $x\in X$, there exists a Borel measure $\mu_{i,x}$ on $Y_i$ such that for $\nu$-almost all $x\in X$, $\mu_{i,x}$ is concentrated on $\pi_i^{-1}(x)$ (in other words, $\mu_{i,x}\!\left(Y_i\setminus\pi_i^{-1}(x)\right)=0$); and for any integrable function $f$ on $Y_i$,

$$\int_{y\in Y_i}f\!\left(y\right)\mathrm d\mu(y)=\int_{x\in X}\mathrm d\nu_i(x)\int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_{i,x}(y).$$

From the condition in the original proposition, we can easily prove that $\nu_i$ is absolutely continuous w.r.t. $\nu$. Therefore, we have the Radon–Nikodym derivative

$$\varphi_i(x)\coloneqq\frac{\mathrm d\nu_i(x)}{\mathrm d\nu(x)}.$$

For each $x\in X$, define the measure $\mu_x$ on $\pi^{-1}(x)$ as

$$\mu_x(A)\coloneqq\sum_i\varphi_i\!\left(x\right)\mu_{i,x}\!\left(A\cap Y_i\right).$$

This is a well-defined measure because the sets $A\cap Y_i$ are pairwise disjoint and each $\mu_{i,x}$ is a well-defined measure on $Y_i$.

Then, for any integrable function $f$ on $Y$,

$$\begin{align*}
\int_{y\in Y}f\!\left(y\right)\mathrm d\mu(y)
&=\sum_i\int_{y\in Y_i}f\!\left(y\right)\mathrm d\mu(y)\\
&=\sum_i\int_{x\in X}\mathrm d\nu_i(x)\int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_{i,x}(y)\\
&=\sum_i\int_{x\in X}\varphi_i\!\left(x\right)\mathrm d\nu(x)\int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_{i,x}(y)\\
&=\int_{x\in X}\mathrm d\nu(x)\sum_i\int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_x(y)\\
&=\int_{x\in X}\mathrm d\nu(x)\int_{y\in\pi^{-1}(x)}f\!\left(y\right)\mathrm d\mu_x(y).
\end{align*}\qquad\square$$

Here, the family of measures $\left\{\mu_x\right\}$ is called the disintegration of $\mu$ w.r.t. $\pi$ and $\nu$.
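A familiar special case may serve as a sanity check: take $Y\coloneqq\mathbb R^2$, $X\coloneqq\mathbb R$, $\pi(x,y)\coloneqq x$, and let $\mu,\nu$ be the Lebesgue measures on $\mathbb R^2$ and $\mathbb R$. Then each $\mu_x$ is the one-dimensional Lebesgue measure on the fiber $\pi^{-1}(x)=\{x\}\times\mathbb R$, and the identity above reduces to Fubini's theorem:

$$\int_{\mathbb R^2}f\,\mathrm d\mu=\int_{x\in\mathbb R}\mathrm d\nu(x)\int_{y\in\{x\}\times\mathbb R}f\,\mathrm d\mu_x(y).$$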


For two vector spaces $\vec W_1,\vec W_2$, we denote their direct sum by $\vec W_1\times\vec W_2$. Also, rather than calling the new vector space their direct sum, I prefer to call it their product vector space (not to be confused with the tensor product) so that it is consistent with the notions of product affine spaces, product measure spaces, product topologies, etc. All such product spaces are denoted by “$\times$” in this article.

Also, “$\vec W_1$” can be used as an abbreviation of $\vec W_1\times\left\{0_2\right\}$, where $0_2$ is the zero vector in $\vec W_2$.


Suppose $W$ is an affine space associated with the vector space $\vec W$. For any $A\subseteq W$ and $B\subseteq\vec W$, we denote by $A+B$ the Minkowski sum of $A$ and $B$, i.e.,

$$A+B\coloneqq\left\{a+b\,\middle|\,a\in A,\,b\in B\right\}.$$

This extends the usual definition of Minkowski sums to affine spaces.

By the way, because of the abbreviation “$\vec W_1$” for $\vec W_1\times\left\{0_2\right\}$ above, we can abuse the notation and write

$$\vec W_1+\vec W_2=\vec W_1\times\vec W_2,$$

where “$+$” denotes the Minkowski sum. This is true for any two vector spaces $\vec W_1,\vec W_2$ that do not share a non-trivial vector subspace.


In general, it is not necessarily possible to decompose a topology as a product of two topologies. However, for a locally convex Hausdorff TVS it is always possible when one of the factors is finite-dimensional: we can always decompose the topology of a locally convex Hausdorff TVS as the product of the topologies on a pair of complementary vector subspaces, one of which is finite-dimensional. This is true because every finite-dimensional subspace of such a space is topologically complemented. The complete statement is the following:

Let $\vec W$ be a locally convex Hausdorff TVS. For any finite-dimensional subspace $\vec W^\parallel$ of $\vec W$, there is a complement $\vec W^\perp$ of it such that the topology $\tau\!\left(\vec W\right)$ is the product topology of $\tau\!\left(\vec W^\parallel\right)$ and $\tau\!\left(\vec W^\perp\right)$.

This decomposition is also valid for affine spaces. If an affine space $W$ is associated with a locally convex Hausdorff TVS $\vec W$, then for any finite-dimensional vector subspace $\vec W^\parallel$ of $\vec W$, we can topologically decompose $W$ into $W^\perp+\vec W^\parallel$.

Because the product topology of subspace topologies is the same as the subspace topology of the product topology, we can also decompose $E^\perp+\vec W^\parallel$ as the product topological space of $E^\perp$ and $\vec W^\parallel$ if $E^\perp\subseteq W^\perp$.

Such decompositions are useful because they allow us to disintegrate Borel measures. If we already have a σ-finite Borel measure on $E^\perp+\vec W^\parallel$ and we can define a σ-finite Borel measure on $\vec W^\parallel$, then we can define a measure on $E^\perp$ by disintegration, and the disintegration is guaranteed to be σ-finite and Borel.


When I want to use multi-index notation, I will use “$\bullet$” to denote the indices. For example,

$$\Sigma\alpha_\bullet\coloneqq\sum_\bullet\alpha_\bullet.$$

$$\alpha_\bullet\beta_\bullet\coloneqq\sum_\bullet\alpha_\bullet\beta_\bullet.$$

$$\alpha_\bullet^{\beta_\bullet}\coloneqq\prod_\bullet\alpha_\bullet^{\beta_\bullet}.$$

$$\alpha_\bullet!\coloneqq\prod_\bullet\alpha_\bullet!.$$
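For instance, with the (hypothetical) multi-indices $\alpha_\bullet=(2,3)$ and $\beta_\bullet=(1,2)$, these conventions give

$$\Sigma\alpha_\bullet=5,\qquad\alpha_\bullet\beta_\bullet=2\cdot1+3\cdot2=8,\qquad\alpha_\bullet^{\beta_\bullet}=2^1\cdot3^2=18,\qquad\alpha_\bullet!=2!\,3!=12.$$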

Extensive quantities and macrostates

First, I need to point out that the most central state function of a thermal system is not its energy, but its entropy. The energy is regarded as the central state function in thermodynamics, which can be seen from the fundamental equation of thermodynamics

$$\mathrm dU=-p\,\mathrm dV+T\,\mathrm dS+\mu\,\mathrm dN.$$

We also always perform Legendre transformations on the potential function $U$ to get other potential functions, instead of transforming the other extensive quantities. All such practices make us think that $S$ is just some quantity similar to $V$ and $N$, and that mathematically we can regard it as an extensive quantity whose change is a way of doing work.

However, this is not the case. The entropy $S$ differs from $U,V,N$ in the following senses:

  • The entropy is a derived quantity arising from a mathematical construction based on the second law of thermodynamics, while $U,V,N$ are observable quantities that have solid physical meanings before we introduce anything about thermodynamics.
  • The entropy may change in an isolated system, while $U,V,N$ do not.
  • We have an intuitive understanding of how different systems in contact may exchange $U,V,N$ with each other, but $S$ cannot be “exchanged” in such a sense.
  • In statistical mechanics, $U,V,N$ restrict which microstates are possible for a thermal system, but $S$ serves a totally different role: it represents something about the probability distribution over all the possible microstates.

Therefore, I would rather rewrite the fundamental equation of thermodynamics as

$$\mathrm dS=\frac1T\,\mathrm dU+\frac pT\,\mathrm dV-\frac\mu T\,\mathrm dN.\tag{1}$$

Equation 1 embodies more clearly how the different quantities serve different roles, but its own physical meaning becomes vague. Does it describe different ways of changing the entropy in quasi-static processes? Both mathematically and physically, yes, but that is not a useful interpretation. Because what we are doing is a mathematical formulation of physical theories, we do not need to assign a physical meaning to everything we construct. This new equation is purely mathematical, and the only way we use it is to relate intensive variables to derivatives of $S$ w.r.t. the extensive quantities.

From now on, I will call quantities like $U,V,N$ the extensive quantities, not including $S$. However, this is not a good statement as part of our mathematical formulation. Considering that there is a good notion of how different systems may exchange values of extensive quantities, and that we can scale a system by multiplying the extensive quantities by a factor, we require that the extensive quantities must support at least linear operations… do we?

Well, actually we will see that if we require the space to be a vector space, things become a little complex, because sometimes we need to construct a new space of extensive quantities out of an affine subspace of an existing one, which is not a vector space by nature. If we required the space to be a vector space, we would need to translate that affine subspace so that it passes through the zero element of the vector space, which is possible but gives no insight into the physics and only adds complication to our construction. Therefore, I will not require the space of extensive quantities to be a vector space, but only an affine space.

You may ask: OK then, but how do we “add” or “scale” extensive quantities if they live in an affine space? First, regarding the addition operation, we will use an abstraction for such operations, so that the actual implementation of how we combine the summands is hidden under this abstraction. We will see that this abstraction is useful because it also applies to other scenarios or useful operations that do not necessarily involve any meaningful addition. Regarding the scaling operation, I would argue that we do not need it now. I have generalized the notion of extensive quantities so that it now includes some quantities that are not really extensive quantities in any traditional sense. They are no longer meant to be scaled because they simply cannot be. Actually, rather than calling them extensive quantities, I would like to call them a macrostate, the only difference from the general notion of a macrostate being that it has an affine structure so that I can take the ensemble average of it to get its macroscopic value. I will stick to the term “extensive quantities” because they are actual extensive quantities in all my examples, and because the name is a good way to understand the physical meaning, but you need to keep in mind that what I actually refer to is a macrostate.

There is another difficulty. If we look closely, Equation 1 actually does not make much sense in that $N$ is quantized (and so is $U$ if we are doing quantum mechanics). If we work with real numbers, we can always translate a quantized quantity to a value that is not allowed, which means that we cannot have the full set of operations on the allowed values of the extensive quantities. Therefore, we need to specify a subset of the affine space to represent the allowed values of the extensive quantities.

We also see that Equation 1 is a relation between differentials. Do we need to require a differential structure on the space of extensive quantities? Not yet, because that is actually somewhat difficult: the same difficulty about quantized quantities applies. The clever way is to just avoid using differentials. (Mathematicians are always skeptical about differentiating something, while physicists just assume everything is differentiable…) It may seem surprising, but differentials are actually avoidable in our mathematical formulation if you do not require intensive variables to be well-defined inside the system itself (actually, they are indeed not well-defined except when you have a system in thermal equilibrium and take the thermodynamic limit).

If we have to use differentials, we can use the Gateaux derivative. It is general enough to be defined on any locally convex TVS, and it is intuitive when it is linear and continuous.

Although a differential structure is not necessary, there is one structure on the space of extensive quantities that is inevitable. Remember that in the canonical and grand canonical ensembles, we allow $U$ or $N$ to fluctuate, so we should be able to describe such fluctuations on our space of extensive quantities. To do this, I think it is safe to assume that we can have some topology on the allowed subset that makes it a Polish space, just as probabilists often assume about the probability spaces they work with.

A final point. Here is a difference in how physicists and mathematicians describe probability distributions: physicists use a probability density function while mathematicians use a probability measure. Mathematically, to have a probability density function, we need an underlying measure on our space providing a notion of “volume”, and then we can define the probability density function as the Radon–Nikodym derivative of the probability measure w.r.t. the underlying volume measure. Also, for the Radon–Nikodym derivative to exist, the probability measure must be absolutely continuous w.r.t. the volume measure, which means that we have to sacrifice all the probability distributions that are not absolutely continuous if we take the probability density function approach. It then seems that the probability density function approach introduces an extra measure structure on the space of extensive quantities and loses some generality, but it will turn out that the extra structure is useful. Therefore, I will use the probability density function approach.

Here is our final definition of the space of extensive quantities:

Definition. A space of extensive quantities is a tuple $(W,E,\lambda)$, where

  • $W$ is an affine space associated with a reflexive vector space $\vec W$ over $\mathbb R$, and it is equipped with the topology $\tau(W)$ naturally constructed from the topology $\tau\!\left(\vec W\right)$ on $\vec W$;
  • $E\subseteq W$ is a topological subspace of $W$, and its topology $\tau(E)$ makes $E$ a Polish space; and
  • $\lambda:\sigma(E)\to[0,+\infty]$ is a non-trivial σ-finite Borel measure, where $\sigma(E)\supseteq\mathfrak B(E)$ is a σ-algebra on $E$ that contains the Borel σ-algebra on $E$.

Here, I have also added a requirement of σ-finiteness. This is necessary when constructing product measures. At first I also wanted to require that $\lambda$ have some translational invariance, but I then realized that it is not necessary, so I removed it from the definition (but we will see that we need it as a property of baths).

Example. Here is an example of a space of extensive quantities.

$$\begin{align*}
W&\coloneqq\mathbb R^3,\\
E&\coloneqq(0,+\infty)\times(0,+\infty)\times\mathbb Z^+,\\
\lambda(A)&\coloneqq\sum_{N\in\mathbb Z^+}\operatorname{area}\bigl(A\cap\bigl((0,+\infty)\times(0,+\infty)\times\{N\}\bigr)\bigr).
\end{align*}$$

Physically, we may think of this as the space of extensive quantities of a system of ideal gas. The three dimensions of $W$ are energy, volume, and number of particles.
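As a concrete evaluation with made-up numbers: for the box $A\coloneqq(0,2)\times(0,3)\times\{1,2\}$,

$$\lambda(A)=\sum_{N\in\{1,2\}}\operatorname{area}\bigl((0,2)\times(0,3)\bigr)=6+6=12.$$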

Example. Here is another example of a space of extensive quantities.

$$\begin{align*}
W&\coloneqq\mathbb R^2,\\
E&\coloneqq\left\{(3N/2+n,N)\,\middle|\,N\in\mathbb Z^+,\,n\in\mathbb N\right\},\\
\lambda(A)&\coloneqq\operatorname{card}A.
\end{align*}$$

Physically, we may think of this as the space of extensive quantities of an Einstein solid with $\hbar\omega=1$. The two dimensions of $W$ are energy and number of particles.

Thermal systems and the number of microstates

Remember that I said above that, in statistical mechanics, $U,V,N$ restrict which microstates are possible for a thermal system. We can translate this as follows: for each possible value of the extensive quantities, denoted as $e\in E$, there is a set of possible microstates, denoted as $M_e$ (you can now see why we excluded the entropy from the extensive quantities: otherwise we could not make such a classification of microstates).

Now the question is what structure we should put on $M_e$ for each $e\in E$. Recall that in statistical mechanics, we study probability distributions over all possible microstates. Therefore, we need to be able to have a probability measure on $M_e$. In other words, $M_e$ should be a measurable space. As said before, we can either use a probability measure directly, or use a volume measure together with a probability density function. This time, we seem to have no choice but the probability density function approach, because there is a natural notion of volume on $M_e$: the number of microstates.

Wait! There is a problem. Recall that in the microcanonical ensemble, we allow the energy to fluctuate within a small range. The number of microstates at exactly a certain energy is actually zero in most cases, so we actually consider those microstates within some small range of energy. In other words, we consider the microstate density: the number of microstates inside a unit range of energy. Similarly, we should define a measure on $M_e$ to represent the microstate density, which is the number of microstates inside a unit volume of extensive quantities, where the “volume” is measured by the measure $\lambda$ on the space of extensive quantities.

This makes our formulation a little different from the microcanonical ensemble: our formulation allows all extensive quantities to fluctuate, while the microcanonical ensemble only allows the energy to fluctuate. This is inevitable because we are treating extensive quantities like energy, volume, and number of particles as the same kind of quantity. It is not preferable to separate out a subspace from our affine space $W$ and say “these are the quantities that may fluctuate, and those are not.” Therefore, we need to justify why we may allow all extensive quantities to fluctuate. The justification is: mathematically, we are actually not allowing any extensive quantities to fluctuate. There is no actual fluctuation, and we directly consider the microstate density without involving any change in the extensive quantities. In other words, using the language of the microcanonical ensemble, we consider the area of the surface of the energy shell instead of the volume of an energy shell with a small thickness.

Another important point is that we must make sure that specifying all the extensive quantities is enough to restrict the system to a finite number of microstates. In other words, the total microstate density should be finite for any possible $e\in E$. Also, there should be at least some possible microstates in $M_e$, so the total microstate density should not be zero.

We may then sum up the above discussion to give $M_e$ enough structure to make it the set of microstates of a thermal system with the given extensive quantities $e$. The disjoint union of all of them (a family of measure spaces) is then the thermal system.

Definition. A thermal system is a pair $\left(\mathcal E,\mathcal M\right)$, where

  • $\mathcal E\coloneqq\left(W,E,\lambda\right)$ is a space of extensive quantities;
  • $\mathcal M\coloneqq\bigsqcup_{e\in E}M_e$ is a family of measure spaces; and
  • for each $e\in E$, $M_e$ is a measure space equipped with a measure $\mu_e$ such that $\mu_e\!\left(M_e\right)$ is finite and nonzero.

From now on, I will use a pair $(e,m)\in\mathcal M$ to specify a single microstate, where $e\in E$ and $m\in M_e$.


Example. For the thermal system of a solid consisting of spin-$\frac12$ particles, where each particle has two possible states with energies $0$ and $1$, we can construct

$$\begin{align*}
W&\coloneqq\mathbb R^2,\\
E&\coloneqq\left\{\left(U,N\right)\in\mathbb N\times\mathbb Z^+\,\middle|\,U\le N\right\},\\
\lambda(A)&\coloneqq\operatorname{card}A,\\
M_{U,N}&\coloneqq\left\{n\in\left\{0,1\right\}^N\,\middle|\,\textstyle\sum_in_i=U\right\},\\
\mu_{U,N}(A)&\coloneqq\operatorname{card}A.
\end{align*}$$

This should be the simplest example of a thermal system.
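This construction is easy to check computationally. Below is a minimal sketch (the function names are my own and not part of the formulation) that enumerates $M_{U,N}$ for small systems and verifies that the total microstate density $\mu_{U,N}\!\left(M_{U,N}\right)=\binom NU$ is finite and nonzero:

```python
from itertools import product
from math import comb

def microstates(U, N):
    """Enumerate M_{U,N}: spin configurations n in {0,1}^N with sum_i n_i = U."""
    return [n for n in product((0, 1), repeat=N) if sum(n) == U]

def total_microstate_density(U, N):
    """mu_{U,N}(M_{U,N}) with the counting measure: just the cardinality."""
    return len(microstates(U, N))

# For every allowed (U, N) in a small range, the total microstate density
# is finite, nonzero, and equals the binomial coefficient C(N, U).
for N in range(1, 8):
    for U in range(0, N + 1):
        assert total_microstate_density(U, N) == comb(N, U) > 0
```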

Example. We may complete the example of the system of ideal gas. Suppose we are considering a system of ideal atomic gas inside a cubic box. The construction of the space of extensive quantities is the same as before. Denote possible values of the extensive quantities in coordinates as $e=(U,V,N)$. The measure spaces $M_e$ may then be constructed as follows:

$$\begin{align*}
M_{U,V,N}&\coloneqq\left\{\left(\ldots\right)\in\left(\left[0,\sqrt[3]V\right]^3\times\mathbb R^3\right)^N\,\middle|\,\text{lexicographic order, }\sum_i\frac{\left|\mathbf p_i\right|^2}{2m}=U\right\},\\
\mu_{U,V,N}(A)&\coloneqq\frac{H^{6N-1}(A)}{h^{3N}}.
\end{align*}$$

The “lexicographic order” here means that only those configurations in which the particle indices coincide with the lexicographic order are included in $M_e$. This is because the particles are indistinguishable, and the order of particles is irrelevant. The lexicographic-order restriction is equivalent to using the quotient of the $N$-fold Cartesian product by the permutation actions, but then defining $\mu_e$ would be difficult. Alternatively, we may keep the particles ordered and divide the result by $N!$ in the definition of $\mu_e$, but this way is less clear in its physical meaning.

Here $H^d$ is the $d$-dimensional Hausdorff measure. Intuitively, the expression $H^{6N-1}(A)$ is just the $(6N-1)$-dimensional “volume” of $A$.


Since we have the microstate density, why not also have the true number of microstates? We can define a measure on $\mathcal M$ to represent the number of microstates.

Definition. The measure of number of microstates is a measure $\mu:\sigma(\mathcal M)\to\left[0,+\infty\right]$, where

$$\sigma(\mathcal M)\coloneqq\left\{\bigsqcup_{e\in A}B_e\,\middle|\,A\in\sigma(E),\,B_e\in\sigma(M_e)\right\},$$

and the measure is defined by

$$\mu(A)\coloneqq\iint\limits_{(e,m)\in A}\mathrm d\mu_e(m)\,\mathrm d\lambda(e).$$

The uniqueness of $\mu$ is guaranteed by the σ-finiteness of $\lambda$ and $\mu_e$. The quantity $\mu(A)$ is called the number of microstates in $A$.
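For instance, in the spin-$\frac12$ example above, where $\lambda$ and each $\mu_{U,N}$ are counting measures, taking $A\coloneqq\bigsqcup_{(U,N)\in A_0}M_{U,N}$ for some $A_0\in\sigma(E)$ gives

$$\mu(A)=\sum_{(U,N)\in A_0}\mu_{U,N}\!\left(M_{U,N}\right)=\sum_{(U,N)\in A_0}\binom NU.$$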

States and the entropy

Here is a central idea of statistical ensembles: a state is a probability distribution on the microstates of a thermal system. It is among the ideas upon which the whole theory of statistical ensembles is built. I will adopt this idea, too.

As said before, I have taken the probability density approach to defining a probability distribution. Therefore, a state is just a probability density function.

Definition. A state of a thermal system $(\mathcal E,\mathcal M)$ is a function $p:\mathcal M\to\left[0,+\infty\right]$ such that $(\mathcal M,\sigma(\mathcal M),P)$ is a probability space, where $P:\sigma(\mathcal M)\to\left[0,1\right]$ is defined by

$$P(A)\coloneqq\int_Ap\,\mathrm d\mu.\tag{2}$$

Two states are the same if they are equal $\mu$-almost everywhere.

A probability space is just a measure space with a normalized measure. Here, the physical meaning of $p$ is the probability density on $\mathcal M$, and $P(A)$ is the probability of finding a microstate in $A$.

Note that a state is not necessarily an equilibrium state (thermal state). We will introduce the concept of equilibrium states later.


Now we may introduce the concept of entropy.

I need to clarify that the entropy we are talking about here is just the entropy of statistical mechanics. The reason I add this clarification is that we may also formally define an entropy in the language of measure theory, which is defined for any probability space and does not depend on any probability density function or “volume” measure (which is the number of microstates in our case). The definition of this entropy is (if anyone is interested)

$$S^{\mathrm{info}}\coloneqq\sup_\Pi\sum_{A\in\Pi}-P(A)\ln P(A),$$

where $P$ is the probability measure on the probability space, and the supremum is taken over all $P$-almost partitions $\Pi$ of the probability space ($\Pi$ is a subset of the σ-algebra such that $P\bigl(\bigcup_{A\in\Pi}A\bigr)=1$ and $P(A\cap B)=0$ for distinct $A,B\in\Pi$). This definition looks intuitive and nice, and, not surprisingly, it is… not consistent with the entropy in statistical mechanics. The discrepancy arises when we are doing classical statistical mechanics, because the entropy defined above diverges to infinity for “continuous” probability distributions. A quick check: the entropy of the uniform distribution over $[0,1]$ is $+\infty$.

Definition. The entropy of a state $p$ is defined by

$$S[p]\coloneqq\int_\mathcal M-p\ln p\,\mathrm d\mu.$$

Unlike the extensive quantities, the entropy is a functional of $p$. The entropy here is consistent with the entropy in thermodynamics or statistical mechanics.

This definition of entropy is called the Gibbs entropy formula. It agrees with the entropy defined in thermodynamics, but we are unable to show that at this stage because we have not defined temperature or heat yet.

Note that the base of the logarithm is not important; it is just a matter of the unit system. In SI units, the base would be $\exp k_\mathrm B^{-1}$, where $k_\mathrm B$ is the Boltzmann constant.


Physically, the extensive quantities may be measured macroscopically. The actual values we get when measuring them are postulated to be the ensemble averages. Therefore, for a given state $p$, we can define the measured values of the extensive quantities by taking the $P$-expectation of the extensive quantities.

Definition. For a thermal system $(\mathcal E,\mathcal M)$ and a state $p$ of it, the measured value of extensive quantities of the state $p$ is the $P$-expectation of the $E$-valued random variable $(e,m)\mapsto e$. Explicitly, the definition is

$$\varepsilon[p]\coloneqq\mathrm E_P\!\left[\left(e,m\right)\mapsto e\right],$$

where the probability measure $P$ on $\mathcal M$ is defined in Equation 2.

The definition involves taking the $P$-expectation of a $W$-valued function, i.e., a Pettis integral, which I claim exists. It exists because the map $(e,m)\mapsto e-e_0$ must be weakly $P$-measurable, and such a function must be Pettis-integrable on a reflexive space.

Note that $\varepsilon[p]\in W$, and it is not necessarily in $E$.

The use of the measured value of extensive quantities is to obtain the fundamental equation of a thermal system, which describes the relationship between the extensive quantities and the entropy at equilibrium states. Suppose that we postulate a family of states $p_t^\circ$ of the thermal system (or of its slices, which will be introduced below), labeled by different $t$’s, and call them the possible equilibrium states. Then, we have the following two equations:

$$\begin{cases}
S^\circ=S\!\left[p_t^\circ\right],\\
\varepsilon^\circ=\varepsilon\!\left[p_t^\circ\right].
\end{cases}\tag{3}$$

By eliminating $t$ from the two equations (which may be impossible, but is assumed to be possible here), we get the fundamental equation in this form:

$$S^\circ=S^\circ\!\left(\varepsilon^\circ\right).\tag{4}$$

Here we thus get a function $S^\circ:E^\circ\to\mathbb R$, where $E^\circ$ is the subset of $W$ consisting of all possible measured values of extensive quantities among equilibrium states. If we can define some differential structure on $E^\circ$ so that we can take the differential of $S^\circ$ and write something sensible like

$$\mathrm dS^\circ=i^\circ\!\left(\varepsilon^\circ\right)(\mathrm d\varepsilon^\circ),$$

where $i^\circ\!\left(\varepsilon^\circ\right)\in\vec W'$ is a continuous linear functional, then we can define $i^\circ\!\left(\varepsilon^\circ\right)$ to be the intensive quantities at $\varepsilon^\circ$. A proper comparison with differential geometry is that we may analogously call $i^\circ$ a covector field on $E^\circ$, defined as the differential of the scalar field $S^\circ$.

However, as I said before, I did not postulate any differential structure on $E^\circ$, so the intensive quantities should not generally be defined in this way.

Slicing

A nice feature of thermal systems is that we can get new thermal systems from existing ones (although they are physically essentially the same system, they have different mathematical structures and contain different amounts of information). There are two ways of constructing new thermal systems from existing ones:

  • By fixing some extensive quantities. I call this way slicing.
  • By allowing some extensive quantities to change freely. I call this way contracting.

I chose the words “slicing” and “contracting” myself. They are not present in actual physics textbooks, but I found the notions necessary.

Slicing fixes extensive quantities. How we do it is to pick out a subset of $E$ and make it our new set of accessible values of extensive quantities. I find one special way of picking out such a subset especially useful: picking it from an affine subspace of $W$. In this way, we can use a smaller affine space as the underlying space of our new thermal system. Then we see why I chose the word “slicing”: we are slicing the original affine space into parallel pieces, picking one piece as our new affine space, and picking the corresponding accessible values of extensive quantities and possible microstates within that piece to form our new thermal system.

Definition. A slicing of a space of extensive quantities $\left(W,E,\lambda\right)$ is a pair $\left(W^\parallel,\lambda^\parallel\right)$, where

  • $W^\parallel\subseteq W$ is an affine subspace of $W$;
  • $E^\parallel\coloneqq E\cap W^\parallel$ is non-empty, and it is Polish as a topological subspace of $E$; and
  • $\lambda^\parallel:\sigma\!\left(E^\parallel\right)\to\left[0,+\infty\right]$ is a non-trivial σ-finite Borel measure on $E^\parallel$, where $\sigma\!\left(E^\parallel\right)\supseteq\mathfrak B\!\left(E^\parallel\right)$ is a σ-algebra on $E^\parallel$ that contains the Borel σ-algebra on $E^\parallel$.

This constructs a new space of extensive quantities $\left(W^\parallel,E^\parallel,\lambda^\parallel\right)$, called a slice of the original space of extensive quantities $\left(W,E,\lambda\right)$.

Definition. A slice of a thermal system $\left(\mathcal E,\mathcal M\right)$ defined by a slicing $\left(W^\parallel,\lambda^\parallel\right)$ of $\mathcal E$ is a new thermal system $\left(\mathcal E^\parallel,\mathcal M^\parallel\right)$ constructed as follows:

  • $\mathcal E^\parallel\coloneqq\left(W^\parallel,E^\parallel,\lambda^\parallel\right)$ is the slice of $\mathcal E$ corresponding to the given slicing; and
  • $\mathcal M^\parallel\coloneqq\bigsqcup_{e\in E^\parallel}M_e$.

The idea behind slicing is to make some extensive quantities into extrinsic parameters that are not part of the system itself. Physically, it means fixing some extensive quantities. However, there is a problem: if we fix some extensive quantities, the dimension (“dimension” as in “dimensional analysis”) of the volume element in the space of extensive quantities changes. In other words, the dimension of $\lambda$ does not agree with that of $\lambda^\parallel$. This is physically undesirable because we want to keep the number of microstates dimensionless so that its logarithm does not depend on the units we use. However, this is not a real problem, for the following reason: in any physical construction of a thermal system, it is fine to have a non-dimensionless number of microstates, at the cost that the model is not valid at low temperature; in a mathematical construction, dimensions are never an issue, so we do not even need to worry about them. At low temperature, we must use quantum statistical mechanics, where all quantities are quantized so that the number of microstates is literally a count of microstates, which must be dimensionless. At high temperature, we do not need the third law of thermodynamics, which is the only law that restricts how we should choose the zero (ground level) of the entropy, and in this case we may freely change our units because doing so only shifts the entropy by an additive constant.

Example. In the example of a system of ideal gas, we may slice the space of extensive quantities with the slice $V=1$ to fix the volume.

Isolations and the microcanonical ensemble

Here is a special type of slicing. Because a single point is a (zero-dimensional) affine subspace, it may define a slicing. Such a slicing fixes all of the extensive quantities. We may call it an isolating.

A thermal system with a zero-dimensional space of extensive quantities is called an isolated system. The physical meaning of such a system is that it is isolated from the outside so that it cannot exchange any extensive quantities with the outside. We may construct an isolated system out of an existing thermal system by the process of isolating.

Definition. An isolating (at $e^\circ$) of a space of extensive quantities $\left(W,E,\lambda\right)$ is a slicing $\left(W^\parallel,\lambda^\parallel\right)$ of it, constructed as

$$\begin{align*}
W^\parallel&\coloneqq\left\{e^\circ\right\},\\
\lambda^\parallel(A)&\coloneqq\begin{cases}1,&A=\left\{e^\circ\right\},\\0,&A=\varnothing,\end{cases}
\end{align*}$$

where $e^\circ\in E$.

Definition. An isolated system is a thermal system for which the underlying affine space of its space of extensive quantities is a single-element set.

Definition. An isolation (at $e^\circ$) of a thermal system $\left(\mathcal E,\mathcal M\right)$ is the slice of it corresponding to the isolating at $e^\circ$ of $\mathcal E$. An isolation is an isolated system.

Here is an obvious property of isolated systems: the measured value of extensive quantities of any state of an isolated system is $e^\circ$, the only possible value of the extensive quantities.


After introducing isolated systems, we can now introduce the equal a priori probability postulate. Although we may alternatively use other sets of axioms to develop the theory of statistical ensembles, using the equal a priori probability postulate is a simple and traditional way to do it. Most importantly, it does not require us to define concepts like temperature beforehand, which is good for a mathematical formulation because it requires fewer mathematical structures or objects that are hard to define well at this stage.

Axiom (the equal a priori probability postulate). The equilibrium state of an isolated system is the uniform distribution.

Actually, instead of saying that this is an axiom, we may say that formally it is a definition of equilibrium states. However, I still prefer to call it an axiom because it only defines the equilibrium state of isolated systems rather than of arbitrary thermal systems.

The equilibrium state of an isolated system $\left(\mathcal E,\mathcal M\right)$ may be written mathematically as

$$p^\circ\!\left(\cdot\right)\coloneqq\frac1{\mu\!\left(\mathcal M\right)}.$$

(The circle in the superscript denotes an equilibrium state.) After writing this out, we have successfully derived the microcanonical ensemble. We can then calculate the entropy of this state, which is

$$S^\circ\coloneqq S\!\left[p^\circ\right]=\ln\mu(\mathcal M).\tag{5}$$

Speaking of the entropy, a notable feature of the equilibrium state of an isolated system is that it is the state of the system with the maximum entropy, and any state different from it has a strictly lower entropy.

Theorem. For an isolated system and any state $p$ of it,

$$S[p]\le S^\circ,$$

where $S^\circ$ is the entropy of its equilibrium state. The equality holds iff $p$ is the same state as the equilibrium state.


Proof. Define a probability measure $P^\circ$ on $\mathcal M$ by

$$P^\circ(A)\coloneqq\frac{\mu(A)}{\mu(\mathcal M)},$$

then $\left(\mathcal M,\sigma\!\left(\mathcal M\right),P^\circ\right)$ is a probability space. Any state $p$, as a function on $\mathcal M$, can be regarded as a random variable on the probability space $\left(\mathcal M,\sigma\!\left(\mathcal M\right),P^\circ\right)$.

Define the real function

$$\varphi(x)\coloneqq\begin{cases}
x\ln x,&x\in\left(0,+\infty\right),\\
0,&x=0.
\end{cases}$$

It is a convex function, so according to the probabilistic form of Jensen’s inequality,

$$\varphi\!\left(\mathrm E_{P^\circ}\!\left[p\right]\right)\le\mathrm E_{P^\circ}\!\left[\varphi\circ p\right].$$

In other words,

$$\frac1{\mu(\mathcal M)}\ln\frac1{\mu(\mathcal M)}\le\int_{m\in\mathcal M}p\!\left(m\right)\ln p\!\left(m\right)\,\frac{\mathrm d\mu\!\left(m\right)}{\mu(\mathcal M)}.$$

Then it follows immediately that $S[p]\le S^\circ$. The equality holds iff $\varphi$ is linear on a convex set $A\subseteq\left[0,+\infty\right)$ such that the value of the random variable $p$ is $P^\circ$-almost surely in $A$. However, because $\varphi$ is not linear on any convex set with more than one point, the only possibility is that the value of $p$ is $P^\circ$-almost surely a constant, which means that the probability distribution defined by the probability density function $p$ equals the uniform distribution $\mu$-almost everywhere. Therefore, the equality holds iff $p$ is the same state as the equilibrium state. $\square$

This theorem is the well-known relation between the entropy and the equilibrium state: for an isolated system, the equilibrium state is the state with the maximum entropy.
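Here is a minimal numerical illustration of this theorem, under the simplifying assumption of an isolated system whose single microstate space is finite and carries the counting measure (so that the Gibbs entropy reduces to the familiar $-\sum p\ln p$). The script is my own sketch, not part of the formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                       # number of microstates; mu is the counting measure
s_max = np.log(n)            # entropy of the uniform (equilibrium) state, ln mu(M)

def gibbs_entropy(p):
    """S[p] = -sum p ln p with the counting measure (taking 0 ln 0 = 0)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

for _ in range(1000):
    p = rng.random(n)
    p /= p.sum()             # normalize so that the state integrates to 1
    assert gibbs_entropy(p) <= s_max + 1e-12
```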


By Equation 5, we can now derive the relationship between the entropy and the extensive quantities at equilibrium states by the process of isolating. Define a family of states $\left\{p^\circ_e\right\}_{e\in E}$, where each state $p^\circ_e$ is the equilibrium state of the system isolated at $e$. Then, we have the fundamental equation

$$S^\circ(e)=\ln\Omega(e),\tag{6}$$

where $\Omega(e)\coloneqq\mu_e\!\left(M_e\right)$ is called the counting function (a phrase I invented), which is the microscopic characteristic function of the microcanonical ensemble. This defines a function $S^\circ:E\to\mathbb R$, which may be used to give a fundamental equation in the form of Equation 4, and it is the macroscopic characteristic function of the microcanonical ensemble.
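For instance, for the spin-$\frac12$ solid above, the counting function is $\Omega(U,N)=\binom NU$, so by Stirling’s approximation (for large $U$ and $N-U$)

$$S^\circ(U,N)=\ln\binom NU\approx N\ln N-U\ln U-\left(N-U\right)\ln\left(N-U\right),$$

which is the familiar entropy of a two-level solid.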

We will encounter microscopic or macroscopic characteristic functions for other ensembles later.

Example. In the example of a system of a tank of ideal atomic gas, we have the fundamental equation

$$S^\circ=\ln\!\left(\frac1{h^{3N}N!}V^NS_{3N-1}\!\left(\sqrt{2mU}\right)\right),$$

where $S_n(r)$ is the surface area of an $n$-sphere with radius $r$, which is proportional to $r^n$. Taking its derivatives w.r.t. $U,V,N$ and taking the thermodynamic limit recovers the familiar results.
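As a rough check (my own sketch, in units with $k_\mathrm B=1$, ignoring the difference between $3N-1$ and $3N$ in $S_{3N-1}\!\left(\sqrt{2mU}\right)\propto U^{(3N-1)/2}$),

$$\frac1T=\frac{\partial S^\circ}{\partial U}\approx\frac{\partial}{\partial U}\!\left(\frac{3N}2\ln U\right)=\frac{3N}{2U},$$

i.e. $U\approx\frac32NT$, the familiar energy of an ideal monatomic gas.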

Contracting

I have previously mentioned that the other way of deriving a new system from an existing one is called contracting. We should introduce this concept now because it will be very useful later, when we need to define the contact between subsystems of a composite system (whose definition will be given later).

The idea behind contracting is also to reduce the dimension of the space of extensive quantities. However, rather than making some of the extensive quantities extrinsic parameters, it makes them “intrinsic” within the space of microstates. A vivid analogy is this: imagine a thermal system as many boxes of microstates, with each box labeled by specific values of the extensive quantities; we then partition those boxes to classify them, and put all the boxes in each part into one larger box. The new set of larger boxes is labeled by specific values of fewer extensive quantities, and it is the so-called contraction of the original set of boxes.

I call it contracting because it is like contracting the affine space of extensive quantities onto a flat sheet, one of its subspaces. The way we do this is described by a projection. A projection in an affine space maps the whole space onto one of its affine subspaces, and the preimage of each point in the subspace is another affine subspace of the original space. The preimages form a family of parallel affine subspaces labeled by their image under the projection. This family of affine subspaces may be used to define a family of slices of the space of extensive quantities or of the thermal system, which are useful when defining the contraction of the space of extensive quantities or of the system.

Definition. A contracting of a space of extensive quantities $\left(W,E,\lambda\right)$ is given by a tuple $\left(\pi,\lambda^\perp\right)$, where

  • $\pi:W\to W^\perp$ is a projection map from $W$ onto an affine subspace $W^\perp$ of $W$;
  • $E^\perp\coloneqq\pi(E)$, the image of $E$ under $\pi$, is equipped with the finest topology $\tau\!\left(E^\perp\right)$ that keeps $\pi$ continuous (the quotient topology), and this topology makes $E^\perp$ Polish;
  • $\lambda^\perp:\sigma\!\left(E^\perp\right)\to\left[0,+\infty\right]$ is a non-trivial σ-finite Borel measure on $E^\perp$, where $\sigma\!\left(E^\perp\right)\supseteq\mathfrak B\!\left(E^\perp\right)$ is a σ-algebra on $E^\perp$ that contains the Borel σ-algebra on $E^\perp$; and
  • for any $A\in\sigma\!\left(E^\perp\right)$, $\lambda^{\perp}(A)=0$ iff $\lambda\!\left(\pi^{-1}(A)\right)=0$.

This contracting defines a new space of extensive quantities $\left(W^\perp,E^\perp,\lambda^\perp\right)$, called a contraction of the original space of extensive quantities $\left(W,E,\lambda\right)$.

Definition. The contractive slicings of a space of extensive quantities $\left(W,E,\lambda\right)$ defined by a contracting $\left(\pi,\lambda^\perp\right)$ of it are the family of slicings $\bigsqcup_{e\in W^\perp}\left(W^\parallel_e,\lambda^\parallel_e\right)$, where

  • $W^\parallel_e\coloneqq\pi^{-1}(e)$ is the preimage of $\left\{e\right\}$ under $\pi$, an affine subspace of $W$; and
  • $\lambda_e^\parallel:\sigma\!\left(E_e^\parallel\right)\to\left[0,+\infty\right]$ is a Borel measure; the family of these measures is the disintegration of $\lambda$ w.r.t. $\pi$ and $\lambda^\perp$.

Definition. A contraction of a thermal system $\left(\mathcal E,\mathcal M\right)$ defined by the contracting $\left(\pi,\lambda^\perp\right)$ of $\mathcal E$ is a new thermal system $\left(\mathcal E^\perp,\mathcal M^\perp\right)$ constructed as follows:

  • $\mathcal E^\perp\coloneqq\left(W^\perp,E^\perp,\lambda^\perp\right)$ is the contraction of $\mathcal E$ corresponding to the given contracting; and
  • $\mathcal M^\perp\coloneqq\bigsqcup_{e\in E^\perp}M_e^\perp$, where for each $e\in E^\perp$, $M_e^\perp\coloneqq\mathcal M_e^\parallel$; the family of systems $\left(\mathcal E_e^\parallel,\mathcal M_e^\parallel\right)$ (labeled by $e\in E^\perp$) are the slices of $\left(\mathcal E,\mathcal M\right)$ corresponding to the contractive slicings of $\mathcal E$ defined by the contracting $\left(\pi,\lambda^\perp\right)$; the measure equipped on $M_e^\perp$ is the measure of number of microstates of $\left(\mathcal E_e^\parallel,\mathcal M_e^\parallel\right)$.

In some cases, the total number of microstates in $\mathcal M^\parallel_e$ is not finite for some $e$; in those cases the contraction is not defined.

Example. For the thermal system of a solid consisting of spin-$\frac12$ particles, define a contracting $\left(\pi,\lambda^\perp\right)$ by

$$\begin{align*}
\pi\!\left(U,N\right)&\coloneqq N,\\
\lambda^\perp\!\left(A\right)&\coloneqq\operatorname{card}A.
\end{align*}$$

Then the corresponding contraction of the thermal system may be written as a thermal system $\left(\left(W,E,\lambda\right),\bigsqcup_{e\in E}M_e\right)$, where

$$\begin{align*}
W&\coloneqq\mathbb R,\\
E&\coloneqq\mathbb Z^+,\\
\lambda\!\left(A\right)&\coloneqq\operatorname{card}A,\\
M_N&\coloneqq\left\{0,1\right\}^N,\\
\mu_N\!\left(A\right)&\coloneqq\operatorname{card}A.
\end{align*}$$
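As a quick consistency check, the total microstate density of the contraction at a given $N$ agrees with summing the original system over the fiber $\pi^{-1}(N)$:

$$\mu_N\!\left(M_N\right)=\operatorname{card}\left\{0,1\right\}^N=2^N=\sum_{U=0}^N\binom NU=\sum_{U=0}^N\mu_{U,N}\!\left(M_{U,N}\right).$$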


Unlike a slice of a system, a contraction of a system does not have the problem with the dimension (“dimension” as in “dimensional analysis”) of the measure on the space of extensive quantities. Although the dimension of $\lambda^\perp$ differs from that of $\lambda$, the dimension of $\mu^\perp_e$ (the measure on $M^\perp_e$) also differs from that of $\mu$, and they change together in such a way that the resulting $\mu^\perp$ (the measure of number of microstates on $\mathcal M^\perp$) has the same dimension as $\mu$.

This fact hints that a contraction of a thermal system is essentially the same as the original thermal system, in the sense that the microstates of the two systems are in a natural one-to-one correspondence. Indeed, the natural bijection from $\mathcal M$ to $\mathcal M^\perp$ is given by $\left(e,m\right)\mapsto\left(\pi(e),\left(e,m\right)\right)$. It is obvious that for any measurable function $f$ on $\mathcal M^\perp$ we have

$$\int_{\left(e,m\right)\in\mathcal M}f\!\left(\pi(e),(e,m)\right)\mathrm d\mu(e,m)=\int_{\left(e,m\right)\in\mathcal M^\perp}f\!\left(e,m\right)\mathrm d\mu^\perp(e,m).$$

Using this map, we can pull back any function $f^\perp$ on $\mathcal M^\perp$ to a function on $\mathcal M$ by

$$f\!\left(e,m\right)\coloneqq f^\perp\!\left(\pi(e),\left(e,m\right)\right)$$

and the other way around. I will call $f$ the contractional pullback of $f^\perp$ under $\pi$, and call $f^\perp$ the contractional pushforward of $f$ under $\pi$. In particular, we may pull back any state $p^\perp$ of a contraction to a state $p$ of the original thermal system. We will see that pullbacks of states are rather useful.


Obviously, the affine subspaces in the family $\left\{W^\parallel_e\right\}_{e\in W^\perp}$ are parallel to each other. Therefore, their associated vector subspaces are all the same vector subspace $\vec W^\parallel$ of $\vec W$, which is a complement of the vector subspace $\vec W^\perp$, the vector space with which $W^\perp$ is associated. We can write

$$\vec W=\vec W^\perp+\vec W^\parallel,\quad W=W^\perp+\vec W^\parallel.$$

Each point of $W$ can be written in the form $e+s$, where $e\in W^\perp$ and $s\in\vec W^\parallel$. Furthermore, for any $e\in W^\perp$, the map $s\mapsto e+s$ is a bijection from $\vec W^\parallel$ to $W^\parallel_e$. This bijection can then push forward linear operations from $\vec W^\parallel$ to $W^\parallel_e$. For example, we can define the action of a continuous linear functional $i\in\vec W^{\parallel\prime}$ on a point $e'\in W^\parallel_e$ as

$$i\!\left(e'\right)\coloneqq i\!\left(e'-\pi\!\left(e'\right)\right),\tag{7}$$

where $\pi\!\left(e'\right)$ is just $e$.

However, we need to remember that there is no generally physically meaningful linear structure on $W^\parallel_e$. The linear structure constructed here is just for notational convenience.


An interesting fact about slicing, isolating, and contracting is that an isolation of a contraction is a contraction of a slice.

Suppose we have a thermal system $\left(\mathcal E,\mathcal M\right)$, and by a contracting $\left(\pi,\lambda^\perp\right)$ we derive its contraction $\left(\mathcal E^\perp,\mathcal M^\perp\right)$.

Now, consider one of its contractive slices $\left(\mathcal E^\parallel_{e^\circ},\mathcal M^\parallel_{e^\circ}\right)$, where $e^\circ\in E^\perp$. Then, we contract this slice by the contracting $\left(\pi,\lambda^{\perp\prime}\right)$, where $\pi$ is the same $\pi$ as used above but with its domain restricted to $W^\parallel_{e^\circ}$, and $\lambda^{\perp\prime}$ is the counting measure. Because the whole of $W^\parallel_{e^\circ}$ is mapped to $e^\circ$ under $\pi$, the contraction becomes an isolated system whose only possible value of the extensive quantities is $e^\circ$. Its family of microstate spaces consists of only one measure space, which is $\mathcal M^\parallel_{e^\circ}$.

On the other hand, consider isolating $\left(\mathcal E^\perp,\mathcal M^\perp\right)$ at $e^\circ$. Its isolation at $e^\circ$ is an isolated system whose only possible value of the extensive quantities is $e^\circ$, and its family of microstate spaces also consists of only one measure space, namely $M^\perp_{e^\circ}$, which is the same as $\mathcal M^\parallel_{e^\circ}$.

Therefore, an isolation of a contraction is a contraction of a slice.

This fact is useful because it enables us to find the equilibrium state of a slice. Using the microcanonical ensemble, we can already find the equilibrium state of any isolated system, so we can find the equilibrium state of an isolation of a contraction. This is then the equilibrium state of a contraction of a slice. Then, by the contractional pullback, we get the equilibrium state of a slice.

Thermal contact

Composite systems are systems composed of other systems. This is a useful concept because it allows us to treat multiple systems as a whole. The motivation for developing this concept is that we will use it to derive the canonical ensemble and the grand canonical ensemble. In those ensembles, the system is not isolated but in contact with a bath. To consider them as a whole system, we need to define composite systems.

The simplest case of a composite system is one where the subsystems are independent of each other. Physically, this means that the subsystems do not have any thermodynamic contact with each other. I would like to call this simplest case a product thermal system, just as mathematicians name product spaces constructed out of existing spaces.

Definition. The product space of extensive quantities of two spaces of extensive quantities $\left(W^{(1)},E^{(1)},\lambda^{(1)}\right)$ and $\left(W^{(2)},E^{(2)},\lambda^{(2)}\right)$ is a space of extensive quantities $\left(W,E,\lambda\right)$ constructed as follows:

  • $W\coloneqq W^{(1)}\times W^{(2)}$ is the product affine space of $W^{(1)}$ and $W^{(2)}$;
  • $E\coloneqq E^{(1)}\times E^{(2)}$ is the product topological space as well as the product measure space of $E^{(1)}$ and $E^{(2)}$; and
  • $\lambda$ is the product measure of $\lambda^{(1)}$ and $\lambda^{(2)}$, whose uniqueness is guaranteed by the σ-finiteness of $\lambda^{(1)}$ and $\lambda^{(2)}$.

Definition. The product thermal system of two thermal systems $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ and $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$ is a thermal system $\left(\mathcal E,\mathcal M\right)$ constructed as follows:

  • $\mathcal E\coloneqq\left(W,E,\lambda\right)$ is the product space of extensive quantities of $\mathcal E^{(1)}$ and $\mathcal E^{(2)}$; and
  • $\mathcal M\coloneqq\bigsqcup_{(e_1,e_2)\in E}M_{e_1,e_2}$, where $M_{e_1,e_2}\coloneqq M^{(1)}_{e_1}\times M^{(2)}_{e_2}$ is the product measure space of $M^{(1)}_{e_1}$ and $M^{(2)}_{e_2}$, equipped with the measure $\mu_{e_1,e_2}$, the product measure of $\mu^{(1)}_{e_1}$ and $\mu^{(2)}_{e_2}$.

By this definition, $\mathcal M$ is naturally identified with $\mathcal M^{(1)}\times\mathcal M^{(2)}$, and the measure of number of microstates $\mu$ on $\mathcal M$ is, in this sense, the same as the product measure of $\mu^{(1)}$ and $\mu^{(2)}$ (the measures of number of microstates on $\mathcal M^{(1)}$ and $\mathcal M^{(2)}$). We can project elements of $\mathcal M$ back onto $\mathcal M^{(1)}$ and $\mathcal M^{(2)}$ by the maps $(e_1,e_2,m_1,m_2)\mapsto(e_1,m_1)$ and $(e_1,e_2,m_1,m_2)\mapsto(e_2,m_2)$.

This suggests that a probability distribution on $\mathcal M$ (which may be given by a state $p$ of $(\mathcal E,\mathcal M)$) can be viewed as a joint probability distribution of the two random variables on $\mathcal M$: $(e_1,e_2,m_1,m_2)\mapsto(e_1,m_1)$ and $(e_1,e_2,m_1,m_2)\mapsto(e_2,m_2)$. As we all know, a joint distribution encodes conditional distributions and marginal distributions. Therefore, given any state of a product thermal system, we can define the conditional states and marginal states of the subsystems. Conditional states are not very useful because they are not physically observed states of the subsystems. The physically observed states of the subsystems are the marginal states, so marginal states are of special interest.

Definition. Given a state $p$ of the product thermal system $(\mathcal E,\mathcal M)$ of $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ and $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$, its marginal state of the subsystem $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ is a state $p^{(1)}$ of the system $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ defined by

$$p^{(1)}\!\left(e_1,m_1\right)\coloneqq\int_{\left(e_2,m_2\right)\in\mathcal M^{(2)}}p\!\left(e_1,e_2,m_1,m_2\right)\mathrm d\mu^{(2)}\!\left(e_2,m_2\right).$$

Physically, if a product thermal system is in equilibrium, then each of its subsystems is in equilibrium as well. Therefore, if $p^\circ$ is an equilibrium state of the product thermal system, then the marginal states of $p^\circ$ are equilibrium states of the subsystems.


Now, we need to consider how to describe the thermodynamic contact between subsystems. In the simplest case, where there is no thermodynamic contact between the subsystems, the composite system is just the product thermal system of the subsystems, and the dimension of its space of extensive quantities is the sum of those of the subsystems. If there is some thermal contact between the subsystems, then the dimension of the space of extensive quantities of the composite system will be less than that of the product thermal system. For example, if the subsystems are allowed to exchange energy, then two of the original extensive quantities (the energy of the first subsystem and that of the second subsystem) are replaced by a single extensive quantity (the total energy of the composite system). Such a reduction in the dimension of the space of extensive quantities is exactly the contracting we defined above. Therefore, we can define a thermally composite system as a contraction of the product thermal system. Denote the projection map of the contracting as $\pi:W\to W^\perp:(e_1,e_2)\mapsto e$. (From now on in this section, composite systems refer to thermally composite systems. I will introduce non-thermally composite systems later (in part 2); they describe non-thermal contacts between subsystems and are more complicated.)

Besides being the contraction of the product thermal system, there is an additional requirement. Given the extensive quantities of the composite system and those of one of the subsystems, we should be able to deduce those of the other subsystem. For example, if the subsystems are allowed to exchange energy, then the total energy of the composite system minus the energy of one of the subsystems should be the energy of the other subsystem, which is uniquely determined (if this is an allowed energy). Mathematically, this means that for any e1W(1)e_1\in W^{(1)} and e2W(2)e_2\in W^{(2)}, the two maps π(e1,)\pi\!\left(e_1,\cdot\right) and π(,e2)\pi\!\left(\cdot,e_2\right) are both injections.

Definition. A (thermally) composite thermal system of two thermal systems is the contraction of their product thermal system corresponding to a contracting (π,λ)(\pi,\lambda^\perp), where π:WW:(e1,e2)e\pi:W\to W^\perp:(e_1,e_2)\mapsto e satisfies that for any e1W(1)e_1\in W^{(1)} and e2W(2)e_2\in W^{(2)}, the two maps π ⁣(e1,)\pi\!\left(e_1,\cdot\right) and π ⁣(,e2)\pi\!\left(\cdot,e_2\right) are both injections.

We may define projection maps to get the extensive quantities of the subsystems from those of the composite system:

c(1):WW(1):(e1,e2)e1,c(2):WW(2):(e1,e2)e2.c^{(1)}:W\to W^{(1)}:(e_1,e_2)\mapsto e_1,\quad c^{(2)}:W\to W^{(2)}:(e_1,e_2)\mapsto e_2.

Then, for each eWe\in W^\perp, the two spaces

We(1)c(1) ⁣(We),We(2)c(2) ⁣(We)W^{\parallel(1)}_e\coloneqq c^{(1)}\!\left(W_e^\parallel\right),\quad W^{\parallel(2)}_e\coloneqq c^{(2)}\!\left(W_e^\parallel\right)

are respectively affine subspaces of W(1)W^{(1)} and W(2)W^{(2)}, where Weπ1 ⁣(e)W_e^\parallel\coloneqq\pi^{-1}\!\left(e\right). The two affine subspaces are actually isomorphic to each other because of our additional requirement on the projection map π\pi. Because π ⁣(e1,)\pi\!\left(e_1,\cdot\right) is an injection, for any e1We(1)e_1\in W^{\parallel(1)}_e there is a unique e2We(2)e_2\in W^{\parallel(2)}_e such that π ⁣(e1,e2)=e\pi\!\left(e_1,e_2\right)=e, and vice versa. This gives a correspondence between the two affine subspaces. In other words, for each eWe\in W^\perp, there is a unique bijection ρe:We(1)We(2)\rho_e:W^{\parallel(1)}_e\to W^{\parallel(2)}_e such that

e1We(1):π ⁣(e1,e2)=ee2=ρe ⁣(e1).\forall e_1\in W^{\parallel(1)}_e: \pi\!\left(e_1,e_2\right)=e\Leftrightarrow e_2=\rho_e\!\left(e_1\right). (8)(8)

The bijection ρe\rho_e is an affine isomorphism from We(1)W^{\parallel(1)}_e to We(2)W^{\parallel(2)}_e.

What is more, c(1)c^{(1)} is an affine isomorphism from WeW^{\parallel}_e to We(1)W^{\parallel(1)}_e, and c(2)c^{(2)} is an affine isomorphism from WeW^{\parallel}_e to We(2)W^{\parallel(2)}_e. The three affine spaces We,We(1),We(2)W^{\parallel}_e,W^{\parallel(1)}_e,W^{\parallel(2)}_e are then mutually isomorphic.

Example. Suppose we have two thermal systems, each of which has two extensive quantities called the energy and the number of particles. We write them as (U1,N1)\left(U_1,N_1\right) and (U2,N2)\left(U_2,N_2\right). They are in thermal contact so that they can exchange energy but not particles. Then, the extensive quantities of the composite system may be written as (U/2,U/2,N1,N2)\left(U/2,U/2,N_1,N_2\right), with π:(U1,U2)(U/2,U/2)\pi:\left(U_1,U_2\right)\mapsto\left(U/2,U/2\right) defined as

π ⁣(U1,U2)(U1+U22,U1+U22).\pi\!\left(U_1,U_2\right)\coloneqq\left(\frac{U_1+U_2}2,\frac{U_1+U_2}2\right).

The isomorphism ρU/2,U/2,N1,N2\rho_{U/2,U/2,N_1,N_2} is then

ρU/2,U/2,N1,N2 ⁣(U1,N1)=(UU1,N2).\rho_{U/2,U/2,N_1,N_2}\!\left(U_1,N_1\right)=\left(U-U_1,N_2\right).

The contracting is not unique. For example, (U1,U2)(3U/4,U/4)\left(U_1,U_2\right)\mapsto\left(3U/4,U/4\right) is another valid projection for constructing the composite thermal system, and it has exactly the same physical meaning as the one I constructed above.
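As a sanity check (writing only the energy components, with the particle numbers N_1 and N_2 held fixed), the projection and the isomorphism above are consistent with Equation 8:

\pi\!\left(U_1,U-U_1\right)=\left(\frac{U_1+\left(U-U_1\right)}2,\frac{U_1+\left(U-U_1\right)}2\right)=\left(\frac U2,\frac U2\right),

so e_2=\rho_e\!\left(e_1\right) is indeed equivalent to \pi\!\left(e_1,e_2\right)=e here.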


The isomorphism from WeW^{\parallel}_e can push forward the measure λe\lambda^\parallel_e on EeE^\parallel_e to a new measure λe(1)\lambda^{\parallel(1)}_e on Ee(1)E^{\parallel(1)}_e. Then, (We(1),λe(1))\left(W^{\parallel(1)}_e,\lambda^{\parallel(1)}_e\right) is a slicing of (W(1),E(1),λ(1))\left(W^{(1)},E^{(1)},\lambda^{(1)}\right), and we can get a slice (Ee(1),Me(1))\left(\mathcal E^{\parallel(1)}_e,\mathcal M^{\parallel(1)}_e\right) of (E(1),M(1))\left(\mathcal E^{(1)},\mathcal M^{(1)}\right) out of this slicing. I would like to call this slice the compositing slice of (E(1),M(1))\left(\mathcal E^{(1)},\mathcal M^{(1)}\right) at ee. Similarly, we define compositing slices of (E(2),M(2))\left(\mathcal E^{(2)},\mathcal M^{(2)}\right), denoted as (Ee(2),Me(2))\left(\mathcal E^{\parallel(2)}_e,\mathcal M^{\parallel(2)}_e\right).

Similarly to how we can define marginal states of subsystems of a product thermal system, we can define marginal states of the compositing slices given a state of a contractive slice of the composite system. However, this time, there is a key difference: the subsystems (compositing slices) have isomorphic and completely dependent (deterministic) extensive quantities instead of having completely independent extensive quantities. Taking this into account, we can define marginal states of compositing slices as follows:

p(1) ⁣(e1,m1)m2Mρe(e1)(2)p ⁣(e1,ρe(e1),m1,m2)dμρe(e1)(2) ⁣(m2),p^{\parallel(1)}\!\left(e_1,m_1\right) \coloneqq\int_{m_2\in M^{(2)}_{\rho_e(e_1)}}p^\parallel\!\left(e_1,\rho_e(e_1),m_1,m_2\right) \mathrm d\mu^{(2)}_{\rho_e(e_1)}\!\left(m_2\right), (9)(9)

where p(1)p^{\parallel(1)} is a state of (Ee(1),Me(1))\left(\mathcal E^{\parallel(1)}_e,\mathcal M^{\parallel(1)}_e\right), and pp^\parallel is a state of (Ee,Me)\left(\mathcal E^{\parallel}_e,\mathcal M^{\parallel}_e\right) (a contractive slice of the composite system).


There is an additional property that ρe\rho_e has.

As we all know, an affine map is a linear map combined with a translation:

ρe ⁣(e1)=ρ ⁣(e1e0)+ρe ⁣(e0),\rho_e\!\left(e_1\right)=\vec\rho\!\left(e_1-e_0\right)+\rho_e\!\left(e_0\right), (10)(10)

where e0e_0 is a fixed point in We(1)W^{\parallel(1)}_e, and ρ:We(1)We(2)\vec\rho:\vec W^{\parallel(1)}_e\to \vec W^{\parallel(2)}_e is a linear map that is independent of the choice of e0e_0. Because ρe\rho_e is a bijection, ρ\vec\rho is also a bijection, and is thus a linear isomorphism from We(1)\vec W^{\parallel(1)}_e to We(2)\vec W^{\parallel(2)}_e.

Because different slices We(1)W^{\parallel(1)}_e with different ee are parallel to each other, actually We(1)\vec W^{\parallel(1)}_e is the same vector subspace of W(1)\vec W^{(1)} for any eWe\in W^\perp. We can write it as W(1)\vec W^{\parallel(1)}. Similarly, We(2)\vec W^{\parallel(2)}_e is the same vector subspace W(2)\vec W^{\parallel(2)} of W(2)\vec W^{(2)} for any eWe\in W^\perp. Therefore, we can say ρ\vec\rho is a linear isomorphism from W(1)\vec W^{\parallel(1)} to W(2)\vec W^{\parallel(2)}.

Then, here is the interesting claim:

Theorem. The linear map ρ\vec\rho defined above is independent of the choice of ee.

Proof

Proof. Because π\pi is an affine map, we have

π ⁣(e1,e2)=π ⁣(e1e0,e2ρe ⁣(e0))+π ⁣(e0,ρe ⁣(e0)),\pi\!\left(e_1,e_2\right) =\vec\pi\!\left(e_1-e_0,e_2-\rho_e\!\left(e_0\right)\right)+\pi\!\left(e_0,\rho_e\!\left(e_0\right)\right),

where eWe\in W^\perp is fixed, e0We(1)e_0\in W^{\parallel(1)}_e is also fixed, and π:WW\vec\pi:\vec W\to\vec W^\perp is a linear map that is independent of the choice of ee and e0e_0.

Let e2ρe ⁣(e1)e_2\coloneqq\rho_e\!\left(e_1\right) in the equation above, and we have

π ⁣(e1,ρe ⁣(e1))=π ⁣(e1e0,ρe ⁣(e1)ρe ⁣(e0))+π ⁣(e0,ρe ⁣(e0)).\pi\!\left(e_1,\rho_e\!\left(e_1\right)\right) =\vec\pi\!\left(e_1-e_0,\rho_e\!\left(e_1\right)-\rho_e\!\left(e_0\right)\right) +\pi\!\left(e_0,\rho_e\!\left(e_0\right)\right).

According to Equation 8 and 10, we have

e=π ⁣(e1e0,ρ ⁣(e1e0))+e.e=\vec\pi\!\left(e_1-e_0,\vec\rho\!\left(e_1-e_0\right)\right)+e.

In other words,

π ⁣(s1,ρ ⁣(s1))=0,\vec\pi\!\left(s_1,\vec\rho\!\left(s_1\right)\right)=0, (11)(11)

where s1W(1)s_1\in\vec W^{\parallel(1)} is an arbitrary vector.

We prove this by contradiction. Assume that ρ\vec\rho depends on the choice of ee; then there exist two choices of ee such that we have two different ρ\vec\rho’s, denoted as ρ\vec\rho and ρ\vec\rho'. Because they are different maps, there exists an s1W(1)s_1\in\vec W^{\parallel(1)} such that ρ(s1)ρ(s1)\vec\rho(s_1)\ne\vec\rho'(s_1).

On the other hand, we have

π ⁣(s1,ρ ⁣(s1))=0,π ⁣(s1,ρ ⁣(s1))=0.\vec\pi\!\left(s_1,\vec\rho\!\left(s_1\right)\right)=0,\quad \vec\pi\!\left(s_1,\vec\rho'\!\left(s_1\right)\right)=0.

Subtract the two equations, and because of the linearity of π\vec\pi, we have

π ⁣(0,δ)=0,\vec\pi\!\left(0,\delta\right)=0,

where δρ(s1)ρ(s1)\delta\coloneqq\vec\rho(s_1)-\vec\rho'(s_1) is a nonzero vector. Then, we have

π ⁣(e1,e2+δ)π ⁣(e1,e2)=π(0,δ)=0,\pi\!\left(e_1,e_2+\delta\right)-\pi\!\left(e_1,e_2\right)=\vec\pi(0,\delta)=0,

which contradicts the requirement that π(e1,)\pi\!\left(e_1,\cdot\right) is injective. \square
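In the energy-exchange example above, this theorem can be checked directly: since \rho_e\!\left(U_1,N_1\right)=\left(U-U_1,N_2\right), the linear part acts on the energy direction (which spans \vec W^{\parallel(1)} in that example) as

\vec\rho\!\left(s,0\right)=\left(-s,0\right),

which indeed does not depend on the slice e=\left(U/2,U/2,N_1,N_2\right) that we started from.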

Besides, because ρ\vec\rho is a linear isomorphism from W(1)\vec W^{\parallel(1)} to W(2)\vec W^{\parallel(2)}, the map i1i1ρ1i_1\mapsto i_1\circ\vec\rho^{-1} is a linear isomorphism from W(1)\vec W^{\parallel(1)\prime} to W(2)\vec W^{\parallel(2)\prime}. The inverse of this isomorphism is i2i2ρi_2\mapsto i_2\circ\vec\rho.

As we know, i1i_1 and i2i_2 are actually intensive quantities. The physical meaning of their being each other’s image/preimage under this isomorphism is that, if the two subsystems in thermal contact have intensive quantities i1-i_1 and i2i_2 respectively, then they are in equilibrium with each other. Therefore, I would like to call such a pair of intensive quantities anticonsistent.


Since we have a family of slices called the compositing slices of a subsystem, can we make them the contractive slices of some contracting of the subsystem? Well, it depends. The first difficulty is that We(1)W^{\parallel(1)}_e may be the same subspace of W(1)W^{(1)} for different eWe\in W^\perp while Ee(1)E^{\parallel(1)}_e is equipped with possibly different measures λe(1)\lambda^{\parallel(1)}_e.

Anyway, ignore this at this stage. Let me first construct a subspace W(1)W^{\perp(1)} and a projection π(1):W(1)W(1)\pi^{(1)}:W^{(1)}\to W^{\perp(1)} so that We(1)W^{\parallel(1)}_e are preimages of points in W(1)W^{\perp(1)}, and then see what will happen.

Since any vector subspace has a complement, we can pick a subspace of W(1)\vec W^{(1)} that is a complement of W(1)\vec W^{\parallel(1)} and call it W(1)\vec W^{\perp(1)}. Any vector in W(1)\vec W^{(1)} can be uniquely decomposed into the sum of a vector in W(1)\vec W^{\perp(1)} and a vector in W(1)\vec W^{\parallel(1)}.

Then, we pick some fixed e1W(1)e_1\in W^{(1)}, and it can be used to generate an affine subspace W(1)e1+W(1)W^{\perp(1)}\coloneqq e_1+\vec W^{\perp(1)} of W(1)W^{(1)}. Then, each point in W(1)W^{(1)} can be uniquely decomposed into the sum of a point in W(1)W^{\perp(1)} and a vector in W(1)\vec W^{\parallel(1)}. Such unique decompositions can be encoded into a projection map π(1):W(1)W(1)\pi^{(1)}:W^{(1)}\to W^{\perp(1)}.

It seems that we are now halfway to the construction of our contracting. However, before we proceed, I would like to prove a property of W(1)W^{\perp(1)} we construct:

Theorem. The map π\pi is an affine isomorphism from the product affine space W(1)×W(2)W^{\perp(1)}\times W^{(2)} to WW^\perp.

Proof

Proof. The map π\pi is itself affine, so we just need to prove that it is injective and surjective.

To prove it is injective, suppose that we have two points (e1,e2)(e_1,e_2) and (e1,e2)(e_1',e_2') in W(1)×W(2)W^{\perp(1)}\times W^{(2)}, such that

π ⁣(e1,e2)=π ⁣(e1,e2)=:e.\pi\!\left(e_1,e_2\right)=\pi\!\left(e_1',e_2'\right)=:e.

Then, we have

(e1,e2),(e1,e2)We.\left(e_1,e_2\right),\left(e_1',e_2'\right)\in W^\parallel_e.

Therefore, e1,e1We(1)e_1,e_1'\in W^{\parallel(1)}_e, so

e1e1W(1).e_1-e_1'\in\vec W^{\parallel(1)}.

On the other hand, because e1,e1W(1)e_1,e_1'\in W^{\perp(1)}, we have

e1e1W(1).e_1-e_1'\in\vec W^{\perp(1)}.

Because W(1)\vec W^{\perp(1)} is a complement of W(1)\vec W^{\parallel(1)}, the only possible case is that e1=e1e_1=e_1'. Then, due to π ⁣(e1,)\pi\!\left(e_1,\cdot\right) being injective, e2=e2e_2=e_2'. Therefore, (e1,e2)=(e1,e2)\left(e_1,e_2\right)=\left(e_1',e_2'\right). Therefore, π\pi is injective if its domain is restricted to W(1)×W(2)W^{\perp(1)}\times W^{(2)}.

To prove it is surjective, suppose eWe\in W^\perp. Because π\pi is surjective from WW to WW^\perp, there exists some (e1,e2)W\left(e_1',e_2'\right)\in W such that

π ⁣(e1,e2)=e.\pi\!\left(e_1',e_2'\right)=e.

According to Equation 8, this is equivalently

e2=ρe ⁣(e1).e_2'=\rho_e\!\left(e_1'\right).

We can uniquely decompose e1W(1)e_1'\in W^{(1)} into the sum of a point e1W(1)e_1\in W^{\perp(1)} and a vector δW(1)\delta\in\vec W^{\parallel(1)}. Then, according to Equation 10, we have

e2=ρe ⁣(e1+δ)=ρe ⁣(e1)+ρ ⁣(δ).e_2'=\rho_e\!\left(e_1+\delta\right)=\rho_e\!\left(e_1\right)+\vec\rho\!\left(\delta\right).

Thus e2e2ρ ⁣(δ)=ρe ⁣(e1)e_2\coloneqq e_2'-\vec\rho\!\left(\delta\right)=\rho_e\!\left(e_1\right). According to Equation 8, this is equivalently

π ⁣(e1,e2)=e.\pi\!\left(e_1,e_2\right)=e.

Therefore, (e1,e2)W(1)×W(2)\left(e_1,e_2\right)\in W^{\perp(1)}\times W^{(2)} is the desired point in W(1)×W(2)W^{\perp(1)}\times W^{(2)} that is mapped to ee under π\pi. Therefore, π\pi is surjective if its domain is restricted to W(1)×W(2)W^{\perp(1)}\times W^{(2)}. \square

Then, it seems that if we need a measure on E(1)E^{\perp(1)} that is consistent with our theory, the product measure of it and that on E(2)E^{(2)} should be equal to that on EE^\perp. However, it is not always possible to find such a measure. This is our second difficulty.

Therefore, in order to construct a contracting, we need the following assumptions:

  • For different eEe\in E^\perp, λe(1)\lambda^{\parallel(1)}_e is the same measure whenever We(1)W^{\parallel(1)}_e is the same subspace.
  • There exists a measure λ(1)\lambda^{\perp(1)} on E(1)E^{\perp(1)} so that λ\lambda^\perp is the pushforward of the product measure of λ(1)\lambda^{\perp(1)} and λ(2)\lambda^{(2)} under π\pi.

Given those assumptions, if we define λe1(1)\lambda^{\parallel(1)\prime}_{e_1} to be the measures from the disintegration of λ(1)\lambda^{(1)} w.r.t. π(1)\pi^{(1)} and λ(1)\lambda^{\perp(1)} (just the way we constructed the measures in constructive slicings), then we can verify that they are actually the same as λe(1)\lambda^{\parallel(1)}_e defined before, for any ee in the image of π ⁣(e1,)\pi\!\left(e_1,\cdot\right). You can verify this easily by the following check (not a rigorous proof), where \otimes denotes product measures or integration:

λ=λ{λe}=λ(1)λ(2){λe}.\lambda=\lambda^{\perp}\otimes\left\{\lambda^\parallel_e\right\} =\lambda^{\perp(1)}\otimes\lambda^{(2)}\otimes\left\{\lambda^\parallel_e\right\}.

On the other hand,

λ=λ(1)λ(2)=λ(1){λe1(1)}λ(2).\lambda=\lambda^{(1)}\otimes\lambda^{(2)} =\lambda^{\perp(1)}\otimes\left\{\lambda^{\parallel(1)\prime}_{e_1}\right\}\otimes\lambda^{(2)}.

Comparing them, we have

{λe1(1)}={λe}={λe(1)}.\left\{\lambda^{\parallel(1)\prime}_{e_1}\right\}=\left\{\lambda^\parallel_e\right\} =\left\{\lambda^{\parallel(1)}_e\right\}.

An explicit verification is more tedious and is omitted here.

Those assumptions are very strong, so we do not want to assume them. Without those assumptions, we still have a well-constructed W(1)W^{\perp(1)} and π(1)\pi^{(1)} so that the We(1)W^{\parallel(1)}_e are preimages of points in W(1)W^{\perp(1)} under π(1)\pi^{(1)}. Then, we can use a trick similar to Equation 7 to define the action of any continuous linear functional i1W(1)i_1\in\vec W^{\parallel(1)\prime} on a point e1W(1)e_1\in W^{(1)} as

i1 ⁣(e1)i1 ⁣(e1π(1) ⁣(e1)).i_1\!\left(e_1\right)\coloneqq i_1\!\left(e_1-\pi^{(1)}\!\left(e_1\right)\right).

We can also do the same thing on W(2)W^{(2)}. Then, an interesting thing to notice is that if we have e1W(1)e_1\in W^{(1)} and e2W(2)e_2\in W^{(2)} such that

eπ ⁣(e1,e2)=π ⁣(π(1) ⁣(e1),π(2) ⁣(e2)),e\coloneqq\pi\!\left(e_1,e_2\right) =\pi\!\left(\pi^{(1)}\!\left(e_1\right),\pi^{(2)}\!\left(e_2\right)\right),

then we have

i1 ⁣(e1)=i2 ⁣(e2),i_1\!\left(e_1\right)=i_2\!\left(e_2\right),

where i1W(1)i_1\in\vec W^{\parallel(1)\prime} and i2W(2)i_2\in\vec W^{\parallel(2)\prime} are anticonsistent to each other.

Example. In the example of two thermal systems that can exchange energy but not number of particles, we may choose

π(1) ⁣(U1,N1)(0,N1),π(2) ⁣(U2,N2)(0,N2).\pi^{(1)}\!\left(U_1,N_1\right)\coloneqq\left(0,N_1\right),\quad \pi^{(2)}\!\left(U_2,N_2\right)\coloneqq\left(0,N_2\right).

Such projections are not unique, but this is the simplest one and also the most natural one considering their physical meanings.
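With these projections, the extended action of the functionals can be written out explicitly. Assuming, for illustration, that i_1 acts on the energy direction of the first system as multiplication by a number \beta (that is, i_1\!\left(s,0\right)=\beta s), we get

i_1\!\left(U_1,N_1\right)=i_1\!\left(\left(U_1,N_1\right)-\pi^{(1)}\!\left(U_1,N_1\right)\right)=i_1\!\left(U_1,0\right)=\beta U_1,

and the anticonsistent i_2=i_1\circ\vec\rho^{-1} acts as multiplication by -\beta on the energy direction of the second system, so i_2\!\left(U_2,N_2\right)=-\beta U_2. When the condition displayed before this example holds (which here means U_1+U_2=0), we indeed get i_1\!\left(e_1\right)=\beta U_1=-\beta U_2=i_2\!\left(e_2\right).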


We have newly defined some vector spaces. There are interesting relations between them:

Theorem.

Wπ ⁣(W(1)+W(2))=π ⁣(W(1))=π ⁣(W(2)).\vec W^{\perp\parallel}\coloneqq\vec\pi\!\left(\vec W^{\parallel(1)}+\vec W^{\parallel(2)}\right) =\vec\pi\!\left(\vec W^{\parallel(1)}\right)=\vec\pi\!\left(\vec W^{\parallel(2)}\right).

Proof

Proof. Obviously π ⁣(W(2))π ⁣(W(1)×W(2))\vec\pi\!\left(\vec W^{\parallel(2)}\right)\subseteq \vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right), so we just need to prove that π ⁣(W(1)×W(2))π ⁣(W(2))\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right) \subseteq\vec\pi\!\left(\vec W^{\parallel(2)}\right). To prove this, we just need to prove that for any

sπ ⁣(s1,s2)π ⁣(W(1)×W(2)),s\coloneqq\vec\pi\!\left(s_1,s_2\right)\in\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right),

where s1W(1)s_1\in\vec W^{\parallel(1)} and s2W(2)s_2\in\vec W^{\parallel(2)}, we have sπ ⁣(W(2))s\in\vec\pi\!\left(\vec W^{\parallel(2)}\right). To prove this, subtract Equation 11 from the definition of ss, and we have

s=π ⁣(0,s2ρ ⁣(s1))π ⁣(W(2)).s=\vec\pi\!\left(0,s_2-\vec\rho\!\left(s_1\right)\right)\in\vec\pi\!\left(\vec W^{\parallel(2)}\right).

Therefore, π ⁣(W(1)×W(2))π ⁣(W(2))\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right) \subseteq\vec\pi\!\left(\vec W^{\parallel(2)}\right). Similarly, π ⁣(W(1)×W(2))π ⁣(W(1))\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right) \subseteq\vec\pi\!\left(\vec W^{\parallel(1)}\right). Therefore, we proved the theorem. \square

Here we defined a new vector space W\vec W^{\perp\parallel}. Obviously it is a subspace of W\vec W^\perp. Because π(s1,)\vec\pi(s_1,\cdot) and π(,s2)\vec\pi(\cdot,s_2) are injective, π\vec\pi is a linear isomorphism from W(1)\vec W^{\parallel(1)} to W\vec W^{\perp\parallel} and a linear isomorphism from W(2)\vec W^{\parallel(2)} to W\vec W^{\perp\parallel}.

Here is another interesting thing about this vector space:

Theorem. Suppose e,eWe,e'\in W^\perp. Then eeWe'-e\in\vec W^{\perp\parallel} if and only if We(1)=We(1)W^{\parallel(1)}_e=W^{\parallel(1)}_{e'} and We(2)=We(2)W^{\parallel(2)}_e=W^{\parallel(2)}_{e'}.

Proof

Proof. First, prove the “if” direction.

Because We(1)=We(1)W^{\parallel(1)}_e=W^{\parallel(1)}_{e'}, we have c(1) ⁣(π1 ⁣(e))=c(1) ⁣(π1 ⁣(e))c^{(1)}\!\left(\pi^{-1}\!\left(e\right)\right)=c^{(1)}\!\left(\pi^{-1}\!\left(e'\right)\right). In other words,

xπ1(e):s2W(2):x+(0,s2)π1(e).\forall x\in\pi^{-1}(e):\exists s_2\in\vec W^{(2)}:x+\left(0,s_2\right)\in\pi^{-1}(e').

Equivalently, this means

π(x)=es2W(2):π ⁣(x+(0,s2))=e.\pi(x)=e\Rightarrow\exists s_2\in\vec W^{(2)}:\pi\!\left(x+\left(0,s_2\right)\right)=e'.

Note that π ⁣(x+(0,s2))=π(x)+π ⁣(0,s2)\pi\!\left(x+\left(0,s_2\right)\right)=\pi(x)+\vec\pi\!\left(0,s_2\right), which is just e+π ⁣(0,s2)e+\vec\pi\!\left(0,s_2\right), and we have

s2W(2):ee=π ⁣(0,s2).\exists s_2\in\vec W^{(2)}:e'-e=\vec\pi\!\left(0,s_2\right).

Similarly,

s1W(1):ee=π ⁣(s1,0).\exists s_1\in\vec W^{(1)}:e'-e=\vec\pi\!\left(s_1,0\right).

Subtract the two equations, and we have

0=π ⁣(s1,s2),0=\vec\pi\!\left(s_1,-s_2\right),

which means

(s1,s2)π1(0)=W.\left(s_1,-s_2\right)\in\vec\pi^{-1}(0)=\vec W^\parallel.

Therefore,

s1c(1) ⁣(W)=W(1).s_1\in c^{(1)}\!\left(\vec W^\parallel\right)=\vec W^{\parallel(1)}.

Therefore,

ee=π ⁣(s1,0)π ⁣(W(1))=W.e'-e=\vec\pi\!\left(s_1,0\right)\in\vec\pi\!\left(\vec W^{\parallel(1)}\right) =\vec W^{\perp\parallel}.

Now, prove the “only if” direction.

Because eeW=π ⁣(W(2))e'-e\in\vec W^{\perp\parallel}=\vec\pi\!\left(\vec W^{\parallel(2)}\right), there exists s2W(2)s_2\in\vec W^{\parallel(2)} such that

e=e+π ⁣(0,s2).e'=e+\vec\pi\!\left(0,s_2\right).

Therefore, obviously we have c(1) ⁣(π1 ⁣(e))=c(1) ⁣(π1 ⁣(e))c^{(1)}\!\left(\pi^{-1}\!\left(e\right)\right)=c^{(1)}\!\left(\pi^{-1}\!\left(e'\right)\right), and thus We(1)=We(1)W^{\parallel(1)}_e=W^{\parallel(1)}_{e'}.

Similarly, we can prove that We(2)=We(2)W^{\parallel(2)}_e=W^{\parallel(2)}_{e'}. \square

This means that, given both We(1)W^{\parallel(1)}_e and We(2)W^{\parallel(2)}_e, we can determine ee up to a vector in W\vec W^{\perp\parallel}.

Because we already have W\vec W^{\perp\parallel}, we can define a new affine subspace Wπ ⁣(W(1)×W(2))W^{\perp\perp}\coloneqq\pi\!\left(W^{\perp(1)}\times W^{\perp(2)}\right) so that W=W+WW^\perp=W^{\perp\perp}+\vec W^{\perp\parallel}, and each point in WW^\perp can be uniquely decomposed as a sum of a point in WW^{\perp\perp} and a vector in W\vec W^{\perp\parallel}. We can prove this easily. Such decomposition can be encoded into a projection π:WW\pi^\perp:W^\perp\to W^{\perp\perp} so that for any eWe\in W^\perp, we have eπ(e)We-\pi^\perp(e)\in\vec W^{\perp\parallel}. Also, we can easily prove that π\pi is an affine isomorphism from W(1)×W(2)W^{\perp(1)}\times W^{\perp(2)} to WW^{\perp\perp}.

Now that we have defined many affine spaces and vector spaces, here is a diagram of the relation between (some of) them (powered by quiver):

Diagram

Example. In the example of two thermal systems that can exchange energy but not number of particles, we may have

π ⁣(U2,U2,N1,N2)=(0,0,N1,N2).\pi^\perp\!\left(\frac U2,\frac U2,N_1,N_2\right)=\left(0,0,N_1,N_2\right).

Baths

Baths are a special class of thermal systems. They are systems that have some of their intensive quantities well-defined and constant.

According to Equation 6, to make the intensive quantities constant, lnΩ(e)\ln\Omega(e) should be linear in ee. If we just require some of the intensive quantities to be constant, we only need it to be linear when ee moves along directions in a certain vector subspace.

The requirement above comes from the microcanonical ensemble, which does not involve changes in extensive quantities. An intuitive additional requirement is that λ\lambda is also translationally invariant in such directions.

Then, here comes the definition of a bath:

Definition. A thermal system (E,M)(\mathcal E,\mathcal M) is called a (W,i)\left(\vec W^\parallel,i\right)-bath, where E=(W,E,λ)\mathcal E=(W,E,\lambda) and M=eWMe\mathcal M=\bigsqcup_{e\in W}M_e, if

  • W\vec W^\parallel is a vector subspace of W\vec W and is a Polish reflexive space;
  • For any eEe\in E and sWs\in\vec W^\parallel, e+sEe+s\in E.
  • λ\lambda is invariant under translations in W\vec W^\parallel; in other words, for any sWs\in\vec W^\parallel and Aσ(E)A\in\sigma(E), we have λ(A+s)=λ(A)\lambda(A+s)=\lambda(A);
  • iWi\in\vec W^{\parallel\prime} is a continuous linear functional on W\vec W^\parallel, called the constant intensive quantities of the bath; and
  • For any eEe\in E and sWs\in\vec W^\parallel,

lnμe+s ⁣(Me+s)=i(s)+lnμe ⁣(Me).\ln\mu_{e+s}\!\left(M_{e+s}\right)=i(s)+\ln\mu_e\!\left(M_e\right).


An important remark is that W\vec W^\parallel must be finite-dimensional, because a metrizable TVS with a non-trivial σ-finite translationally quasi-invariant Borel measure must be finite-dimensional (Feldman, 1966).

We can then define the non-trivial σ-finite translationally invariant Borel measure on W\vec W^\parallel, denoted as λ\lambda^\parallel. It is unique up to a positive constant factor.
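A minimal concrete example, assuming for illustration a system whose number of microstates is exactly exponential in the energy: take the energy as the only extensive quantity, W=\vec W^\parallel=\mathbb R, E=\mathbb R, \lambda the Lebesgue measure, and

\mu_U\!\left(M_U\right)=C\,\mathrm e^{\beta U}

for some constants C>0 and \beta. Every condition in the definition then holds with i\!\left(s\right)=\beta s, so this system is a \left(\vec W^\parallel,i\right)-bath with \vec W^\parallel=\mathbb R and i\!\left(s\right)=\beta s; physically, it is the usual heat bath at inverse temperature \beta (taking k_{\mathrm B}=1).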


We may construct an affine subspace WW^\perp for the bath so that every point in WW can be uniquely decomposed into the sum of a point in WW^\perp and a vector in W\vec W^\parallel. Then, we have a projection map π:WW\pi:W\to W^\perp so that for any eWe\in W we have eπ(e)We-\pi(e)\in\vec W^\parallel. Then, obviously, μe ⁣(Me)\mu_e\!\left(M_e\right) must be in the form

μe ⁣(Me)=f ⁣(π(e))ei(eπ(e)),\mu_e\!\left(M_e\right)=f\!\left(\pi(e)\right)\mathrm e^{i(e-\pi(e))}, (12)(12)

where f:WR+f:W^\perp\to\mathbb R^+ is some function. The explicit formula for ff is f(e)μe(Me)f(e)\coloneqq\mu_e\!\left(M_e\right).

Further, we may require that WW^\perp is associated with a topological complement of W\vec W^\parallel (such a complement exists because W\vec W is locally convex and Hausdorff and W\vec W^\parallel is finite-dimensional). Then, by the mathematical tools that were introduced in the beginning, we can disintegrate the measure λ\lambda w.r.t. λ\lambda^\parallel to get a measure λ\lambda^\perp on WW^\perp (it is the same for any element in W\vec W^\parallel because λ\lambda is W\vec W^\parallel-translationally invariant). Then, λ\lambda is the product measure of λ\lambda^\perp and λ\lambda^\parallel. In other words, for any measurable function f:ERf:E\to\mathbb R, we have

Efdλ=eEsWf ⁣(e+s)dλ ⁣(e)dλ ⁣(s).\int_Ef\,\mathrm d\lambda= \int_{e\in E^\perp}\int_{s\in\vec W^\parallel}f\!\left(e+s\right) \mathrm d\lambda^\perp\!\left(e\right)\mathrm d\lambda^\parallel\!\left(s\right).
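In the simplest finite-dimensional case (an illustration only), take W=\mathbb R^2 with coordinates \left(U,N\right), \vec W^\parallel the U-axis, \lambda the two-dimensional Lebesgue measure, and W^\perp the N-axis. Then \lambda^\parallel and \lambda^\perp may be taken to be one-dimensional Lebesgue measures, and the displayed disintegration is just Fubini’s theorem:

\int_Ef\,\mathrm d\lambda=\int_{N\in\mathbb R}\int_{s\in\mathbb R}f\!\left(s,N\right)\mathrm ds\,\mathrm dN.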

Thermal ensembles

Unlike microcanonical ensembles, thermal ensembles are ensembles in which the system we study is in thermal contact with a bath. For example, canonical ensembles and grand canonical ensembles are thermal ensembles. There are also non-thermal ensembles, which will be introduced later after we introduce non-thermal contacts (in part 2).

The thermal ensemble of a thermal system is the ensemble of the composite system of the system in question (subsystem 1) and a (W(2),iρ1)\left(\vec W^{\parallel(2)},-i\circ\vec\rho^{-1}\right)-bath (subsystem 2), where iW(1)i\in\vec W^{\parallel(1)\prime} is a parameter, with an extra requirement:

s2W(2),Aσ(E):λ ⁣(π ⁣(A+s2))=λ ⁣(π ⁣(A)).\forall s_2\in\vec W^{\parallel(2)},A\in\sigma(E): \lambda^\perp\!\left(\pi\!\left(A+s_2\right)\right)=\lambda^\perp\!\left(\pi\!\left(A\right)\right). (13)(13)

The physical meaning of ii is the intensive quantities at which the system is fixed by its contact with the bath.

This composite system is called the composite system for the W(1)\vec W^{\parallel(1)}-ensemble. It is called that because we will see that the only important thing that distinguishes different thermal ensembles is the choice of W(1)\vec W^{\parallel(1)}, and the choices of π,λ,W(1),W(2)\pi,\lambda^\perp,W^{\perp(1)},W^{\perp(2)} are not important.

Definition. The composite system for the W(1)\vec W^{\parallel(1)}-ensemble of the system (E(1),M(1))\left(\mathcal E^{(1)},\mathcal M^{(1)}\right) is the composite system of (E(1),M(1))\left(\mathcal E^{(1)},\mathcal M^{(1)}\right) and (E(2),M(2))\left(\mathcal E^{(2)},\mathcal M^{(2)}\right), where

  • (E(2),M(2))\left(\mathcal E^{(2)},\mathcal M^{(2)}\right) is a (W(2),iρ1)\left(\vec W^{\parallel(2)},-i\circ\vec\rho^{-1}\right)-bath, where iW(1)i\in\vec W^{\parallel(1)\prime} is a parameter called the fixed intensive quantities;
  • Equation 13 holds.

From the properties of a bath, we can derive a useful property of λe(1)\lambda^{\parallel(1)}_e.

Because λe(1)\lambda^{\parallel(1)}_e is the pullback of λe(2)\lambda^{\parallel(2)}_e under ρe\rho_e, and λe(2)\lambda^{\parallel(2)}_e is essentially the same λ(2)\lambda^{\parallel(2)} for all ee (although λe(2)\lambda^{\parallel(2)}_e is defined on We(2)W^{\parallel(2)}_e while λ(2)\lambda^{\parallel(2)} is defined on W(2)\vec W^{\parallel(2)}), λe(1)\lambda^{\parallel(1)}_e is the same as long as We(1)W^{\parallel(1)}_e is the same. This means that we can treat different compositing slices of our subsystem consistently.


As we have claimed before, the isolation of a contraction is the same as the full contraction of a contractive slice. Therefore, we can use the microcanonical ensemble to find the equilibrium state of any contractive slice. Then, we can use the marginal state of each contractive slice to get the equilibrium state of each compositing slice in the subsystem.

Because of the equal a priori probability postulate, the equilibrium state pep^{\parallel\circ}_e on the contractive slice (Ee,Me)\left(\mathcal E^\parallel_e,\mathcal M^\parallel_e\right) is

pe ⁣(e1,e2,m1,m2)=1μe ⁣(Me)1,p^{\parallel\circ}_e\!\left(e_1,e_2,m_1,m_2\right) =\frac1{\mu^\parallel_e\!\left(\mathcal M^\parallel_e\right)}\propto1,

where μe\mu^\parallel_e is the measure of the number of microstates on Me\mathcal M^\parallel_e. Here \propto means that the factor is only related to ee. We just need “\propto” instead of “==” because we can always normalize a probability density function.

Substitute this into Equation 9, and we get that the equilibrium state pe(1)p^{\parallel\circ(1)}_e on the compositing slice (Ee(1),Me(1))\left(\mathcal E^{\parallel(1)}_e,\mathcal M^{\parallel(1)}_e\right) is

pe(1) ⁣(e1,m1)μρe(e1)(2) ⁣(Mρe(e1)(2))=f ⁣(π(2) ⁣(ρe ⁣(e1)))e(iρ1)(ρe(e1)π(2)(ρe(e1)))ei(e1).\begin{align*} p^{\parallel\circ(1)}_e\!\left(e_1,m_1\right) &\propto\mu^{(2)}_{\rho_e(e_1)}\!\left(M^{(2)}_{\rho_e(e_1)}\right) \nonumber\\ &=f\!\left(\pi^{(2)}\!\left(\rho_e\!\left(e_1\right)\right)\right) \mathrm e^{\left(-i\circ\vec\rho^{-1}\right)\left(\rho_e(e_1)-\pi^{(2)}(\rho_e(e_1))\right)} \nonumber\\ &\propto\mathrm e^{-i(e_1)}. \end{align*} (14)(14)

Here we utilized Equation 12 and the fact that for any e1We(1)e_1\in W^{\parallel(1)}_e, π(2) ⁣(ρe(e1))=π(2) ⁣(We(2))\pi^{(2)}\!\left(\rho_e(e_1)\right)=\pi^{(2)}\!\left(W^{\parallel(2)}_e\right) is the same and is only related to ee. Note that we have already illustrated that λe(1)\lambda^{\parallel(1)}_e is the same as long as We(1)W^{\parallel(1)}_e is the same, so we can normalize pe(1)p^{\parallel\circ(1)}_e to get the same state as long as We(1)W^{\parallel(1)}_e is the same, avoiding any inconsistency.

Before we proceed to normalize pe(1)p^{\parallel\circ(1)}_e, I would like to discuss how much information is just enough to determine λe(1)\lambda^{\parallel(1)}_e. First, we need to know which different ee still give the same We(1)W^{\parallel(1)}_e. We already know that WW^\perp is just W+WW^{\perp\perp}+\vec W^{\perp\parallel}, and the component in W\vec W^{\perp\parallel} does not affect We(1)W^{\parallel(1)}_e and We(2)W^{\parallel(2)}_e, so we need to know no more than π(e)\pi^\perp(e). Then, because WW^{\perp\perp} is isomorphic to W(1)×W(2)W^{\perp(1)}\times W^{\perp(2)} but the corresponding change in W(2)W^{\perp(2)} does not affect We(1)W^{\parallel(1)}_e, we only need to know the component π(1)(e1)=π(1)(π1(e))\pi^{(1)}\!\left(e_1\right)=\pi^{(1)}\!\left(\pi^{-1}(e)\right), where e1e_1 is just the e1e_1 in Equation 14. The space We(1)W^{\parallel(1)}_e is just π(1)1(e1)\pi^{(1)-1}\!\left(e_1\right).

Besides these unneeded components of ee, there is other information that is irrelevant. I have previously mentioned that the choices of λ\lambda^\perp, λ(2)\lambda^{\perp(2)} etc. are also irrelevant. We can see this by noting that λ(1)\lambda^{\parallel(1)} is always the non-trivial translationally invariant σ-finite Borel measure on We(1)W^{\parallel(1)}_e, which is unique up to a constant positive factor (and exists because We(1)W^{\parallel(1)}_e is finite-dimensional). This is not related to the choices of λ\lambda^\perp, λ(2)\lambda^{\perp(2)} etc. By this, we have reduced what we need to care about to the three measures λ(1)\lambda^{(1)}, λ(1)\lambda^{\perp(1)}, and λ(1)\lambda^{\parallel(1)}, and their relation is given by the following:

E(1)fdλ(1)=e1E(1)dλ(1) ⁣(e1)s1Ee1(1)f ⁣(e1+s1)dλ(1) ⁣(s1),\int_{E^{(1)}}f\,\mathrm d\lambda^{(1)}= \int_{e_1\in E^{\perp(1)}}\mathrm d\lambda^{\perp(1)}\!\left(e_1\right) \int_{s_1\in\vec E^{\parallel(1)}_{e_1}} f\!\left(e_1+s_1\right)\mathrm d\lambda^{\parallel(1)}\!\left(s_1\right),

where E(1)π(1) ⁣(E(1))E^{\perp(1)}\coloneqq\pi^{(1)}\!\left(E^{(1)}\right) and Ee1(1)(E(1)e1)W(1)\vec E^{\parallel(1)}_{e_1}\coloneqq\left(E^{(1)}-e_1\right)\cap\vec W^{\parallel(1)} is the region of s1W(1)s_1\in\vec W^{\parallel(1)} in which e1+s1e_1+s_1 is in E(1)E^{(1)}.

Next, what we need to do is to normalize Equation 14. The denominator in the normalization factor, which we could call the partition function Z:e1E(1)Ie1(1)RZ:\bigsqcup_{e_1\in E^{\perp(1)}}I^{(1)}_{e_1}\to\mathbb R, is

Z ⁣(e1,i)s1Ee1(1)m1Me1+s1(1)ei(s1)dλ(1) ⁣(s1)dμe1+s1(1) ⁣(m1)=s1Ee1(1)Ω(1) ⁣(e1+s1)ei(s1)dλ(1) ⁣(s1),\begin{align*} Z\!\left(e_1,i\right)&\coloneqq\int_{s_1\in\vec E^{\parallel(1)}_{e_1}} \int_{m_1\in M^{(1)}_{e_1+s_1}} \mathrm e^{-i\left(s_1\right)}\,\mathrm d\lambda^{\parallel(1)}\!\left(s_1\right) \mathrm d\mu^{(1)}_{e_1+s_1}\!\left(m_1\right)\\ &=\int_{s_1\in\vec E^{\parallel(1)}_{e_1}} \Omega^{(1)}\!\left(e_1+s_1\right) \mathrm e^{-i\left(s_1\right)}\,\mathrm d\lambda^{\parallel(1)}\!\left(s_1\right), \end{align*}

where Ie1(1)W(1)I^{(1)}_{e_1}\subseteq\vec W^{\parallel(1)\prime} is the region of ii in which the integral converges. It is possible that Ie1(1)=I^{(1)}_{e_1}=\varnothing for all e1E(1)e_1\in E^{\perp(1)}, and in this case the thermal ensemble is not defined.


Because we have got rid of arguments about the bath and the composite system, we can now define the partition function without the “(1)(1)” superscript:

Z ⁣(e,i)=sEeΩ ⁣(e+s)ei(s)dλ ⁣(s),eE,iIeW.Z\!\left(e,i\right)=\int_{s\in\vec E^{\parallel}_e} \Omega\!\left(e+s\right) \mathrm e^{-i\left(s\right)}\,\mathrm d\lambda^{\parallel}\!\left(s\right),\quad e\in E^\perp,\quad i\in I_e\subseteq\vec W^{\parallel\prime}.

By looking at the definition, we may see that the partition function is just the partial Laplace transform of Ω\Omega.

Note that the partition function is unique only up to a positive constant factor because we can choose another λ\lambda^\parallel by multiplying a positive constant factor.
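As a consistency check against the familiar formalism, suppose (for this illustration only) that the energy is the only extensive quantity, that i acts as multiplication by \beta, that \lambda^\parallel is the Lebesgue measure in the energy, and that \vec E^\parallel_e is the whole energy axis. Writing U=e+s for the total energy,

Z\!\left(e,\beta\right)=\int_{\mathbb R}\Omega\!\left(e+s\right)\mathrm e^{-\beta s}\,\mathrm ds=\mathrm e^{\beta e}\int_{\mathbb R}\Omega\!\left(U\right)\mathrm e^{-\beta U}\,\mathrm dU,

so for a fixed e this is the usual canonical partition function up to a positive constant factor, and the Laplace-transform description above becomes literal.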

The partition function has very good properties.

Theorem. For any eEe\in E^\perp, IeI_e is convex.

Proof

Proof. Suppose i,iIei,i'\in I_e; we may assume that iii\ne i', for otherwise there is nothing to prove. The functional iii'-i defines a hyperplane HKer(ii)H\coloneqq\operatorname{Ker}\!\left(i'-i\right), which separates W\vec W^\parallel into two half-spaces H+H^+ and HH^- defined as

H±{sW|i ⁣(s)i ⁣(s)0}.H^\pm\coloneqq\left\{s\in\vec W^\parallel\,\middle|\,i'\!\left(s\right)-i\!\left(s\right)\gtrless0\right\}.

By definition, Z(e,i)Z\!\left(e,i\right) and Z(e,i)Z\!\left(e,i'\right) both converge. Let t[0,1]t\in\left[0,1\right], and we have

Z ⁣(e,i+t(ii))=(sEeH++sEeH)Ω ⁣(e+s)ei(s)t(i(s)i(s))dλ ⁣(s)sEeH+Ω ⁣(e+s)ei(s)dλ ⁣(s)+sEeHΩ ⁣(e+s)ei(s)dλ ⁣(s)<.\begin{align*} Z\!\left(e,i+t\left(i'-i\right)\right) &=\left(\int_{s\in\vec E^{\parallel}_e\cap H^+}+\int_{s\in\vec E^{\parallel}_e\cap H^-}\right) \Omega\!\left(e+s\right) \mathrm e^{-i(s)-t(i'(s)-i(s))}\,\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &\le\int_{s\in\vec E^{\parallel}_e\cap H^+}\Omega\!\left(e+s\right) \mathrm e^{-i(s)}\,\mathrm d\lambda^{\parallel}\!\left(s\right) +\int_{s\in\vec E^{\parallel}_e\cap H^-}\Omega\!\left(e+s\right) \mathrm e^{-i'(s)}\,\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &<\infty. \end{align*}

Therefore, Z(e,i+t(ii))Z\!\left(e,i+t\left(i'-i\right)\right) converges, and thus i+t(ii)Iei+t\left(i'-i\right)\in I_e. \square

Being convex is good because it means that IeI_e is not fragmented: it is connected, and its interior IntIe\operatorname{Int}I_e and closure ClIe\operatorname{Cl}I_e look very much like IeI_e itself. Also, unless IeI_e is a single point, every point in IeI_e is a limit point of IeI_e. This makes it possible to talk about the limits and derivatives of Z(e,i)Z\!\left(e,i\right) w.r.t. ii.

Since IeI_e is a region in the finite-dimensional space W\vec W^{\parallel\prime}, we may define the derivatives w.r.t. ii in terms of partial derivatives with respect to the components of ii. To define the components of ii, we first need a basis on W\vec W^\parallel, which sets a coordinate system, although in the end we should derive coordinate-independent conclusions.

Suppose we have a basis on W\vec W^\parallel. Then, for any sWs\in\vec W^\parallel, we can write its components as ss_\bullet, and for any iWi\in\vec W^{\parallel\prime}, we can write its components as ii_\bullet. The subscript “\bullet” here can act as dummy indices (for multi-index notation). For example, we can write i(s)=isi(s)=i_\bullet s_\bullet. I do not use superscript and subscript to distinguish vectors and linear functionals because it is just for multi-index notation and because I am going to use them to label multi-index objects that are neither vectors nor linear functionals.
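For instance (with the numeric subscripts here labelling components with respect to the chosen basis, not subsystems), if \dim\vec W^\parallel=2, s_\bullet=\left(s_1,s_2\right) and i_\bullet=\left(i_1,i_2\right), then i\!\left(s\right)=i_1s_1+i_2s_2, and for the multi-index \alpha_\bullet=\left(2,1\right) the notation used in the proof below reads

\left(-s_\bullet\right)^{\alpha_\bullet}=\left(-s_1\right)^2\left(-s_2\right)^1=-s_1^2s_2,\qquad\frac{\partial^{\Sigma\alpha_\bullet}}{\partial^{\alpha_\bullet}i_\bullet}=\frac{\partial^3}{\partial i_1^2\,\partial i_2}.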

Theorem. For any eEe\in E^\perp, Z ⁣(e,i)Z\!\left(e,i\right) is CC^\infty w.r.t. ii on IntIe\operatorname{Int}I_e.

Proof

Proof. By the definition of the interior of a region, for any iIntIei\in\operatorname{Int}I_e and any pWp\in\vec W^{\parallel\prime}, there exists δi,p>0\delta_{i,p}>0 such that i+δi,ppIei+\delta_{i,p}p\in I_e.

By Leibniz’s integral rule, the partial derivatives of Z(e,i)Z\!\left(e,i\right) w.r.t. ii (if they exist) are given by

ΣαZ ⁣(e,i)αi=sEeΩ ⁣(e+s)(s)αei(s)dλ ⁣(s)sEeΩ ⁣(e+s)sαei(s)dλ ⁣(s)\begin{align*} \frac{\partial^{\Sigma\alpha_\bullet}Z\!\left(e,i\right)}{\partial^{\alpha_\bullet}i_\bullet} &=\int_{s\in\vec E^{\parallel}_e} \Omega\!\left(e+s\right)\left(-s_\bullet\right)^{\alpha_\bullet} \mathrm e^{-i\left(s\right)}\,\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &\le\int_{s\in\vec E^{\parallel}_e} \Omega\!\left(e+s\right)\left|s_\bullet\right|^{\alpha_\bullet} \mathrm e^{-i\left(s\right)}\,\mathrm d\lambda^{\parallel}\!\left(s\right) \end{align*}

where α\alpha_\bullet is a multi-index of natural numbers (indexed by \bullet). Now we just need to prove that this integral converges for any iIntIei\in\operatorname{Int}I_e.

Because of the inequality

alnxbxa(lnab1),a,b,x>0,a\ln x-bx\le a\left(\ln\frac ab-1\right),\quad a,b,x>0,

where the equality holds when x=a/bx=a/b, we have

sα(αeb)αebΣs,b>0\left|s_\bullet\right|^{\alpha_\bullet} \le\left(\frac{\alpha_\bullet}{\mathrm eb}\right)^{\alpha_\bullet}\mathrm e^{b\Sigma\left|s_\bullet\right|}, \quad b>0

There are 2dimW2^{\dim\vec W^\parallel} orthants in W\vec W^\parallel. We can label each of them by a string σ\sigma_\bullet of ±1\pm1 of length dimW\dim\vec W^\parallel. Then, each orthant can be denoted as OσO_\sigma. Then, we have

sOσ:σs=Σs.\forall s\in O_\sigma:\sigma_\bullet s_\bullet=\Sigma\left|s_\bullet\right|.

Therefore,

sOσ:sα(αeb)αebσs,b>0.\forall s\in O_\sigma:\left|s_\bullet\right|^{\alpha_\bullet} \le\left(\frac{\alpha_\bullet}{\mathrm eb}\right)^{\alpha_\bullet}\mathrm e^{b\sigma_\bullet s_\bullet}, \quad b>0.

Let bδi,σb\coloneqq\delta_{i,-\sigma}, where σ:sσs\sigma:s\mapsto\sigma_\bullet s_\bullet is a linear functional. Then,

sOσ:sαei(s)(αeδi,σ)αe(iδi,σσ)(s).\forall s\in O_\sigma:\left|s_\bullet\right|^{\alpha_\bullet}\mathrm e^{-i(s)} \le\left(\frac{\alpha_\bullet}{\mathrm e\delta_{i,-\sigma}}\right)^{\alpha_\bullet} \mathrm e^{-\left(i-\delta_{i,-\sigma}\sigma\right)(s)}.

Because iδi,σσIei-\delta_{i,-\sigma}\sigma\in I_e, we have

ΣαZ ⁣(e,i)αiσ(αeδi,σ)αsEeOσΩ ⁣(e+s)e(iδi,σσ)(s)dλ ⁣(s)<.\frac{\partial^{\Sigma\alpha_\bullet}Z\!\left(e,i\right)}{\partial^{\alpha_\bullet}i_\bullet} \le\sum_\sigma\left(\frac{\alpha_\bullet}{\mathrm e\delta_{i,-\sigma}}\right)^{\alpha_\bullet} \int_{s\in\vec E^{\parallel}_e\cap O_\sigma}\Omega\!\left(e+s\right) \mathrm e^{-\left(i-\delta_{i,-\sigma}\sigma\right)(s)}\, \mathrm d\lambda^{\parallel}\!\left(s\right)<\infty.

Therefore, the partial derivatives exist. \square


The next step is to find the macroscopic quantities. The equilibrium states are

pe(e,m)=ei(e)Z(π(e),i),p_e^{\parallel\circ}\!\left(e,m\right) =\frac{\mathrm e^{-i\left(e\right)}}{Z\!\left(\pi(e),i\right)},

where ZZ is the partition function. Here the role of ee becomes the label parameter in Equation 3. The measured value of extensive quantities under equilibrium is then

\begin{align*} \varepsilon^\circ &=\frac1{Z\!\left(e,i\right)}\int_{s\in\vec E^{\parallel}_e} \left(e+s\right)\mathrm e^{-i\left(s\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &=e+\frac1{Z\!\left(e,i\right)}\int_{s\in\vec E^{\parallel}_e} s\,\mathrm e^{-i\left(s\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &=e-\frac{\partial\ln Z\!\left(e,i\right)}{\partial i}. \end{align*}
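As a quick check against the canonical ensemble, keep the illustrative assumptions used earlier (energy as the only extensive quantity, i acting as multiplication by \beta) and write Z_{\mathrm c}\!\left(\beta\right)\coloneqq\int\Omega\!\left(U\right)\mathrm e^{-\beta U}\,\mathrm dU (a name introduced only for this check), so that \ln Z\!\left(e,\beta\right)=\beta e+\ln Z_{\mathrm c}\!\left(\beta\right). Then

\varepsilon^\circ=e-\frac{\partial\ln Z\!\left(e,\beta\right)}{\partial\beta}=e-\left(e+\frac{\partial\ln Z_{\mathrm c}\!\left(\beta\right)}{\partial\beta}\right)=-\frac{\partial\ln Z_{\mathrm c}\!\left(\beta\right)}{\partial\beta},

which is the standard expression for the mean energy in the canonical ensemble.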

The entropy under equilibrium is then

\begin{align*} S^\circ &=-\int_{s\in\vec E^{\parallel}_e} \frac{\mathrm e^{-i(s)}}{Z\!\left(e,i\right)}\ln\frac{\mathrm e^{-i(s)}}{Z\!\left(e,i\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &=\frac1{Z\!\left(e,i\right)}\int_{s\in\vec E^{\parallel}_e} i\!\left(s\right)\mathrm e^{-i\left(s\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right) +\ln Z\!\left(e,i\right)\\ &=-i\!\left(\frac{\partial\ln Z\!\left(e,i\right)}{\partial i}\right)+\ln Z\!\left(e,i\right). \end{align*}

By these two equations, we can eliminate the parameter ee and get the fundamental equation in the form of Equation 4:

S=i ⁣(ε)+lnZ ⁣(π ⁣(ε),i).S^\circ=i\!\left(\varepsilon^\circ\right)+\ln Z\!\left(\pi\!\left(\varepsilon^\circ\right),i\right).
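In the same canonical illustration, \pi\!\left(\varepsilon^\circ\right)=e, so, using the extension of i\in\vec W^{\parallel\prime} to points, i\!\left(\varepsilon^\circ\right)=\beta\left(\varepsilon^\circ-e\right), while \ln Z\!\left(e,\beta\right)=\beta e+\ln Z_{\mathrm c}\!\left(\beta\right); the fundamental equation then reduces to

S^\circ=\beta\,\varepsilon^\circ+\ln Z_{\mathrm c}\!\left(\beta\right),

which is the familiar canonical relation S=\beta U+\ln Z_{\mathrm c} (with k_{\mathrm B}=1), equivalent to F=-\beta^{-1}\ln Z_{\mathrm c} and S=\beta\left(U-F\right).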

We can see that SS^\circ decouples into two terms, one of which is only related to the W\vec W^\parallel component of ε\varepsilon^\circ, and the other of which is only related to the WW^\perp component of ε\varepsilon^\circ. What is good is that the derivative of SS^\circ w.r.t. the first (the W\vec W^\parallel) component of ε\varepsilon^\circ is well behaved: it is just ii. Therefore, the intensive quantities corresponding to changes of the extensive quantities within the subspace W\vec W^\parallel are well defined and equal the constant ii, which is just what we have been calling the fixed intensive quantities. The other components of the intensive quantities are not guaranteed to be well-defined because Z(,i)Z\!\left(\cdot,i\right) is not guaranteed to have good enough properties.


This article is continued in part 2.