Ulysses' trip
Here we are at the awesome (awful) blog written by UlyssesZhan!

# The distribution when indistinguishable balls are put into boxes (2023-05-09)

If there are 200 typographical errors randomly distributed in a 500-page manuscript, find the probability that a given page contains exactly 3 errors.

We can abstract this type of problem as follows:

Suppose there are $n$ distinguishable boxes and $k$ indistinguishable balls. Now, we randomly put the balls into the boxes. For each of the boxes, what is the probability that it contains $m$ balls?

For example, if the first page contains 3 errors, the second page contains 197 errors, and the rest of the pages contain no errors, then this corresponds to the situation where the first box contains 3 balls, the second box contains 197 balls, and the rest of the boxes contain no balls. The balls are indistinguishable because we can only determine how many errors are on each page, not which errors they are.

To deal with the problem, we simply need to find these two numbers:

• the number of ways to put $k$ indistinguishable balls into $n$ distinguishable boxes, and
• the number of ways to put $k-m$ indistinguishable balls into $n-1$ distinguishable boxes.

The latter corresponds to the number of ways to put the balls into the boxes given that the particular box contains $m$ balls. After we find these two numbers, the ratio of the latter to the former is the probability in question.

To find the number of ways to put $k$ indistinguishable balls into $n$ distinguishable boxes, we can use the stars and bars method. To see this, consider an example with $n=4$ and $k=6$:

${}|{}\star{}\star{}|{}\star{}|{}\star{}\star{}\star{},$

which corresponds to the distribution $0,2,1,3$. We can see that there are $n-1$ bars and $k$ stars. The number of ways to put the balls is therefore the same as the number of ways to choose the $k$ positions of the stars among the $n+k-1$ positions, so the number of ways is

$N_{n,k}=\binom{n+k-1}{k}=\frac{\left(n+k-1\right)!}{k!\left(n-1\right)!}.$
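
The count can be sanity-checked by brute force (a small script I added for illustration, in the same spirit as the Ruby code later in this post):

```ruby
# Binomial coefficient via a falling factorial
def binomial n, k
  (n - k + 1..n).reduce(1, :*) / (1..k).reduce(1, :*)
end

# Count the distributions of k indistinguishable balls into n distinguishable
# boxes by recursing on the number of balls in the first box
def count_distributions n, k
  return 1 if n == 1
  (0..k).sum { |first| count_distributions n - 1, k - first }
end

p count_distributions(4, 6) # => 84
p binomial(4 + 6 - 1, 6)    # => 84
```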

Therefore, the final probability of the given box containing $m$ balls is

$P_{n,k}(m)=\frac{N_{n-1,k-m}}{N_{n,k}} =\frac{\left(n-1\right)k!\left(n+k-m-2\right)!}{\left(k-m\right)!\left(n+k-1\right)!}.$

Another easy way to derive this result is by using the generating function. The number $N_{n,k}$ is just the coefficient of $x^k$ in the expansion of the generating function $\left(1+x+x^2+\cdots\right)^n$. The generating function is just $\left(1-x\right)^{-n}$, which can be easily expanded by using the binomial theorem.
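
To see the generating-function route concretely, here is a small check I added (not from the original post): multiplying a truncated power series by $1+x+x^2+\cdots$ turns its coefficient list into partial sums, so we take partial sums $n$ times and read off the coefficient of $x^k$:

```ruby
# Coefficient of x^k in (1 + x + x^2 + ...)^n, truncated at degree k.
# Each multiplication by 1/(1-x) replaces the coefficients by their
# partial sums.
def gf_coefficient n, k
  poly = [1] + [0] * k
  n.times { poly = (0..k).map { |d| (0..d).sum { |j| poly[j] } } }
  poly[k]
end

p gf_coefficient(4, 6) # => 84, the same as the stars-and-bars count
```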

We are now interested in the limit $n,k\to\infty$ with $\lambda:=k/n$ fixed. By Stirling’s approximation, we have

$P_{n,k}(m)\sim\left(n-1\right) \frac{k^{k+1/2}\left(n+k-m-2\right)^{n+k-m-2+1/2}}{\left(k-m\right)^{k-m+1/2}\left(n+k-1\right)^{n+k-1+1/2} } \mathrm e^{k-m+n+k-1-k-n-k+m+2}.$

The $1/2$'s in the exponents can simply be dropped: if we extract the corresponding factors, each of them tends to unity in the limit. The exponential factor is just the constant $\mathrm e$. Therefore, we have

\begin{align*} P_{n,k}(m)&\sim\left(n-1\right) \frac{\left(\lambda n\right)^{\lambda n}\left(n+\lambda n-m-2\right)^{n+\lambda n-m-2} } {\left(\lambda n-m\right)^{\lambda n-m}\left(n+\lambda n-1\right)^{n+\lambda n-1}}\mathrm e\\ &=\left(\tfrac{n+\lambda n-m-2}{n+\lambda n-1}\right)^n \left(\tfrac{\left(n+\lambda n-m-2\right)\lambda n}{\left(\lambda n-m\right)\left(n+\lambda n-1\right)}\right)^{\lambda n} \left(\tfrac{\lambda n-m}{n+\lambda n-m-2}\right)^m \tfrac{\left(n-1\right)\left(n+\lambda n-1\right)}{\left(n+\lambda n-m-2\right)^2}\mathrm e\\ &\to\mathrm e^{-\frac{m+1}{\lambda+1}}\,\mathrm e^m\, \mathrm e^{-\frac{m+1}{\lambda+1}\lambda}\left(\tfrac\lambda{\lambda+1}\right)^m\tfrac1{\lambda+1}\mathrm e\\ &=\left(\tfrac\lambda{\lambda+1}\right)^m\tfrac1{\lambda+1}. \end{align*}

This is just the geometric distribution with parameter $p=1/(\lambda+1)=n/(k+n)$.
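
Returning to the opening manuscript problem ($n=500$ pages, $k=200$ errors, $m=3$), the exact formula and the geometric approximation can be evaluated directly (a numeric check I added; it restates the closed form above as code):

```ruby
# Exact P_{n,k}(m) = (n-1) k! (n+k-m-2)! / ((k-m)! (n+k-1)!),
# written with falling factorials to keep the integers small
def exact m, n, k
  (n - 1) * (k - m + 1..k).reduce(1, :*) / (n + k - m - 1..n + k - 1).reduce(1, :*).to_f
end

# Geometric approximation with p = n / (n + k)
def geometric m, n, k
  p = n.to_f / (n + k)
  p * (1 - p)**m
end

p exact(3, 500, 200)     # => 0.0166...
p geometric(3, 500, 200) # => 0.0166...
```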

If you want to simulate the number of balls in a box, here is a simple way to do this. First, because each box is the same, we can just focus on the first box without loss of generality. Then, we just need to randomly generate the positions of the $n-1$ bars among the $n+k-1$ positions, and then return the index of the first bar (which is the number of balls in the first box).

We can then write the following Ruby code to simulate the number of balls in the first box:

```ruby
def simulate n, k
  # Running minimum of n-1 samples, the i-th drawn uniformly from
  # 0...(n+k-1-i). This minimum has the same distribution as the smallest
  # of n-1 bar positions chosen without replacement from the n+k-1 slots,
  # which is exactly the number of balls in the first box.
  (n-1).times.inject(npkm1 = n+k-1) { |bar, i| [rand(npkm1 - i), bar].min }
end
```


Compare the simulated result with the theoretical result:

```ruby
def frequency m, n, k, trials
  trials.times.count { simulate(n, k) == m } / trials.to_f
end

def truth m, n, k
  (n-1) * (k-m+1..k).reduce(1, :*) / (n+k-m-1..n+k-1).reduce(1, :*).to_f
end

def approx m, n, k
  n * k**m / ((n+k)**(m+1)).to_f
end

srand 1108
m, n, k = 3, 5000, 8000
p frequency m, n, k, 10000 # => 0.0902
p truth m, n, k            # => 0.08965012972626446
p approx m, n, k           # => 0.08963271594131858
```

# Labeled break, next, and redo in Ruby (2023-05-07)

Many languages support breaking out of nested loops. There are some typical ways of doing this:

• Some languages can name loops by providing a label for the loop. In those languages, you can use break together with a label to specify which loop to break out of. Examples: Perl, Java, JavaScript, and some others.
• Some languages can specify the number of layers of loops to break out of. In those languages, you can use break together with a number to specify how many layers of loops to break out of. The only example that I know is C#.
• Some languages have goto statements. You can easily break from loops to wherever you want by using goto (in fact, breaking out of nested loops is one of the few cases where using goto is commonly recommended). Examples: C, C++.

However, in most other languages, it is not easy to break out of nested loops. A typical solution is this:

```ruby
outer_loop do
  break_outer = false
  inner_loop do
    if condition
      break_outer = true
      break
    end
  end
  break if break_outer
end
```


In languages with exceptions, another possible workaround is to use exceptions (the catch–throw control flow):

```ruby
catch :outer_loop do
  outer_loop do
    inner_loop do
      throw :outer_loop if condition
    end
  end
end
```


I wrote a simple module to better use this workaround.

```ruby
class JumpLabel < StandardError
  attr_reader :reason, :arg # needed so that the wrapper can read them below

  {break: true, next: true, redo: false}.each do |reason, has_args|
    define_method reason do |*args|
      @reason = reason
      @arg = args.size == 1 ? args.first : args if has_args
      raise self
    end
  end
end

class Module
  def register_label *method_names
    method_names.each do |name|
      old = instance_method name
      define_method name do |*args, **opts, &block|
        return old.bind_call self, *args, **opts unless block
        old.bind_call self, *args, **opts do |*bargs, **bopts, &bblock|
          block.call(*bargs, **bopts, jump_label: label = JumpLabel.new, &bblock)
        rescue JumpLabel => caught_label
          raise caught_label unless caught_label == label
          case label.reason
          when :break then break label.arg
          when :next then next label.arg
          when :redo then redo
          end
        end
      end
    end
  end
end
```


Example usage:

```ruby
Integer.register_label :upto, :downto
1.upto 520 do |i, jump_label:|
  print i
  1.downto -1314 do |j|
    print j
    jump_label.break 8 if j == 0
  end
end.tap { puts _1 }
# => 1108
```


## The general model

The model is as follows. There are $n$ agents (nations); they can trade some type of good, and they use the same currency. Every agent may produce or consume the good. The benefit function of the $j$th agent is $B_j$, and the cost function is $C_j$. The net amount that the $j$th agent imports from the $k$th agent is $T_{j,k}$ (so $T_{j,k}=-T_{k,j}$). The trade cost incurred by the $j$th agent is $S_j$. Now, we want to find the amount $Q_j$ that every agent produces and the amounts $T_{j,k}$ that every agent imports from other agents. Assume that $S$ depends only on $T$ and not on $Q$. Also, assume that there is no externality (i.e., whenever $j\ne k$, $\partial_kB_j=0$ and $\partial_kC_j=0$). Also, assume that every agent is rational and has perfect information.

Now, consider the profit $\Pi_j$ of the $j$th agent. Subtract the cost from the benefit, and we have

$\textstyle \Pi_j=B_j\!\left(Q_j+\sum_kT_{j,k}\right)-C_j\!\left(Q_j\right)-S_j\!\left(T\right).$

According to the fundamental theorem of welfare economics, $T$ and $Q$ are Pareto optimal under market equilibrium. We assume that this happens at a stationary point of the social benefit, which is the sum of the profits of all agents. We can then get the equations

\begin{align*} &0=\frac{\partial}{\partial Q_l}\sum_j\Pi_j =B_l'\!\left(Q_l+\sum_kT_{l,k}\right)-C_l'\!\left(Q_l\right),\quad\forall l;\\ &0=\frac{\partial}{\partial T_{l,m}}\sum_j\Pi_j =B_l'\!\left(Q_l+\sum_kT_{l,k}\right)-B_m'\!\left(Q_m+\sum_kT_{m,k}\right) -\sum_j\frac{\partial S_j}{\partial T_{l,m}}\!\left(T\right),\quad\forall l<m. \end{align*}

These are $n+\frac{n\left(n-1\right)}2$ equations, and $Q$ and $T$ together have exactly $n+\frac{n\left(n-1\right)}2$ degrees of freedom (note that $T$ is anti-symmetric). In principle, we can solve for $Q$ and $T$.
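
As a concrete illustration, here is a sketch (my own, using a hypothetical linear specification not taken from this post) that solves the $n=2$, $S=0$ case numerically, with marginal benefit $B_j'(x)=a_j-b_jx$ and marginal cost $C_j'(x)=c_j+d_jx$:

```ruby
require 'matrix'

# For n = 2 agents with zero trade cost, writing T := T_{1,2} for agent 1's
# net import, the stationarity conditions are three linear equations in
# (Q_1, Q_2, T):
#   B_1'(Q_1 + T) = C_1'(Q_1)
#   B_2'(Q_2 - T) = C_2'(Q_2)
#   B_1'(Q_1 + T) = B_2'(Q_2 - T)
def solve_two_agents a1, b1, c1, d1, a2, b2, c2, d2
  m = Matrix[
    [-(b1 + d1), 0,          -b1        ],
    [0,          -(b2 + d2), b2         ],
    [-b1,        b2,         -(b1 + b2) ]
  ]
  v = Vector[c1 - a1, c2 - a2, a2 - a1]
  (m.inverse * v).to_a.map(&:to_f)
end

q1, q2, t = solve_two_agents 10, 1, 2, 1, 8, 1, 1, 1
p [q1, q2, t]    # => [3.25, 4.25, 1.5]
p 10 - (q1 + t)  # => 5.25, the common marginal value B_1' = C_1' = B_2' = C_2'
```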

For the case where there is no trade cost, we can see that the domestic prices are all equal, and the price may be called the world price.

However, given $S=0$, the equations above are not independent. Actually, there are only $2n-1$ independent equations (all $2n$ components of $B'$ and $C'$ are equal). This means that, for $n>2$, free trade with zero trade cost is an indeterminate system.

This phenomenon looks counter-intuitive, but it is actually understandable: under zero trade cost, any two agents may trade an arbitrary amount of goods at the same world price, and this provides extra degrees of freedom to the model. To be specific, if $(Q,T)$ is a solution to the model, then $(Q,T+\Delta T)$ is also a solution, where the anti-symmetric matrix $\Delta T$ satisfies

$\sum_k\Delta T_{j,k}=0,\quad\forall j,$

of which only $n-1$ of the $n$ equations are independent. Therefore, the total number of degrees of freedom in the solution of the model is

$n+\frac{n\left(n-1\right)}2-\left(\frac{n\left(n-1\right)}2-\left(n-1\right)\right)=2n-1.$

Now, the useful quantities that we can solve for are the production and the net import $T_j:=\sum_kT_{j,k}$ of every agent. Note that the net import actually has only $n-1$ degrees of freedom because of the restriction $\sum_jT_j=0$.

## The middleman (re-exportation)

It is worth pointing out that the existence of the middleman (re-exportation) is entirely due to the presence of trade cost. Here we consider a simplified problem: there are three agents, acting respectively as the producer, the retailer, and the customer. The producer does not consume (its benefit is $0$); the customer does not produce (its cost and marginal cost are infinite); and the retailer neither produces nor consumes. Assume that the trade between any two of them does not bring cost to the third one. Then, the social benefit is

$\Pi=B\!\left(T_{\mathrm c,\mathrm r}+T_{\mathrm c,\mathrm p}\right) -C\!\left(T_{\mathrm c,\mathrm p}+T_{\mathrm r,\mathrm p}\right) -S_\mathrm c\!\left(T_{\mathrm c,\mathrm r},T_{\mathrm c,\mathrm p}\right) -S_\mathrm r\!\left(T_{\mathrm c,\mathrm r},T_{\mathrm r,\mathrm p}\right) -S_\mathrm p\!\left(T_{\mathrm c,\mathrm p},T_{\mathrm r,\mathrm p}\right).$
# A measure-theoretic formulation of statistical ensembles (part 2) (2023-05-01)

This article follows part 1.

## Introduction

In part 2, I will focus on non-thermal ensembles.

Before I proceed, I need to clarify that almost all ensembles that we actually use in physics are thermal ensembles, including the microcanonical ensemble, the canonical ensemble, and the grand canonical ensemble (the microcanonical ensemble can be considered a special case of a thermal ensemble where $\vec W^\parallel$ is trivial).

The theory of thermal ensembles is built by letting the system in question be in thermal contact with a bath. Similarly, if we let the system in question be in non-thermal contact with a bath, we can get the theory of non-thermal ensembles. An example of non-thermal ensembles that is actually used in physics is the isoenthalpic–isobaric ensemble, where we let the system in question be in non-thermal contact with a pressure bath.

However, we will see that it is harder to measure-theoretically develop the theory of non-thermal ensembles if we continue to use the same method as in the theory of thermal ensembles.

## Introducing non-thermal contact with an example

A thermal contact is a contact between thermal systems that conducts heat (while possibly exchanging some extensive quantities). A non-thermal contact is a contact between thermal systems that does not conduct heat (while exchanging some extensive quantities). For reversible processes, thermodynamically and mathematically, heat is equivalent to a form of work, with the entropy as the displacement and the temperature as the force. However, this is not true for irreversible processes because of the Clausius theorem. This should have something to do with the fact that entropy is different from other extensive quantities (as is illustrated in part 1).

First, let me introduce how we cope with reversible processes of two subsystems in non-thermal contact in thermodynamics. As an example, consider a tank of monatomic ideal gas separated into two parts by a thermally non-conductive, massless, incompressible plate in the middle that can move. The two parts can then adiabatically exchange energy ($U$) and volume ($V$) but not number of particles ($N$). For one of the parts, we have

$0=\delta Q=\mathrm dU+p\,\mathrm dV=\mathrm dU+\frac{2U}{3V}\,\mathrm dV,$
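
The coefficient $2U/3V$ comes from the equations of state of the monatomic ideal gas, $pV=Nk_{\mathrm B}T$ and $U=\frac32Nk_{\mathrm B}T$ (a one-line check):

$p=\frac{Nk_{\mathrm B}T}{V}=\frac23\cdot\frac{\frac32Nk_{\mathrm B}T}{V}=\frac{2U}{3V}.$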

which is good and easy to deal with because it is simply a differential 1-form.

However, this convenience is not available for irreversible processes, because then we do not have the simple relation $p=2U/3V$. Actually, the pressure is only well-defined for equilibrium states, and it is impossible to define a pressure that makes sense throughout a whole irreversible process, which involves non-equilibrium states. Therefore, although it seems that the “thermally non-conductive” condition imposes a stronger restriction on which states the composite system can reach without external sources, it actually does not, because the energy exchanged by the subsystems when they exchange volume is arbitrary (as long as it does not violate the second law of thermodynamics) if the process is not reversible.

The possible states of the non-thermally composite system then cannot be simply described by a vector subspace of $W^{(1)}\times W^{(2)}$. If we try to use the same approach as constructing the thermally composite system to construct the non-thermally composite system, the attempt will fail.

Let us continue with our example of a tank of gas. Although the pressure is not determined in an irreversible process, one thing is certain: the pressure on the plate by the gas on one side is equal to the pressure on the plate by the gas on the other side. This is because the plate is massless (otherwise its kinetic energy would be an external source of energy; also, remember that it is incompressible, which means that it cannot be an external source of volume). Therefore, the relation between the volume exchanged and the energy exchanged is determined as long as at least one side of the plate undergoes a reversible process, because then the reversible side has a determined pressure, which determines the pressure of the other side.

This is the key idea of formulating the non-thermal ensembles without formulating the non-thermally composite system. In a thermal or non-thermal ensemble, the composite system consists of two subsystems, one of which is the system in question and the other of which is the bath, which we control. We can let the bath have zero relaxation time (the time it takes to reach thermal equilibrium) so that every process it undergoes is reversible. Then the pressure (or, generally, any other intensive quantity that we control, times the temperature) is determined (and actually constant), and we can express the non-conductivity restriction as

$\mathrm dU+p\,\mathrm dV=0,$

where $p$ is the pressure, which is a constant. This is a homogeneous linear equation on $\vec W^{\parallel(1)}$ (whose vectors are denoted as $(\mathrm dU,\mathrm dV)$ in our case) which defines a vector subspace of $\vec W^{\parallel(1)}$, which we call $\vec W^{\parallel\parallel(1)}$. The dimension of $\vec W^{\parallel\parallel(1)}$ is that of $\vec W^{\parallel(1)}$ minus one. The physical meaning of $\vec W^{\parallel\parallel(1)}$ in this example is the hyperplane of fixed enthalpy.
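
Indeed, since $p$ is constant along the process, the non-conductivity constraint integrates directly to a conserved enthalpy:

$0=\mathrm dU+p\,\mathrm dV=\mathrm d\!\left(U+pV\right)=\mathrm dH.$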

Noting that our bath actually has the fixed intensive quantities $i=\left(1/T,p/T\right)\in\vec W^{\parallel(1)\prime}$, we can rewrite the above equation as

$\begin{equation} \label{eq: W star parallel} \vec W^{\parallel\parallel(1)} =\left\{s_1\in\vec W^{\parallel(1)}\,\middle|\,i\!\left(s_1\right)=0\right\}. \end{equation}$

Wait! What does $T$ do here? It is supposed to mean the temperature of the bath, but the temperature of the bath should be irrelevant since the contact is non-thermal. Indeed, it is irrelevant: the temperature of the bath serves only as an overall constant factor of $i$, which does not affect $\vec W^{\parallel\parallel(1)}$ as long as it is not zero or infinite. So far, this means that the temperature of the bath is not necessarily fixed, so the actual number of fixed intensive quantities is the dimension of $\vec W^{\parallel(1)\prime}$ minus one, which is the same as the dimension of $\vec W^{\parallel\parallel(1)}$. Later we will see that anything relevant to the temperature of the bath is ultimately irrelevant to our problem. This seems magical, but you will see the sense in it after we introduce, later, another way of developing the non-thermal ensembles (one that does not involve baths or non-thermal contact).

We can define a complement of $\vec W^{\parallel\parallel(1)}$ in $\vec W^{\parallel(1)}$ as $\vec W^{\parallel\perp(1)}$. Then, we have $\vec W^{\parallel(1)}=\vec W^{\parallel\parallel(1)}+\vec W^{\parallel\perp(1)}$. The space $\vec W^{\parallel\perp(1)}$ is a one-dimensional vector space.

For convenience, define $W^{\star\perp(1)}:=W^{\perp(1)}+\vec W^{\parallel\perp(1)}$. The vector space $\vec W^{\star\perp(1)}$ associated with it is a complement of $\vec W^{\parallel\parallel(1)}$ in $\vec W^{(1)}$. To make the notation look more consistent, we can use $\vec W^{\star\parallel(1)}$ as an alias of $\vec W^{\parallel\parallel(1)}$. They are the same vector space, but $\vec W^{\star\parallel(1)}$ emphasizes that it is a subspace of $\vec W^{(1)}$, and $\vec W^{\parallel\parallel(1)}$ emphasizes that it is a subspace of $\vec W^{\parallel(1)}$. Then, we have $W^{(1)}=W^{\star\perp(1)}+\vec W^{\star\parallel(1)}$. Every point in $W^{(1)}$ can be uniquely written as a sum of a point in $W^{\star\perp(1)}$ and a vector in $\vec W^{\star\parallel(1)}$. We can describe the decomposition by a projection $\pi^{\star(1)}:W^{(1)}\to W^{\star\perp(1)}$.

We will heavily use the “$\star$” in the superscripts of symbols. Any symbol labeled “$\star$” is dependent on $i$ (but independent of an overall constant factor on $i$). You can regard those symbols as having an invisible “$i$” in the subscript to keep in mind that they depend on $i$.

Example. Suppose we have a tank of gas with three extensive quantities $U,V,N$. It is in non-thermal contact with a pressure bath with pressure $p$ so that it can exchange $U$ and $V$ with the bath. Then, the projection $\pi^{\star(1)}$ projects macrostates with the same enthalpy and number of particles onto the same point. Because a complement of a vector subspace is not unique, there are multiple possible ways of constructing the projection. One possible way is

$\pi^{\star(1)}\!\left(U,V,N\right):=\left(U+pV,0,N\right).$

Here the fixed intensive quantity $p$ is involved. Note that this projection is still valid for different temperatures of the bath, so an overall constant factor of $i$ does not affect the projection.
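
As a sanity check, this map is idempotent, as a projection must be (the second component of the image is already $0$):

$\pi^{\star(1)}\!\left(\pi^{\star(1)}\!\left(U,V,N\right)\right) =\pi^{\star(1)}\!\left(U+pV,0,N\right) =\left(U+pV+p\cdot0,0,N\right) =\pi^{\star(1)}\!\left(U,V,N\right).$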

## Non-thermal contact with a bath

Now, after introducing non-thermal contact with an example, we can now formulate the non-thermal contact with a bath.

Suppose we have a system $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$. The main approach is constructing a composite system out of the composite system for the $\vec W^{\parallel(1)}$-ensemble.

The composite system for the $\vec W^{\parallel(1)}$-ensemble was introduced in part 1. We denote the bath that is in contact with our system as $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$.

Consider this projection $\pi^\star:W\to W^{\star\perp}$ (where $W^{\star\perp}$ is an affine subspace of $W$ and the range of $\pi^\star$):

$\begin{equation} \label{eq: pi star} \pi^\star\!\left(e_1,e_2\right) :=\left(\pi^{\star(1)}\!\left(e_1\right), \rho_{\pi(e_1,e_2)}\!\left(\pi^{\star(1)}\!\left(e_1\right)\right)\right). \end{equation}$

To ensure that it is well-defined, we need to guarantee that $\pi^{\star(1)}\!\left(e_1\right)\in W^{\parallel(1)}_{\pi(e_1,e_2)}$ for any $e_1,e_2$, and this is true.

The two spaces $W^{\star\perp}$ and $W^{\perp}$ do not have any direct relation. The only relation between them is that the dimension of $W^{\star\perp}$ is one plus the dimension of $W^{\perp}$ (if they are finite-dimensional).

What is good about the projection $\pi^\star$ is that it satisfies $\vec W^{\star\parallel(1)}=\vec c^{(1)}\!\left(\vec\pi^\star(0)\right)$. This makes our notation consistent if we construct another composite system out of $\pi^\star$. Now, consider the composite system of $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ and $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$ under the projection $\pi^\star$. In the notation of the spaces and mappings that are involved in the newly constructed composite system, we write “$\star$” in the superscript.

Just like how $\vec W^{\star\parallel(1)}$ is a subspace of $\vec W^{(1)}$, $\vec W^{\star\parallel(2)}$ is also a subspace of $\vec W^{(2)}$. This means that both $\vec\rho^{-1}\circ\vec\rho^\star$ and $\vec\rho\circ\vec\rho^{\star-1}$ are well-defined. The former maps $\vec W^{\star\parallel(1)}$ to another subspace of $\vec W^{(1)}$, and the latter maps $\vec W^{\star\parallel(2)}$ to another subspace of $\vec W^{(2)}$.

We can think of the construction of the new composite system as replacing the “plate” between the subsystems in the original composite system: a “thermally conductive plate” is swapped for a “thermally non-conductive plate”. Suppose that in the new situation, the intensive quantities “felt” by subsystem 1 are $i^\star\in\vec W^{\star\parallel(1)\prime}$. Then, because the bath is still the same bath in the two situations, we have

$-i^\star\circ\vec\rho^{\star-1}=-i\circ\vec\rho^{-1}.$

Therefore,

$\begin{equation} \label{eq: i star} i^\star:=i\circ\vec\rho^{-1}\circ\vec\rho^\star \end{equation}$

would be a good definition of $i^\star$. However, actually $i^\star$ is trivial:

$\begin{equation} \label{eq: i star = 0} i^\star=0. \end{equation}$

This is because Equation \ref{eq: pi star} shows that $\rho\!\left(W^{\star\parallel(1)}_e\right)=W^{\star\parallel(2)}_e$, and thus

$\vec\rho^{-1}\!\left(\vec\rho^\star\!\left(\vec W^{\star\parallel(1)}\right)\right) =\vec W^{\star\parallel(1)},$

which is the kernel of $i$ by definition.

Because $i^\star$ is trivial, it is independent of the temperature of the bath: it is zero no matter what temperature the bath is at.

Example. Suppose a system described by $U_1,V_1,N_1$ is in non-thermal contact with a pressure bath, and they can exchange energy and volume. The projection $\pi$ is

$\pi\!\left(U_1,V_1,N_1,U_2,V_2,N_2\right) =\left(\frac{U_1+U_2}2,\frac{V_1+V_2}2,N_1,\frac{U_1+U_2}2,\frac{V_1+V_2}2,N_2\right).$

Then, the projection $\pi^\star$ can be

$\pi^\star\!\left(U_1,V_1,N_1,U_2,V_2,N_2\right) =\left(U_1+pV_1,0,N_1,U_2-pV_1,V_1+V_2,N_2\right).$

By choosing a different $\pi^{\star(1)}$ or a different $\pi$, we can get a different $\pi^\star$. They physically mean the same composite system.

The space $W^\perp$ is four-dimensional, and the space $W^{\star\perp}$ is five-dimensional. We can denote the five degrees of freedom as $U,V,H_1,N_1,N_2$, where $U:=U_1+U_2$ is the total energy, $V:=V_1+V_2$ is the total volume, and $H_1:=U_1+pV_1$ is the enthalpy of subsystem 1. Then, the projection $\pi^\star$ can be written as

$\pi^\star\!\left(U_1,V_1,N_1,U_2,V_2,N_2\right) =\left(H_1,0,N_1,U-H_1,V,N_2\right).$

We can get $W^{\star\parallel}_e$ by finding the preimage under the projection, where $e:=\left(H_1,0,N_1,U-H_1,V,N_2\right)$:

$W^{\star\parallel}_e:=\pi^{\star-1}\!\left(e\right) =\left\{\left(H_1-pV_1,V_1,N_1,U-H_1+pV_1,V-V_1,N_2\right)\middle|\,V_1\in\mathbb R\right\}.$

Because it is parameterized by one real parameter $V_1$, it is a one-dimensional affine subspace of $W$. Projecting it under $c^{(1)}$ and $c^{(2)}$ will respectively give us $W^{\star\parallel(1)}_e$ and $W^{\star\parallel(2)}_e$:

$W^{\star\parallel(1)}_e :=\left\{\left(H_1-pV_1,V_1,N_1\right)\middle|\,V_1\in\mathbb R\right\},$ $W^{\star\parallel(2)}_e :=\left\{\left(U-H_1+pV_1,V-V_1,N_2\right)\middle|\,V_1\in\mathbb R\right\}.$

The affine isomorphism $\rho^\star_e$ is then naturally

$\rho^\star_e\!\left(H_1-pV_1,V_1,N_1\right)=\left(U-H_1+pV_1,V-V_1,N_2\right).$

Its vectoric form is then

$\vec\rho^\star\!\left(-p\,\mathrm dV_1,\mathrm dV_1,0\right) =\left(p\,\mathrm dV_1,-\mathrm dV_1,0\right).$

Our fixed intensive quantities are $i$, which is defined as $i\!\left(\mathrm dU_1,\mathrm dV_1,0\right)=\frac1T\,\mathrm dU_1+\frac pT\,\mathrm dV_1$. We can then get $i^\star$ by

$i^\star:=i\circ\vec\rho^{-1}\circ\vec\rho^\star =\left(-p\,\mathrm dV_1,\mathrm dV_1,0\right)\mapsto0.$

This is consistent with Equation \ref{eq: i star = 0}.

## Non-thermal ensembles (bath version)

Now, we can define the non-thermal contact with a bath to be the same as the thermal contact with a bath under $\pi^\star$. Utilizing this definition, we can define the composite system for non-thermal ensembles.

Definition. A composite system for the non-thermal $\vec W^{\parallel(1)}$-ensemble of the system $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ with fixed intensive quantities $i$ is the same as the composite system for the thermal $\vec W^{\star\parallel(1)}$-ensemble with fixed intensive quantities $i^\star=0$ (given by Equation \ref{eq: i star = 0}), where $\vec W^{\star\parallel(1)}$ is defined by Equation \ref{eq: W star parallel}.

This definition looks very neat. Also, just like how we define the domain of fixed intensive quantities of a thermal ensemble, we can define the domain of fixed intensive quantities of a non-thermal ensemble to consist of those values that make the integral in the definition of the partition function converge.

Because in part 1 we already derived a formula for the partition function that no longer involves any information about the bath, we can drop the “$(1)$” in the superscripts. The partition function of the non-thermal ensemble is then

$Z^\star\!\left(e,i^\star\right)=\int_{s\in\vec E^{\star\parallel}_e} \Omega\!\left(e+s\right) \mathrm e^{-i^\star\left(s\right)}\,\mathrm d\lambda^{\parallel}\!\left(s\right),\quad e\in E^{\star\perp},\quad i^\star\in I^\star_e\subseteq\vec W^{\star\parallel\prime}.$

Here, $i^\star$ is not fixed at the trivial value $0$ (I abused the notation here) but is actually an independent variable serving as one of the arguments of the partition function, taking values in $I^\star_e$ (which is not the domain of fixed intensive quantities of the non-thermal ensemble mentioned above). However, the only meaningful information about this non-thermal ensemble is in the behavior of $Z^\star$ at $i^\star=0$ rather than at an arbitrary $i^\star\in I^\star_e$, and we do not know whether $0\in I^\star_e$ or not. This is then a criterion to judge whether $i$ is in the domain of fixed intensive quantities of the non-thermal ensemble or not. To be clear, we define

$J:=\left\{i\in\vec W^{\parallel\prime}\,\middle|\, \exists e\in E^{\star\perp}:0\in I^\star_{e}\right\}.$

A problem about this formulation is that it is possible to have two $i$s that share the same thermal equilibrium state. In that case, the non-thermal ensemble is not defined.

Because $i^\star=0$, the observed extensive quantities in thermal equilibrium are just

$\begin{equation} \label{eq: epsilon^circ} \varepsilon^\circ =e+\left.\frac{\partial\ln Z^\star\!\left(e,i^\star\right)}{\partial i^\star}\right|_{i^\star=0} =e+\frac{\int_{s\in\left(E-e\right)\cap\vec W^{\star\parallel}} s\Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)} {\int_{s\in\left(E-e\right)\cap\vec W^{\star\parallel}} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)}, \end{equation}$

and the entropy in thermal equilibrium is just

$\begin{equation} \label{eq: S^circ} S^\circ=\ln Z^\star\!\left(e,0\right) =\ln\int_{s\in\left(E-e\right)\cap\vec W^{\star\parallel}} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right). \end{equation}$

We can cancel the parameter $e$ by Equation \ref{eq: epsilon^circ} and \ref{eq: S^circ} to get

$\begin{equation} \label{eq: S^circ vs epsilon^circ} S^\circ=\ln Z^\star\!\left(\pi^\star\!\left(\varepsilon^\circ\right),0\right) =\ln\int_{s\in\left(E-\varepsilon^\circ\right)\cap\vec W^{\star\parallel}} \Omega\!\left(\varepsilon^\circ+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right). \end{equation}$

What is interesting about Equation \ref{eq: S^circ vs epsilon^circ} is that it actually does not guarantee that the intensive variables are defined in $\vec W^\parallel$. Physically, this means that the temperature is not necessarily defined, unlike the case of thermal ensembles (there, the thermal contact makes the temperature the same as that of the bath and thus defined). What is guaranteed is that the intensive variables are defined in $\vec W^{\star\parallel}$ and that they must be zero. Therefore, whenever the intensive variables are defined in $\vec W^\parallel$, they must be parallel to $i$ (and remain the same if we scale $i$ by an arbitrary non-zero factor). Physically, this means that the system must have the same intensive variables as the bath, up to a different temperature.

## Non-thermal ensembles (non-bath version)

It may seem surprising that we can define non-thermal ensembles without a bath. How is it possible to fix some features of the intensive variables without a bath? The inspiration comes from looking at Equation \ref{eq: W star parallel}. We can make a guess here: if we contract the system along $\vec W^{\star\parallel}$, the contraction satisfies the equal a priori probability principle. We make this guess because of the following arguments:

• Mathematically, a contraction is a legitimate new system, so it should also satisfy the axioms that we proposed before.
• Physically, because the temperature of the bath is arbitrary, the different accessible macrostates should not be too different, because otherwise the temperature would matter (as it appears in the expression of the partition function).

After finding the equilibrium state of the contraction, we can use the contractional pullback to find the equilibrium state of the original system.

If you do it right, you should get the same answer as Equation \ref{eq: S^circ vs epsilon^circ}.

## Summary

The only axiom that we used is the equal a priori probability principle. Then, we formulated three types of ensembles: microcanonical, thermal, and non-thermal.

]]>
UlyssesZhan
Summarizing my methods for studying2023-04-05T11:38:17-07:002023-04-05T11:38:17-07:00https://ulysseszh.github.io/misc/2023/04/05/study-methodsThis post is in Chinese.

## Audience

### The beginner stage

• The field has a very systematic body of knowledge. You can divide the field into multiple subfields, each with a fairly fixed order of learning. There is no fixed order of learning between subfields, but knowledge from different subfields can “unlock” each other.
• Almost all of the academic problems you encounter (homework, exams, etc., excluding things like projects) have no essential complexity. Most problems can be solved within a day.
• Learning resources are abundant. You can find plenty of textbooks. Because the literature is ample, Wikipedia usually has fairly complete entries. Because so many people study the subject, the questions you will run into have usually already been asked on Stack Exchange, and asked questions usually get answered.
• Most knowledge is “memorizable”. Of course, in principle any knowledge is memorizable if you are willing to rote-learn it, but here I mean memorization through understanding: in the long term, if you once understood a piece of knowledge, you can still understand it much later; in the short term, if you understand a piece of knowledge, you can probably remember it without deliberately memorizing it.
• The knowledge you learn is basically told to you by others, rather than created or discovered by yourself.

## The pace of learning

### Doing problems (homework and exams)

“Forgetting about exams” is not something that is easier said than done. It is easy to say, and also easy to do: you simply ignore the existence of “exams”. There are a few sayings that have been repeated to death: you go to school to learn, not to take exams; exams exist to check your learning, not as the goal of learning. These sayings are not wrong, and they do hit the point, but people have heard them so many times that they now sound hollow and meaningless. If you are a student trapped in exams, you had better re-examine what these sayings mean, and not let yourself be driven by the inertia of thinking carried over from primary and secondary school.

## Intuition

### Notations and symbols

• Symbolic language is mostly used for reading and writing, while natural language is mostly used for listening and speaking. In other words, symbolic language is non-oral. This key difference means that you cannot “communicate” with yourself through symbolic language.
• Symbolic language was invented by experts, while natural language emerged and was shaped spontaneously by all the humans in a society communicating with each other.
• The correspondence between signifier and signified is more fixed in symbolic language than in natural language, so its meaning is clearer.
• Symbolic language is written non-linearly, while natural language is written linearly. Here “linear” means arranging characters along a single line; those who know line notation in chemistry should know what this means.

## Life and entertainment

]]>
UlyssesZhan
A measure-theoretic formulation of statistical ensembles (part 1)2023-03-30T21:49:51-07:002023-03-30T21:49:51-07:00https://ulysseszh.github.io/physics/2023/03/30/measure-ensembleI feel that the process of using statistical ensembles to find properties of thermal systems is not rigorous enough. There are some operations that need to be defined precisely. Also, the treatment is not general enough. Currently, the only generally used statistical ensembles are the microcanonical ensemble, the canonical ensemble, and the grand canonical ensemble, but there are actually other possible ensembles that are potentially useful. Therefore, I feel it necessary to attempt a mathematical formulation.

## Mathematical tools and notations

Suppose $(\Omega,\sigma(\Omega),P)$ is a probability space. Suppose $W$ is an affine space. For some map $f:\Omega\to W$, we define the $P$-expectation of $f$ as

$\mathrm E_P\!\left[f\right]:=\int_{x\in\Omega}\left(f(x)-e_0\right)\mathrm dP(x)+e_0,$

where $e_0\in W$ is arbitrary. Here the integral is a Pettis integral. The expectation is defined whenever the Pettis integral is defined, and it is then well-defined in that it is independent of the $e_0$ we choose.
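
The independence from $e_0$ can be checked concretely in a finite setting. Below is a minimal sketch of my own (the helper name `expectation` is hypothetical, not from the post), assuming a discrete one-dimensional distribution:

```python
# Sketch of the affine expectation E_P[f] = ∫ (f − e0) dP + e0 in a
# discrete one-dimensional setting: the result does not depend on the
# base point e0 because the probabilities sum to 1.
def expectation(values, probs, e0):
    return e0 + sum(p * (v - e0) for v, p in zip(values, probs))

vals = [1.0, 3.0, 8.0]
probs = [0.2, 0.5, 0.3]

# Any choice of e0 gives the same expectation (here 4.1).
assert abs(expectation(vals, probs, 0.0) - expectation(vals, probs, 100.0)) < 1e-9
```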

Suppose $X,Y$ are Polish spaces. Suppose $(Y,\sigma(Y),\mu),(X,\sigma(X),\nu)$ are measure spaces, where $\mu$ and $\nu$ are σ-finite Borel measures. Suppose $\pi:Y\to X$ is a measurable map so that

$\forall A\in\sigma(X):\nu(A)=0\Rightarrow\mu\!\left(\pi^{-1}\!\left(A\right)\right)=0.$

Then, for each $x\in X$, there exists a Borel measure $\mu_x$ on the measurable subspace $\left(\pi^{-1}(x),\sigma\!\left(\pi^{-1}(x)\right)\right)$, such that for any integrable function $f$ on $Y$,

$\int_{y\in Y}f\!\left(y\right)\mathrm d\mu(y) =\int_{x\in X}\mathrm d\nu(x)\int_{y\in\pi^{-1}(x)}f\!\left(y\right)\mathrm d\mu_x(y).$
Proof

Proof. Because $\mu$ is σ-finite, we have a countable covering of $Y$ by pairwise disjoint measurable sets of finite $\mu$-measure, denoted as $\left\{Y_i\right\}$. Each $Y_i$ inherits the σ-algebra from $Y$, and $\left(Y_i,\sigma\!\left(Y_i\right),\mu\right)$ is a measure space.

Define $\pi_i:Y_i\to X$ as the restriction of $\pi$ to $Y_i$, then $\pi_i$ is automatically a measurable map from $Y_i$ to $X$, and for any $x\in X$,

$\pi^{-1}(x)=\bigcup_i\pi_i^{-1}(x),$

and the sets in this union are pairwise disjoint.

Let $\nu_i$ be a measure on $X$ defined as

$\nu_i(A):=\mu\!\left(\pi_i^{-1}\!\left(A\right)\right).$

This is a measure because $\pi_i$ is a measurable map. According to the disintegration theorem, there exists a family of Borel measures $\left\{\mu_{i,x}\right\}_{x\in X}$ on $Y_i$ such that for $\nu_i$-almost all $x\in X$, $\mu_{i,x}$ is concentrated on $\pi_i^{-1}(x)$ (in other words, $\mu_{i,x}\!\left(Y_i\setminus\pi_i^{-1}(x)\right)=0$); and for any integrable function $f$ on $Y_i$,

$\int_{y\in Y_i}f\!\left(y\right)\mathrm d\mu(y) =\int_{x\in X}\mathrm d\nu_i(x)\int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_{i,x}(y).$

From the condition in the original proposition, we can easily prove that $\nu_i$ is absolutely continuous w.r.t. $\nu$. Therefore, we have their Radon–Nikodym derivative

$\varphi_i(x):=\frac{\mathrm d\nu_i(x)}{\mathrm d\nu(x)}.$

For each $x\in X$, define the measure $\mu_x$ on $\pi^{-1}(x)$ as

$\mu_x(A):=\sum_i\varphi_i\!\left(x\right)\mu_{i,x}\!\left(A\cap Y_i\right).$

This is a well-defined measure because the sets $A\cap Y_i$ are pairwise disjoint, and $\mu_{i,x}$ is a well-defined measure on $Y_i$.

Then, for any integrable function $f$ on $Y$,

\begin{align*} \int_{y\in Y}f\!\left(y\right)\mathrm d\mu(y) &=\sum_i\int_{y\in Y_i}f\!\left(y\right)\mathrm d\mu(y)\\ &=\sum_i\int_{x\in X}\mathrm d\nu_i(x)\int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_{i,x}(y)\\ &=\sum_i\int_{x\in X}\varphi_i\!\left(x\right)\mathrm d\nu(x) \int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_{i,x}(y)\\ &=\int_{x\in X}\mathrm d\nu(x)\sum_i\int_{y\in\pi_i^{-1}(x)}f\!\left(y\right)\mathrm d\mu_x(y)\\ &=\int_{x\in X}\mathrm d\nu(x)\int_{y\in\pi^{-1}(x)}f\!\left(y\right)\mathrm d\mu_x(y).&\square \end{align*}

Here, the family of measures $\left\{\mu_x\right\}$ is called the disintegration of $\mu$ w.r.t. $\pi$ and $\nu$.

For two vector spaces $\vec W_1,\vec W_2$, we denote $\vec W_1\times\vec W_2$ as the direct sum of them. Also, rather than calling the new vector space their direct sum, I prefer to call it the product vector space of them (not to be confused with the tensor product) so that it is consistent with the notion of product affine spaces, product measure spaces, product topology, etc. Those product spaces are all notated by “$\times$” in this article.

Also, “$\vec W_1$” can be an abbreviation of $\vec W_1\times\left\{0_2\right\}$, where $0_2$ is the zero vector in $\vec W_2$.

Suppose $W$ is an affine space associated with the vector space $\vec W$. For any $A\subseteq W$ and $B\subseteq\vec W$, we denote $A+B$ as the Minkowski sum of $A$ and $B$, i.e.,

$A+B:=\left\{a+b\,\middle|\,a\in A,\,b\in B\right\}.$

This extends the definition of usual Minkowski sums for affine spaces.

By the way, because of the abbreviating “$\vec W_1$” meaning $\vec W_1\times\left\{0_2\right\}$ above, we can abuse the notation and write

$\vec W_1+\vec W_2=\vec W_1\times\vec W_2,$

where “$+$” denotes the Minkowski sum. This is true for any two vector spaces $\vec W_1,\vec W_2$ that do not share a non-trivial vector subspace.

In general, it is not necessarily possible to decompose a topology as a product of two topologies. However, for a locally convex Hausdorff TVS, we can always decompose the topology as the product of the topologies on a pair of complementary vector subspaces, one of which is finite-dimensional. This works because every finite-dimensional subspace of such a space is topologically complemented. The complete statement is the following:

Let $\vec W$ be a locally convex Hausdorff TVS. For any finite-dimensional subspace $\vec W^\parallel$ of $\vec W$, there is a complement $\vec W^\perp$ of it such that the topology $\tau\!\left(\vec W\right)$ is the product topology of $\tau\!\left(\vec W^\parallel\right)$ and $\tau\!\left(\vec W^\perp\right)$.

This decomposition is also valid for affine spaces. If an affine space $W$ is associated with a locally convex Hausdorff TVS $\vec W$, then for any finite-dimensional vector subspace $\vec W^\parallel$ of $\vec W$, we can topologically decompose $W$ into $W^\perp+\vec W^\parallel$.

Because the product topology of subspace topologies is the same as the subspace topology of the product topology, we can also decompose $E^\perp+\vec W^\parallel$ as the product topological space of $E^\perp$ and $\vec W^\parallel$ if $E^\perp\subseteq W^\perp$.

Such decompositions are useful because they allow us to disintegrate Borel measures. If we already have a σ-finite Borel measure on $E^\perp+\vec W^\parallel$ and we can define a σ-finite Borel measure on $\vec W^\parallel$, then we can define a measure on $E^\perp$ by disintegration, and the disintegration is guaranteed to be σ-finite and Borel.

When I want to use multi-index notations, I will use “$\bullet$” to denote the indices. For example,

$\Sigma\alpha_\bullet:=\sum_\bullet\alpha_\bullet.$ $\alpha_\bullet\beta_\bullet:=\sum_\bullet\alpha_\bullet\beta_\bullet.$ $\alpha_\bullet^{\beta_\bullet}:=\prod_\bullet\alpha_\bullet^{\beta_\bullet}.$ $\alpha_\bullet!:=\prod_\bullet\alpha_\bullet!.$
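As a quick illustration of these conventions (a sketch of my own, not from the post), take $\alpha=(2,3)$ and $\beta=(1,2)$:

```python
import math

# Illustration of the multi-index conventions with alpha = (2, 3), beta = (1, 2).
alpha = [2, 3]
beta = [1, 2]

sigma_alpha = sum(alpha)                                # Σα• = 2 + 3 = 5
dot = sum(a * b for a, b in zip(alpha, beta))           # α•β• = 2·1 + 3·2 = 8
power = math.prod(a ** b for a, b in zip(alpha, beta))  # α•^β• = 2¹·3² = 18
fact = math.prod(math.factorial(a) for a in alpha)      # α•! = 2!·3! = 12
```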

## Extensive quantities and macrostates

First, I need to point out that the most central state function of a thermal system is not its energy, but its entropy. The energy is regarded as the central state function in thermodynamics, which can be seen from the fundamental equation of thermodynamics

$\mathrm dU=-p\,\mathrm dV+T\,\mathrm dS+\mu\,\mathrm dN.$

We also always perform Legendre transformations on the potential function $U$ to get other potential functions instead of doing the transformation on other extensive quantities. All such practices make us think that $S$ is just some quantity similar to $V$ and $N$, and that mathematically we can just regard it as an extensive quantity whose change is a way of doing work.

However, this is not the case. The entropy $S$ is different from $U,V,N$ in the following sense:

• The entropy is a derived quantity due to a mathematical construction from the second law of thermodynamics, while $U,V,N$ are observable quantities that have solid physical meanings before we introduce anything about thermodynamics.
• The entropy may change in an isolated system, while $U,V,N$ do not.
• We may have an intuitive understanding of how different systems in contact may exchange $U,V,N$ with each other, but $S$ cannot be “exchanged” in such a sense.
• In statistical mechanics, $U,V,N$ restrict what microstates are possible for a thermal system, but $S$ serves as a totally different role: it represents something about the probability distribution over all the possible microstates.

Therefore, I would rather rewrite the fundamental equation of thermodynamics as

$\begin{equation} \label{eq: fundamental} \mathrm dS=\frac1T\,\mathrm dU+\frac pT\,\mathrm dV-\frac\mu T\,\mathrm dN. \end{equation}$

Equation \ref{eq: fundamental} embodies more clearly how different quantities serve different roles, but its own physical meaning becomes vague. Does it mean different ways of changing the entropy in quasi-static processes? Both mathematically and physically, yes, but that is not a useful interpretation. Because what we are doing is a mathematical formulation of physical theories, we do not need to try to assign physical meanings to everything we construct. This new equation is purely mathematical, and the only way we use it is to relate intensive variables to derivatives of $S$ w.r.t. extensive quantities.

From now on, I will call quantities like $U,V,N$ the extensive quantities, not including $S$. However, this is not a good statement as part of our mathematical formulation. Considering that there is a good notion of how different systems may exchange values of extensive quantities and that we can scale a system by multiplying the extensive quantities by a factor, we require that the extensive quantities must support at least linear operations… do we?

Well, actually we will see that if we require the space to be a vector space, things would be a little complicated, because sometimes we need to construct a new space of extensive quantities out of an affine subspace of an existing one, which is not a vector space by nature. If we require the space to be a vector space, we need to translate that affine subspace to make it pass through the zero element of the vector space, which is possible but gives no insight into the physics and only adds complication to our construction. Therefore, I will not require the space of extensive quantities to be a vector space, but an affine space.

You may ask, OK then, but how do we “add” or “scale” extensive quantities if they live on an affine space? First, regarding the addition operation, we will use an abstraction for such operations so that the actual implementation of how we combine the summands is hidden under the abstraction. We will see that this abstraction is useful because it also applies to other scenarios or useful operations that do not necessarily involve any meaningful addition. Regarding the scaling operation, I would argue that we do not need it for now. I have generalized the notion of extensive quantities so that it now includes some quantities that are not really extensive in any traditional sense. They are no longer meant to be scaled because they simply cannot be. Actually, rather than calling them extensive quantities, I would like to call them a macrostate, the only difference from the general notion of a macrostate being that it has an affine structure so that I can take the ensemble average of it to get its macroscopic value. I will stick to the term “extensive quantities” because they are actual extensive quantities in all my examples and because this name is a good way to understand the physical meaning, but you need to keep in mind that what I actually refer to is a macrostate.

There is another difficulty. If we look closely, Equation \ref{eq: fundamental} actually does not make much sense in that $N$ is quantized (and so is $U$ if we are doing quantum mechanics). If we work over the real numbers, we can always translate a quantized quantity to a value that is not allowed, which means that we cannot have the full set of operations on the allowed values of the extensive quantities. Therefore, we need to specify a subset of the affine space to represent the allowed values of the extensive quantities.

We also see that Equation \ref{eq: fundamental} is a relation between differentials. Do we need to require a differential structure on the space of extensive quantities? Not yet, because it is actually somewhat difficult. The same difficulty about the quantized quantities applies. The clever way is to just avoid using the differentials. (Mathematicians are always skeptical about differentiating something while physicists just assume everything is differentiable…) It may seem surprising, but differentials are actually avoidable in our mathematical formulation if you do not require intensive variables to be well-defined inside the system itself (actually, they are indeed not well-defined except when you have a system in thermal equilibrium and take the thermodynamic limit).

If we have to use differentials, we can use the Gateaux derivative. It is general enough to be defined on any locally convex TVS, and it is intuitive when it is linear and continuous.

Although differential structure is not necessary, there is an inevitable structure on the space of extensive quantities. Remember that in canonical and grand canonical ensembles, we allow $U$ or $N$ to fluctuate, so we should be able to describe such fluctuations on our space of extensive quantities. To do this, I think it is safe to assume that we can have some topology on the allowed subset to make it a Polish space, just like how probabilists often assume about the probability space they are working on.

A final point. Here is a difference in how physicists and mathematicians describe probability distributions: physicists would use a probability density function while mathematicians would use a probability measure. Mathematically, to have a probability density function, we need an underlying measure on our space to provide a notion of “volume”, and then we can define the probability density function as the Radon–Nikodym derivative of the probability measure w.r.t. the underlying volume measure. Also, for the Radon–Nikodym derivative to exist, the probability measure must be absolutely continuous w.r.t. the volume measure, which means that we have to sacrifice all the probability distributions that are not absolutely continuous if we take the probability density function approach. It may then seem that the probability density function approach introduces an excess measure structure on the space of extensive quantities and loses some generality, but it will turn out that the extra structure is useful. Therefore, I will use the probability density function approach.

Here is our final definition of the space of extensive quantities:

Definition. A space of extensive quantities is a tuple $(W,E,\lambda)$, where

• $W$ is an affine space associated with a reflexive vector space $\vec W$ over $\mathbb R$, and it is equipped with topology $\tau(W)$ that is naturally constructed from the topology $\tau\!\left(\vec W\right)$ on $\vec W$;
• $E\subseteq W$ is a topological subspace of $W$, and its topology $\tau(E)$ makes $E$ a Polish space; and
• $\lambda:\sigma(E)\to[0,+\infty]$ is a non-trivial σ-finite Borel measure, where $\sigma(E)\supseteq\mathfrak B(E)$ is a σ-algebra on $E$ that contains the Borel σ-algebra on $E$.

Here, I also added a requirement of σ-finiteness. This is necessary when constructing product measures. At first I also wanted to require that $\lambda$ have some translational invariance, but I then realized that it is not necessary, so I removed it from the definition (but we will see that we need it as a property of baths).

Example. Here is an example of a space of extensive quantities.

\begin{align*} W&:=\mathbb R^3,\\ E&:=(0,+\infty)\times(0,+\infty)\times\mathbb Z^+,\\ \lambda(A)&:=\sum_{N\in\mathbb Z^+}\operatorname{area}(A\cap(0,+\infty)\times(0,+\infty)\times\{N\}). \end{align*}

Physically we may think of this as the extensive quantities of the system of ideal gas. The three dimensions of $W$ are energy, volume, and number of particles.

Example. Here is another example of a space of extensive quantities.

\begin{align*} W&:=\mathbb R^2,\\ E&:=\{(3N/2+n,N)\,|\,N\in\mathbb Z^+,n\in\mathbb N\},\\ \lambda(A)&:=\operatorname{card}A. \end{align*}

Physically we may think of this as the extensive quantities of the system of Einstein solid with $\hbar\omega=1$. The two dimensions of $W$ are energy and number of particles.
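The accessible set of this example is easy to check computationally. Here is a minimal sketch (the helper names `in_E` and `lam` are mine), assuming $\hbar\omega=1$ as above:

```python
# Sketch of the accessible set E = {(3N/2 + n, N) | N in Z+, n in N}
# of the Einstein-solid example, with lambda the counting measure.
def in_E(U, N):
    """Check whether (U, N) is an accessible value of extensive quantities."""
    if N < 1 or int(N) != N:
        return False
    n = U - 1.5 * N  # number of energy quanta above the ground level
    return n >= 0 and abs(n - round(n)) < 1e-9

def lam(A):
    """lambda(A) = card A, the counting measure on E."""
    assert all(in_E(U, N) for (U, N) in A)
    return len(A)

# With hbar*omega = 1, one particle has ground-state energy 3/2.
assert in_E(1.5, 1) and in_E(2.5, 1) and not in_E(2.0, 1)
assert lam({(1.5, 1), (2.5, 1), (3.0, 2)}) == 3
```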

## Thermal systems and the number of microstates

Remember I said above that, in statistical mechanics, $U,V,N$ restrict what microstates are possible for a thermal system. We can translate this as such: for each possible value of the extensive quantities, denoted as $e\in E$, there is a set of possible microstates, denoted as $M_e$ (you can then see why we excluded the entropy from the extensive quantities: otherwise we could not do such a classification of microstates).

Now the problem is what structures should we add to $M_e$ for each $e\in E$. Recall that in statistical mechanics, we study probability distribution over all possible microstates. Therefore, we need to be able to have a probability measure on $M_e$. In other words, $M_e$ should be a measurable space. As said before, we can either use a probability measure directly, or use a volume measure together with a probability density function. This time, we seem to have no choice but the probability density function approach because there is a natural notion of volume on $M_e$: the number of microstates.

Wait! There is a problem. Recall that in the microcanonical ensemble, we allow the energy to fluctuate. The number of microstates at exactly a certain energy is actually zero in most cases, so we actually consider those microstates within some small range of energy. In other words, we are considering the microstate density: the number of microstates inside a unit range of energy. Similarly, we should define a measure on $M_e$ to represent the microstate density, which is the number of microstates inside a unit volume of extensive quantities, where the “volume” is measured by the measure $\lambda$ in the space of extensive quantities.

This makes our formulation a little bit different from the microcanonical ensemble: our formulation would allow all extensive quantities to fluctuate while the microcanonical ensemble would only allow the energy to fluctuate. This is inevitable because we are treating extensive quantities like energy, volume, and number of particles as the same kind of quantity. It is not preferable to separate a subspace out from our affine space $W$ to say “these are the quantities that may fluctuate, and those are not.” Therefore, we need to justify why we may allow all extensive quantities to fluctuate. The justification is: mathematically, we are actually not allowing any extensive quantities to fluctuate. There is no actual fluctuation, and we are directly considering the microstate density without involving any change in the extensive quantities. In other words, using the language of microcanonical ensemble, we are considering the area of the surface of the energy shell instead of the volume of the energy shell with a small thickness.

Another important point is that we must make sure that specifying all the extensive quantities is enough to restrict the system to a finite number of microstates. In other words, the total microstate density should be finite for any possible $e\in E$. Also, there should be at least some possible microstates in $M_e$, so the total microstate density should not be zero.

We may then sum up the above discussion to give $M_e$ enough structure to make it the set of microstates of a thermal system with the given extensive quantities $e$. Then, the disjoint union of all of them (the family of measure spaces) is the thermal system.

Definition. A thermal system is a pair $\left(\mathcal E,\mathcal M\right)$, where

• $\mathcal E:=\left(W,E,\lambda\right)$ is a space of extensive quantities;
• $\mathcal M:=\bigsqcup_{e\in E}M_e$ is a family of measure spaces; and
• For each $e\in E$, $M_e$ is a measure space equipped with a measure $\mu_e$ such that $\mu_e\!\left(M_e\right)$ is finite and nonzero.

From now on, I will use a pair $(e,m)\in\mathcal M$ to specify a single microstate, where $e\in E$ and $m\in M_e$.

Example. For the thermal system of a solid consisting of spin-$\frac12$ particles, where each particle has two possible states with energy $0$ and $1$, we can construct

\begin{align*} W&:=\mathbb R^2,\\ E&:=\left\{\left(U,N\right)\in\mathbb N\times\mathbb Z^+\,\middle|\,U\le N\right\},\\ \lambda(A)&:=\operatorname{card}A,\\ M_{U,N}&:=\left\{n\in\left\{0,1\right\}^N\,\middle|\,\sum_in_i=U\right\},\\ \mu_{U,N}(A)&:=\operatorname{card}A. \end{align*}

This should be the simplest example of a thermal system.
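For this system everything is finite, so the construction can be checked by brute force. A minimal sketch of my own (the function name is hypothetical): enumerating $\left\{0,1\right\}^N$ shows that $\mu_{U,N}\!\left(M_{U,N}\right)$ is the binomial coefficient $\binom NU$.

```python
from itertools import product
from math import comb

def microstate_count(U, N):
    """mu_{U,N}(M_{U,N}): number of N-spin configurations with total energy U."""
    return sum(1 for n in product((0, 1), repeat=N) if sum(n) == U)

# The count equals the binomial coefficient C(N, U) ...
assert microstate_count(2, 4) == comb(4, 2) == 6
# ... and summing over all accessible U at fixed N recovers 2^N.
assert sum(microstate_count(U, 3) for U in range(4)) == 2 ** 3
```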

Example. We may complete the example of the system of ideal gas. Suppose we are considering the system of ideal atomic gas inside a cubic box. The construction of the space of extensive quantities is the same as before. Denote possible values of extensive quantities in coordinates $e=(U,V,N)$. Now the measure spaces $M_e$ may be constructed as such:

\begin{align*} M_{U,V,N}&:=\left\{\left(\ldots\right)\in \left(\left[0,\sqrt[3]V\right]^3\times\mathbb R^3\right)^N \,\middle|\,\text{lexicographic order, }\sum_i\frac{\left|\mathbf p_i\right|^2}{2m}=U\right\},\\ \mu_{U,V,N}(A)&:=\frac{H^{6N-1}(A)}{h^{3N}}. \end{align*}

The “lexicographic order” here means that only those configurations where the particle indices coincide with the lexicographic order are included in $M_e$. This is because the particles are indistinguishable, and the order of particles is irrelevant. The lexicographic-order restriction is the same as taking the quotient of the $N$-fold Cartesian product by permutation actions, but then defining $\mu_e$ would be difficult. Alternatively, we may still keep them ordered but divide the result by $N!$ in the definition of $\mu_e$; however, this way is less clear in its physical meaning.

Here $H^d$ is the $d$-dimensional Hausdorff measure. Intuitively, the expression $H^{6N-1}(A)$ is just the $(6N-1)$-dimensional “volume” of $A$.

Since we have the microstate density, why not also have the true number of microstates? We can define a measure on $\mathcal M$ to represent the number of microstates.

Definition. The measure of number of microstates is a measure $\mu:\sigma(\mathcal M)\to\left[0,+\infty\right]$, where

$\sigma(\mathcal M):=\left\{\bigsqcup_{e\in A}B_e\,\middle|\,A\in\sigma(E),\,B_e\in\sigma(M_e)\right\},$

and the measure is defined by

$\mu(A):=\iint\limits_{(e,m)\in A}\mathrm d\mu_e(m)\,\mathrm d\lambda(e).$

The uniqueness of $\mu$ is guaranteed by the σ-finiteness of $\lambda$ and $\mu_e$. The expression $\mu(A)$ is called the number of microstates in $A$.
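When $\lambda$ and every $\mu_e$ are counting measures, the double integral reduces to a double sum. A minimal sketch using the spin-$\frac12$ solid above (the function name `mu` is mine), for sets $A$ that are unions of whole fibers $\left\{e\right\}\times M_e$:

```python
from math import comb

def mu(A):
    """Number of microstates mu(A) for the spin-1/2 solid, where A is a
    set of accessible extensive-quantity values (U, N). Both lambda and
    mu_e are counting measures, so the double integral reduces to
    mu(A) = sum over (U, N) in A of card(M_{U,N}) = sum of C(N, U)."""
    return sum(comb(N, U) for (U, N) in A)

# Taking all accessible values with N = 3 recovers the total 2^3 = 8.
assert mu({(U, 3) for U in range(4)}) == 8
```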

## States and the entropy

Here is a central idea in statistical ensembles: a state is a probability distribution on the microstates of a thermal system. It is among the ideas upon which the whole theory of statistical ensembles is built. I will take this idea, too.

As said before, I have taken the probability density approach of defining a probability distribution. Therefore, a state is just a probability density function.

Definition. A state of a thermal system $(\mathcal E,\mathcal M)$ is a function $p:\mathcal M\to\left[0,+\infty\right]$ such that $(\mathcal M,\sigma(\mathcal M),P)$ is a probability space, where $P:\sigma(\mathcal M)\to\left[0,1\right]$ is defined by

$\begin{equation} \label{eq: probability measure} P(A):=\int_Ap\,\mathrm d\mu. \end{equation}$

Two states are the same if they are equal $\mu$-almost everywhere.

A probability space is just a measure space with a normalized measure, and here the physical meaning of $p$ is the probability density on $\mathcal M$, and $P(A)$ is the probability of finding a microstate in $A$.

Note that a state is not necessarily an equilibrium state (thermal state). We will introduce the concept of equilibrium states later.

Now we may introduce the concept of entropy.

I need to clarify that the entropy that we are talking about here is just the entropy in statistical mechanics. The reason I add this clarification is that we may also formally define an entropy in the language of measure theory, which is defined for any probability space and does not depend on any so-called probability density function or a “volume” measure (which is the number of microstates in our case). The definition of this entropy is (if anyone is interested)

$S^{\mathrm{info}}:=\sup_\Pi\sum_{A\in\Pi}-P(A)\ln P(A),$

where $P$ is the probability measure on the probability space, and the supremum is taken over all $P$-almost partitions $\Pi$ of the probability space ($\Pi$ is a subset of the σ-algebra such that $P(\bigcup_{A\in\Pi}A)=1$ and $P(A\cap B)=0$ for distinct $A,B\in\Pi$). This definition looks intuitive and nice, and not surprisingly it is… not consistent with the entropy in statistical mechanics. The discrepancy happens when we are doing classical statistical mechanics because the entropy defined above diverges to infinity for “continuous” probability distributions. A quick check is that the entropy of the uniform distribution over $[0,1]$ is $+\infty$.
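The quick check can be made explicit: for the uniform distribution on $[0,1]$, the partition into $n$ equal intervals has entropy $\ln n$, which grows without bound as the partition refines. A minimal sketch (function name mine):

```python
import math

def partition_entropy(n):
    """Entropy -Σ P(A) ln P(A) of the partition of [0, 1] into n equal
    intervals under the uniform distribution: each has probability 1/n."""
    p = 1.0 / n
    return -sum(p * math.log(p) for _ in range(n))

# The partition entropy is exactly ln n, so the supremum over all
# partitions is +infinity.
assert abs(partition_entropy(1000) - math.log(1000)) < 1e-9
```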

Definition. The entropy of a state $p$ is defined by

$S[p]:=\int_\mathcal M-p\ln p\,\mathrm d\mu.$

Different from extensive quantities, the entropy is a functional of $p$. The entropy here is consistent with the entropy in thermodynamics or statistical mechanics.

This definition of entropy is called the Gibbs entropy formula. It agrees with the entropy defined in thermodynamics, but we are unable to show that at this stage because we have not defined temperature or heat yet.

Note that the base of the logarithm is not important, and it is just a matter of unit system. In SI units, the base would be $\exp k_\mathrm B^{-1}$, where $k_\mathrm B$ is the Boltzmann constant.
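In the simplest case where $\mu$ is a counting measure on finitely many microstates, the integral becomes a finite sum, and the uniform state recovers Boltzmann’s $S=\ln\Omega$. A minimal sketch (function name mine):

```python
import math

def gibbs_entropy(p):
    """S[p] = Σ -p ln p over microstates (counting-measure case);
    terms with p = 0 contribute 0 by convention."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)

Omega = 6
# The uniform state p = 1/Omega gives S = ln Omega.
assert abs(gibbs_entropy([1.0 / Omega] * Omega) - math.log(Omega)) < 1e-12

# A non-uniform state over the same microstates has strictly smaller entropy.
assert gibbs_entropy([0.5, 0.3, 0.1, 0.05, 0.03, 0.02]) < math.log(Omega)
```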

Physically, the extensive quantities may be measured macroscopically. The actual values that we get when we measure them are postulated to be the ensemble average. Therefore, for a given state $p$, we can define the measured values of extensive quantities by taking the $P$-expectation of the extensive quantities.

Definition. For a thermal system $(\mathcal E,\mathcal M)$ and a state $p$ of it, the measured value of extensive quantities of the state $p$ is the $P$-expectation of the $E$-valued random variable $(e,m)\mapsto e$. Explicitly, the definition is

$\varepsilon[p]:=\mathrm E_P\!\left[\left(e,m\right)\mapsto e\right],$

where the probability measure $P$ on $\mathcal M$ is defined in Equation \ref{eq: probability measure}.

The definition involves taking the $P$-expectation of a $W$-valued function, i.e., a Pettis integral, which I claim to exist. It exists because the map $(e,m)\mapsto e-e_0$ must be weakly $P$-measurable, and such a function must be Pettis-integrable on a reflexive space.

Note that $\varepsilon[p]\in W$, and it is not necessarily in $E$.
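In a discrete setting the $P$-expectation is just a $P$-weighted average, which makes it easy to see why $\varepsilon[p]$ can land outside $E$. A minimal sketch (names mine), with extensive quantities $(U,N)$:

```python
# measured_value sketches eps[p] = E_P[(e, m) -> e] in the discrete case:
# `states` is a list of (e, P) pairs, where P is the probability of the
# whole fiber {e} x M_e.
def measured_value(states):
    dim = len(states[0][0])
    return tuple(sum(P * e[k] for e, P in states) for k in range(dim))

# Two accessible values of (U, N) with equal probability: the measured
# U is 1.5, which lies in W but not in E if U is quantized to integers.
assert measured_value([((1, 3), 0.5), ((2, 3), 0.5)]) == (1.5, 3.0)
```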

The use of the measured value of extensive quantities is to get the fundamental equation of a thermal system, which describes the relationship between the extensive quantities and the entropy at any equilibrium state. Suppose that we postulate a family of states $p_t^\circ$ of the thermal system (or its slices, which will be introduced below), labeled by different $t$’s, and call them the possible equilibrium states. Then, we have the following two equations:

$\begin{equation} \label{eq: fundamental equation before} \begin{cases} S^\circ=S\!\left[p_t^\circ\right],\\ \varepsilon^\circ=\varepsilon\!\left[p_t^\circ\right]. \end{cases} \end{equation}$

By eliminating $t$ from the two equations (which may be impossible but is assumed to be possible), we can get the fundamental equation in this form:

$\begin{equation} \label{eq: fundamental equation} S^\circ=S^\circ\!\left(\varepsilon^\circ\right). \end{equation}$

Then we get a function $S^\circ:E^\circ\to\mathbb R$, where $E^\circ$ is the subset of $W$ consisting of all possible measured values of extensive quantities among equilibrium states. If we can define some differential structure on $E^\circ$ so that we can take the differential of $S^\circ$ and write something sensible like

$\mathrm dS^\circ=i^\circ\!\left(\varepsilon^\circ\right)(\mathrm d\varepsilon^\circ),$

where $i^\circ\!\left(\varepsilon^\circ\right)\in\vec W'$ is a continuous linear functional, then we can define $i^\circ\!\left(\varepsilon^\circ\right)$ to be the intensive quantities at $\varepsilon^\circ$. A proper comparison with differential geometry is that we may analogously call $i^\circ$ a covector field on $E^\circ$, defined as the differential of the scalar field $S^\circ$.

However, as I have said before, I did not postulate there to be any differential structure on $E^\circ$, so the intensive quantities should not be generally defined in this way.

## Slicing

A good notion about thermal systems is that we can get new thermal systems from existing ones (although they are physically essentially the same system, they have different mathematical structure and contain different amount of information about them). There are two ways of constructing new thermal systems from existing ones:

• By fixing some extensive quantities. I call this way slicing.
• By allowing some extensive quantities to change freely. I call this way contracting.

I chose the words “slicing” and “contracting”. They are not present in actual physics textbooks, but I found the notion of them necessary.

Slicing fixes extensive quantities. We do it by picking out a subset of $E$ and making it our new set of accessible values of extensive quantities. I find one special way of picking out such a subset especially useful: picking it from an affine subspace of $W$. In this way, we can use a smaller affine space as the underlying space of our new thermal system. Then we see why I chose the word “slicing”: we slice the original affine space into parallel pieces, pick one piece as our new affine space, and pick the corresponding accessible values of extensive quantities and possible microstates within that piece to form our new thermal system.

Definition. A slicing of a space of extensive quantities $\left(W,E,\lambda\right)$ is a pair $\left(W^\parallel,\lambda^\parallel\right)$, where

• $W^\parallel\subseteq W$ is an affine subspace of $W$;
• $E^\parallel:=E\cap W^\parallel$ is non-empty, and it is Polish as a topological subspace of $E$; and
• $\lambda^\parallel:\sigma\!\left(E^\parallel\right)\to\left[0,+\infty\right]$ is a non-trivial σ-finite Borel measure on $E^\parallel$, where $\sigma\!\left(E^\parallel\right)\supseteq\mathfrak B\!\left(E^\parallel\right)$ is a σ-algebra on $E^\parallel$ that contains the Borel σ-algebra on $E^\parallel$.

This constructs a new space of extensive quantities $\left(W^\parallel,E^\parallel,\lambda^\parallel\right)$, called a slice of the original space of extensive quantities $\left(W,E,\lambda\right)$.

Definition. A slice of a thermal system $\left(\mathcal E,\mathcal M\right)$ defined by the slicing $\left(W^\parallel,\lambda^\parallel\right)$ of $\mathcal E$ is a new thermal system $\left(\mathcal E^\parallel,\mathcal M^\parallel\right)$ constructed as such:

• $\mathcal E^\parallel:=\left(W^\parallel,E^\parallel,\lambda^\parallel\right)$ is the slice of $\mathcal E$ corresponding to the given slicing; and
• $\mathcal M^\parallel:=\bigsqcup_{e\in E^\parallel}M_e$.

The idea behind slicing is to make some extensive quantities extrinsic parameters rather than part of the system itself. Physically, it means fixing some extensive quantities. However, here is a problem: if we fix some extensive quantities, the dimension (“dimension” as in “dimensional analysis”) of the volume element in the space of extensive quantities is changed. In other words, the dimension of $\lambda$ does not agree with that of $\lambda^\parallel$. This seems physically undesirable because we want the number of microstates to be dimensionless so that its logarithm does not depend on the units we use. However, this is actually not a problem, for the following reason: in any physical construction of a thermal system, it is fine to have a non-dimensionless number of microstates, at the cost that the model is not valid at low temperature; in a mathematical construction, dimensions are never an issue, so we do not need to worry at all. At low temperature, we must use quantum statistical mechanics, where all quantities are quantized, so that the number of microstates is literally a count of microstates, which must be dimensionless. At high temperature, we do not need the third law of thermodynamics, which is the only law that restricts how we should choose the zero (ground level) of the entropy, and in this case we may freely change our units because doing so only shifts the entropy by an additive constant.

Example. In the example of a system of ideal gas, we may slice the space of extensive quantities to the slice $V=1$ to fix the volume.

## Isolations and the microcanonical ensemble

Here is a special type of slicing. Because a single point is a (zero-dimensional) affine subspace, it may define a slicing. Such a slicing fixes all of the extensive quantities. We may call it an isolating.

A thermal system with a zero-dimensional space of extensive quantities is called an isolated system. The physical meaning of such a system is that it is isolated from the outside so that it cannot exchange any extensive quantities with the outside. We may construct an isolated system out of an existing thermal system by the process of isolating.

Definition. An isolating (at $e^\circ$) of a space of extensive quantities $\left(W,E,\lambda\right)$ is a slicing $\left(W^\parallel,\lambda^\parallel\right)$ of it, constructed as

\begin{align*} W^\parallel&:=\left\{e^\circ\right\},\\ \lambda^\parallel(A)&:=\begin{cases}1,&A=\left\{e^\circ\right\},\\0,&A=\varnothing,\end{cases} \end{align*}

where $e^\circ\in E$.

Definition. An isolated system is a thermal system whose underlying affine space of its space of extensive quantities is a single-element set.

Definition. An isolation (at $e^\circ$) of a thermal system $\left(\mathcal E,\mathcal M\right)$ is the slice of it corresponding to the isolating at $e^\circ$ of $\mathcal E$.

Here is an obvious property of isolated systems: the measured value of extensive quantities of any state of an isolated system is $e^\circ$, the only possible value of the extensive quantities.

After introducing isolated systems, we can now introduce the equal a priori probability postulate. Although we may alternatively use other sets of axioms to develop the theory of statistical ensembles, using the equal a priori probability postulate is a simple and traditional way to do it. Most importantly, it does not require us to define concepts like temperature beforehand, which is good for a mathematical formulation because it requires fewer mathematical structures or objects that are hard to define well at this stage.

Axiom (the equal a priori probability postulate). The equilibrium state of an isolated system is the uniform distribution.

Actually, instead of calling this an axiom, we might say that formally it is a definition of equilibrium states. However, I still prefer to call it an axiom because it only defines the equilibrium state of isolated systems rather than that of arbitrary thermal systems.

The equilibrium state of an isolated system $\left(\mathcal E,\mathcal M\right)$ may be written mathematically as

$p^\circ\!\left(\cdot\right):=\frac1{\mu\!\left(\mathcal M\right)}.$

(The circle in the superscript denotes equilibrium state.) After writing this out, we have successfully derived the microcanonical ensemble. We can then calculate the entropy of the state, which is

$\begin{equation} \label{eq: microcanonical entropy} S^\circ:=S\!\left[p^\circ\right]=\ln\mu(\mathcal M). \end{equation}$

Speaking of the entropy, a notable feature of the equilibrium state of an isolated system is that it is the state of the system that has the maximum entropy; any state different from it has a strictly lower entropy.

Theorem. For an isolated system, for any state $p$ of it,

$S[p]\le S^\circ,$

where $S^\circ$ is the entropy of the equilibrium state of it. The equality holds iff $p$ is the same state as the equilibrium state.

Proof

Proof. Define a probability measure $P^\circ$ on $\mathcal M$ by

$P^\circ(A):=\frac{\mu(A)}{\mu(\mathcal M)},$

then $\left(\mathcal M,\sigma\!\left(\mathcal M\right),P^\circ\right)$ is a probability space. Any state $p$, as a function on $\mathcal M$, can be regarded as a random variable in the probability space $\left(\mathcal M,\sigma\!\left(\mathcal M\right),P^\circ\right)$.

Define the real function

$\varphi(x):=\begin{cases} x\ln x,&x\in\left(0,+\infty\right),\\ 0,&x=0. \end{cases}$

It is a convex function, so according to the probabilistic form of Jensen’s inequality,

$\varphi\!\left(\mathrm E_{P^\circ}\!\left[p\right]\right) \le\mathrm E_{P^\circ}\!\left[\varphi\circ p\right].$

In other words,

$\frac1{\mu(\mathcal M)}\ln\frac1{\mu(\mathcal M)} \le\int_{m\in\mathcal M}p\!\left(m\right)\ln p\!\left(m\right) \,\frac{\mathrm d\mu\!\left(m\right)}{\mu(\mathcal M)}.$

Then, it follows immediately that $S[p]\le S^\circ$. The equality holds iff $\varphi$ is linear on a convex set $A\subseteq\left[0,+\infty\right)$ such that the value of the random variable $p$ is $P^\circ$-almost surely in $A$. However, because $\varphi$ is strictly convex and thus non-linear on any convex set with more than one point, the only possibility is that the value of $p$ is $P^\circ$-almost surely a constant, which means that the probability distribution defined by the probability density function $p$ is equal to the uniform distribution $\mu$-almost everywhere. Therefore, the equality holds iff $p$ is the same state as the equilibrium state. $\square$

This theorem is the well-known relation between the entropy and the equilibrium state.
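The maximum-entropy property can be checked numerically in a toy discrete setting. Here is a minimal sketch (not part of the formalism above, just an illustration with a counting measure), comparing the entropy of the uniform state with that of an arbitrary state:

```python
import math
import random

# Toy isolated system: Omega microstates with counting measure.
# By the equal a priori probability postulate, the equilibrium state is
# uniform, with entropy S° = ln μ(M) = ln Omega; the theorem says any
# other state has strictly lower entropy.
Omega = 8
uniform = [1.0 / Omega] * Omega

def entropy(p):
    # Gibbs entropy S[p] = -∫ p ln p dμ; with counting measure the
    # integral is a sum over microstates.
    return -sum(x * math.log(x) for x in p if x > 0)

S_eq = entropy(uniform)
assert abs(S_eq - math.log(Omega)) < 1e-12  # S° = ln Omega

# Any non-uniform state has strictly lower entropy:
random.seed(0)
w = [random.random() for _ in range(Omega)]
p = [x / sum(w) for x in w]
assert entropy(p) < S_eq
```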

By Equation \ref{eq: microcanonical entropy}, we can now derive the relationship between the entropy and the extensive quantities at equilibrium states by the process of isolating. Define a family of states $\left\{p^\circ_e\right\}_{e\in E}$, where each state $p^\circ_e$ is the equilibrium state of the system isolated at $e$. Then, we have the fundamental equation

$\begin{equation} \label{eq: mce fundamental eq} S^\circ(e)=\ln\Omega(e), \end{equation}$

where $\Omega(e):=\mu_e\!\left(M_e\right)$ is called the counting function (I invented the phrase), which is the microscopic characteristic function of microcanonical ensembles. This defines a function $S^\circ:E\to\mathbb R$, which may be used to give a fundamental equation in the form of Equation \ref{eq: fundamental equation}, and it is the macroscopic characteristic function of microcanonical ensembles.

We will encounter microscopic or macroscopic characteristic functions for other ensembles later.

Example. In the example of a system of a tank of ideal atomic gas, we have the fundamental equation

$S^\circ=\ln\!\left(\frac1{h^{3N}N!}V^NS_{3N-1}\!\left(\sqrt{2mU}\right)\right),$

where $S_n(r)$ is the surface area of an $n$-sphere with radius $r$, which is proportional to $r^n$. Taking its derivative w.r.t. $U,V,N$ and taking the thermodynamic limit will recover familiar results.
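As a numerical sanity check of this fundamental equation, we can differentiate $S^\circ$ w.r.t. $U$ by finite differences. The sketch below assumes units with $h=m=1$ and the convention $S_n(r)=2\pi^{(n+1)/2}r^n/\Gamma\!\left(\frac{n+1}2\right)$; at finite $N$ (no thermodynamic limit taken), we expect $\partial S^\circ/\partial U=(3N-1)/(2U)$, which tends to the familiar $3N/(2U)=1/T$:

```python
import math

def log_sphere_area(n, r):
    # ln S_n(r), with S_n(r) = 2 pi^((n+1)/2) r^n / Gamma((n+1)/2)
    return (math.log(2) + (n + 1) / 2 * math.log(math.pi)
            + n * math.log(r) - math.lgamma((n + 1) / 2))

def S(U, V, N):
    # S° = ln( V^N S_{3N-1}(sqrt(2 m U)) / (h^(3N) N!) ), in units h = m = 1
    return (N * math.log(V)
            + log_sphere_area(3 * N - 1, math.sqrt(2 * U))
            - math.lgamma(N + 1))

U, V, N = 5.0, 2.0, 10
eps = 1e-6
dSdU = (S(U + eps, V, N) - S(U - eps, V, N)) / (2 * eps)  # central difference
assert abs(dSdU - (3 * N - 1) / (2 * U)) < 1e-5  # 1/T = (3N-1)/(2U) at finite N
```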

## Contracting

I have previously mentioned that the other way of deriving a new system from an existing one is called contracting. We should introduce this concept now because it is very useful when we need to define the contact between subsystems of a composite system (whose definition will be given later).

The idea behind contracting is also to reduce the dimension of the space of extensive quantities. However, rather than making some of the extensive quantities extrinsic parameters, it makes them “intrinsic” within the space of microstates. A vivid analogy is this: imagine a thermal system as many boxes of microstates, with each box labeled by specific values of extensive quantities; then we partition those boxes to classify them and put all the boxes in each part of the partition into one larger box. The new set of larger boxes is labeled by specific values of fewer extensive quantities, and it is the so-called contraction of the original set of boxes.

I call it contracting because it is like contracting the affine space of extensive quantities onto a flat sheet, one of its subspaces. The way we do this should be described by a projection. A projection in an affine space maps the whole space onto one of its affine subspaces, and the preimage of each point in the subspace is another affine subspace of the original space. The preimages form a family of parallel affine subspaces labeled by their image under the projection. This family of affine subspaces may be used to define a family of slices of the space of extensive quantities or of the thermal system, which are useful when defining the contraction of the space of extensive quantities or of the system.

Definition. A contracting of a space of extensive quantities $\left(W,E,\lambda\right)$ is given by a tuple $\left(\pi,\lambda^\perp\right)$, where

• $\pi:W\to W^\perp$ is a projection map from $W$ to an affine subspace $W^\perp$ of $W$;
• $E^\perp:=\pi(E)$, the image of $E$ under $\pi$, is equipped with the finest topology $\tau\!\left(E^\perp\right)$ that makes $\pi$ continuous (the quotient topology), and this topology makes $E^\perp$ Polish;
• $\lambda^\perp:\sigma\!\left(E^\perp\right)\to\left[0,+\infty\right]$ is a non-trivial σ-finite Borel measure on $E^\perp$, where $\sigma\!\left(E^\perp\right)\supseteq\mathfrak B\!\left(E^\perp\right)$ is a σ-algebra of $E^\perp$ that contains the Borel σ-algebra of $E^\perp$; and
• For any $A\in\sigma\!\left(E^\perp\right)$, $\lambda^{\perp}(A)=0$ iff $\lambda\!\left(\pi^{-1}(A)\right)=0$.

This contracting defines a new space of extensive quantities $\left(W^\perp,E^\perp,\lambda^\perp\right)$, called a contraction of the original space of extensive quantities $\left(W,E,\lambda\right)$.

Definition. The contractive slicings of a space of extensive quantities $\left(W,E,\lambda\right)$ defined by a contracting $\left(\pi,\lambda^\perp\right)$ of it are a family of slicings $\bigsqcup_{e\in E^\perp}\left(W^\parallel_e,\lambda^\parallel_e\right)$, where

• $W^\parallel_e:=\pi^{-1}(e)$ is the preimage of $\left\{e\right\}$ under $\pi$, an affine subspace of $W$; and
• $\lambda_e^\parallel:\sigma\!\left(E_e^\parallel\right)\to\left[0,+\infty\right]$ is a Borel measure; the family of measures is the disintegration of $\lambda$ w.r.t. $\pi$ and $\lambda^\perp$.

Definition. A contraction of a thermal system $\left(\mathcal E,\mathcal M\right)$ defined by the contracting $\left(\pi,\lambda^\perp\right)$ of $\mathcal E$ is a new thermal system $\left(\mathcal E^\perp,\mathcal M^\perp\right)$ constructed as such:

• $\mathcal E^\perp:=\left(W^\perp,E^\perp,\lambda^\perp\right)$ is the contraction of $\mathcal E$ corresponding to the given contracting;
• $\mathcal M^\perp:=\bigsqcup_{e\in E^\perp}M_e^\perp$, where for each $e\in E^\perp$, $M_e^\perp:=\mathcal M_e^\parallel$; the family of systems $\left(\mathcal E_e^\parallel,\mathcal M_e^\parallel\right)$ (labeled by $e\in E^\perp$) are slices of $\left(\mathcal E,\mathcal M\right)$ corresponding to the contractive slicings of $\mathcal E$ defined by the contracting $\left(\pi,\lambda^\perp\right)$; the measure equipped on $\mathcal M_e^\parallel$ is the measure of number of microstates of $\left(\mathcal E_e^\parallel,\mathcal M_e^\parallel\right)$.

If the total number of microstates in $\mathcal M^\parallel_e$ is infinite for some $e$, then the contraction is not defined.

Example. For the thermal system of a solid consisting of spin-$\frac12$ particles, define a contracting $\left(\pi,\lambda^\perp\right)$ by

\begin{align*} \pi\!\left(U,N\right)&:=N,\\ \lambda^\perp\!\left(A\right)&:=\operatorname{card}A. \end{align*}

Then the corresponding contraction of the thermal system may be written as a thermal system $\left(\left(W,E,\lambda\right),\bigsqcup_{e\in E}M_e\right)$, where

\begin{align*} W&:=\mathbb R,\\ E&:=\mathbb Z^+,\\ \lambda\!\left(A\right)&:=\operatorname{card}A,\\ M_N&:=\left\{0,1\right\}^N,\\ \mu_N\!\left(A\right)&:=\operatorname{card}A. \end{align*}
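We can make the box analogy concrete for this example with a small enumeration. The sketch below (assuming, purely for illustration, that a microstate with $k$ up-spins has $U=k$; this convention is not fixed by the definitions above) checks that the microstates of the contraction at fixed $N$ are exactly the union of the slices at fixed $\left(U,N\right)$, with counting measures adding up:

```python
from itertools import product
from math import comb

N = 6
M_N = list(product((0, 1), repeat=N))  # M_N = {0,1}^N with counting measure

# Slices of the original system: boxes labeled by (U, N), where U is the
# number of up-spins (hypothetical energy convention for illustration):
slices = {U: [m for m in M_N if sum(m) == U] for U in range(N + 1)}

assert len(M_N) == 2 ** N
assert all(len(slices[U]) == comb(N, U) for U in slices)
# Contracting pours all the small boxes into one big box per N:
assert sum(len(s) for s in slices.values()) == len(M_N)
```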

Different from a slice of a system, a contraction of a system does not have the problem with the dimension (“dimension” as in “dimensional analysis”) of the measure on the space of extensive quantities. Although the dimension of $\lambda^\perp$ is different from that of $\lambda$, the dimension of $\mu^\perp_e$ (the measure on $M^\perp_e$) is also different from that of $\mu$, and they change together in such a way that the resultant $\mu^\perp$ (the measure of number of microstates on $\mathcal M^\perp$) has the same dimension as $\mu$.

This fact actually suggests that a contraction of a thermal system is essentially the same as the original thermal system, in the sense that the microstates of the two systems are in a natural one-to-one correspondence. Indeed, the natural bijection from $\mathcal M$ to $\mathcal M^\perp$ is given by $\left(e,m\right)\mapsto\left(\pi(e),\left(e,m\right)\right)$. It is obvious that for any measurable function $f$ on $\mathcal M^\perp$ we have

$\int_{\left(e,m\right)\in\mathcal M}f\!\left(\pi(e),(e,m)\right)\mathrm d\mu(e,m) =\int_{\left(e,m\right)\in\mathcal M^\perp}f\!\left(e,m\right)\mathrm d\mu^\perp(e,m).$

Using this map, we can pull back any function $f^\perp$ on $\mathcal M^\perp$ to become a function on $\mathcal M$ by

$f\!\left(e,m\right):=f^\perp\!\left(\pi(e),\left(e,m\right)\right)$

and the other way around. I want to call $f$ the contractional pullback of $f^\perp$ under $\pi$ and call $f^\perp$ the contractional pushforward of $f$ under $\pi$. In particular, we may pull back any state $p^\perp$ of a contraction to become a state $p$ of the original thermal system. We will see that pullbacks of states are rather useful.

Obviously, the family of affine subspaces $\left\{W^\parallel_e\right\}_{e\in W^\perp}$ are parallel to each other. Therefore, their associated vector subspaces are the same vector subspace $\vec W^\parallel$ of $\vec W$, which is a complement of the vector subspace $\vec W^\perp$, the vector space that $W^\perp$ is associated with. We can write

$\vec W=\vec W^\perp+\vec W^\parallel,\quad W=W^\perp+\vec W^\parallel.$

Each point in $W$ can be written in the form $e+s$, where $e\in W^\perp$ and $s\in\vec W^\parallel$. Furthermore, for any $e\in W^\perp$, the map $s\mapsto e+s$ is a bijection from $\vec W^\parallel$ to $W^\parallel_e$. This bijection can then push forward linear operations from $\vec W^\parallel$ to $W^\parallel_e$. For example, we can define the action of some continuous linear functional $i\in\vec W^{\parallel\prime}$ on a point $e'\in W^\parallel_e$ as

$\begin{equation} \label{eq: linear op on affine} i\!\left(e'\right):=i\!\left(e'-\pi\!\left(e'\right)\right), \end{equation}$

where $\pi\!\left(e'\right)$ is just $e$.

However, we need to remember that there is generally no physically meaningful linear structure on $W^\parallel_e$. The linear structure that we have constructed is just for notational convenience.

An interesting fact about slicing, isolating, and contracting is that: an isolation of a contraction is a contraction of a slice.

Suppose we have a thermal system $\left(\mathcal E,\mathcal M\right)$, and by a contracting $\left(\pi,\lambda^\perp\right)$ we derive its contraction $\left(\mathcal E^\perp,\mathcal M^\perp\right)$.

Now, consider one of its contractive slices $\left(\mathcal E^\parallel_{e^\circ},\mathcal M^\parallel_{e^\circ}\right)$, where $e^\circ\in E^\perp$. Then, we contract this slice by the contracting $\left(\pi,\lambda^{\perp\prime}\right)$, where $\pi$ is the same $\pi$ as used above but whose domain is restricted to $W^\parallel_{e^\circ}$, and $\lambda^{\perp\prime}$ is the counting measure. Because the whole $W^\parallel_{e^\circ}$ is mapped to $e^\circ$ under $\pi$, the contraction becomes an isolated system whose only possible value of extensive quantities is $e^\circ$. Its spaces of microstates consist of only one measure space, which is $\mathcal M^\parallel_{e^\circ}$.

On the other hand, consider isolating $\left(\mathcal E^\perp,\mathcal M^\perp\right)$ at $e^\circ$. Its isolation at $e^\circ$ is an isolated system whose only possible value of extensive quantities is $e^\circ$. Its spaces of microstates consist of only one measure space, $M^\perp_{e^\circ}$, which is the same as $\mathcal M^\parallel_{e^\circ}$.

Therefore, an isolation of a contraction is a contraction of a slice.

This fact is useful because it enables us to find the equilibrium state of a slice. Using the microcanonical ensemble, we can already find the equilibrium state of any isolated system, so we can find the equilibrium state of an isolation of a contraction. That state is then the equilibrium state of a contraction of a slice; by the contractional pullback, it gives the equilibrium state of the slice.
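As a toy illustration of this chain of constructions (again with the spin-$\frac12$ system, counting measures, and the hypothetical convention that $U$ is the number of up-spins): slicing at fixed $N$ and contracting everything away yields an isolated system, whose equilibrium state is uniform over all microstates of the slice; pulling it back gives the equilibrium state of the slice.

```python
from itertools import product

N = 4
# Microstates of the slice at fixed N, labeled (e, m) with e = (U, N):
slice_microstates = [((sum(m), N), m) for m in product((0, 1), repeat=N)]

# Contracting the slice to a point gives an isolated system, whose
# equilibrium state is uniform (microcanonical ensemble):
Omega = len(slice_microstates)
p_eq = {em: 1.0 / Omega for em in slice_microstates}

assert Omega == 2 ** N
assert abs(sum(p_eq.values()) - 1.0) < 1e-12  # a valid state
```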

## Thermal contact

Composite systems are systems composed of other systems. This is a useful concept because it allows us to treat multiple systems as a whole. The motivation for developing this concept is that we will use it to derive the canonical ensemble and the grand canonical ensemble. In those ensembles, the system is not isolated but in contact with a bath. To consider them as a whole system, we need to define composite systems.

The simplest case of a composite system is where the subsystems are independent of each other. Physically, this means that the subsystems do not have any thermodynamic contact with each other. I would like to call this simplest case a product thermal system, just as mathematicians name product spaces constructed out of existing spaces.

Definition. The product space of extensive quantities of two spaces of extensive quantities $\left(W^{(1)},E^{(1)},\lambda^{(1)}\right)$ and $\left(W^{(2)},E^{(2)},\lambda^{(2)}\right)$ is a space of extensive quantities $\left(W,E,\lambda\right)$ constructed as such:

• $W:=W^{(1)}\times W^{(2)}$ is the product affine space of $W^{(1)}$ and $W^{(2)}$;
• $E:=E^{(1)}\times E^{(2)}$ is the product topological space as well as the product measure space of $E^{(1)}$ and $E^{(2)}$; and
• $\lambda$ is the product measure of $\lambda^{(1)}$ and $\lambda^{(2)}$, whose uniqueness is guaranteed by the σ-finiteness of $\lambda^{(1)}$ and $\lambda^{(2)}$.

Definition. The product thermal system of two thermal systems $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ and $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$ is a thermal system $\left(\mathcal E,\mathcal M\right)$ constructed as such:

• $\mathcal E:=\left(W,E,\lambda\right)$ is the product space of extensive quantities of $\mathcal E^{(1)}$ and $\mathcal E^{(2)}$; and
• $\mathcal M:=\bigsqcup_{(e_1,e_2)\in E}M_{e_1,e_2}$, where $M_{e_1,e_2}:=M^{(1)}_{e_1}\times M^{(2)}_{e_2}$ is the product measure space of $M^{(1)}_{e_1}$ and $M^{(2)}_{e_2}$, equipped with measure $\mu_{e_1,e_2}$, the product measure of $\mu^{(1)}_{e_1}$ and $\mu^{(2)}_{e_2}$.

By this definition, $\mathcal M$ is naturally identified with $\mathcal M^{(1)}\times\mathcal M^{(2)}$, and the measure of number of microstates $\mu$ on $\mathcal M$ is in this sense the same as the product measure of $\mu^{(1)}$ and $\mu^{(2)}$ (the measures of number of microstates on $\mathcal M^{(1)}$ and $\mathcal M^{(2)}$). We can project elements in $\mathcal M$ back into $\mathcal M^{(1)}$ and $\mathcal M^{(2)}$ by the map $(e_1,e_2,m_1,m_2)\mapsto(e_1,m_1)$ and the map $(e_1,e_2,m_1,m_2)\mapsto(e_2,m_2)$.

This suggests that a probability distribution on $\mathcal M$ (which may be given by a state $p$ of $(\mathcal E,\mathcal M)$) can be viewed as the joint probability distribution of two random variables on $\mathcal M$: $(e_1,e_2,m_1,m_2)\mapsto(e_1,m_1)$ and $(e_1,e_2,m_1,m_2)\mapsto(e_2,m_2)$. As we all know, a joint distribution encodes conditional distributions and marginal distributions. Therefore, given any state of a product thermal system, we can define the conditional states and marginal states of its subsystems. Conditional states are not very useful because they are not physically observed states of subsystems. The physically observed states of subsystems are the marginal states, so marginal states are of special interest.

Definition. Given a state $p$ of the product thermal system $(\mathcal E,\mathcal M)$ of $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ and $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$, its marginal state of the subsystem $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ is a state $p^{(1)}$ of the system $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ defined by

$p^{(1)}\!\left(e_1,m_1\right):=\int_{\left(e_2,m_2\right)\in\mathcal M^{(2)}} p\!\left(e_1,e_2,m_1,m_2\right)\mathrm d\mu^{(2)}\!\left(e_2,m_2\right).$
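In a discrete toy setting (counting measures, finitely many microstates; the labels below are hypothetical), the integral defining the marginal state reduces to a sum over the microstates of the other subsystem:

```python
# Microstates of the two subsystems (toy labels):
M1 = ["a", "b"]
M2 = ["x", "y", "z"]

# A joint state p on M1 x M2 (nonnegative, total probability 1):
p = {("a", "x"): 0.1, ("a", "y"): 0.2, ("a", "z"): 0.1,
     ("b", "x"): 0.3, ("b", "y"): 0.2, ("b", "z"): 0.1}
assert abs(sum(p.values()) - 1.0) < 1e-12

# Marginal state of subsystem 1: integrate p over M2 (here a plain sum):
p1 = {m1: sum(p[(m1, m2)] for m2 in M2) for m1 in M1}
assert abs(sum(p1.values()) - 1.0) < 1e-12  # p1 is again a state
```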

Physically, if a product thermal system is in equilibrium, then each of its subsystems is in equilibrium as well. Therefore, if $p^\circ$ is an equilibrium state of the product thermal system, then the marginal states of $p^\circ$ are equilibrium states of the subsystems.

Now, we need to consider how to describe the thermodynamic contact between subsystems. In the simplest case, where there is no thermodynamic contact between subsystems, the composite system is just the product thermal system of the subsystems, and the dimension of its space of extensive quantities is the sum of those of the subsystems’. If there is some thermal contact between the subsystems, then the dimension of the space of extensive quantities of the composite system will be less than that of the product thermal system. For example, if the subsystems are allowed to exchange energy, then two original extensive quantities (the energy of the first subsystem and that of the second subsystem) are replaced by a single extensive quantity (the total energy of the composite system). Such a reduction in the dimension of the space of extensive quantities is exactly the contracting that we defined above. Therefore, we can define a thermally composite system as a contraction of the product thermal system. Denote the projection map of the contracting as $\pi:W\to W^\perp:(e_1,e_2)\mapsto e$. (From now on in this section, composite systems refer to thermally composite systems. I will introduce non-thermally composite systems later (in part 2), which describe non-thermal contacts between subsystems and are more complicated.)

Besides being the contraction of the product thermal system, there is an additional requirement. Given the extensive quantities of the composite system and those of one of the subsystems, we should be able to deduce those of the other subsystem. For example, if the subsystems are allowed to exchange energy, then the total energy of the composite system minus the energy of one of the subsystems should be the energy of the other subsystem, which is uniquely determined (if it is an allowed energy). Mathematically, this means that for any $e_1\in W^{(1)}$ and $e_2\in W^{(2)}$, the two maps $\pi\!\left(e_1,\cdot\right)$ and $\pi\!\left(\cdot,e_2\right)$ are both injections.

Definition. A (thermally) composite thermal system of two thermal systems is the contraction of their product thermal system corresponding to a contracting $(\pi,\lambda^\perp)$, where $\pi:W\to W^\perp:(e_1,e_2)\mapsto e$ satisfies that for any $e_1\in W^{(1)}$ and $e_2\in W^{(2)}$, the two maps $\pi\!\left(e_1,\cdot\right)$ and $\pi\!\left(\cdot,e_2\right)$ are both injections.

We may define projection maps to get the extensive quantities of the subsystems from those of the composite system:

$c^{(1)}:W\to W^{(1)}:(e_1,e_2)\mapsto e_1,\quad c^{(2)}:W\to W^{(2)}:(e_1,e_2)\mapsto e_2.$

Then, for each $e\in W^\perp$, the two spaces

$W^{\parallel(1)}_e:=c^{(1)}\!\left(W_e^\parallel\right),\quad W^{\parallel(2)}_e:=c^{(2)}\!\left(W_e^\parallel\right)$

are respectively affine subspaces of $W^{(1)}$ and $W^{(2)}$, where $W_e^\parallel:=\pi^{-1}\!\left(e\right)$. The two affine subspaces are actually isomorphic to each other because of our additional requirement on the projection map $\pi$. Because $\pi\!\left(e_1,\cdot\right)$ is an injection, for any $e_1\in W^{\parallel(1)}_e$ there is a unique $e_2\in W^{\parallel(2)}_e$ such that $\pi\!\left(e_1,e_2\right)=e$, and vice versa. This gives a correspondence between the two affine subspaces. In other words, for each $e\in W^\perp$, there is a unique bijection $\rho_e:W^{\parallel(1)}_e\to W^{\parallel(2)}_e$ such that

$\begin{equation} \label{eq: pi and rho_e} \forall e_1\in W^{\parallel(1)}_e: \pi\!\left(e_1,e_2\right)=e\Leftrightarrow e_2=\rho_e\!\left(e_1\right). \end{equation}$

The bijection $\rho_e$ is an affine isomorphism from $W^{\parallel(1)}_e$ to $W^{\parallel(2)}_e$.

What is more, $c^{(1)}$ is an affine isomorphism from $W^{\parallel}_e$ to $W^{\parallel(1)}_e$, and $c^{(2)}$ is an affine isomorphism from $W^{\parallel}_e$ to $W^{\parallel(2)}_e$. The three affine spaces $W^{\parallel}_e,W^{\parallel(1)}_e,W^{\parallel(2)}_e$ are then mutually isomorphic.

Example. Suppose we have two thermal systems, each of which has two extensive quantities: the energy and the number of particles. We write them as $\left(U_1,N_1\right)$ and $\left(U_2,N_2\right)$. The systems are in thermal contact so that they can exchange energy but not particles. Then, the extensive quantities of the composite system may be written as $\left(U/2,U/2,N_1,N_2\right)$, with $\pi:\left(U_1,U_2\right)\mapsto\left(U/2,U/2\right)$ defined as

$\pi\!\left(U_1,U_2\right):=\left(\frac{U_1+U_2}2,\frac{U_1+U_2}2\right).$

The isomorphism $\rho_{U/2,U/2,N_1,N_2}$ is then

$\rho_{U/2,U/2,N_1,N_2}\!\left(U_1,N_1\right)=\left(U-U_1,N_2\right).$

The contracting is not unique. For example, $\left(U_1,U_2\right)\mapsto\left(3U/4,U/4\right)$ is another valid projection for constructing the composite thermal system, and it has exactly the same physical meaning as the one I constructed above.
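A quick sketch of this example’s maps, checking the defining relation of $\rho_e$ (that $\pi\!\left(e_1,e_2\right)=e$ iff $e_2=\rho_e\!\left(e_1\right)$); the function names are of course mine, not standard:

```python
def pi(U1, U2):
    # The contracting projection of the example: only the total energy survives.
    return ((U1 + U2) / 2, (U1 + U2) / 2)

def rho(U, U1):
    # rho_e at the slice with total energy U: the other subsystem gets the rest.
    return U - U1

U1, U2 = 3.0, 7.0
e = pi(U1, U2)
U = e[0] + e[1]                 # total energy recovered from e
assert rho(U, U1) == U2         # e2 = rho_e(e1)
assert pi(U1, rho(U, U1)) == e  # pi(e1, rho_e(e1)) = e
# Note that rho(U, U1) = U - U1 is affine in U1 with linear part s -> -s,
# independent of the slice e, as the theorem below asserts.
```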

The isomorphism $c^{(1)}$ from $W^{\parallel}_e$ to $W^{\parallel(1)}_e$ can push forward the measure $\lambda^\parallel_e$ on $E^\parallel_e$ to a new measure $\lambda^{\parallel(1)}_e$ on $E^{\parallel(1)}_e$. Then, $\left(W^{\parallel(1)}_e,\lambda^{\parallel(1)}_e\right)$ is a slicing of $\left(W^{(1)},E^{(1)},\lambda^{(1)}\right)$, and we can get a slice $\left(\mathcal E^{\parallel(1)}_e,\mathcal M^{\parallel(1)}_e\right)$ of $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ out of this slicing. I would like to call this slice the compositing slice of $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ at $e$. Similarly, we define compositing slices of $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$, denoted $\left(\mathcal E^{\parallel(2)}_e,\mathcal M^{\parallel(2)}_e\right)$.

Similarly to how we defined marginal states of subsystems of a product thermal system, we can define marginal states of the compositing slices given a state of a contractive slice of the composite system. However, this time there is a key difference: the subsystems (compositing slices) have isomorphic and completely dependent (deterministic) extensive quantities instead of completely independent ones. Taking this into account, we can define marginal states of compositing slices as follows:

$\begin{equation} \label{eq: slice marginal state} p^{\parallel(1)}\!\left(e_1,m_1\right) :=\int_{m_2\in M^{(2)}_{\rho_e(e_1)}}p^\parallel\!\left(e_1,\rho_e(e_1),m_1,m_2\right) \mathrm d\mu^{(2)}_{\rho_e(e_1)}\!\left(m_2\right), \end{equation}$

where $p^{\parallel(1)}$ is a state of $\left(\mathcal E^{\parallel(1)}_e,\mathcal M^{\parallel(1)}_e\right)$, and $p^\parallel$ is a state of $\left(\mathcal E^{\parallel}_e,\mathcal M^{\parallel}_e\right)$ (a contractive slice of the composite system).

There is an additional property that $\rho_e$ has.

As we all know, an affine map is a linear map combined with a translation:

$\begin{equation} \label{eq: rho_e and vec rho} \rho_e\!\left(e_1\right)=\vec\rho\!\left(e_1-e_0\right)+\rho_e\!\left(e_0\right), \end{equation}$

where $e_0$ is a fixed point in $W^{\parallel(1)}_e$, and $\vec\rho:\vec W^{\parallel(1)}_e\to \vec W^{\parallel(2)}_e$ is a linear map that is independent of the choice of $e_0$. Because $\rho_e$ is a bijection, $\vec\rho$ is also a bijection, and is thus a linear isomorphism from $\vec W^{\parallel(1)}_e$ to $\vec W^{\parallel(2)}_e$.

Because different slices $W^{\parallel(1)}_e$ with different $e$ are parallel to each other, actually $\vec W^{\parallel(1)}_e$ is the same vector subspace of $\vec W^{(1)}$ for any $e\in W^\perp$. We can write it as $\vec W^{\parallel(1)}$. Similarly, $\vec W^{\parallel(2)}_e$ is the same vector subspace $\vec W^{\parallel(2)}$ of $\vec W^{(2)}$ for any $e\in W^\perp$. Therefore, we can say $\vec\rho$ is a linear isomorphism from $\vec W^{\parallel(1)}$ to $\vec W^{\parallel(2)}$.

Then, here is the interesting claim:

Theorem. The linear map $\vec\rho$ defined above is independent of the choice of $e$.

Proof

Proof. Because $\pi$ is an affine map, we have

$\pi\!\left(e_1,e_2\right) =\vec\pi\!\left(e_1-e_0,e_2-\rho_e\!\left(e_0\right)\right)+\pi\!\left(e_0,\rho_e\!\left(e_0\right)\right),$

where $e\in W^\perp$ is fixed, $e_0\in W^{\parallel(1)}_e$ is also fixed, and $\vec\pi:\vec W\to\vec W^\perp$ is a linear map that is independent of the choice of $e$ and $e_0$.

Let $e_2:=\rho_e\!\left(e_1\right)$ in the equation above, and we have

$\pi\!\left(e_1,\rho_e\!\left(e_1\right)\right) =\vec\pi\!\left(e_1-e_0,\rho_e\!\left(e_1\right)-\rho_e\!\left(e_0\right)\right) +\pi\!\left(e_0,\rho_e\!\left(e_0\right)\right).$

According to Equation \ref{eq: pi and rho_e} and \ref{eq: rho_e and vec rho}, we have

$e=\vec\pi\!\left(e_1-e_0,\vec\rho\!\left(e_1-e_0\right)\right)+e.$

In other words,

$\begin{equation} \label{eq: pi(s1, rho(s1))=0} \vec\pi\!\left(s_1,\vec\rho\!\left(s_1\right)\right)=0, \end{equation}$

where $s_1\in\vec W^{\parallel(1)}$ is an arbitrary vector.

Prove by contradiction. Assume that $\vec\rho$ depends on the choice of $e$; then there exist two choices of $e$ that give two different $\vec\rho$’s, denoted as $\vec\rho$ and $\vec\rho'$. Because they are different maps, there exists an $s_1\in\vec W^{\parallel(1)}$ such that $\vec\rho(s_1)\ne\vec\rho'(s_1)$.

On the other hand, we have

$\vec\pi\!\left(s_1,\vec\rho\!\left(s_1\right)\right)=0,\quad \vec\pi\!\left(s_1,\vec\rho'\!\left(s_1\right)\right)=0.$

Subtract the two equations, and because of the linearity of $\vec\pi$, we have

$\vec\pi\!\left(0,\delta\right)=0,$

where $\delta:=\vec\rho(s_1)-\vec\rho'(s_1)$ is a nonzero vector. Then, we have

$\pi\!\left(e_1,e_2+\delta\right)-\pi\!\left(e_1,e_2\right)=\vec\pi(0,\delta)=0,$

which contradicts the requirement that $\pi\!\left(e_1,\cdot\right)$ is injective. $\square$

Besides, because $\vec\rho$ is a linear isomorphism from $\vec W^{\parallel(1)}$ to $\vec W^{\parallel(2)}$, the map $i_1\mapsto i_1\circ\vec\rho^{-1}$ is a linear isomorphism from $\vec W^{\parallel(1)\prime}$ to $\vec W^{\parallel(2)\prime}$. The inverse of this isomorphism is $i_2\mapsto i_2\circ\vec\rho$.

As we know, $i_1$ and $i_2$ are actually intensive quantities. The physical meaning of them being each other’s image/preimage under this isomorphism is that, if the two subsystems in thermal contact have intensive quantities $-i_1$ and $i_2$ respectively, then they are in equilibrium with each other. Therefore, I would like to call such a pair of intensive quantities anticonsistent.

Since we have a family of slices called the compositing slices of a subsystem, can we make them the contractive slices of some contracting of the subsystem? Well, it depends. The first difficulty is that $W^{\parallel(1)}_e$ may be the same subspace of $W^{(1)}$ for different $e\in W^\perp$, and yet the corresponding $E^{\parallel(1)}_e$ may be equipped with different measures for those different $e$.

Anyway, ignore this at this stage. Let me first construct a subspace $W^{\perp(1)}$ and a projection $\pi^{(1)}:W^{(1)}\to W^{\perp(1)}$ so that $W^{\parallel(1)}_e$ are preimages of points in $W^{\perp(1)}$, and then see what will happen.

Since any vector subspace has a complement, we can pick a subspace of $\vec W^{(1)}$ that is a complement of $\vec W^{\parallel(1)}$ and call it $\vec W^{\perp(1)}$. Any vector in $\vec W^{(1)}$ can be uniquely decomposed into the sum of a vector in $\vec W^{\perp(1)}$ and a vector in $\vec W^{\parallel(1)}$.

Then, we pick some fixed $e_1\in W^{(1)}$, and it can be used to generate an affine subspace $W^{\perp(1)}:=e_1+\vec W^{\perp(1)}$ of $W^{(1)}$. Then, each point in $W^{(1)}$ can be uniquely decomposed into the sum of a point in $W^{\perp(1)}$ and a vector in $\vec W^{\parallel(1)}$. Such unique decompositions can be encoded into a projection map $\pi^{(1)}:W^{(1)}\to W^{\perp(1)}$.
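To see how such a decomposition can be computed in coordinates, here is a numeric sketch with assumed subspaces of $\mathbb R^3$ (the columns of `P` span $\vec W^{\parallel(1)}$ and the columns of `C` span the chosen complement):

```python
import numpy as np

# Assumed data: W^(1) = R^3, vec W^parallel(1) = x-axis, complement = y-z
# plane, and W^perp(1) = e1_base + complement.
P = np.array([[1.0], [0.0], [0.0]])                  # spans vec W^parallel(1)
C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # spans the complement
e1_base = np.array([2.0, 0.0, 0.0])                  # base point of W^perp(1)

x = np.array([5.0, -1.0, 3.0])
# Solve x - e1_base = C @ a + P @ b for the unique coefficients (a, b):
coeffs = np.linalg.solve(np.hstack([C, P]), x - e1_base)
perp_part = e1_base + C @ coeffs[:2]   # the projection pi^(1)(x)
para_part = P @ coeffs[2:]             # the component in vec W^parallel(1)

assert np.allclose(perp_part + para_part, x)
assert np.allclose(perp_part, [2.0, -1.0, 3.0])
```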

It seems that we are now halfway to the construction of our contracting. However, before we proceed, I would like to prove a property of $W^{\perp(1)}$ we construct:

Theorem. The map $\pi$ is an affine isomorphism from the product affine space $W^{\perp(1)}\times W^{(2)}$ to $W^\perp$.

Proof

Proof. The map $\pi$ is itself affine, so we just need to prove that it is injective and surjective.

To prove it is injective, suppose that we have two points $(e_1,e_2)$ and $(e_1',e_2')$ in $W^{\perp(1)}\times W^{(2)}$, such that

$\pi\!\left(e_1,e_2\right)=\pi\!\left(e_1',e_2'\right)=:e.$

Then, we have

$\left(e_1,e_2\right),\left(e_1',e_2'\right)\in W^\parallel_e.$

Therefore, $e_1,e_1'\in W^{\parallel(1)}_e$, so

$e_1-e_1'\in\vec W^{\parallel(1)}.$

On the other hand, because $e_1,e_1'\in W^{\perp(1)}$, we have

$e_1-e_1'\in\vec W^{\perp(1)}.$

Because $\vec W^{\perp(1)}$ is a complement of $\vec W^{\parallel(1)}$, the only possible case is that $e_1=e_1'$. Then, due to $\pi\!\left(e_1,\cdot\right)$ being injective, $e_2=e_2'$. Therefore, $\left(e_1,e_2\right)=\left(e_1',e_2'\right)$. Therefore, $\pi$ is injective if its domain is restricted to $W^{\perp(1)}\times W^{(2)}$.

To prove it is surjective, suppose $e\in W^\perp$. Because $\pi$ is surjective from $W$ to $W^\perp$, there exists some $\left(e_1',e_2'\right)\in W$ such that

$\pi\!\left(e_1',e_2'\right)=e.$

According to Equation \ref{eq: pi and rho_e}, this is equivalently

$e_2'=\rho_e\!\left(e_1'\right).$

We can uniquely decompose $e_1'\in W^{(1)}$ into the sum of a point $e_1\in W^{\perp(1)}$ and a vector $\delta\in\vec W^{\parallel(1)}$. Then, according to Equation \ref{eq: rho_e and vec rho}, we have

$e_2'=\rho_e\!\left(e_1+\delta\right)=\rho_e\!\left(e_1\right)+\vec\rho\!\left(\delta\right).$

Thus $e_2:=e_2'-\vec\rho\!\left(\delta\right)=\rho_e\!\left(e_1\right)$. According to Equation \ref{eq: pi and rho_e}, this is equivalently

$\pi\!\left(e_1,e_2\right)=e.$

Therefore, $\left(e_1,e_2\right)\in W^{\perp(1)}\times W^{(2)}$ is the desired point in $W^{\perp(1)}\times W^{(2)}$ that is mapped to $e$ under $\pi$. Therefore, $\pi$ is surjective if its domain is restricted to $W^{\perp(1)}\times W^{(2)}$. $\square$

Then, it seems that if we want a measure on $E^{\perp(1)}$ that is consistent with our theory, the product measure of it and the measure on $E^{(2)}$ should be equal to the measure on $E^\perp$. However, it is not always possible to find such a measure. This is our second difficulty.

Therefore, in order to construct a contracting, we need the following assumptions:

• For different $e\in E^\perp$, $\lambda^{\parallel(1)}_e$ is the same measure whenever $W^{\parallel(1)}_e$ is the same subspace.
• There exists a measure $\lambda^{\perp(1)}$ on $E^{\perp(1)}$ so that $\lambda^\perp$ is the pushforward of the product measure of $\lambda^{\perp(1)}$ and $\lambda^{(2)}$ under $\pi$.

Given those assumptions, if we define $\lambda^{\parallel(1)\prime}_{e_1}$ to be the measures from the disintegration of $\lambda^{(1)}$ w.r.t. $\pi^{(1)}$ and $\lambda^{\perp(1)}$ (just the way we constructed the measures in constructive slicings), then we can verify that they are actually the same as $\lambda^{\parallel(1)}_e$ defined before, for any $e$ in the image of $\pi\!\left(e_1,\cdot\right)$. You can verify this easily by the following check (not a rigorous proof), where $\otimes$ denotes product measures or integration:

$\lambda=\lambda^{\perp}\otimes\left\{\lambda^\parallel_e\right\} =\lambda^{\perp(1)}\otimes\lambda^{(2)}\otimes\left\{\lambda^\parallel_e\right\}.$

On the other hand,

$\lambda=\lambda^{(1)}\otimes\lambda^{(2)} =\lambda^{\perp(1)}\otimes\left\{\lambda^{\parallel(1)\prime}_{e_1}\right\}\otimes\lambda^{(2)}.$

Comparing them, we have

$\left\{\lambda^{\parallel(1)\prime}_{e_1}\right\}=\left\{\lambda^\parallel_e\right\} =\left\{\lambda^{\parallel(1)}_e\right\}.$

An explicit verification is more tedious and is omitted here.

Those assumptions are very strong, so we do not want to assume them. Without those assumptions, we still have a well-constructed $W^{\perp(1)}$ and $\pi^{(1)}$ so that $W^{\parallel(1)}_e$ are preimages of points in $W^{\perp(1)}$ under $\pi$. Then, we can use similar tricks as Equation \ref{eq: linear op on affine} to define the action of any continuous linear functional $i_1\in\vec W^{\parallel(1)\prime}$ on a point $e_1\in W^{(1)}$ as

$i_1\!\left(e_1\right):=i_1\!\left(e_1-\pi^{(1)}\!\left(e_1\right)\right).$

We can also do the same thing on $W^{(2)}$. Then, an interesting thing to notice is that if we have $e_1\in W^{(1)}$ and $e_2\in W^{(2)}$ such that

$e:=\pi\!\left(e_1,e_2\right) =\pi\!\left(\pi^{(1)}\!\left(e_1\right),\pi^{(2)}\!\left(e_2\right)\right),$

then we have

$i_1\!\left(e_1\right)=i_2\!\left(e_2\right),$

where $i_1\in\vec W^{\parallel(1)\prime}$ and $i_2\in\vec W^{\parallel(2)\prime}$ are anticonsistent to each other.

Example. In the example of two thermal systems that can exchange energy but not number of particles, we may choose

$\pi^{(1)}\!\left(U_1,N_1\right):=\left(0,N_1\right),\quad \pi^{(2)}\!\left(U_2,N_2\right):=\left(0,N_2\right).$

Such projections are not unique, but this is the simplest one and also the most natural one considering their physical meanings.
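Continuing this example, here is a numeric sketch of the relation $i_1\!\left(e_1\right)=i_2\!\left(e_2\right)$ for anticonsistent $i_1,i_2$. All concrete numbers are assumed; $\vec\rho$ sends $(s,0)\mapsto(-s,0)$ because energy given up by subsystem 1 is gained by subsystem 2, and the condition $\pi\!\left(e_1,e_2\right)=\pi\!\left(\pi^{(1)}\!\left(e_1\right),\pi^{(2)}\!\left(e_2\right)\right)$ is taken to mean $U_1+U_2=0$ (assuming $\pi$ collects the total energy):

```python
import numpy as np

beta = 1.7  # an assumed inverse temperature

# i1 acts on (dU1, dN1) in vec W^parallel(1); vec rho maps (s, 0) to (-s, 0).
i1 = lambda s: beta * s[0]
vec_rho_inv = lambda s: np.array([-s[0], 0.0])
i2 = lambda s: i1(vec_rho_inv(s))        # the anticonsistent partner of i1

# pi^(1)(U1, N1) = (0, N1) as above; pi^(2) has the same form here.
pi1 = lambda e: np.array([0.0, e[1]])
i1_point = lambda e1: i1(e1 - pi1(e1))   # i1 acting on a point of W^(1)
i2_point = lambda e2: i2(e2 - pi1(e2))

# Assumed points with U1 + U2 = 0:
e1 = np.array([0.9, 4.0])
e2 = np.array([-0.9, 7.0])
assert np.isclose(i1_point(e1), i2_point(e2))
```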

We have newly defined some vector spaces. There are interesting relations between them:

Theorem.

$\vec W^{\perp\parallel}:=\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right) =\vec\pi\!\left(\vec W^{\parallel(1)}\right)=\vec\pi\!\left(\vec W^{\parallel(2)}\right).$
Proof

Proof. Obviously $\vec\pi\!\left(\vec W^{\parallel(2)}\right)\subseteq \vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right)$, so we just need to prove that $\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right) \subseteq\vec\pi\!\left(\vec W^{\parallel(2)}\right)$. To prove this, we just need to prove that for any

$s:=\vec\pi\!\left(s_1,s_2\right)\in\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right),$

where $s_1\in\vec W^{\parallel(1)}$ and $s_2\in\vec W^{\parallel(2)}$, we have $s\in\vec\pi\!\left(\vec W^{\parallel(2)}\right)$. To prove this, subtract Equation \ref{eq: pi(s1, rho(s1))=0} from the definition of $s$, and we have

$s=\vec\pi\!\left(0,s_2-\vec\rho\!\left(s_1\right)\right)\in\vec\pi\!\left(\vec W^{\parallel(2)}\right).$

Therefore, $\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right) \subseteq\vec\pi\!\left(\vec W^{\parallel(2)}\right)$. Similarly, $\vec\pi\!\left(\vec W^{\parallel(1)}\times\vec W^{\parallel(2)}\right) \subseteq\vec\pi\!\left(\vec W^{\parallel(1)}\right)$. Therefore, we proved the theorem. $\square$

Here we defined a new vector space $\vec W^{\perp\parallel}$. Obviously it is a subspace of $\vec W^\perp$. Because $\vec\pi(s_1,\cdot)$ and $\vec\pi(\cdot,s_2)$ are injective, $\vec\pi$ is a linear isomorphism from $\vec W^{\parallel(1)}$ to $\vec W^{\perp\parallel}$ and a linear isomorphism from $\vec W^{\parallel(2)}$ to $\vec W^{\perp\parallel}$.

Theorem. Suppose $e,e'\in W^\perp$. Then $e'-e\in\vec W^{\perp\parallel}$ if and only if $W^{\parallel(1)}_e=W^{\parallel(1)}_{e'}$ and $W^{\parallel(2)}_e=W^{\parallel(2)}_{e'}$.

Proof

Proof. First, prove the “if” direction.

Because $W^{\parallel(1)}_e=W^{\parallel(1)}_{e'}$, we have $c^{(1)}\!\left(\pi^{-1}\!\left(e\right)\right)=c^{(1)}\!\left(\pi^{-1}\!\left(e'\right)\right)$. In other words,

$\forall x\in\pi^{-1}(e):\exists s_2\in\vec W^{(2)}:x+\left(0,s_2\right)\in\pi^{-1}(e').$

Equivalently, this means

$\pi(x)=e\Rightarrow\exists s_2\in\vec W^{(2)}:\pi\!\left(x+\left(0,s_2\right)\right)=e'.$

Note that $\pi\!\left(x+\left(0,s_2\right)\right)=\pi(x)+\vec\pi\!\left(0,s_2\right)$, which is just $e+\vec\pi\!\left(0,s_2\right)$, and we have

$\exists s_2\in\vec W^{(2)}:e'-e=\vec\pi\!\left(0,s_2\right).$

Similarly,

$\exists s_1\in\vec W^{(1)}:e'-e=\vec\pi\!\left(s_1,0\right).$

Subtract the two equations, and we have

$0=\vec\pi\!\left(s_1,-s_2\right),$

which means

$\left(s_1,-s_2\right)\in\vec\pi^{-1}(0)=\vec W^\parallel.$

Therefore,

$s_1\in c^{(1)}\!\left(\vec W^\parallel\right)=\vec W^{\parallel(1)}.$

Therefore,

$e'-e=\vec\pi\!\left(s_1,0\right)\in\vec\pi\!\left(\vec W^{\parallel(1)}\right) =\vec W^{\perp\parallel}.$

Now, prove the “only if” direction.

Because $e'-e\in\vec W^{\perp\parallel}=\vec\pi\!\left(\vec W^{\parallel(2)}\right)$, there exists $s_2\in\vec W^{\parallel(2)}$ such that

$e'=e+\vec\pi\!\left(0,s_2\right).$

Therefore, obviously we have $c^{(1)}\!\left(\pi^{-1}\!\left(e\right)\right)=c^{(1)}\!\left(\pi^{-1}\!\left(e'\right)\right)$, and thus $W^{\parallel(1)}_e=W^{\parallel(1)}_{e'}$.

Similarly, we can prove that $W^{\parallel(2)}_e=W^{\parallel(2)}_{e’}$. $\square$

This means that, given both $W^{\parallel(1)}_e$ and $W^{\parallel(2)}_e$, we can determine $e$ up to a vector in $\vec W^{\perp\parallel}$.

Because we already have $\vec W^{\perp\parallel}$, we can define a new affine subspace $W^{\perp\perp}:=\pi\!\left(W^{\perp(1)}\times W^{\perp(2)}\right)$ so that $W^\perp=W^{\perp\perp}+\vec W^{\perp\parallel}$, and each point in $W^\perp$ can be uniquely decomposed as a sum of a point in $W^{\perp\perp}$ and a vector in $\vec W^{\perp\parallel}$. We can prove this easily. Such decomposition can be encoded into a projection $\pi^\perp:W^\perp\to W^{\perp\perp}$ so that for any $e\in W^\perp$, we have $e-\pi^\perp(e)\in\vec W^{\perp\parallel}$. Also, we can easily prove that $\pi$ is an affine isomorphism from $W^{\perp(1)}\times W^{\perp(2)}$ to $W^{\perp\perp}$.

Now that we have defined many affine spaces and vector spaces, here is a diagram of the relation between (some of) them (powered by quiver):

Diagram

Example. In the example of two thermal systems that can exchange energy but not number of particles, we may have

$\pi^\perp\!\left(\frac U2,\frac U2,N_1,N_2\right)=\left(0,0,N_1,N_2\right).$

## Baths

Baths are a special class of thermal systems. They are systems that have some of their intensive quantities well-defined and constant.

According to Equation \ref{eq: mce fundamental eq}, to make the intensive quantities constant, $\ln\Omega(e)$ should be linear in $e$. If we only require some of the intensive quantities to be constant, we only need it to be linear when $e$ moves along directions in a certain vector subspace.

The requirement above comes from the microcanonical ensemble, which does not involve changes in extensive quantities. An intuitive additional requirement is that $\lambda$ is also translationally invariant in such directions.

Then, here comes the definition of a bath:

Definition. A thermal system $(\mathcal E,\mathcal M)$ is called a $\left(\vec W^\parallel,i\right)$-bath, where $\mathcal E=(W,E,\lambda)$ and $\mathcal M=\bigsqcup_{e\in W}M_e$, if

• $\vec W^\parallel$ is a vector subspace of $\vec W$ and is a Polish reflexive space;
• For any $e\in E$ and $s\in\vec W^\parallel$, $e+s\in E$;
• $\lambda$ is invariant under translations in $\vec W^\parallel$; in other words, for any $s\in\vec W^\parallel$ and $A\in\sigma(E)$, we have $\lambda(A+s)=\lambda(A)$;
• $i\in\vec W^{\parallel\prime}$ is a continuous linear functional on $\vec W^\parallel$, called the constant intensive quantities of the bath; and
• For any $e\in E$ and $s\in\vec W^\parallel$,
$\ln\mu_{e+s}\!\left(M_{e+s}\right)=i(s)+\ln\mu_e\!\left(M_e\right).$
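A one-dimensional toy case may help: take $\vec W^\parallel=\mathbb R$ (energy) and $\ln\mu_e\!\left(M_e\right)$ linear in $e$ with slope $\beta$, so that the constant intensive quantity is $i(s)=\beta s$. The numbers below are assumed purely for illustration:

```python
import math

# Assumed one-dimensional bath: ln Omega(E) = beta*E + const, so i(s) = beta*s
# is the constant intensive quantity (physically, an inverse temperature).
beta = 2.0
ln_Omega = lambda E: beta * E + 1.3   # 1.3 is an arbitrary constant offset
i = lambda s: beta * s

E, s = 0.7, 0.4
# The defining property of the bath: ln Omega(E+s) = i(s) + ln Omega(E)
assert math.isclose(ln_Omega(E + s), i(s) + ln_Omega(E))
```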

An important remark is that $\vec W^\parallel$ must be finite-dimensional because a metrizable TVS with a non-trivial σ-finite translationally quasi-invariant Borel measure must be finite-dimensional (Feldman, 1966).

We can then define the non-trivial σ-finite translationally invariant Borel measure on $\vec W^\parallel$, denoted as $\lambda^\parallel$. It is unique up to a positive constant factor.

We may construct an affine subspace $W^\perp$ for the bath so that every point in $W$ can be uniquely decomposed into the sum of a point in $W^\perp$ and a vector in $\vec W^\parallel$. Then, we have a projection map $\pi:W\to W^\perp$ so that for any $e\in W$ we have $e-\pi(e)\in\vec W^\parallel$. Then, obviously, $\mu_e\!\left(M_e\right)$ must be in the form

$\begin{equation} \label{eq: Omega of bath} \mu_e\!\left(M_e\right)=f\!\left(\pi(e)\right)\mathrm e^{i(e-\pi(e))}, \end{equation}$

where $f:W^\perp\to\mathbb R^+$ is some function. The explicit formula of $f$ is $f(e):=\mu_e\!\left(M_e\right)$.

Further, we may require that $W^\perp$ is associated with a topological complement of $\vec W^\parallel$ (this is because $\vec W$ is locally convex and Hausdorff and $\vec W^\parallel$ is finite-dimensional). Then, by the mathematical tools that were introduced in the beginning, we can disintegrate the measure $\lambda$ w.r.t. $\lambda^\parallel$ to get a measure $\lambda^\perp$ on $W^\perp$ (it is the same for any element in $\vec W^\parallel$ because $\lambda$ is $\vec W^\parallel$-translationally invariant). Then, $\lambda$ is the product measure of $\lambda^\perp$ and $\lambda^\parallel$. In other words, for any measurable function $f:E\to\mathbb R$, we have

$\int_Ef\,\mathrm d\lambda= \int_{e\in E^\perp}\int_{s\in\vec W^\parallel}f\!\left(e+s\right) \mathrm d\lambda^\perp\!\left(e\right)\mathrm d\lambda^\parallel\!\left(s\right).$

## Thermal ensembles

Different from microcanonical ensembles, thermal ensembles are ensembles where the system we study is in thermal contact with a bath. For example, canonical ensembles and grand canonical ensembles are thermal ensembles. There are also non-thermal ensembles, which will be introduced later after we introduce non-thermal contacts (in part 2).

The thermal ensemble of a thermal system is the ensemble of the composite system of the system in question (subsystem 1) and a $\left(\vec W^{\parallel(2)},-i\circ\vec\rho^{-1}\right)$-bath (subsystem 2), where $i\in\vec W^{\parallel(1)\prime}$ is a parameter, with an extra requirement:

$\begin{equation} \label{eq: W2 translationally invariant} \forall s_2\in\vec W^{\parallel(2)},A\in\sigma(E): \lambda^\perp\!\left(\pi\!\left(A+s_2\right)\right)=\lambda^\perp\!\left(\pi\!\left(A\right)\right). \end{equation}$

The physical meaning of $i$ is the intensive variables that the system is fixed at by contacting the bath.

This composite system is called the composite system for the $\vec W^{\parallel(1)}$-ensemble. It is called that because we will see that the only important thing that distinguishes different thermal ensembles is the choice of $\vec W^{\parallel(1)}$, and the choices of $\pi,\lambda^\perp,W^{\perp(1)},W^{\perp(2)}$ are not important.

Definition. The composite system for the $\vec W^{\parallel(1)}$-ensemble of the system $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ is the composite system of $\left(\mathcal E^{(1)},\mathcal M^{(1)}\right)$ and $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$, where

• $\left(\mathcal E^{(2)},\mathcal M^{(2)}\right)$ is a $\left(\vec W^{\parallel(2)},-i\circ\vec\rho^{-1}\right)$-bath, where $i\in\vec W^{\parallel(1)\prime}$ is a parameter called the fixed intensive quantities;
• Equation \ref{eq: W2 translationally invariant} holds.

From the properties of a bath, we can derive a useful property of $\lambda^{\parallel(1)}_e$.

Because $\lambda^{\parallel(1)}_e$ is the pullback of $\lambda^{\parallel(2)}_e$ under $\rho_e$, and $\lambda^{\parallel(2)}_e$ is just the same $\lambda^{\parallel(2)}$ for all $e$ (although $\lambda^{\parallel(2)}_e$ is defined on $W^{\parallel(2)}_e$ while $\lambda^{\parallel(2)}$ is defined on $\vec W^{\parallel(2)}$), the measure $\lambda^{\parallel(1)}_e$ is the same as long as $W^{\parallel(1)}_e$ is the same. This means that we are able to stay consistent with different compositing slices of our subsystem.

As we have claimed before, the isolation of a contraction is the same as the full contraction of a contractive slice. Therefore, we can use the microcanonical ensemble to find the equilibrium state of any contractive slice. Then, we can use the marginal state of each contractive slice to get the equilibrium state of each compositing slice in the subsystem.

Because of the equal a priori probability postulate, the equilibrium state $p^{\parallel\circ}_e$ on the contractive slice $$\left(\mathcal E^\parallel_e,\mathcal M^\parallel_e\right)$$ is

$p^{\parallel\circ}_e\!\left(e_1,e_2,m_1,m_2\right) =\frac1{\mu^\parallel_e\!\left(\mathcal M^\parallel_e\right)}\propto1,$

where $\mu^\parallel_e$ is the measure of the number of microstates on $\mathcal M^\parallel_e$. Here $\propto$ means that the factor is only related to $e$. We just need “$\propto$” instead of “$=$” because we can always normalize a probability density function.

Substitute this into Equation \ref{eq: slice marginal state}, and we get that the equilibrium state $p^{\parallel\circ(1)}_e$ on the compositing slice $$\left(\mathcal E^{\parallel(1)}_e,\mathcal M^{\parallel(1)}_e\right)$$ is

\begin{align} p^{\parallel\circ(1)}_e\!\left(e_1,m_1\right) &\propto\mu^{(2)}_{\rho_e(e_1)}\!\left(M^{(2)}_{\rho_e(e_1)}\right) \nonumber\\ &=f\!\left(\pi^{(2)}\!\left(\rho_e\!\left(e_1\right)\right)\right) \mathrm e^{\left(-i\circ\vec\rho^{-1}\right)\left(\rho_e(e_1)-\pi^{(2)}(\rho_e(e_1))\right)} \nonumber\\ &\propto\mathrm e^{-i(e_1)}. \label{eq: p^(1) propto e^-i(e1)} \end{align}

Here we utilized Equation \ref{eq: Omega of bath} and the fact that for any $e_1\in W^{\parallel(1)}_e$, $\pi^{(2)}\!\left(\rho_e(e_1)\right)=\pi^{(2)}\!\left(W^{\parallel(2)}_e\right)$ is the same and is only related to $e$. Note that we have already illustrated that $\lambda^{\parallel(1)}_e$ is the same as long as $W^{\parallel(1)}_e$ is the same, so we can normalize $p^{\parallel\circ(1)}_e$ to get the same state as long as $W^{\parallel(1)}_e$ is the same, avoiding any inconsistency.

Before we proceed to normalize $p^{\parallel\circ(1)}_e$, I would like to talk about what is just enough information to determine $\lambda^{\parallel(1)}_e$. First, we need to know how different $e$ can still make $W^{\parallel(1)}_e$ the same. We already know that $W^\perp$ is just $W^{\perp\perp}+\vec W^{\perp\parallel}$, and the component in $\vec W^{\perp\parallel}$ does not affect $W^{\parallel(1)}_e$ and $W^{\parallel(2)}_e$, so we only need to know no more than $\pi^\perp(e)$. Then, because $W^{\perp\perp}$ is isomorphic to $W^{\perp(1)}\times W^{\perp(2)}$ but the corresponding change in $W^{\perp(2)}$ does not affect $W^{\parallel(1)}_e$, we only need to know the component $\pi^{(1)}\!\left(e_1\right)=\pi^{(1)}\!\left(\pi^{-1}(e)\right)$, where $e_1$ is just the $e_1$ in Equation \ref{eq: p^(1) propto e^-i(e1)}. The space $W^{\parallel(1)}_e$ is just $\pi^{(1)-1}\!\left(e_1\right)$.

Besides those useless pieces of information (components of $e$), there is other useless information. I have previously mentioned that the choices of $\lambda^\perp$, $\lambda^{\perp(2)}$ etc. are also irrelevant. We can see this by noting that $\lambda^{\parallel(1)}$ is always the non-trivial translationally invariant σ-finite Borel measure on $W^{\parallel(1)}_e$, which is unique up to a constant positive factor (and exists because the space is finite-dimensional). This is not related to the choices of $\lambda^\perp$, $\lambda^{\perp(2)}$ etc. By this, we have reduced everything we need to care about to three measures: $\lambda^{(1)}$, $\lambda^{\perp(1)}$, and $\lambda^{\parallel(1)}$, whose relation is given by the following:

$\int_{E^{(1)}}f\,\mathrm d\lambda^{(1)}= \int_{e_1\in E^{\perp(1)}}\mathrm d\lambda^{\perp(1)}\!\left(e_1\right) \int_{s_1\in\vec E^{\parallel(1)}_{e_1}} f\!\left(e_1+s_1\right)\mathrm d\lambda^{\parallel(1)}\!\left(s_1\right),$

where $E^{\perp(1)}:=\pi^{(1)}\!\left(E^{(1)}\right)$ and $\vec E^{\parallel(1)}_{e_1}:=\left(E^{(1)}-e_1\right)\cap\vec W^{\parallel(1)}$ is the region of $s_1\in\vec W^{\parallel(1)}$ in which $e_1+s_1$ is in $E^{(1)}$.

Next, what we need to do is to normalize Equation \ref{eq: p^(1) propto e^-i(e1)}. The denominator in the normalization factor, which we could call the partition function $Z:\bigsqcup_{e_1\in E^{\perp(1)}}I^{(1)}_{e_1}\to\mathbb R$, is

\begin{align*} Z\!\left(e_1,i\right)&:=\int_{s_1\in\vec E^{\parallel(1)}_{e_1}} \int_{m_1\in M^{(1)}_{e_1+s_1}} \mathrm e^{-i\left(s_1\right)}\,\mathrm d\lambda^{\parallel(1)}\!\left(s_1\right) \mathrm d\mu^{(1)}_{e_1+s_1}\!\left(m_1\right)\\ &=\int_{s_1\in\vec E^{\parallel(1)}_{e_1}} \Omega^{(1)}\!\left(e_1+s_1\right) \mathrm e^{-i\left(s_1\right)}\,\mathrm d\lambda^{\parallel(1)}\!\left(s_1\right), \end{align*}

where $I^{(1)}_{e_1}\subseteq\vec W^{\parallel(1)\prime}$ is the region of $i$ in which the integral converges. It is possible that $I^{(1)}_{e_1}=\varnothing$ for all $e_1\in E^{\perp(1)}$, and in this case the thermal ensemble is not defined.

Because we have got rid of arguments about the bath and the composite system, we can now define the partition function without the “$(1)$” superscript:

$Z\!\left(e,i\right)=\int_{s\in\vec E^{\parallel}_e} \Omega\!\left(e+s\right) \mathrm e^{-i\left(s\right)}\,\mathrm d\lambda^{\parallel}\!\left(s\right),\quad e\in E^\perp,\quad i\in I_e\subseteq\vec W^{\parallel\prime}.$

By looking at the definition, we may see that the partition function is just the partial Laplace transform of $\Omega$.
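For instance, in an assumed one-dimensional toy case with density of states $\Omega(E)=E^2$ on $[0,\infty)$ and $i(s)=\beta s$, the partition function is the Laplace transform $Z(\beta)=\int_0^\infty E^2\mathrm e^{-\beta E}\,\mathrm dE=2/\beta^3$, which we can check numerically:

```python
import numpy as np

# Numeric check that the toy partition function is the Laplace transform of
# Omega. Everything here is assumed: Omega(E) = E**2 on [0, inf), beta = 1.5.
beta = 1.5
h = 1e-4
E = (np.arange(600_000) + 0.5) * h          # midpoint grid truncated at E = 60
Z_numeric = np.sum(E**2 * np.exp(-beta * E)) * h
Z_exact = 2.0 / beta**3                     # analytic Laplace transform of E**2
assert abs(Z_numeric - Z_exact) / Z_exact < 1e-6
```

The truncation at $E=60$ is harmless because the integrand decays like $\mathrm e^{-\beta E}$.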

Note that the partition function is unique only up to a positive constant factor because we can choose another $\lambda^\parallel$ by multiplying a positive constant factor.

The partition function has very good properties.

Theorem. For any $e\in E^\perp$, $I_e$ is convex.

Proof

Proof. Suppose $i,i'\in I_e$. The functional $i'-i$ defines a hyperplane $H:=\operatorname{Ker}\!\left(i'-i\right)$. The hyperplane separates $\vec W^\parallel$ into two half-spaces $H^+$ and $H^-$ defined as

$H^\pm:=\left\{s\in\vec W^\parallel\,\middle|\,i'\!\left(s\right)-i\!\left(s\right)\gtrless0\right\}.$

By definition, $Z\!\left(e,i\right)$ and $Z\!\left(e,i'\right)$ both converge. Let $t\in\left[0,1\right]$, and we have

\begin{align*} Z\!\left(e,i+t\left(i'-i\right)\right) &=\left(\int_{s\in\vec E^{\parallel}_e\cap H^+}+\int_{s\in\vec E^{\parallel}_e\cap H^-}\right) \Omega\!\left(e+s\right) \mathrm e^{-i(s)-t(i'(s)-i(s))}\,\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &\le\int_{s\in\vec E^{\parallel}_e\cap H^+}\Omega\!\left(e+s\right) \mathrm e^{-i(s)}\,\mathrm d\lambda^{\parallel}\!\left(s\right) +\int_{s\in\vec E^{\parallel}_e\cap H^-}\Omega\!\left(e+s\right) \mathrm e^{-i'(s)}\,\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &<\infty. \end{align*}

Therefore, $Z\!\left(e,i+t\left(i'-i\right)\right)$ converges. $\square$

Being convex is good because it means that $I_e$ is not too shattered. It is connected, and its interior $\operatorname{Int}I_e$ and closure $\operatorname{Cl}I_e$ look very much like $I_e$ itself. Also, every point in $I_e$ is a limit point of $I_e$. This makes it possible to talk about the limits and derivatives of $Z\!\left(e,i\right)$ w.r.t. $i$.

Since $I_e$ is a region in a finite-dimensional space $\vec W^{\parallel\prime}$, we may define the derivatives w.r.t. $i$ in terms of partial derivatives to components of $i$. To define the components of $i$, we need first a basis on $\vec W^\parallel$, which sets a coordinate system although actually we should finally derive coordinate-independent conclusions.

Suppose we have a basis on $\vec W^\parallel$. Then, for any $s\in\vec W^\parallel$, we can write its components as $s_\bullet$, and for any $i\in\vec W^{\parallel\prime}$, we can write its components as $i_\bullet$. The subscript “$\bullet$” here can act as dummy indices (for multi-index notation). For example, we can write $i(s)=i_\bullet s_\bullet$. I do not use superscript and subscript to distinguish vectors and linear functionals because it is just for multi-index notation and because I am going to use them to label multi-index objects that are neither vectors nor linear functionals.

Theorem. For any $e\in E^\perp$, $Z\!\left(e,i\right)$ is $C^\infty$ w.r.t. $i$ on $\operatorname{Int}I_e$.

Proof

Proof. By the definition of the interior of a region, for any $i\in\operatorname{Int}I_e$ and any $p\in\vec W^{\parallel\prime}$, there exists $\delta_{i,p}>0$ such that $i+\delta_{i,p}p\in I_e$.

By Leibniz’s integral rule, the partial derivatives of $Z\!\left(e,i\right)$ w.r.t. $i$ (if existing) are given by

\begin{align*} \frac{\partial^{\Sigma\alpha_\bullet}Z\!\left(e,i\right)}{\partial^{\alpha_\bullet}i_\bullet} &=\int_{s\in\vec E^{\parallel}_e} \Omega\!\left(e+s\right)\left(-s_\bullet\right)^{\alpha_\bullet} \mathrm e^{-i\left(s\right)}\,\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &\le\int_{s\in\vec E^{\parallel}_e} \Omega\!\left(e+s\right)\left|s_\bullet\right|^{\alpha_\bullet} \mathrm e^{-i\left(s\right)}\,\mathrm d\lambda^{\parallel}\!\left(s\right) \end{align*}

where $\alpha_\bullet$ are natural numbers indexed by $\bullet$. Now we just need to prove that this integral converges for any $i\in\operatorname{Int}I_e$.

Because of the inequality

$a\ln x-bx\le a\left(\ln\frac ab-1\right),\quad a,b,x>0,$

where the equality holds when $x=a/b$, we have

$\left|s_\bullet\right|^{\alpha_\bullet} \le\left(\frac{\alpha_\bullet}{\mathrm eb}\right)^{\alpha_\bullet}\mathrm e^{b\Sigma\left|s_\bullet\right|}, \quad b>0.$

There are $2^{\dim\vec W^\parallel}$ orthants in $\vec W^\parallel$. We can label each of them by a string $\sigma_\bullet$ of $\pm1$ of length $\dim\vec W^\parallel$. Then, each orthant can be denoted as $O_\sigma$. Then, we have

$\forall s\in O_\sigma:\sigma_\bullet s_\bullet=\Sigma\left|s_\bullet\right|.$

Therefore,

$\forall s\in O_\sigma:\left|s_\bullet\right|^{\alpha_\bullet} \le\left(\frac{\alpha_\bullet}{\mathrm eb}\right)^{\alpha_\bullet}\mathrm e^{b\sigma_\bullet s_\bullet}, \quad b>0.$

Let $b:=\delta_{i,-\sigma}$, where $\sigma:s\mapsto\sigma_\bullet s_\bullet$ is a linear functional. Then,

$\forall s\in O_\sigma:\left|s_\bullet\right|^{\alpha_\bullet}\mathrm e^{-i(s)} \le\left(\frac{\alpha_\bullet}{\mathrm e\delta_{i,-\sigma}}\right)^{\alpha_\bullet} \mathrm e^{-\left(i-\delta_{i,-\sigma}\sigma\right)(s)}.$

Because $i-\delta_{i,-\sigma}\sigma\in I_e$, we have

$\frac{\partial^{\Sigma\alpha_\bullet}Z\!\left(e,i\right)}{\partial^{\alpha_\bullet}i_\bullet} \le\sum_\sigma\left(\frac{\alpha_\bullet}{\mathrm e\delta_{i,-\sigma}}\right)^{\alpha_\bullet} \int_{s\in\vec E^{\parallel}_e\cap O_\sigma}\Omega\!\left(e+s\right) \mathrm e^{-\left(i-\delta_{i,-\sigma}\sigma\right)(s)}\, \mathrm d\lambda^{\parallel}\!\left(s\right)<\infty.$

Therefore, the partial derivatives exist. $\square$

The next step is to find the macroscopic quantities. The equilibrium states are

$p_e^{\parallel\circ}\!\left(e,m\right) =\frac{\mathrm e^{-i\left(e\right)}}{Z\!\left(\pi(e),i\right)},$

where $Z$ is the partition function. Here the role of $e$ becomes the label parameter in Equation \ref{eq: fundamental equation before}. The measured value of extensive quantities under equilibrium is then

\begin{align*} \varepsilon^\circ &=\frac1{Z\!\left(e,i\right)}\int_{s\in\vec E^{\parallel}_e} \left(e+s\right)\mathrm e^{-i\left(s\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &=e+\frac1{Z\!\left(e,i\right)}\int_{s\in\vec E^{\parallel}_e} s\mathrm e^{-i\left(s\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &=e-\frac{\partial\ln Z\!\left(e,i\right)}{\partial i}. \end{align*}

The entropy under equilibrium is then

\begin{align*} S^\circ &=-\int_{s\in\vec E^{\parallel}_e} \frac{\mathrm e^{-i(s)}}{Z\!\left(e,i\right)}\ln\frac{\mathrm e^{-i(s)}}{Z\!\left(e,i\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right)\\ &=\frac1{Z\!\left(e,i\right)}\int_{s\in\vec E^{\parallel}_e} i\!\left(s\right)\mathrm e^{-i\left(s\right)} \Omega\!\left(e+s\right)\mathrm d\lambda^{\parallel}\!\left(s\right) +\ln Z\!\left(e,i\right)\\ &=-i\!\left(\frac{\partial\ln Z\!\left(e,i\right)}{\partial i}\right)+\ln Z\!\left(e,i\right). \end{align*}

From these two equations, we can eliminate the parameter $e$ and get the fundamental equation in the form of Equation \ref{eq: fundamental equation}:

$S^\circ=i\!\left(\varepsilon^\circ\right)+\ln Z\!\left(\pi\!\left(\varepsilon^\circ\right),i\right).$

We can see that $S^\circ$ decouples into two terms, one of which is only related to the $\vec W^\parallel$ component of $\varepsilon^\circ$, and the other of which is only related to the $W^\perp$ component of $\varepsilon^\circ$. What is good is that we have a well-defined notion of the derivative of $S^\circ$ w.r.t. the first component, and it is $i$. Therefore, the intensive quantities corresponding to changes of extensive quantities in the subspace $\vec W^\parallel$ are well defined and constantly equal to $i$, which is just what we have been calling the fixed intensive quantities. The other components of the intensive quantities are not guaranteed to be well defined because $Z\!\left(\cdot,i\right)$ is not guaranteed to have good enough properties.
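As a sanity check of the fundamental equation $S^\circ=i\!\left(\varepsilon^\circ\right)+\ln Z$, here is the familiar discrete special case: a canonical ensemble with assumed energy levels, where $i$ reduces to a single number $\beta$ and the relation reads $S=\beta\langle E\rangle+\ln Z$:

```python
import numpy as np

# Discrete canonical ensemble with assumed energy levels; Z(beta) is the sum
# of Boltzmann weights, and S = beta*<E> + ln Z holds identically.
energies = np.array([0.0, 1.0, 2.5])
beta = 0.8

weights = np.exp(-beta * energies)
Z = weights.sum()
p = weights / Z                      # equilibrium (Boltzmann) probabilities

E_mean = (p * energies).sum()
S = -(p * np.log(p)).sum()           # Gibbs entropy

assert np.isclose(S, beta * E_mean + np.log(Z))

# <E> = -d(ln Z)/d(beta), checked by a central finite difference:
d = 1e-6
lnZ = lambda b: np.log(np.exp(-b * energies).sum())
assert np.isclose(E_mean, -(lnZ(beta + d) - lnZ(beta - d)) / (2 * d))
```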

UlyssesZhan
# The core of a voting system is the intersection of Pareto sets

*2023-03-25T23:28:53-07:00 · https://ulysseszh.github.io/economics/2023/03/25/voting-pareto*

A voting system is a concept in political science. Here I give the mathematical definition of a voting system.

A (binary) voting system is a tuple $(P,V,q)$, where $P$ is any set, called the set of proposals; $V$ is a finite set of preference relations on $P$, called the set of voters; and $q$ is an integer between $0$ and $\left|V\right|$ (inclusive), called the quota.

For each voter $v\in V$ and two proposals $x,y\in P$, we denote “$v$ prefers $x$ to $y$” by

$x\succeq_vy.$

A proposal $x\in P$ is a defeat of $y\in P$ if

$\left|\left\{v\in V\,\middle|\,x\succeq_vy\right\}\right|\geq q,$

denoted as $x\succsim_{V,q}y$ (despite this notation, $\succsim_{V,q}$ is not necessarily a preference relation on $P$ because it is not transitive generally, which is actually a well-known example of irrationality).

The core $\mathcal C(P,V,q)$ of the voting system is the set of all elements $x\in P$ such that $x$ does not have any defeat other than $x$ itself (any non-trivial defeat).

Pareto sets are common concepts in economics. To clarify, I also give the mathematical definition of them here.

Let $P$ be a set and $Q$ be a family of preference relations on $P$. Then, $x\in P$ is called a (weak) $Q$-Pareto improvement of $y\in P$ if $\forall v\in Q:x\succeq_vy$, denoted as $x\succsim_Qy$ (despite the notation, $\succsim_Q$ is not necessarily a preference relation on $P$).

The Pareto set $\mathcal P(P,Q)$ is the set of all elements $x\in P$ such that $x$ does not have any $Q$-Pareto improvement other than $x$ itself (any non-trivial $Q$-Pareto improvement).

Here is the main result. For a voting system $(P,V,q)$,

$\mathcal C(P,V,q)=\bigcap_{Q\subseteq V,\left|Q\right|=q}\mathcal P(P,Q).$

Proof. To prove this, we need to show that $x\in P$ does not have any non-trivial Pareto improvement for any $q$ voters iff $x$ does not have any non-trivial defeat.

To prove the forward direction, suppose that $x\in P$ does not have any non-trivial Pareto improvement for any $q$ voters. Let $y\in P$ such that $y\ne x$, and the goal is to prove that $y$ is not a defeat of $x$.

Let

$Y:=\left\{v\in V\,\middle|\,y\succeq_vx\right\}.$

Then, $y$ is a $Y$-Pareto improvement of $x$, so we have $\left|Y\right|<q$ (because otherwise there is a subset of $Y$ with $q$ voters for which $y$ is a Pareto improvement of $x$). Therefore, $y$ is not a defeat of $x$.

To prove the backward direction, suppose that $x\in P$ has a non-trivial $Q$-Pareto improvement, where $Q\subseteq V$ and $\left|Q\right|=q$. Denote the improvement as $y$. Let

$Y:=\left\{v\in V\,\middle|\,y\succeq_vx\right\}.$

Because $y$ is a $Q$-Pareto improvement of $x$, we have $Q\subseteq Y$. Therefore, $\left|Y\right|\geq\left|Q\right|=q$. Therefore, $y$ is a defeat of $x$. $\square$

In particular, we have

$\mathcal C\!\left(P,V,\left|V\right|\right)=\mathcal P(P,V).$

Here is an example. Suppose we have 5 voters, and the set of proposals is $\mathbb R^2$. Each voter has an ideal point and prefers points nearer to the ideal point. The 5 ideal points form a convex pentagon. Then we can find the core easily by the conclusion above.
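To illustrate the theorem computationally, here is a small brute-force sketch. All specifics (the ideal points, the grid of proposals, the quota) are made up for illustration; it computes the core directly from the definition and also as the intersection of the Pareto sets over all $q$-subsets of voters, and confirms they agree:

```python
from itertools import combinations, product
from math import dist

# Hypothetical instance: 5 voters, each with an ideal point in R^2
# (the points form a convex pentagon); proposals restricted to a grid.
ideals = [(0, 2), (2, 1), (1, -2), (-1, -2), (-2, 1)]
P = list(product(range(-2, 3), repeat=2))  # 25 grid proposals
q = 3  # simple majority of 5 voters

def prefers(v, x, y):
    # voter v weakly prefers x to y iff x is no farther from v's ideal point
    return dist(v, x) <= dist(v, y)

def core(P, V, q):
    # x is in the core iff no y != x is a defeat of x
    return {x for x in P
            if not any(y != x and sum(prefers(v, y, x) for v in V) >= q
                       for y in P)}

def pareto(P, Q):
    # x is in the Q-Pareto set iff no y != x is a Q-Pareto improvement of x
    return {x for x in P
            if not any(y != x and all(prefers(v, y, x) for v in Q)
                       for y in P)}

intersection = set(P)
for Q in combinations(ideals, q):
    intersection &= pareto(P, Q)

print(core(P, ideals, q) == intersection)  # True, as the theorem asserts
```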
# Relationship between the Gini coefficient and the variance

*2023-02-06T16:38:25-08:00 · https://ulysseszh.github.io/economics/2023/02/06/gini-variance*

This article is translated from a Chinese article on my Zhihu account. The original article was posted at 2021-04-25 10:06 +0800.

First, define the Lorenz curve: it is the curve that consists of all points $(u,v)$ such that the poorest $u$ portion of the population in the country owns $v$ portion of the total wealth.

The Gini coefficient $G/\mu$ is defined as the area between the Lorenz curve and the line $u=v$ divided by the area enclosed by the three lines $u=v$, $v=0$, and $u=1$.

Now, suppose the wealth distribution in the country is $p(X)$, where $p\!\left(x\right)\mathrm dx$ is the portion of population that has wealth in the range $[x,x+\mathrm dx]$.

Then, the Lorenz curve is the graph of the function $g$ defined as

$g(F(x))=\frac1\mu\int_{-\infty}^xtp\!\left(t\right)\mathrm dt,$

where

$F\!\left(x\right):=\int_{-\infty}^xp\!\left(t\right)\mathrm dt$

is the cumulative distribution function of $p(X)$, and

$\begin{equation} \label{eq: def mu} \mu:=\int_{-\infty}^{+\infty}tp\!\left(t\right)\mathrm dt \end{equation}$

is the average wealth of the population, which is just $\mathrm E\!\left[X\right]$ ($X$ is a random variable such that $X\sim p(X)$).

Then, the Lorenz curve is

$v=g(u):=\frac1\mu\int_{-\infty}^{F^{-1}(u)}tp\!\left(t\right)\mathrm dt.$

According to the definition of the Gini coefficient,

\begin{align*} G&:=2\mu\int_0^1\left(u-g(u)\right)\mathrm du\\ &=\mu-2\mu\int_0^1g\!\left(u\right)\mathrm du\\ &=\mu-2\int_{u=0}^1\int_{t=-\infty}^{F^{-1}(u)}tp\!\left(t\right)\mathrm dt\,\mathrm du. \end{align*}

Interchange the order of integration, and we have

\begin{align*} G&=\mu-2\int_{t=-\infty}^{+\infty}\int_{u=F(t)}^1tp\!\left(t\right)\mathrm du\,\mathrm dt\\ &=\mu-2\int_{-\infty}^{+\infty}\left(1-F(t)\right)tp\!\left(t\right)\mathrm dt. \end{align*}

Substitute Equation \ref{eq: def mu} into the above equation, and we have

\begin{align*} G&=\int_{-\infty}^{+\infty}2tF\!\left(t\right)p\!\left(t\right)\mathrm dt-\mu\\ &=\int_{-\infty}^{+\infty}\left(2F\!\left(t\right)-1\right)tp\!\left(t\right)\mathrm dt\\ &=\int_0^1\left(2u-1\right)F^{-1}\!\left(u\right)\mathrm du. \end{align*}

Now here is the neat part. Separate it into two parts, and write them in double integrals:

\begin{align*} G&=\int_0^1uF^{-1}\!\left(u\right)\mathrm du-\int_0^1\left(1-u\right)F^{-1}\!\left(u\right)\mathrm du\\ &=\int_{u_2=0}^1\int_{u_1=0}^{u_2}F^{-1}\!\left(u_2\right)\mathrm du_1\,\mathrm du_2 -\int_{u_1=0}^1\int_{u_2=u_1}^1F^{-1}\!\left(u_1\right)\mathrm du_2\,\mathrm du_1. \end{align*}

Interchange the order of integration of the second term, and we have

\begin{align*} G&=\int_{u_2=0}^1\int_{u_1=0}^{u_2}\left(F^{-1}\!\left(u_2\right)-F^{-1}\!\left(u_1\right)\right)\mathrm du_1\,\mathrm du_2\\ &=\frac12\int_{u_2=0}^1\int_{u_1=0}^1\left|F^{-1}\!\left(u_2\right)-F^{-1}\!\left(u_1\right)\right|\mathrm du_1\,\mathrm du_2\\ &=\frac12\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\left|x_2-x_1\right|p\!\left(x_1\right)p\!\left(x_2\right)\mathrm dx_1\,\mathrm dx_2\\ &=\frac12\mathrm E\!\left[\left|X_2-X_1\right|\right], \end{align*}

where $X_1$ and $X_2$ are two independent random variables with $p$ being their respective distribution functions: $\left(X_1,X_2\right)\sim p\!\left(X_1\right)p\!\left(X_2\right)$.

By this result, we can easily see how the Gini coefficient represents the statistical dispersion.
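As a quick numerical check (my own example, not from the original article): for $X\sim\mathrm{Uniform}(0,1)$ we have $F^{-1}(u)=u$ and $\mathrm E\!\left[\left|X_2-X_1\right|\right]=1/3$, so both expressions give $G=1/6$, and the Gini coefficient $G/\mu=(1/6)/(1/2)=1/3$, the known value for the uniform distribution:

```python
# Midpoint-rule evaluation of G = ∫₀¹ (2u-1) F⁻¹(u) du for X ~ Uniform(0,1),
# where F⁻¹(u) = u; this should match (1/2) E|X₂-X₁| = (1/2)(1/3) = 1/6.
n = 1_000_000
G = sum((2*u - 1) * u for u in ((k + 0.5) / n for k in range(n))) / n
print(abs(G - 1/6) < 1e-9)  # True
```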

We can apply similar tricks to the variance $\sigma_X^2$.

\begin{align*} \sigma_X^2&=\mathrm E\!\left[X^2\right]-\mathrm E\!\left[X\right]^2\\ &=\int_{-\infty}^{+\infty}t^2p\!\left(t\right)\mathrm dt -\left(\int_{-\infty}^{+\infty}tp\!\left(t\right)\mathrm dt\right)^2\\ &=\int_0^1F^{-1}\!\left(u\right)^2\,\mathrm du -\left(\int_0^1F^{-1}\!\left(u\right)\mathrm du\right)^2. \end{align*}

Separate the first term into two halves, and write the resulting three terms as double integrals:

\begin{align*} \sigma_X^2&=\frac12\int_0^1F^{-1}\!\left(u_2\right)^2\,\mathrm du_2\int_0^1\mathrm du_1\\ &\phantom{=~}{}-\int_0^1F^{-1}\!\left(u_1\right)\mathrm du_1\int_0^1F^{-1}\!\left(u_2\right)\mathrm du_2\\ &\phantom{=~}{}+\frac12\int_0^1F^{-1}\!\left(u_1\right)^2\,\mathrm du_1\int_0^1\mathrm du_2\\ &=\frac12\int_0^1\int_0^1 \left(F^{-1}\!\left(u_2\right)^2-2F^{-1}\!\left(u_1\right)F^{-1}\!\left(u_2\right)+F^{-1}\!\left(u_1\right)^2\right) \mathrm du_1\,\mathrm du_2\\ &=\frac12\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \left(x_2-x_1\right)^2p\!\left(x_1\right)p\!\left(x_2\right)\mathrm dx_1\,\mathrm dx_2\\ &=\frac12\mathrm E\!\left[\left(X_2-X_1\right)^2\right]. \end{align*}

Then we can derive the relationship between the Gini coefficient and the variance:

$2\sigma_X^2-4G^2=\sigma_{\left|X_2-X_1\right|}^2.$
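This relationship is just the identity $\sigma_D^2=\mathrm E\!\left[D^2\right]-\mathrm E\!\left[D\right]^2$ applied to $D=\left|X_2-X_1\right|$, using $\mathrm E\!\left[D\right]=2G$ and $\mathrm E\!\left[D^2\right]=2\sigma_X^2$. A Monte Carlo sketch (my own example: $X\sim\mathrm{Exponential}(1)$, whose Gini coefficient is known to be $1/2$) checks both results:

```python
import random
import statistics

random.seed(0)
n = 100_000
# Hypothetical example distribution: Exponential(1); its Gini coefficient is 1/2.
x1 = [random.expovariate(1.0) for _ in range(n)]
x2 = [random.expovariate(1.0) for _ in range(n)]
d = [abs(a - b) for a, b in zip(x1, x2)]

mu = statistics.fmean(x1 + x2)
G = 0.5 * statistics.fmean(d)   # G = (1/2) E|X2 - X1|
print(round(G / mu, 2))         # Gini coefficient; close to 0.5

lhs = 2 * statistics.pvariance(x1 + x2) - 4 * G**2
rhs = statistics.pvariance(d)   # variance of |X2 - X1|
print(abs(lhs - rhs) < 0.1)     # the two sides agree up to sampling noise
```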
# Distinguishing all the letters in handwritten math/physics notes

*2023-02-04T09:43:02-08:00 · https://ulysseszh.github.io/misc/2023/02/04/handwritten-font*

This article is adapted from a Chinese article on my Zhihu account. The original article was posted at 2021-02-02 00:41 +0800. There are some minor modifications to the original article as well as some added content.

Personally, I have the demand of handwriting math/physics notes, but an annoying fact about this is that I usually cannot distinguish every letter that may be possibly used well enough.

This article does not involve calligraphy, and I myself have not learnt calligraphy specially ever.

## List of different styles

Here is a full list of different styles except for their bold counterparts:

| Style name | $\LaTeX$ command | Example |
| --- | --- | --- |
| Roman | \mathrm | $\mathrm{ABC}$ |
| Italic | \mathit | $\mathit{ABC}$ |
| Blackboard | \mathbb | $\mathbb{ABC}$ |
| Calligraphic | \mathcal | $\mathcal{ABC}$ |
| Script | \mathscr | $\mathscr{ABC}$ |
| Fraktur | \mathfrak | $\mathfrak{ABC}$ |
| Sans-serif | \mathsf | $\mathsf{ABC}$ |
| Typewriter | \mathtt | $\mathtt{ABC}$ |

We are not going to distinguish all the letters and all the styles.

## Some principles

I will try to find a handwriting style that satisfies the following conditions (in descending order of importance):

1. I am able to write them fast and simply.
2. I am able to recognize each character at a glance.
3. The style is consistent for all letters.
4. The shape is similar to the default mathematical font of $\LaTeX$ (Computer Modern).
5. If the last condition cannot be satisfied, the shape is similar to some style that has existed before.

The 2nd principle ranks below the 1st because note-taking should not become too inefficient, and because letters and styles can often be distinguished from context.

If a style fails to satisfy the 5th or the 4th principle (i.e. this style is invented by me), I will add an exclamation mark (!) to inform you of this.

The following lists all the letters and the styles that I want to distinguish:

• Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (they are not letters, but they deserve distinguishing).
• Roman style of uppercase English letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z.
• Italic style of uppercase English letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, Q, R, S, T, U, V, W, X, Y, Z (not including O).
• Roman style of lowercase English letters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z.
• Italic style of lowercase English letters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, p, q, r, s, t, u, v, w, x, y, z (not including o).
• Roman style of uppercase Greek letters: Gamma, Delta, Theta, Lambda, Xi, Pi, Sigma, Upsilon, Phi, Psi, Omega (not including any letters that cannot be distinguished from uppercase English letters).
• Italic style of lowercase Greek letters: alpha, beta, gamma, delta, epsilon, zeta, eta, theta, iota, kappa, lambda, mu, nu, xi, pi, rho, sigma, tau, upsilon, phi, chi, psi, omega (not including omicron).
• Blackboard bold style of uppercase English letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z.
• Calligraphic style of uppercase English letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z.
• Script style of uppercase English letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z.
• Fraktur style of uppercase English letters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z.
• Fraktur style of lowercase English letters: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z.

In terms of linguistic terminology, each entry in the above list is a grapheme in my handwritten notes. However, in extreme cases, even if I have actively avoided, it is still possible that two graphemes are indistinguishable. Then, I will design allographs for those graphemes to provide extra distinguishability in extreme cases.

Here are some of the general rules that I set up:

• We do not write any serif unless it is a must for distinguishing letters. (This is also why I did not plan to distinguish sans-serif styles.)
• The roman style of all English letters does not have tails (either ornamental or used for ligatures in connected writing).
• For both roman and italic styles, all uppercase letters (both English and Greek) have the same position of bottom and top.

For other details, look at this image.

## Roman and italic

### A, a, alpha

In italic style, the slanted line in the right side of A is nearly vertical. Actually, in the italic style of uppercase letters, almost all top-left-to-bottom-right slanted lines are nearly vertical.

To write conveniently, use the single-story glyph of a even for its roman style.

The difference of the glyph of alpha and that of a should be noticeable.

### C, c, sigma

C and c are tricky because it is very hard to distinguish roman and italic styles for them, but we have to because they are very commonly used. We need to be careful when writing and recognizing them.

Roman style of C is largely vertically symmetrical, while the italic style of C is not. In the italic style of C, the top endpoint of the stroke is to the right of the bottom endpoint, and the left-most position on the stroke is below the center instead of being at the same level as the center.

The opening direction of the roman style of c is to the right, while that of the italic style is to the top-right.

(I once tried using ornamental tails to distinguish the italic style of c from the roman style, but it would make them look strange and may possibly confuse with other letters.)

At first, I did not want to distinguish the roman and italic styles of c, but I found that it is useful to distinguish them. For example, sometimes we use $a,b,c$ for indices, so the italic style of c may be used as an index; meanwhile, we may use the roman style of c to represent “center” so that we can express the position of the center as $\mathbf r_\mathrm c$. In both cases, the letter c appears in the position of a subscript, but they need to be distinguished from each other.

I want to talk about sigma here because in Greek, its final form $\varsigma$ looks very similar to c. Just do not use that glyph for sigma.

### e

It is important to distinguish the roman and italic styles of e because we may use $\mathrm e$ for the base of natural logarithm and use $e$ for the electric charge of a proton.

At the turning point of the stroke at the center-right of the glyph, the roman style of e is sharp while the italic style is round. This detail is enough to distinguish them.

### f

The roman style of f is not a descender while the italic style is a descender. Also, the italic style of f has a left-tail in the bottom.

### g

To make writing convenient, the roman style of g uses the single-story glyph. That makes it hard to distinguish from the italic style, but we may write the descender of the italic style of g in an exaggerated way to distinguish them.

### 1, I, l

Here we are at the only extreme case where multiple graphemes share the same glyph: 1, roman style of I, and roman style of l. They are all simply a vertical line.

Normally we should be able to distinguish them by their context, but in some cases we need to distinguish them clearly. We may add some small turnings at the top and bottom of I to distinguish it from l. It is like we are trying to write the serifs of I but we write so fast that they are connected and look like small turnings.

A small sharp turning may be added at the top of 1 to distinguish it from l.

### i, iota

The italic style of i has two tails (one left-tail in the middle and one right-tail in the bottom). It looks exactly the same as iota except for the dot at the top.

### K, k, kappa

In both the roman and italic styles of K, the endpoint of the stroke branch of K at the top-right is approximately at the same level as the top endpoint of the vertical line at the left.

The slant of the left vertical line should be enough to distinguish the italic style of K from the roman style, but we may also add a small tail at the bottom-right to distinguish them further. Do not worry about confusing it with kappa because we have other ways to distinguish it.

In the italic style of k, the top-right stroke branch is written as a closed circle. This makes it easier to distinguish from K and kappa.

kappa is shorter than K and k. The bottom-right stroke branch is written in shape of an inclined mirrored S-curve to distinguish from K and k. The endpoint of the stroke branch of kappa at the top-right is approximately at the same level as the top endpoint of the vertical line at the left.

### M, mu

In the italic style of M, the bottom is wider than the top, while in the roman style, the top is as wide as the bottom. Write M in four strokes to distinguish it from mu.

As for mu, note that the bottom-left corner is a descender, while other parts are not.

### 0, O, o

These are the most cursed characters, even more so than 1, I, and l. They are so cursed that I refuse to distinguish the roman styles of O and o from the italic styles, and I would refuse to use the italic styles of O and o in my handwritten notes.

The digit 0 is narrower than O and o.

Just avoid using omicron because it is indistinguishable from o.

### p, rho

Write the italic style of p in two strokes, and it has two left-tails, one at the top-left and one at the bottom-left.

Write rho in one stroke. Starting the stroke from below the baseline (at the bottom of the descender) is recommended.

### Q, q

In the italic style of Q, the last stroke looks like a tilde. It is straight for the roman style.

The italic style of q has a sharp right-tail in the bottom.

### r, u, v, gamma, nu, Upsilon, upsilon

OK, this is important.

Every physicist must have met at least one person who mistakenly recognized nu as v.

The roman style of r does not have tails (the arc at the top-right does not count as a tail). The italic style of r has a left-tail at the top-left and a right-tail at the top-right. The downward part and the upward part of the stroke overlap at the bottom to distinguish it from v.

The italic style of u has a left-tail at the top-left and a right-tail at the bottom-right. The tail at bottom-right distinguishes it from v.

The italic style of v has a left-tail at the top-left and a left-tail at the top-right. The tail at the top-right may be omitted because it is not very noticeable. The bottom of both the roman style and the italic style of v is a sharp turning.

The top-left of gamma is curvy while the top-right is straight. The letter is also a descender, so make its bottom lower than the baseline.

The left of nu is a vertical line. The right of nu is like a broken line (!). The left and right parts are tangent to each other at the bottom but separates quickly (!).

Both the top-left and top-right of Upsilon are curvy. It is thus different from r or gamma.

The letter upsilon is not commonly used. If it is used, its bottom is round instead of being sharp, to distinguish it from the italic style of v.

### S, s

They are cursed, but not as cursed as O and o.

In the italic style of S and s, the bottom-left is to the left of the top-left. In the roman style, the bottom-left and the top-left are aligned instead.

### t, tau

The roman style of t is a straight cross (no curvy strokes) to distinguish it from the italic style.

The horizontal stroke of multiple f’s and t’s may be connected (ligature). Note that they may only be connected if they are intended to form a word. If they are written together just for mathematical multiplication, there should not be a ligature.

The bottom of tau may either turn to the right or stop straightly. I prefer it turning to the right.

### U

To distinguish from cup (the symbol for set union), add a vertical line at the right of the glyph (for both the roman and italic styles), but the italic style of it does not have a tail.

### W, w, omega

Just like how many people mistakenly recognize nu as v, many people also mistakenly recognize omega as w.

The top-left and top-right of w are the same as those of v for both roman and italic styles.

The letter W is not the same as an upside-down M. For both roman and italic styles, the top of W is wider than the bottom.

There is a right-tail at the top-left of omega. The bottom of omega is round instead of being sharp.

### X, x, chi

There are not as many people who mistakenly recognize chi as x as there are for nu and omega, but there are still many.

It is a little hard to distinguish the roman and italic styles of X. First, the top-right-to-bottom-left stroke of X is longer in the italic style to embody the feel of slant. Also, in the italic style, the top-left of X is to the left of the bottom-left. These should be enough to distinguish it from the roman style. Note that the italic style of X is a little different from the italic styles of other letters in that the top-left-to-bottom-right stroke is not nearly vertical (because otherwise it would look strange).

The italic style of x has a left-tail at the top-left and a right-tail at the bottom-right. The bottom-left and the top-right do not have tails (for convenience). Write x as a cross instead of two C-curves tangent to each other (I know some people write it like that).

The top-left of chi has a left-tail, and the bottom-right has a right-tail. The bottom-left of chi has a right-tail (!), which is the main feature to distinguish it from x. Also, note that chi is a descender, and the intersection of the two strokes is at the baseline.

### Y, y

Write Y in three strokes.

Write the roman style of y in two strokes, both of which are straight. The italic style of y is the same as the italic style of u, but the tail at the bottom-right is changed into a descender like that of g.

### 2, Z, z

Some people add a short stroke in the middle of z (I used to do that) or add a descender at the bottom like that of g to distinguish it from 2. I use neither of them because the sharp turning corner at the top-right of z is enough to distinguish it from 2.

The top and bottom of Z are aligned in the roman style, but the top is a little bit offset to the left of the bottom in the italic style.

The bottom of the italic style of z is written like a tilde.

### epsilon

In Greek, there are two glyphs for epsilon, one of which is called the lunate epsilon or the uncial epsilon $\epsilon$, and the other $\varepsilon$ does not have a name but I like to call it varepsilon (because the command for the glyph in $\LaTeX$ is \varepsilon).

Use varepsilon. Never use the lunate epsilon because it confuses with the set membership symbol.

### Theta, theta

Write Theta as wide as O, and do not make the stroke in the middle touch either side. Tilt theta a bit. Because we do not use italic uppercase Greek letters and roman lowercase Greek letters, Theta and theta should be distinguishable enough.

### Lambda, Omega

I have never imagined someone would write Omega that looks very similar to Lambda, but there are people like that. They are very different! OK?

### Phi, phi

In Greek, there are two glyphs for phi, the loopy / open one $\varphi$ or the stroked / closed one $\phi$. Just stick to the loopy one and forget about the stroked one so that we can distinguish it from Phi.

Some sources say that we should use the stroked one for the golden ratio. Just forget about that. I never use the letter to represent the golden ratio.

### Psi, psi

The tops of the two strokes of Psi are at the same level.

The top of the middle stroke of psi is a little bit higher than the top of the other stroke. There is a left-tail at the top-left of psi. There is a left-tail at the bottom (descender) of psi (!).

## Blackboard

We only need to write blackboard style for uppercase English letters. Generally, we just add one or two strokes to the roman style of the letters to make them blackboard style. The general rules are as follows:

• If there are multiple vertical strokes, add a vertical stroke next to each of them, and we are done.
• Otherwise, if there is a non-horizontal stroke that starts from the top-left, add a stroke next to it.
• Otherwise, if the leftmost stroke is a curve that spans from top to bottom, add a vertical stroke on the inside, next to the leftmost part of the curve.
• Otherwise, this is a special letter!

There are some special letters as well as some exceptions to the general rules listed below.

### A

Add a stroke next to the leftmost stroke.

### J

It does not contain a vertical stroke, but we regard the right part of the stroke as one vertical stroke.

### S

Add two short vertical strokes to the inner of the leftmost part curve and the rightmost part curve.

### W

It would be strange if we only added one additional stroke. I want to add two to make it look like a double V (actually, it indeed should be one).

### Y

Add a stroke next to the top-left stroke and a stroke next to the bottom stroke.

### Z

Add a stroke next to the middle part of the stroke.

## Calligraphic and script

Different from roman style, some uppercase letters in calligraphic and script styles are descenders. The descenders are: G, J, Q, Y. Some people possibly write F, H (less likely), P, and Z as descenders as well, but I do not.

As for details, I am tired of explaining for each letter. Just look at the image before.

## Fraktur

This is the trickiest style. You may think it is hard to write in the Fraktur style when you look at how $\LaTeX$’s default typeface renders it. Actually, it indeed is, but that typeface is not intended for handwriting. I recommend writing the letters as shown in the image (ignore the final line because we do not need it). They look very distinguishable.


“The women’s restroom has never radiated the light of wisdom the way it does now.”

“Did you leave your clever little brain in there?”

“Silly, there is a mistake here. I remember I wrote it down for you…”

“But I clearly remember writing it here. Where did it go…” came the sound of Xiaoming flipping through his draft notebook.
