Although Grassmann numbers are a purely mathematical concept, like most people, I was introduced to them in physics class. I then had a natural question: how does one formally define Grassmann numbers? In a homework assignment from my QFT course, I found that I had to answer this question in order to solve one of the problems in a way that I was satisfied with.

Let $\p{\mbb G_0,+}$ and $\p{\mbb G_1,+}$ be two abelian groups such that $\mbb G_0\cap\mbb G_1=\B{0}$. For convenience, for any $k\in\bN$, define $\mbb G_k\ceq\mbb G_{k\bmod2}$. Define a multiplication on $\mbb G_0\cup\mbb G_1$ such that

- multiplication is associative, non-degenerate, and distributive over addition;
- $\mbb G_0$ are **commuting numbers** and $\mbb G_1$ are **anticommuting numbers**: $\forall\psi_1\in\mbb G_{k_1},\psi_2\in\mbb G_{k_2}: \psi_1\psi_2=\p{-}^{k_1k_2}\psi_2\psi_1;$
- there is a unity $1\in\mbb G_0$ such that $1+\cdots+1\ne0$ for any finite number of summands.

It then follows that $\forall\psi_1\in\mbb G_{k_1},\psi_2\in\mbb G_{k_2}: \psi_1\psi_2\in\mbb G_{k_1+k_2}.$ Therefore, $\mbb G_0$ is a commutative ring with characteristic zero, and $\mbb G_1$ is a $\mbb G_0$-module. We can then define linear functions with this structure. In this sense, the multiplication on $\mbb G_1$ defines a symplectic bilinear form.

These are not enough to define every property we need for $\mbb G_0$ and $\mbb G_1$. I will introduce more properties as axioms later.

It seems that we need this property as an axiom: for any linear function $\func\lmd{\mbb G_k}{\mbb G_0}$, $\exists!\vphi\in\mbb G_k:\lmd=\p{\psi\mapsto\vphi\psi}.$ I call this property the **first representation property**, analogous to the Riesz representation theorem. I will call linear functions that map into $\mbb G_0$ **linear functionals**, and the **dual space** of a $\mbb G_0$-module is the set of all linear functionals on it.

With the first representation property, we can identify $\mbb G_k$ with its dual space so that any **multilinear map (tensor)** has well-defined components. For any $k$-linear map $\func T{\p{\mbb G_1^n}^k}{\mbb G_0}$ (alternatively called a rank-$k$ tensor on $\mbb G_1^n$), we can write it uniquely in the form
$\fc T{\psi_1,\dots,\psi_k}=\psi_{1i_1}\cdots\psi_{ki_k}T_{i_1\cdots i_k},$ where the components
$T_{i_1\cdots i_k}\in\mbb G_k$, and the dummy indices are summed from $1$ to $n$. Denote the set of all rank-$k$ tensors on $\mbb G_1^n$ as $\mcal T_1^{nk}$.

Similarly, we can define $k$-linear maps $\func T{\p{\mbb G_0^n}^k}{\mbb G_0}$ (or rank-$k$ tensors on $\mbb G_0^n$), whose components are in $\mbb G_0$, and denote the set of all of them as $\mcal T_0^{nk}$. Tensors from $\mcal T_0^{nk}$ and those from $\mcal T_1^{nk}$ can be multiplied and contracted together without any problems. However, the result of these operations may not be in $\mcal T_0^{nk}$ or $\mcal T_1^{nk}$, but some tensor that takes arguments from both $\mbb G_0^n$ and $\mbb G_1^n$.

Here we will need another property as an axiom: for any linear function $\func\lmd{\mbb G_k}{\mbb G_k}$, $\exists!\vphi\in\mbb G_0:\lmd=\p{\psi\mapsto\vphi\psi}.$ I call this property the **second representation property**. This is very similar to the first representation property, but it covers linear endomorphisms on $\mbb G_k$ instead of linear functionals on $\mbb G_k$.

With the second representation property, we can prove that any possible linear endomorphism $J$ on $\mbb G_k^n$ can be written as a unique matrix in $\mbb G_0^{n\times n}$ acting on the components of the argument: $\fc J\psi_i=J_{ij}\psi_j,$ where $J_{ij}\in\mbb G_0$ are called the components of the linear endomorphism $J$. From now on, we do not need to distinguish between matrices in $\mbb G_0^{n\times n}$ and linear endomorphisms on $\mbb G_k^n$.

For a matrix $J\in\mbb G_0^{n\times n}$, we can define its determinant as $\det J\ceq J_{1i_1}\cdots J_{ni_n}\veps^{\b n}_{i_1\cdots i_n}\in\mbb G_0,$ where $\veps^{\b n}\in\mcal T_0^{nn}$ is the Levi-Civita symbol, which is a completely antisymmetric tensor on $\mbb G_0^n$ whose components take values in $\B{-1,0,1}\subset\mbb G_0$.
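This definition can be sanity-checked numerically against a standard linear-algebra routine (a throwaway sketch with ordinary floats standing in for $\mbb G_0$; the function names are mine):

```python
import itertools
import numpy as np

def perm_sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    sign = 1
    seen = [False] * len(perm)
    for i in range(len(perm)):
        if seen[i]:
            continue
        # each cycle of length m contributes (-1)^(m - 1)
        j, m = i, 0
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            m += 1
        sign *= (-1) ** (m - 1)
    return sign

def det_levi_civita(J):
    """det J = J_{1 i_1} ... J_{n i_n} eps_{i_1 ... i_n}: only permutations of
    (1, ..., n) contribute, since eps vanishes on repeated indices."""
    n = J.shape[0]
    return sum(perm_sign(p) * np.prod([J[r, p[r]] for r in range(n)])
               for p in itertools.permutations(range(n)))

J = np.arange(9.0).reshape(3, 3) + np.eye(3)
assert abs(det_levi_civita(J) - np.linalg.det(J)) < 1e-9
```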

For any $T\in\mcal T_1^{nk}$, define a degree-$k$ **monomial** on $\mbb G_1^n$ as $\vfunc{M_T}{\mbb G_1^n}{\mbb G_0}{\psi}{\fc T{\psi,\dots,\psi}},$ which is a degree-$k$ homogeneous function on $\mbb G_1^n$. Note that different tensors may correspond to the same monomial. In particular, for any $k>n$, a degree-$k$ monomial must be trivial (send any input to zero). Also, if there is any pair of indices such that $T$ is symmetric in exchanging them, then the monomial $M_T$ must be trivial. Therefore, we only need to consider completely antisymmetric tensors when studying monomials. Denote the set of all completely antisymmetric rank-$k$ tensors on $\mbb G_1^n$ as $\mcal T_1^{n\b k}$, and then the fact that we only need antisymmetric tensors to define monomials can be written as $M_{\mcal T_1^{n\b k}}=M_{\mcal T_1^{nk}}$.

An **analytic function** $f$ on $\mbb G_1^n$ is defined as a sum of monomials: $\vfunc f{\mbb G_1^n}{\mbb G_0}\psi{\sum_k \fc{M_{T^{\b k}}}\psi},$ where
$T^{\b k}\in\mcal T_1^{n\b k}$, whose components may be referred to as **expansion coefficients**. We do not need to worry about the convergence because this is a finite sum ($k\le n$). Denote the set of all analytic functions on $\mbb G_1^n$ as $\mcal A_n$.

Two properties of analytic functions:

- If $f\in\mcal A_n$, then for any $\dlt\in\mbb G_1^n$, the translation $\p{\psi\mapsto\fc f{\psi+\dlt}}\in\mcal A_n$.
- If $f\in\mcal A_n$, then for any $J\in\mbb G_0^{n\times n}$, the linear transformation in the argument $f\circ J\in\mcal A_n$.

Now we define an **integral** to be a linear function $\int:\mcal A_n\to\mbb G_n$ satisfying the following property: $\forall f\in\mcal A_n,\dlt\in\mbb G_1^n:\int f=\int\psi\mapsto\fc f{\psi+\dlt},$ which intuitively means that an integral is invariant under translation.

With this definition of an integral, we are now interested in the most general form of an integral.

Because $\int$ is linear, we can find its form on monomials, and then sum them up to get the form on all analytic functions. As a linear function on monomials, it must be of the form (by the second representation property) $\int M_{T^{\b k}}=c^{\b k}_{i_1\cdots i_k}T^{\b k}_{i_1\cdots i_k},$ where $c^{\b k}\in\mcal T_0^{n\b k}$ does not depend on $T^{\b k}$. Plug this form into the translational invariance of $\int$, and we have $\begin{align*} c_{i_1\cdots i_k}^{\b k}T^{\b k}_{i_1\cdots i_k} &=\int\psi\mapsto\p{\psi_{i_1}+\dlt_{i_1}}\cdots\p{\psi_{i_k}+\dlt_{i_k}}T^{\b k}_{i_1\cdots i_k}\\ &=\int\psi\mapsto\sum_l\binom kl\psi_{i_1}\cdots\psi_{i_l} \dlt_{i_{l+1}}\cdots\dlt_{i_k}T^{\b k}_{i_1\cdots i_k}\\ &=\sum_l\binom kl c^{\b l}_{i_1\cdots i_l}\dlt_{i_{l+1}}\cdots\dlt_{i_k}T^{\b k}_{i_1\cdots i_k} \end{align*}$ (here the binomial coefficient should be regarded as its image under the natural ring homomorphism from $\bZ$ to $\mbb G_0$, which must be non-zero because $\mbb G_0$ has characteristic zero). Regarding $T^{\b k}$ as the independent variable, this equation is a homogeneous linear equation $\fc{L^{\b k}}{T^{\b k}}=0$ associated with the linear operator $L^{\b k}$ on $\mcal T_1^{n\b k}$ defined by $L^{\b k}_{i_1\cdots i_k}\ceq c^{\b k}_{i_1\cdots i_k}-\sum_l\binom kl c^{\b l}_{i_1\cdots i_l}\dlt_{i_{l+1}}\cdots\dlt_{i_k}.$ For the solution set of the linear equation to be the whole space $\mcal T_1^{n\b k}$, we need $L^{\b k}=0$. Again by the second representation property, we need all the components to vanish (strictly speaking, we only need the completely antisymmetric part to vanish, but they are already completely antisymmetric): $\forall k\le n,\dlt\in\mbb G_1^n,i_1,\dots,i_k: c^{\b k}_{i_1\cdots i_k}-\sum_l\binom kl c^{\b l}_{i_1\cdots i_l}\dlt_{i_{l+1}}\cdots\dlt_{i_k}=0.$ The first term cancels with the $l=k$ term in the sum, so this equation imposes no requirement on $c^{\b k}$ but only imposes requirements on $c^{\b l}$ with $l<k$.
Then, we can do induction on $k$: the equation for $k=0$ does nothing; the equation for $k=1$ requires $c^{\b 0}$ to vanish; the equation for $k=2$, given that $c^{\b 0}$ vanishes, now requires $c^{\b 1}$ to vanish; and so on. For each $k$, the equation additionally requires $c^{\b{k-1}}$ to vanish. Finally, when we reach $k=n$, the end of the induction, we have required $c^{\b l}$ to vanish for all $l<n$, with no requirement on $c^{\b n}$. Therefore, the integral of any monomial is zero except for the degree-$n$ monomial, and thus we only need to consider the $n$th-degree term when finding the integral of an analytic function.

Note that $\mcal T_k^{n\b n}=\mbb G_k\veps^{\b n}$ (in other words, the most general form of a completely antisymmetric rank-$n$ tensor on $\mbb G_k^n$ is a constant in $\mbb G_k$ times the Levi-Civita symbol). Therefore, $c^{\b n}_{i_1\cdots i_n}=c_n\veps^{\b n}_{i_1\cdots i_n}, \quad T^{\b n}_{i_1\cdots i_n}=d\veps^{\b n}_{i_1\cdots i_n},$ where $c_n\in\mbb G_0$ and $d\in\mbb G_n$. The definition of an integral does not impose any requirement on $c_n$, so it can be any element in $\mbb G_0$. For convenience, define $c_n\ceq1$ for all $n$, and then we have $\int M_{d\veps^{\b n}}=\veps^{\b n}_{i_1\cdots i_n}d\veps^{\b n}_{i_1\cdots i_n} =n!\,d,$ where $n!$ is the image of $n!$ under the natural ring homomorphism from $\bZ$ to $\mbb G_0$. The integral of any monomial with its degree different from $n$ is zero, so the integral of any analytic function is just that of its degree-$n$ term: $\int\psi\mapsto\sum_k \fc{M_{T^{\b k}}}\psi=n!\,T^{\b n}_{1\cdots n}.$

Now, for a linear endomorphism $J\in\mbb G_0^{n\times n}$ and an analytic function $f\in\mcal A_n$, consider the integral $\int f\circ J$. We only need to consider the degree-$n$ monomial term, which is $\fc{M_{T^{\b n}}}{\fc J\psi}=J_{i_1j_1}\psi_{j_1}\cdots J_{i_nj_n}\psi_{j_n}d\veps^{\b n}_{i_1\cdots i_n},$ where $T^{\b n}=d\veps^{\b n}$ is used. Notice that $\veps^{\b n}_{i_1\cdots i_n}J_{i_1j_1}\cdots J_{i_nj_n}$ is itself a rank-$n$ completely antisymmetric tensor on $\mbb G_0^n$, so it can also be written as a constant times $\veps^{\b n}$. By letting $j_1,\dots,j_n$ be $1,\dots,n$ respectively, we see that the constant is just $\det J$. Therefore, $\fc{M_{T^{\b n}}}{\fc J\psi}=\fc{M_{T^{\b n}}}\psi\det J.$ By the linearity of the integral, we have $\forall f\in\mcal A_n:\int f\circ J=\det J\int f.$
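To reassure myself, both results (the integral equals the coefficient of $\psi_1\cdots\psi_n$, i.e. $n!\,T^{\b n}_{1\cdots n}$, and $\int f\circ J=\det J\int f$) can be checked with a small throwaway implementation of anticommuting generators, with ordinary floats standing in for $\mbb G_0$ (the representation and names here are mine, not standard):

```python
import itertools
import numpy as np

def sort_sign(idx):
    """Sign of the permutation that sorts idx (no repeats), via inversion count."""
    inv = sum(1 for a, b in itertools.combinations(idx, 2) if a > b)
    return -1 if inv % 2 else 1

class GPoly:
    """Polynomial in anticommuting generators psi_1..psi_n.
    terms maps a sorted tuple of generator indices to a float coefficient."""
    def __init__(self, terms=None):
        self.terms = dict(terms or {})
    @staticmethod
    def gen(i):
        return GPoly({(i,): 1.0})
    def __add__(self, other):
        t = dict(self.terms)
        for k, v in other.terms.items():
            t[k] = t.get(k, 0.0) + v
        return GPoly(t)
    def __rmul__(self, c):          # scalar * GPoly
        return GPoly({k: c * v for k, v in self.terms.items()})
    def __mul__(self, other):
        t = {}
        for k1, v1 in self.terms.items():
            for k2, v2 in other.terms.items():
                merged = k1 + k2
                if len(set(merged)) < len(merged):
                    continue        # psi_i^2 = 0
                key = tuple(sorted(merged))
                t[key] = t.get(key, 0.0) + sort_sign(merged) * v1 * v2
        return GPoly(t)

def integral(f, n):
    """The translation-invariant integral: the coefficient of psi_1 ... psi_n,
    which equals n! times the top expansion coefficient T_{1...n}."""
    return f.terms.get(tuple(range(1, n + 1)), 0.0)

n = 3
rng = np.random.default_rng(0)
J = rng.normal(size=(n, n))
psi = [GPoly.gen(i) for i in range(1, n + 1)]
Jpsi = [sum((float(J[i, j]) * psi[j] for j in range(n)), GPoly())
        for i in range(n)]

# f = 1 + 2 psi_1 psi_2 + 5 psi_1 psi_2 psi_3, and f composed with J
f = GPoly({(): 1.0}) + 2.0 * (psi[0] * psi[1]) + 5.0 * (psi[0] * psi[1] * psi[2])
fJ = GPoly({(): 1.0}) + 2.0 * (Jpsi[0] * Jpsi[1]) + 5.0 * (Jpsi[0] * Jpsi[1] * Jpsi[2])

assert abs(integral(fJ, n) - np.linalg.det(J) * integral(f, n)) < 1e-9
```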

Actually, before I wrote my answer, I already knew about the exterior algebra. In this article, my definition of Grassmann numbers is more abstract and puts the commuting numbers and anticommuting numbers on a more equal footing. This definition is closer to what I intuitively think Grassmann numbers could be.

There are several potential problems in this article:

- Some axioms are given, but I did not prove that they are consistent.
- Some claims are made without proof. They may turn out to be wrong.
- I did not prove that the usual definition of Grassmann numbers (with exterior algebra) can be formulated as a special case of my definition.
- I am not educated in supersymmetry, which is where Grassmann numbers are applied most. I only made my definition comply with the properties of Grassmann numbers that I have learned for doing the path integral of fermionic fields.

There are two major conventions for the metric signature: $\p{+,-,-,-}$ (west coast) and $\p{-,+,+,+}$ (east coast). However, the first convention that I encountered in my journey of learning physics is neither of them: the imaginary time convention. Shortly after, I started using the west coast convention, so I never really used the imaginary time convention seriously. I personally dislike the imaginary time convention, as have most people in the physics community throughout history, which is why most modern textbooks use either the west coast or the east coast convention. One of my past physics teachers deemed the imaginary time convention to be a heresy (异端邪说).

However, in some cases, the imaginary time convention can be convenient due to the use of multi-index notation (which is more concise and feature-rich than the Einstein notation). Here is one such case: the derivation of the metric in Poincaré coordinates for the anti-de Sitter space.

The $d$-dimensional anti-de Sitter space $\mrm{AdS}_d$ of scale $l$ is defined as the hyperboloid $-l^2=-T_1^2-T_2^2+\sum_{i=1}^{d-1}\p{X^i}^2$ in $M^{d-1,2}$ (the analogue of the Minkowski space, but with signature $d-1,2$). The Poincaré coordinates are defined as $\begin{align*} z&\ceq\fr{l^2}{T_1+X^{d-1}},\\ t&\ceq\fr{lT_2}{T_1+X^{d-1}},\\ x^i&\ceq\fr{lX^i}{T_1+X^{d-1}},&i=1,\ldots,d-2. \end{align*}$

Define $T\ceq T_1$ and $X\ceq X^{d-1}$ just for fun. Then, define two $\p{d-1}$-dimensional multi-indices $Y\ceq\p{\i T_2,X^1,\ldots,X^{d-2}},\quad y\ceq\p{\i t,x^1,\ldots,x^{d-2}}.$

The hyperboloid constraint and the metric (east coast convention) are then $X^2-T^2+Y^2=-l^2,\quad \d s^2=\d X^2-\d T^2+\d Y^2,$ which are equivalently $\p{X+T}\p{X-T}=-l^2-Y^2,\quad\d s^2=\p{\d X+\d T}\p{\d X-\d T}+\d Y^2.$$\p{1}$ The definition of the Poincaré coordinates can be written as $z=\fr{l^2}{X+T},\quad y=\fr zlY,$ or equivalently $X+T=\fr{l^2}z,\quad Y=\fr{ly}z.$$\p{2}$

Substitute Equation 2 into the first equation in Equation 1. Then, we have $X-T=-z-\fr{y^2}z.$$\p{3}$ Differentiate Equation 2 and 3, and we have $\d X+\d T=-\fr{l^2}{z^2}\,\d z,\quad \d X-\d T=-\d z+\fr{y^2}{z^2}\,\d z-\fr{2y}z\,\d y,\quad \d Y=l\p{\fr{\d y}z-\fr y{z^2}\,\d z}.$ Substitute this into the second equation in Equation 1, and we have $\d s^2=-\fr{l^2}{z^2}\d z\p{-\d z+\fr{y^2}{z^2}\,\d z-\fr{2y}z\,\d y}+l^2\p{\fr{\d y}z-\fr y{z^2}\,\d z}^2 =\fr{l^2}{z^2}\p{\d y^2+\d z^2}.$

Finally, substitute back the definition of $y$, and we have the result $\d s^2=\fr{l^2}{z^2}\p{-\d t^2+\sum_{i=1}^{d-2}\p{\d x^i}^2+\d z^2}.$
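The whole derivation can be verified with a computer algebra system. Here is a sketch for $d=3$ (so the embedding space has signature $2,2$ with coordinates $T_1,T_2,X^1,X^2$; the variable names are mine):

```python
import sympy as sp

z, l = sp.symbols('z l', positive=True)
t, x = sp.symbols('t x', real=True)

# embedding coordinates solved from the Poincare-coordinate definitions:
# X + T = l^2/z, X - T = -z - y^2/z, T_2 = l t/z, X^1 = l x/z,
# where T = T_1, X = X^2, and y^2 = -t^2 + x^2 (imaginary-time multi-index)
y2 = -t**2 + x**2
T1 = (l**2/z + z + y2/z) / 2
T2 = l*t/z
X1 = l*x/z
X2 = (l**2/z - z - y2/z) / 2

# hyperboloid constraint: -T1^2 - T2^2 + X1^2 + X2^2 = -l^2
assert sp.simplify(-T1**2 - T2**2 + X1**2 + X2**2 + l**2) == 0

# induced metric g = Jac^T eta Jac in coordinates (z, t, x)
emb = sp.Matrix([T1, T2, X1, X2])
eta = sp.diag(-1, -1, 1, 1)            # signature (d - 1, 2) = (2, 2)
Jac = emb.jacobian([z, t, x])
g = Jac.T * eta * Jac
expected = sp.diag(l**2/z**2, -l**2/z**2, l**2/z**2)
assert (g - expected).applyfunc(sp.simplify) == sp.zeros(3, 3)
```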

Exercise 12.5 from *Modern Condensed Matter Physics* (Girvin and Yang, 2019) asks to construct a Gaussian wave packet in the lowest Landau level in the Landau gauge, such that it is localized as closely as possible around some point $\mbf R\ceq\p{R_x,R_y}$.

Actually, we can prove that the smallest wave packet is a Gaussian wave packet. Here is the derivation.

First, for readers who are not familiar with the Landau levels, here is a brief introduction. For an electron confined in the $xy$ plane under a magnetic field $\mbf B=B\bhat z$, its Hamiltonian is $H=\fr1{2m_e}\p{p_x^2+\p{p_y-\fr{eB}cx}^2}$ under the Landau gauge $\mbf A=Bx\bhat y$. Its eigenstates in the position representation are $\fc{\psi_{nk}}{x,y}=\e^{\i ky}\fc{H_n}{\fr xl-kl} \e^{-\p{x-kl^2}^2/2l^2}$ labeled by $n\in\bN$ and $k\in\bR$, where $H_n$ is the Hermite polynomial of degree $n$ and $l\ceq\sqrt{\hbar c/eB}$. States with the same $n$ are degenerate in energy ($E_n=\p{n+1/2}\hbar eB/m_ec$) and make up the $n$th Landau level. The Landau level with $n=0$ is called the lowest Landau level.
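For readers who want a sanity check, here is a quick symbolic verification of the $n=0$ case, in units where $\hbar=m_\mrm e=1$ and $eB/c=1/l^2$, so that $E_0=1/2l^2$ (a sketch of mine, not part of the original problem):

```python
import sympy as sp

x, y, k = sp.symbols('x y k', real=True)
l = sp.Symbol('l', positive=True)

# lowest Landau level wavefunction (n = 0, so H_0 = 1)
psi = sp.exp(sp.I*k*y - (x - k*l**2)**2/(2*l**2))

def Py(f):
    # kinetic momentum in y: p_y - (eB/c) x, with eB/c = 1/l**2
    return -sp.I*sp.diff(f, y) - x/l**2 * f

# H = (p_x^2 + (p_y - eB x/c)^2) / 2 in these units
Hpsi = -sp.diff(psi, x, 2)/2 + Py(Py(psi))/2
E0 = 1/(2*l**2)          # hbar omega_c / 2 with omega_c = eB/(m_e c) = 1/l^2
assert sp.simplify(Hpsi/psi - E0) == 0
```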

The problem, now, is this optimization problem: $\begin{align*} \min_{a_k}\quad&\mel{\Psi}{x^2+y^2}{\Psi}\\ \st\quad&\braket{\Psi}{\Psi}=1,\\ &\mel{\Psi}{x}{\Psi}=R_x,\\ &\mel{\Psi}{y}{\Psi}=R_y \end{align*}$ (optimizing $\a{x^2+y^2}$ is equivalent to optimizing $\sgm_x^2+\sgm_y^2$ because $\a x$ and $\a y$ are both fixed), where $\ket\Psi$ is defined as the state whose position representation is $\fc\Psi{x,y}=\int\d k\,a_k\e^{\i ky}\e^{-\p{x-kl^2}^2/2l^2}.$

Consider the moment-generating function $\begin{align*} \fc M{u,v}&\ceq\mel{\Psi}{\e^{ux+vy}}{\Psi}\\ &=\iint\d x\d y\,\e^{ux+vy} \int\d k\,a_k^*\e^{-\i ky}\e^{-\fr1{2l^2}\p{x-kl^2}^2} \int\d k'\,a_{k'}\e^{\i k'y}\e^{-\fr1{2l^2}\p{x-k'l^2}^2}\\ &=\iint\d k\d k'\,a_k^*a_{k'}\int\d x\,\e^{ ux-\fr1{2l^2}\p{x-kl^2}^2-\fr1{2l^2}\p{x-k'l^2}^2 }\underbrace{\int\d y\,\fc\exp{vy+\i\p{k'-k}y}}_{2\pi\fc\dlt{k'-k-\i v}}\\ &=2\pi\int\d k\,a_k^*a_{k+\i v}\underbrace{\int\d x\,\fc\exp{ ux-\fr1{2l^2}\p{x-kl^2}^2-\fr1{2l^2}\p{x-\p{k+\i v}l^2}^2 }}_{l\sqrt\pi\fc\exp{\fr14l^2\p{4ku+u^2+2\i uv+v^2}}}\\ &=2\pi^{3/2}l\fc\exp{\fr14l^2\p{u^2+2\i uv+v^2}} \int\d k\,a_k^*a_{k+\i v}\e^{kl^2u}\\ &=2\pi^{3/2}l\int\d k\,a_k^*\left( a_k+kl^2a_ku+\i a_k'v+\fr14l^2\p{1+2k^2l^2}a_ku^2 \right.\\&\qquad\qquad\qquad\qquad\left. {}+\fr14\p{l^2a_k-2a_k''}v^2 +\fr\i2l^2\p{a_k+2ka_k'}uv+\cdots \right), \end{align*}$ where $a_k'\ceq\d a_k/\d k$ and $a_k''\ceq\d^2a_k/\d k^2$. On the other hand, we have $\fc M{u,v}=\mel{\Psi}{1+ux+vy+\fr12u^2x^2+\fr12v^2y^2+uvxy+\cdots}{\Psi}.$ Compare the expansion coefficients, and we have $\begin{align*} \braket{\Psi}{\Psi}&=2\pi^{3/2}l\int\d k\,a_k^*a_k,\\ \mel{\Psi}{x}{\Psi}&=2\pi^{3/2}l^3\int\d k\,a_k^*ka_k,\\ \mel{\Psi}{y}{\Psi}&=2\i\pi^{3/2}l\int\d k\,a_k^*a_k',\\ \mel{\Psi}{x^2}{\Psi}&=\pi^{3/2}l^3\int\d k\,a_k^*\p{1+2k^2l^2}a_k,\\ \mel{\Psi}{y^2}{\Psi}&=\pi^{3/2}l\int\d k\,a_k^*\p{l^2a_k-2a_k''}. \end{align*}$

Define $\fc\vphi k\ceq a_k\sqrt{2\pi^{3/2}l}$. Define fictitious position and momentum operators acting on $\vphi$ as $\Xi\vphi:k\mapsto k\fc\vphi k,\quad \Pi\vphi:k\mapsto-\i\fc{\vphi'}k.$ Using the constraints of the original optimization problem and abusing the bra–ket notation on $\vphi$, we have $\braket{\vphi}{\vphi}=1,\quad\mel\vphi\Xi\vphi=\fr{R_x}{l^2},\quad \mel\vphi\Pi\vphi=-R_y.$ The objective function then becomes $\mel{\Psi}{x^2+y^2}{\Psi}=l^2+2\mel{\vphi}{\mcal H}{\vphi},$ where $\mcal H\ceq \Pi^2/2+l^4\Xi^2/2$ is a fictitious Hamiltonian, which is the Hamiltonian of a harmonic oscillator with mass $1$ and angular frequency $\omg\ceq l^2$.

The optimization problem can now be re-stated in terms of $\ket\vphi$ as $\begin{align*} \min_{\ket\vphi}\quad&\mel\vphi{\mcal H}{\vphi}\\ \st\quad&\braket\vphi\vphi=1,\quad\mel\vphi\Xi\vphi=R_x/\omg,\quad\mel\vphi\Pi\vphi=-R_y. \end{align*}$ Physically, this means that we want to find the state of a harmonic oscillator with the given expectation values of position and momentum and the lowest energy. To find it, we can use Heisenberg’s uncertainty principle: $\begin{align*} \a{\mcal H}&=\fr12\a{\Pi^2}+\fr12\omg^2\a{\Xi^2}\\ &=\fr12\p{\a\Pi^2+\sgm_\Pi^2}+\fr12\omg^2\p{\a\Xi^2+\sgm_\Xi^2}\\ &=\fr12\sgm_\Pi^2+\fr12\omg^2\sgm_\Xi^2+\fr12R_y^2+\fr12 R_x^2\\ &\ge\omg\sgm_\Pi\sgm_\Xi+\fr12R^2 \ge\fr12\omg+\fr12R^2. \end{align*}$ The equality in the first “$\ge$” is achieved when $\sgm_\Pi=\omg\sgm_\Xi$, and that in the second “$\ge$” is achieved when the uncertainty principle is saturated. As we know from quantum mechanics, the coherent state of a harmonic oscillator satisfies both conditions. The wavefunction of this state is $\fc\vphi k=\p{\fr\omg\pi}^{1/4} \fc\exp{-\fr12\omg\p{k-\fr{R_x}{\omg}}^2-\i R_yk}.$ Express the final result in terms of $a_k$: $a_k=\fr1{\sqrt2\pi}\e^{-\i kR_y}\e^{-\fr1{2l^2}\p{R_x-kl^2}^2}.$ We may work out the integral to get the wave function of the wave packet: $\fc{\Psi}{x,y}=\fr1{\sqrt{2\pi}l}\fc\exp{-\fr1{4l^2}\p{ \p{x-R_x}^2+\p{y-R_y}^2-2\i\p{x+R_x}\p{y-R_y} }}.$ This is a Gaussian wave packet centered at $\mbf R$ with covariance matrix $\opc{Diag}{l^2,l^2}$.
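To double-check that last integral, here is a numerical comparison of the $k$-integral against the claimed Gaussian at a few sample points (a sketch assuming scipy is available; $l=1$ and the values of $\mbf R$ are arbitrary choices of mine):

```python
import numpy as np
from scipy.integrate import quad

l, Rx, Ry = 1.0, 0.3, -0.2   # arbitrary sample values

def a(k):
    # the optimal coefficients a_k found above
    return np.exp(-1j*k*Ry - (Rx - k*l**2)**2/(2*l**2)) / (np.sqrt(2)*np.pi)

def Psi_integral(x, y):
    # Psi(x, y) = int dk a_k e^{iky} e^{-(x - k l^2)^2 / 2 l^2}
    f = lambda k: a(k) * np.exp(1j*k*y - (x - k*l**2)**2/(2*l**2))
    re = quad(lambda k: f(k).real, -np.inf, np.inf)[0]
    im = quad(lambda k: f(k).imag, -np.inf, np.inf)[0]
    return re + 1j*im

def Psi_closed(x, y):
    # the claimed Gaussian wave packet
    q = (x - Rx)**2 + (y - Ry)**2 - 2j*(x + Rx)*(y - Ry)
    return np.exp(-q/(4*l**2)) / (np.sqrt(2*np.pi)*l)

for xy in [(0.5, 0.1), (-1.0, 0.7), (0.0, 0.0)]:
    assert abs(Psi_integral(*xy) - Psi_closed(*xy)) < 1e-8
```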

The optimal wave packet is indeed Gaussian. This makes me curious about whether this is a coincidence or not.

Another thing worth noting is that this result is actually the Dirac delta wave function peaking at $\mbf R$ projected into the lowest Landau level. This was actually my first idea to solve the problem. I was like: well, isn’t the Dirac delta the smallest possible wave packet by all means? If the basis is complete, I can surely combine them into a Dirac delta, and it would be very easy to work out $a_k$ in this case. Then, I was like: nah, merely a single Landau level is not complete, so I cannot do that anyway. I then did not even bother to proceed with this approach and went on to trying other methods. It turns out that this approach is actually correct—at least it gives the same result as the correct approach.

The unit system used in this article is Hartree atomic units: $m_\mrm e=k_\mrm B=\hbar=4\pi\veps_0=e=1$, where $m_\mrm e$ is the electron mass.

In this unit system, the Bohr radius is $a_\mrm B=1$, which is of angstrom order. Therefore, I will use $10^{10}$ as the order of macroscopic lengths. The Rydberg unit of energy is $\mrm{Ry}=1/2$, which is of electronvolt order. Therefore, I will use $10^3$ as the order of inverse room temperature.

One can adjust the units to get results for the cases of other hydrogen-like atoms: use $Z^2/4\pi\veps_0=1$ instead of $4\pi\veps_0=1$, where $Z$ is the atomic number.

In this article, I also assume that the mass of the nucleus is infinite. If you want more accuracy, you can use $m_\mrm Nm_\mrm e/\p{m_\mrm N+m_\mrm e}=1$ instead of $m_\mrm e=1$, where $m_\mrm N$ is the mass of the nucleus.

I will mainly be working with the inverse temperature $\beta\ceq1/k_\mrm BT$, where $T$ is the temperature. However, I will still use “temperature” often to give some physical intuition. To avoid confusion in the context of using $\beta$ and in the presence of negative temperatures, I will avoid phrases like “high temperature” and “low temperature”. Instead, here are the terminologies that I am going to use:

- “Cold (positive) temperature” means $\beta\to+\infty$.
- “Hot positive temperature” means $\beta\to0^+$.
- “Cold negative temperature” means $\beta\to0^-$.
- “Hot negative temperature” means $\beta\to-\infty$.

The energy levels of a hydrogen atom are (ignoring fine structure, etc.) $E_n=-1/2n^2$, labeled by $n\in\bZ^+$, and each energy level is $g_n\ceq n^2$-fold degenerate (ignoring spin degeneracy, which merely contributes an overall factor to the partition function). The partition function is $\fc Z\beta\ceq\sum_{n=1}^\infty g_n\e^{-\beta E_n} =\sum_{n=1}^\infty n^2\e^{\beta/2n^2},$$\p{1}$ which diverges for any $\beta\in\bC$ (of course, normally we can only have $\beta\in\bR$, but the point of saying that it diverges for any complex $\beta$ is that there is no way we can analytically continue the function to get a finite result). Does this mean that statistical mechanics breaks down for this system? Not necessarily. Actually, there are multiple ways we can tackle this divergence.
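Before regularizing, one can quickly confirm the divergence numerically: the summand itself grows without bound, so the series fails even the term test (a throwaway sketch):

```python
import numpy as np

beta = 1000.0   # roughly room temperature in Hartree atomic units
n = np.arange(1, 200001, dtype=float)
terms = n**2 * np.exp(beta / (2 * n**2))
# the summand has its minimum near n = sqrt(beta/2) and grows like n^2 afterwards,
# so the partial sums grow without bound
tail = terms[n > 2 * np.sqrt(beta / 2)]
assert np.all(np.diff(tail) > 0)
```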

One should notice that, although this article concentrates on regularizing partition functions and that of the hydrogen atom in particular, all the methods are valid for more general divergent sums.

Here is a sentence, due to Niels Henrik Abel, that is quoted in much of the literature on divergent series, so I want to quote it, too:

> Les séries divergentes sont en général quelque chose de bien fatal, et c’est une honte qu’on ose y fonder aucune démonstration.

It translates to “Divergent series are in general deadly, and it is shameful that anyone dare to base any proof on them.”

A physicist will always tell you that one should not be afraid of infinities. Instead, one should look at where the infinity comes from in the seemingly physical model: there is something sneakily unphysical hiding in it, which ultimately leads to the unphysical divergence. In our case, the divergence comes from the high energy levels. It is then a good time to question whether those high energy levels are physical.

There is a radius associated with each energy level in the sense of the Bohr model: $r_n=n^2$. When $r_n\sim L\ceq10^{10}$ (which happens at $n\sim\Lmd\ceq10^5$), the orbit is really macroscopic, and the interaction between the electron and the “box” that contains the whole experimental setup now has significant effects. Or, if there is no box at all, we can use the size of the universe instead, which gives $r_n\sim L\ceq10^{36}$ ($\Lmd\ceq10^{18}$). Use the model of a particle in a box for energy levels higher than $n=\Lmd$, and we have $Z=\sum_{n=1}^\Lmd n^2\e^{\beta/2n^2} +\sum_{n_x,n_y,n_z=1}^\infty\fc\exp{-\beta\fr{\p{n_x^2+n_y^2+n_z^2}\pi^2}{2L^2}},$ where $L$ is the side length of the box (assuming that the box is cubic). If $L$ is very large, we can approximate the second term as a spherically symmetric integral over the first octant to get $L^3\p{2\pi\beta}^{-3/2}$.

This is actually the result for the Boltzmann ideal gas, so it should be familiar, but I still write down the calculation here for completeness.

We can approximate $\sum_{n_x,n_y,n_z=1}^\infty\fc\exp{-\beta\fr{\p{n_x^2+n_y^2+n_z^2}\pi^2}{2L^2}} \approx I\ceq\int_0^\infty\d^3n\,\fc\exp{-\beta\fr{\p{n_x^2+n_y^2+n_z^2}\pi^2}{2L^2}},$ where $\int_0^\infty\d^3n$ means $\int_0^\infty\int_0^\infty\int_0^\infty\d n_x\,\d n_y\,\d n_z$. We can then change the integral to spherical coordinates: $I=\int_0^\infty\fr18\cdot4\pi n^2\,\d n\,\fc\exp{-\beta\fr{n^2\pi^2}{2L^2}} =\fr{L^3}{4\pi^2\beta^{3/2}}\int_{-\infty}^\infty\d n\,n^2\e^{-n^2/2},$ where the factor of $1/8$ is because we only integrate in the first octant, and the second step utilizes the symmetry of the integrand and redefines the integrated variable. This integral is then a familiar Gaussian integral of order unity. Its value is not important for the later discussion because the arguments that follow only use orders of magnitude, but I tell you it is $\sqrt{2\pi}$, which can be evaluated by integrating by parts once and utilizing the famous $\int_{-\infty}^{\infty}\e^{-n^2/2}\,\d n=\sqrt{2\pi}$. The final result is $I=L^3\p{2\pi\beta}^{-3/2}$.
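The whole Gaussian integral can be double-checked symbolically (a quick sympy sketch):

```python
import sympy as sp

beta, L, n = sp.symbols('beta L n', positive=True)

# I = (1/8) * 4 pi * int_0^oo n^2 exp(-beta n^2 pi^2 / 2 L^2) dn
I = sp.Rational(1, 8) * 4 * sp.pi * sp.integrate(
    n**2 * sp.exp(-beta * n**2 * sp.pi**2 / (2 * L**2)), (n, 0, sp.oo))

# should equal L^3 (2 pi beta)^{-3/2}
assert sp.simplify(I / (L**3 * (2*sp.pi*beta)**sp.Rational(-3, 2))) == 1
```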

Is this an overestimation or underestimation? It is actually an overestimation. Draw a picture of $\e^{-n^2/2}$ to convince yourself of this. We do not need to estimate how large the error is, though, because we will see that we only need an upper bound to get the arguments we need.

For the first term, we need to consider how the magnitude of the summand changes with $n$. The minimum value of the summand is at $n=\sqrt{\beta/2}$. At room temperature, we have $\beta\sim10^3$, so $\sqrt{\beta/2}$ is well between $1$ and $\Lmd$. Therefore, the largest term is either $n=1$ or $n=\Lmd$. The former is $\e^{\beta/2}$, which is of order $10^{217}$, while the latter is $\Lmd^2$, which is of order $10^{36}$ for the case of the size of the universe. We may then be interested in the $n=2$ term $4\e^{\beta/8}$, which is of order $10^{54}$. This is much larger than the $n=\Lmd$ term but much smaller than the $n=1$ term, so it is the second largest term in the sum.

An upper bound of the summation is given by replacing every term except the largest term by the second largest term, which gives $Z<\underbrace{\e^{\beta/2}}_{10^{217}} +\underbrace{\p{\Lmd-1}4\e^{\beta/8}}_{10^{72}}+\underbrace{L^3\p{2\pi\beta}^{-3/2}}_{10^{48}}\approx\e^{\beta/2}.$ Therefore, the $n=1$ term dominates the entire partition function. This means that the hydrogen atom is extremely likely to be in the ground state (despite the seeming divergence of the partition function). This is intuitive. The probability of the system not being in the ground state is of order $10^{-55}$ for the size of the universe and $10^{-158}$ for a typical macroscopic experiment.

The usage of the model of particle in a box for energy levels $n>\Lmd$ gives good enough arguments and results, but one may want to question whether this is appropriate.

What happens if you actually put a hydrogen atom in a box (for simplicity, make the box spherically symmetric)? More precisely, consider the quantum mechanical problem in a spherically symmetric potential $V$ such that $V\sim-r^{-1}$ for small $r$ but grows fast and high enough at large $r$ so that the partition function for bound states converges. This is called a confined hydrogen atom. A book chapter, *The Confined Hydrogen Atom Revisited*, discusses this problem in detail and cites several papers that calculated the energy levels.

By analyzing the orders of magnitude, we see that we actually do not lose much if we just simply cut off the sum at $n=\Lmd$. This corresponds to a regularization method called the simple cutoff: it replaces the infinite sum by a finite partial sum. This can be generalized a little by considering a more general cutoff function $\chi$ such that $\lim_{x\to0^+}\fc \chi x=1$. Then, an infinite sum $\sum_{n=1}^\infty\fc fn$ can be written as $\sum_{n=1}^\infty\fc fn=\lim_{\lmd\to0^+}\sum_{n=1}^\infty\fc fn\fc\chi{\lmd n}.$ The simple cutoff is then the case where $\fc \chi x\ceq\fc\tht{1-x}$ and $\lmd\ceq1/\Lmd$, where $\tht$ is the Heaviside step function. For converging series, this gives the same result as the original sum thanks to the dominated convergence theorem.

For divergent series, this may give a finite result. For example, for $\fc fn\ceq\p{-1}^nn^k$, this method gives $-\fc\eta{-k}$ for any complex $k$ and any smooth enough $\chi$, where $\eta$ is the Dirichlet eta function. Here is a check for the special case $\fc\chi x\ceq\e^{-x}$ (equivalent to the Abel summation). By the definition of the polylogarithm, we have $\sum_{n=1}^\infty\p{-1}^nn^k\e^{-\lmd n}=\fc{\mrm{Li}_{-k}}{-\e^{-\lmd}}.$ Now, take the limit $\lmd\to0^+$, and utilizing the identity $\fc{\mrm{Li}_s}{-1}=-\fc\eta s$, we have the result $-\fc\eta{-k}$.
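This check can also be run numerically with mpmath’s `polylog` and `altzeta` (the latter is the Dirichlet eta function):

```python
import mpmath as mp

def abel_sum(k, lam):
    # sum_{n>=1} (-1)^n n^k e^{-lam n} = Li_{-k}(-e^{-lam})
    return mp.polylog(-k, -mp.e**(-lam))

for k in [1, 2, 3]:
    target = -mp.altzeta(-k)      # the claimed result -eta(-k)
    assert abs(abel_sum(k, mp.mpf('1e-8')) - target) < 1e-6
```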

You may wonder what happens for $\fc fn\ceq n^k$, which also gives a divergent series and looks much like the case above. However, the limit as $\lmd\to0^+$ simply does not exist when $\Re k\ge-1$ (i.e., when the series diverges). This is because we have $\fc{\mrm{Li}_s}1=\fc\zeta s$ only for $\Re s>1$, where $\zeta$ is the Riemann zeta function; the identity is undefined for other values of $s$. If you analytically continue the result, you will get the famous Riemann zeta function.

However, although this series may converge for any positive $\lmd$, the limit as $\lmd\to0^+$ may not exist. If it diverges because $\fc fn$ grows too fast (or decays too slowly) as $n\to\infty$, then we should expect that the sum also tends to infinity as $\lmd\to0^+$. Assume that we can characterize this divergence by a Laurent series: $\sum_{n=1}^\infty\fc fn\fc\chi{\lmd n} =\sum_{k=-\infty}^\infty\gma_k\lmd^k.$$\p{2}$ If the $\lmd\to0^+$ limit converges, we would expect $\gma_{k<0}$ to be zero, and then the result is simply $\gma_0$. Therefore, we may also want only $\gma_0$ when the limit does not exist. To pick out $\gma_0$, utilize the residue theorem: $\sum_{n=1}^\infty\fc fn=\fr1{2\pi\i}\oint\fr{\d\lmd}\lmd \sum_{n=1}^\infty\fc fn\fc\chi{\lmd n},$$\p{3}$ where the domain of $\lmd$ is now analytically continued from $\bR^+$ to a deleted neighborhood of $0$. Equation 3 is then a generalized version of Equation 2.

Notice that I have been super slippery in math in the discussion. For example, the Laurent series may not exist at all, and the analytic continuation may not be possible at all; even if they exist, the $\lmd\to0^+$ limit may also be different from $\gma_0$. However, I may claim that we should be able to select smooth enough $\chi$ for all of these to work, and the results will be independent of the choice of $\chi$ as long as Equation 3 works in this form.

Particularly, one can rigorously prove that for $\fc fn\ceq n^k$, the sum obtained by this procedure is $\fc\zeta{-k}$, where $\zeta$ is the Riemann zeta function, as long as $x^k\fc\chi x$ has a bounded $\p{k+2}$th derivative and the sum converges. This is proven in an interesting blog article.

In some cases, one may discover that $\sum_n\fc fn\fc\chi{\lmd n}$ is not analytic as $\lmd\to0^+$, so that the Laurent series expansion is not possible. An example is $E_n\ceq\ln\ln n$ (for $n\ge2$) with no degeneracies (this system also has a diverging partition function for any complex $\beta$). In this case, if you try to use the cutoff function $\fc\chi x\ceq\e^{-x}$, the sum goes like $\p{-\ln\lmd}^{-\beta}/\lmd$ instead of analytically as $\lmd\to0^+$. Proving this is simple. We have $Z_\lmd=\sum_{n=2}^\infty\e^{-\lmd n}\p{\ln n}^{-\beta} \approx\int_2^\infty\e^{-\lmd n}\p{\ln n}^{-\beta}\d n =\fr1{\lmd\p{-\ln\lmd}^\beta}\int_{2\lmd}^\infty \fr{\e^{-x}\,\d x}{\p{1-\ln x/\ln\lmd}^\beta},$ where the last step uses the substitution $x\ceq\lmd n$. Using the binomial theorem, we have $Z_\lmd\approx\fr1{\lmd\p{-\ln\lmd}^\beta}\int_{2\lmd}^\infty\d x\,\e^{-x} \sum_{k=0}^\infty\binom{-\beta}k\p{\fr{\ln x}{-\ln\lmd}}^k,$ where $\binom{-\beta}k$ is the binomial coefficient. Note that $\fc{\Gma^{\p k}}z=\int_0^\infty x^{z-1}\p{\ln x}^k\e^{-x}\d x$, where $\Gma^{\p k}$ is the $k$th derivative of the Euler Gamma function, so the integral over $x$ gives a factor $\fc{\Gma^{\p k}}1$ in the limit $\lmd\to0^+$. Therefore, $Z_\lmd\approx\fr1{\lmd\p{-\ln\lmd}^\beta},$ where only the $k=0$ term in the sum is retained for the leading contribution as $\lmd\to0^+$.
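This asymptotic behavior can be probed numerically: the combination $\lmd\p{-\ln\lmd}^\beta Z_\lmd$ should drift toward $1$ as $\lmd\to0^+$, though only logarithmically, hence the loose tolerances in this sketch:

```python
import numpy as np

beta = 2.0   # a sample value of the inverse temperature

def Z(lam):
    # truncate where e^{-lam n} is already negligible (~e^{-35})
    n = np.arange(2, int(35/lam), dtype=float)
    return np.sum(np.exp(-lam*n) * np.log(n)**(-beta))

def r(lam):
    # lam * (-ln lam)^beta * Z_lam; corrections decay only like 1/ln(1/lam)
    return lam * (-np.log(lam))**beta * Z(lam)

# the deviation from 1 shrinks (slowly) as lam decreases
assert abs(r(1e-5) - 1) < abs(r(1e-3) - 1) < 0.9
```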

However, for any $k\in\bZ^+$, one can always choose functions $h,\chi$ so that the sum $\sum_n\fc fn\fc\chi{\lmd\fc hn}$ goes like $\lmd^{-k}$ as $\lmd\to0^+$. For example, for $\fc\chi x\ceq\e^{-x}$ (equivalent to the Abelian mean or the heat-kernel regularization), we have $Z_\lmd\approx\int_{n_0}^\infty\e^{-\lmd\fc hn}\fc fn\d n =\int_{\lmd\fc h{n_0}}^\infty\e^{-x}\fc f{\fc{h^{-1}}{\fr x\lmd}}\fr{\d x}{\lmd\fc{h'}{\fc{h^{-1}}{\fr x\lmd}}}.$ We can choose $\fc hn\ceq\p{\int\fc fn\d n}^{1/k}$ so that $\fc f{\fc{h^{-1}}{\fr x\lmd}}=k\p{\fr x\lmd}^{k-1}\fc{h'}{\fc{h^{-1}}{\fr x\lmd}}.$ Therefore, as $\lmd\to0^+$, we have $Z_\lmd\approx\fr1\lmd\int_{\lmd\fc h{n_0}}^\infty\e^{-x}k\p{\fr x\lmd}^{k-1}\d x =\fr k{\lmd^k}\int_{\lmd\fc h{n_0}}^\infty x^{k-1}\e^{-x}\,\d x \to\fr{\fc\Gma{k+1}}{\lmd^k},$ which indeed goes like $\lmd^{-k}$.
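As a check of this construction (including my reconstruction of the last step from the surrounding derivation), take $\fc fn\ceq n$ and $k=2$, so that $\fc hn=\p{n^2/2}^{1/2}=n/\sqrt2$ and the regularized sum should go like $\fc\Gma3/\lmd^2=2/\lmd^2$:

```python
import mpmath as mp

# f(n) = n with k = 2 gives h(n) = n/sqrt(2), so
# Z_lam = sum_{n>=1} n e^{-lam n/sqrt(2)}, with closed form x/(1-x)^2
def Z(lam):
    x = mp.e**(-lam/mp.sqrt(2))
    return x / (1 - x)**2

lam = mp.mpf('1e-5')
# prediction: Z_lam -> Gamma(k+1)/lam^k = 2/lam^2 as lam -> 0+
assert abs(Z(lam) * lam**2 / 2 - 1) < 1e-4
```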