This article is translated from a Chinese article on my Zhihu account. The original article was posted at 2021-04-25 10:06 +0800.


First, define the Lorenz curve: it is the curve consisting of all points $(u,v)$ such that the poorest fraction $u$ of the population in the country owns the fraction $v$ of the total wealth.
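For a finite population, the vertices of the Lorenz curve can be read off directly from this definition. The Python below is a minimal sketch, not from the original article. It assumes that each of $n$ people counts as $1/n$ of the population and that wealth values are integers with a positive total. The helper name `lorenz_points` is hypothetical. It returns only the vertices $(k/n, v_k)$; between vertices the curve is usually drawn as straight segments.

```python
from fractions import Fraction


def lorenz_points(wealth):
    """Return the vertices (u, v) of the Lorenz curve of a finite population.

    Hypothetical helper: each person counts as 1/n of the population, so the
    k poorest people (u = k/n) own the fraction v of the total wealth.
    Assumes integer wealth values with a positive total.
    """
    xs = sorted(wealth)  # poorest first
    n, total = len(xs), sum(xs)
    points = [(Fraction(0), Fraction(0))]
    cumulative = 0
    for k, x in enumerate(xs, start=1):
        cumulative += x
        points.append((Fraction(k, n), Fraction(cumulative, total)))
    return points


if __name__ == "__main__":
    for u, v in lorenz_points([3, 1]):
        print(f"poorest {u} own {v} of the wealth")
```

For wealth `[3, 1]` the vertices are $(0,0)$, $(\tfrac12,\tfrac14)$ and $(1,1)$.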

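The Gini coefficient $G/\mu$ is defined as the area between the Lorenz curve and the line $u=v$ divided by the area of the triangle enclosed by the three lines $u=v$, $v=0$, and $u=1$ (this triangle has area $\frac12$). Here $\mu$ is the average wealth defined below, so $G$ is the Gini coefficient multiplied by $\mu$.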
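The area definition can be sketched for a finite population. This minimal Python illustration, not from the original article, makes three assumptions: each of $n$ people has weight $1/n$, wealth values are integers with a positive total, and the Lorenz curve is linear between its vertices, so the trapezoidal rule gives the area under it exactly. The helper name `gini_from_area` is hypothetical. It returns the Gini coefficient itself, that is, $G/\mu$ in this article's notation.

```python
from fractions import Fraction


def gini_from_area(wealth):
    """Gini coefficient = (area between u = v and the Lorenz curve) / (1/2).

    Hypothetical helper. Assumes a finite population of equally weighted
    people with integer wealth and a positive total, and a Lorenz curve that
    is linear between its vertices, so the trapezoidal rule is exact.
    """
    xs = sorted(wealth)
    n, total = len(xs), sum(xs)
    area_under_lorenz = Fraction(0)
    prev_v, cumulative = Fraction(0), 0
    for x in xs:
        cumulative += x
        v = Fraction(cumulative, total)
        # Trapezoid of width 1/n between consecutive vertices.
        area_under_lorenz += Fraction(1, n) * (prev_v + v) / 2
        prev_v = v
    # The triangle bounded by u = v, v = 0 and u = 1 has area 1/2.
    area_between = Fraction(1, 2) - area_under_lorenz
    return area_between / Fraction(1, 2)


if __name__ == "__main__":
    print(gini_from_area([5, 5, 5]))   # perfect equality
    print(gini_from_area([0, 0, 10]))  # one person owns everything
```

Perfect equality gives $0$. If one of $n$ people owns everything, the result is $1-1/n$ (here $\frac23$), which approaches $1$ as $n$ grows.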

Now, suppose the wealth distribution in the country is $p(X)$, where $p(x)\,\mathrm dx$ is the fraction of the population whose wealth lies in the range $[x,x+\mathrm dx]$.

Then, the Lorenz curve is the graph of the function $g$ defined by
$$g(F(x))=\frac1\mu\int_{-\infty}^xtp\!\left(t\right)\mathrm dt,$$
where
$$F\!\left(x\right)\coloneqq\int_{-\infty}^xp\!\left(t\right)\mathrm dt$$
is the cumulative distribution function of $p(X)$, and
$$\mu\coloneqq\int_{-\infty}^{+\infty}tp\!\left(t\right)\mathrm dt\tag{1}$$
is the average wealth of the population, which is just $\mathrm E[X]$ (where $X$ is a random variable such that $X\sim p(X)$).

Writing $u=F(x)$, the Lorenz curve is
$$v=g(u)\coloneqq\frac1\mu\int_{-\infty}^{F^{-1}(u)}tp\!\left(t\right)\mathrm dt.$$
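As a numerical sanity check of this formula, the sketch below evaluates $g(u)$ by quadrature for an example density that is not from the original article: an exponential wealth distribution $p(t)=\lambda e^{-\lambda t}$ for $t\ge0$. For this density $F^{-1}(u)=-\ln(1-u)/\lambda$ and $\mu=1/\lambda$, and integrating by parts by hand gives $g(u)=u+(1-u)\ln(1-u)$. The function `lorenz_g` and its parameters are hypothetical. The lower limit $-\infty$ is replaced by $0$ because this density vanishes for negative wealth, and composite Simpson's rule is just one convenient choice of quadrature.

```python
import math


def lorenz_g(u, pdf, quantile, mean, lower=0.0, steps=2_000):
    """Evaluate g(u) = (1/mu) * integral of t p(t) dt from `lower` to F^{-1}(u).

    Hypothetical helper. `lower` stands in for -infinity and must lie at or
    below the support of p. Uses composite Simpson's rule with an even
    number of `steps`. Valid for 0 <= u < 1 when F^{-1}(1) is infinite.
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    upper = quantile(u)
    h = (upper - lower) / steps

    def integrand(t):
        return t * pdf(t)

    total = integrand(lower) + integrand(upper)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * integrand(lower + i * h)
    return total * h / 3 / mean


if __name__ == "__main__":
    lam = 2.0  # assumed rate of the example exponential distribution
    for u in (0.1, 0.5, 0.9):
        numeric = lorenz_g(
            u,
            pdf=lambda t: lam * math.exp(-lam * t),
            quantile=lambda s: -math.log(1 - s) / lam,
            mean=1 / lam,
        )
        closed_form = u + (1 - u) * math.log(1 - u)
        print(f"u={u}: numeric {numeric:.10f}, closed form {closed_form:.10f}")
```

The closed form does not involve $\lambda$, which reflects the fact that the Lorenz curve is unchanged when all wealth is rescaled.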

According to the definition of the Gini coefficient (the reference triangle has area $\frac12$, and $G$ is $\mu$ times the ratio of areas),
$$
\begin{align*}
G&\coloneqq2\mu\int_0^1\left(u-g(u)\right)\mathrm du\\
&=\mu-2\mu\int_0^1g\!\left(u\right)\mathrm du\\
&=\mu-2\int_{u=0}^1\int_{t=-\infty}^{F^{-1}(u)}tp\!\left(t\right)\mathrm dt\,\mathrm du.
\end{align*}
$$
Interchange the order of integration (the condition $t\le F^{-1}(u)$ is equivalent to $u\ge F(t)$), and we have
$$
\begin{align*}
G&=\mu-2\int_{t=-\infty}^{+\infty}\int_{u=F(t)}^1tp\!\left(t\right)\mathrm du\,\mathrm dt\\
&=\mu-2\int_{-\infty}^{+\infty}\left(1-F(t)\right)tp\!\left(t\right)\mathrm dt.
\end{align*}
$$
Substitute Equation 1 into the above equation, and we have
$$
\begin{align*}
G&=\int_{-\infty}^{+\infty}2tF\!\left(t\right)p\!\left(t\right)\mathrm dt-\mu\\
&=\int_{-\infty}^{+\infty}\left(2F\!\left(t\right)-1\right)tp\!\left(t\right)\mathrm dt\\
&=\int_0^1\left(2u-1\right)F^{-1}\!\left(u\right)\mathrm du,
\end{align*}
$$
where the last step substitutes $u=F(t)$, so that $\mathrm du=p(t)\,\mathrm dt$ and $t=F^{-1}(u)$.

Now here is the neat part. Using $2u-1=u-(1-u)$, separate the integral into two parts and write each as a double integral:
$$
\begin{align*}
G&=\int_0^1uF^{-1}\!\left(u\right)\mathrm du-\int_0^1\left(1-u\right)F^{-1}\!\left(u\right)\mathrm du\\
&=\int_{u_2=0}^1\int_{u_1=0}^{u_2}F^{-1}\!\left(u_2\right)\mathrm du_1\,\mathrm du_2
-\int_{u_1=0}^1\int_{u_2=u_1}^1F^{-1}\!\left(u_1\right)\mathrm du_2\,\mathrm du_1.
\end{align*}
$$
Interchange the order of integration of the second term, and we have
$$
\begin{align*}
G&=\int_{u_2=0}^1\int_{u_1=0}^{u_2}\left(F^{-1}\!\left(u_2\right)-F^{-1}\!\left(u_1\right)\right)\mathrm du_1\,\mathrm du_2\\
&=\frac12\int_{u_2=0}^1\int_{u_1=0}^1\left|F^{-1}\!\left(u_2\right)-F^{-1}\!\left(u_1\right)\right|\mathrm du_1\,\mathrm du_2\\
&=\frac12\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\left|x_2-x_1\right|p\!\left(x_1\right)p\!\left(x_2\right)\mathrm dx_1\,\mathrm dx_2\\
&=\frac12\mathrm E\!\left[\left|X_2-X_1\right|\right].
\end{align*}
$$
The second step holds because $F^{-1}$ is non-decreasing, so the integrand of the first line is non-negative on $u_1\le u_2$, while $\left|F^{-1}(u_2)-F^{-1}(u_1)\right|$ is symmetric in $u_1$ and $u_2$. The third step substitutes $u_1=F(x_1)$ and $u_2=F(x_2)$. Here $X_1$ and $X_2$ are two independent random variables, each with probability density $p$: $\left(X_1,X_2\right)\sim p\!\left(X_1\right)p\!\left(X_2\right)$.
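The chain of identities can be checked exactly on a small example. The sketch below is an illustration, not part of the original derivation. It treats a finite list of integer wealth values with a positive total as an equally weighted empirical distribution. For such a distribution, $F^{-1}$ is a step function equal to the $k$-th smallest value on $\left(\frac{k-1}n,\frac kn\right]$, and the Lorenz curve is piecewise linear. $X_1$ and $X_2$ are drawn independently with replacement, so all $n^2$ ordered pairs count, including a person paired with themselves. Under these assumptions, three expressions agree exactly:

- the definition $2\mu\int_0^1(u-g(u))\,\mathrm du$;
- the quantile form $\int_0^1(2u-1)F^{-1}(u)\,\mathrm du$;
- $\frac12\mathrm E[|X_2-X_1|]$.

The function name `g_three_ways` is hypothetical.

```python
from fractions import Fraction


def g_three_ways(wealth):
    """Compute G (= mu times the Gini coefficient) three ways for a finite population.

    Hypothetical helper. Assumes integer wealth values with a positive total,
    each person weighted 1/n.
    """
    xs = sorted(wealth)
    n, total = len(xs), sum(xs)
    mu = Fraction(total, n)

    # (a) Definition: G = 2*mu * integral of (u - g(u)) du = mu - 2*mu * area under g.
    # The Lorenz curve is linear between vertices, so trapezoids are exact.
    area_under_g = Fraction(0)
    prev_v, cumulative = Fraction(0), 0
    for x in xs:
        cumulative += x
        v = Fraction(cumulative, total)
        area_under_g += Fraction(1, n) * (prev_v + v) / 2
        prev_v = v
    g_definition = mu - 2 * mu * area_under_g

    # (b) Quantile form: F^{-1}(u) = xs[k-1] on ((k-1)/n, k/n], and the
    # integral of (2u - 1) over that interval is (2k - 1 - n) / n^2.
    g_quantile = sum(
        (Fraction(x * (2 * k - 1 - n), n * n) for k, x in enumerate(xs, start=1)),
        Fraction(0),
    )

    # (c) Half the mean absolute difference over all n^2 ordered pairs.
    g_pairs = Fraction(sum(abs(a - b) for a in xs for b in xs), 2 * n * n)

    return g_definition, g_quantile, g_pairs


if __name__ == "__main__":
    print(g_three_ways([2, 5, 11]))
```

For wealth `[2, 5, 11]` all three give $G=2$. With $\mu=6$, the Gini coefficient is $G/\mu=\frac13$.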

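This result makes it clear why the Gini coefficient measures statistical dispersion. $G$ is half the expected absolute difference between the wealth of two randomly chosen people, and dividing by $\mu$ makes the measure independent of the overall scale of wealth.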
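As a quick illustration (an assumed example, not from the original article), take an exponential wealth distribution $p(x)=\lambda e^{-\lambda x}$ for $x\ge0$, so that $\mu=1/\lambda$. The difference $X_2-X_1$ of two independent copies then has a Laplace distribution with scale $1/\lambda$, whose mean absolute value is $1/\lambda$, so
$$
\begin{align*}
G&=\frac12\mathrm E\!\left[\left|X_2-X_1\right|\right]=\frac1{2\lambda},\\
\frac G\mu&=\frac{1/(2\lambda)}{1/\lambda}=\frac12.
\end{align*}
$$
The Gini coefficient does not depend on $\lambda$: rescaling everyone's wealth multiplies both $G$ and $\mu$ by the same factor.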

We can apply similar tricks to the variance $\sigma_X^2$:
$$
\begin{align*}
\sigma_X^2&=\mathrm E\!\left[X^2\right]-\mathrm E\!\left[X\right]^2\\
&=\int_{-\infty}^{+\infty}t^2p\!\left(t\right)\mathrm dt
-\left(\int_{-\infty}^{+\infty}tp\!\left(t\right)\mathrm dt\right)^2\\
&=\int_0^1F^{-1}\!\left(u\right)^2\,\mathrm du
-\left(\int_0^1F^{-1}\!\left(u\right)\mathrm du\right)^2.
\end{align*}
$$
Split the first term into two halves, and write all three terms as double integrals:
$$
\begin{align*}
\sigma_X^2&=\frac12\int_0^1F^{-1}\!\left(u_2\right)^2\,\mathrm du_2\int_0^1\mathrm du_1\\
&\phantom{=~}{}-\int_0^1F^{-1}\!\left(u_1\right)\mathrm du_1\int_0^1F^{-1}\!\left(u_2\right)\mathrm du_2\\
&\phantom{=~}{}+\frac12\int_0^1F^{-1}\!\left(u_1\right)^2\,\mathrm du_1\int_0^1\mathrm du_2\\
&=\frac12\int_0^1\int_0^1
\left(F^{-1}\!\left(u_2\right)^2-2F^{-1}\!\left(u_1\right)F^{-1}\!\left(u_2\right)+F^{-1}\!\left(u_1\right)^2\right)
\mathrm du_1\,\mathrm du_2\\
&=\frac12\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}
\left(x_2-x_1\right)^2p\!\left(x_1\right)p\!\left(x_2\right)\mathrm dx_1\,\mathrm dx_2\\
&=\frac12\mathrm E\!\left[\left(X_2-X_1\right)^2\right].
\end{align*}
$$
Then we can derive the relationship between the Gini coefficient and the variance. Apply $\sigma_Y^2=\mathrm E\!\left[Y^2\right]-\mathrm E\!\left[Y\right]^2$ to $Y=\left|X_2-X_1\right|$. By the two results above, $\mathrm E\!\left[Y^2\right]=2\sigma_X^2$ and $\mathrm E\!\left[Y\right]=2G$, which gives
$$2\sigma_X^2-4G^2=\sigma_{\left|X_2-X_1\right|}^2.$$
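Both the variance identity and the final relationship can be checked exactly on an empirical distribution. This is a minimal Python sketch, not from the original article. It assumes a finite list of integer wealth values, each weighted $1/n$, with $X_1$ and $X_2$ drawn independently from it, so all $n^2$ ordered pairs count. It also takes $\sigma_X^2$ to be the population variance (dividing by $n$, not $n-1$). The function name `variance_identities` is hypothetical.

```python
from fractions import Fraction


def variance_identities(wealth):
    """Return both sides of sigma^2 = E[(X2-X1)^2]/2 and 2 sigma^2 - 4 G^2 = Var|X2-X1|.

    Hypothetical helper for a finite, equally weighted population.
    """
    xs = list(wealth)
    n = len(xs)
    mu = Fraction(sum(xs), n)
    var_x = sum((x - mu) ** 2 for x in xs) / n  # population variance

    diffs = [abs(a - b) for a in xs for b in xs]  # |X2 - X1| over all n^2 ordered pairs
    mean_abs = Fraction(sum(diffs), n * n)  # E|X2 - X1| = 2G
    mean_sq = Fraction(sum(d * d for d in diffs), n * n)  # E[(X2 - X1)^2]
    G = mean_abs / 2
    var_abs = mean_sq - mean_abs ** 2  # Var|X2 - X1|

    return (var_x, mean_sq / 2), (2 * var_x - 4 * G ** 2, var_abs)


if __name__ == "__main__":
    (lhs1, rhs1), (lhs2, rhs2) = variance_identities([2, 5, 11])
    print(lhs1, rhs1)
    print(lhs2, rhs2)
```

For `[2, 5, 11]`, both sides of the first identity equal $\sigma_X^2=14$, and both sides of the second equal $12$.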