This article is translated from a Chinese article on my Zhihu account. The original article was posted on 2021-04-25 at 10:06 +0800.
First, define the Lorenz curve: it is the curve consisting of all points $(u, v)$ such that the poorest $u$ portion of the population in the country owns $v$ portion of the total wealth.
The Gini coefficient $G$ is defined as the area between the Lorenz curve and the line $v = u$, divided by the area enclosed by the three lines $v = u$, $v = 0$, and $u = 1$.
Now, suppose the wealth distribution in the country is $p(x)$, where $p(x)\,\mathrm{d}x$ is the portion of the population that has wealth in the range $[x, x + \mathrm{d}x]$.
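Then, the Lorenz curve is the graph of the function $g$ defined as
$$g(F(x)) := \frac{1}{\mu}\int_{-\infty}^{x} t\,p(t)\,\mathrm{d}t,$$
where
$$F(x) := \int_{-\infty}^{x} p(t)\,\mathrm{d}t$$
is the cumulative distribution function of $p$, and
$$\mu := \int_{-\infty}^{+\infty} x\,p(x)\,\mathrm{d}x \tag{1}$$
is the average wealth of the population, which is just $\mathrm{E}[X]$ ($X$ is a random variable such that $X \sim p$).
Then, the Lorenz curve is
$$v = g(u), \qquad 0 \le u \le 1.$$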
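To make the definition concrete, here is a minimal sketch that builds the empirical Lorenz curve of a finite sample: the points $(u, v) = (i/n,\ \text{cumulative share of the poorest } i)$. The function name `lorenz_points` and the sample values are illustrative assumptions, not part of the original article.

```python
def lorenz_points(wealth):
    """Return the points (u, v) of the empirical Lorenz curve of a sample.

    u is the share of the population (poorest first),
    v is the share of total wealth owned by that share.
    Assumes non-negative wealth with a positive total.
    """
    xs = sorted(wealth)          # poorest first
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


sample = [1, 2, 3, 4]            # hypothetical wealth values
curve = lorenz_points(sample)
for u, v in curve:
    print(f"u = {u:.2f}, v = {v:.2f}")
```

For this sample the poorest half of the population owns 30% of the wealth, so the curve passes through $(0.5, 0.3)$ and lies below the line $v = u$.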
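According to the definition of the Gini coefficient (the triangle in the denominator has area $\tfrac{1}{2}$), and using the change of variable $u = F(x)$, $\mathrm{d}u = p(x)\,\mathrm{d}x$,
$$G = 2\int_0^1 \left(u - g(u)\right)\mathrm{d}u = 1 - 2\int_{-\infty}^{+\infty} g(F(x))\,p(x)\,\mathrm{d}x = 1 - \frac{2}{\mu}\int_{-\infty}^{+\infty} p(x)\int_{-\infty}^{x} t\,p(t)\,\mathrm{d}t\,\mathrm{d}x.$$
Interchange the order of integration, and we have
$$G = 1 - \frac{2}{\mu}\int_{-\infty}^{+\infty} t\,p(t)\int_{t}^{+\infty} p(x)\,\mathrm{d}x\,\mathrm{d}t = 1 - \frac{2}{\mu}\int_{-\infty}^{+\infty} t\,p(t)\left(1 - F(t)\right)\mathrm{d}t.$$
Substitute Equation 1, $\mu = \int_{-\infty}^{+\infty} x\,p(x)\,\mathrm{d}x$, into the above equation, and we have
$$G = 1 - \frac{2}{\mu}\,\mu + \frac{2}{\mu}\int_{-\infty}^{+\infty} t\,p(t)\,F(t)\,\mathrm{d}t = \frac{2}{\mu}\int_{-\infty}^{+\infty} x\,p(x)\,F(x)\,\mathrm{d}x - 1.$$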
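Now here is the neat part. Separate it into two parts, and write them in double integrals:
$$G = \frac{2}{\mu}\int_{-\infty}^{+\infty} x\,p(x)\,F(x)\,\mathrm{d}x - 1 = \frac{1}{\mu}\left(\int_{-\infty}^{+\infty} x\,p(x)\,F(x)\,\mathrm{d}x - \int_{-\infty}^{+\infty} x\,p(x)\left(1 - F(x)\right)\mathrm{d}x\right) = \frac{1}{\mu}\left(\int_{-\infty}^{+\infty}\int_{-\infty}^{x} x\,p(x)\,p(y)\,\mathrm{d}y\,\mathrm{d}x - \int_{-\infty}^{+\infty}\int_{x}^{+\infty} x\,p(x)\,p(y)\,\mathrm{d}y\,\mathrm{d}x\right).$$
Interchange the order of integration of the second term (and then swap the names of $x$ and $y$ in it), and we have
$$G = \frac{1}{\mu}\int_{-\infty}^{+\infty}\int_{-\infty}^{x} (x - y)\,p(x)\,p(y)\,\mathrm{d}y\,\mathrm{d}x = \frac{1}{2\mu}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} |x - y|\,p(x)\,p(y)\,\mathrm{d}y\,\mathrm{d}x = \frac{\mathrm{E}[|X - Y|]}{2\mu},$$
where $X$ and $Y$ are two independent random variables, both distributed according to $p$: $X, Y \sim p$. The middle equality holds because the integrand $|x - y|\,p(x)\,p(y)$ is symmetric in $x$ and $y$, so the half-plane $y < x$ contributes exactly half of the full integral.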
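From this result, we can easily see how the Gini coefficient represents statistical dispersion: it is half the mean absolute difference between two independently chosen members of the population, measured relative to the mean.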
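The identity $G = \mathrm{E}[|X - Y|] / (2\mu)$ can be checked numerically on a finite sample. The sketch below is only an illustration under stated assumptions: it treats the sample as the whole population, draws $X$ and $Y$ independently from it (so pairs with $X = Y$ are included), and computes the Lorenz area with the trapezoid rule. The function names are hypothetical.

```python
from itertools import product


def gini_lorenz(wealth):
    """Gini coefficient as 1 - 2 * (area under the empirical Lorenz curve)."""
    xs = sorted(wealth)
    n, total = len(xs), sum(xs)
    area, prev_v, running = 0.0, 0.0, 0.0
    for x in xs:
        running += x
        v = running / total
        area += (prev_v + v) / 2 / n   # trapezoid of width 1/n
        prev_v = v
    return 1 - 2 * area


def gini_mean_difference(wealth):
    """Gini coefficient as E|X - Y| / (2 * mu) over all ordered pairs."""
    n = len(wealth)
    mu = sum(wealth) / n
    mean_abs_diff = sum(abs(x - y) for x, y in product(wealth, repeat=2)) / n**2
    return mean_abs_diff / (2 * mu)


sample = [1, 2, 3, 4]                  # hypothetical wealth values
g_area = gini_lorenz(sample)
g_pairs = gini_mean_difference(sample)
print(g_area, g_pairs)
```

For this sample both computations give 0.25, up to floating-point rounding.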
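We can apply similar tricks to the variance $\sigma^2$:
$$\sigma^2 = \mathrm{E}[X^2] - \mathrm{E}[X]^2 = \int_{-\infty}^{+\infty} x^2\,p(x)\,\mathrm{d}x - \left(\int_{-\infty}^{+\infty} x\,p(x)\,\mathrm{d}x\right)^2.$$
Separate the first term into two halves, and write all three terms as double integrals (using $\int_{-\infty}^{+\infty} p(x)\,\mathrm{d}x = 1$):
$$\sigma^2 = \frac{1}{2}\iint x^2\,p(x)\,p(y)\,\mathrm{d}x\,\mathrm{d}y + \frac{1}{2}\iint y^2\,p(x)\,p(y)\,\mathrm{d}x\,\mathrm{d}y - \iint x y\,p(x)\,p(y)\,\mathrm{d}x\,\mathrm{d}y = \frac{1}{2}\iint (x - y)^2\,p(x)\,p(y)\,\mathrm{d}x\,\mathrm{d}y = \frac{1}{2}\mathrm{E}\!\left[(X - Y)^2\right],$$
where every double integral runs over the whole plane.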
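As a quick sanity check of $\sigma^2 = \tfrac{1}{2}\mathrm{E}[(X - Y)^2]$, the sketch below compares the population variance of a small sample with half the mean squared difference over all ordered pairs. It assumes the sample is the whole population (hence `pvariance`, not `variance`); the helper name is hypothetical.

```python
from itertools import product
from statistics import pvariance


def half_mean_squared_difference(values):
    """Return E[(X - Y)^2] / 2 for X, Y drawn independently from `values`."""
    n = len(values)
    return sum((x - y) ** 2 for x, y in product(values, repeat=2)) / (2 * n**2)


sample = [1, 2, 3, 4]                      # hypothetical data
var_direct = pvariance(sample)             # E[X^2] - E[X]^2
var_pairs = half_mean_squared_difference(sample)
print(var_direct, var_pairs)
```

Both values are 1.25 for this sample.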
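Then we can derive the relationship between the Gini coefficient and the variance. Since $\mathrm{E}[|X - Y|] = 2\mu G$ and $\mathrm{E}[(X - Y)^2] = 2\sigma^2$,
$$2\sigma^2 - 4\mu^2 G^2 = \mathrm{E}\!\left[(X - Y)^2\right] - \mathrm{E}[|X - Y|]^2 = \operatorname{Var}(|X - Y|) \ge 0.$$
In particular, for $\mu > 0$, the Gini coefficient is bounded by the coefficient of variation: $G \le \dfrac{\sigma}{\sqrt{2}\,\mu}$.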
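The link between $G$, $\mu$, $\sigma^2$ and the spread of $|X - Y|$ can also be checked numerically. The sketch below is an illustration only: it treats a small hypothetical sample as the whole population, takes $\sigma^2$ from `statistics.pvariance`, takes $G$ from the pairwise formula $\mathrm{E}[|X - Y|]/(2\mu)$, and compares $2\sigma^2 - 4\mu^2 G^2$ with the variance of $|X - Y|$ over all ordered pairs.

```python
from itertools import product
from math import sqrt
from statistics import pvariance

sample = [1, 2, 3, 4]                          # hypothetical wealth values
n = len(sample)
mu = sum(sample) / n

# |X - Y| for X, Y drawn independently from the sample (all ordered pairs).
abs_diffs = [abs(x - y) for x, y in product(sample, repeat=2)]

gini = (sum(abs_diffs) / len(abs_diffs)) / (2 * mu)   # G = E|X - Y| / (2 mu)
sigma2 = pvariance(sample)                             # sigma^2

lhs = 2 * sigma2 - 4 * mu**2 * gini**2
rhs = pvariance(abs_diffs)                             # Var(|X - Y|)
bound = sqrt(sigma2) / (sqrt(2) * mu)                  # sigma / (sqrt(2) mu)

print(lhs, rhs)
print(gini, bound)
```

For this sample both sides of the identity equal 0.9375, and $G = 0.25$ stays below the bound of roughly 0.316.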