Given your probability of breaking the combo at each note, what is the probability distribution of your max combo in the rhythm game chart? I considered the problem seriously!
As a rhythm game player, I often wonder what my max combo will be in my next play. This is a rather unpredictable quantity, so the best I can do is to work out a probability distribution of my max combo.
For those who are not familiar with rhythm games and also to make the question clearer, I state the problem in a more mathematical setting.
Consider a random bit string of length n∈N, where each bit is independent and has probability Y∈[0,1] of being 1. Let Pn,k(Y) be the probability that the length of the longest all-1 substring of the bit string is k∈N (where obviously Pn,k(Y) is nonzero only when k≤n). What is the expression of Pn,k(Y)?
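Before any derivation, the distribution in the problem statement can be checked by brute force for small $n$: enumerate all $2^n$ bit strings and accumulate each string's probability under its longest-run length. This is my own sanity-check sketch (the function name is not from the article):

```ruby
# Brute-force check of the definition: enumerate all 2**n bit strings,
# find each string's longest all-1 run, and accumulate its probability.
def longest_run_distribution(n, y)
  dist = Array.new(n + 1, 0.0)
  (0...(1 << n)).each do |bits|
    prob = 1.0
    longest = run = 0
    n.times do |i|
      if bits[i] == 1          # Integer#[] reads the i-th bit
        prob *= y
        run += 1
        longest = run if run > longest
      else
        prob *= 1 - y
        run = 0
      end
    end
    dist[longest] += prob
  end
  dist
end

p longest_run_distribution(3, 0.5) # [0.125, 0.5, 0.25, 0.125]
```

For $n = 3$ and $Y = 1/2$, every string has probability $1/8$; one string (000) has longest run 0, four have longest run 1, two have longest run 2, and one (111) has longest run 3, which gives the printed distribution.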
A more interesting problem is what the probability distribution tends to when $n\to\infty$. Define the random variable $\kappa := k/n$, where $k$ is the length of the longest all-1 substring. Define a parameter $y := Y^n$ (this parameter is held constant while $n\to\infty$). Define the probability distribution function of $\kappa$ as

$$f(y,\kappa) := \lim_{n\to\infty} (n+1)\, P_{n,\kappa n}\!\left(y^{1/n}\right). \tag{1}$$

What is the expression of $f(y,\kappa)$?
Notation
Notation for integer range: a…b denotes the integer range defined by the ends a (inclusive) and b (exclusive), or in other words {a,a+1,…,b−1}. It is defined to be empty if a≥b. The operator … has a lower precedence than + and − but a higher precedence than ∈.
The notation a..b denotes the inclusive integer range {a,a+1,…,b}. It is defined to be empty if a>b.
The case for finite n
A natural approach to find Pn,k is to try to find a recurrence relation of Pn,k for different n and k, and then use a dynamic programming (DP) algorithm to compute Pn,k for any given n and k.
The first DP approach
For a rhythm game player, the most straightforward way of finding k for a given bit string is to track the current combo, and update the max combo when the current combo is greater than the previous max combo.
To give the current combo a formal definition, denote each bit in the bit string as $b_i$, where $i \in 0…n$. Define the current combo $r_i$ as the length of the longest all-1 substring of the bit string ending just before index $i$ (exclusive), so $r_i = 0$ if $b_{i-1} = 0$, which is called a combo break:
$$r_i := \max\left\{r \in 0..i \mid \forall j \in i-r…i : b_j = 1\right\},$$
where i∈0..n.
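This bookkeeping can be transcribed directly into code (a sketch of my own; the variable names follow the text): scan the bits, reset the current combo on a 0, and record the running maximum.

```ruby
# Track the current combo r and the max combo k over a bit string.
def max_combo(bits)
  r = 0  # current combo
  k = 0  # max combo seen so far
  bits.each do |b|
    r = (b == 1 ? r + 1 : 0)  # a 0 bit is a combo break
    k = r if r > k
  end
  k
end

p max_combo([1, 1, 0, 1, 1, 1, 0, 1]) # => 3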
Now, use three numbers $(n, k, r)$ to define a DP state. Denote by $P_{n,k,r}$ the probability that the max combo is $k$ and the final combo ($r_n$) is $r$. Then, consider a transition from state $(n,k,r)$ to state $(n+1,k',r')$ by adding a new bit $b_n$ to the bit string. There are two cases:
If $b_n = 0$ (with probability $1-Y$), this is a combo break, so we have $r' = 0$ and $k' = k$.
If $b_n = 1$ (with probability $Y$), the combo continues, so we have $r' = r + 1$. The max combo is updated if necessary, so we have $k' = \max(k, r')$.
However, in the actual implementation of the DP algorithm, we need to reverse this transition and consider which states can lead to the current state $(n,k,r)$ (the bottom-up approach).
First, obviously in any possible case r∈0..k (currently we only consider the cases where n>k>0). Divide all those cases into three groups:
If $r = 0$, this means a combo break, so the last bit is 0, and the previous state can have any possible final combo $r'$. Therefore, it can be transitioned from any $(n-1, k, r')$ where $r' \in 0..k$. For each possible previous state, the probability of the transition to this new state is $1-Y$.
If r∈1..k−1, this means the last bit is 1, the previous final combo is r−1, and the previous max combo is already k. Therefore, the previous state is (n−1,k,r−1), and the probability of the transition is Y.
If r=k, this means the max combo may (or may not) have been updated. In either case, the previous final combo is r−1=k−1.
If the max combo is updated, the previous max combo must be k−1 because it must not be less than the previous final combo k−1 and must be less than the new max combo k. Therefore, the previous state is (n−1,k−1,k−1), and the probability of the transition is Y.
If the max combo is not updated, the previous max combo is the same as the new one, which is k. Therefore, the previous state is (n−1,k,k−1), and the probability of the transition is Y.
Therefore, we can write a recurrence relation that is valid when $n > k > 0$ (reconstructed here from the three cases above):

$$P_{n,k,r} = \begin{cases}
(1-Y) \sum_{r'=0}^{k} P_{n-1,k,r'}, & r = 0, \\
Y\, P_{n-1,k,r-1}, & r \in 1..k-1, \\
Y \left( P_{n-1,k-1,k-1} + P_{n-1,k,k-1} \right), & r = k.
\end{cases}$$
However, there are also other (mostly edge) cases because we assumed $n > k > 0$. In the meaningfulness condition $n \ge k \ge r \ge 0$ (a necessary condition for $P_{n,k,r}$ to be nonzero), each of the three inequalities can be either strict or an equality, so there are $2^3 = 8$ cases in total. Considering all of them (omitted here because they are trivial), we can write a recurrence relation that is valid for all $n, k, r$, covering all the edge cases:
Note that the probabilities for note count $n$ depend only on those for note count $n-1$, and that the probabilities for max combo $k$ and final combo $r$ depend only on those with a smaller max combo than $k$ or a smaller final combo than $r$ (except for the case $n > k > r = 0$, which can be handled specially before the current iteration over $k$ actually starts). Therefore, for the bottom-up DP we can reduce the space complexity from $O(n^3)$ to $O(n^2)$ by reducing the 3-dimensional DP to a 2-dimensional one. One thing needs care: the DP table must be updated from larger $k$ and $r$ to smaller $k$ and $r$, not the other way around, so that the values from the previous iteration over $n$ are left untouched while the current iteration still needs them.
After the final iteration in n finishes, we need to sum over the index r to get the final answer:
$$P_{n,k} = \sum_{r=0}^{k} P_{n,k,r}.$$
Writing the code for the DP algorithm is then straightforward. Here is an implementation in Ruby. In the code, `dp[k][r]` means $P_{n,k,r}$ in the $n$-th iteration.
```ruby
## Returns an array of size m+1,
## with the k-th element being the probability P_{m,k}.
def combo m
  (1..m).each_with_object [[1]] do |n, dp|
    dp[n] = [0]*n + [Y*dp[n-1][n-1]]                   # n = k > 0
    (n-1).downto 1 do |k|                              # n > k > 0
      dpk0 = (1-Y)*dp[k].sum
      dp[k][k] = Y*(dp[k-1][k-1] + dp[k][k-1])         # n > k = r > 0
      (k-1).downto(1) { |r| dp[k][r] = Y*dp[k][r-1] }  # n > k > r > 0
      dp[k][0] = dpk0                                  # n > k > r = 0
    end
    dp[0][0] *= 1 - Y                                  # n > k = r = 0
  end.map(&:sum)
end
```
Because of the three nested loops, the time complexity of the DP algorithm is O(n3).
The second DP approach
Here is an alternative way to use DP to solve the problem. Instead of building a DP table with the k,r indices, we can build a DP table with the n,k indices.
First, we need to rewrite the recurrence relation in terms of $P_{n,k}$ instead of $P_{n,k,r}$. We then need to try to express the $P_{n,k,r}$ terms using $P_{n,k}$ terms. The easiest part is the case $n \ge k = r = 0$. By recursively applying Equation 2 to $P_{n,0,0}$, we have

$$P_{n,0} = P_{n,0,0} = (1-Y)^n. \tag{3}$$
We can then substitute Equation 3 and 4 into the above equation. The substitution of Equation 3 can be done without a problem, but the substitution of Equation 4 requires some care because of the different cases.
If n−k>k, then only the case n−r>k in Equation 4 will be involved in the summation.
If n−k≤k, then both cases in Equation 4 will be involved in the summation. To be specific, for j∈1..2k−n+1, we need the case n−r≤k in Equation 4 (where the summed terms are just zero and can be omitted); for other terms in the summation, we need the other case.
Considering both cases, we may realize that we can just modify the range of the summation to j∈max(1,2k−n+1)..k and adopt the case n−r>k in Equation 4 for all terms in the summation. Therefore, we have
where in the last line we changed the summation index to $k' := k - j + 1$ to simplify it. Because $P_{n-k-1,0} = P_{n-k-1,0,0} = (1-Y)^{n-k-1}$ according to Equation 3, we can combine the two terms into one summation to get the final result for $n > k = r > 0$:

$$P_{n,k,k} = Y^k (1-Y) \sum_{k'=0}^{\min(k,\,n-k-1)} P_{n-k-1,k'}. \tag{5}$$
Noticing the obvious fact that $\sum_{k=0}^{n} P_{n,k} = 1$, the above equation can be simplified, when $k \ge n-k-1$, to

$$P_{n,k,k} = Y^k (1-Y). \tag{6}$$
This simplification is not especially useful, but it can simplify the calculation in the program.
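Equation 5 can be spot-checked numerically against brute-force enumeration before we build on it. This is my own check (all helper names are mine): get $P_{n,k,r}$ by enumerating bit strings, then compare $P_{n,k,k}$ with $Y^k (1-Y) \sum_{k'} P_{n-k-1,k'}$.

```ruby
# joint[k][r]: probability that the max combo is k and the final combo is r,
# computed by enumerating all 2**n bit strings.
def joint_distribution(n, y)
  joint = Array.new(n + 1) { Array.new(n + 1, 0.0) }
  (0...(1 << n)).each do |bits|
    prob = 1.0
    longest = run = 0
    n.times do |i|
      if bits[i] == 1
        prob *= y
        run += 1
        longest = run if run > longest
      else
        prob *= 1 - y
        run = 0
      end
    end
    joint[longest][run] += prob
  end
  joint
end

y = 0.6
n = 8
(1...n).each do |k|
  lhs = joint_distribution(n, y)[k][k]
  p_prev = joint_distribution(n - k - 1, y).map(&:sum) # the P_{n-k-1,k'}
  rhs = y**k * (1 - y) * (0..[k, n - k - 1].min).sum { |j| p_prev[j] }
  raise "Equation 5 fails at k=#{k}" unless (lhs - rhs).abs < 1e-12
end
puts "Equation 5 holds for n=#{n}, Y=#{y}"
```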
Then, for n>k>0, express Pn,k in terms of Pn,k,r by summing over r, and substitute previous results:
Then, we can write the program to calculate Pn,k:
```ruby
## Returns an array of size m+1,
## with the k-th element being the probability P_{m,k}.
def combo m
  (1..m).each_with_object [[1]] do |n, dp|
    dp[n] = (1..n-1).each_with_object [(1-Y)**n] do |k, dpn|
      dpn[k] = (1-Y) * (Y**k * (0..[k, n-k-1].min).sum { dp[n-k-1][_1] } +
                        (0..[k-1, n-k-1].min).sum { Y**_1 * dp[n-_1-1][k] })
    end
    dp[n][n] = Y**n
  end.last
end
```
This algorithm has the same (asymptotic) space and time complexity as the previous one.
Polynomial coefficients
We have written programs to calculate the probabilities $P_{n,k}(Y)$ for a given $Y$, which we assumed to be a floating-point number. However, floating-point numbers have limited precision, so the calculation may be inaccurate. Actually, the calculation can be done symbolically.
The probability $P_{n,k}$ is a polynomial of degree (at most) $n$ in $Y$, and the coefficients of the polynomial are integers. This can be easily proven by mathematical induction using Equation 7. Therefore, we can calculate the coefficients of the polynomial $P_{n,k}(Y)$ instead of calculating the value directly, so that we get a symbolic but exact result.
Both DP algorithms above can be modified to calculate the coefficients of the polynomial. Actually, we could define Y to be a polynomial object that supports arithmetic with other polynomials and with numbers, and then the programs would run without any modification. Here, I will modify the second DP algorithm to calculate the coefficients.
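A minimal polynomial class along those lines might look as follows (my own sketch, not the article's code): just enough arithmetic for the earlier DP code to run with Y as a symbol instead of a number.

```ruby
# A tiny dense-polynomial class; c[i] is the coefficient of Y**i.
class Poly
  attr_reader :c

  def initialize(c)
    @c = c
  end

  def to_poly(o)
    o.is_a?(Poly) ? o : Poly.new([o])
  end

  def +(o)
    o = to_poly(o)
    Poly.new((0...[c.size, o.c.size].max).map { |i| (c[i] || 0) + (o.c[i] || 0) })
  end

  def -(o)
    self + to_poly(o) * Poly.new([-1])
  end

  def *(o)
    o = to_poly(o)
    r = Array.new(c.size + o.c.size - 1, 0)
    c.each_with_index { |a, i| o.c.each_with_index { |b, j| r[i + j] += a * b } }
    Poly.new(r)
  end

  def **(e)
    e.times.inject(Poly.new([1])) { |acc, _| acc * self }
  end

  def coerce(n) # lets expressions like 1 - y work
    [Poly.new([n]), self]
  end
end

y = Poly.new([0, 1])       # the symbol Y
p ((1 - y) * (1 - y)).c    # => [1, -2, 1], i.e. 1 - 2Y + Y**2
```

With integer coefficients throughout, all arithmetic stays exact, which is the whole point of going symbolic.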
We can also utilize Equation 6 to simplify the calculation. Considering the edge cases involved in min(k,n−k−1) and min(k−1,n−k−1), there are three cases we need to consider:
Case k>n−k−1: Equation 6 can be applied, and r is summed to n−k−1.
Case k=n−k−1 (can only happen when n is odd): Equation 6 can be applied, and r is summed to k−1.
Case k<n−k−1: Equation 6 cannot be applied, and r is summed to k−1.
Then, use arrays to store the coefficients of the polynomial Pn,k(Y), and we can write the program to calculate the coefficients:
```ruby
## Returns a nested array of size m+1 times m+1,
## with the j-th element of the k-th element being the coefficient of Y^j in P_{m,k}(Y).
def combo_pc m
  (1..m).each_with_object [[[1]]] do |n, dp|
    dp[n] = Array.new(n+1) { Array.new n+1, 0 }
    # dp[n][0] = (1-Y)**n
    0.upto(n-1) { dp[n][0][_1] = dp[n-1][0][_1] } # will be multiplied by 1-Y later
    1.upto n/2-1 do |k|
      # dp[n][k] = (1-Y) * (Y**k * (0..k).sum { |j| dp[n-k-1][j] } +
      #                     (0..k-1).sum { |r| Y**r * dp[n-r-1][k] })
      0.upto(k) { |j| 0.upto(n-k-1) { dp[n][k][_1+k] += dp[n-k-1][j][_1] } }
      0.upto(k-1) { |r| 0.upto(n-r-1) { dp[n][k][_1+r] += dp[n-r-1][k][_1] } }
    end
    if n % 2 == 1
      k = n/2
      # dp[n][k] = (1-Y) * (Y**k + (0..k-1).sum { |r| Y**r * dp[n-r-1][k] })
      dp[n][k][k] = 1
      0.upto(k-1) { |r| 0.upto(n-r-1) { dp[n][k][_1+r] += dp[n-r-1][k][_1] } }
    end
    ((n+1)/2).upto n-1 do |k|
      # dp[n][k] = (1-Y) * (Y**k + (0..n-k-1).sum { |r| Y**r * dp[n-r-1][k] })
      dp[n][k][k] = 1
      0.upto(n-k-1) { |r| 0.upto(n-r-1) { dp[n][k][_1+r] += dp[n-r-1][k][_1] } }
    end
    0.upto(n-1) { |k| n.downto(1) { dp[n][k][_1] -= dp[n][k][_1-1] } } # multiply by 1-Y
    # dp[n][n] = Y**n
    dp[n][n][n] = 1
  end.last
end
```
Here are the first few polynomials $P_{n,k}(Y)$ calculated by the above program:
When evaluating the polynomials for large $n$, the result is inaccurate for $Y$ not close to 0 because of the limited precision of floating-point numbers. If $Y$ is close to 1, we can instead find the coefficients of $P_{n,k}(1-X)$ and then substitute $X := 1-Y$.
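One way to carry out that substitution on a coefficient array (a sketch with my own helper name): expand $\sum_j a_j (1-X)^j$ with the binomial theorem, so the coefficient of $X^l$ is $\sum_j a_j \binom{j}{l} (-1)^l$.

```ruby
# Given the coefficients of P(Y), return the coefficients of P(1-X),
# so the polynomial can be evaluated accurately near Y = 1.
def shift_to_one_minus(coeffs)
  out = Array.new(coeffs.size, 0)
  coeffs.each_with_index do |a, j|
    binom = 1 # C(j, 0)
    0.upto(j) do |l|
      out[l] += a * binom * (-1)**l
      binom = binom * (j - l) / (l + 1) # next binomial coefficient C(j, l+1)
    end
  end
  out
end

# P(Y) = Y**2  =>  P(1-X) = 1 - 2X + X**2
p shift_to_one_minus([0, 0, 1]) # => [1, -2, 1]
```

Because the binomial coefficients are built up multiplicatively in integer arithmetic, the transformed coefficients remain exact integers.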
Plots of the probability distributions
Here are some plots of the probability distribution of max combo k when n=50:
The plots are intuitive: they show that one is more likely to get a higher max combo when one has a higher success rate.
There is a suspicious jump in $P_{n,k}(Y)$ near $k = n/2$ when $Y$ is close to 1. Let us look at it more closely:
In the zoomed-in plot, we can also see a jump in the first derivative (w.r.t. $k$) of $P_{n,k}(Y)$ near $k = n/3$. These jumps will be modeled in later sections when we discuss the case $n \to \infty$.
The case when n→∞
A natural approach is to try substituting Equation 7 into Equation 1 to get a functional equation for the unknown function $f(y,\kappa)$. First, we can easily write down the case $y = 0$: zero success rate means the only possible max combo is zero, so
$$f(y=0,\kappa) = \delta(\kappa). \tag{8}$$
Similarly, we can easily write the case when y=1:
$$f(y=1,\kappa) = \delta(\kappa-1). \tag{9}$$
From now on, we only consider the case when 0<y<1. First, for the case κ=0, according to Equation 7,
The $\infty$ means that there is a Dirac $\delta$ function there. Actually, it is easy to see that there must be a $y\,\delta(\kappa-1)$ term in the expression of $f(y,\kappa)$, because the probability of getting a full combo ($\kappa = 1$) is $y$.
Define
$$h(y,\kappa) := f(y,\kappa) - y\,\delta(\kappa-1), \tag{10}$$
and then we can get rid of the infinity here.
From now on, we only consider the case when 0<y<1 and 0<κ<1. According to Equation 7,
where Equation 13 is utilized when finding the first term. We can try to solve Equation 14 by using the Adomian decomposition method (ADM). Suppose $g_1$ can be written as a series
then we can equate the corresponding terms of the two series. If the sum $g_1 = \sum_{i=0}^{\infty} g_1^{(i)}$ converges, then this is a guess of the solution to Equation 14, which we can then verify.
Using Equation 15, we can find the first few terms in the series by direct integration. The first few terms are
Sum up the terms, and we have

$$\begin{aligned}
g_1(y,\kappa) &= \sum_{i=0}^{\infty} g_1^{(i)}(y,\kappa) \\
&= \lim_{q\to\infty} \sum_{i=0}^{q} -\ln y \left( y^{\kappa-1} + \frac{1}{i!}\left(\ln y^{\kappa-1}\right)^i - \sum_{j=0}^{i-1} \frac{1}{j!}\left(\ln y^{\kappa-1}\right)^j \right) \\
&= -\ln y \left( \exp \ln y^{\kappa-1} + \lim_{q\to\infty} \left( (q+1)\, y^{\kappa-1} - \sum_{j=0}^{q} \sum_{i=j+1}^{q} \frac{1}{j!}\left(\ln y^{\kappa-1}\right)^j \right) \right) \\
&= -\ln y \left( y^{\kappa-1} + \lim_{q\to\infty} \left( (q+1)\, y^{\kappa-1} - \sum_{j=0}^{q} \frac{q-j}{j!}\left(\ln y^{\kappa-1}\right)^j \right) \right) \\
&= -\ln y \left( y^{\kappa-1} + \lim_{q\to\infty} q \left( y^{\kappa-1} - \sum_{j=0}^{q} \frac{1}{j!}\left(\ln y^{\kappa-1}\right)^j \right) + y^{\kappa-1} + \ln y^{\kappa-1} \exp \ln y^{\kappa-1} \right) \\
&= -\ln y \left( 2 + \ln y^{\kappa-1} \right) y^{\kappa-1}.
\end{aligned}$$
Therefore, we have the final guess of the solution

$$g_1(y,\kappa) = -\ln y \left( 2 + \ln y^{\kappa-1} \right) y^{\kappa-1}. \tag{16}$$
We can substitute it into Equation 14 to verify that it is indeed the solution.
The case $\kappa \in \left(\frac13, \frac12\right)$

In this case, we have

$$\min\left(\frac{\kappa}{1-\kappa},\, 1\right) = \frac{\kappa}{1-\kappa} \in \left(\frac12, 1\right).$$
We can then use the same method as in the previous case to find the solution. First, by Equation 14,
Equation 17 can again be solved by ADM though the calculation is much more complicated than the previous case. We may guess g2=∑i=0∞g2(i) is the solution if the series converges, where
The first few terms are too long to be written here before one can see the pattern, so they are omitted. If you want to see them, use mathematical software, and you should be able to find the pattern after calculating the first six (or so) terms. After looking at the first few terms, the guessed general term is
Equation 22 can determine Bq,0 for all q∈2…∞ once B2,0 is determined. The relationship between B1,0 and B2,0 cannot be described by Equation 22, but is given by
$$B_{2,0} = 1 - B_{1,0}. \tag{23}$$
Equate the coefficients in Line (**) with the corresponding ones on the LHS, and we have
Equate the coefficients in Line (***) with the corresponding ones on the LHS, and we have
$$A_{s,l} = \frac{s}{s-1} B_{s-1,l} + B_{s,l}, \qquad A_{s,s} = B_{s,s}.$$
By Equation 21, Bs,l=As,l−Bs,l+1 for l∈0…s, and As,s=Bs,s is always true. Therefore,
$$0 = \frac{s}{s-1} B_{s-1,l} - B_{s,l+1}.$$
This equation is true for any s∈2..q and l∈0…s. Because q is arbitrary, we can change the variable s to q and the equation tells us exactly the same information. Therefore,
$$B_{q,l} = \frac{q}{q-1} B_{q-1,l-1}, \qquad q \in 2…\infty,\ l \in 1..q. \tag{25}$$
Equation 22, 23, 24, and 25 are sufficient to determine Bq,l for all q∈1…∞ and l∈0..q up to one arbitrary parameter. Define the arbitrary parameter
Actually, one may find $b = 0$ by simply comparing with the results in Equation 16, 18, or 19. Another way to find $b$ is to compare with Equation 13. Here I will show the latter approach.
Now we have covered almost all cases. The only cases that we have not covered are $\kappa = \frac1q$, where $q \in 2…\infty$. The discontinuity in $g$ at $\kappa = \frac1q$ is
Therefore, for $q \in 3…\infty$, the function $g$ has a well-defined limit at $\kappa = \frac1q$, and the value of $g$ there should just be that limit. Now, the only problem is at $\kappa = \frac12$: we should determine whether the value of $g$ at $\kappa = \frac12$ is its left limit or its right limit.
Looking at Equation 12, one may see that the discontinuity at $\kappa = \frac12$ is due to the Dirac $\delta$ function in the integrand. Therefore, whether $g$ at $\kappa = \frac12$ equals $g_1$ or $g_2$ depends on whether the Dirac $\delta$ function lies within the integration interval. If it does, then $g$ at $\kappa = \frac12$ is $g_1$; otherwise, it is $g_2$.

The inclusion of the Dirac $\delta$ function in the integration interval corresponds to the inclusion of the highest term in the summation in Equation 7. Because both $\min(k, n-k-1)$ and $\min(k-1, n-k-1)$ equal $n-k-1$ when $n = 2k$, the highest term in the summation is reached, so the Dirac $\delta$ function is within the integration interval. Therefore, $g$ at $\kappa = \frac12$ is $g_1$.
Therefore, we may conclude that for any $\kappa \in (0,1)$,

$$g(y,\kappa) = g_{\left\lceil \frac{1}{\kappa} \right\rceil - 1}(y,\kappa). \tag{29}$$
Another edge case that is interesting to consider is $\kappa \to 0^+$. However, because the domain of $g$ does not include $\kappa = 0$ by definition, we do not need to consider this case. By some mathematical analysis, one may prove that the limit of $g$ as $\kappa \to 0^+$ is 0.
The solution
Substitute Equation 27 into Equation 29, and we have
Here are plots of the function f(y,κ) whose expression is given by Equation 30:
We can compare it with a plot of the distributions when n is finite (say, 100), and we may see that they are very close:
We have not investigated the asymptotic behavior of the error when approximating the finite-$n$ distribution by the infinite-$n$ one, but we may expect the error to be small enough for practical use when $n$ is a typical note count of a rhythm game chart (usually at least 500).
Moments
It may be interesting to calculate the moments of the distribution.
Now, the only problem is how to get Dν,p,l. Substitute Equation 26 into Equation 31, and after some calculations, we can get the general formula of Bs,l,p:
Then, the following steps will be extremely tedious, and I doubt there will be a closed form for our final result, so I will not continue to find the general formula for the moments.
However, we may obtain the first moment (mean) analytically. We have
which is intuitive. (This function tends to 0 very slowly as $y \to 0^+$, so slowly that I almost did not believe it when I first did the numerical calculation.)
The plot:
We should also be able to find other statistical quantities like the median, the mode, and the variance, but they do not seem to have closed forms.
Some interesting observations
The probability distribution of $\kappa$ seems to tend to a uniform distribution plus a Dirac $\delta$ distribution when $y$ is very close to 1. This phenomenon is clearly visible in the plot of $f(y=0.9, \kappa)$.
In other words, the distribution seems like
$$f(y \approx 1, \kappa) \approx (1-y)\, U\!\left(\tfrac12, 1\right) + y\, \delta(\kappa-1),$$

where $U(a,b)$ denotes the uniform distribution on the interval $[a,b]$.
This can be justified by expanding $f(y,\kappa)$ in a Taylor series in $1-y$ and retaining the first-order terms only. Note that

$$y^a (\ln y)^b = (y-1)^b \left( 1 + \left( \tfrac{b}{2} - a \right)(1-y) + \cdots \right),$$

so the only cases where the Taylor series has a non-zero first-order term are $b = 1$ and $b = 0$. In Equation 30, we can see that the power on $\ln y$ is at least one for each term (because of the overall $\ln y$ factor in front), so only the terms with no $\ln y$ factors besides the overall one contribute a first-order term. In that case, the first-order term is proportional to $y-1$, and the proportionality coefficient is just the coefficient in front of the term in $f$, which is independent of $\kappa$ because $\kappa$ appears only in the power index of $y$.
Therefore, we may see that only the $q=1$ and $q=2$ terms have a non-zero first-order term, and they are respectively $-2(y-1)$ and $2(y-1)$. This means that when $y$ is very close to 1,

$$f(y \approx 1, \kappa) \approx \begin{cases} 2(1-y), & \kappa \in \left(\frac12, 1\right), \\ 0, & \kappa \in \left(0, \frac12\right). \end{cases}$$

This is exactly the uniform distribution.
There is an intuitive way to explain the appearance of the uniform distribution. When $y$ is very close to 1, the probability of a combo break at each note ($1-Y$) is already very small, so it is very unlikely that there are two or more combo breaks. Assume there is only one combo break; it may appear anywhere with equal probability. The combo break cuts the string of notes into two pieces, and the length of the larger piece is the max combo, which is uniformly distributed between half the note count and the full note count.
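This one-break picture can be checked mechanically (a small sketch of my own): if the single break falls on each of the $n$ positions with equal probability, the larger of the two remaining pieces takes each length from just above $n/2$ up to $n-1$ equally often.

```ruby
# A break at position pos splits n notes into pieces of lengths
# pos and n-1-pos; tally the larger piece over all break positions.
n = 1000
counts = Hash.new(0)
n.times { |pos| counts[[pos, n - 1 - pos].max] += 1 }

p counts.keys.minmax  # => [500, 999]
p counts.values.uniq  # => [2]  (every length occurs equally often)
```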
Every rhythm game player knows: never celebrate too early. You never know whether you will miss near the end. It is then interesting to know what is the probability of getting almost a full combo, i.e. what is the probability of getting κ very close to 1.
If we take the limit of $f(y,\kappa)$ as $\kappa \to 1^-$, it is

$$f(y, \kappa \to 1^-) = -2y \ln y.$$
There is a peak of this probability density at y=e−1. Therefore, when y=e−1, the probability of getting κ very close to 1 is the largest.
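The peak location follows from elementary calculus: setting the derivative of the density limit to zero,

```latex
\frac{\mathrm{d}}{\mathrm{d}y}\left(-2y\ln y\right) = -2\left(\ln y + 1\right) = 0
\quad\Longrightarrow\quad y = e^{-1}.
```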
When does $y = e^{-1}$ hold, then? Because

$$y = Y^n = \left(1 - \frac{n_b}{n}\right)^n,$$

where $n_b$ is the average number of combo breaks, $y$ tends to $e^{-n_b}$ as $n \to \infty$. Therefore, the probability of getting almost a full combo is the highest when your average number of combo breaks is exactly one.
From the plot, it seems that the probability of getting $\kappa$ a little above $\frac12$ is always higher than that of getting $\kappa$ a little below $\frac12$. According to Equation 28, the jump in $f(y,\kappa)$ at $\kappa = \frac12$ is

$$f\!\left(y, \kappa \to \tfrac12^+\right) - f\!\left(y, \kappa \to \tfrac12^-\right) = -2y \ln y.$$

Interestingly, this coincides with $f(y, \kappa \to 1^-)$.
Define
$$y_0(\kappa) := \operatorname*{arg\,max}_{y \in [0,1]} f(y,\kappa),$$

and then it seems that $y_0 : [0,1] \to [0,1]$ is injective but not surjective. It is strictly increasing, and there is a jump at $\kappa = \frac12$ and at $\kappa = 1$.

It has an elementary expression on $\left[\frac12, 1\right)$:

$$y_0\!\left(\kappa \in \left[\tfrac12, 1\right)\right) = \exp \frac{1 - 2\kappa + \sqrt{2\kappa^2 - 2\kappa + 1}}{\kappa(\kappa - 1)}.$$
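Evaluating this expression numerically (assuming my reading of the formula, with the square root over $2\kappa^2 - 2\kappa + 1$): $y_0$ is indeed increasing on $\left[\frac12, 1\right)$, and its limit as $\kappa \to 1^-$ agrees with the peak $y = e^{-1}$ found earlier.

```ruby
# Evaluate y0 on [1/2, 1) and check monotonicity and the kappa -> 1 limit.
def y0(kappa)
  Math.exp((1 - 2 * kappa + Math.sqrt(2 * kappa**2 - 2 * kappa + 1)) /
           (kappa * (kappa - 1)))
end

samples = [0.5, 0.6, 0.75, 0.9, 0.999].map { |k| y0(k) }
p samples.each_cons(2).all? { |a, b| a < b }  # strictly increasing on the samples
p (y0(0.9999) - Math.exp(-1)).abs < 1e-3      # approaches e**-1 as kappa -> 1
```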
Some applications
In Phigros, one should combo at least 60% of the notes to get a white V rank. If on average I have one combo break in a chart that has 1300 notes, what is the probability of comboing at least 60% of the notes in the chart?
Solution. The success rate is

$$Y = \frac{1300 - 1}{1300}, \qquad y = Y^{1300} \approx e^{-1}.$$
The probability of comboing more than 60% of the notes is
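The number can also be estimated by direct simulation, independently of the closed form (my own Monte Carlo sketch, not the article's evaluation): play the 1300-note chart many times with success rate $1299/1300$ and count how often the max combo reaches 60% of the notes.

```ruby
# Monte Carlo estimate of P(max combo >= 60% of n) with per-note
# success probability y; seeded so the run is reproducible.
def estimate(n, y, trials, seed)
  rng = Random.new(seed)
  threshold = (0.6 * n).ceil
  hits = 0
  trials.times do
    longest = run = 0
    n.times do
      if rng.rand < y
        run += 1
        longest = run if run > longest
      else
        run = 0
      end
    end
    hits += 1 if longest >= threshold
  end
  hits.to_f / trials
end

p estimate(1300, 1299.0 / 1300, 5_000, 42) # roughly 0.77
```

A quick cross-check of that figure: with about one break on average, the break count is roughly Poisson(1), and summing over 0, 1, 2, … breaks (a single break leaves the larger piece above 60% in 80% of positions) gives approximately $0.37 + 0.29 + 0.09 + \cdots \approx 0.77$.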