Crunching PB data - step 1



  • It's pretty much the end of the season, so I thought I'd share some PB analysis I've worked up. It's based on the 2019/20 season up to and including the CL and EL quarter-finals.

    You will see various approaches to picking out key PB stats. What I wanted to do was ignore the actual wins and look at the probability of a PB win instead, taking the variance out of the equation.

    This first cut doesn't apply any weighting for matchday type, but I want to do that next, as it would give some players more weight than others. Those at top CL clubs, for example, should have more gold days.

    First up I looked at the PB winning scores by position and the quartiles for what scores achieved wins:

    Screenshot_20200817-235720__01.jpg
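
    For anyone wanting to reproduce the quartile step, here is a minimal sketch. The scores and position names are made up for illustration (they are not the figures from the screenshot), and the "inclusive" quartile method is an assumption, since the post doesn't say how the quartiles were calculated.

```python
from statistics import quantiles

# Hypothetical PB-winning scores per position (illustrative numbers only,
# not the real 2019/20 data).
winning_scores = {
    "Defender": [150, 170, 185, 200, 215, 240, 260],
    "Midfielder": [190, 210, 230, 250, 270, 295, 320],
    "Forward": [160, 180, 205, 220, 245, 265, 290],
}


def quartile_cuts(scores):
    """Return the three cut points (Q1, median, Q3) of a list of scores."""
    # method="inclusive" treats the list as the full population of winning scores.
    return quantiles(scores, n=4, method="inclusive")


cuts_by_position = {pos: quartile_cuts(s) for pos, s in winning_scores.items()}

for pos, (q1, q2, q3) in cuts_by_position.items():
    print(f"{pos}: Q1={q1:.0f}, median={q2:.0f}, Q3={q3:.0f}")
```

    The three cut points split each position's winning scores into four ranges, which is what the later steps work from.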

    Now, in these quartiles I looked at the number of PB wins as a % of the number of PB scores posted within that range. I think this gives some sort of rough win probability:

    Screenshot_20200817-235727__01.jpg
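
    The win % per range can be sketched roughly as below. This assumes every posted score is stored as a (score, won) pair and that a score sitting exactly on a cut point falls into the higher range. Both are my assumptions, and the numbers are invented.

```python
from bisect import bisect_right


def band_index(score, cuts):
    """Return which range a score falls in: 0 is below the first cut point."""
    return bisect_right(cuts, score)


def win_rate_by_band(posted, cuts):
    """Return PB wins / PB scores posted for each range (None if a range is empty).

    posted: iterable of (score, won) pairs for one position.
    cuts:   sorted cut points, e.g. the quartiles of the winning scores.
    """
    totals = [0] * (len(cuts) + 1)
    wins = [0] * (len(cuts) + 1)
    for score, won in posted:
        band = band_index(score, cuts)
        totals[band] += 1
        wins[band] += int(won)
    return [w / t if t else None for w, t in zip(wins, totals)]


# Hypothetical cut points and posted scores (illustrative only).
example_cuts = [200, 250, 300]
example_posted = [
    (150, False), (190, True),
    (220, True), (220, False),
    (260, True),
    (310, True),
]
example_rates = win_rate_by_band(example_posted, example_cuts)
print(example_rates)
```

    Returning None for an empty range keeps "no scores posted" separate from a genuine 0% win rate.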

    Finally, I looked at how many times each individual player posted scores within each range and weighted those counts by the win % in the chart above. I think this gives a rough dividend win potential, and it ignores the variance where some players get unlucky. This figure (PBIX) is a probability-weighted number of dividend wins per season:

    Screenshot_20200817-235733__01.jpg
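
    As a rough illustration of the weighting, the sketch below multiplies a player's count of scores in each range by that range's win % and sums the results. The function name, counts and win rates are all hypothetical, and this is one reading of the description rather than the exact calculation behind the chart.

```python
def expected_wins(band_counts, band_win_rates):
    """Sum of (scores posted in a range) x (win rate for that range)."""
    if len(band_counts) != len(band_win_rates):
        raise ValueError("need one win rate per range")
    return sum(count * rate for count, rate in zip(band_counts, band_win_rates))


# Hypothetical player: how many scores they posted in each of four ranges,
# lowest range first, and a made-up win rate for each range.
player_counts = [10, 5, 3, 2]
win_rates = [0.0, 0.1, 0.3, 0.6]

player_expected = expected_wins(player_counts, win_rates)
print(f"Probability-weighted wins: {player_expected:.2f}")
```

    Here the player would be credited with about 2.6 wins however many they actually landed, which is the point of taking the luck out.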

    The obvious limitations are that this is historic data, not predictive, and that it doesn't deal with positional changes (e.g. it uses Kimmich's data from while he was a defender, with clean sheets etc.). Strong PB players like Depay also miss out due to lack of games. The data is dripping with COVID too, which hopefully won't be the case next season.

    Anyways, will run a matchday weighted version of this too, so feel free to pick apart the logic here in the meantime.



  • This is great - cheers @TotalPunt. Data/Statistical Analysis like this is so useful if it can be understood so appreciate the information and the explanation.

