10 Comments
Ammar Ahmad's avatar

Love these posts where you share how stuff is calculated

Vivek's avatar

Thanks for explaining your RaaR methodology. I had a few questions:

1. Did you consider additional features as well for your RaaR model? For example, does the innings number (first innings or second) have an impact on expected runs, holding all else constant?

2. Also, intuitively it feels like the runs remaining (target - current runs) could have an impact on expected runs, for second innings specifically. So, did you consider building a separate model for chases specifically, with runs remaining as an additional feature?

3. How do you typically score this model, when calculating RaaR for a batter's innings? Do you do it ball-by-ball? For example, for each ball a batter plays, you use features up to the previous ball and then predict expected runs off that ball? And then sum it up across all balls played by the batter?

4. I am curious what that cyclic pattern in the 2nd chart represents? Is it like the first ball of every over?

cricketingview's avatar

It's estimated ball by ball, as you describe.
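A minimal sketch of the ball-by-ball scoring confirmed above: for each ball a batter faces, take the runs scored minus the expected runs (xR) for the match state before that ball, then sum over the batter's balls. The `Ball` record, its field names, and the xR numbers are assumptions made up for illustration; the real xR values would come from the model, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    batter: str
    runs: int        # runs the batter actually scored off this ball
    expected: float  # hypothetical xR for the match state before this ball


def raar_by_batter(balls):
    """Sum (actual - expected) runs over every ball each batter faced."""
    totals = {}
    for ball in balls:
        totals[ball.batter] = totals.get(ball.batter, 0.0) + (ball.runs - ball.expected)
    return totals


# Made-up xR values, for illustration only.
balls = [
    Ball("Batter A", 4, 1.2),
    Ball("Batter A", 0, 1.3),
    Ball("Batter B", 1, 1.0),
]
totals = raar_by_batter(balls)
```

Here Batter A ends up ahead of expectation over two balls, while Batter B is level with it.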

I did consider several options, one of them being building separate models for chasing and setting. I wasn't convinced that it was helpful. It's tempting to keep adding features, but this creates two potential problems. First, it increases the risk of double counting things. Second, it increases the likelihood that one will introduce arbitrary weights. Neither is desirable in a good model.

I had separate setting and chasing models in an earlier version of this. I don't think they provide any significant information. It's reasonable to assume that both the setting and chasing teams are playing to win. The model is trying to evaluate how the actual outcome varies from the average expectation in a given match state.

So I don't really care what tactical choices teams make. I want to be able to describe the consequences of those tactical choices and the trade-offs they represent. So, if a team chasing 170 is trying to get to 70/0 in 10 overs and keep wickets in hand for the last 10, then at 10 overs, the model should show that they're scoring slower than the expectation.

The model, if you notice, also does not care about dismissals. If a batter is dismissed on the 4th ball, then the RAAR is whatever it is over those 4 balls. If a batter remains not out after facing 4 balls, then it's whatever it is over those. The not-out figure will generally be better, since the dismissal typically means zero runs are scored off that ball. But there's no explicit penalty for being dismissed.
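To make the dismissal point concrete, here is a small sketch under assumed numbers: two batters face the same four balls with the same (made-up) xR values, and the only difference is that one is out on the 4th ball for no run while the other takes a single off it. The `raar` helper is hypothetical and has no dismissal term at all.

```python
def raar(actual_runs, expected_runs):
    """Sum of (actual - expected) over balls faced; nothing is subtracted for a wicket."""
    if len(actual_runs) != len(expected_runs):
        raise ValueError("need one expected value per ball faced")
    return sum(a - x for a, x in zip(actual_runs, expected_runs))


# Made-up xR values for four consecutive balls, for illustration only.
xr = [1.2, 1.2, 1.3, 1.3]

dismissed = raar([1, 4, 0, 0], xr)  # out on the 4th ball, so 0 runs off it
not_out = raar([1, 4, 0, 1], xr)    # survives the 4th ball and takes a single

gap = not_out - dismissed
```

The gap between the two is just the one run scored off the 4th ball, which is the sense in which the dismissal costs something without being penalised explicitly.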

Vivek's avatar

Thanks for responding!

One follow-up question re: the estimation methodology (ball-by-ball): wouldn't that produce a weird effect? If a batter has been scoring slowly for the past 10 balls, that dampens the overall team run rate as of that point. Since the team's overall run rate up to that point is a feature in the model (if I understood correctly), the model would estimate a lower xR value for the batter's 11th ball. This effect might compound over time and reward batters who bat slowly with a higher RaaR than should be the case (especially openers, where it might be most pronounced).

Thoughts on this?

cricketingview's avatar

Yes. Though if the batter has been scoring slowly for 10 balls, the individual score would already be behind xR for those 10 balls. Further, other factors, like balls remaining and wickets in hand, act in the other direction. It's not generally the case that the events on any one ball move every one of the four factors in the same direction.

Vivek's avatar

Makes sense! Thanks for the discussion!

Prahalad's avatar

Loved the very simplistic approach to this problem. Curious as to how it would stack up next to smart runs/batting impact metrics.

One potential issue I see is that it treats all matches equally regardless of situation, i.e. pitch conditions, chasing target, etc.?

Another question: how well does it value different roles at similar points of a match? Let's take Kohli and Salt, for example. For the brief time that Salt is present, would his RAAR be higher than Kohli's even if Kohli plays out the whole innings slowly on account of regular wickets falling at the other end?

cricketingview's avatar

If Kohli's using up balls at the cost of runs, then RAAR will reflect that. If Salt is scoring quickly, the RAAR will reflect that. The roles don't really exist. We imagine they exist. They're tactical choices some of the time. Limitations of ability most of the time. The point of the model is not to be fair to a particular player. It's to describe the competition.

The pitch conditions are reflected in the scoring rate. The chasing target is reflected in the scoring rate too.

Shubham S's avatar

Do you have data on shot types as well? Is it possible to answer questions like how often a batter should try to hit the ball for a boundary to maximize the RaaR? I remember your analysis from Tests that included shot choice.

cricketingview's avatar

This should be possible. Access to shot type records is intermittent though...
