
A Better Bayes’ Rule

(Let me say upfront that this post has a little bit of math in it. For those of you who are not mathematically inclined, stay with this: it’s a very useful trick that you can do in 10 seconds and that will help you make better decisions. I’ll explain everything and keep it simple.)

Let’s say I’m considering investing in an early-stage startup (Company X) and I want to assess the probability that it will succeed. On the one hand, I know that most early startups fail, so investing in them is always risky. On the other hand, this particular company seems to have a lot going for it, so the evidence is compelling. How should I weigh these two things?

Answer: Bayes’ Rule. As a refresher, Bayes’ Rule allows you to answer two related questions: (a) what is the probability of ‘x’ being true given some evidence; and (b) if I had a prior belief about the probability of ‘x’ being true, how should I update that belief given new evidence? It should be pretty obvious why and how this could be helpful.

When I first learned about Bayes’ Rule in college, it intuitively struck me as both extremely important and useful. Over the years I’ve revisited it occasionally in an attempt to really drill it into my brain and hopefully get to a point where I would just naturally use it. But it never quite happened as the mental math involved was just a bit complex for me (I am not good at mental math).

Then one day a couple of years ago I came across the odds form of Bayes’ Rule. And it simplified everything. A lot. I won’t go through a detailed explanation of how it works or how it’s derived (if you want that, see here), but let me just show how I use it in practice now and how easy it is.

Play the Odds

First, a quick refresher on odds for those who need it. Let’s say I think there’s a 10% chance that my favorite team is going to win the game (and therefore a 90% chance that they won’t). The odds are 10:90 or 1:9. In other words, odds are just p(x will happen)/p(x won’t happen). To convert from odds of a:b back to a percentage, just calculate a/(a+b). In this example, it’s 1/(1+9) = 1/10 = 10%.
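The two conversions above can be sketched in a few lines of Python. This is only an illustration: the function names `prob_to_odds` and `odds_to_prob` are my own, and `Fraction` is used just to keep the 1:9 example exact.

```python
from fractions import Fraction


def prob_to_odds(p):
    """Odds in favor of an event: p(x will happen) / p(x won't happen)."""
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and less than 1")
    return p / (1 - p)


def odds_to_prob(a, b=1):
    """Convert odds of a:b back to a probability, a / (a + b)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("odds terms must be non-negative and not both zero")
    return a / (a + b)


# A 10% chance of winning, written as an exact fraction, is odds of 1:9.
team_odds = prob_to_odds(Fraction(1, 10))
print(team_odds)            # 1/9

# And odds of 1:9 convert back to a 10% chance.
print(odds_to_prob(1, 9))   # 0.1
```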

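OK, with that done, let’s move on to Bayes’ Rule. Staying with the startup example, write S for “the startup succeeds”, F for “the startup fails”, and E for the evidence I have about the company. The odds form of Bayes’ Rule says:

p(S|E) / p(F|E) = [p(E|S) / p(E|F)] × [p(S) / p(F)]

In words: posterior odds = likelihood ratio × prior odds.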

An Example

Let’s go through the example of trying to determine the probability that the startup will be a success:

Start with the ‘prior odds’ or the ‘base rate’: p(success)/p(failure) (i.e. the rightmost term in the above equation). Say I know from previous reading that the probability of a seed-stage startup being successful is 10%, so the odds are 10% : 90% = 1:9.

Next, assess the likelihood ratio, p(E|S) : p(E|F) (i.e. the middle term above). Let’s say this startup has a strong team, a compelling idea in a large and growing market, seems to have a unique take on the space, and is moving quickly.

p(E|S)
To evaluate the numerator, p(E|S), I ask myself “assume a random startup company ends up being a success. What is the probability that this company had all of the things in place that Company X has (at the same phase in their lifecycles)?” The answer is probably “almost all”. So let’s say 95%.

p(E|F)
For the denominator of the likelihood ratio, it’s almost the same question: “assume a random startup company ends up being a failure. What is the probability that such a company had all of the things in place that Company X has (again, at the same phase in their lifecycles)?” Actually, the answer is still probably “most” (even great teams fail regularly), so let’s say it’s 70%.

So now we have the likelihood ratio p(E|S) / p(E|F) = 95% / 70%, or ~ 1.35 : 1

So now we just multiply the likelihood ratio by the prior odds:

(1.35 : 1) × (1 : 9) = 1.35 : 9

So the odds that the company will be a success are 1.35:9. To convert that to a percentage, we calculate numerator / (numerator + denominator), so we have 1.35/(10.35) = ~13%.

This means that – given my rough assumptions – I should expect that this company has a 13% chance of succeeding.
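For anyone who would rather check the arithmetic in code than on a napkin, here is a minimal sketch of the same calculation. The function name `update_odds` is hypothetical, and the inputs are just the rough guesses from this example (10% base rate, 95% and 70% for the two likelihoods).

```python
def update_odds(prior_odds, p_e_given_s, p_e_given_f):
    """Odds form of Bayes' Rule: posterior odds = likelihood ratio * prior odds."""
    likelihood_ratio = p_e_given_s / p_e_given_f
    return likelihood_ratio * prior_odds


# Prior odds of success for a seed-stage startup: 10% : 90%, i.e. 1:9.
prior_odds = 0.10 / 0.90

# Rough guesses from the example: p(E|S) = 95%, p(E|F) = 70%.
post_odds = update_odds(prior_odds, p_e_given_s=0.95, p_e_given_f=0.70)

# Convert odds back to a probability: odds / (1 + odds).
post_prob = post_odds / (1 + post_odds)

print(f"posterior probability of success: {post_prob:.0%}")
```

Without rounding the likelihood ratio to 1.35 along the way, the result still comes out at roughly 13%.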

Why so low? To get a better understanding you can read the full articles, but I think of it this way: the base rate of success is very low. You have some evidence and you have to evaluate how much more likely that evidence is to show up for successful companies than for unsuccessful ones. In this example we’ve estimated that while almost all successful companies will demonstrate that evidence, so will most unsuccessful ones. For that reason, the evidence doesn’t sway us much away from our base rate.
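To see how much the answer depends on that second estimate, here is a small sketch that keeps the 10% base rate and p(E|S) = 95% fixed and varies p(E|F). The 70% figure is the one used above; the 35% and 10% values are hypothetical alternatives I made up for comparison, and `posterior_probability` is just an illustrative name.

```python
def posterior_probability(prior_prob, p_e_given_s, p_e_given_f):
    """Apply the odds form of Bayes' Rule and return a probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    likelihood_ratio = p_e_given_s / p_e_given_f
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)


# 0.70 comes from the example; 0.35 and 0.10 are made-up alternatives.
for p_e_given_f in (0.70, 0.35, 0.10):
    p = posterior_probability(0.10, 0.95, p_e_given_f)
    print(f"p(E|F) = {p_e_given_f:.0%} -> p(success | evidence) = {p:.0%}")
```

Under these assumptions the posterior moves from roughly 13% to about 23% and then about 51%. The rarer the evidence is among failed companies, the more it should move you off the base rate.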

The good thing about this odds form calculation is that I can do it very quickly on a napkin, in Excel, or (sometimes) even in my head. After I tried it once I found it easy to use. And now I use it all the time.

Which Way To Miss?

Policy, as in many areas of life, is about tradeoffs. To take a simple example, consider arguments that some conservatives and progressives might make regarding welfare. I’ve heard friends that lean progressive say things like “How can we let someone who is really trying and down on their luck go hungry? We need to increase the availability of SNAP [food stamps].” On the other side, I’ve heard friends that lean conservative say some version of “I’ve seen people who get food stamps just waste them on things like cookies, cake, soda and chips – we need to reduce their use.” Who is right?

The answer, of course, is that they both are. People come in all shapes and sizes. They also vary in their behaviors, values, and ethics. This is what makes policy so difficult: you have one policy, but how people behave in response to that policy can vary widely.

There are various ways to deal with this. One is to refine the policy. For example, current SNAP policy does not allow folks to buy alcoholic beverages with those funds. This can work well when there is fairly broad agreement that such a refinement makes sense. But this can easily end up getting very complicated as you attempt to refine further and further until you end up with a complex mess that is difficult for the consumer to understand and for the regulator to enforce, and where the interaction effects between the various rules can cause unintended outcomes. (Tax policy, anyone?)

Error Types

Beyond some basic “common sense” refinements, however, a better approach is simply to acknowledge that any policy is going to have some “error” in it, and to ask which type of error is more acceptable, and how much? This is basically the same thing as thinking about Type I and Type II model errors in hypothesis testing.

Using the example above, would you rather give food stamps to someone who didn’t really need them, or deny food stamps to someone who really did? To be clear, not everyone may agree on the answer to this question, but at least we’re now starting to have a real conversation.

Let’s say you think that it’s better to err on the side of being generous, even if it means some abuse of your generosity will happen. What ratio are you willing to accept? For example, if there’s one abuse for every 10,000 people you truly help, that seems reasonable. What if it’s 5 people helped for every 1 abuse? 1 to 1? What if it’s 5 abuses for every 1 person truly helped? What if it’s 10,000 abuses?

Standards of Proof

Some parts of our legal system are already explicitly like this (or at least try to be). The justice system has various ‘standards of proof’ that apply depending upon what’s going on. For example, a police officer is required to have a ‘reasonable suspicion’ before stopping and questioning an individual. ‘Probable cause’ is required to issue a search warrant or arrest someone. A ‘preponderance of evidence’ or ‘clear and convincing evidence’ is required in civil court (and sometimes in criminal). And ‘proof beyond a reasonable doubt’ is the standard required for a criminal conviction.

Source: DefenseWiki

By placing such a high bar for evidence, we as a society have made the choice that we would rather let a guilty party go free than convict an innocent one. According to Wikipedia, it is estimated that between 2.3 and 5 percent of all U.S. prisoners are innocent. Is that an acceptable error rate? That’s an open question, but at least it’s a tractable one.

I’m not saying that the details of individual policies don’t matter. Clearly they do. And of course there are other real considerations, such as cost. But when there is disagreement it may help to start the conversation by asking “which type of error are we more willing to make?” and “by how much?”