
What Causal Inference Can Tell Us About Hiring

One area I’ve gotten interested in lately is causal inference. For those of you not familiar, it’s a methodology that attempts to find and validate cause-and-effect relationships between variables. The key is that it does this using observational data, without having to rely on controlled experiments. (For an introduction aimed at the casual reader, I highly recommend Judea Pearl’s book The Book of Why.)

One concept I found particularly interesting is the collider. A collider is a variable that is the effect of two or more other variables. As a simple example, consider a diagram in which money, talent, and looks each have an arrow pointing into fame.

The way to read this is that fame is a function (or effect) of money, talent, and looks. In other words, fame = f(money, talent, looks). In this example, fame is a collider relative to money, talent, and looks because all of their arrows point into it.

The interesting implication from the book is the following: if you hold the level of a collider constant, the other variables become dependent on each other, even though there is no causal link between them.

To understand this better, let’s use an even simpler example: X + Y = Z. In this case, Z is a function of X and Y (i.e. Z = f(X, Y)), so Z is a collider with respect to X and Y.

Here’s the key point: if we fix the value of Z at some specific value (say 10, so we’re left with the relationship X + Y = 10), then X and Y become correlated. In other words, if I know the value of X (say 8), then I can infer Y (i.e. 2).

The interesting finding from causal inference is that this dynamic generalizes. Said another way, for a given level of Z, information about X automatically gives me some information about Y, even if I can’t observe Y directly.
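
To see this concretely, here’s a minimal simulation sketch (Python with numpy; the specific numbers are arbitrary). X and Y start out independent, but once we condition on their sum Z sitting near a fixed value, they become strongly negatively correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # X: drawn independently
y = rng.normal(size=100_000)  # Y: drawn independently of X
z = x + y                     # Z is a collider: Z = f(X, Y)

# Unconditionally, X and Y are (nearly) uncorrelated.
print(np.corrcoef(x, y)[0, 1])              # ~ 0.0

# Condition on the collider: keep only samples where Z is near 1.
mask = np.abs(z - 1.0) < 0.1
print(np.corrcoef(x[mask], y[mask])[0, 1])  # strongly negative, close to -1
```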

Almost Famous

Why is this interesting? Let’s go back to our fame example. Assuming our causal model is valid, we can say that for a given level of fame, if we know something about a person’s wealth, we can infer something about their looks and talent. If we simplify for a moment to include just looks and talent, then – for a given level of fame – we’d expect that a more attractive person is likely to be less talented. (Another way to think about this: if they were both attractive and talented, they’d be even more famous.)

I haven’t done an analysis to verify this yet, but it would be an interesting experiment to run. For example, look on social media for actors who have a similar number of followers (as a proxy for fame). Within that cohort, if the model is valid, you would see a spectrum ranging from the good-looking-but-hacky to the talented-but-ugly.

Counterintuitive Hiring

This finding has interesting implications in many places. Take hiring: consider a hypothesis that seniority_level = f(skill, likability). If you think both skill and likability are positively correlated with seniority level, then – for a given level of seniority – the most skilled person is likely to be the one you personally like the least.
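
The same kind of toy simulation as before makes the point (a sketch under made-up assumptions; the variable names and weights are mine, not a real model of hiring):

```python
import numpy as np

rng = np.random.default_rng(1)
skill = rng.normal(size=100_000)
likability = rng.normal(size=100_000)  # independent of skill by construction
seniority = skill + likability + rng.normal(scale=0.5, size=100_000)

# Within a narrow band of seniority, skill and likability trade off.
band = np.abs(seniority - 2.0) < 0.25
print(np.corrcoef(skill[band], likability[band])[0, 1])  # clearly negative
```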

These are of course toy examples; the causal structure of real life is likely to be much more complex. But they illustrate both the power of causal analysis and the sometimes counterintuitive truths behind the way the world works.

Questions to Make Decisions by (Part I)

Most decisions can be ranked along two dimensions: importance and difficulty to reverse. Easy-to-reverse decisions should be made quickly or delegated; the same goes for difficult-to-reverse but unimportant ones. But for important and difficult-to-reverse decisions, you need to think carefully. Here are some questions I ask myself when making these kinds of decisions:

Outcome Clarity

  1. What is the ideal outcome?
  2. What outcome(s) do I definitely want to avoid? How can I do that?

Scenario Planning

  1. If this is going to fail (or end in a bad outcome), what are the most likely reasons why?
  2. For each reason, are there likely to be any ‘early warning’ signs?
  3. Are there any actions I can take that can mitigate those risks?
  4. If the risk does manifest, what’s my “Plan B”?
  5. If this ends up going extremely well, what are the most likely reasons why?
  6. Are there likely to be any ‘early momentum’ signs?
  7. Say things go really well. Then what? What risks / problems can success cause?
  8. What steps could I take that might mitigate those?
  9. How is this likely to affect other stakeholders? What incentives / reasons might they have to support or oppose?

Hypothesis Testing

  1. Why do I think that ideal outcome isn’t possible (if I think it isn’t)?
  2. For each reason listed in #1, state the opposite. How can I make that true?
  3. I currently believe X. What evidence would change my mind?
  4. Is there a way for me to obtain that evidence?

External Inputs

  1. Who can I talk to or ask who has some expertise in this area (where expertise ideally means having done something very similar, successfully, at least three times)?
  2. Who is most likely to disagree with my decision? What is their reasoning? Are they wrong?
  3. Are there historical examples or statistics that can help show how similar decisions have turned out in the past, and why?
  4. Is there a relevant comparison class? What is the base rate?
  5. Who are three people that I admire in this domain (dead or alive)? If I can’t talk to them, what do I imagine they would tell me?

Prioritizing

  1. When I’m old, what path/scenario am I most likely to regret?
  2. What are the three most important criteria by which to evaluate this decision?
  3. Logic aside, how do I feel about each option? Why?
  4. What are the three pieces of information that would be the most helpful to know?

Meta Questions

  1. Take a step back and think creatively. Is there another option I’m not considering?
  2. Is there a way to ‘test’ one or more options in a low-risk/cost way that gets me more information?
  3. Is there a decision that provides me more or less optionality in the future?
  4. What’s the opportunity cost?
  5. What’s my real problem here?

Timing

  1. Do I need to make this decision now?
  2. Am I likely to benefit from delaying making this decision?
  3. What is the risk of delaying this decision?
  4. Once I’ve made up my mind, is there any reason I can’t wait 24 hours before acting on it (just in case)?

A Better Bayes’ Rule

(Let me say upfront that this post has a little bit of math in it. For those of you who are not mathematically inclined, stay with me: this is a very useful trick that you can do in 10 seconds, and it will help you make better decisions. I’ll explain everything and keep it simple.)

Let’s say I’m considering investing in an early-stage startup (Company X) and I want to assess the probability that it will succeed. On the one hand, I know that most early startups fail, so investing in them is always risky. On the other hand, this particular company seems to have a lot going for it, so the evidence is compelling. How should I weigh these two things?

Answer: Bayes’ Rule. As a refresher, Bayes’ Rule allows you to answer two related questions: (a) what is the probability of ‘x’ being true given some evidence; and (b) if I had a prior belief about the probability of ‘x’ being true, how should I update that belief given new evidence. It should be pretty obvious why and how this could be helpful.

When I first learned about Bayes’ Rule in college, it intuitively struck me as both extremely important and useful. Over the years I’ve revisited it occasionally in an attempt to really drill it into my brain and hopefully get to a point where I would just naturally use it. But it never quite happened, as the mental math involved was just a bit too complex for me (I am not good at mental math).

Then one day a couple of years ago I came across the odds form of Bayes’ Rule. And it simplified everything. A lot. I won’t go through a detailed explanation of how it works or how it’s derived (if you want that, see here), but let me show how I can use it in practice and how easy it is.

Play the Odds

First, a quick refresher on odds for those who need it. Let’s say I think there’s a 10% chance that my favorite team is going to win the game (and therefore a 90% chance that they won’t). The odds are 10:90, or 1:9. In other words, odds are just p(x will happen) : p(x won’t happen). To convert from odds of a:b back to a percentage, just calculate a/(a+b). In this example, that’s 1/(1+9) = 1/10 = 10%.

OK, with that done, let’s move on to Bayes’ Rule. Staying with the startup example – where S = success, F = failure, and E = the evidence we observe – the odds form of Bayes’ Rule says that the posterior odds equal the likelihood ratio times the prior odds:

p(S|E) : p(F|E) = [ p(E|S) : p(E|F) ] × [ p(S) : p(F) ]

An Example

Let’s go through the example of trying to determine the probability that the startup will be a success:

Start with the ‘prior odds’ or the ‘base rate’: p(S) : p(F) (i.e. the rightmost term in the above equation). From previous reading, say I know that the probability of a seed-stage startup being successful is 10%, so the odds are 10% : 90% = 1:9.

Next, assess the likelihood ratio, p(E|S) : p(E|F) (i.e. the middle term above). Let’s say this startup has a strong team, a compelling idea in a large and growing market, seems to have a unique take on the space, and is moving quickly.

p(E|S)
To evaluate the numerator, p(E|S), I ask myself: “Assume a random startup ends up being a success. What is the probability that it had all of the things in place that Company X has (at the same phase in their lifecycles)?” The answer is probably “almost all”. So let’s say 95%.

p(E|F)
For the denominator of the likelihood ratio, it’s almost the same question: “Assume a random startup ends up being a failure. What is the probability that such a company had all of the things in place that Company X has (again, at the same phase in their lifecycles)?” The answer is still probably “most” – even great teams fail regularly – so let’s say it’s 70%.

So now we have the likelihood ratio: p(E|S) : p(E|F) = 95% : 70%, or ~1.35 : 1.

So now we just multiply the likelihood ratio by the prior odds:

(1.35 : 1) × (1 : 9) = 1.35 : 9

So the odds that the company will be a success are 1.35:9. To convert that to a percentage, we calculate numerator / (numerator + denominator): 1.35/(1.35 + 9) = 1.35/10.35 ≈ 13%.

This means that – given my rough assumptions – I should expect that this company has a 13% chance of succeeding.
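
For anyone who wants to sanity-check the arithmetic, here is the whole calculation as a tiny Python sketch (the function name is mine; the inputs are the same rough estimates as above):

```python
# Odds-form Bayes: posterior odds = likelihood ratio * prior odds.
def posterior_probability(prior_p, p_e_given_s, p_e_given_f):
    prior_odds = prior_p / (1 - prior_p)          # e.g. 0.10 -> 1:9
    likelihood_ratio = p_e_given_s / p_e_given_f  # e.g. 0.95 / 0.70
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)  # odds a:1 -> a/(a+1)

print(posterior_probability(0.10, 0.95, 0.70))    # ~0.131, i.e. ~13%
```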

Why so low? To get a better understanding you can read the full article, but I think of it this way: the base rate of success is very low. You have some evidence, and you have to evaluate how much more likely that evidence is to show up for successful companies than for unsuccessful ones. In this example, we’ve estimated that while almost all successful companies will demonstrate that evidence, so will most unsuccessful ones. For that reason, the evidence doesn’t sway us much away from our base rate.

The good thing about the odds form is that I can do the calculation very quickly on a napkin, in Excel, or (sometimes) even in my head. After I tried it once, I found it easy to use. And now I use it all the time.

Which Way To Miss?

Policy, as in many areas of life, is about tradeoffs. To take a simple example, consider the arguments that some conservatives and progressives make regarding welfare. I’ve heard friends who lean progressive say things like “How can we let someone who is really trying and down on their luck go hungry? We need to increase the availability of SNAP [food stamps].” On the other side, I’ve heard friends who lean conservative say some version of “I’ve seen people who get food stamps just waste them on things like cookies, cake, soda and chips – we need to reduce their use.” Who is right?

The answer, of course, is that they both are. People come in all shapes and sizes. They also vary in their behaviors, values, and ethics. This is what makes policy so difficult: you have one policy, but how people behave in response to that policy can vary widely.

There are various ways to deal with this. One is to refine the policy. For example, current SNAP policy does not allow recipients to buy alcoholic beverages with those funds. This can work well when there is fairly broad agreement that such a refinement makes sense. But it can easily get very complicated: as you refine further and further, you end up with a complex mess that is difficult for the consumer to understand and for the regulator to enforce, and where the interaction effects between the various rules can cause unintended outcomes. (Tax policy, anyone?)

Error Types

Beyond some basic “common sense” refinements, however, a better approach is simply to acknowledge that any policy is going to have some “error” in it, and to ask which type of error is more acceptable, and by how much. This is essentially the same as thinking about Type I and Type II errors in hypothesis testing.

Using the example above, would you rather give food stamps to someone who didn’t really need them, or deny food stamps to someone who really did? To be clear, not everyone will agree on the answer to this question, but at least we’re now starting to have a real conversation.

Let’s say you think it’s better to err on the side of being generous, even if it means your generosity will sometimes be abused. What ratio are you willing to accept? For example, if there’s one abuse for every 10,000 people you truly help, that seems reasonable. What if it’s 5 people helped for every 1 abuse? 1 to 1? What if it’s 5 abuses for every 1 person truly helped? What if it’s 10,000?
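
One way to make that tradeoff concrete is to simulate an eligibility threshold and count both error types. Here is a small sketch (Python with numpy; the distributions, noise level, and cutoffs are made-up assumptions, not real SNAP data):

```python
import numpy as np

rng = np.random.default_rng(0)
true_need = rng.normal(size=100_000)                        # actual (unobservable) need
measured = true_need + rng.normal(scale=0.5, size=100_000)  # noisy means test
truly_needy = true_need > 0.0

for cutoff in (-0.5, 0.0, 0.5):                       # higher cutoff = stricter policy
    granted = measured > cutoff
    wrongly_granted = np.sum(granted & ~truly_needy)  # "abuses" (Type I-style error)
    wrongly_denied = np.sum(~granted & truly_needy)   # needy people missed (Type II-style)
    print(f"cutoff {cutoff:+.1f}: wrongly granted {wrongly_granted}, "
          f"wrongly denied {wrongly_denied}")
```

Tightening the cutoff reduces one error only by increasing the other; the policy question is which ratio you can live with.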

Standards of Proof

Some parts of our legal system already work explicitly like this (or at least try to). In the justice system there are various known ‘standards of proof’ that apply depending upon what’s at stake. For example, a police officer is required to have ‘reasonable suspicion’ before stopping and questioning an individual. ‘Probable cause’ is required to issue a search warrant or arrest someone. A ‘preponderance of the evidence’ or ‘clear and convincing evidence’ is required in civil court (and sometimes in criminal court). And ‘proof beyond a reasonable doubt’ is the standard required for a criminal conviction.

[Figure: the spectrum of standards of proof. Source: DefenseWiki]

By setting such a high bar for evidence, we as a society have made the choice that we would rather let a guilty party go free than convict an innocent one. According to Wikipedia, it is estimated that between 2.3 and 5 percent of all U.S. prisoners are innocent. Is that an acceptable error rate? That’s an open question, but at least it’s a tractable one.

I’m not saying that the details of individual policies don’t matter. Clearly they do. And of course there are other real considerations, such as cost. But when there is disagreement it may help to start the conversation by asking “which type of error are we more willing to make?” and “by how much”?

Obama on Decision Making

What I’m thinking about:

“One thing I learned as president was that the decisions I had to make were so weighty and consequential, the pace so unyielding, that it was easy to feel almost removed from myself. But the time I spent away from my desk, especially with my wife and kids — whether coaching Sasha’s basketball team or date night with Michelle — was a crucial, daily reminder of who I fundamentally was as a person. This was so important, because we bring our whole selves to the decisions we make. And those decisions, in turn, both reflect and determine who we are.”

Barack Obama, The Promised Land

The Value of Options

When making a decision, we often gravitate immediately to a specific option and then evaluate that option in a vacuum. Instead, I find it’s helpful to force myself to lay out at least 2-3 options and weigh their pros and cons. This is not so different from the A/B Hypothesis Method I try to use. The difference is that whereas that process takes place quickly in my head, this one is meant for larger decisions, often involving a group.

Attention as the Constraint

Every system has only one or a few constraints that keep it from performing better. Sometimes the constraint is an outdated or incorrect policy. Sometimes it’s a limited resource. Identifying and addressing the constraint in a system is the only way to meaningfully improve its performance.

Several years ago I had the (not unique, I’m sure) realization that human attention is often the constraint in many systems. Evidence for this is the fact that many of the most valuable businesses in the world today – Facebook, Google, Netflix, to name a few – are valuable precisely because they’ve become highly effective at capturing, directing, and monetizing people’s attention.

Within an organization, the constraint is often management attention: from the CEO, the executive team, the board, etc. How the CEO in particular manages her attention is one of the biggest – though far from the only – determinants of success.

I read somewhere recently that “the biggest barrier to scale at a startup isn’t capital, it’s the time & attention to design and run experiments testing the core tenets of the business.” I don’t know if I’d necessarily word it exactly like that, but I think it’s close.

Clear Thinking

Naval Ravikant has a saying: “Clear thinker” is a better compliment than “smart”. He doesn’t define exactly what he means by this, but if you’ve ever observed smart people acting stupidly (or done so yourself), then it’s clear the two are not the same thing.

To me, clear thinking seems to imply two things:

  1. An ability to think logically and rationally.
  2. An ability to articulate or explain your reasoning.

Professors Keith Stanovich and Richard West have also proposed that there are actually two types of rationality: epistemic rationality (the ability to see reality accurately) and instrumental rationality (the ability to pursue one’s goals in a rational manner). Importantly, they also posit that a person’s “RQ” (rationality quotient) is only loosely correlated with their IQ.

It has been my experience that many people – and particularly folks who have been in a field for a long time – have very powerful intuitions. However, their ability to articulate their intuitions and the rational basis behind them is often much weaker.

This is one of the reasons why I think the practice of writing can be helpful, as it forces you to slow down and verbalize your intuitions.

The A/B Test for Decision Making

Confirmation bias – i.e. overweighting evidence that supports your view and underweighting evidence that doesn’t – is a well-established cognitive bias. Indeed, it often seems to be the case that when people are presented with evidence against their beliefs, they simply retrench further.

Several years ago, I came up with a trick that helps me avoid this.

The method is simple. In my mind, I make two columns, each of which represents one ‘hypothesis’.

For example, a few years ago my wife made the claim that I was a “bad driver”. I of course immediately became defensive and thought of all the evidence I could that supported my being a good driver: I had never been in an accident; had only received two speeding tickets since I started driving; etc.

Using this technique, I picture two columns in my mind, one for each hypothesis: ‘I am a good driver’ and ‘I am a bad driver’.

For each piece of evidence, I then ask which hypothesis that evidence supports better and put it in that column. For example, let’s say we had the following pieces of evidence:

  • In my driving career:
    • I had gotten in one accident
    • I had gotten two speeding tickets
    • I often parked terribly
    • I had been in many near accidents
    • I seemed to drive much worse when other people were in the car
    • When asked, other people rated me as ‘below average’.

Using my method, I sorted each item into the column it supported better, and nearly all of the evidence landed in the ‘bad driver’ column.

Based upon this, I had to concede that the balance of the evidence suggested that, while I may not be a terrible driver, I certainly wasn’t as good a driver as I had thought.

I think the reason this framework works (at least for me) is that it forces me to start by treating the probability of each hypothesis being right as equal and then to consider all the evidence and how each piece supports the hypotheses.
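
As a rough sketch of the same move in code (Python; the evidence items and column assignments are just my judgment calls from the example above):

```python
# Sort each piece of evidence into the column (hypothesis) it supports better.
evidence = {
    "one accident": "bad",
    "only two speeding tickets in my whole career": "good",
    "often parked terribly": "bad",
    "many near accidents": "bad",
    "drive worse with passengers in the car": "bad",
    "others rate me 'below average'": "bad",
}

columns = {"good": [], "bad": []}
for item, supports in evidence.items():
    columns[supports].append(item)

for hypothesis, items in columns.items():
    print(f"'{hypothesis} driver' column ({len(items)} items): {items}")
```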

I use this general construct all the time. Is a person being malicious or just lazy? Does God exist or not? Is a given policy likely to be helpful or hurtful?

Now, if I were doing this more rigorously, I suppose I would instead ask whether each piece of evidence refutes or falsifies each hypothesis. I’ll work on that. In the meantime, I’ve found this practice to be easy to do and effective in helping me think more clearly.