I’ve been refreshing my stats knowledge (or really learning it myself for the first time) since I’m writing Bayesian and Fisher classifiers and I want to really understand what’s going on under the hood.

Khan Academy has a good primer on random variables and probability, and after doing the exercises I know how to calculate the expected value of a random variable: the average return per sample over a huge number of samples. Using this knowledge I can calculate what I’d expect to win, per ticket, by playing the lottery a LOT of times.

A couple of things that I don’t want to forget –

In mathematical notation random variables are denoted with CAPITAL letters, e.g. X.

Expected value is written as E(X) and is a weighted average: each outcome multiplied by its probability, then all of those added up. It says, “If you sampled the problem space a billion times and averaged the results then you’d be left with roughly THIS value”
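To make that definition concrete, here is a minimal sketch in Python. The function name `expected_value` and the fair-die example are my own illustrations and don't come from the Khan Academy material; the function just multiplies each outcome by its probability and adds the results.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of outcome * probability over (outcome, probability) pairs."""
    total_probability = sum(p for _, p in distribution)
    if total_probability != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(outcome * p for outcome, p in distribution)

# Illustrative example: a fair six-sided die, each face with probability 1/6.
fair_die = [(face, Fraction(1, 6)) for face in range(1, 7)]
die_ev = expected_value(fair_die)
print(float(die_ev))  # 3.5
```

Notice that 3.5 isn’t even a face on the die. The expected value is a long-run average, not a prediction for any single roll.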

Ok, first… some setup –

X – outcome of playing the lottery – in this case it can be either win or lose

So let’s be formal:

X = 1 means win

X = 0 means lose

The numbers and info we need are–

Now there are probabilities associated with playing the lottery. They tell you the odds of winning. Useful!

I found a California lottery game, Super Lotto Plus. Basically if I win I’ll earn… $30 million before taxes. Of course taxes will cut the total winnings down quite a bit. They actually have a guaranteed cash estimate… probably closer to what you’d actually take home if you won. That value is $21,300,000 so that’s probably a better number to use.

According to the Super Lotto Plus FAQ the odds of winning any prize are roughly 1 in 23; the odds of winning the jackpot are 1 in 41,416,353.

The cost of a ticket is $1.

There are a lot of different tiers of “winning” the lottery — most of them have pretty low payouts.

I calculated the % of winners who won less than $100 in the last drawing to be 99.71%. That means that the percent of winners who won more than $100 was 0.29%.

So I think that means that the odds of winning more than $100 are roughly 29 in 230,000 (0.29% of the 1 in 23 who win anything at all), or about 1 in 7,900.
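Here’s a quick sketch of that arithmetic in Python, using exact fractions so nothing gets lost to rounding. The variable names are mine, and it assumes the last drawing is typical of every drawing, which may not be true.

```python
from fractions import Fraction

p_any_prize = Fraction(1, 23)          # "roughly 1 in 23" from the FAQ
share_over_100 = Fraction(29, 10000)   # 0.29% of winners in the last drawing

# Chance of winning anything at all AND that prize being over $100.
p_over_100 = p_any_prize * share_over_100
print(p_over_100)              # 29/230000
print(round(1 / p_over_100))   # 7931, i.e. about 1 in 7,931
```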

Back to our simplified case

We’re going to win the jackpot, right? So let’s just call everything else a loss of $1 (which isn’t really realistic). A win is the after-tax jackpot payout minus the $1 we paid for the ticket.

So an expected value calculation is basically saying that if we play the lottery over the long, long term, our wins and losses would average out to this value per ticket.

E(X) = p(win)*winnings + p(loss)*cost of loss

E(X) = (1/41,416,353) * (21,300,000 - 1) + (1 - 1/41,416,353) * (-1)

E(X) = -0.48571 (dollars per $1 ticket)
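To double check the arithmetic, here’s the same calculation as a small Python sketch. The variable names are my own; the numbers are the ones above, and everything that isn’t the jackpot still counts as losing the $1 ticket.

```python
from fractions import Fraction

jackpot_cash = 21_300_000            # guaranteed cash estimate
ticket_cost = 1
p_win = Fraction(1, 41_416_353)      # odds of hitting the jackpot
p_lose = 1 - p_win

net_win = jackpot_cash - ticket_cost   # we still paid for the ticket
net_loss = -ticket_cost

ev_per_ticket = p_win * net_win + p_lose * net_loss
print(round(float(ev_per_ticket), 5))  # -0.48571
```

If you multiply it out, the whole thing simplifies to jackpot/41,416,353 - 1, so in this simplified model the cash payout would have to be more than $41,416,353 before a ticket had a positive expected value.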

What does it mean?

We should really expect to lose money playing the lottery: about 49 cents for every $1 ticket, on average. Makes sense. Otherwise the Wall Street people would figure out how to borrow a billion dollars to exploit the lottery opportunity. Of course the expected value improves some when you take into account the other winning tiers. I haven’t calculated that expected value but I’m betting it’s going to be negative as well.

Cool. Expected result. I may have made mistakes so if you see something that doesn’t make sense let me know.

So keep in mind that this doesn’t mean that we couldn’t win big on a random draw. Random variables are… random. Occasionally randomness will cause us to hit the jackpot really early in our sampling or hit the jackpot multiple times in a row.

What this IS suggesting is that if we took more and more and more and more samples we would expect the average of all the samples to approach that expected value. And if I’m using all these vague words like expect and suggest it’s because YOU NEVER KNOW. When you have only a few samples, or you’re looking for an expected outcome from a particular random sample, then you should expect the outcome to be RANDOM. So when you play Catan you can expect 6 and 8 to roll a lot, but keep in mind they may not roll a lot, or they might only roll later in the game. That’s randomness.
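You can watch this happen with a small simulation. Simulating the lottery itself would take forever because jackpots are so rare, so this sketch uses Catan-style rolls of two dice instead. The function name `roll_two_dice` and the seed are arbitrary choices of mine. The expected sum of two fair dice is 7, and a 6 or an 8 should come up in 10 of every 36 rolls (about 27.8%).

```python
import random

def roll_two_dice(num_rolls, rng):
    """Return (average sum, share of rolls that came up 6 or 8)."""
    total = 0
    six_or_eight = 0
    for _ in range(num_rolls):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        total += roll
        if roll in (6, 8):
            six_or_eight += 1
    return total / num_rolls, six_or_eight / num_rolls

rng = random.Random(42)  # fixed seed so the run is repeatable

# A short game: only a handful of rolls, so results can be way off.
few_avg, few_share = roll_two_dice(10, rng)
# A huge number of rolls: results should sit close to 7 and 10/36.
many_avg, many_share = roll_two_dice(200_000, rng)

print("10 rolls:     ", few_avg, few_share)
print("200,000 rolls:", many_avg, many_share)
```

The 10-roll numbers can land almost anywhere, while the 200,000-roll numbers should land very close to 7 and 0.278. Same dice, same odds, just a lot more samples.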