Hong Kong Super Star

July 27th, 2014 by Walt

The name of this post came from a great song featuring Daniel Wu called HK Superstar.

Guess what? Bruce Lee was an American citizen, born in San Francisco. Who knew? Well, not me, but now I do.

Yes, that’s right! He was born at Jackson Street Hospital in SF.

Of course, who wants to live in the city when you can live in the EAST BAY! Yeah! Bruce opened some martial arts schools around here to teach his techniques. Check them out –>

4175 Broadway in Oakland. Now it’s a Toyota dealership (where I got a chance to test-drive the Scion FR-S).

3039 Monticello Avenue. It’s just some guy’s house, but it had a garage where the martial artists could train.

And finally… a shopping center at 26663 Mission Blvd. in Hayward.

Degrees of freedom in statistics

March 1st, 2013 by Walt


Degrees of freedom is a term which keeps coming up as I’m trying to understand probability distributions.

Degrees of freedom seem to come up in calculations where some summary statistic (say, the mean) is known but the individual numbers that produced it aren’t.

You could, for example, have 4 numbers whose mean is 50. Since their mean is 50, their sum must have been 200.

Now there are obviously a lot of ways to get a sum of 200 using 4 numbers. Maybe it’s –

1 + 1 + 1 + 197

or

10 + 90 + 10 + 90

The important thing to realize is that you can CHOOSE the first 3 numbers to be anything, but once you’ve chosen those 3 numbers, that 4th number is out of your control. Its value is going to be whatever it needs to be to make the sum of those numbers add up to 200.

So in the case above there are 3 degrees of freedom.
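That constraint can be sketched in a few lines of Python (the numbers here are just the ones from the example above):

```python
# 4 numbers must have a mean of 50, i.e. sum to 200.
target_sum = 4 * 50

# Pick any 3 numbers freely -- these are the 3 degrees of freedom.
free_choices = [1, 1, 1]

# The 4th number has no freedom: the constraint forces its value.
forced = target_sum - sum(free_choices)

print(free_choices + [forced])  # [1, 1, 1, 197]
print(sum(free_choices + [forced]) / 4)  # 50.0
```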

I’m honestly not 100% sure why this is important and why so many distributions are described as having k degrees of freedom, but I’m going to try to find out.


nPk – n permutations for k spots

February 27th, 2013 by Walt


How many ways can 300 people be seated into 20 seats?


Well, to calculate this you’d have to do the following –


Select 1 person from the 300 — you could have picked any of those 300 people so you’re going to have 300 different starting options so….

300 different permutations of people sitting in 1 seat

Now for the second seat SOMEBODY is already sitting in the first seat (1 of your 300 in each case you’re looking at) so you only have 299 people you can work with for the second seat.

300 in seat 1 * 299 in seat 2

Rinse and repeat with 3 seats… 4 seats… 5 seats…

300 * 299 * 298 * 297 * 296 * 295 * 294 * 293 * 292 * 291 * 290 * 289 * 288 * 287 * 286 * 285 * 284 * 283 * 282 * 281

Um…. that’s a really big number. I first tried this kind of calculation when thinking about filling a concert hall of 300 seats with all 300 people and couldn’t perform the calculation because the numbers were too big.
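A language with arbitrary-precision integers has no trouble with it, though. Here’s a quick Python sketch; `math.perm` (built in since Python 3.8) computes exactly this falling product:

```python
import math

# nPk: number of ways to seat k of n people in order.
n, k = 300, 20

# Built-in: n! / (n - k)!
builtin = math.perm(n, k)

# Same thing as the explicit product 300 * 299 * ... * 281.
product = 1
for i in range(n, n - k, -1):
    product *= i

assert builtin == product
print(len(str(builtin)))  # the answer is a 50-digit number
```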

This formula is used in computing probabilities since it can count the number of arrangements that fit a category. Usually you’d then divide by the total number of possibilities in the problem space, so the size would come back down to earth.

There must be some tricks to doing this; otherwise it would make probability calculations pretty much impossible. Gotta look into this further.

Expected value of winning the lottery in the USA

February 26th, 2013 by Walt

I’ve been refreshing my stats knowledge (or really, learning it for the first time) since I’m writing Bayesian and Fisher classifiers and I want to really understand what’s going on under the hood.

Khan Academy has a good primer on random variables and probability, and after doing the exercises I know how to calculate the expected outcome of sampling (and summing) the returns on a random variable over a huge number of samples. Using this knowledge I can calculate what I’d expect to win by playing the lottery a LOT of times.


A couple of things that I don’t want to forget –

In mathematical notation, random variables are denoted with CAPITAL letters, e.g. X.

Expected value is written as E(X) and is a sort of average of outcomes weighted by their probabilities. It says, “If you sampled the problem space a billion times and averaged the results, then you’d be left with THIS value.”


Ok, first… some setup –

X – outcome of playing the lottery – in this case it can be either win or lose

So let’s be formal:

X = 1 → win

X = 0 → lose


The numbers and info we need are–

Now there are probabilities associated with playing the lottery. They tell you the odds of winning. Useful!

I found a California lottery game, Super Lotto Plus. Basically, if I win I’ll earn… $30 million before taxes. Of course there’s tax too, so that’ll cut the total winnings down quite a bit. They actually have a guaranteed cash estimate, probably closer to what you’d actually take home if you won. That value is $21,300,000, so that’s probably a better number to use.

According to the Super Lotto Plus FAQ, the odds of winning anything are roughly 1 in 23, while the odds of winning the jackpot are 1 in 41,416,353.

The cost of a ticket is $1.

There are a lot of different tiers of “winning” the lottery — most of them have pretty low payouts.

I calculated the percentage of winners who won less than $100 in the last drawing to be 99.71%. That means the percentage of winners who won more than $100 was 0.29%.

So I think that means the odds of a ticket winning more than $100 are roughly 0.29% of the 1-in-23 chance of winning anything, i.e. about 29 in 230,000.


Back to our simplified case

We’re going to win the jackpot, right? So let’s just call everything else a loss of $1 (which isn’t really realistic). So a win is the after-tax jackpot payout, minus the $1 ticket.

So an expected value calculation is basically saying that, playing the lottery over the long, long term, our wins and losses would average out to this value.

E(X) = p(win)*winnings + p(loss)*cost of loss

E(X) = 1/41,416,353 * (21,300,000 – 1) + ((1 – (1/41,416,353)) * -1)

E(X) ≈ -0.48571
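The same calculation as a quick Python sketch, using the figures from above:

```python
# Expected value of one $1 ticket in the simplified jackpot-or-nothing case.
p_win = 1 / 41_416_353   # odds of hitting the jackpot
jackpot = 21_300_000     # the guaranteed cash estimate from above
ticket = 1               # cost to play

# Win: jackpot minus the ticket; lose: just the ticket.
ev = p_win * (jackpot - ticket) + (1 - p_win) * (-ticket)

print(round(ev, 5))  # -0.48571
```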


What does it mean?

We should really expect to lose money playing the lottery. Makes sense; otherwise the Wall Street people would figure out how to borrow a billion dollars to exploit the lottery opportunity. Of course, those odds improve some when you take into account the other winning tiers. I haven’t calculated that expected value, but I’m betting it’s going to be negative as well.

Cool. Expected result. I may have made mistakes so if you see something that doesn’t make sense let me know.

So keep in mind that this doesn’t mean we couldn’t win big on a random draw. Random variables are… random. Occasionally randomness will cause us to hit the jackpot really early in our sampling, or even hit the jackpot multiple times in a row.

What this IS suggesting is that if we took more and more and more samples, we would expect the average of all the samples to approach that expected value. And if I’m using vague words like “expect” and “suggest”, it’s because YOU NEVER KNOW: when you have only a few samples, or you’re looking at the outcome of one particular random sample, you should expect the outcome to be RANDOM. So when you play Catan, expect 6 and 8 to roll a lot, but keep in mind they may not roll a lot, or they might roll later in the game. That’s randomness.


More correlation madness

February 8th, 2013 by Walt


I’m having a blast playing with the Pearson correlation calculation. This time I’ve written an app that pulls the top stories off of Reddit. The app downloads those stories, does a quick word count on each, and stores the results to MongoDB (which is really super simple and handy in this case). I think I need to do some more tuning… probably need to drop the most- and least-used words from the comparisons… but here is a preliminary correlation sample.


  • http://leapgamer.com/blog/14/browsing_reddit_with_the_leap_motion_and_greasemonkey 0.102800100988
  • http://www.infoq.com/news/2013/02/MongoDB-Fault-Tolerance-Broken 0.472552880944
  • http://lwn.net/Articles/534735/ 0.425154672208
  • http://programminggroundup.blogspot.com/ 0.463452264456
  • http://www.doxsey.net/blog/go-and-assembly 0.23856726673
  • http://blog.getprismatic.com/blog/2013/2/1/graph-abstractions-for-structured-computation 0.394875530066
  • http://comoyo.github.com/blog/2013/02/06/the-inverse-of-ioc-is-control/ 0.356758219203
  • http://shopkick.github.com/flawless/ 0.378502926029
  • http://ericlippert.com/2013/02/06/static-constructors-part-one/ 0.275317859192
  • http://swizec.com/blog/first-impressions-of-rails-as-a-javascripter/swizec/5948 0.376670056001
  • http://solarianprogrammer.com/2013/02/07/sorting-data-in-parallel-cpu-gpu-2/ 0.283753036092
  • http://blogs.jetbrains.com/dotnet/2013/02/using-resharper-with-monotouch-applications/ 0.362785569257
  • http://blog.etapix.com/2013/02/hacking-liferay-securing-against-online.html -0.155582848044
  • http://shuklan.com/haskell/index.html -0.00684673283255
  • http://channel9.msdn.com/Series/Developing-HTML-5-Apps-Jump-Start 0.0370534449745
  • http://forthfreak.net/jsforth80x25.html -0.266162461616
  • https://gist.github.com/AdrianGaudebert/4708381 0.00708437321615
  • http://weblogs.asp.net/gunnarpeipman/archive/2013/02/07/using-database-unit-tests-in-visual-studio.aspx 0.154161227371
  • http://mindref.blogspot.com/2013/02/sql-vs-orm.html 0.319834105137


First impressions?? Seems reasonably accurate. It’s matching the text I gave it (the previous post from this blog) more closely with other articles than with pages like GitHub, and nothing is really a close match.

Like I said I think I need to do a bunch more tuning…

First off, it would be nice if the scraper I’ve built were truly grabbing just the core text of the pages being downloaded. Second, like I said, I need to toss out words with too few occurrences. Finally… I need a better (read: longer) article to compare against than the text from the previous post. Basically, there just isn’t enough of it.
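For reference, the comparison step can be sketched roughly like this. The `pearson` helper and the toy word-count dicts are my own stand-ins, not the app’s actual code:

```python
import math

def pearson(a, b):
    """Pearson correlation between two word-count dicts."""
    words = sorted(set(a) | set(b))  # combined vocabulary of both documents
    xs = [a.get(w, 0) for w in words]
    ys = [b.get(w, 0) for w in words]
    n = len(words)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # no variation in one document -> no meaningful correlation
    return cov / (sx * sy)

# Two toy word-count vectors standing in for scraped articles.
doc_a = {"lottery": 3, "expected": 2, "value": 2, "random": 1}
doc_b = {"lottery": 2, "expected": 1, "value": 1, "reddit": 4}
print(pearson(doc_a, doc_b))
```

A document compared against itself scores 1.0, and unrelated count vectors drift toward 0, which matches the kind of spread in the list above.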