Journal Updates
As part of the Canadian Distributed Mentorship Program, this is where I will be posting my journal entries about the work.
Week Thirteen: July 24 - July 28
The good news is that on Monday, top of the morning, I found out just why the equations were converging within the first iteration. The bad news is that it's a very, very nasty bug, and that by Friday I still wasn't any closer to fixing it. It turns out that the two most important probability distributions in the system of equations, namely p(state|past) and p(action|past), the ones every other probability distribution just happens to depend on, both contain entries of Infinity and Not-a-Number. No real numbers, none at all! Color me surprised, but probability distributions full of infinities are so far away from right that you can't even see the right value anymore. And the KL distance between infinity and infinity is, unsurprisingly, 0, which is exactly what a convergence check wants to see.
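For the curious, a simple sanity check on the probability tables would have caught this long before the update equations ran. Here's a minimal sketch of such a check in Python/NumPy (which may or may not be what my code is written in); the function name, the table shapes, and the example table are all made up for illustration.

```python
import numpy as np

def assert_valid_distribution(p, name, axis=-1, tol=1e-8):
    """Sanity-check a (conditional) probability table before using it.

    Catches exactly the kind of bug described above: entries of inf or nan,
    negative probabilities, or rows that do not sum to one.
    """
    if not np.all(np.isfinite(p)):
        raise ValueError(f"{name} contains inf or nan entries")
    if np.any(p < 0):
        raise ValueError(f"{name} contains negative entries")
    sums = p.sum(axis=axis)
    if not np.allclose(sums, 1.0, atol=tol):
        raise ValueError(f"{name} rows do not sum to 1 (got {sums})")

# Hypothetical example: a p(state|past) table with 3 pasts and 4 states,
# accidentally filled with infinities.
p_state_given_past = np.full((3, 4), np.inf)
assert_valid_distribution(p_state_given_past, "p(state|past)")  # raises ValueError
```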
At this time, you too can raise your Spock eyebrow.
But before I could take care of this, I had to meet with Doina and Prakash about slightly more important matters. Since Prakash is only in town for two days (he's been gone at conferences pretty much every week, and pretty much all over Europe. Jealous? Don't mind if I am... :) ), my non-functional code wasn't a priority. The two of them, as well as Chris Hundt, a previous McGill student, have been working on a notion of duality for automata. With Chris graduating, they wanted someone else keen and eager to continue his work. However, since I had never worked with automata before, Prakash had to give me an impromptu Automata 101 lecture. It turns out that an automaton (two automata, oh Greek plurals, you are too much!) is very similar to a Markov decision process, if you take away the probabilistic transitions and add some Greek letters where normal letters could do. We decided we would have a meeting later on, when he got back from his vacation. In the meantime, I should be reading about Category Theory, Adjoints, and Automata...

Ah, back to the code. At first I thought there was a division by zero happening somewhere, but the problem turned out to be much uglier. There are two constants, lambda and mu, being used in two of the equations. For deterministic policies, these two ought to reach 0 in the limit. Of course, setting them to 0 in the code is asking for trouble, so I assigned a value of 10^-10 to both of them. The mu is used in a term of the form x = exp(-dkl/mu), where dkl is a KL distance between two probability distributions, and exp is your classic power-of-e function. Now, the smaller the mu, the larger the magnitude of the exponent. If the exponent ends up very negative (that is, if dkl is positive), all is well, as exp(something-very-negative) is pleasingly 0. But if dkl is negative, the exponent is hugely positive, and our good friend infinity shows up. Clearly, negative KL distances had reared their head again.
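To make the blow-up concrete, here's a tiny numerical sketch (Python/NumPy again, with made-up values); the clipping at the end is just one possible band-aid, assuming the negative KL distances are numerical noise rather than a genuine modelling error, which is by no means established.

```python
import numpy as np

mu = 1e-10  # tiny constant standing in for "zero" for deterministic policies

def exp_term(dkl, mu=mu):
    """The x = exp(-dkl / mu) term from the update equations."""
    return np.exp(-dkl / mu)

print(exp_term(1e-6))    # exponent is -10000: underflows pleasingly to 0.0
print(exp_term(-1e-6))   # exponent is +10000: overflows to inf

# One possible guard: clip negative KL distances to zero before exponentiating.
def exp_term_safe(dkl, mu=mu):
    return np.exp(-np.maximum(dkl, 0.0) / mu)

print(exp_term_safe(-1e-6))  # 1.0, instead of inf
```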
Since I thought I had taken care of this, I went to meet with Doina and express my concerns. The only reason this could be happening is if the probability distributions in the algorithm were initialized "wrongly". We noticed that two of them happened to depend on each other, and thus initializing both to normalized random distributions would not make sense. I left to make the adequate changes, hoping this would fix things.
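Roughly, the fix amounts to initializing only one of the two distributions at random and deriving the other one from it, instead of randomizing both independently. Here's a toy sketch of that idea; the actual dependency comes from the real equations, so the marginalization below (and every name and shape in it) is just an illustrative stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pasts, n_actions, n_states = 3, 2, 4

# Initialize only the "free" distribution at random and normalize it.
p_action_given_past = rng.random((n_pasts, n_actions))
p_action_given_past /= p_action_given_past.sum(axis=1, keepdims=True)

# Hypothetical model term p(state | past, action), also normalized.
p_state_given_past_action = rng.random((n_pasts, n_actions, n_states))
p_state_given_past_action /= p_state_given_past_action.sum(axis=2, keepdims=True)

# Derive the dependent distribution instead of initializing it independently:
# p(state | past) = sum_a p(action=a | past) * p(state | past, action=a)
p_state_given_past = np.einsum('pa,pas->ps', p_action_given_past,
                               p_state_given_past_action)
assert np.allclose(p_state_given_past.sum(axis=1), 1.0)
```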
Which it didn't. After I asked Amin, one of my lab-mates, to sanity check what I was doing, he pointed out that I also had a sampling problem. Specifically, in order to get p(future|past, action), I was simulating one random future for each simulated past. That isn't so great if you're trying to use the result in learning, as a single sample is often close to meaningless. So I adjusted my code to simulate 100 different futures for each past, which makes the estimated distribution much more accurate. Of course, this was completely unrelated to the infinity problem, and only helped sidetrack me for about a day.
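In code, the change boils down to repeating the rollout many times and normalizing the counts. A hedged sketch of that, with a completely made-up stand-in for the environment:

```python
import numpy as np

rng = np.random.default_rng(0)
n_futures = 5          # how many distinct future outcomes exist (made up)
n_samples = 100        # futures simulated per (past, action) pair

def simulate_future(past, action):
    """Hypothetical stand-in for one rollout of the environment."""
    return rng.integers(n_futures)

def estimate_p_future(past, action, n_samples=n_samples):
    """Empirical estimate of p(future | past, action) from repeated rollouts.

    With n_samples = 1 the estimate is a point mass on whichever single
    future happened to be sampled; with 100 samples it starts to resemble
    the true distribution.
    """
    counts = np.zeros(n_futures)
    for _ in range(n_samples):
        counts[simulate_future(past, action)] += 1
    return counts / n_samples

p_hat = estimate_p_future(past=0, action=1)
print(p_hat, p_hat.sum())  # a proper distribution summing to 1
```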
Infinity, we are no longer friends.