Journal Updates
As part of the Canadian Distributed Mentorship Program, here is where I will be posting my journal entries relating to the work done.
Week Ten: July 3 - July 7
Last Friday I had come up with a 5-dimensional matrix solution as an internal representation for p(future|past,action). 'Solution' is a bit of an overstatement, as indices of the form array[i][j][k][l][m] are just asking for trouble, not to mention that 5 nested for-loops look alarming. Why 5? Well, each past is represented as an array of finite length, each entry holding an (action, observation) pair; thus, if NHIST is the total number of histories saved, and T is the number of steps remembered for each history (i.e. t0...tT), this looks like: Pasts[NHIST][T][2], where the [0] and [1] entries hold the action and observation at time step t. Since the p(future|past,action) representation needs two new dimensions, one for the action taken and one for the observation observed, you end up with 5 dimensions. This is, quite possibly, the ugliest, chunkiest representation known to man, and you can understand why I wasn't too eager to implement it.

After a meeting with Doina, we decided that there was a better way to represent the probability distribution, using counts. Namely, p(Xfuture|Xpast,Xaction) = #times (Xpast, Xaction, Xfuture) was seen / #times (Xpast, Xaction) was seen, where Xfuture, Xpast and Xaction are specific occurrences. This makes much more sense, and results in a significantly more elegant representation, code-wise.
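A minimal sketch of what the count-based estimate could look like, in Python with illustrative names (the actual code isn't shown in this journal):

```python
from collections import defaultdict

# Counts backing p(Xfuture | Xpast, Xaction):
#   #times (Xpast, Xaction) was seen, and
#   #times (Xpast, Xaction, Xfuture) was seen.
pair_counts = defaultdict(int)
triple_counts = defaultdict(int)

def record(past, action, future):
    """past and future are tuples of (action, observation) pairs."""
    pair_counts[(past, action)] += 1
    triple_counts[(past, action, future)] += 1

def p_future_given_past_action(future, past, action):
    seen = pair_counts[(past, action)]
    return triple_counts[(past, action, future)] / seen if seen else 0.0
```

No 5-dimensional array, no quintuply-nested loops: the counts live in dictionaries keyed by whatever pasts and futures actually occurred.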
This took care of one out of the two input distributions. I decided to calculate the p(Xpast) probability distribution very similarly to a belief state in a POMDP. Specifically, if your system model has 3 states (each state being one of the bubbles in the float-reset diagram), at each time step you calculate the probability of being in each of those states. Each time step in a history can be of the form f0 (we took a float action and observed a 0; note that f1 is impossible to observe), r0 (we took a reset action and observed a 0, which means we were not in the initial state), or r1 (we took a reset action from the initial state). The probability of the first would be 1, as a float action can only give you an observation of 0, while the probabilities of the other two would be, respectively, that of not being in the initial state (r0) or of being in it (r1).
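To make the belief-state idea concrete, here is the standard POMDP forward pass on a hypothetical 3-state model. The transition numbers below are made up for illustration and are not the real float-reset dynamics; the observation model just encodes the rules above (f1 impossible, r1 only from the initial state):

```python
import numpy as np

n_states = 3
actions = {"f": 0, "r": 1}

# T[a][s, s2] = p(s2 | s, a): float diffuses to neighbours (illustrative
# numbers), reset jumps back to the initial state 0.
T = np.zeros((2, n_states, n_states))
T[0] = [[0.5, 0.5, 0.0],
        [0.25, 0.5, 0.25],
        [0.0, 0.5, 0.5]]
T[1][:, 0] = 1.0

# O[a][s, o] = p(o | s, a), with o generated from the state the action is
# taken in: float always yields 0, reset yields 1 only from state 0.
O = np.zeros((2, n_states, 2))
O[0][:, 0] = 1.0
O[1][0, 1] = 1.0
O[1][1:, 0] = 1.0

def history_probability(history, b0):
    """p(o_1..o_T | a_1..a_T, b0) for history = [(action, obs), ...]."""
    b = b0.copy()
    prob = 1.0
    for a_name, o in history:
        a = actions[a_name]
        step = (b * O[a][:, o]) @ T[a]  # unnormalized next belief
        p_o = step.sum()                # p(o | belief, a)
        if p_o == 0.0:
            return 0.0                  # e.g. any history containing f1
        prob *= p_o
        b = step / p_o                  # renormalize after each (a, o)
    return prob

# Starting from the initial state, r1 is certain; after one float step,
# r1 has exactly the probability of having stayed in the initial state.
b0 = np.array([1.0, 0.0, 0.0])
print(history_probability([("r", 1)], b0))            # 1.0
print(history_probability([("f", 0), ("r", 1)], b0))  # 0.5
```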
As I will find out next week, this approach isn't entirely correct, as it isn't general enough. Because it was stolen from the POMDP people, what my approach really gives is p(future|past,action,state), and in order to fix it, you would have to marginalize over all possible states. However, thank counts, the easier approach is the count-based calculation again, namely: p(Xpast) = #times Xpast was seen / total pasts seen.
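The count-based marginal is only a couple of lines; again a sketch with illustrative names:

```python
from collections import defaultdict

# p(Xpast) = #times Xpast was seen / total pasts seen.
past_counts = defaultdict(int)
total_pasts = 0

def record_past(past):
    """past is a tuple of (action, observation) pairs, e.g. (('f', 0),)."""
    global total_pasts
    past_counts[past] += 1
    total_pasts += 1

def p_past(past):
    return past_counts[past] / total_pasts if total_pasts else 0.0
```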
Implementing these 2 distributions, testing them, and fixing some silly bugs took pretty much the rest of the week. (With the original belief-state approach for p(Xpast), I forgot that the pasts were of finite length and calculated the probability of being in each state in reference to the first time step; of course, as time progressed, each history became more and more improbable, until the probability of it being observed reached 0. This, of course, is patently wrong.) It would probably have taken me less time had I not been counting the hours until the week off...
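For what it's worth, that decay is easy to see: every step multiplies in an observation likelihood of at most 1, so a probability tracked all the way from the first time step shrinks toward 0 for long histories. A toy illustration, with a made-up per-step likelihood:

```python
# Tracking the joint probability of everything since the very first step,
# instead of just the finite window of T remembered steps, multiplies in
# a likelihood <= 1 at every step and drifts toward 0.
p_o = 0.5      # illustrative per-step observation likelihood
joint = 1.0
for t in range(50):
    joint *= p_o
print(joint)   # ~8.9e-16 after 50 steps; renormalizing each step
               # (as in the belief sketch above) avoids this.
```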