> > > > > Hi Group,
> > > > > I have a few very basic questions/misunderstandings about the "discrete-time
> > > > > Markov process" model Forney uses in his 1973 landmark paper
> > > > > "The Viterbi Algorithm" (Proceedings of the IEEE, vol 61, no 3,
> > > > > March, 1973).
> > > > > On p.269, Forney arrives at a fundamental expression for P(z | x),
> > > > > where x is a finite input state vector running from time 0 to
> > > > > time K and z is a finite observation state vector running over
> > > > > the same interval, the observation being made over a channel with
> > > > > memoryless noise. His expression involves the transition
> > > > > states xi[k] (Greek letter xi) where xi[k] = (x[k+1],x[k]).
> > > > > Here's where I run into a real basic conceptual problem. The
> > > > > probability we seek to evaluate, P(z | x), i.e., "the probability
> > > > > of z GIVEN x", does not have ANYTHING to do with the transition
> > > > > states. That is, we are GIVEN x - it matters not one whit how
> > > > > probable x is. Now since we are given x it seems that the only
> > > > > things needed to evaluate P(z | x) are the *CHANNEL* transition
> > > > > probabilities P(z[k] | x[k]), NOT the source transition probabilities
> > > > > P(x[k+1] | x[k]).
> > > > > In other words, when Forney gives the relation
> > > > > P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])
> > > > > I don't see that xi[k] has anything to do with it. We are GIVEN
> > > > > x - the likelihood of x is 1!!!!
> > > > > --
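For what it's worth, here is a minimal sketch (my own toy example, not anything
taken from Forney's paper) of how that product over transitions works out when
the symbol carried on each branch gets hit by independent bit flips. The XOR
branch map and the crossover probability are made up purely for illustration;
the point is that once x is given, every xi[k] is already determined by x, so
only the channel probabilities P(z[k] | xi[k]) ever enter the computation.

# Toy example (assumed, not from the paper): P(z | x) factoring over the
# transitions xi[k] = (x[k+1], x[k]) for a binary-symmetric channel.

p_flip = 0.1  # assumed crossover probability of the toy BSC

def branch_output(x_next, x_cur):
    """Noise-free symbol carried on the branch x_cur -> x_next (toy choice)."""
    return x_next ^ x_cur

def p_z_given_xi(z_k, x_next, x_cur):
    """P(z[k] | xi[k]) for the toy BSC: depends on the whole transition."""
    return (1.0 - p_flip) if z_k == branch_output(x_next, x_cur) else p_flip

def p_z_given_x(z, x):
    """P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])."""
    p = 1.0
    for k in range(len(z)):
        p *= p_z_given_xi(z[k], x[k + 1], x[k])
    return p

x = [0, 1, 1, 0, 1]        # given state sequence x[0], ..., x[K]
z = [1, 0, 1, 0]           # observed sequence z[0], ..., z[K-1]
print(p_z_given_x(z, x))   # the transitions enter only as functions of the given x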
> > > > I don't have the paper, and actually never really read it, but if I
> > > > recall correctly from my spoon-fed Viterbi lessons, I would guess that
> > > > the relationship above will be converted to
> > > > P(x | z) by Bayes rule further into the paper, which is the thing
> > > > that needs to be maximized.
> > > I don't think so, Stan. The thing you really want is the value of
> > > x that maximizes P(x,z), i.e., the Maximum A-Posteriori Estimate (MAP).
> > > This (using Bayes rule) is
> > If you have Van Trees vol I, look around page 50. The MAP is actually
> > P(x|z)
> Yep - I was wrong yesterday when I said the MAP is the value that
> maximizes P(x,z). Rather, the MAP is the value that maximizes P(x|z).
> > and this doesn't depend on the random parameter being
> > differentiable.
> Huh?
> > Van Trees writes P(x|z) as
> >    P(x|z) = P(z|x)*P(x) / P(z)
> > and then uses the monotonicity of the log to write
> >    ln(P(x|z)) = ln(P(z|x)) + ln(P(x)) - ln(P(z))
> It's probably a nit, but you mean the monotonicity is required
> to say that maximizing one maximizes the other, right? The
> fact that the ln expression above is true doesn't depend on
> monotonicity.
Right
> > His next step is to observe we need to maximize ln(P(x|z)) wrt x, and
> > then you end up dropping P(z) because it's not a function of x (x has
> > already been marginalized out of it).
> Right!
> > So you just maximize P(z|x)P(x); we really are basically saying the same thing.
> Right!!
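Just to pin the P(z) point down with numbers, here is a tiny sketch (the
values are made up, nothing taken from Van Trees): P(z) is the same number
for every candidate x, so dropping it never changes the argmax.

# Made-up numbers: argmax_x P(x|z) equals argmax_x P(z|x)*P(x).

prior      = {0: 0.7, 1: 0.3}    # P(x)
likelihood = {0: 0.2, 1: 0.9}    # P(z|x) for the one observed z

p_z = sum(likelihood[x] * prior[x] for x in prior)              # P(z), marginal over x
posterior = {x: likelihood[x] * prior[x] / p_z for x in prior}  # P(x|z)

map_est = max(prior, key=lambda x: likelihood[x] * prior[x])    # no P(z) needed
print(posterior, map_est)   # argmax of P(x|z) agrees with argmax of P(z|x)*P(x)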
> > Van Trees goes on to assume a differentiable form for P(z,x)
> -- which is equal to P(z|x)P(x) --
> > which you can't do
> > for Viterbi because x is a discrete parameter.
> Right!!! Gotcha!
> I definitely have a better understanding now, but what still
> bothers me is the use of the transition sequences for the conditional
> probability P(z|x). Forney defines memoryless noise to be
> noise in which the observation at time k (z[k]) depends
> only on the transition at time k (xi[k]), and so
> P(z|x) = P(z|xi).
> This definition of memoryless noise is intuitively unsatisfying,
> and it also conflicts (it seems) with the definition of memoryless
> noise (or channel) given by Lee and Messerschmitt as a channel
> in which the current output z[k] is independent of all inputs
> except the current input x[k].
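For what it's worth, here is one toy way (my own reading, not from Forney or
from Lee and Messerschmitt) the two definitions can coexist: think of the
source feeding a little encoder whose output depends on the transition, and
that encoder output going through a channel that is memoryless in the Lee and
Messerschmitt sense. The encoder map and the noise sequence below are made up.

# Toy sketch: the channel is memoryless w.r.t. its own input c[k]
# (Lee/Messerschmitt sense), but c[k] is computed from the source transition
# xi[k] = (x[k+1], x[k]), so z[k] depends on the whole transition (Forney's sense).

def encoder_branch(x_next, x_cur):
    """Channel input emitted on the source transition (toy rate-1 code)."""
    return (x_next + x_cur) % 2

def channel(c_k, noise_k):
    """Memoryless channel: output depends only on the current input c[k]."""
    return c_k ^ noise_k

x     = [0, 1, 1, 0, 1]    # source state sequence x[0], ..., x[K]
noise = [0, 0, 1, 0]       # independent bit flips, one per symbol

c = [encoder_branch(x[k + 1], x[k]) for k in range(len(x) - 1)]
z = [channel(c_k, n_k) for c_k, n_k in zip(c, noise)]
print(c, z)   # z[k] is a function of xi[k] and the independent noise only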
I think I might understand your confusion and since I really haven't
looked at the paper, this is a guess. Way back (before the world's most
popular OS existed) in my comms class, I was taught two versions of
Viterbi, one was called "soft decision" and the other "hard decision".
Soft works on discrete symbols received over an AWGN channel, which
results in continuous data. Hard was based on strictly discrete data,
like using Viterbi as a convolutional decoder. Hard could also be used
on the output of some memoryless M-ary detector. We spent some time
looking at this configuration because although soft worked better,
before engineers could use the world's most popular OS to solve real
life commercial problems, the hard decision algorithm was cheaper to
implement. Looking over your original post, it seems like you're thinking
in terms of a discrete information theoretic channel, and the paper
might be covering a soft receiver.
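To make the soft/hard distinction concrete, here is a rough sketch (my own
example, assuming antipodal signaling over AWGN, nothing from the paper) of
the two kinds of branch metric a Viterbi decoder would accumulate along a path.

# Assumed setup: code bits mapped to +/-1 and received in Gaussian noise.

def soft_metric(r_k, c_k):
    """Squared distance to the ideal point +/-1 for code bit c_k
    (AWGN log-likelihood up to constants)."""
    s = 1.0 if c_k == 1 else -1.0
    return (r_k - s) ** 2

def hard_metric(r_k, c_k):
    """Hamming-style metric after a hard decision on the received sample."""
    bit = 1 if r_k > 0 else 0
    return 0 if bit == c_k else 1

r = [0.8, -1.2, 0.3, -0.1]   # received samples (made-up values)
c = [1, 0, 1, 0]             # hypothesized branch outputs along one path
print(sum(soft_metric(r_k, c_k) for r_k, c_k in zip(r, c)))   # soft path metric
print(sum(hard_metric(r_k, c_k) for r_k, c_k in zip(r, c)))   # hard path metric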
> > I'll try to get a copy
> > of Forney's paper and get a look at it. It takes a while to get this
> > stuff through the corporate library, but I might have to do some HMM
> > work in the near term so spending some time on it is justifiably OK.
> Stan, it sure is great to have someone to bounce this stuff off of. Thanks
> for your help. This is USENET (and its people) in its finest form.
Thanks - but the dark powers of entropy are destined to make this
short-lived. Does anyone want to port Windoz 95 to a TRW 1010J?
> > > P(x,z) = P(z|x)*P(x)
> > > (from p.271).
> --
> %% Randy Yates
> %% DSP Engineer
> %% Ericsson / Research Triangle Park, NC, USA