## Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

Hi Group,

I have a few very basic questions/misunderstandings about the "discrete-time
Markov process" model Forney uses in his 1973 landmark paper
"The Viterbi Algorithm" (Proceedings of the IEEE, vol 61, no 3,
March, 1973).

On p.269, Forney arrives at a fundamental expression for P(z | x),
where x is a finite input state vector running from time 0 to
time K and z is a finite observation state vector running over
the same interval, the observation being made over a channel with
memoryless noise. His expression involves the transition
states xi[k] (Greek letter xi) where xi[k] = (x[k+1],x[k]).

Here's where I run into a real basic conceptual problem. The
probability we seek to evaluate, P(z | x), i.e., "the probability
of z GIVEN x", does not have ANYTHING to do with the transition
states. That is, we are GIVEN x - it matters not one whit how
probable x is. Now since we are given x it seems that the only
thing needed to evaluate P(z | x) are the *CHANNEL* transition
probabilities P(z[k] | x[k]), NOT the source transition probabilities
P(x[k+1] | x[k]).

In other words, when Forney gives the relation

P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])

I don't see that xi[k] has anything to do with it. We are GIVEN
x - the likelihood of x is 1!!!!
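Just to pin down what I *think* should happen, here's a toy sketch (my own
notation and made-up numbers, not Forney's): for a memoryless channel, the
likelihood of the whole observation sequence factors into per-symbol channel
terms, with no source transition probabilities in sight.

```python
# Toy memoryless binary symmetric channel: P(z[k] | x[k]) depends only on
# x[k].  The crossover probability p is a made-up illustration value.
def sequence_likelihood(z, x, p=0.1):
    """P(z | x) = product over k of P(z[k] | x[k]) for a memoryless channel."""
    prob = 1.0
    for zk, xk in zip(z, x):
        prob *= (1 - p) if zk == xk else p
    return prob

# Given x, only the channel probabilities matter; no P(x[k+1] | x[k]) appears.
print(sequence_likelihood([1, 0, 1], [1, 0, 0]))  # (0.9)(0.9)(0.1) = 0.081
```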
--
%% Randy Yates
%%  DSP Engineer
%%  Ericsson / Research Triangle Park, NC, USA

### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> Hi Group,

> I have a few very basic questions/misunderstandings about the "discrete-time
> Markov process" model Forney uses in his 1973 landmark paper
> "The Viterbi Algorithm" (Proceedings of the IEEE, vol 61, no 3,
> March, 1973).

> On p.269, Forney arrives at a fundamental expression for P(z | x),
> where x is a finite input state vector running from time 0 to
> time K and z is a finite observation state vector running over
> the same interval, the observation being made over a channel with
> memoryless noise. His expression involves the transition
> states xi[k] (Greek letter xi) where xi[k] = (x[k+1],x[k]).

> Here's where I run into a real basic conceptual problem. The
> probability we seek to evaluate, P(z | x), i.e., "the probability
> of z GIVEN x", does not have ANYTHING to do with the transition
> states. That is, we are GIVEN x - it matters not one whit how
> probable x is. Now since we are given x it seems that the only
> thing needed to evaluate P(z | x) are the *CHANNEL* transition
> probabilities P(z[k] | x[k]), NOT the source transition probabilities
> P(x[k+1] | x[k]).

> In other words, when Forney gives the relation

>   P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])

> I don't see that xi[k] has anything to do with it. We are GIVEN
> x - the likelihood of x is 1!!!!
> --

I don't have the paper, and actually never really read it, but if I
recall correctly from my spoon-fed Viterbi lessons, I would guess that
the relationship above will be converted to
P(x | z) by Bayes' rule further into the paper, which is the thing
that needs to be maximized.


> %% Randy Yates
> %%  DSP Engineer
> %%  Ericsson / Research Triangle Park, NC, USA

### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> > Hi Group,

> > I have a few very basic questions/misunderstandings about the "discrete-time
> > Markov process" model Forney uses in his 1973 landmark paper
> > "The Viterbi Algorithm" (Proceedings of the IEEE, vol 61, no 3,
> > March, 1973).

> > On p.269, Forney arrives at a fundamental expression for P(z | x),
> > where x is a finite input state vector running from time 0 to
> > time K and z is a finite observation state vector running over
> > the same interval, the observation being made over a channel with
> > memoryless noise. His expression involves the transition
> > states xi[k] (Greek letter xi) where xi[k] = (x[k+1],x[k]).

> > Here's where I run into a real basic conceptual problem. The
> > probability we seek to evaluate, P(z | x), i.e., "the probability
> > of z GIVEN x", does not have ANYTHING to do with the transition
> > states. That is, we are GIVEN x - it matters not one whit how
> > probable x is. Now since we are given x it seems that the only
> > thing needed to evaluate P(z | x) are the *CHANNEL* transition
> > probabilities P(z[k] | x[k]), NOT the source transition probabilities
> > P(x[k+1] | x[k]).

> > In other words, when Forney gives the relation

> >   P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])

> > I don't see that xi[k] has anything to do with it. We are GIVEN
> > x - the likelihood of x is 1!!!!
> > --

> I don't have the paper, and actually never really read it, but if I
> recall correctly from my spoon-fed Viterbi lessons, I would guess that
> the relationship above will be converted to
> P(x | z) by Bayes' rule further into the paper, which is the thing
> that needs to be maximized.

I don't think so, Stan. The thing you really want is the value of
x that maximizes P(x,z), i.e., the Maximum A-Posteriori Estimate (MAP).
This (using Bayes rule) is

P(x,z) = P(z|x)*P(x)

(from p.271).
--
%% Randy Yates
%%  DSP Engineer
%%  Ericsson / Research Triangle Park, NC, USA


### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> > > Hi Group,

> > > I have a few very basic questions/misunderstandings about the "discrete-time
> > > Markov process" model Forney uses in his 1973 landmark paper
> > > "The Viterbi Algorithm" (Proceedings of the IEEE, vol 61, no 3,
> > > March, 1973).

> > > On p.269, Forney arrives at a fundamental expression for P(z | x),
> > > where x is a finite input state vector running from time 0 to
> > > time K and z is a finite observation state vector running over
> > > the same interval, the observation being made over a channel with
> > > memoryless noise. His expression involves the transition
> > > states xi[k] (Greek letter xi) where xi[k] = (x[k+1],x[k]).

> > > Here's where I run into a real basic conceptual problem. The
> > > probability we seek to evaluate, P(z | x), i.e., "the probability
> > > of z GIVEN x", does not have ANYTHING to do with the transition
> > > states. That is, we are GIVEN x - it matters not one whit how
> > > probable x is. Now since we are given x it seems that the only
> > > thing needed to evaluate P(z | x) are the *CHANNEL* transition
> > > probabilities P(z[k] | x[k]), NOT the source transition probabilities
> > > P(x[k+1] | x[k]).

> > > In other words, when Forney gives the relation

> > >   P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])

> > > I don't see that xi[k] has anything to do with it. We are GIVEN
> > > x - the likelihood of x is 1!!!!
> > > --

> > I don't have the paper, and actually never really read it, but if I
> > recall correctly from my spoon-fed Viterbi lessons, I would guess that
> > the relationship above will be converted to
> > P(x | z) by Bayes' rule further into the paper, which is the thing
> > that needs to be maximized.

> I don't think so, Stan. The thing you really want is the value of
> x that maximizes P(x,z), i.e., the Maximum A-Posteriori Estimate (MAP).
> This (using Bayes rule) is

If you have Van Trees vol I, look around page 50.  The MAP is actually
P(x|z), and this doesn't depend on the random parameter being
differentiable.
Van Trees writes P(x|z) as

         P(z|x)*P(x)
P(x|z) = -----------
            P(z)

and then uses the monotonicity of the log to write

ln(P(x|z))=ln(P(z|x))+ln(P(x))-ln(P(z))

His next step is to observe that we need to maximize ln(P(x|z)) with
respect to x, and you end up dropping P(z) because it's not a function
of x (x has been marginalized out of it).  So you end up maximizing
P(z|x)P(x), and we really are basically saying the same thing.

Van Trees goes on to assume a differential form for P(z,x), which you
can't do for Viterbi because x is a discrete parameter.  I'll try to get
a copy of Forney's paper and get a look at it.  It takes a while to get
this stuff through the corporate library, but I might have to do some
HMM work in the near term, so spending some time on it is justifiable.
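A quick numerical sanity check of that equivalence (toy numbers of my own,
not from Van Trees): maximizing the posterior P(x|z), the product
P(z|x)P(x), and the log-domain sum all pick the same x.

```python
import math

# Two candidate values of x with made-up priors and channel likelihoods
# for one observed z.
priors = {"x0": 0.7, "x1": 0.3}           # P(x)
likelihoods = {"x0": 0.2, "x1": 0.6}      # P(z|x)
p_z = sum(likelihoods[x] * priors[x] for x in priors)  # P(z), constant in x

map_via_posterior = max(priors, key=lambda x: likelihoods[x] * priors[x] / p_z)
map_via_product = max(priors, key=lambda x: likelihoods[x] * priors[x])
map_via_logs = max(priors,
                   key=lambda x: math.log(likelihoods[x]) + math.log(priors[x]))

# All three criteria select the same candidate.
print(map_via_posterior, map_via_product, map_via_logs)
```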


>   P(x,z) = P(z|x)*P(x)

> (from p.271).
> --
> %% Randy Yates
> %%  DSP Engineer
> %%  Ericsson / Research Triangle Park, NC, USA


### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> > > > Hi Group,

> > > > I have a few very basic questions/misunderstandings about the "discrete-time
> > > > Markov process" model Forney uses in his 1973 landmark paper
> > > > "The Viterbi Algorithm" (Proceedings of the IEEE, vol 61, no 3,
> > > > March, 1973).

> > > > On p.269, Forney arrives at a fundamental expression for P(z | x),
> > > > where x is a finite input state vector running from time 0 to
> > > > time K and z is a finite observation state vector running over
> > > > the same interval, the observation being made over a channel with
> > > > memoryless noise. His expression involves the transition
> > > > states xi[k] (Greek letter xi) where xi[k] = (x[k+1],x[k]).

> > > > Here's where I run into a real basic conceptual problem. The
> > > > probability we seek to evaluate, P(z | x), i.e., "the probability
> > > > of z GIVEN x", does not have ANYTHING to do with the transition
> > > > states. That is, we are GIVEN x - it matters not one whit how
> > > > probable x is. Now since we are given x it seems that the only
> > > > thing needed to evaluate P(z | x) are the *CHANNEL* transition
> > > > probabilities P(z[k] | x[k]), NOT the source transition probabilities
> > > > P(x[k+1] | x[k]).

> > > > In other words, when Forney gives the relation

> > > >   P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])

> > > > I don't see that xi[k] has anything to do with it. We are GIVEN
> > > > x - the likelihood of x is 1!!!!
> > > > --

> > > I don't have the paper, and actually never really read it, but if I
> > > recall correctly from my spoon-fed Viterbi lessons, I would guess that
> > > the relationship above will be converted to
> > > P(x | z) by Bayes' rule further into the paper, which is the thing
> > > that needs to be maximized.

> > I don't think so, Stan. The thing you really want is the value of
> > x that maximizes P(x,z), i.e., the Maximum A-Posteriori Estimate (MAP).
> > This (using Bayes rule) is

> If you have Van Trees vol I, look around page 50.  The MAP is actually
> P(x|z)

Yep - I was wrong yesterday when I said the MAP estimate is the value
that maximizes P(x,z). Rather, it's the x that maximizes P(x|z).

> and this doesn't depend on the random parameter being
> differentiable.

Huh?

> Van Trees writes P(x|z) as
>
>          P(z|x)*P(x)
> P(x|z) = -----------
>             P(z)

> and then uses the monotonicity of the log to write

> ln(P(x|z))=ln(P(z|x))+ln(P(x))-ln(P(z))

It's probably a nit, but you mean the monotonicity is required
to say that maximizing one maximizes the other, right? The
fact that the ln expression above is true doesn't depend on
monotonicity.

> His next step is to observe that we need to maximize ln(P(x|z)) with
> respect to x, and you end up dropping P(z) because it's not a function
> of x (x has been marginalized out of it).

Right!

> So you end up maximizing P(z|x)P(x), and we really are basically saying the same thing.

Right!!

> Van Trees goes on to assume a differential form for P(z,x)

-- which is equal to P(z|x)P(x) --

> which you can't do
> for Viterbi because x is a discrete parameter.

Right!!! Gotcha!

I definitely have a better understanding now, but what still
bothers me is the use of the transition sequences for the conditional
probability P(z|x). Forney defines memoryless noise to be
noise in which the observation at time k (z[k]) depends
only on the transition at time k (xi[k]), and so

P(z|x) = P(z|xi).

This definition of memoryless noise is intuitively unsatisfying,
and it also conflicts (it seems) with the definition of memoryless
noise (or channel) given by Lee and Messerschmitt as a channel
in which the current output z[k] is independent of all inputs
except the current input x[k].

> I'll try to get a copy
> of Forney's paper and get a look at it.  It takes a while to get this
> stuff through the corporate library, but I might have to do some HMM
> work in the near term, so spending some time on it is justifiable.

Stan, it sure is great to have someone to bounce this stuff off of. Thanks
for your help. This is USENET (and its people) in its finest form.


> >   P(x,z) = P(z|x)*P(x)

> > (from p.271).

--
%% Randy Yates
%%  DSP Engineer
%%  Ericsson / Research Triangle Park, NC, USA


### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> > > > > Hi Group,

> > > > > I have a few very basic questions/misunderstandings about the "discrete-time
> > > > > Markov process" model Forney uses in his 1973 landmark paper
> > > > > "The Viterbi Algorithm" (Proceedings of the IEEE, vol 61, no 3,
> > > > > March, 1973).

> > > > > On p.269, Forney arrives at a fundamental expression for P(z | x),
> > > > > where x is a finite input state vector running from time 0 to
> > > > > time K and z is a finite observation state vector running over
> > > > > the same interval, the observation being made over a channel with
> > > > > memoryless noise. His expression involves the transition
> > > > > states xi[k] (Greek letter xi) where xi[k] = (x[k+1],x[k]).

> > > > > Here's where I run into a real basic conceptual problem. The
> > > > > probability we seek to evaluate, P(z | x), i.e., "the probability
> > > > > of z GIVEN x", does not have ANYTHING to do with the transition
> > > > > states. That is, we are GIVEN x - it matters not one whit how
> > > > > probable x is. Now since we are given x it seems that the only
> > > > > thing needed to evaluate P(z | x) are the *CHANNEL* transition
> > > > > probabilities P(z[k] | x[k]), NOT the source transition probabilities
> > > > > P(x[k+1] | x[k]).

> > > > > In other words, when Forney gives the relation

> > > > >   P(z | x) = prod_{k=0}^{K-1} P(z[k] | xi[k])

> > > > > I don't see that xi[k] has anything to do with it. We are GIVEN
> > > > > x - the likelihood of x is 1!!!!
> > > > > --

> > > > I don't have the paper, and actually never really read it, but if I
> > > > recall correctly from my spoon-fed Viterbi lessons, I would guess that
> > > > the relationship above will be converted to
> > > > P(x | z) by Bayes' rule further into the paper, which is the thing
> > > > that needs to be maximized.

> > > I don't think so, Stan. The thing you really want is the value of
> > > x that maximizes P(x,z), i.e., the Maximum A-Posteriori Estimate (MAP).
> > > This (using Bayes rule) is

> > If you have Van Trees vol I, look around page 50.  The MAP is actually
> > P(x|z)

> Yep - I was wrong yesterday when I said the MAP is the value that
> maximizes P(x,z). Rather, the MAP is max(P(x|z)).

> > and this doesn't depend on the random parameter being
> > differentiable.

> Huh?

> > Van Trees writes P(x|z) as
> >
> >          P(z|x)*P(x)
> > P(x|z) = -----------
> >             P(z)

> > and then uses the monotonicity of the log to write

> > ln(P(x|z))=ln(P(z|x))+ln(P(x))-ln(P(z))

> It's probably a nit, but you mean the monotonicity is required
> to say that maximizing one maximizes the other, right? The
> fact that the ln expression above is true doesn't depend on
> monotonicity.

Right


> > His next step is to observe that we need to maximize ln(P(x|z)) with
> > respect to x, and you end up dropping P(z) because it's not a function
> > of x (x has been marginalized out of it).

> Right!

> > So you end up maximizing P(z|x)P(x), and we really are basically saying the same thing.

> Right!!

> > Van Trees goes on to assume a differential form for P(z,x)

> -- which is equal to P(z|x)P(x) --

> > which you can't do
> > for Viterbi because x is a discrete parameter.

> Right!!! Gotcha!

> I definitely have a better understanding now, but what still
> bothers me is the use of the transition sequences for the conditional
> probability P(z|x). Forney defines memoryless noise to be
> noise in which the observation at time k (z[k]) depends
> only on the transition at time k (xi[k]), and so

>   P(z|x) = P(z|xi).

> This definition of memoryless noise is intuitively unsatisfying,
> and it also conflicts (it seems) with the definition of memoryless
> noise (or channel) given by Lee and Messerschmitt as a channel
> in which the current output z[k] is independent of all inputs
> except the current input x[k].

I think I might understand your confusion and since I really haven't
looked at the paper, this is a guess.  Way back (before the world's most
popular OS existed) in my comms class, I was taught two versions of
Viterbi, one was called "soft decision" and the other "hard decision".
Soft works on discrete symbols received over an AWGN channel, which
results in continuous data.  Hard was based on strictly discrete data,
like using Viterbi as a convolutional decoder. Hard could also be used
on the output of some memoryless M-ary detector. We spent some time
looking at this configuration because although soft worked better,
before engineers could use the world's most popular OS to solve real
life commercial problems, the hard decision algorithm was cheaper to
implement. Looking over your original post, it seems like you're thinking
in terms of a discrete information theoretic channel, and the paper
might be covering a soft receiver.

> > I'll try to get a copy
> > of Forney's paper and get a look at it.  It takes a while to get this
> > stuff through the corporate library, but I might have to do some HMM
> > work in the near term, so spending some time on it is justifiable.

> Stan, it sure is great to have someone to bounce this stuff off of. Thanks
> for your help. This is USENET (and its people) in its finest form.

Thanks - but the dark powers of entropy are destined to make this
short-lived.  Does anyone want to port Windoz 95 to a TRW 1010J?


> > >   P(x,z) = P(z|x)*P(x)

> > > (from p.271).
> --
> %% Randy Yates
> %%  DSP Engineer
> %%  Ericsson / Research Triangle Park, NC, USA


### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> > [...]
> > I definitely have a better understanding now, but what still
> > bothers me is the use of the transition sequences for the conditional
> > probability P(z|x). Forney defines memoryless noise to be
> > noise in which the observation at time k (z[k]) depends
> > only on the transition at time k (xi[k]), and so

> >   P(z|x) = P(z|xi).

> > This definition of memoryless noise is intuitively unsatisfying,
> > and it also conflicts (it seems) with the definition of memoryless
> > noise (or channel) given by Lee and Messerschmitt as a channel
> > in which the current output z[k] is independent of all inputs
> > except the current input x[k].

>    I think I might understand your confusion and since I really haven't
> looked at the paper, this is a guess.  Way back (before the world's most
> popular OS existed) in my comms class, I was taught two versions of
> Viterbi, one was called "soft decision" and the other "hard decision".
> Soft works on discrete symbols received over an AWGN channel, which
> results in continuous data.  Hard was based on strictly discrete data,
> like using Viterbi as a convolutional decoder. Hard could also be used
> on the output of some memoryless M-ary detector. We spent some time
> looking at this configuration because although soft worked better,
> before engineers could use the world's most popular OS to solve real
> life commercial problems, the hard decision algorithm was cheaper to
> implement. Looking over your original post, it seems like you're
> thinking in terms of a discrete information theoretic channel, and the
> paper might be covering a soft receiver.

No, the problem I'm having is prior to the receiver but after the
channel. Let's say it's the receiving antenna's output.

All I'm trying to say is that, in my simpleton mind, "memoryless"
noise takes a single sample, x[k], and adds a noise term to it
so that the output z[k] is just

z[k] = x[k] + n[k].

Thus the noise in z[k] has nothing to do with a transition from x[k] to
x[k+1].
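Or, in throwaway simulation terms (the numbers and noise level are mine,
just to illustrate):

```python
import random

# Memoryless additive noise as I picture it: each z[k] depends only on x[k].
random.seed(0)
x = [+1, -1, +1, +1]                   # transmitted samples
n = [random.gauss(0, 0.1) for _ in x]  # independent noise terms n[k]
z = [xk + nk for xk, nk in zip(x, n)]  # z[k] = x[k] + n[k]
print(z)  # no z[k] involves x[k+1] or x[k-1]
```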
--
%% Randy Yates
%%  DSP Engineer
%%  Ericsson / Research Triangle Park, NC, USA


### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> > > [...]
> > > I definitely have a better understanding now, but what still
> > > bothers me is the use of the transition sequences for the conditional
> > > probability P(z|x). Forney defines memoryless noise to be
> > > noise in which the observation at time k (z[k]) depends
> > > only on the transition at time k (xi[k]), and so

> > >   P(z|x) = P(z|xi).

> > > This definition of memoryless noise is intuitively unsatisfying,
> > > and it also conflicts (it seems) with the definition of memoryless
> > > noise (or channel) given by Lee and Messerschmitt as a channel
> > > in which the current output z[k] is independent of all inputs
> > > except the current input x[k].

> >    I think I might understand your confusion and since I really haven't
> > looked at the paper, this is a guess.  Way back (before the world's most
> > popular OS existed) in my comms class, I was taught two versions of
> > Viterbi, one was called "soft decision" and the other "hard decision".
> > Soft works on discrete symbols received over an AWGN channel, which
> > results in continuous data.  Hard was based on strictly discrete data,
> > like using Viterbi as a convolutional decoder. Hard could also be used
> > on the output of some memoryless M-ary detector. We spent some time
> > looking at this configuration because although soft worked better,
> > before engineers could use the world's most popular OS to solve real
> > life commercial problems, the hard decision algorithm was cheaper to
> > implement. Looking over your original post, it seems like you're
> > thinking in terms of a discrete information theoretic channel, and the
> > paper might be covering a soft receiver.

> No, the problem I'm having is prior to the receiver but after the
> channel. Let's say it's the receiving antenna's output.

> All I'm trying to say is that, in my simpleton mind, "memoryless"
> noise takes a single sample, x[k], and adds a noise term to it
> so that the output z[k] is just

>   z[k] = x[k] + n[k].

> Thus the noise in z[k] has nothing to do with a transition from x[k] to
> x[k+1].

OK - maybe the notation is bad, but x[k] is a random variable, so let's
say x[k] is drawn from a binary alphabet:

P(x[k]=i) = \sum_{j in {0,1}} P(x[k]=i | x[k-1]=j) P(x[k-1]=j)

Right?
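In code, that expansion over the previous state looks like this (the
transition numbers are mine, purely for illustration):

```python
# Law of total probability for a binary Markov source:
#   P(x[k]=i) = sum over j of P(x[k]=i | x[k-1]=j) * P(x[k-1]=j)
P_prev = {0: 0.5, 1: 0.5}             # P(x[k-1]=j)
P_trans = {(0, 0): 0.9, (1, 0): 0.1,  # keyed (i, j): P(x[k]=i | x[k-1]=j)
           (0, 1): 0.4, (1, 1): 0.6}

P_curr = {i: sum(P_trans[(i, j)] * P_prev[j] for j in (0, 1)) for i in (0, 1)}
print(P_curr)  # still a proper distribution: the values sum to 1
```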


> --
> %% Randy Yates
> %%  DSP Engineer
> %%  Ericsson / Research Triangle Park, NC, USA


### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

Hi Randy,

I didn't read the Forney paper, but I did some work on Viterbi decoding a
few months ago.  If I remember right, you should distinguish between the
sequence of transmitted symbols x(k) and the state sequence s(k), which is
the path through the trellis.  The Tx symbol x(k) depends directly on the
state transition: x(k) = f(s(k-1), s(k)).  If the channel noise is
memoryless, then the observation z(k) at the receiver side depends only on
the transmitted symbol x(k), as you already stated.  But Forney is also
right, because there is a one-to-one relation between the Tx symbol x(k)
and the state transition s(k-1) -> s(k).  If you know one, you know the
other:

P(z(k) | x(k)) = P(z(k) | s(k-1) -> s(k))
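A minimal decoder sketch of that equivalence (my own toy trellis and symbol
map, not from the paper): because each transition s(k-1) -> s(k) carries
exactly one symbol, the branch metric can be attached to the transition
instead of the symbol.

```python
import math

# Toy 2-state trellis.  Each transition (s_prev, s_next) emits one symbol,
# so P(z(k) | x(k)) and P(z(k) | s(k-1) -> s(k)) are the same number.
emit = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # hypothetical symbol map

def viterbi(z, p=0.1):
    """Most likely state path for observations z over a BSC with crossover p."""
    def branch(zk, transition):
        x = emit[transition]          # the symbol carried by this transition
        return math.log((1 - p) if zk == x else p)

    score = {0: 0.0, 1: float("-inf")}  # force the start state to 0
    back = []
    for zk in z:
        new_score, choices = {}, {}
        for s_next in (0, 1):
            cands = [(score[s] + branch(zk, (s, s_next)), s) for s in (0, 1)]
            new_score[s_next], choices[s_next] = max(cands)
        score = new_score
        back.append(choices)
    state = max(score, key=score.get)   # best terminal state
    path = [state]
    for choices in reversed(back):      # trace the survivor path backwards
        state = choices[state]
        path.append(state)
    return path[::-1]

# The decoded state path implies the symbol sequence via the transition map.
print(viterbi([0, 1, 0]))
```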

Bye,
Alex.

Randy Yates wrote:

> > > [...]
> > > I definitely have a better understanding now, but what still
> > > bothers me is the use of the transition sequences for the conditional
> > > probability P(z|x). Forney defines memoryless noise to be
> > > noise in which the observation at time k (z[k]) depends
> > > only on the transition at time k (xi[k]), and so

> > >   P(z|x) = P(z|xi).

> > > This definition of memoryless noise is intuitively unsatisfying,
> > > and it also conflicts (it seems) with the definition of memoryless
> > > noise (or channel) given by Lee and Messerschmitt as a channel
> > > in which the current output z[k] is independent of all inputs
> > > except the current input x[k].

> >    I think I might understand your confusion and since I really haven't
> > looked at the paper, this is a guess.  Way back (before the world's most
> > popular OS existed) in my comms class, I was taught two versions of
> > Viterbi, one was called "soft decision" and the other "hard decision".
> > Soft works on discrete symbols received over an AWGN channel, which
> > results in continuous data.  Hard was based on strictly discrete data,
> > like using Viterbi as a convolutional decoder. Hard could also be used
> > on the output of some memoryless M-ary detector. We spent some time
> > looking at this configuration because although soft worked better,
> > before engineers could use the world's most popular OS to solve real
> > life commercial problems, the hard decision algorithm was cheaper to
> > implement. Looking over your original post, it seems like you're
> > thinking in terms of a discrete information theoretic channel, and the
> > paper might be covering a soft receiver.

> No, the problem I'm having is prior to the receiver but after the
> channel. Let's say it's the receiving antenna's output.

> All I'm trying to say is that, in my simpleton mind, "memoryless"
> noise takes a single sample, x[k], and adds a noise term to it
> so that the output z[k] is just

>   z[k] = x[k] + n[k].

> Thus the noise in z[k] has nothing to do with a transition from x[k] to
> x[k+1].
> --
> %% Randy Yates
> %%  DSP Engineer
> %%  Ericsson / Research Triangle Park, NC, USA


### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

On Thu, 19 Aug 1999 14:18:43 -0400, Stan Pawlukiewicz

>   I think I might understand your confusion and since I really haven't
>looked at the paper, this is a guess.  Way back (before the world's most
>popular OS existed) in my comms class, I was taught two versions of
>Viterbi, one was called "soft decision" and the other "hard decision".
>Soft works on discrete symbols received over an AWGN channel, which
>results in continuous data.  Hard was based on strictly discrete data,
>like using Viterbi as a convolutional decoder. Hard could also be used
>on the output of some memoryless M-ary detector. We spent some time
>looking at this configuration because although soft worked better,
>before engineers could use the world's most popular OS to solve real
>life commercial problems, the hard decision algorithm was cheaper to
>implement. Looking over your original post, it seems like you're thinking
>in terms of a discrete information theoretic channel, and the paper
>might be covering a soft receiver.

Eep!  This is a little strange.   I think what most people refer to
nowadays as soft-decision Viterbi and hard-decision Viterbi only has
to do with the quantization of the input (channel) sequence.  In other
words, if the channel symbols are quantized to more than one bit each
(and usually in a log-like scale), then this is considered
soft-decision.  If the input stream is strictly a one-bit-per-symbol
binary representation of the polarity of the received symbol, then
this is hard-decision.  Using both to decode an AWGN channel sequence
yields about 2dB improvement for the soft-decision case (so nearly
everybody uses soft decision unless there's a complexity problem).

And to further confuse the issue, lots of folks are now using
Soft-Output-Viterbi-Algorithm (SOVA) decoders for Turbo applications,
where not only the input is soft decision, but the output is a
likelihood scale of the decoded bit decision rather than the usual
binary data.
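A toy illustration of that quantization point (the numbers are mine, BPSK
levels of +/-1): the hard metric collapses the received sample to its
polarity and discards the confidence, while the soft metric keeps it.

```python
# Hard vs. soft branch metrics for one BPSK symbol (illustrative only).
def hard_metric(r, symbol):
    """Quantize the sample to one bit first, then score by agreement."""
    decision = 1 if r >= 0 else -1
    return 0 if decision == symbol else 1

def soft_metric(r, symbol):
    """Score the unquantized sample by squared Euclidean distance."""
    return (r - symbol) ** 2

r = 0.1  # a barely positive sample: hard decision throws away the "barely"
print(hard_metric(r, +1), hard_metric(r, -1))  # looks fully confident: 0 1
print(soft_metric(r, +1), soft_metric(r, -1))  # small margin: ~0.81 vs ~1.21
```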

Eric Jacobsen
Minister of Algorithms, EF Data Corp.
http://www.primenet.com/~ericj

### Discrete-Time Markov Processes and Forney's Paper on the Viterbi Algorithm

> On Thu, 19 Aug 1999 14:18:43 -0400, Stan Pawlukiewicz

> >   I think I might understand your confusion and since I really haven't
> >looked at the paper, this is a guess.  Way back (before the world's most
> >popular OS existed) in my comms class, I was taught two versions of
> >Viterbi, one was called "soft decision" and the other "hard decision".
> >Soft works on discrete symbols received over an AWGN channel, which
> >results in continuous data.  Hard was based on strictly discrete data,
> >like using Viterbi as a convolutional decoder. Hard could also be used
> >on the output of some memoryless M-ary detector. We spent some time
> >looking at this configuration because although soft worked better,
> >before engineers could use the world's most popular OS to solve real
> >life commercial problems, the hard decision algorithm was cheaper to
> >implement. Looking over your original post, it seems like you're
> >thinking in terms of a discrete information theoretic channel, and the
> >paper might be covering a soft receiver.

> Eep!

Eep?

>This is a little strange.   I think what most people refer to
> nowadays as soft-decision Viterbi and hard-decision Viterbi only has
> to do with the quantization of the input (channel) sequence.  In other
> words, if the channel symbols are quantized to more than one bit each
> (and usually in a log-like scale), then this is considered
> soft-decision.

Well, I'm no authority, but I don't agree.  It might be a matter of which
community you use it in, but my understanding is that Viterbi is a path
maximization (or minimization) algorithm.  Soft, in my understanding, has
to do with using the MAP likelihood (which is continuous) and hard has to
do with using direct decisions.  If I understand your comment, it seems
to imply that only a binary state information stream can have a hard
decision decoder, which I don't think is right.  I do have some
experience in multi-target tracking and there hard decisions mean hard
assignments.  Soft decisions usually mean anything other than hard
decisions and often mean using likelihoods.


>If the input stream is strictly a one-bit-per-symbol
> binary representation of the polarity of the received symbol, then
> this is hard-decision.  Using both to decode an AWGN channel sequence
> yields about 2dB improvement for the soft-decision case (so nearly
> everybody uses soft decision unless there's a complexity problem).

> And to further confuse the issue, lots of folks are now using
> Soft-Output-Viterbi-Algorithm (SOVA) decoders for Turbo applications,
> where not only the input is soft decision, but the output is a
> likelihood scale of the decoded bit decision rather than the usual
> binary data.

> Eric Jacobsen
> Minister of Algorithms, EF Data Corp.
> http://www.primenet.com/~ericj

What newsgroup or list might be best to find a person
to communicate with on the Viterbi algorithm as a tool
to help with Markov model analysis?

I'm asking because I'm exploring the Viterbi algorithm
and have a few questions I would like to ask
someone who is very familiar with it.

Harry