Probabilistic PoS Tagging: limited scope for syntactic dependencies

by Laurens Ludovicus Michielsen -
Number of replies: 2

Hi everyone,

I have a question about the probability we are maximizing. After applying both hypotheses we get the following:

    argmax over T_1, ..., T_n of  P(T_1^k) · ∏_{i=k+1..n} P(T_i | T_{i-k+1}, ..., T_{i-1}) · ∏_{i=1..n} P(w_i | T_i)

I am just wondering what P(T_1^k) is equal to. Just as an example, let's take k = 3. Is this probability equal to

  • P(T_1) * P(T_2) * P(T_3), or
  • P(T_1) * P(T_2 | T_1) * P(T_3 | T_2, T_1)?

I am inclined to go for the second option, as it makes more sense, but the notation on the slide confuses me slightly, so I just want to be sure.
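
To see the difference between the two options concretely, here is a minimal numeric sketch (the toy distribution and the helper marginal are made up for illustration, not from the course): only the chain-rule expansion recovers the joint probability P(T_1, T_2, T_3).

    # Toy joint distribution over three tag positions, each tag in {"N", "V"}.
    # The numbers are arbitrary (they only need to sum to 1) -- purely illustrative.
    joint = {
        ("N", "N", "N"): 0.10, ("N", "N", "V"): 0.15,
        ("N", "V", "N"): 0.20, ("N", "V", "V"): 0.05,
        ("V", "N", "N"): 0.12, ("V", "N", "V"): 0.08,
        ("V", "V", "N"): 0.18, ("V", "V", "V"): 0.12,
    }

    def marginal(positions, values):
        # P(T_positions = values), summing the joint over the remaining positions.
        return sum(p for seq, p in joint.items()
                   if all(seq[i] == v for i, v in zip(positions, values)))

    t1, t2, t3 = "N", "V", "N"

    # First option: full independence -- in general NOT equal to the joint.
    opt1 = marginal([0], [t1]) * marginal([1], [t2]) * marginal([2], [t3])

    # Second option: chain rule -- equal to the joint by definition.
    opt2 = (marginal([0], [t1])
            * marginal([0, 1], [t1, t2]) / marginal([0], [t1])
            * marginal([0, 1, 2], [t1, t2, t3]) / marginal([0, 1], [t1, t2]))

    print(round(marginal([0, 1, 2], [t1, t2, t3]), 4))  # 0.2    (the joint itself)
    print(round(opt1, 4))                               # 0.165  (differs from the joint)
    print(round(opt2, 4))                               # 0.2    (matches the joint)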


Thanks for your help!

Laurens

In reply to Laurens Ludovicus Michielsen

Re: Probabilistic PoS Tagging: limited scope for syntactic dependencies

by Jean-Cédric Chappelier -
It's nothing more than P(T_1, ..., T_k): the probability of starting with that k-gram of tags. Notice that k is the "size" of the model: the "support" for the parameters is k-grams.
If k = 3, this is P(T_1 T_2 T_3), and that's it! (As usual, something is implicit in the notation: the subscripts _1, _2 and _3 indicate that it is an initial probability, i.e. the probability of starting with that sequence.)
Sure, you can rewrite it with your second formula (not the first one, which is wrong), but I don't see the point: for k = 3, the values P(T_1 = t_1, T_2 = t_2, T_3 = t_3) are themselves parameters (to be learned).
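For concreteness, here is a minimal sketch of that last point (the corpus and names are made up for illustration, not course code): the P(T_1, ..., T_k) parameters can be estimated as relative frequencies of sentence-initial k-grams in a tagged corpus.

    from collections import Counter

    # Hypothetical tagged corpus: one tag sequence per sentence (made-up data).
    tag_sequences = [
        ["DET", "NOUN", "VERB", "DET", "NOUN"],
        ["DET", "NOUN", "VERB", "ADV"],
        ["PRON", "VERB", "DET", "NOUN"],
        ["DET", "ADJ", "NOUN", "VERB"],
    ]

    def initial_kgram_probs(sequences, k):
        # Maximum-likelihood estimate of P(T_1, ..., T_k): the relative
        # frequency of each sentence-initial k-gram of tags.
        counts = Counter(tuple(tags[:k]) for tags in sequences if len(tags) >= k)
        total = sum(counts.values())
        return {kgram: count / total for kgram, count in counts.items()}

    for kgram, p in sorted(initial_kgram_probs(tag_sequences, k=3).items()):
        print(kgram, p)
    # ('DET', 'ADJ', 'NOUN') 0.25
    # ('DET', 'NOUN', 'VERB') 0.5
    # ('PRON', 'VERB', 'DET') 0.25

In practice these relative frequencies would typically be smoothed, but the raw counts already illustrate what "parameters to be learned" means here.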
Does that clarify it?