CS-431: Clarification about question 6 of hands-on about PoS tagging 1

The quantity being maximized is the joint probability of the whole tag sequence. The fact that we parametrize it left-to-right does not change that: the objective is still the full joint probability (this is why the Viterbi algorithm needs a backward step to recover the best path). So the general answer is that the tag assigned to any given word depends on the tags of all the others (global maximum).
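For reference, here is one way to write that objective for an order-1 HMM. The notation is an assumption made for this reply (w_i for the i-th word, t_i for its tag, n for the sentence length) and may differ from the hands-on sheet:

```latex
\hat{t}_{1:n}
  = \arg\max_{t_1,\dots,t_n} P(w_{1:n}, t_{1:n})
  = \arg\max_{t_1,\dots,t_n}
    P(t_1)\, P(w_1 \mid t_1) \prod_{i=2}^{n} P(t_i \mid t_{i-1})\, P(w_i \mid t_i)
```

The product is written left-to-right, but the arg max is taken over all the t_i at once, not one position at a time.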

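Now, when some words are non-ambiguous (only one possible tag) and we're using an order-1 HMM, such a non-ambiguous word makes the tag assignment on its left independent of the tag assignment on its right. Mathematically speaking, it's because, once the tag of that word is fixed, the whole product splits into two factors that share no free variable, so it can be maximized by maximizing the two factors separately (each of them still being a product of non-independent terms). Visually, you can see this by drawing the lattice used by the Viterbi algorithm in such a case: every path has to go through the single node of the non-ambiguous word.
If it's an order-2 HMM, you need two consecutive non-ambiguous words to get the same effect, and more generally k consecutive non-ambiguous words for an order-k HMM.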
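If you want to check the order-1 case numerically, here is a minimal sketch. Everything in it is hypothetical: the three tags, the toy vocabulary and all the probabilities are made up for illustration and are not taken from the hands-on. It finds the best tagging by brute force over all candidate tag sequences (not by Viterbi), then compares it with the best taggings of the two sub-sentences that overlap on the non-ambiguous word.

```python
from itertools import product

# Hypothetical toy order-1 HMM; all numbers are invented for illustration.
INIT = {"D": 0.6, "N": 0.3, "V": 0.1}          # P(t_1)
TRANS = {                                      # P(t_i | t_{i-1})
    "D": {"D": 0.05, "N": 0.9, "V": 0.05},
    "N": {"D": 0.2, "N": 0.3, "V": 0.5},
    "V": {"D": 0.5, "N": 0.4, "V": 0.1},
}
EMIT = {                                       # word -> {tag: P(word | tag)}
    "the": {"D": 0.5},                         # non-ambiguous: a single tag
    "can": {"N": 0.1, "V": 0.2},
    "fish": {"N": 0.2, "V": 0.1},
}


def joint(words, tags):
    """Joint probability P(words, tags) under the order-1 HMM above."""
    p = INIT[tags[0]] * EMIT[words[0]][tags[0]]
    for i in range(1, len(words)):
        p *= TRANS[tags[i - 1]][tags[i]] * EMIT[words[i]][tags[i]]
    return p


def best_tags(words):
    """Brute-force global arg max over every candidate tag sequence."""
    candidates = [list(EMIT[w]) for w in words]
    return max(product(*candidates), key=lambda tags: joint(words, tags))


words = ["can", "fish", "the", "can", "fish"]
k = words.index("the")                 # position of the non-ambiguous word

full = best_tags(words)                # global maximum on the whole sentence
left = best_tags(words[: k + 1])       # left part, up to and including "the"
right = best_tags(words[k:])           # right part, starting at "the"

# Same left part, different right context.
other = best_tags(["can", "fish", "the", "fish"])

print(full, left, right, other)
```

With these numbers the left part of `full` coincides with `left`, its right part coincides with `right`, and changing the words after "the" (as in `other`) leaves the tags before it unchanged. Replacing "the" by an ambiguous word in `EMIT` is an easy way to see the two sides become coupled again.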
