Hello,
Yes, the actor only outputs the mean, so only the mean depends on the state.
It would also be valid to have the actor output the (log) standard deviation, which would make the std state-dependent as well. Here we have chosen not to; keeping the std state-independent is an implicit bias that makes things more stable.
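To make the difference concrete, here is a minimal sketch in plain Python. It is not the actual implementation: the class names, the linear "actor", and the initial values are all made up for illustration. The only point is where `log_std` lives in each variant.

```python
import math
import random


class StateIndependentStdPolicy:
    """Gaussian policy whose mean depends on the state, but whose std does not.

    Hypothetical illustration: the "actor" is a single linear layer.
    """

    def __init__(self, state_dim, action_dim, init_log_std=0.0, seed=0):
        self.rng = random.Random(seed)
        # Actor parameters: mean = W @ state + b
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                  for _ in range(action_dim)]
        self.b = [0.0] * action_dim
        # One free (learnable) log std per action dimension, shared by all states.
        self.log_std = [init_log_std] * action_dim

    def mean(self, state):
        return [sum(w * s for w, s in zip(row, state)) + b
                for row, b in zip(self.W, self.b)]

    def std(self, state):
        # The state argument is deliberately ignored here.
        return [math.exp(ls) for ls in self.log_std]

    def sample(self, state):
        return [self.rng.gauss(m, sd)
                for m, sd in zip(self.mean(state), self.std(state))]


class StateDependentStdPolicy(StateIndependentStdPolicy):
    """Alternative: the actor also outputs the log std, so it varies with the state."""

    def __init__(self, state_dim, action_dim, init_log_std=0.0, seed=0):
        super().__init__(state_dim, action_dim, init_log_std, seed)
        # Extra linear head for the log std: log_std(s) = W_std @ s + log_std
        self.W_std = [[self.rng.gauss(0.0, 0.1) for _ in range(state_dim)]
                      for _ in range(action_dim)]

    def std(self, state):
        return [math.exp(sum(w * s for w, s in zip(row, state)) + ls)
                for row, ls in zip(self.W_std, self.log_std)]


s1, s2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
fixed = StateIndependentStdPolicy(state_dim=3, action_dim=2)
varying = StateDependentStdPolicy(state_dim=3, action_dim=2)
print(fixed.std(s1) == fixed.std(s2))      # same std in every state
print(varying.std(s1) == varying.std(s2))  # std changes with the state
```

In the first variant the exploration noise has the same scale everywhere, and only the mean reacts to the observation. In the second the network can widen or narrow the noise per state, which is more flexible but gives it one more thing to get wrong.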
You can refer to Appendix B.8 and decision C59 in this paper https://arxiv.org/pdf/2006.05990 for an empirical discussion.