[Correia et al. 2020]

Correia G., Niculae V., Aziz W., & Martins A. Efficient marginalization of discrete and structured latent variables via sparsity. In Advances in NIPS, 11789–11802 (2020).

[Devlin et al. 2019]

Devlin J., Chang M., Lee K., & Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL, 4171–4186 (2019).

[Dozat & Manning 2017]

Dozat T. & Manning C. Deep biaffine attention for neural dependency parsing. In Proceedings of ICLR (2017).

[Dozat & Manning 2018]

Dozat T. & Manning C. Simpler but more accurate semantic dependency parsing. In Proceedings of ACL, 484–490 (2018).

[Eisner 2000]

Eisner J. Bilexical grammars and their cubic-time parsing algorithms. Springer Netherlands, Dordrecht, 29–61 (2000).

[Eisner 2016]

Eisner J. Inside-outside and forward-backward algorithms are just backprop (tutorial paper). In Proceedings of the Workshop on Structured Prediction for NLP, 1–17 (2016).

[Eisner & Satta 1999]

Eisner J. & Satta G. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of ACL, 457–464 (1999).

[Gal & Ghahramani 2016]

Gal Y. & Ghahramani Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In Proceedings of ICML, 1050–1059 (2016).

[Goodman 1999]

Goodman J. Semiring parsing. Computational Linguistics, 573–606 (1999).

[Hwa 2000]

Hwa R. Sample selection for statistical grammar induction. In Proceedings of ACL, 45–52 (2000).

[Kim et al. 2019]

Kim Y., Rush A., Yu L., Kuncoro A., Dyer C., et al. Unsupervised recurrent neural network grammars. In Proceedings of NAACL, 1105–1117 (2019).

[Kitaev & Klein 2020]

Kitaev N. & Klein D. Tetra-tagging: word-synchronous parsing with linear-time inference. In Proceedings of ACL, 6255–6261 (2020).

[Koo et al. 2007]

Koo T., Globerson A., Carreras X., & Collins M. Structured prediction models via the matrix-tree theorem. In Proceedings of EMNLP, 141–150 (2007).

[Lafferty et al. 2001]

Lafferty J., McCallum A., & Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML, 282–289 (2001).

[Li & Eisner 2009]

Li Z. & Eisner J. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of EMNLP, 40–51 (2009).

[Ma & Hovy 2017]

Ma X. & Hovy E. Neural probabilistic model for non-projective MST parsing. In Proceedings of IJCNLP, 59–69 (2017).

[Martins & Astudillo 2016]

Martins A. & Astudillo R. From softmax to sparsemax: a sparse model of attention and multi-label classification. In Proceedings of ICML, 1614–1623 (2016).

[McDonald & Pereira 2006]

McDonald R. & Pereira F. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL, 81–88 (2006).

[McDonald et al. 2005]

McDonald R., Pereira F., Ribarov K., & Hajič J. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of EMNLP, 523–530 (2005).

[Mensch & Blondel 2018]

Mensch A. & Blondel M. Differentiable dynamic programming for structured prediction and attention. In Proceedings of ICML, 3462–3471 (2018).

[Peters et al. 2018]

Peters M., Neumann M., Iyyer M., Gardner M., Clark C., et al. Deep contextualized word representations. In Proceedings of NAACL, 2227–2237 (2018).

[Sarawagi & Cohen 2004]

Sarawagi S. & Cohen W. Semi-Markov conditional random fields for information extraction. In Advances in NIPS, 1185–1192 (2004).

[Smith & Eisner 2008]

Smith D. & Eisner J. Dependency parsing by belief propagation. In Proceedings of EMNLP, 145–156 (2008).

[Stern et al. 2017]

Stern M., Andreas J., & Klein D. A minimal span-based neural constituency parser. In Proceedings of ACL, 818–827 (2017).

[Wang et al. 2019]

Wang X., Huang J., & Tu K. Second-order semantic dependency parsing with end-to-end neural networks. In Proceedings of ACL, 4609–4618 (2019).

[Wang & Tu 2020]

Wang X. & Tu K. Second-order neural dependency parsing with message passing and end-to-end training. In Proceedings of AACL, 93–99 (2020).

[Yang & Deng 2020]

Yang K. & Deng J. Strongly incremental constituency parsing with graph neural networks. In Advances in NIPS, 21687–21698 (2020).

[Yang et al. 2021]

Yang S., Zhao Y., & Tu K. Neural bi-lexicalized PCFG induction. In Proceedings of ACL, 2688–2699 (2021).

[Zhang et al. 2020a]

Zhang Y., Li Z., & Zhang M. Efficient second-order TreeCRF for neural dependency parsing. In Proceedings of ACL, 3295–3305 (2020a).

[Zhang et al. 2020b]

Zhang Y., Zhou H., & Li Z. Fast and accurate neural CRF constituency parsing. In Proceedings of IJCAI, 4046–4053 (2020b).