Regarding a Brazilian reference corpus, Sketch Engine has a 3.9-billion-word corpus of Portuguese, but its annotation has been criticised. Corpus Brasileiro, an ongoing project of the postgraduate program in Applied Linguistics at PUCSP, a Brazilian university, currently has 1,249,100 words. I know it will be better annotated, but I am not sure it is large enough to serve as a reference corpus.
i) What do you think would be better: a large corpus with bad annotation or a small corpus with good annotation? Do you have any other suggestions?
ii) Any suggestions/ideas regarding my next steps are welcome too!