Three reviewers have been tasked to evaluate the categories into which 200 contentions fall. The categories are groundwater, corrosion, drip, and dose. What is an applicable test?

The hypothesis I want to test is that the three reviewers agree on the category of each contention to a statistically significant degree (alpha = .05). The null hypothesis is therefore that their agreement is no better than chance.

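I would look at interrater reliability measures, which were specifically designed to test this. Cronbach's alpha is sometimes used that way, but it assumes numeric scores. With three raters assigning nominal categories, Fleiss' kappa or Krippendorff's alpha is the more usual choice.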
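
One interrater reliability measure that fits three raters and nominal categories is Fleiss' kappa. Below is a minimal sketch in plain Python, assuming every contention is rated by the same number of reviewers. The function name `fleiss_kappa`, the `toy_ratings` data, and the layout (one list of labels per contention) are made up for illustration and are not real review data.

```python
from collections import Counter

CATEGORIES = ["groundwater", "corrosion", "drip", "dose"]


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for nominal ratings.

    ratings: list with one entry per item (contention); each entry is a
             list holding one category label per rater.
    Assumes every item has the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j] = number of raters who placed item i in category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Share of all assignments that went to each category
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]

    # Observed agreement for each item, then averaged over items
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items

    # Agreement expected by chance given the category shares
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 4 contentions, 3 reviewers each
toy_ratings = [
    ["groundwater", "groundwater", "groundwater"],
    ["corrosion", "corrosion", "drip"],
    ["drip", "drip", "drip"],
    ["dose", "groundwater", "dose"],
]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 4))
```

A kappa of 1 means perfect agreement and a value near 0 means agreement no better than chance. Deciding whether the observed kappa differs significantly from 0 at the .05 level needs a standard error or a resampling procedure, which this sketch does not compute.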