Help with inter-observer agreement

Please help if you can (I'm at a novice level).

I have a gold standard test that I would like to compare to a new test.

The new test will be assessed as a binary categorical outcome (positive or negative) by 2 observers.

I can assess the agreement between the two observers with kappa. That part is easy.
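To make the kappa step concrete, here is a minimal Python sketch of Cohen's kappa for two observers. It assumes ratings are coded 1 = positive and 0 = negative; the function name `cohens_kappa` and the example ratings are made up for illustration and are not real data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving binary (0/1) ratings on the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of cases")

    n = len(rater_a)

    # Observed agreement: proportion of cases where both raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Proportion of positive ratings for each rater.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n

    # Agreement expected by chance: both positive or both negative.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    if expected == 1:
        # Both raters used a single identical category; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1 - expected)


# Hypothetical ratings for 10 cases (1 = positive, 0 = negative).
observer_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
observer_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

With these made-up ratings the observers agree on 8 of 10 cases, chance agreement is 0.5, and kappa comes out at about 0.6.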

How should I compare my gold standard to the findings of the 2 observers? Do I set up a 2x2 table using only the cases in which the observers agreed? And how do I compare my standard to the new test for the cases in which the observers disagreed?
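One possible way to frame the comparison, offered as a sketch rather than a settled answer: build a separate 2x2 table for each observer against the gold standard, so that the cases where the observers disagree are kept rather than dropped. The Python below assumes 1 = positive and 0 = negative; the function names and all data are hypothetical.

```python
def two_by_two(gold, test):
    """Return (tp, fp, fn, tn) counts for a binary test against a binary gold standard."""
    if len(gold) != len(test):
        raise ValueError("gold and test must cover the same cases")
    tp = sum(g == 1 and t == 1 for g, t in zip(gold, test))  # true positives
    fp = sum(g == 0 and t == 1 for g, t in zip(gold, test))  # false positives
    fn = sum(g == 1 and t == 0 for g, t in zip(gold, test))  # false negatives
    tn = sum(g == 0 and t == 0 for g, t in zip(gold, test))  # true negatives
    return tp, fp, fn, tn


def sensitivity_specificity(gold, test):
    """Sensitivity and specificity of the test relative to the gold standard."""
    tp, fp, fn, tn = two_by_two(gold, test)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one gold-positive and one gold-negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data for 10 cases (1 = positive, 0 = negative).
gold       = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
observer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
observer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

for name, ratings in [("observer 1", observer_1), ("observer 2", observer_2)]:
    sens, spec = sensitivity_specificity(gold, ratings)
    print(f"{name}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Reporting each observer separately uses every case, whereas restricting the table to cases where both observers agreed would discard the disagreements, which are the cases in question here.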