Shadow significance

I've recently faced a specific issue and would kindly ask for your help.

Imagine a standard linear supervised learning setup for a binary classification problem (X, y, OLS, p-values, etc.).
One can develop a common solution for this problem and demonstrate high quality in terms of the required metric (ROC AUC, RMSE, etc.). Suppose the developer then gets another dataset X1 that is completely unlabeled (unmarked), but it has to be included in the analysis and conclusions are needed.
  1. The previously developed solution is important; moreover, it should be used as the basis for the analysis of the unlabeled X1 dataset.
  2. It is true that factors significant for the X-problem could be insignificant for the X1-problem.
  3. The analyst can't merge the two datasets and add a binary "X/non-X" flag as a factor, because she has to make "correct" decisions on the X1 dataset and she doesn't have an adequate understanding of the relation between X and X1 observations (in terms of the target variable, of course; all distributions are known).
Finally, my question is: is there any approach for estimating the developed model's quality on the unlabeled X1 dataset? Can we draw any solid conclusion about the significance, on the X1 dataset, of the factors chosen in the X-problem solution?
Thank you for your support!

Sorry for the lack of specificity, but I can't imagine even an abstract approach to this issue.

A related example can probably be described as follows:
Imagine you have to predict the probability of a car accident for standard ICE vehicles. You use one of the many supervised learning methods and produce a solution of good quality, so a ranking system can be developed; you are sure (from a statistical point of view) that the system is correct, and you are able to prove it to anyone. But at some point Elon Musk's Tesla appears, and Tesla vehicles have to be included in the ranking system. You understand that the existing solution will probably perform worse on these Tesla cases, but you can't develop a separate Tesla-specific solution, because not a single accident has been observed since they appeared (and even if 5 or 10 accidents had been observed, it would not help: you can't use this information). So the issue for the owner of the ranking system is to find a (statistical) way to be sure that the current system can handle the new Tesla cases, or that it can't, but again with a more or less clear understanding of why.
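
To make the setup concrete, here is a minimal sketch of the one thing that can be measured without labels: how far each factor's distribution in X1 is from its distribution in X. The data is synthetic and the factor names (mileage, driver_age) are hypothetical; the two-sample Kolmogorov-Smirnov statistic is just one possible choice of distance. It says nothing about the significance of the factors with respect to the unobserved target, which is exactly the part I don't know how to approach.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


random.seed(0)

# Synthetic stand-ins: X is the labeled dataset (ICE vehicles),
# X1 is the unlabeled one (Tesla cases). Both factors are hypothetical.
X = {
    "mileage": [random.gauss(0.0, 1.0) for _ in range(500)],
    "driver_age": [random.gauss(0.0, 1.0) for _ in range(500)],
}
X1 = {
    "mileage": [random.gauss(0.0, 1.0) for _ in range(500)],     # same distribution
    "driver_age": [random.gauss(1.5, 1.0) for _ in range(500)],  # shifted on purpose
}

# Per-factor distance between X and X1; values near 0 mean similar distributions.
drift = {name: ks_statistic(X[name], X1[name]) for name in X}

for name, value in drift.items():
    print(f"{name}: KS = {value:.3f}")
```

A large value for a factor only shows that the model is being applied outside the region where it was validated for that factor; a small value does not prove that the relation between the factor and the target is the same in X1.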