Legitimacy of statistics in determining whether something has an effect

Hi all. I was reading about the ganzfeld experiments https://en.wikipedia.org/wiki/Ganzfeld_experiment and watching some YouTube videos. One person outside a room picks an image from a set of 4, and the person inside the room tries to guess which one it was. Numerous experiments with hundreds of individual trials each have given hit rates higher than 25%, and meta-analyses seem to converge on about 32%. Now, I'm a non-materialist, so I don't have a prejudice against these things. I feel, however, that 32% is hardly impressive, even across that many experiments (I think it's more than 10).

I know the law of large numbers says the sample average must converge to the true mean as n -> infinity (and the central limit theorem describes its distribution), but that's just "theory"; in real life I feel that even with 100 million trials, 32% is hardly impressive. I'd only feel the results were significant if they were 90%+. Another article I read was about precognition research by Daryl Bem: he got 55% with thousands of trials and it was published in a reputable journal (50% is theoretical chance). I mean, is 55% against a 50% chance level enough to seriously conclude ANYTHING AT ALL? It seems like it must be close to 100% to conclude anything.
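Just to make my confusion concrete, here's a quick normal-approximation binomial test. The trial counts (3,000 and 1,000) are made up for illustration, since I don't have the real ones:

```python
import math

def z_and_p(hits, n, p0):
    """Normal-approximation z-score and one-sided p-value for
    observing `hits` successes in `n` trials at chance rate p0."""
    se = math.sqrt(p0 * (1 - p0) / n)      # standard error of the sample proportion
    z = (hits / n - p0) / se               # standard errors above chance
    p = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided upper-tail p-value
    return z, p

# 32% hits when chance is 25%, with a hypothetical 3,000 total trials:
print(z_and_p(0.32 * 3000, 3000, 0.25))
# 55% hits when chance is 50%, with a hypothetical 1,000 trials:
print(z_and_p(0.55 * 1000, 1000, 0.50))
```

Both p-values come out tiny, which is exactly what puzzles me: the math says these small deviations matter, but my gut says they shouldn't.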

This made me question something I'd never thought about before: are statistical methods really legitimate for determining whether A has an effect on B? My questions are:

(1) Have there been any experiments (in any field) where applying A gives a result slightly different from chance (like 32% when real chance is 25%), removing A gives a result very close to chance (e.g. 25.3% when real chance is 25%), and the experimenters flipped A on and off many times with the same thing happening each time?
(2) Are there any experiments with a different-from-chance result where people later put B under the microscope and confirmed that A really does have an effect on B?
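To make (1) concrete, here's a toy simulation of the on/off design I have in mind. The 0.32 "effect-on" hit rate and the block size of 10,000 trials are just assumed numbers:

```python
import random

random.seed(0)

def run_block(p_true, n):
    """Simulate n guess trials with true hit probability p_true; return the hit rate."""
    return sum(random.random() < p_true for _ in range(n)) / n

# Toggle a hypothetical effect on (hit rate 0.32) and off (pure 25% chance):
for effect_on in [True, False, True, False, True]:
    p = 0.32 if effect_on else 0.25
    rate = run_block(p, 10_000)
    print(f"effect {'on ' if effect_on else 'off'}: observed {rate:.3f}")
```

In the simulation the observed rate tracks the toggle every time, so if a real experiment behaved like this I'd find it much more convincing.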

I'm sorry if this question sounds stupid and confusing, but I've always felt stats is only useful for surveys, or pattern recognition at best. It seems to me that many if not most results in psychology rely on statistical methods where the deviation from chance isn't impressive at all.
Forget the Central Limit Theorem. Google "p-hacking" instead. The statistical methods are, in principle, fine. Applying them to just the results you like, not so much.
I feel even with 100 million trials 32% is hardly impressive. I'd only feel the results to be significant if they have 90%+
Why do you think that? Presumably, if it exists, psi would be a relatively weak effect (I don't think it exists). Most 'legitimate' effects in social science are small: there is so much individual variation, and there are so many other influences at play. That doesn't mean the effect itself doesn't exist. 100 million trials with 32% accuracy (if the study is preregistered and conducted correctly and ethically) would be extremely convincing evidence for the existence of psi, in my view.
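To see why, note how the standard error of a sample proportion shrinks with n, so the same 32%-vs-25% gap becomes astronomically many standard errors at large n. A quick back-of-the-envelope sketch:

```python
import math

def sigmas_from_chance(p_obs, p0, n):
    """How many standard errors an observed hit rate sits above chance
    rate p0, under the normal approximation to the binomial."""
    return (p_obs - p0) / math.sqrt(p0 * (1 - p0) / n)

for n in [100, 10_000, 100_000_000]:
    print(n, round(sigmas_from_chance(0.32, 0.25, n), 1))
```

With these numbers, 100 trials put 32% only about 1.6 standard errors above chance (unremarkable), but 100 million trials put it more than 1,600 standard errors out. Effect size and sample size are different questions.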

The problem is more, as j58 correctly notes, that p-values prove little in the context of p-hacking and file-drawer effects.