I'd like to perform an ANOVA or similar with data whose residuals are nonnormal. My dependent variable is 1-7, but about 30% of my respondents provided 1's and the plurality of the rest provided 2's. I've tried log and square root transformations with no luck. Any advice on how to proceed?

Basically, the distribution looks like this (with each 1 representing an interval of 10 cases):

1

1

1

1

1

1

11

11

11

11 1

11111

1111111

1111111

Gratefully,

We have two groups of subjects, A and B, and N measures.

If we test all N measures for differences between A and B we obtain N p values (let's call this set P0).

Then we propose a way to extract subgroups from one of the two groups, say A1 and A2 (whose intersection is empty and whose union is A).

Now, we have N p values for the test A1 vs B, and N p values for the test A2 vs B (let's call these sets P1 and P2 respectively).

What is the correct way to compare P0, P1, and P2?

Accuracy of Regression Results?

I have sets of data I want to analyze but I'm not sure how I go about it. I am looking at sprinting training/competition data and I am looking to do some sort of regression in order to get a basic predictor model for future races. Unfortunately, it's not as easy as doing a linear regression since the outcomes are not continuous but discrete (1st place, 2nd place, 3rd place...

Analyzing data with continuous variables + discrete outcomes

As a requirement for Express Entry I need to show the average balance of my bank account for the last six months. The problem is that I don't know the formula. Could you please help?

I was asked the other day if I could determine whether two mean scores from a survey were statistically significantly...

Survey Statistics

I want to find out whether there is a correlation between the number of aces made by a tennis player during matches and the number of matches won.

I have two variables: the aces and the matches won.

Then I have the months: January, February, April.

How should I treat the months? As variables? As factors? Sorry, I am new to this.

I would like a simple example to help me because the GUI is not working well, so I have to use the terminal...

correlations

Perhaps unwisely as a newcomer to stats, I have taken on a scale validation project. I’ve searched the literature and can’t find anything so this is the issue:

I’m developing a scale that (hypothetically) will identify, differentiate and measure two different constructs occurring in parental/child relationships.

I’m allowing my participants to redo the scale questions for each child that they have (using Qualtrics), so each parent may complete it once or twice or nine times depending...

Factor Analysis newbie seeks your help

We tried comparing the 2 data sets...

Assessing how good a model is

Here's my situation. I need to determine if the incidence of a categorical phenomenon that I have observed in a sample is significantly different from an accepted incidence of the phenomenon published in the literature. I would generally use a Chi square test in this situation but my reference incidence is reported as only a percentage...

Statistical significance of observed incidence
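One possible route when the reference is only a percentage is an exact one-sample binomial test, which treats the published incidence as a fixed null proportion. The sketch below is not from the original post: the counts (18 cases out of 120) and the 10% reference value are hypothetical, and the function name is made up for illustration.

```python
from math import comb

def binomial_two_sided_p(successes: int, n: int, p0: float) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null proportion p0.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance so that ties are not lost to rounding.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical numbers: 18 cases in a sample of 120, published incidence 10%.
p_value = binomial_two_sided_p(18, 120, 0.10)
print(f"two-sided p = {p_value:.4f}")
```

This ignores any sampling uncertainty in the published figure; if the reference study's sample size were known, a two-sample comparison would be the more natural choice.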

I am unsure about how to interpret the results of my fixed-effects model.

I conduct a fixed effects analysis, similar to this one:

xtreg number_of_cigarettes = constant + beta0*year + beta1*treatment + X, fe

Now, the number of cigarettes is only recorded for individuals who previously reported that they smoke.

My beta1 gives me a negative coefficient of let's say -2.

Is it correct for me to say that the effect can be driven by two factors:

1. individuals smoke less (that one is...

Fixed effects interpretation - dropouts

I was thinking of using Google Cloud machine learning / deep learning to forecast the odds of an event. I would be thankful if you could suggest software for building predictive models that is simple to use for someone who is not experienced in statistics or coding. I would like to be able to set my own personal variables, give them values, and play around with them without needing to code.

Any suggestions for a...

Best machine learning algo to have odds on a match?

This is the graphic:

Given these numbers:

2, 3, 4, 8, 10, 15, 22

and here are the day numbers (each value is weighted by its corresponding day, so 2 × day 1, 3 × day 2, 8 × day 4):

1 2 3 4 5 6 7

Calculate the weighted moving average and the exponential moving average.

I know the formula for calculating the weighted moving average, but I can't understand the formula for the exponential moving average:

EMA...

EMA and WMA computation
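A minimal sketch of both averages for the seven numbers in the question. The weighted moving average uses the day numbers 1 to 7 as weights, as the post describes. For the exponential moving average the post's formula is cut off, so the smoothing factor alpha = 2 / (N + 1) and the choice to seed the recursion with the first value are assumptions here (both are common conventions, not necessarily the ones the poster was given).

```python
values = [2, 3, 4, 8, 10, 15, 22]
weights = [1, 2, 3, 4, 5, 6, 7]  # day numbers used as weights

def weighted_moving_average(xs, ws):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

def exponential_moving_average(xs, alpha):
    """Recursive EMA: ema_t = alpha * x_t + (1 - alpha) * ema_(t-1).

    Assumption: the recursion is seeded with the first observation.
    """
    ema = xs[0]
    for x in xs[1:]:
        ema = alpha * x + (1 - alpha) * ema
    return ema

alpha = 2 / (len(values) + 1)  # assumed convention, gives 0.25 for N = 7
wma = weighted_moving_average(values, weights)
ema = exponential_moving_average(values, alpha)
print(f"WMA = {wma:.4f}, EMA = {ema:.4f}")
```

The weighted sum is 346 and the weights add up to 28, so the WMA is 346 / 28. The EMA gives the most recent value a weight of alpha and shrinks the weight of older values geometrically.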

I have a problem related to statistics, and I hope that you can help me out. First, an attempt at a short version of my problem:

I have 2 data sets that consist of 3 sub-sets each that I want to compare with a methodology that takes the order of the data points into account.

Now, for the actual details.

I have multiple data sets of 3 sub data sets each, the latter of which sort of belong together. For any given set, data point 1 of sub data sets 1, 2, and 3 are based on the...

Comparing groups of data sets with the order of data points being taken into account

Now, we aim to do a couple of tests to check the quality of the survey, like:

- Basic validity of questions

- Ceiling effects

- Inter-rater reliability

- Do the items load properly, and are the 4-item scales reliable?

However, we keep running into the problem that people only...

Performance review

I have approximately 800 items that were independently assigned to 3 categories (Buy, maybe, Don't Buy) by two reviewers. It looks like:

Item 1; Buy; Buy;

Item 2; Don't Buy; Buy;

Item 3; maybe...

Assess Reviewer Performance
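For two reviewers sorting the same items into nominal categories, Cohen's kappa is one common agreement measure: it compares the observed agreement with the agreement expected by chance from each reviewer's category frequencies. The sketch below uses four made-up items, not the poster's 800, and the function name is hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who rated the same items."""
    n = len(ratings_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the product of the marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    return (observed - expected) / (1 - expected)

# Made-up ratings in the same layout as the post.
reviewer_1 = ["Buy", "Don't Buy", "maybe", "Buy"]
reviewer_2 = ["Buy", "Buy", "maybe", "Buy"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Because the three categories are ordered (Buy, maybe, Don't Buy), a weighted kappa that penalises Buy versus Don't Buy more than Buy versus maybe may be worth considering as well.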

To assess the impact of the environmental gradients on samples or taxa, they should be projected orthogonally on the...

Canonical Correspondence Analysis: how to interpret results

I don't know many of the terms, and what is also super difficult is learning the R language and how the tools in R work.

I am mining data on tennis. I have a lot of data about a tennis player.

I want to find some correlations.

I target tennis players. I want to know if there is a correlation between a tennis player's...

Prediction models and correlation in R

I'm a psychology masters student in the UK. I failed GCSE maths and have dyscalculia, so stats is challenging for me, though weirdly enjoyable.

Two shock levels: .2mA and .8mA.

Two paradigms with 4 levels: NRIA, NR1a, NRCFC, NRC1a

The NRIA and NRCFC groups experienced the .2mA shock, while the NR1a and NRC1a groups got the .8mA shock.

I want to see if the shock level and paradigm matters for cFos cell quantification. If it helps, I normalized the values of all four groups to Naive controls (collapsed groups as naive values).

I'm thinking that I have 2IVs (shock level and paradigm which are controlled during the experiment)...

Please help, which ANOVA should I use?

SS = ∑(x²)

How do I compute the SSW and SSB from that data?

The data are: mean of group A: 6 (n=9), mean of group B: 13 (n=9), mean of group C: 11 (n=9)
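With only group means and group sizes, the between-groups sum of squares can be computed directly; the within-groups sum of squares cannot, because it needs the raw scores or at least each group's variance, which the post does not show. A minimal sketch for the SSB part, using the three means given above:

```python
group_means = {"A": 6, "B": 13, "C": 11}
n_per_group = 9

# With equal group sizes the grand mean is the plain mean of the group means.
grand_mean = sum(group_means.values()) / len(group_means)

# SSB = sum over groups of n * (group mean - grand mean)^2
ssb = sum(n_per_group * (m - grand_mean) ** 2 for m in group_means.values())
df_between = len(group_means) - 1

print(f"grand mean = {grand_mean}, SSB = {ssb}, df = {df_between}")
```

If the group variances s² were available, SSW would be the sum of (n - 1) × s² over the three groups, with 27 - 3 = 24 degrees of freedom.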

I have >100 genes, each of which is either altered (=1) or not altered (=0) in each group.

I would like to analyze whether the genes are altered (=1) more often than by chance within the groups (mutant and non-mutant).

I know if I had one or two genes, I could just use Fisher's or Chi square, but how does one run each one with so many independent variables? Thanks

As per the title, I am trying to wrap my head around an issue I am experiencing.

I am working on a function out of my 'GmAMisc' package; the function is part of a new version which is currently under construction.

I am testing the new version, and I have hit 'install/reload', and all is ok. But there is a problem with one function: it does not work (i.e., it returns an error) when I try to run it for the first time; but if I copy/paste it in the R console, it works smoothly.

I know...

Function out of my package only works after being copied/pasted...Why?

I am doing research on how users think about shopping online.

In my questionnaire, I have 3 questions - using Likert Scale

1 - Strongly Agree, 2 - Agree, 3 - Neutral, 4 - Disagree, 5 - Strongly Disagree.

My questionnaire is as follows.

1. I find that the Internet is secure for shopping.

2. My general intention to shop online is high.

3. I find that shopping online is convenient.

The hypothesis that I want to test is

H1 - Users have confidence to...

One Sample T-Test with Likert Scale Questionnaire

In a Chi Square goodness of fit test, and also in a Chi Square test of independence, what are the criteria for when direction can/must be stated in the conclusion?

For example, "A boy is interested in whether cricket captains call heads and tails equally at the pre-game toss. A survey of 100 matches shows that heads was called 62 times and tails 38 times. What would you conclude of the captains' calling habits?"

Running a chi square goodness of fit test I have Chi-square(1, n=100)...

When is direction implied for a Chi Square test?
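As a rough check on the cricket example, the goodness-of-fit statistic for 62 heads and 38 tails against an expected 50/50 split can be computed by hand. The sketch below is an illustration, not the poster's own working; it uses only the standard library, and the p-value line relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
from math import erfc, sqrt

observed = [62, 38]   # heads, tails
expected = [50, 50]   # equal calling under the null hypothesis

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability for 1 degree of freedom:
# P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square(1, n=100) = {chi2:.2f}, p = {p_value:.4f}")
```

The statistic squares the deviations, so the test itself carries no direction. With only two categories, though, the observed counts show which way the departure goes (more heads than tails here).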

This is a difficult post... take your time to read please, and I am sorry if I have not explained myself very well

I would like to calculate the average frequency of an event in a tennis match. For example, let's say that in most of the games Roger Federer won, he won them in this way: 15-0, then 15-15, then 30-15, then 40-15. I would like to calculate (and plot on a graph) the frequency of those kinds of games and all statistical...

Statistic software to calculate this?

In city A, 30% of the families don't own a car, 45% own 1 car, 20% own 2 cars and 5% own 3 cars.

E(X)=1 , V(X)=0.7

A. What is the probability that a parking yard with 34 parking spots will be enough to fit 36 families of city A?

B. A contractor plans to build a new house designed to fit 36 families. The municipality demands the...

Please help me with this problem
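For part A, the total number of cars owned by 36 families is a sum of 36 draws from the distribution above, and the question asks for the probability that this sum is at most 34. The sketch below assumes the families are independent; it computes the exact answer by repeated convolution and compares it with the normal approximation (with a continuity correction) that the given E(X) and V(X) suggest. Part B is cut off in the post and is not attempted.

```python
from math import erf, sqrt

pmf = {0: 0.30, 1: 0.45, 2: 0.20, 3: 0.05}  # cars per family
families, spots = 36, 34

mean = sum(k * p for k, p in pmf.items())
var = sum(k * k * p for k, p in pmf.items()) - mean**2

# Exact distribution of the total, built by convolving one family at a time.
dist = [1.0]  # dist[t] = P(total == t); starts with zero families
for _ in range(families):
    new = [0.0] * (len(dist) + 3)
    for total, prob in enumerate(dist):
        for cars, p in pmf.items():
            new[total + cars] += prob * p
    dist = new
exact = sum(dist[: spots + 1])

# Normal approximation: total ~ N(36 * mean, 36 * var), continuity corrected.
z = (spots + 0.5 - families * mean) / sqrt(families * var)
approx = 0.5 * (1 + erf(z / sqrt(2)))

print(f"E(X) = {mean:.2f}, V(X) = {var:.2f}")
print(f"exact P(total <= 34) = {exact:.4f}, normal approx = {approx:.4f}")
```

The first line of output confirms the E(X) = 1 and V(X) = 0.7 stated in the problem.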

-one column of binary data (0 or 1)

-18 columns of variables which go roughly between -200 and 200

Aim:

To create a model that can predict the dependent variable from the independent variables, including on future data that is not yet available.

Note:

The model doesn't need to give a prediction (0 or 1) all the time. It's fine if the model would say "I don't know". That would be better than giving a wrong prediction on the dependent variable.

Does anybody have an idea of how...

Binary logistic regression
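One simple way to let a model say "I don't know" is to leave the fitted model alone and add an abstention band around the predicted probability. The sketch below assumes some fitted model (logistic regression or otherwise) already supplies a probability for class 1; the function name and the 0.3 / 0.7 cut-offs are hypothetical and would need tuning on held-out data.

```python
def predict_with_abstain(probability, lower=0.3, upper=0.7):
    """Return 1, 0, or None ("I don't know") for a predicted P(y = 1).

    Assumption: `probability` comes from an already fitted model.
    The thresholds are placeholders, not recommended values.
    """
    if probability >= upper:
        return 1
    if probability <= lower:
        return 0
    return None  # too uncertain to commit to a prediction

# Example with made-up probabilities.
for p in (0.05, 0.45, 0.92):
    print(p, "->", predict_with_abstain(p))
```

Widening the band trades coverage for accuracy: fewer cases get a prediction, but the ones that do are the cases the model is most confident about.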

Hello,

If I have a sample of 137 individuals with three assessment results (first, 1YR follow-up and 2-Yr follow-up), I am assuming there is a way to calculate this for the group based on differences in two results for each individual (e.g., First to 1 YR score, First to 2 YR score).

Then the formula is Post-test result minus pre-test result divided by the Standard Error of the difference.

For significance, I assume I would use the threshold of t>=1.96 or t<=-1.96 for...

How to calculate Reliable Change Index using SPSS?
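The formula described above can be sketched in a few lines. The standard error of the difference is taken here in the Jacobson and Truax form, SE_diff = sqrt(2) × SEM with SEM = SD × sqrt(1 - reliability); the post does not say which form it uses, so that is an assumption, and the baseline SD of 10 and reliability of 0.8 are made-up values.

```python
from math import sqrt

def reliable_change_index(pre, post, sd_baseline, reliability):
    """RCI = (post - pre) / SE_diff for one individual.

    Assumes SE_diff = sqrt(2) * SEM, where SEM = SD * sqrt(1 - reliability).
    """
    sem = sd_baseline * sqrt(1 - reliability)
    se_diff = sqrt(2) * sem
    return (post - pre) / se_diff

# Made-up scores: first assessment 20, 1-year follow-up 30.
rci = reliable_change_index(pre=20, post=30, sd_baseline=10, reliability=0.8)
print(f"RCI = {rci:.3f}, reliable change: {abs(rci) >= 1.96}")
```

The same function would be applied once per individual and per comparison (first to 1-year, first to 2-year), which in SPSS corresponds to a computed variable rather than a group-level test.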

I have a matrix X that contains industrial data, where each column is a variable.

Since PCA is done on the correlation matrix, should I use:

- the original X;

- X after centering (subtracting from each column of X the mean of that column);

- X after standardizing (subtracting the mean from each column of X and dividing by the standard deviation)?

I think autoscaling is better, but I don't know if it is a must or just good practice.
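The three options are linked by one fact: the covariance matrix of standardized columns is the correlation matrix of the original columns, and centering alone does not change the covariance matrix. So a PCA routine that already works on the correlation matrix is implicitly autoscaling. The sketch below checks this numerically on two made-up columns (the names `temperature` and `pressure` and their values are hypothetical).

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

def standardize(xs):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(xs), sqrt(covariance(xs, xs))
    return [(x - m) / s for x in xs]

# Two made-up process variables on very different scales.
temperature = [300.0, 310.0, 305.0, 320.0, 315.0]
pressure = [1.2, 1.5, 1.3, 1.9, 1.6]

z_t, z_p = standardize(temperature), standardize(pressure)
print("correlation of raw columns:       ", correlation(temperature, pressure))
print("covariance of standardized columns:", covariance(z_t, z_p))
```

When the PCA is instead run on the covariance matrix, standardizing matters whenever the variables have different units or scales, since the large-variance columns would otherwise dominate the first components.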

This is about my PhD thesis, where I am trying to see the pattern of memory deficit among different types of hypothyroidism. I have used a between-subjects design with 1 IV with four levels. Thus, I have four groups, which are:

Subclinical Hypothyroid group N= 14

Overt Hypothyroid group N= 15

Euthyroid group N = 12

Healthy Control Group N = 15

I am using multiple types of memory measures. Broadly speaking, I have administered 1 test of verbal memory with 6 types of scores (DVs)...

When 'not' to use MANOVA?

I have a super basic question. I'm at the stage with my data analysis where I've been looking at the numbers so long I've started to see them as living things...and just want to check something before my mind goes completely!! I have a continuous covariate (among others) and a binomial outcome variable (1 = presence of knowledge, 0 = absence of knowledge). I have a significant effect and a z score of -6.55. Am I correct in interpreting the minus as indicating that the less of the...

Logistic Regression help

I'm now working on a recommendation system that recommends football match lotteries to users. The users may choose a match and bet an arbitrary amount of money, e.g. $2, $4, or even $20,000. I want to test whether two recommendation algorithms are different (or whether one outperforms the other) in terms of the average bet per user. The problem is that the distribution of the bet money looks like a power-law distribution (long tailed). Nearly half of the users don't buy...

Choosing the proper statistical test for a lottery recommendation system
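One option that avoids distributional assumptions for a long-tailed, zero-heavy outcome is a permutation test on the difference in mean bet per user. The sketch below is an illustration only: the bet amounts are made up, the function name is hypothetical, and it assumes each user was exposed to exactly one of the two algorithms.

```python
import random

def permutation_test_mean_diff(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of users
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Made-up bets per user: many zeros and a few very large amounts.
algo_1 = [0, 0, 0, 2, 4, 0, 10, 0, 2, 500]
algo_2 = [0, 2, 0, 0, 4, 20, 0, 6, 0, 20000]
p_value = permutation_test_mean_diff(algo_1, algo_2)
print(f"permutation p = {p_value:.4f}")
```

With a tail this heavy, the mean is driven by a handful of users, so it may also be worth analysing the share of users who bet at all separately from the amount bet by those who do.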

So...I'm not entirely sure if this is possible but thought I'd post here to ask. Additionally, I've never used R before (except for screwing around for an hour or two in Swirl) so please bear with me. I'm trying to run a repeated measures, zero-inflated negative binomial GLM. Unfortunately, it's not possible to do in JMP, which is what I typically use. The catch is that some of my factors change for each time interval. Let me try to explain everything which will hopefully clear...

Repeated measures, ZI negative binomial GLM?

In my experiment I collected data (the dependent variable) from a linear CCD camera every second over a period of 90 seconds. I repeated this procedure five times. The design of the study requires a comparison of the control and experimental groups (each group size is set to 26), so I am unsure whether to use repeated-measures ANOVA on something like a 90x5 matrix, bootstrapping, or some other test.

Many thanks for considering my question.
Andreja Vujanac

how to restructure a dataset for discrete time survival analysis?

I've been trying to manually calculate the sums of squares for 2 independent variables in an ANOVA table and can only seem to work out the total sum of squares explained by the two independent variables together.

Is there any way that I can work out how much of the sum of squares is attributed to each independent variable?

Any help would be really appreciated
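In a balanced two-way design (the same number of observations in every cell) the explained sum of squares splits cleanly into a part for each factor plus an interaction part. The sketch below works this out by hand on a made-up 2 × 2 data set with two observations per cell; the factor levels and values are hypothetical. In an unbalanced design the split is not unique and depends on the type of sums of squares chosen.

```python
from itertools import chain

# Made-up balanced data: cells[(level of A, level of B)] = observations.
cells = {
    ("low", "x"): [4, 6],
    ("low", "y"): [8, 10],
    ("high", "x"): [10, 12],
    ("high", "y"): [18, 20],
}
n_cell = 2  # observations per cell (balanced design)

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

grand = mean(chain.from_iterable(cells.values()))
levels_a = sorted({a for a, _ in cells})
levels_b = sorted({b for _, b in cells})

# Marginal means of each factor level, pooling over the other factor.
mean_a = {a: mean(chain.from_iterable(v for (ka, _), v in cells.items() if ka == a))
          for a in levels_a}
mean_b = {b: mean(chain.from_iterable(v for (_, kb), v in cells.items() if kb == b))
          for b in levels_b}

ss_a = sum(len(levels_b) * n_cell * (mean_a[a] - grand) ** 2 for a in levels_a)
ss_b = sum(len(levels_a) * n_cell * (mean_b[b] - grand) ** 2 for b in levels_b)
ss_ab = sum(n_cell * (mean(v) - mean_a[a] - mean_b[b] + grand) ** 2
            for (a, b), v in cells.items())
ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)
ss_total = sum((x - grand) ** 2 for x in chain.from_iterable(cells.values()))

print(f"SS_A = {ss_a}, SS_B = {ss_b}, SS_AB = {ss_ab}, "
      f"SS_error = {ss_error}, SS_total = {ss_total}")
```

For this data the four components add up to the total, which is a useful check on any manual calculation.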

In June the same tennis player A reaches the maximum odds of 4 in every 6 matches he plays against players who usually reach odds of 3 in every 3 matches they play

(so things changed a little for the better for tennis player A)

How can I calculate the probability that the tennis...

could you help me with this problem?

Possible dumb question

Exponential smoothing models

What statistical test is best when the dependent variable is categorical with three levels [correct, partially correct and incorrect], and the independent variables are also categorical with three levels? One of the IVs is time, as I tested subjects at 3 time intervals. I assume this means I have repeated measures.

Thank you for any help you can give me.

I am investigating the influence of motivations on dependent variables such as clothes bought.

Also, what % of variance explained is bad, good and excellent (etc.)?

I can't show the data because it is confidential, but I will explain it with an example. The data is collected by...

Any suggestions? GLM, HLM, MLM Problem