I have checked and the data is approximately normally distributed, so I can use the t-distribution and t test in Stata. Given the small sample size, can I use the usual Stata command ttest groupa==groupb, level(90) as for a normal t test, or do I need to...

Small Sample Size Confidence intervals difference in mean]]>

Is an average US person with Covid more likely to die from medical error than from Covid?]]>

that is, the ratios between 0 and 1 are infinite, logarithmic, but from 1 to ∞ the ratios are not logarithmic.

for example the distance on the graph between 0 and 1 is 1cm; so the ratio of 1:1 is 1cm into the graph.

the ratio of 10:1 is 9cm away from 1:1 on the graph, but the reverse is only ~0.7cm away from 1:1,

this gives a skewed perspective,

this effect probably...

basic question: surely his graph is visually misrepresenting the data?]]>

0.2 - small

0.5 - medium

0.8 - large

I know that Cohen's interpretation is only a rule of thumb, so I'm not sure if there is a "one" answer.

but what is the interpretation when d is between the values?

Is d=0.3 small? or medium?

Range interpretation

0-0.2 meaningless

0.2-0.5 small

0.5-0.8 medium

> 0.8 large

(0.2+0.5)/2=0.35

(0.5+0.8)/2=0.65

Range interpretation

0-0.35 small

0.35-0.65 medium

> 0.65 large]]>
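The two candidate schemes in the post can be compared side by side. The following is only a sketch: the function names are made up for illustration, the cut-offs are the ones listed above (0.2/0.5/0.8 as lower bounds, or the midpoints 0.35/0.65), and neither scheme is an official rule.

```python
def label_by_thresholds(d):
    """First scheme from the post: Cohen's benchmarks act as lower bounds."""
    d = abs(d)  # the sign of d only gives the direction of the effect
    if d < 0.2:
        return "meaningless"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


def label_by_midpoints(d):
    """Second scheme from the post: cut at the midpoints 0.35 and 0.65."""
    d = abs(d)
    if d < 0.35:
        return "small"
    if d < 0.65:
        return "medium"
    return "large"


# Hypothetical effect sizes, chosen to show where the schemes disagree.
for d in (0.1, 0.3, 0.4, 0.7, 0.9):
    print(d, label_by_thresholds(d), label_by_midpoints(d))
```

Under these assumptions d=0.3 is "small" in both schemes, while d=0.4 is "small" in the first and "medium" in the second, which is exactly the ambiguity the question is about.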

whether it is true that if the distribution of a quantitative variable in the group is normal, the mean with standard deviation should be reported, and if it is non-normal, the median with the highest and lowest values.]]>

I have 10 patients. We have some proprietary software that has used the data collected from a specific region X of the hearts of each of the 10 patients and summarised it for us. Because of heterogeneity in patients, region X in one patient may have 40 separate...

Calculating a weighted mean/SD of x number of means/SDs?]]>
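One common way to combine several means and SDs into a single mean and SD is sketched below. It assumes each SD is a sample SD (n-1 denominator) and that the group sizes are known. It treats all measurements as one pooled sample, so it ignores the clustering of measurements within patients. The function name and the toy numbers are invented for illustration.

```python
import math
import statistics


def combine_groups(ns, means, sds):
    """Combine per-group n, mean and sample SD into an overall mean and SD."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Spread inside each group plus spread of the group means around the grand mean.
    within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    return grand_mean, math.sqrt((within + between) / (total_n - 1))


# Toy raw data (hypothetical) so the result can be checked against the raw values.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 6.0, 8.0, 10.0]
ns = [len(group_a), len(group_b)]
means = [statistics.mean(group_a), statistics.mean(group_b)]
sds = [statistics.stdev(group_a), statistics.stdev(group_b)]

combined_mean, combined_sd = combine_groups(ns, means, sds)
print(combined_mean, combined_sd)
```

The combined values agree with the mean and SD computed directly from the seven raw numbers, which is a quick sanity check when only summaries are available for the real data.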

I was reviewing a regression analysis for a colleague the other day. I know that linear regression works fine with negative values for the Y variable. But then I noticed that 20% of the values for the response = 0 exactly. Because of this, I doubt that the model is correctly specified amongst other things. I've never worked with what I would call 'non-positive and zero-inflated continuous data'. Can anyone suggest an alternative? I thought about just leaving out the...

Linear Regression with Negative Response Variable]]>

I'm struggling to work out the right functions to use in compute variable to create a new variable. I hope you will be able to help me as I cannot process this in Excel as my data file is too large.

I am trying to create a new variable based on how another variable changes.

The data is related to how often a window opens and closes throughout the day. As an example, I have the following data and I want to create the column in red as my new variable in SPSS...

Compute New Variable based on another variable]]>

Time invariant independent variable]]>

I have data representing item indices (x axis) and their mean ratings (y axis). I want to draw a trendline to understand the tendency of the data. Please look at the following figure. I used the polyfit method in numpy; the R-squared is 0.05.

How can I interpret this? What else would you suggest to understand the data better?

For the confidence interval of the model, I used R's confint(model,level=0.95):

2.5 % 97.5 %

(Intercept) 3.6288915640 3.7105907379

x...

data fitting]]>

Could someone explain to me in layman's terms the following:

I have complete data on a population size of ~37,000. I can show, across the entire population, that there are trends between certain variables and the amount of debt (e.g. age, length of tenure etc).

What I'm trying to do is to establish whether there is a statistically significant variation in the effect of these independent variables on debt between geographical locations.

Partitioning the...

Sufficient sample size]]>

So what problems do I need to look out for, and what diagnostics should I use? Since the data will automatically be heteroscedastic, I am not sure there is any point in testing for that.]]>

In order to analyze the Odds Ratio, the values that were...

Are the sum values 'No' and 'Not informed' in the Odds Ratio analysis correct? See the example.]]>

I'm hoping I can get some help through this forum as I am new to the world of statistics and data analysis. There's a wealth of videos online but it's better to be able to talk these things through with people.

Cheers]]>

F=13.2 P= 0.003

what is tested

reject H0?

conclusions:]]>

I'm searching for a magical symbol I guess... The issue is the following:

I'm trying to assign a certain string of text a specific value in a new variable. Imagine a situation where I am searching for that string of text in e.g. people's opinions on whatever, and my task is to separate out those who used a specific swear word, in this case (let's say) "pencil". That's not a problem though. The issue is (and that's what's important) that my task is to separate people who used various...

syntax if variable includes more or less specific string of text - search 4 some magic]]>

"To explore the relationship between personal characteristics and employment outcome rates, a multiple regression was used. Since many of the personal characteristic variables were coded categorically, a general linear model was used to run the multiple regressions."

Is this just another way of saying multiple linear regression with dummy predictors?]]>

For my thesis I'm doing a factorial mixed ANOVA in which I look at gender differences. As it's a mixed ANOVA, everyone goes through the same conditions. The problem, however, is that I have 10 men and 26 women in my sample, so the groups that I compare when looking at gender aren't equal. I was wondering if there was a specific test I could do to calculate how problematic this difference in gender ratio is and if there is a test I could do to make up for this discrepancy...

Mixed ANOVA Unequal Gender Ratio]]>

I'm super bad at stats and I really need your help.

So here's the problem: I want to compare 2 different ingredients (ingredient A / ingredient B). Each ingredient is described by 4 independent quantitative variables.

I would like to know if these 2 ingredients are significantly different or not depending on these 4 variables.

How do I check if my data follows a normal distribution? Which test can I use when I have 4 factors? I know how it works, but with 4 factors I don't know which test to use...

Which test should I use?]]>

I am studying the effects of framing on food neophobia and the purchase likelihood of novel foods.

Currently, I have made a pre-test with 4 conditions and 1 control condition, all of which are pictures with messages framed a certain way (emotion-promotion, emotion-prevention, information-promotion, information-prevention, and a neutral frame). I expose participants to only 2 of these conditions, followed by

What statistical analysis and how?]]>

In a scientific study they say that the low energy diet mice ate 30% fewer calories but ate the same amount of protein as the high energy diet mice. But the protein was in the same proportion in the low and high energy diets, so I don't see how it's possible to eat 30% fewer calories and have the same amount of protein, because you either eat 30% fewer calories...

I don't understand: mice ate 30% fewer calories but ate the equivalent amount of protein]]>

My homework relates to the

- A disease has a 2% prevalence rate in country X
- A test for the disease has a true positive rate of 0.999 and a false positive rate of 0.01
- Country X has a population of 50 million people with the two largest cities having a population of 2 million and 1 million, respectively.

Exercise on conditional probability (Bayes' Theorem)]]>
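The exact homework question is cut off, but one quantity usually asked for with these numbers is the probability of having the disease given a positive test. The sketch below computes it with Bayes' theorem; the function and variable names are invented for illustration, and the population figures are not needed for this particular quantity.

```python
def posterior_given_positive(prior, true_positive_rate, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem."""
    # Total probability of a positive test: diseased and non-diseased people combined.
    p_positive = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / p_positive


prevalence = 0.02           # 2% prevalence in country X
true_positive_rate = 0.999  # P(positive | disease)
false_positive_rate = 0.01  # P(positive | no disease)

ppv = posterior_given_positive(prevalence, true_positive_rate, false_positive_rate)
print(round(ppv, 3))
```

With these inputs the result is roughly 0.67: even with a very sensitive test, about a third of positives are false, because 98% of the population does not have the disease.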

How do I determine the minimum number of observations needed per category in categorical predictor variables?]]>

Dependent variable: 2 groups, continuers and dropouts

Independent variables: some continuous tests (Wisconsin, Stroop, IPO, EDEQ etc) and age level (4 levels), education level (4 levels).

Which test should I use?]]>

Dimension

Q1 | x | x | x | x |

Q2 | x | x | x | x |

Q3 | x | x | x | x |

Q4 | x | x | x | x |

intervals

4 - x (not satisfied at all)

x - x

x - x

x - 16 (Totally satisfied)

how would you calculate the range?]]>
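One simple convention is to split the span of possible totals into equal-width intervals. The sketch below assumes 4 questions each scored from 1 to 4 (which gives the totals 4 to 16 shown above) and 4 intervals. The function name is hypothetical, and other conventions, such as integer-only bands, are equally defensible.

```python
def equal_width_intervals(n_items, min_score, max_score, n_intervals):
    """Split the range of possible total scores into equal-width intervals."""
    lowest_total = n_items * min_score    # everyone answers with the minimum
    highest_total = n_items * max_score   # everyone answers with the maximum
    width = (highest_total - lowest_total) / n_intervals
    return [
        (lowest_total + i * width, lowest_total + (i + 1) * width)
        for i in range(n_intervals)
    ]


intervals = equal_width_intervals(n_items=4, min_score=1, max_score=4, n_intervals=4)
for low, high in intervals:
    print(low, "-", high)
```

Here the width is (16 - 4) / 4 = 3, so the cut points fall at 7, 10 and 13. A rule is still needed for which interval a score exactly on a cut point belongs to.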

Ok I’m comparing the difference between a couple pediatric risk of mortality scores. I want to take the difference and see if telemedicine or telephone consults made a difference in risk of mortality.

2 of the 4 scores are so skewed we wanted to log transform them. So I log transformed them, then standardized them, then took the difference, then ran a regression. How do I know whether this is adequate? Or whether that’s too many transformations and I’m...

Strategy for diff-in-diff analysis]]>

My dependent variable is a 0/1 outcome (have health insurance or not) and my independent variables are age, sex, education, race dummy variables, immigration status, etc.

I want to see how these independent variables affect the dependent variable for each time period...

regression for repeated cross sectional data?]]>

Premise 1: A 'sampling distribution' of a statistic (e.g., a sample mean) is a piece of knowledge that tells us what we should expect a statistic to be (given some null hypothesis)

Premise 2: A Bayesian 'prior distribution' is a piece of knowledge that we use to describe our degree...

Can a frequentist 'sampling distribution' be interpreted as a Bayesian 'prior'?]]>

*The table in the image is just a piece of the original 324-line table.

]]>

Code:

```
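library(smooth)

# Training set (Dec 2014 to Nov 2019), used to choose the best model
tsdatatr <- ts(mydata$Spend, start = c(2014, 12), end = c(2019, 11), frequency = 12)
# Full series (Dec 2014 to Nov 2020)
tsdata <- ts(mydata$Spend, start = c(2014, 12), end = c(2020, 11), frequency = 12)

# model = "ZZZ" asks es() to select the error, trend and seasonal components
esmtr <- es(tsdatatr, model = "ZZZ")
esmtr  # prints the chosen model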
```

library smooth]]>

I am currently working with monthly data and I am trying to calculate confidence intervals for the monthly average. I have data from 2010 to 2019 and there seems to be some seasonality.

The statement I want to make is: In December 2020, we expect a value of x which lies with a certainty of 95% between y and z.

For the expected value x, I use the average of the December values from 2010 to 2019.

For the confidence interval, I am not sure:

My initial guess was to use all months (Jan-Dec) to...

Calculating confidence intervals for monthly data]]>

I have a machine learning/anomaly detection problem. I have a variable Y and several other variables X. The purpose is to quantify the degree of abnormality of the data on Y, but I have to take into account the values of the other variables (the relationship between Y and X).

Normally, an anomaly detection algorithm would find anomalies on the whole data (Y + X), but in my case I want to zoom in on Y because it is a very important variable. If I wanted to quantify the...

How to determine the abnormality of a specific variable by taking into account all the other variables in the data?]]>

How can I

Thank you]]>

I started to learn simple and multiple regression, and there is one thing I can't understand.

I used a data frame with values of hindrance, inhibition, and negative effect.

When I predict the value of negative effect by simple regression, using inhibition alone, I get its coefficient.

But if I try multiple regression, predicting negative effect from both inhibition and hindrance, the coefficient of inhibition suddenly changes.

(from -10 to -12)

I thought the coefficient shows the change it...

difference in value of same coefficient.]]>
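A coefficient can shift like this whenever the added predictor is correlated with the first one. The sketch below uses invented numbers, not the poster's data, chosen so that the inhibition slope moves from -10 to -12. The variable names are borrowed from the post, and the helper functions are hypothetical.

```python
def centered(values):
    """Subtract the mean so the intercept drops out of the slope formulas."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Invented data: hindrance is correlated with inhibition, and negative effect
# is built exactly as -12 * inhibition + 4 * hindrance.
inhibition = [0.0, 1.0, 2.0, 3.0]
hindrance = [1.0, -0.5, 0.0, 2.5]
negative = [-12 * a + 4 * b for a, b in zip(inhibition, hindrance)]

x1, x2, y = centered(inhibition), centered(hindrance), centered(negative)

# Simple regression: slope of negative effect on inhibition alone.
simple_slope = dot(x1, y) / dot(x1, x1)

# Multiple regression: solve the 2x2 normal equations for both slopes.
s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
s1y, s2y = dot(x1, y), dot(x2, y)
det = s11 * s22 - s12 * s12
b_inhibition = (s22 * s1y - s12 * s2y) / det
b_hindrance = (s11 * s2y - s12 * s1y) / det

print(simple_slope, b_inhibition, b_hindrance)
```

In this toy case the simple slope absorbs part of the hindrance effect: it equals -12 + 4 × 0.5 = -10, where 0.5 is the slope of hindrance on inhibition. Once hindrance is in the model, the inhibition coefficient describes the change in negative effect with hindrance held fixed.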

Can you help me answer whether there is a solution for this scenario or not? If there is a solution, can you explain why there is one, and what is different from a usual sequence of coin flips?

OK here we go:

imagine a scenario:

you do coin flips:

- each outcome has probability 50%; fair coin toss; H=Heads; T=Tails

- in this scenario we determine that the formation "HHT" will appear...

how to weight a coin toss]]>

According to this link,

https://data.library.virginia.edu/interpreting-log-transformations-in-a-linear-model/,

MLR Models: Log Transformation Interpretations and Collinearity]]>

The basic statistical model framework under WIOA is the fixed effect model specified as follows (Sutter [7]): Consider the linear model for our data observations grouped into states $j = 1, \dots, J$, for each quarterly time period $t = 1, \dots, T$: $y_{jt} = a_j + \beta x_{jt} + \varepsilon_{jt}$; $\varepsilon_{jt} \sim N(0, \sigma_y^2)$. (3) The effect of $x$ on $y$, denoted $\beta$, is the primary quantity of interest. After accounting for the...

"Fixed effect" regression]]>

I've been working on a Stats problem for days and think I've gone wrong somewhere. Something in me says this problem is very straightforward, and I am over complicating it. Maybe someone can have a read of what I've done, and let me know if I'm on the right tracks.

(I am using SPSS)

I have 3 variables -

The question being asked is:

On the right tracks? One-way ANOVA]]>

I know that the chi squared test is an option, but this test does not tell me if the significant difference lies between ASA 1 and ASA 2 or between ASA 2 and ASA 3,...

Is there a statistical test I can use to know between which independent categories the significant difference is situated?]]>

I'm having trouble working out/finding an appropriate statistical analysis to use. Basically, I tested a number of tadpoles in a maze, then tested the same tadpoles, as frogs, on the same maze.

As I don't care what the actual times are (e.g., average time might be longer for tadpoles than frogs), I'm having trouble finding an appropriate statistical analysis to...

Statistical test for whether better individual performance in set 1 means better performance in set 2?]]>

One pair is independent, another pair is dependent, and another pair is mutually exclusive.

I have created joint probability distribution tables for

1st independent pair

2nd dependent pair

3rd disjoint pair.

I'm stuck on how to create the joint probability distribution for all three of X, Y, Z?]]>

I am trying to predict the type of parenting style of an individual using a multinomial regression model. In the model, I include an interaction between two nominal variables, but the output is difficult to interpret due to the many categories in both variables.

My interaction is between the country of origin (v1) and the migrant status (v2). The country of origin has 10 categories, while the migrant status has 3 categories (native, first generation, second generation).

Would...

Multinomial regression - Interaction with too many categories]]>

λ is a constant.

- Am I correct to understand that η represents the minimum salary that is taken into account, and that the probability for every single salary is relative to it? I.e., the bigger the minimum salary, the smaller the probability becomes for every salary in the function.
- How can I calculate an...

An estimator for a Cumulative distribution function]]>