- Confidence intervals: should they only be expressed as decimals, or can they be percentages, i.e. '*The overall complication rate was 36% (n=14; CI=0.21-0.53)*' or '*The overall complication rate was 36% (CI=21%-53%)*'?
- Is it OK to present a statistically significant p-value from an ANOVA without performing a post hoc test to assess which means were different? My sample size is too small to perform a post hoc...

So far I've just used a simple bivariate model. Candidate ballot position (1st to 42nd) is the independent variable and number of votes is the dependent variable. Is it acceptable to use an ordinal variable as the independent variable in this way?

Of course neither variable is normally distributed. For the...

Relationship between name order and votes in an election

I think Telegram is at the top of my list so far but would love to hear what others here think and what you use.

I have been investigating the change in weight of fourteen mice: seven are part of the test cohort, to which I have been applying a chemotherapeutic drug, and the other seven are the controls. I have collected the weight of each mouse at day 0, day 7, day 14, day 21 and day 28.

Now this is the part I am unsure about: I calculated the difference in weight between day 7 and 0, day 14 and 7, day 21 and 14 and day 28 and 21, then calculated the mean of the differences. I did this...

Mice weights

I have identified 22 studies which meet my search criteria and I am in the process of assessing them. I would like to carry out a meta-analysis but I have not done one before and I'm not sure if it will be possible.

The studies I have might just be too diverse in terms of the types of cold water...

When is meta-analysis appropriate?

I am interested in calculating the probability of fraud in the 2022 election in Arizona and the 2020 presidential election. In both of those elections, one candidate got the majority of the vote on election day and then the other candidate got the majority of the vote afterwards. A Pew poll asked both Democrats and Republicans if they voted on election day or on a different day. The simplest way to approach this is to first calculate the probability without taking into...

probability of election fraud

It's probably a dumb question, but it's the first time that I have worked with "wild" data that was not provided to me by my university. Neither the model nor any of its indicators shows any significance, and until now we have only gone through the assumptions to see if the model overestimates anything. Sorry for the stupid question, have a nice day!

So I have read several tutorials on skewness and kurtosis and I feel myself getting denser.

Could anybody tell me if I am interpreting this correctly? I think I don’t have sufficient evidence to say that my residuals are not normally distributed. Am I correct? I somehow cannot find a benchmark for the interpretation. Thank you very much in advance!

Variable | Obs Pr(skewness) Pr(kurtosis) Adj chi2(2) Prob>chi2...

Am I reading this correctly?

I'm new here and I hope you can help me

Sorry for my bad English, it's not my native language

I'm kind of new to the program Minitab and I was wondering if it can be used for the following problem (I'm pretty sure it can):

I have three groups (of players in this example):

name points --- name points --- name points

playerA 10 ----- playerD 12 ----- playerG 3

playerB 15 ----- playerE...

Find best combination

I have panel data (see the following as an example). Stkcd is the unique company ID and Trddt is the date. The data are sorted by Stkcd and then by Trddt, both in ascending order. The variable 'specialtreatment' is coded as 1 if a firm experiences an event on a specific date and 0 otherwise. Where a firm experiences an event for a period, I would like to extract the beginning date and ending date...

how to automatically extract beginning and ending dates when a firm experiences an event
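
The extraction described in this post can be sketched in a few lines. This is only an illustration in Python using the standard library: the rows below are made-up values, and the function name `event_spells` is hypothetical. It assumes the data are already sorted by Stkcd and then Trddt, as the post states.

```python
from itertools import groupby

# Hypothetical rows (Stkcd, Trddt, specialtreatment), already sorted by
# Stkcd and then Trddt. The values are invented for illustration.
rows = [
    (1, "2020-01-02", 0),
    (1, "2020-01-03", 1),
    (1, "2020-01-06", 1),
    (1, "2020-01-07", 0),
    (1, "2020-01-08", 1),
    (2, "2020-01-02", 1),
    (2, "2020-01-03", 1),
    (2, "2020-01-06", 0),
]


def event_spells(rows):
    """Return (Stkcd, begin_date, end_date) for each run of consecutive 1s."""
    spells = []
    # Grouping on (firm, flag) means a run never crosses a firm boundary.
    for (stkcd, flag), run in groupby(rows, key=lambda r: (r[0], r[2])):
        if flag == 1:
            dates = [r[1] for r in run]
            spells.append((stkcd, dates[0], dates[-1]))
    return spells


spells = event_spells(rows)
for spell in spells:
    print(spell)
```

A one-day event shows up as a spell whose beginning and ending dates are the same.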

For my course, I am in the process of writing my research proposal. Now I have arrived at the bit where I need to set up my data analysis plan. Unfortunately, I am not a hero in statistics and even after going through all my statistics summaries, I just can't figure it out.

I am using existing data collected by my supervisor in a previous study of higher education business administration students. This research was an experiment in three different subjects and it examined...

How can I test my hypothesis with these three variables? (Likert scale, randomisation)

- *Period* represents the elapsed number of months
- *stateX* represents the number of times elements in the population reach a state of X
- It's binary: either an element reaches X or not; X is a "dead state" in that an element can only hit X once
- I have data for the other states A-G for each element for each month that are "non-dead" and an element's movements across those other states are...

What are options in R for time-series forecasting and deriving a probability distribution?

In the below...

Is the binomial distribution appropriate for this time-series data?

Sample problem: 510 people applied to the Bachelor’s in Elementary Education program at a certain state college. Of those applicants, 57 were men. Find the 90% confidence interval of the true proportion of men who applied to the program.

Step 1: Read the question carefully and figure out the following variables:

α: subtract the given confidence level from 1.

1 - .9 = .1

zα/2: divide α by 2, then look up that area in the z-table.

.1 / 2 = .0500

The closest...

CONFIDENCE INTERVAL FOR A POPULATION
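
The steps in the sample problem can be checked numerically. The sketch below is in Python and assumes the usual normal-approximation (Wald) interval for a proportion, which is what the z-table lookup in the post points to. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 510            # applicants
x = 57             # men among the applicants
confidence = 0.90  # confidence level given in the problem

p_hat = x / n                             # sample proportion
alpha = 1 - confidence                    # 1 - .9 = .1
z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
se = sqrt(p_hat * (1 - p_hat) / n)        # standard error of p_hat

lower = p_hat - z * se
upper = p_hat + z * se
print(f"p_hat = {p_hat:.4f}, z = {z:.3f}")
print(f"90% CI: ({lower:.4f}, {upper:.4f})")
```

`NormalDist().inv_cdf` replaces the table lookup, so the critical value is not limited to the table's rounding.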

Suppose that an archer tries to hit a target 500 times with probability of successfully hitting the target being 70%. To calculate the probability of hitting the target exactly 300 times using the normal approximation, we use the first rule given above and calculate P(299.5<X<300.5) using the normal distribution.

n = 500

p = 0.7

x = 300

np (mean) = 350

S² (variance) = npq = 105

S (standard deviation) = √105 ≈ 10.25

X1 > 299.5

X2 < 300.5

z score = (data point - mean) / standard deviation

Z1 = (299.5 – 350 ) / 10.25 =...

Z Score confusion
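
For reference, the two z scores and the resulting probability can be computed directly. This is a small Python sketch of the continuity-corrected normal approximation described in the post. The exact binomial probability is included only as a comparison and is not part of the original working.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 500, 0.7, 300

mean = n * p                  # np
sd = sqrt(n * p * (1 - p))    # sqrt(npq) = sqrt(105)

# Continuity correction: P(X = 300) is approximated by P(299.5 < X < 300.5).
z1 = (x - 0.5 - mean) / sd
z2 = (x + 0.5 - mean) / sd

std_normal = NormalDist()
approx = std_normal.cdf(z2) - std_normal.cdf(z1)

# Exact binomial probability for comparison.
exact = comb(n, x) * p**x * (1 - p) ** (n - x)

print(f"z1 = {z1:.3f}, z2 = {z2:.3f}")
print(f"normal approximation: {approx:.3e}")
print(f"exact binomial:       {exact:.3e}")
```

Both z scores are negative because 300 lies far below the mean of 350, so the probability is very small. That far into the tail, the approximation and the exact value can differ noticeably in relative terms.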

I need your help in choosing the statistical method.

I have a sample of patients who have had treatment. Depending on the patient, the treatment induces an improvement, no change or a regression. I would now like to determine, from a multitude of variables, whether there is a "typical profile" of patient who can be expected to improve following the treatment.

For more details on my research:

Population of patients with adolescent idiopathic scoliosis, having undergone 4 weeks of intensive...

Help with the statistics method

I interpret the problem as asking for the probability that the sample mean of the two samples is <= 11. I know how to find the probability for one sample mean, but I am confused about how to find it for two.

I have some problems concerning missing values in time-series data.

I have 11 values for each subject, with values being hormone dosage at different time points (0, +40, +50, +55...). Some of these values are missing. I want to impute data and I have to use univariate, non equi-spaced, time-series imputation methods. I tried the "zoo" package, with na.approx() and na.spline() functions.

The na.approx() replaces NA by linear interpolation while na.spline() replaces NA by cubic...

How to deal with missing values at the end of a time series?
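
To make the end-of-series problem concrete, here is a minimal Python sketch of linear interpolation over unequally spaced time points. It is not the zoo implementation. The function name `interpolate_irregular`, the hormone values and the time points after +55 are hypothetical.

```python
def interpolate_irregular(times, values):
    """Linearly interpolate None entries using the actual time stamps.

    Gaps before the first or after the last observed value are left as
    None, because there is no second point to interpolate towards.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for left, right in zip(known, known[1:]):
        t0, t1 = times[left], times[right]
        v0, v1 = values[left], values[right]
        for i in range(left + 1, right):
            weight = (times[i] - t0) / (t1 - t0)
            filled[i] = v0 + weight * (v1 - v0)
    return filled


# Hypothetical subject: time points 0, +40, +50, +55 as in the post,
# then two more made-up time points with missing values at the end.
times = [0, 40, 50, 55, 60, 70]
values = [2.0, None, 5.0, 6.0, None, None]

filled = interpolate_irregular(times, values)
print(filled)
```

The interior gap at +40 is filled, but the two trailing values stay missing. Filling them needs an extra assumption, for example carrying the last observation forward (as zoo's `na.locf()` does) or extrapolating the last slope, and either choice should be justified for the hormone being measured.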

I am currently researching the effect on the market of the decision to disclose synergies in press releases in M&A. I have also done textual analysis on 3 different levels of the press release to better explain the disclosure decision.

Would the correct way to specify this regression be:

y = a + b_1*x_1 + b_2*x_2 + b_3*x_3 + b_4*x_4 + b_5*x_5 + b_6*x_2*x_3 + b_7*x_2*x_4 + b_8*x_2*x_5 + e

Where:

x_1 represents my control variables

x_2 is a dummy of the decision to disclose

x_3 to x_5 are the three different...

Interaction confusion

Hi, above you can find the task problem I am trying to solve.

Here's data2:

Below is my answer.

h1 and h2 are correct.

However, t and pval are incorrect. I don't know why...

T-procedure, Unknown Equal Variances (Should be a simple task)

I would like to perform a case-cohort study, starting from a cohort of about 2,800 individuals. Of those, 50 develop the outcome of interest during follow-up; in selecting the subcohort I would like to prioritize the presence of a covariate Z that will be needed for further effect modification analysis. Normally, with a case-cohort design I would proceed by creating the subcohort through random sampling (stratified on the covariate Z) and then add the cases that were not included...

Case cohort sampling

In Life

Some mock me for doing statistics

Some loathe me and statistics

Some don’t understand what statistics are

Why is it that statistics

Put a calm smile on my face?

Because of statistics I can solve the deepest mysteries

Because of statistics I will not be lonely again, playing in the data

Because of statistics I can rearrange the stars in the skies above

(by Chinese statistician Wang Jiaowei [translated],

The...

Statistics Poetry

Thank you

I want to conduct a One-Way ANOVA to investigate if the 20% highest scoring students on the test have a higher level of well-being than the other students (the 80%).

First I have to create two groups. I want to compute a mean score of the subtests, but I want to control it for students' age (because age influences...

How do you compute a new variable, controlled for age?

SDQ Mean Scores Whole group

17.48 (n = 413) 14.83***

I am trying to work out the probabilities of a dice system for a game I am working on.

I am rolling six-sided dice (D6), and scoring a success on a 1, 2, or 3. If I roll a '1', my die explodes, allowing me to roll a four-sided die (D4), again scoring a success on a 1, 2, or 3. My D4 will not explode if I roll additional '1s'.

My working out has been: I have a 50% chance of scoring a success on the D6, and a 16.7% (rounding...

Dice Probability - Exploding dice
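
The per-die probabilities can be enumerated exactly. The Python sketch below assumes one reading of the rules: the exploding '1' counts as a success itself, and the bonus D4 can add at most one more success. If the '1' is not meant to score, the numbers change.

```python
from fractions import Fraction

p_d6_face = Fraction(1, 6)  # each D6 face
p_d4_face = Fraction(1, 4)  # each D4 face

# Probability of 0, 1 or 2 successes from a single D6 (with its bonus D4).
dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}

for d6 in range(1, 7):
    if d6 == 1:
        # The 1 is a success and explodes into one D4 roll (no further explosions).
        for d4 in range(1, 5):
            successes = 1 + (1 if d4 <= 3 else 0)
            dist[successes] += p_d6_face * p_d4_face
    elif d6 <= 3:
        dist[1] += p_d6_face
    else:
        dist[0] += p_d6_face

expected = sum(k * prob for k, prob in dist.items())

for k, prob in dist.items():
    print(f"P({k} successes) = {prob} = {float(prob):.4f}")
print(f"expected successes per die = {expected} = {float(expected):.4f}")
```

Under that reading, the chance of at least one success per die stays at 1/2, and the explosion only adds a 1/6 × 3/4 = 1/8 chance of a second success.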

In a study of age-related cancer progression, I investigate patients' ages at different cancer stages (stage 1-4). Cancer stage is defined by growing tumor size in centimeters. The data look something like this:

Patient number | Stage 1 (<1cm) | Stage 2 (1-2cm) | Stage 3 (2-3cm) | Stage 4 (>3cm)

1 | 35 years | 37 years | 43 years | 44 years...

Disease progression analysis

SPSS Issue

Is it common practice to exclude outliers in the data before producing a Van Westendorp chart? If so, which technique would you advise - looking at the IQR, Grubbs test etc?

Many thanks in advance for any comments on this.

I have a sample of 30 values from a production process that should be normally distributed. In fact, it shows a distribution heavy on the left. Any idea why?

I am Mathieu Bossuyt, a final-year dentistry student at the University of Ghent, Belgium. I am having some difficulty finding the right statistical test for my master's thesis and was wondering if you could help me. I would appreciate it a lot.

The study flow shown above is used. The data collected at the different time points are tongue posture (in sagittal and frontal view) and swallowing pattern. The following distinction is made...

What statistical test to use?

I want to detect continuous changes in the environment based on events. I imagine these events should be coded dichotomously by date. I have come across two useful tools: lag correlation (using the astsa package in R) and point-biserial correlation. Is there such a thing as a point-biserial lag correlation? I cannot find anything. The idea here is that I want to detect lag correlations between dichotomous and continuous variables.

Thanks!

23.4586

23.4591

23.4566

23.4578

23.4580

23.4581

23.4589

23.4587

23.4587

23.4582

23.4597

23.4577

23.4581

23.4574

23.4579

23.4587

23.4568

23.4582

23.4573

23.4588

23.4560

23.4560

23.4596

23.4579

23.4560

23.4563

23.4579

23.4583

23.4583...

How many observations should I take to get the accuracy of the arithmetic mean ≤ 0.1 millimetres?

Recently we had to write up a mock research proposal for uni. For my paper, I was trying to measure and compare prosocial behaviours between two groups of teenagers utilising an economic game. Essentially, in each game, the player had two choices, pro-social vs. selfish choice.

Under the ‘participant’ section I had suggested 95 participants based on this calculator:

https://select-statistics.co.uk/calculators/sample-size-calculator-two-proportions/

Here is the screenshot...

Binomial distribution and power in a research proposal

I have assessed many manuscripts submitted to a medical journal. I see that three replications are used in ANOVA commonly.

OK, I know that the analyses are expensive in terms of time and/or money. However, I am surprised and believe that the results are unreliable. The problem refers to the assumptions: normal data distribution in the compared groups and variance homogeneity. When n=3, the power of the test used in verifying those assumptions (Shapiro-Wilk's, Levene's etc.) is close...

Three replications in ANOVA

Broken Social Scene - Anthems For A Seventeen-Year-Old-Girl

I'm trying to figure out what the best statistical test is for analyzing and distinguishing between the following treatment groups: A, B and C.

My endpoint is the incidence of hospitalization, in total and by reason (non-cardiac, cardiac, heart failure (HF)).

Can I use a simple chi-squared test or Fisher's exact test to compare the treatment groups for each reason (if I treat each reason as an independent event)?

Note that one patient may have several reasons for hospitalisation...

Which statistical test should be used to compare categories

As a result of suggestions I received here, I have made good progress in my weighted rating project. The suggestion to use z scores was a game changer. I now have a way to transform almost any kind of rating to a common scale. That allows me to combine them effectively.

I think I have just a couple of questions.

Here's a table with some sample ratings. There are 8 "products"...

How to assign a Z Score for missing values

a) assuming a normal distribution of the data, since the same specimen was subjected to the 2 processes (demineralization and remineralization), is it mandatory to use a 2-way repeated measure ANOVA or is there any exception condition?

b) assuming a non-normal distribution of the...

Which parametric statistical test should I use?

I have to write an essay at the moment and unfortunately I am stuck on a question, maybe you can help me with it.

The question is as follows: "Describe in your own words how an approximation method that determines a confidence interval for the mean of a population from a sample can be derived from the urn model. Also state a necessary requirement."

Unfortunately, I can't find any literature for this question. I believe this question is not too complicated, but unfortunately I...

Essay Question

I am currently planning a research project for my university dissertation, but I am not clear on which tests to run. Let me give you a brief overview.

First, I will take 4 scenarios, each with 10 items (questions) measured on a Likert scale, so from my understanding they would be the IV.

The goal is to measure each scenario against 1 scale as the DV.

From my understanding (with no data at present), a potential test to run would be a between-subjects ANOVA.

Am I correct or...

Help with what test to run

Let's say we have 20 air quality sensors that crisscross a city, resulting in N observations (each sensor has n_k observations). Each observation z is linked to the coordinates where it was collected.

First, I use a Gaussian process to model my data: I assume that the hidden process Y (which represents the dispersion of the pollutants) follows a Gaussian process with an exponential function as...

Mixed models and geostatistics: interpretation

I want to be sure I am using the correct statistics for pre- vs post-test data. Any help is greatly appreciated. I need to know whether the % correct post-test is significantly different from the % correct pre-test, and the effect size.

Example: Multiple choice question with 3 responses (A, B & C). Answer B is the correct response. Here is the data that I have:

Number of responses to each answer choice in the pre-test (N=60)

A- 19

C- 17

Number of responses to each answer choice in the...

Simple Stats for Pre vs Post Sample Comparison

New member here.

I received a request from a close friend about the Dr Taylor transformation. I only know the Johnson transformation from Minitab, so I did a search, but nothing came up about the Dr Taylor method.

Is there even a Dr Taylor method for transforming non-normal data?

Thanks for your help.

This is for a research project: a prospective analytical study of a cohort of patients with and without a disease state. I am trying to predict the disease state using a few clinical variables, some of which are categorical and a few of which are continuous. The cut-off points for the continuous variables have been found by AUC, so there are now twelve categorical variables being checked as predictors of the disease state.

On univariate logistic analysis eight of these variables have been...

Multivariate Logistic regression

I have a directed graph represented as a joint distribution table between three binary variables;

a b c p(a,b,c)

0 0 0 0.192

0 0 1 0.144

0 1 0 0.048

0 1 1 0.216

1 0 0 0.192

1 0 1 0.064

1 1 0 0.048

1 1 1 0.096

How do I prove that there is an ordered numbering of the nodes such that no links go to a lower-numbered node, which also explains why there is no directed cycle?

My understanding - A topological sort of a DAG G is a linear ordering of all its vertices such that if G...

How do I solve a DAG from a given joint distribution function?
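
One thing that can be read off the table itself is which independence statements hold, since those constrain which directed graphs are compatible with it. The Python sketch below uses exact fractions and checks marginal independence of a and b and conditional independence of a and b given c. The helper `marginal` is a hypothetical name, and the check does not by itself determine the direction of the links.

```python
from fractions import Fraction
from itertools import product

# Joint distribution p(a, b, c) copied from the table in the post.
joint = {
    (0, 0, 0): Fraction("0.192"),
    (0, 0, 1): Fraction("0.144"),
    (0, 1, 0): Fraction("0.048"),
    (0, 1, 1): Fraction("0.216"),
    (1, 0, 0): Fraction("0.192"),
    (1, 0, 1): Fraction("0.064"),
    (1, 1, 0): Fraction("0.048"),
    (1, 1, 1): Fraction("0.096"),
}


def marginal(keep):
    """Sum the joint over the variables not in `keep` (0 = a, 1 = b, 2 = c)."""
    out = {}
    for assignment, prob in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0) + prob
    return out


p_a, p_b, p_c = marginal([0]), marginal([1]), marginal([2])
p_ab, p_ac, p_bc = marginal([0, 1]), marginal([0, 2]), marginal([1, 2])

# a and b independent?  p(a, b) == p(a) p(b) for all values
marginally_independent = all(
    p_ab[(a, b)] == p_a[(a,)] * p_b[(b,)]
    for a, b in product((0, 1), repeat=2)
)

# a and b independent given c?  p(a, b, c) p(c) == p(a, c) p(b, c) for all values
conditionally_independent = all(
    joint[(a, b, c)] * p_c[(c,)] == p_ac[(a, c)] * p_bc[(b, c)]
    for a, b, c in product((0, 1), repeat=3)
)

print("a, b marginally independent:", marginally_independent)
print("a, b independent given c:   ", conditionally_independent)
```

For this table, a and b are dependent marginally but independent once c is known. That pattern is consistent with graphs in which c sits between a and b (for example a chain through c), although the table alone cannot tell those graphs apart.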