I am currently doing a Master's in data science. I came across the probability density function (PDF), which is used to find the probability that a continuous random variable falls within a given range.

The PDF is plotted with probability density on the y-axis and the random variable on the x-axis.

I am not able to understand how to convert an experiment's observations into probability densities so that they can be plotted against the random variable in the PDF plot...

probability density
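One common way to turn raw observations into densities is a normalized histogram: count the observations in each bin, divide by the total count to get a relative frequency, then divide by the bin width. A minimal sketch, assuming equal-width bins; the data values and the `histogram_density` helper are hypothetical and only illustrate the arithmetic:

```python
from collections import Counter

def histogram_density(observations, bin_width):
    """Return {bin_left_edge: density} for equal-width bins."""
    n = len(observations)
    # Count how many observations fall into each bin [k*w, (k+1)*w).
    counts = Counter(int(x // bin_width) for x in observations)
    # density = relative frequency / bin width, so the bar areas sum to 1.
    return {k * bin_width: c / (n * bin_width) for k, c in sorted(counts.items())}

# Hypothetical measurements, for illustration only.
observations = [1.0, 1.2, 1.4, 2.1, 2.3, 3.7]
bin_width = 1.0
density = histogram_density(observations, bin_width)
for left_edge, d in density.items():
    print(f"[{left_edge:.1f}, {left_edge + bin_width:.1f}): density = {d:.3f}")

# The area under the histogram (density times width, summed) is 1.
total_area = sum(d * bin_width for d in density.values())
```

The y-axis value is a density, not a probability: the probability of landing in a bin is the density multiplied by the bin width.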

The first suggests that when you say you've 'drawn a random variable from a population', your variable represents a single member of that population, randomly chosen. The second, and other definitions of the CLT or E[X] for...

Can a Random Variable be a Sample or just a Single Value?

I recently took an MBA class in Regression and fell in love. Was wondering if I could get some help in creating a regression analysis for this stadium trip I want to plan next year.

Goal is to go to every baseball stadium in the US and watch one game per day then travel to the next city. I will have to add in distance between each city/stadium, the number of miles can be adjusted depending on how much I want to drive on the trip and creating the least amount of driving for the...

Regression Analysis - Stadium Trip - Least Miles

But I’m struggling to understand why those two things aren’t interchangeable. Of course with alpha this is decided when...

p-value and alpha confusion

Maybe this is stupid, but when we chose a hotel for a holiday on TripAdvisor and Booking.com, I had an idea about how to compare two hotels by their ratings (despite the fact that there is already a general rating). For example, on TripAdvisor, grades are divided into 5 ordinal categories, from best to worst. There is also a general rating from one to five, which takes into account the assessment in each category and the number of people who gave it. For example, in hotel A...

Which hotel is better?

Based on the forecast turnover, I am supposed to predict various activities in a company (such as the number of manually entered orders, the number of phone calls, etc.; 17 activities in total). My first thought was to fit one linear regression per activity to be predicted, that is, to infer the number of each activity in month x from the forecast sales for month x. Unfortunately, that doesn't take into account how the activities interact with each other. Then I thought about...

Forecast Model

Scenario: 3 treatments (Solutions 1, 2 and 3) with 4 levels each (Red, green, orange and blue). This creates 12 Petri dishes, each with a different treatment. Numerator degrees of freedom (3-1)*(4-1) = 6. In each dish we are measuring how much dye is absorbed in wooden pellets. There are 50 small pellets...

Degrees of Freedom

Broken Social Scene - Anthems For A Seventeen-Year-Old-Girl

Bonett, D. G., & Wright, T. A. (2000). Sample size requirements for estimating Pearson, Kendall and Spearman correlations.

Would appreciate someone helping me run the formula; I'm not sure about the mean, variance and confidence interval since it's ranked data.

Thanks

I have a case where we need to run a regression on previous energy consumption data to predict our future consumption.

- What we do is develop energy-saving products and prove that, by using those products, we have reduced the energy consumption (carbon emissions) for the upcoming years, and then we claim certificates (VEECs).

- For this we choose two periods, baseline and operating. Baseline is the period where our products are not installed and operating is the period...

Improving R2 score in Regression model

The IV is the number of distractions and the DV is the accuracy of their scores

Participants will see an image and answer a question about the image they saw. Some of the images will have distractions.

All the participants will be exposed to every condition. There are 4 conditions with a differing number...

Which statistical test should I use?

(Assume papers 1/2/3 are equally important, but what if they have different weights?)

Using online calculators, or the formula behind these calculators, where each cell has, say, 1000 names, a standard deviation of 0.5, a 95% confidence level and a 5% margin of error, my sample size appears to be at least 278.

Simple Sample Size Calcs
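The figure of 278 can be reproduced with the usual sample size formula plus a finite population correction. A minimal sketch, assuming that is the formula behind the calculators mentioned (the post does not say which one they use); the `sample_size` helper and its parameter names are hypothetical:

```python
from math import ceil
from statistics import NormalDist

def sample_size(population, sd=0.5, confidence=0.95, margin=0.05):
    """Sample size for estimating a mean/proportion, with finite population correction."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z * sd / margin) ** 2
    # Finite population correction shrinks n0 when the population is small.
    return ceil(n0 / (1 + (n0 - 1) / population))

n_per_cell = sample_size(population=1000)
print(n_per_cell)
```

Without the correction the same inputs give roughly 385, so the correction matters when each cell holds only about 1000 names.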

I am new to IRT and, although I have read hundreds of papers, I still could not wrap my head around

My question is: suppose I have response data from 20,000 students on 10 items, and I want to estimate the item difficulty for those items.

To do so I have two options: either I can use CTT to calculate the p values (which are the difficulties) or fit an IRT model to estimate the item difficulty...

Item Response Theory Invariance

I am new to regression analysis.

My problem:

Scenario, there are 3 sets of workers (T1, T2 and T3) that input into a project. Their data is measured in minutes. The project has an overarching timeframe in which it must be met, which is measured in days (OT). The threshold is 20 days.

All three sets of workers need to finish their jobs before project completion. So if one runs over time, the project is over time.

Here are my basic questions....

- What are IV and DV? I thought...

Regression Analysis

I am trying to calculate a t-statistic (formula below). I am not sure if I need to use a one-tailed or two-tailed test. I have attached an Excel sheet with the calculation (I used T.INV.2T - line Q25).

The issue is that I am not getting the expected value, which is 0.113976896 (the value I am getting is 0.131208542287959).


average should be located to minimise the fraction nonconforming. What would the value of the fraction nonconforming be under these conditions? (data are normally distributed)

Variance = 77.92

Upper Limit = 20

Lower Limit = 0

\(\min_{\bar{x}} \left( P\left[ Z > \frac{20-\bar{x}}{\sqrt{77.92}} \right] + P\left[ Z < \frac{0-\bar{x}}{\sqrt{77.92}} \right] \right) = \ldots???\)
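One way to check the setup numerically is to evaluate the fraction nonconforming over a grid of candidate process means. A minimal sketch, assuming the stated 77.92 is a variance (so the standard deviation is its square root) and the limits are 0 and 20; the function names are hypothetical:

```python
from math import erfc, sqrt

def normal_cdf(z):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2))

def fraction_nonconforming(mean, variance, lower, upper):
    """P(X < lower) + P(X > upper) for X ~ Normal(mean, variance)."""
    sigma = sqrt(variance)
    below = normal_cdf((lower - mean) / sigma)
    above = 1.0 - normal_cdf((upper - mean) / sigma)
    return below + above

variance, lower, upper = 77.92, 0.0, 20.0
# Scan candidate means from 0.0 to 20.0 in steps of 0.1.
candidates = [i / 10 for i in range(201)]
best_mean = min(candidates, key=lambda m: fraction_nonconforming(m, variance, lower, upper))
best_fraction = fraction_nonconforming(best_mean, variance, lower, upper)
print(best_mean, round(best_fraction, 4))
```

Because the normal distribution is symmetric, the scan settles on the midpoint of the two limits, which is the usual analytical answer for two-sided limits.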


I work in Compliance for a mortgage company, and have no real background in statistics. My company closes loans and sells them to other investors. Between the time that we close a loan and an investor buys a loan there are issues with the loans (called conditions) that have to be resolved before investors will buy the loans.

Each day that passes without the investor buying a loan there is a cost (usually substantial) that occurs, calculated as Expected Revenue (minus) Actual...

Estimate lost income per mortgage condition

Thanks

Choosing the appropriate confidence interval for an odds ratio

For my master's thesis, I'm analyzing a bunch of patient data. I have very limited statistical experience, hence the question here.

The data set includes both categorical variables and continuous variables. The continuous variables don't seem to be normally distributed. Additionally, the measures for the continuous variables are from a patient group, and there is no data available on those specific variables in a healthy control group.

What would be the best way to analyze...

What tests to use for analyzing data that is categorical and continuous

Does centering at the upper level (group level) actually change the slope, not just the intercept? I know it only changes the intercept at level 1.

There is also disagreement if you...

Multilevel Models

We work for an organization that controls for unobserved heterogeneity (unobserved variables) through fixed effect models but does not use data at different points in time. Other than these facts and they...

Fixed effect regression that is not panel data

I'm currently carrying out a systematic review and I've gathered cost estimates from studies for a specific type of treatment (with two different approaches). I am interested in finding out whether these costs have decreased over time.

I have 40 cost estimates for one treatment approach and 12 cost estimates for the other, spanning 2012 to 2020. When I plot a simple scatter graph of cost (y) against year (x), with a line of best fit, I can see that...

What statistical test should I use?

Normality: The sense I get is few concern themselves with this any more with large data sets. That suggests not reviewing it.

Heteroskedasticity: I am unclear what the importance of this is anymore. Some suggest just using White...

Importance of regression assumptions

sum(y_i*theta_i), but I'm confused as to why c(y_i,phi) is not included, as this is also a function of the range.

Basically, I am wondering why the orange squared part is not part of the kernel along with the pink square. Can someone help, please? Thank you.


I am looking at the impact of an intervention on students' knowledge scores (continuous var).

- My study has a control group (50 schools) and an intervention group (another 50 schools) which are randomly selected (cluster RCT) from 6 countries. Knowledge scores are reported at baseline and at post-intervention for both intervention and control groups.

- Here is my model (standard linear regression):

+ Dependent var: post-test score

+ Key independent var: intervention (1) and...

How do I perform generalized estimating equation for clustered data in SPSS?

I generated a random-effects logistic regression model and included within-person predictors by person-mean-centering.

Not all individuals in my dataset have multiple data points, i.e. the individuals gave interviews, and while a large number of them did multiple interviews, many only did one interview.

My research question is whether interviewer characteristics affect how interviewers measure respondent characteristics. To answer the question, I analyze the within-person effects of...

Random-effects regression with within-person predictors

Dummy variables proc genmod

I'm trying to do some stats on a small dataset but I'm a bit of a novice and could use some expert advice. I basically have 5 samples each from 6 different participants, which have been stored under different conditions before testing. I am comparing to look at the impact storage condition has on the eventual test result. So for each sample I have a 'control', tested on the day it was taken, one stored for 7 days at room temp, one stored for 7 days at 37C, one stored for 14 days at...

Comparing groups - using ANOVA but unsure on various points, help greatly appreciated!

I have a questionnaire that looks at people's willingness, motivators and barriers to using anxiety-focused apps. For each section the participants are shown a list of statements and are asked to what extent they agree with each one (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree).

From all of the results I have been able to create a summary table - below is an example of this summary table for the Willingness section using my pilot data (please ignore spelling mistakes, this...

Ordering Results from Likert Questions

I've been a bit stuck lately on which test to choose or implement for an assignment. A little bit of context: from a dataframe containing survey answers from several hundred thousand people, I needed to ask a scientific question, then analyse the data and make inferences from it. Since the data range from 1972 to 2012 with a lot of information on the participants, I chose to analyze the difference of opinion on abortion through the years, and then based on either...

Comparison of proportions with several variables

I'm looking for a method of running a correlation test on these two measures:

Time to first fixation - a time measure of finding a web object

Likert-type scales of satisfaction with the ease of finding it.

It is inspired by this:

Can anybody help with how I would be able to do that (in SPSS)?

I'm not really an educated statistician. I've read something about Likert data having to be ordinal? - not sure I...

Correlation test on Likert-type scales and time measures

I'm really struggling to interpret some of my results. I am exploring whether lie detection abilities can predict victimisation in autistic vs. neurotypical samples.

Firstly, I conducted a correlation analysis between the independent variables to check for multicollinearity. It suggested that the variables correlate differently between the diagnostic groups. In the autistic group, I found no correlations between the variables while in the neurotypical group, some of the variables...

Interpretation help

I'm really new to statistics and struggling a little to get my head around some aspects of the Johansen test for cointegration.

I'm looking at the eigenvectors specifically. There are a number of columns, and it's my understanding that the ratios in the first column result in the greatest cointegration, the second column the second best, and so on.

My question is how do I know which 'asset' to apply the hedge ratios to? I'm assuming each column shows the ratios for a different combination of...

Johansen cointegration test and eigenvectors

But then I thought this. Say higher age lowers income (the DV) and regression shows this. Say we develop a new program to deal with age (we give them new training that leads them to be more successful). Then age...

Control variables

This is what SAS's senior statistician said.

Russell,

As noted in replies to your post in the Statistical Procedures Community, the model with...

Linear Probability Model

In practice, how do you draw a stratified sample with several stratification variables? So far, I am using Excel and it takes a lot of time. Do you use a specific software?

Thank you in advance!
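One approach that scales better than a spreadsheet is to treat each combination of the stratification variables as its own stratum and sample within it. A minimal sketch, assuming proportional allocation with at least one unit per stratum; the `stratified_sample` helper, the field names `region` and `sex`, and the population itself are all hypothetical:

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, strata_keys, fraction, seed=0):
    """Draw the same fraction from every combination of the stratification variables."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        # The stratum is the tuple of values of all stratification variables.
        strata[tuple(rec[k] for k in strata_keys)].append(rec)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 100 units, 25 in each region x sex combination.
population = [
    {"id": i, "region": ["north", "south"][i % 2], "sex": ["f", "m"][(i // 2) % 2]}
    for i in range(100)
]
sample = stratified_sample(population, ["region", "sex"], fraction=0.2)
per_stratum = Counter((r["region"], r["sex"]) for r in sample)
print(len(sample), dict(per_stratum))
```

Adding a third stratification variable only means adding its name to the key list; the grouping logic stays the same.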

a) allows missing data AND

b) provides p-values?

Thanks a lot!

I am trying to find out how to do a non-parametric Mann-Kendall trend test to detect monotonic trends.

Here are some examples:

Years: 2014, 2015, 2016, 2017

X events: 43 (44.8%), 68 (42.5%), 55 (40.7%), 16 (19.8%)

Y events: 51 (53.1%), 87 (54.4%), 75 (55.6%), 64 (79.0%)

Z events: 1 (1%), 4 (2.5%), 4 (3.0%), 0

K events: 0, 1 (0.6%), 0, 1 (1.2%)

Thanks
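The Mann-Kendall statistic can be computed by hand from the signs of all pairwise differences. A minimal sketch, assuming the test is applied to the yearly percentages of one event type and using the normal approximation with continuity and tie corrections; the `mann_kendall` helper is hypothetical, and with only four time points the approximation is rough, so an exact table would be preferable:

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def mann_kendall(values):
    """Return (S, Z, two-sided p) for a series ordered in time."""
    n = len(values)
    # S sums the signs of all later-minus-earlier differences.
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    # Variance of S, reduced for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    # Continuity-corrected z score.
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p

# Percentages of X events for 2014-2017, taken from the post.
x_pct = [44.8, 42.5, 40.7, 19.8]
s, z, p = mann_kendall(x_pct)
print(s, round(z, 3), round(p, 3))
```

A negative S indicates a downward monotonic trend and a positive S an upward one; every pair in the X series decreases, so S takes its minimum possible value for four points.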

Does anyone know how to do such a test? I understand MAR is tied to being associated with the predictors and MNAR with Y, but not how to do a formal test of this.

Impact are the regression slopes for dummy variables.

I should say that the excluded reference group here does not seem like a good choice to me: they are under 16, of whom we have extremely few, and most likely they earn very little. I cannot change it; it was decided by the federal government.

That said I don't see how every dummy variable can be positive. Some have to earn less than others. Is...

Interpreting dummy variables.

I think of Fisher's approach like this:

We choose a test statistic whose distribution is calculated under H0, H0 being a simple hypothesis of preference.

We break down the distribution according to significance thresholds.

The significance thresholds are defined by what we consider to be an extreme result, that is to say a result which would occur very rarely and would therefore cast doubt on the truth of H0.

We calculate the p-value which corresponds to the probability...

Are Fisher's tests of significance mathematically correct?