Search results

  1. rogojel

    Simulating a logistic regression - scary results

    Hi, I tried to work out the necessary sample size for a logistic regression by simulation and got some scary results. If anyone could check the code below, it would be a great help. I simulate a logistic regression with two normalized variables, one having a fixed odds ratio of 1.4, the other a...
  2. rogojel

    Support vector machines - what's the point?

    hi, I just finished the chapter on SVMs from the Statistical Learning book of Hastie and Tibshirani. Their focus is to use the SVMs for classification - and they also show that the SVMs are equivalent to the logistic regression with nonlinear predictors (there is even an impressive exercise to...
  3. rogojel

    testing the bootstrap

    hi, below is an experiment I just did testing whether using a bootstrap I could get better results than with simple repeated sampling. Is there any error in the logic/code? It seems that bootstrapping would just amplify the sampling error without adding any value - what am I missing? This is...
  4. rogojel

    Bootstrap and hypothesis test

    Hi, I have to decide whether two groups have the same variance or not. My sample size is less than impressive, 13 data points per group. I ran Bartlett's test and got a p-value of 0.09 - but I suspected that the relatively high value might be due to the small sample size. So, I ran a...
  5. rogojel

    Turning an R script into an executable

    hi, before R I used to be a big fan of Perl, and there we had an option to turn a script into an exe file so that people could use it without having to install Perl. Is there any similar solution for R? regards
  6. rogojel

    p-values in regression trees

    Hi, I am analyzing some pretty hopeless datasets where the link between the DVs and IVs is quite weak. I observed, however, that when I take the two groups resulting from the first partition in the tree I can generally get a nicely low p-value with a t-test. Is this some property of the trees I...
  7. rogojel

    Bookclub: ISLR

    Hi, I just pledged to work my way through this book (Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani). Anyone willing to join? Keeping the discipline would be more fun if we worked together. regards
  8. rogojel

    Pitching R in Pharma

    hi, I will pitch R to a pharma company on Monday. Their biggest fear is that they cannot validate R, so they cannot use it in any situation related to regulatory matters, like proving something statistically to the FDA, because they cannot prove that the packages they used in R are correct...
  9. rogojel

    Model vs. test

    Hi, we just had an interesting discussion about the chi-squared test, that could be generalized. I think the core of the discussion was whether one should prefer a simple statistical test or a statistical model - e.g. a chi-squared test vs. a logistic regression. Given that in practice one is...
  10. rogojel

    Physical interpretation of an AR(n) series

    Hi, I am learning TS analysis but my angle is probably different from the usual. Being in Six Sigma, I need to reduce the variability of a process that I can prove is basically AR(3), which, with 3 runs on average per day, means one day's runs influence the next day in the mathematical model...
  11. rogojel

    Interpretation of an AR model

    hi, looking at process times of a rather complex machine I found that an AR(2) model describes it quite well. Does it make sense to interpret this as an indication that the machine has a "memory" of about 2 runs? E.g. insufficient cleaning between runs? thanks a lot!
  12. rogojel

    What does Python offer that R can't?

    Hi, I am looking at Coursera data analysis specializations and most (basically all except Johns Hopkins) offer Python as the language of choice for their courses. I really like R but I started to wonder if I am missing something? I just see no advantages in Python for data analysis that would...
  13. rogojel

    beating the rules at discrete sampling

    hi, I am looking at the question of how to sample rare occurrences, like a rare defect in a manufacturing line. If I use the standard sample size formula the sample size goes into hundreds, which is unreasonable due to the long waiting times. My idea is to measure the time intervals between...
  14. rogojel

    Interpretation of a one-way ANOVA

    hi, I am working through a designed experiments textbook and I would like to see if I got one exercise right. The story is about measuring the effect of baking powder on the height of biscuits. The experimental factor is the amount of soda with, say, 3 levels - let us call it low amount, medium...
  15. rogojel

    Elegant way to do the inverse of aggregation

    hi, I have a data frame with several factor variables and one numeric variable that gives the number of times a row with the given combination of factor values was observed. I would like to disaggregate (if there is such a word) the data frame to get one where a factor combination is...
  16. rogojel

    Nested and crossed

    hi, I have the following design: Two paper suppliers, slips of paper and measurement points on each slip. For each supplier we take, say, 3 slips of paper and on each slip we define 6 measurement points. Each slip is painted (same paint, same machine, same operator) and gloss is measured in...
  17. rogojel

    Bonferroni correction in multivariate regression

    hi, I just read something that made me think about this, the p-values we calculate in a multiple regression have no adjustment for multiple testing (like a Bonferroni correction), right? Does this mean that the more independent variables I have the more probable it is that I get false positives ...
  18. rogojel

    Regular Sampling

    hi, I had the following question and I really wonder if I gave the right answer: Assume that you can measure a process once a day (for the sake of an example assume sampling the quality of water from a stream). Does it make sense to sample at random time points or is it enough to sample...
  19. rogojel

    Multi- and Megavariate Data Analysis -

    hi, I am thinking about buying this book: I got the first chapters on Kindle and it looks really interesting but it...
  20. rogojel

    Nested ANOVA evaluation in R

    hi, I have an experiment to check factors that influence the results of a measurement which is done like this: Several pieces of substrate are selected (a kind of paper), on each substrate a left and a right side area is defined and in each area five measurement points going from top...