machine learning

  1.

    PyCM: A new library for post-classification statistical analysis in Python

    A classifier is expected to face many datasets with different characteristics, such as class imbalance. Their tasks also differ: for example, categorizing data into just two classes or into more than two. There are many different parameters for evaluating the performance of a...
  2.

    Estimating Likelihood of NFL Game Outcomes

    Hey all, as I did last year, I'm passing along a tracker of the top 4 statistical models for the season. The good news is that they are all at 100% so far...
  3.

    Don't understand the meaning of "training a dataset"

    Hi, I have a dataset obtained by data mining on my website. I have the number of users per month who visit my website (X) and the time they spent on each webpage (y). Now, with this dataset as an example, could you...
  4.

    Calculate type 2 error using one data file

    I am a bit confused about how to calculate type 2 error to check whether the sample data I am using is sufficient. I have a data file that I used to build a machine learning model. This data file consists of 500 entries describing information about entities and the funding these entities have...
  5.

    Probability of a subjective event based on historical subjective data: forecasting

    Lately I have been developing an idea for understanding children's behavior. I was wondering how kids are molded into such different adults in so little time. So I created a problem statement based on my idea and hoped to solve it by applying mathematics. Coming from a non-engineering and...
  6.

    Combining data from different sources

    Hi, I have data from two very different sources (bird counts from ships and bird counts from the shore) that aim to estimate the same population/outcome. The two datasets have different covariates/predictors, so it is difficult to combine them in the same regression model. Does...
  7.

    Basic questions from a newbie

    Hi, I am currently learning ML algorithms and implementing them in R. I have a couple of basic questions. 1) Is dimensionality reduction the same as feature selection? I know that in R, specifying the importance=T parameter in the randomForest function gives you the important features based on info gain. I was...
  8.

    Mathematical model to build a ranking/scoring system

    I want to rank a set of sellers. Each seller is described by parameters var1, var2, var3, var4, ..., var20. I want to score each seller. Currently I am calculating the score by assigning weights to these parameters (say 10% to var1, 20% to var2, and so on), and these weights are determined based on...
  9.

    Identifying a numeric algorithm for data analysis

    I want to study and analyze algorithms and make predictions on key data for different sports. There are three numbers: (1) the screen's predicted number, (2) our predicted number, and (3) the outcome number. Prediction works like this: #2 predicts the right side of #1 with #3 at a rate of 57% or higher...
  10.

    Clustering heterogeneous groups based on similarity of heterogeneity

    Disclaimer: I am not a stats major, and I would love it if people shredded my question to bits if it contains any obvious logical flaws. I am not a native English speaker, but I will try my best to be concise. I am writing here to learn the proper statistical lingo/jargon to better...
  11.

    Statistical/ML models when observations have different amounts of input

    Let's say we're predicting an employee's performance review score for the following year based on that employee's metrics from each previous year of his/her employment. We might have the training observations below. Note that "2014i" means "that employee's set of input values from 2014", which...
  12.

    Price Prediction

    I have a large set of search data from a particular website. A sample dataset is attached here. The dataset includes nearly 11,000 rows. What I want to do is predict the price for a particular holiday ID, a particular Inhouse rating, a particular star rating...
  13.

    Multiple Polynomial Regression in R

    Hi all, I need to predict the values of some variables from a set of predictors. The variables I need to predict are called Y1 and Y2, which belong to Y. The predictors I have are X0, ..., X400, which belong to X. I tried to predict Y from X with several techniques (linear regression, lasso...
  14.

    precision of two independent classifiers

    Consider two classifiers, A and B, assigning binary labels to a large set of candidates; say we have a million cats and a million dogs to label. Assume the two classifiers make independent predictions. Now if classifier A gives a list (500,000) of the candidates most likely to be dogs (top...
  15.

    Career change from signal processing to statistics

    I hope some of you can offer some advice here. I have a Ph.D. in statistical signal processing (part of electrical engineering), and I've worked for almost 15 years in the defense industry on digital communication systems. I'm currently working from home for a small company, and the work is sort of...
  16.

    Statistical Significance of a Learning Model

    I built a classification model based on a Random Forest classifier, and I have been asked to assess the statistical significance of its performance. So far, I have trained it on dataset A and tested it on dataset B. What kind of test can I use?
  17.

    Finite mixture of exponentials in MATLAB

    Hi all, can anybody here help me with a finite mixture of exponentials in MATLAB? I want to simulate it in MATLAB or R for any given data.
  18.

    [Matlab] - Stratified Sampling of Multidimensional Data

    I want to divide a corpus into training and testing sets in a stratified fashion. The observation data points are arranged in a matrix A as A=[16,3,0;12,6,4;19,2,1;.........;17,0,2;13,3,2]. Each column of the matrix represents a distinct feature. In MATLAB, cvpartition(A,'holdout',p)...
  19.

    Machine learning techniques for biomedical prediction

    Apologies for cross-posting. I have a general question about why ML is not used more in translational research. I know that some/most bioinformaticians are aware of these approaches. Having now done some serious reading and gotten a bit of practice with WEKA, the relevant packages in R...
  20.

    Advice on distance learning and grad schools

    Hello all, I would like to pursue further education in machine learning/statistics. The most likely option for me would be a distance-learning degree. I have a background in mechanical and chemical engineering. For my master's degree I studied nonlinear regression and a few...
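The PyCM thread is about evaluating classifiers, whether they sort data into two classes or more, using many different performance parameters. As a minimal illustration of two such parameters, here is a plain-Python sketch that computes per-class (one-vs-rest) precision and recall from lists of actual and predicted labels. It does not use PyCM's own API; the function name, the example labels, and the convention of returning 0.0 when a ratio is undefined are assumptions for illustration.

```python
from collections import Counter

def per_class_metrics(actual, predicted):
    """Return {class: (precision, recall)} using one-vs-rest counts.

    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    classes = sorted(set(actual) | set(predicted))
    pairs = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        # False positives: predicted c, actually something else
        fp = sum(n for (a, p), n in pairs.items() if p == c and a != c)
        # False negatives: actually c, predicted something else
        fn = sum(n for (a, p), n in pairs.items() if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = (precision, recall)
    return metrics

# Hypothetical three-class example
actual = [0, 0, 0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 0, 1, 2, 2, 2]
print(per_class_metrics(actual, predicted))
```

The same function works unchanged for the two-class case, which is one reason one-vs-rest counting is a common starting point for multiclass evaluation.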
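The seller-ranking thread describes scoring each seller as a weighted combination of its parameters (say 10% for var1, 20% for var2, and so on). A minimal sketch of such a weighted-sum score follows. Scaling each parameter to [0, 1] with min-max normalization before weighting is an added assumption, so that parameters measured in different units are comparable; the function name, seller names, values, and the var3 weight are all hypothetical.

```python
def score_sellers(sellers, weights):
    """Rank sellers by a weighted sum of min-max scaled parameters (highest first)."""
    params = list(weights)
    # Per-parameter minimum and maximum across all sellers, used for scaling to [0, 1]
    lo = {p: min(s[p] for s in sellers.values()) for p in params}
    hi = {p: max(s[p] for s in sellers.values()) for p in params}
    scores = {}
    for name, values in sellers.items():
        total = 0.0
        for p in params:
            span = hi[p] - lo[p]
            # A parameter that is constant across sellers contributes 0
            scaled = (values[p] - lo[p]) / span if span else 0.0
            total += weights[p] * scaled
        scores[name] = total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical weights and seller data (var3 and all values are made up)
weights = {"var1": 0.1, "var2": 0.2, "var3": 0.7}
sellers = {
    "A": {"var1": 10, "var2": 3.0, "var3": 200},
    "B": {"var1": 40, "var2": 4.5, "var3": 150},
    "C": {"var1": 25, "var2": 5.0, "var3": 300},
}
for name, score in score_sellers(sellers, weights):
    print(name, round(score, 3))
```

With these made-up numbers the ranking comes out C, B, A. The sketch treats a higher value as better for every parameter; a "lower is better" parameter would need to be inverted (for example, 1 minus its scaled value) before weighting.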
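For the thread on finite mixtures of exponentials: such a mixture has density f(x) = sum_k w_k * lambda_k * exp(-lambda_k * x) for x >= 0, where the weights w_k sum to 1. One standard way to simulate from it is to pick component k with probability w_k and then draw from an exponential distribution with rate lambda_k. The sketch below does this in Python rather than the MATLAB or R the poster mentions. The function names and the example weights and rates are hypothetical, and fitting a mixture to observed data (for example with EM) is a separate step not shown here.

```python
import math
import random

def sample_exp_mixture(n, weights, rates, seed=None):
    """Draw n values from a finite mixture of exponential distributions.

    weights: mixing proportions w_k; rates: rate lambda_k of each component.
    """
    rng = random.Random(seed)
    components = range(len(weights))
    draws = []
    for _ in range(n):
        k = rng.choices(components, weights=weights)[0]  # pick component k with prob. w_k
        draws.append(rng.expovariate(rates[k]))          # draw from Exp(lambda_k)
    return draws

def exp_mixture_pdf(x, weights, rates):
    """Mixture density f(x) = sum_k w_k * lambda_k * exp(-lambda_k * x) for x >= 0."""
    if x < 0:
        return 0.0
    return sum(w * r * math.exp(-r * x) for w, r in zip(weights, rates))

sample = sample_exp_mixture(10_000, weights=[0.3, 0.7], rates=[2.0, 0.5], seed=1)
print(sum(sample) / len(sample))
```

With weights [0.3, 0.7] and rates [2.0, 0.5], the theoretical mean is 0.3/2.0 + 0.7/0.5 = 1.55, so the sample mean of a draw this large should land close to that value.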