machine learning

  1. A

    Set Level Prediction in Tennis

    Hi all, hope someone can point me in the right direction with the problem below! The data: let's assume that the dataset I have is solely set-level win/loss data for every single professional tennis match in the past X years. Columns: PlayerA, PlayerB, Set#, playerA_win. Example row: Nadal, Federer, 1, 1...
  2. Y

    Prediction of label by numerical variables

    Hi, I have the following dataset, and I would like to predict column A from columns B and C. Column A: 3 labels (a, b, c). Column B: numerical values. Column C: numerical values. How can I do it?
  3. H

    Trying to make sense of this Decision Trees question

    Hi everyone, I'm trying to figure out how the solution to this decision trees question from college makes sense but there's not much explanation and in the notes there are no similar examples either so I'm lost as to where to begin. I'll attach the question/solution to this post. Thanks!
  4. A

    AWS Machine Learning Specialty a Good Career Option

    Start your Preparation for AWS MLS-C01 and become AWS Certified Machine Learning - Specialty certified with Here you get online practice tests prepared and approved by AWS certified experts based on their own certification exam experience. Here, you also get the detailed and...
  5. A

    Comparing two survival curves

    Hi, I have two survival curves/functions given by two sets of data points: (t1, P(T>t1)), (t2, P(T>t2)), ..., (tn, P(T>tn)) and (t1, P*(T>t1)), (t2, P*(T>t2)), ..., (tn, P*(T>tn)), where the time points are t1, t2, ..., tn and for each time point I have an estimated probability of survival...
  6. A

    z test and t test

    I understand that we go for the t-test in regression after the data has been normally distributed using the z distribution. I need to know why we can't use the t-test in all phases; I mean, why does the t-test require that the data be normally distributed... why can't the t-test handle this...
  7. N

    Which text similarity algorithm should I use to compare the context of Instagram hashtags?

    For a study, I am comparing companies based on the posts written by their Instagram followers. I apply the following technique: Nike has 1,000,000 followers. 2,000 random followers of Nike are selected, and the posts created during the last 365 days by these profiles are obtained. The posts of...
  8. A

    confusion on outliers

    I am not able to work out how to identify outliers: when to go with the standard deviation and when to go with the median. My understanding of the standard deviation approach is that if a data point is more than 2 standard deviations away from the mean, we consider it an outlier. Similarly for the median, we say that any data that is not in-between q1...
  9. S

    PyCM : New statistical analysis library for post classification in Python

    A classifier is expected to face many datasets with different characteristics, such as being unbalanced. Besides, their missions are different, for example, categorizing data into just two classes or more than two. There are many different parameters for evaluating the performance of a...
  10. L

    Estimating Likelihood of NFL Game Outcomes

    Hey all, similar to last year, I'm passing along the tracker for the top 4 statistical model trackers for the season. Good news is that they are all at 100% so far...
  11. L

    Don't understand the meaning of "training a dataset"

    Hi, I have a set of data. The data are based on data mining from my website. I have the number of users per month who go to my website (X), and the time they spent on the website on each webpage (y). Now with this dataset as an example, could you...
  12. M

    Calculate type 2 error using one data file

    I am a bit confused on how to calculate type 2 error to check whether the sample data I am using is sufficient. I have a data file that I have used to build a machine learning model. This data file consists of 500 entries describing information about entities and the funding these entities have...
  13. S

    Probability of a subjective event based on historical subjective data. Forecasting

    Lately I developed an idea for understanding the behavior of kids in their childhood. I was wondering how kids are molded into different adults in no time. So I created a problem statement based on my ideology and hoped to solve it using applications of mathematics. Coming from a non-engineering and...
  14. M

    Combining data from different sources

    Hi, I have data from two very different sources (bird counts from ships and bird counts from the shore) which aim to estimate the same population/outcome. The two datasets have different covariates/predictors, and thus it is difficult to combine them in the same regression model. Does...
  15. C

    Basic questions from newbie

    Hi, I am currently learning ML algorithms and implementing them in R. I have a couple of basic questions. 1.) Is dimensionality reduction the same as feature selection? I know that in R, specifying the importance=T parameter in the randomForest function gives you the important features based on info. gain. I was...
  16. V

    Mathematical model to build a ranking/scoring system

    I want to rank a set of sellers. Each seller is defined by parameters var1, var2, var3, var4, ..., var20. I want to score each of the sellers. Currently I am calculating the score by assigning weights to these parameters (say 10% to var1, 20% to var2, and so on), and these weights are determined based on...
  17. N

    Identifying numeric algorithm for data analysis

    I want to study and analyze algorithms and make predictions on key data for different sports. There are 3 numbers: 1. the screen's predicted number, 2. our predicted number, 3. the outcome number. Prediction works like this: #2 predicts the right side of #1 with #3 at a rate of 57% or higher...
  18. K

    Clustering heterogeneous groups based on similarity of heterogeneity

    Disclaimer: I am not a stats major, and would love it if people shred my question to bits if it contains any obvious logical flaws. I am not a native English speaker, but I try my best to be concise. The reason why I am writing here is to get to the proper statistical lingo/jargon to better...
  19. Y

    Statistical/ML models when observations have different amounts of input

    Let's say we're predicting an employee's performance review score for the following year based on that employee's metrics from each previous year of his/her employment. We might have these training observations below. Note that "2014i" means "that employee's set of input values from 2014", which...
  20. C

    Price Prediction

    I have a large set of search data from a particular website. A sample data set is attached here. Data set includes nearly 11,000 rows. What I want to do is to predict the price. I want to predict the price for a particular holiday id, particular Inhouse rating, particular star rating...