Hi,
I was wondering whether there is a way of determining a cut-off point for a dataset. Let's say I have the following dataset, which includes two columns, text and emotion (negative, positive, or neutral):
| Text | Emotion |
| --- | --- |
| I like yellow. | Positive |
| I hate blue. | Negative |
| I love yellow. | Positive |
| I dislike orange. | Negative |
| I am ok with any colour. | Neutral |
(Let's presume that there are 1000 rows with similar text; I have shown only the first 5.) I want to calculate the mean frequency of each word in order to select the most important words. Let's say that the output (this is just an example) is:
WORD MEAN
I 0.76
like 0.56
hate 0.45
love 0.03
dislike 0.34
am 0.33
ok 0.22
with 0.10
any 0.02
colour 0.05
yellow 0.20
blue 0.18
orange 0.76
pink 0.05
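To make "mean frequency" concrete, here is a minimal sketch of one way such a table could be computed. It assumes that the mean frequency of a word is its share of the tokens in each row, averaged over all rows. The function name `mean_word_frequency` and the simple regex tokenizer are hypothetical choices for illustration, and the example numbers above were not produced by this code.

```python
import re
from collections import Counter


def mean_word_frequency(texts):
    """Return {word: mean relative frequency across all texts}.

    Assumption: a word's frequency in one text is its count divided by
    the number of tokens in that text; the mean is taken over all texts,
    counting texts where the word is absent as 0.
    """
    if not texts:
        return {}
    totals = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
        tokens = re.findall(r"[a-z']+", text.lower())
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            totals[word] += count / len(tokens)
    return {word: total / len(texts) for word, total in totals.items()}


# The five example rows from the table above.
texts = [
    "I like yellow.",
    "I hate blue.",
    "I love yellow.",
    "I dislike orange.",
    "I am ok with any colour.",
]

means = mean_word_frequency(texts)
# Print words from most to least frequent on average.
for word, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}\t{mean:.2f}")
```

Under a different definition (for example raw counts per row, or the fraction of rows containing the word) the values would differ, but the ranking step at the end stays the same.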
How can I determine the cut-off point for my dataset in order to select the most important words (i.e. the words that appear most frequently)?