How to calculate and what model should I use?


So I am new here and currently starting to work on my bachelor thesis. It is about predicting the success of a game from the amount of sentiment used per couple. My problem is how to calculate the amount of sentiment and which statistical model to use.
Example data: a social game played by pairs, with n = number of turns, s = sentiment used in a turn, and y = success of the game with two levels, yes or no. (t = total amount of sentiment per couple, which I don't know how to calculate.)
Let's say couple A has 200 turns with 200 detected sentiment and couple B has 100 turns with 100; obviously couple A has the higher sentiment score here. Should I factor in that it is twice as much because 200 is 2 times 100? Another example: couple A has 200 turns with 125 and couple B has 300 turns with 275; then couple B has higher sentiment, but how should I calculate this? Let's say the sentiment used in each turn is equal. I just want to know how to calculate t = total sentiment when the number of turns differs. t should be my predictor for success.

Is there any way to calculate this in R?
I found some information about hierarchical models; would that take care of the problem?
As y has two levels, yes or no, I think I want to use a binomial logistic regression, but I am not sure about that either.

Thanks for reading!
Not a statistician, but I think you need to describe more about this quantification of sentiment. The Merriam-Webster Dictionary defines sentiment as
  1. a : an attitude, thought, or judgment prompted by feeling : predilection
     b : a specific view or notion : opinion
  2. a : emotion
     b : refined feeling : delicate sensibility especially as expressed in a work of art
     c : emotional idealism
     d : a romantic or nostalgic feeling verging on sentimentality
  3. a : an idea colored by emotion
     b : the emotional significance of a passage or expression as distinguished from its verbal context
I would guess your sentiment score is on an ordinal scale but not an interval scale. That is, 200 is greater than 100 but not necessarily twice as much. However, it is impossible to say with the information you have given.
Oh right, sentiment has a score from 0.00 to 1.00, so I would say it is a ratio scale. The turns consist of sentences, each with its own sentiment score. The analysis to get these scores is already done (with the LIWC tool). I want to calculate the scores per pair, and the pairs have different numbers of turns. It's just that some turns have a sentiment score of 0 and the number of turns differs.
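For what it's worth, the candidate per-pair summaries can be written down in a few lines. This is only a minimal sketch in Python (the arithmetic is simple enough to carry over to R); the function name `summarise_turns` and the example scores are made up for illustration, and it assumes each turn already has a score between 0 and 1.

```python
def summarise_turns(scores):
    """Summarise one pair's per-turn sentiment scores (each assumed to lie in [0, 1])."""
    n_turns = len(scores)
    if n_turns == 0:
        raise ValueError("a pair needs at least one turn")
    total = sum(scores)
    return {
        "n_turns": n_turns,
        "total": total,                    # grows with the number of turns
        "mean_per_turn": total / n_turns,  # comparable across pairs with different n
        "share_nonzero": sum(s > 0 for s in scores) / n_turns,  # fraction of turns with any sentiment
    }

# Hypothetical per-turn scores, not real data.
pair_a = [1.0] * 200               # 200 turns, every turn fully sentimental
pair_b = [1.0] * 100               # 100 turns, every turn fully sentimental
pair_c = [0.0, 0.5, 0.0, 0.25]     # a short game with some zero-sentiment turns

summary_a = summarise_turns(pair_a)
summary_b = summarise_turns(pair_b)
summary_c = summarise_turns(pair_c)

for name, summary in [("A", summary_a), ("B", summary_b), ("C", summary_c)]:
    print(name, summary)
```

Pairs A and B differ in `total` but not in `mean_per_turn`, which is exactly the choice being asked about: the total mixes sentiment with how much a pair talks, while the mean removes the number of turns. Keeping `n_turns` as its own variable alongside the mean avoids having to pick one or the other.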
OK, that is clearer, but you also need to outline several things to receive sensible advice:
  • By pair, do you mean a human couple, or paired data? If paired data, exactly how is it paired?
  • What do you mean by "success of game having two levels"? Do you mean your primary outcome/dependent variable is the dichotomous winning or losing of the game? Is there 1 winner and 1 loser, or do the participants as a whole either complete or fail to complete a set objective? Is this game played between 2 partners within a couple (i.e. partners playing each other = 2 participants) or between different couples (i.e. 2 couples playing against each other = 4 participants, with each couple playing as a team)?
  • Did each and every couple in your analysis play the game only once?
  • Why are there different numbers of turns? Is that the nature of the game? Are there implications for the analysis of having a different number of turns (e.g. poorer performance)?
  • Are there other potential predictors or confounders for your outcome that you have data for and would like to account for, e.g. years the couple has been together, mean age or age difference of the couple, whether the couple is heterosexual?
The game is as follows: it's a maze game between two people who have to solve the maze by chatting and telling each other where to go. They have a timer for each game, and some of the paths are blocked; a blocked path is opened when your partner stands on a particular spot. (You try to get your partner to help you, and vice versa, so that you solve the maze together.)
- By pair: the two people who are assigned a maze.
- Success of game, two levels: the outcome of the game is solved or not solved within the time limit. So the participants either complete or fail an objective.
- Each complete maze game has 12 rounds. Every couple plays the whole game once.
- The number of turns differs because the couple needs to find a way to guide each other through the puzzle. Some people chat more, and some people don't understand each other.
- No confounders: all participants were randomly selected from the university for a lab experiment and randomly assigned a partner.

I am asking because I want to divide the pairs into 4 categories:
- more sentiment with success
- more sentiment with fail
- little sentiment with success
- little sentiment with fail

Then I want to see if there is any relation between the category and the success of the next game.
So my biggest concern is how to calculate the sentiment when each pair has a different number of turns. Averaging doesn't seem to work, because a pair with 200 turns full of sentiment and a pair with 100 turns full of sentiment would get the same average.
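One way the four categories could be formed and related to the next round is sketched below. Everything here is an assumption made for illustration: the data are invented, the sentiment measure is taken to be the mean score per turn in a round, and "more" versus "little" is decided by a median split across all rounds, which is only one of several possible cutoffs and discards some information.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: for each pair, one (mean sentiment per turn, solved?) entry per round, in order.
rounds = {
    "pair_1": [(0.10, False), (0.30, True), (0.40, True), (0.20, False)],
    "pair_2": [(0.50, True), (0.45, False), (0.15, False), (0.35, True)],
}

def categorise(sentiment, solved, cutoff):
    """Label a round with one of the four categories, using a simple cutoff."""
    level = "more" if sentiment > cutoff else "little"
    outcome = "success" if solved else "fail"
    return f"{level} sentiment with {outcome}"

# Assumed cutoff: the median sentiment over all rounds of all pairs.
cutoff = median(s for games in rounds.values() for s, _ in games)

# counts[category] = [number of following rounds, number of those that were solved]
counts = defaultdict(lambda: [0, 0])
for games in rounds.values():
    # Pair every round with the round that follows it for the same couple.
    for (sentiment, solved), (_, next_solved) in zip(games, games[1:]):
        category = categorise(sentiment, solved, cutoff)
        counts[category][0] += 1
        counts[category][1] += int(next_solved)

for category, (n, n_solved) in sorted(counts.items()):
    print(f"{category}: next round solved in {n_solved} of {n} cases")
```

The last round of each pair has no following round, so it contributes a category to nothing; with 12 rounds per pair that leaves 11 transitions per pair.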

Thanks for reading and thinking!
OK, it is starting to look like what I would refer to as balanced panel data (balanced meaning each participating couple has exactly the same number of data entries), and simple logistic regression doesn't sound appropriate. Is each maze completely different from the others, or is any maze repeated? Would you expect learning/improvement in either the style of communication or the chance of success as the game continues?
12 completely different mazes. There is learning as the game continues, because the pairs can develop a system to communicate what the maze looks like. Also, in the first and second games the participants are still trying to figure out what to do. But I mostly want to account for success by looking at the sentiment used in their sentences, and the question remains how to calculate this.
OK, you have clearly put a lot of work into the execution of this trial, and I would have thought you could obtain not only your thesis, but also a publication in a peer-reviewed social sciences journal. However, certainly before analysing for publication, you would need a statistician's involvement, as analysing this data well is not a simple task. If you can go down that route (and I would really encourage you to), everything you have written in answer to every question in this thread will be important information to take to the statistician as a single document/summary.
Most importantly:
  1. You have balanced panel data, with each panel member (a pair of participants randomly assigned to one another) containing 12 observations, each representing a round of a game. Each round has a binary outcome.
  2. Participating teams (of 2) learn on subsequent rounds of the game. This learning will affect the primary independent variable being tested (sentiment score), but there may also be learning relevant to succeeding in the game independent of the sentiment score.
  3. A potential interaction term between the total sentiment score and the number of turns per maze needs to be examined.
  4. You wish to look at whether the sentiment score in the previous attempt(s) affects the likelihood of success in the next game, not just the sentiment score for the round being played. This likely calls for some kind of accumulator.
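To make points 1 and 4 a little more concrete, here is a rough sketch of how the data might be laid out with one row per pair per round, including a lagged and an accumulated sentiment measure. The numbers, the field names and the choice of a turn-weighted running mean as the "accumulator" are all assumptions for illustration, not a recommendation for the final analysis.

```python
# Hypothetical data: for each pair, one (number of turns, total sentiment, solved?) entry per round.
games = {
    "pair_1": [(40, 10.0, False), (25, 10.0, True), (20, 5.0, True)],
}

def build_rows(pair_id, rounds):
    """Return one row per round with current, previous-round and accumulated sentiment."""
    rows = []
    cum_total, cum_turns = 0.0, 0   # running sums over all EARLIER rounds
    prev_mean = None                # mean sentiment of the round just before this one
    for round_no, (n_turns, total, solved) in enumerate(rounds, start=1):
        current_mean = total / n_turns
        rows.append({
            "pair": pair_id,
            "round": round_no,      # lets a model account for learning over rounds
            "n_turns": n_turns,
            "mean_sentiment": current_mean,
            "prev_mean_sentiment": prev_mean,
            # Turn-weighted mean over earlier rounds; undefined for the first round.
            "cum_mean_sentiment": cum_total / cum_turns if cum_turns else None,
            "solved": int(solved),
        })
        prev_mean = current_mean
        cum_total += total
        cum_turns += n_turns
    return rows

rows = build_rows("pair_1", games["pair_1"])
for row in rows:
    print(row)
```

The first round of every pair has no history, so its lagged and accumulated values are left empty rather than set to zero, since a zero would be read as "no sentiment so far".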
This is getting into quite complex statistics, which is beyond me (apologies if you feel I have wasted your time; initially I had thought I might be able to help, but the problem and the questions have become progressively more complex as they have been fleshed out). I would guess a statistician would use some kind of mixed-effects model. If you belong to a university, I find it hard to imagine they don't provide consulting statisticians for such purposes (via some kind of video-calling facility during lockdown).
Ah yeah, I was just wondering if there was an easy way to calculate the sentiment per pair given the varying number of turns, as that is mostly where I am stuck.
Thanks for your effort anyway!