The final version of Polymorphous Perversity includes a diagnosing system that ranks the player’s overall performance in the game on seven factors.

The diagnostic uses pooled data from a sample of players from the testing phase and compares the player’s performance to that sample on each of the seven factors. The final result is a percentile value: the percentage of players in the initial sample who scored lower than you. For example, if you scored 80 in the Homosexuality factor, your score was higher than that of 80% of the players in the sample.

Most modern psychological tests rank subjects by percentile values derived from a sample of test subjects obtained in validation studies. Polymorphous Perversity is NOT a psychological test, nor does it intend to have any scientific validity. The diagnosing system is just an extra feature of the game’s mechanics.

Diagnosing statistics

This is how the diagnosing system was constructed.

The game automatically records frequencies and other types of data from pretty much everything the player does. It records how often he uses certain items, certain moves, how often he has sex with specific NPCs, how often he explores certain options, etc.

In my second testing phase, I had a group of players play the game normally and send me the savefiles with all that data. I pooled all the data I got and started running statistics.

Initially, I wasn’t sure how I was going to score performance. I knew I wanted to have different factors, but I needed an objective way to determine what’s high and what’s low. As an example, let’s take the Homosexuality factor. There are a few moments in the game where you have to have sex with men, and many other moments where you can do it by choice. I obviously could not score someone as homosexual just because they had sex with men a few times. But what should I consider few, and what should I consider many?

My solution was, as mentioned above, gathering data from a sample of players, obtaining percentile values, and scoring by comparison to that sample.

My second decision was abandoning predefined factors/categories of performance. I could call “Homosexuality” every instance of the player having sex with a man, but in that there would be an underlying assumption that there is a common “drive” every time the player engages in homosexual activities in the game. This is not necessarily true (especially for a game), so I looked for ways to obtain factors/categories that made no prior assumptions, but were based on the data themselves.

What I did was run correlation tests between all 120+ performance variables recorded by the game. I used Spearman’s rank correlation, since I wasn’t confident the data met the assumptions of parametric tests. After I got a giant 120x120 table of correlations, I flagged every correlation with p < 0.01 as significant. After that, I manually grouped variables that showed significant correlations with each other.
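As a rough illustration of that flagging step, here is a minimal Python sketch. It is not the procedure I actually ran: the variable names and counts are made up, and a permutation test stands in for whatever p-value a statistics package would report for Spearman’s rho.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value estimated by shuffling one variable's ranks.

    Assumption: this replaces the analytic p-value of a stats package.
    """
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def significant_pairs(data, alpha=0.01):
    """Return (name_a, name_b, rho) for every pair with p < alpha."""
    flagged = []
    for a, b in combinations(sorted(data), 2):
        if permutation_p_value(data[a], data[b]) < alpha:
            flagged.append((a, b, round(spearman(data[a], data[b]), 2)))
    return flagged


# Hypothetical per-player counts for three recorded variables (10 players).
example_data = {
    "npc_a_encounters": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "npc_b_encounters": [2, 4, 5, 8, 11, 12, 15, 17, 18, 25],
    "item_x_uses": [5, 1, 9, 3, 7, 2, 10, 4, 8, 6],
}
flagged = significant_pairs(example_data)
print(flagged)
```

With this toy data, only the two variables that rise together should be flagged. With 120+ variables the same loop covers several thousand pairs, which is why the grouping afterwards had to be done by hand and eye.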

The next steps were a bit more subjective and arbitrary. After grouping correlating variables, I re-ran correlation tests to spot variables that correlated with many others within that group, and variables that correlated with only a few others. I also examined which correlations were high and which were low. Variables that didn’t correlate well within the group were eliminated.
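One way to picture that pruning is to count, for each variable, how many significant partners it has inside its group. The sketch below assumes the flagged pairs are already available; the variable names and the `min_partners` threshold are hypothetical, since in practice I judged this by eye rather than by a fixed rule.

```python
def within_group_degree(group, significant_pairs):
    """Count each variable's significant partners inside the group."""
    members = set(group)
    degree = {name: 0 for name in group}
    for a, b in significant_pairs:
        if a in members and b in members:
            degree[a] += 1
            degree[b] += 1
    return degree


def prune_group(group, significant_pairs, min_partners=2):
    """Keep variables with at least min_partners partners (single pass)."""
    degree = within_group_degree(group, significant_pairs)
    return [name for name in group if degree[name] >= min_partners]


# Hypothetical group and flagged pairs.
group = ["v1", "v2", "v3", "v4"]
pairs = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v3", "v4")]

degree = within_group_degree(group, pairs)
kept = prune_group(group, pairs)
print(degree, kept)
```

Here `v4` correlates with only one other variable in the group, so it is the one that gets dropped.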

This process left me with seven groups of 5-7 variables each. I then named each group, which was a purely subjective judgment. Sometimes it was easy: a group that included many instances of having sex with men and doing other actions that are generally considered homosexual was named Homosexuality. Other groups were harder to name, but I’m somewhat confident about the terms I chose. What’s interesting about this is that, even in groups that show a clear tendency, there are always one or two variables that show no clear thematic relation to the rest of the group. For example (and this is a fictitious example, since I don’t want to spoil anything), the Homosexuality group could include having sex with guy1, having sex with guy2, doing some other homosexual stuff, and eating ice cream. Now, it sounds strange to say that players who eat more ice cream would score higher on the Homosexuality factor. But that’s based on data showing that the players who engage in homosexual acts are the same ones who eat more ice cream. Intriguing data.

Having established the names of the factors and the variables grouped within them, I ran descriptive statistics for each variable to determine which ones were more frequent and which were less. Variables that were less frequent and/or represented actions that were less likely to happen in the game were given a higher weight. That is, doing something that required you to move a lot and spend more resources and time should weigh more than an action that could be done many times with less effort. The exact weight of each variable was arbitrarily determined.
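To make the weighting idea concrete, here is a small Python sketch. The variable names, counts, and weights are all invented for illustration; the real weights were picked by hand and are not reproduced here.

```python
from statistics import mean

# Hypothetical counts per tester for two variables in one factor.
sample_counts = {
    "common_action": [12, 9, 15, 11],
    "rare_action": [1, 0, 2, 1],
}

# Hypothetical hand-picked weights: the rarer, costlier action counts more.
weights = {"common_action": 1.0, "rare_action": 5.0}


def mean_frequency(counts_by_variable):
    """Average count per variable, used to see which actions are rare."""
    return {name: mean(counts) for name, counts in counts_by_variable.items()}


def factor_score(player_counts, variable_weights):
    """Weighted sum of a player's counts; missing variables count as 0."""
    return sum(
        weight * player_counts.get(name, 0)
        for name, weight in variable_weights.items()
    )


frequencies = mean_frequency(sample_counts)
score = factor_score({"common_action": 10, "rare_action": 2}, weights)
print(frequencies, score)
```

In this toy case, two rare actions contribute as much to the score as ten common ones.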

Having weighted each variable, I had a formula for each of the seven factors. The formula was run for all players from the testing sample, and the seven scores were obtained for each. That left me with a set of data to serve as reference for the percentile values. The scores for each factor were divided at the cutpoints for 20 equal groups, giving me the 5, 10, 15, ..., 95 percentile values. Those are the values to which the player’s performance is compared at the end of Polymorphous Perversity to generate the diagnosing percentiles.
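The cutpoint lookup can be sketched in Python as follows. This is only an approximation of the idea: the sample scores are made up, the interpolation used by `statistics.quantiles` may differ from the one in my statistics package, and how ties at a cutpoint are handled in the game is an assumption here.

```python
from bisect import bisect_left
from statistics import quantiles


def build_cutpoints(sample_scores):
    """Return the 19 cutpoints for the 5th, 10th, ..., 95th percentiles."""
    return quantiles(sample_scores, n=20)


def percentile_band(score, cutpoints):
    """Map a score to 0, 5, ..., 95 by counting the cutpoints below it."""
    return 5 * bisect_left(cutpoints, score)


# Hypothetical factor scores for a 23-player reference sample.
sample_scores = list(range(1, 24))
cutpoints = build_cutpoints(sample_scores)

for player_score in (0, 12.5, 100):
    print(player_score, percentile_band(player_score, cutpoints))
```

Because only the cutpoints are needed at the end of the game, the game can ship 19 numbers per factor instead of the testers’ raw data.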


Numerous arguments can be raised about the validity of my diagnosing system. I would be interested in discussing them, but like I said, Polymorphous Perversity is not a psychological test, and there is no scientific pretension behind it. Also, I’m not willing to change the game in any way after it’s released.

My greatest limitation was probably the number of testers. I was aiming for 100, which would represent a fair sample. I got almost 300 people testing the game, but for numerous reasons, only 23 sent me their savefiles. So I only got data from 23 players. Yes, not very representative.

A second problem would be the frequency-based scoring. In theory, a player who does everything in the game many times would score high on everything, which is not appropriate for diagnosis. A player who is very pragmatic and rushes through the game would score low on most factors. I expect this to be true only in extreme cases, though. In my sample there were players who explored more, acted more, did more things, and there were players who skipped parts of the game and rushed to the end. Players who explored more did not necessarily score higher in everything. The factors seem to discriminate between players well.

Lastly, there was a high amount of subjectivity in creating the factors and their formulas. I arbitrarily included variables, eliminated some, weighted them, and named the factors. By arbitrarily I mean subjectively, not randomly! A different person would have made different judgments on those aspects.

In conclusion, Polymorphous Perversity’s sexual diagnosing system is not perfect, but it is there. Interpret your scores as you wish.
