Gini

Written by Evispot. Posted in g, Glossary.

The Gini coefficient is a well-established measure of the inequality among the values of a frequency distribution, and it can also be used to measure the quality of a binary classifier. For a classifier that performs no worse than random guessing, Gini lies between 0 and 1. A Gini of 0 expresses perfect equality (or a totally useless classifier), while a Gini of 1 expresses maximal inequality (or a perfect classifier). The closer to 1 we get, the better the classifier separates the two classes. A Gini of exactly 1 should, however, be met with suspicion.

The Gini index is based on the Lorenz curve. The Lorenz curve plots the true positive rate (y-axis) as a function of percentiles of the population (x-axis).
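As a rough illustration of the curve described above, the sketch below ranks observations from the highest score to the lowest and records the share of true positives captured at each fraction of the population. The function name `lorenz_points` and the toy data are made up for this example, and tied scores are not treated specially.

```python
def lorenz_points(y_true, scores):
    """Return (population_fraction, true_positive_rate) pairs, highest scores first."""
    # Rank observations from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += y_true[i]  # count positives seen so far
        points.append((rank / len(order), captured / total_pos))
    return points


# Toy data (hypothetical): 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
curve = lorenz_points(y_true, scores)
```

A good classifier captures most positives in the first few percentiles, so its curve rises steeply at the left.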

The Gini is calculated from the AUC (the area under the ROC curve) by the following equation:

Gini = (2 × AUC) − 1
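A minimal sketch of this calculation, assuming AUC means the area under the ROC curve. Here the AUC is obtained by pairwise comparison: the fraction of (positive, negative) pairs in which the positive observation receives the higher score, with ties counted as half. The function names and the toy data are illustrative only.

```python
def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini = (2 x AUC) - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


# Toy data (hypothetical): 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
example_gini = gini(y_true, scores)
```

The pairwise loop is quadratic in the number of observations, which is fine for a small example; for large data sets a rank-based computation is usually preferred.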

The Lorenz curve represents the family of models that the classifier can produce, one for each probability threshold. The location on the curve is given by the probability threshold of a particular model. Lower probability thresholds for classification typically lead to more true positives, but also to more false positives.
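The effect of the threshold can be sketched as follows. The helper `rates_at_threshold` and the toy data are hypothetical; an observation is classified as positive when its score is at or above the threshold.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (true_positive_rate, false_positive_rate) for one threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg


# Toy data (hypothetical): 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]

# Each threshold corresponds to one model, i.e. one point on the curve.
for threshold in (0.85, 0.5, 0.2):
    tpr, fpr = rates_at_threshold(y_true, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

On this toy data, lowering the threshold from 0.85 to 0.2 raises the true positive rate, and the false positive rate rises with it.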

The Gini index itself is independent of which of these models (that is, which threshold) is chosen. It depends only on the Lorenz curve, which is determined by the distribution of the scores (or probabilities) obtained from the classifier.

Tip: The higher the Gini, the better the classifier. However, if you get a Gini of 1, you should be very suspicious: most likely a variable exists in the data set that is equivalent to your target variable (i.e., target leakage).
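One possible way to look for such a variable is to compute the Gini of each input variable on its own against the target and flag any that come suspiciously close to 1. The sketch below is only an illustration: the feature names, the data, and the 0.99 cutoff are all hypothetical, and the absolute value is used so that a variable that perfectly predicts the opposite class is flagged as well.

```python
def feature_gini(y_true, values):
    """Gini of a single variable used directly as a score (pairwise AUC)."""
    pos = [v for y, v in zip(y_true, values) if y == 1]
    neg = [v for y, v in zip(y_true, values) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (len(pos) * len(neg))
    return 2.0 * auc - 1.0


def suspect_leaks(features, y_true, cutoff=0.99):
    """Names of variables whose individual Gini is at or above the cutoff."""
    return [
        name
        for name, values in features.items()
        if abs(feature_gini(y_true, values)) >= cutoff
    ]


# Hypothetical data: "days_overdue" is only known after the outcome occurred.
y_true = [1, 0, 1, 0, 0, 1]
features = {
    "income": [52, 61, 48, 55, 40, 70],
    "days_overdue": [30, 0, 45, 0, 0, 60],
}
suspects = suspect_leaks(features, y_true)
```

A flagged variable is not proof of leakage, but it is a good place to start checking how and when that variable was recorded.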
