Allow use of sample weights for dealing with imbalanced data #32

### Description

I'm trying to plot the reliability diagram for data that has been sampled.
My dataset is very imbalanced: rows with positive targets make up less than 1% of the unsampled data. A common approach to dealing with an imbalanced dataset is random undersampling of the negative targets, so each remaining negative row actually "represents" 20x as many rows.
When I plot the reliability diagram, my probabilities are all way off. This is to be expected, since the diagram does not account for the sampling: the probabilities computed on the sampled data do not reflect the true probabilities I want my calibration to output. Sample weights can fix this issue (most of sklearn's models support the `sample_weight` parameter).
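To make the effect of the sampling concrete, here is a minimal sketch of how such weights could be built. The helper name `undersampling_weights` and the keep fraction of 0.05 are assumptions for illustration (0.05 corresponds to each kept negative standing in for 20 rows); the toy data is made up.

```python
import numpy as np

def undersampling_weights(y, negative_keep_fraction):
    """Hypothetical helper: weight each kept negative by 1 / keep fraction.

    Positives were not undersampled, so they keep weight 1.
    """
    y = np.asarray(y)
    return np.where(y == 1, 1.0, 1.0 / negative_keep_fraction)

# Toy sample: 10 positives and 10 negatives left after undersampling.
y = np.array([1] * 10 + [0] * 10)
weights = undersampling_weights(y, negative_keep_fraction=0.05)

# Positive rate as seen in the sampled data.
unweighted_rate = y.mean()
# Weighted estimate of the positive rate in the unsampled data.
weighted_rate = np.average(y, weights=weights)
```

In this toy case the sampled data looks balanced, while the weighted rate is 10 / (10 + 10 * 20), which is much closer to what the unsampled data would show.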

### What I Did

I fixed this issue by changing a few lines in the `plot_reliability_diagram` function.
I added an optional parameter ```weights=None``` and used it when computing the per-bin statistics:
```
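    # Default to uniform weights so the unweighted behaviour is unchanged.
    if weights is None:
        weights = np.ones_like(x, dtype=float)

    # Per bin: weighted mean of y, total weight, weighted mean of x.
    mean_count_array = np.array([[np.average(y[digitized_x == i], weights=weights[digitized_x == i]),
                                  np.sum(weights[digitized_x == i]),
                                  np.average(x[digitized_x == i], weights=weights[digitized_x == i])]
                                  for i in np.unique(digitized_x)])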
```
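For anyone who wants to try the idea outside the library, below is a self-contained sketch of the same per-bin computation. It is not the library's actual code: the function name `weighted_reliability_bins`, the `n_bins` parameter, and the equal-width binning via `np.digitize` are my assumptions about how `digitized_x` could be produced.

```python
import numpy as np

def weighted_reliability_bins(y, x, weights=None, n_bins=10):
    """Hypothetical standalone version of the weighted per-bin statistics.

    y: binary targets, x: predicted probabilities in [0, 1].
    Returns one row per non-empty bin:
    [weighted mean of y, total weight, weighted mean of x].
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if weights is None:
        weights = np.ones_like(x)
    weights = np.asarray(weights, dtype=float)

    # Interior edges of equal-width bins (an assumption, not the original binning).
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    digitized_x = np.digitize(x, edges)

    rows = []
    for i in np.unique(digitized_x):
        mask = digitized_x == i
        rows.append([np.average(y[mask], weights=weights[mask]),
                     weights[mask].sum(),
                     np.average(x[mask], weights=weights[mask])])
    return np.array(rows)

# Toy usage: one heavily weighted negative in the low-probability bin.
bins = weighted_reliability_bins(y=[0, 1, 1, 1],
                                 x=[0.1, 0.1, 0.9, 0.9],
                                 weights=[20, 1, 1, 1],
                                 n_bins=2)
```

Note that `np.average` raises `ZeroDivisionError` if all weights in a bin sum to zero, so a real implementation would probably want to guard against that case.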

