
Commit 0af2b84

Merge pull request #11 from UBC-MDS/cor_map
Updated README and added cor_map docstring
2 parents 67af6c6 + 744123f commit 0af2b84

2 files changed, 47 additions and 5 deletions

README.md

Lines changed: 19 additions & 5 deletions
````diff
@@ -2,25 +2,39 @@
 
 ![](https://github.com/chuangw46/eda_utils_py/workflows/build/badge.svg) [![codecov](https://codecov.io/gh/chuangw46/eda_utils_py/branch/main/graph/badge.svg)](https://codecov.io/gh/chuangw46/eda_utils_py) ![Release](https://github.com/chuangw46/eda_utils_py/workflows/Release/badge.svg) [![Documentation Status](https://readthedocs.org/projects/eda_utils_py/badge/?version=latest)](https://eda_utils_py.readthedocs.io/en/latest/?badge=latest)
 
-Python package that contains util functions for eda process
+## Overview
+
+As data rarely comes ready to be analyzed or used for machine learning right away, this package aims to speed up data cleaning and initial exploratory data analysis (EDA). It focuses on handling outliers and missing values, scaling, and correlation visualization.
 
 ## Installation
 
 ```bash
 $ pip install -i https://test.pypi.org/simple/ eda_utils_py
 ```
 
-## Features
+## Functions
+
+The four functions contained in this package are as follows:
+- Function 1: A function to identify and impute missing values
+- Function 2: A function to identify and deal with outliers
+- Function 3: A function to scale numerical values in the dataset
+- `cor_map`: A function to plot a correlation matrix of numeric columns in the dataframe
+
+
+## Our Place in the Python Ecosystem
+
+While Python packages with similar functionality exist, this package aims to reduce the amount of code needed to produce these outputs. Packages with similar functionality include:
 
-- TODO
+- [Sklearn.preprocessing](https://scikit-learn.org/stable/modules/preprocessing.html)
+- [Altair Heatmap](https://altair-viz.github.io/gallery/layered_heatmap_text.html)
 
 ## Dependencies
 
-- TODO
+- TBD
 
 ## Usage
 
-- TODO
+- TBD
 
 ## Documentation
 
````
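
The README describes Functions 1 to 3 in one line each and does not name them. One plausible reading in pandas is sketched below, assuming mean/median imputation, clipping at 1.5 × IQR, and z-score standardization. The names `impute_missing`, `clip_outliers` and `scale_numeric` are hypothetical and are not part of `eda_utils_py`.

```python
# Hypothetical sketches of README "Function 1-3"; not the eda_utils_py implementation.
from typing import Sequence

import pandas as pd


def impute_missing(df: pd.DataFrame, cols: Sequence[str], strategy: str = "mean") -> pd.DataFrame:
    """Hypothetical 'Function 1': fill NaNs in `cols` with each column's mean or median."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    out = df.copy()  # work on a copy so the caller's frame is untouched
    for col in cols:
        fill = out[col].mean() if strategy == "mean" else out[col].median()  # NaNs are skipped
        out[col] = out[col].fillna(fill)
    return out


def clip_outliers(df: pd.DataFrame, cols: Sequence[str], k: float = 1.5) -> pd.DataFrame:
    """Hypothetical 'Function 2': clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    out = df.copy()
    for col in cols:
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def scale_numeric(df: pd.DataFrame, cols: Sequence[str]) -> pd.DataFrame:
    """Hypothetical 'Function 3': standardize `cols` to mean 0 and standard deviation 1."""
    out = df.copy()
    for col in cols:
        # pandas .std() is the sample standard deviation (ddof=1);
        # a constant column has std 0 and comes out as NaN (0 / 0).
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out


df = pd.DataFrame({"a": [1.0, None, 3.0, 100.0], "b": [2.0, 4.0, 6.0, 8.0]})
clean = scale_numeric(clip_outliers(impute_missing(df, ["a"]), ["a"]), ["a", "b"])
```

Each helper returns a new frame rather than mutating its input, so the steps chain as in the last line; this commit does not say which convention the package itself will use.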

eda_utils_py/eda_utils_py.py

Lines changed: 28 additions & 0 deletions
```python
def cor_map(dataframe, num_col):
    """
    A function to plot a correlation heatmap, with coefficient labels, for the given numeric columns of a data frame.

    Args:
        dataframe (pandas.DataFrame): The data frame to be used for EDA.
        num_col (list): A list of strings naming numeric columns in the data frame.

    Returns:
        (altair): A correlation heatmap with correlation coefficient labels, based on the numeric columns specified by the user.

    Examples:
        import pandas as pd
        from eda_utils_py import cor_map

        data = pd.DataFrame({
            'SepalLengthCm': [5.1, 4.9, 4.7],
            'SepalWidthCm': [1.4, 1.4, 1.3],
            'PetalWidthCm': [0.2, 0.2, 0.2],
            'Species': ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor']
        })

        numerical_columns = ['SepalLengthCm', 'SepalWidthCm', 'PetalWidthCm']

        cor_map(data, numerical_columns)
    """
```
