Commit c55ffcc

Merge pull request #13 from wangjc640/main
update docstring and readme
2 parents 0af2b84 + ed9e654 commit c55ffcc

2 files changed: +37 -1 lines changed

README.md

Lines changed: 8 additions & 1 deletion
@@ -33,8 +33,15 @@ While Python packages with similar functionalities exist, this package aims to s
 - TBD
 
 ## Usage
+The eda_utils_py package helps you perform exploratory data analysis.
 
-- TBD
+eda_utils_py includes multiple custom functions for initial exploratory analysis of any input data, describing the structure of the data and the relationships present in it. The output is available in both object and graphical form.
+
+eda_utils_py can:
+- Diagnose data quality: resolve skewed data by identifying missing data and outliers and providing a corresponding remedy.
+- Discover data: plot a correlation matrix to help you explore and understand the data and find scenarios for analysis.
+- Prepare for machine learning: perform column transformations and derive a scaler automatically to meet further machine learning needs.
+
 
 ## Documentation
 
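
Review note: the three capabilities listed in the new Usage text can be illustrated with plain pandas. The sketch below is a minimal illustration under stated assumptions; it does not call eda_utils_py (its API is still marked TBD in this commit), and the sample data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; not taken from the package or its tests.
data = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, None, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.6],
    "PetalWidthCm": [0.2, 0.2, 0.3, 0.4],
})

# Diagnose data quality: count missing values per column.
missing_counts = data.isna().sum()

# Discover data: pairwise Pearson correlation between numeric columns.
correlation = data.corr()

# Machine learning preparation: fill gaps with the column mean,
# then standardize each column to zero mean and unit variance.
filled = data.fillna(data.mean())
scaled = (filled - filled.mean()) / filled.std()

print(missing_counts)
print(correlation.round(2))
print(scaled.round(2))
```

Mean imputation and z-score scaling are only one possible choice here; the README does not say which remedies or scalers the package applies.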

eda_utils_py/eda_utils_py.py

Lines changed: 29 additions & 0 deletions
@@ -26,3 +26,32 @@ def cor_map(dataframe, num_col):
     cor_map(data, numerical_columns)
 
     """
+
+
+def outlier_identifier(dataframe, columns=None, method="somefunction"):
+    """
+    Identify and handle outliers using the method the user chooses.
+
+    Key arguments:
+    dataframe [pandas.DataFrame]:
+        The target dataframe on which the function is performed.
+    columns [list] : None
+        The target columns on which the function is performed. Default is None, in which case the function checks all columns.
+    method [string] : "somefunction"
+        The method of dealing with outliers.
+
+    Returns:
+    dataframe :
+        The dataframe whose outliers have been processed by the chosen method.
+
+    Examples:
+    data = pd.DataFrame({
+        'SepalLengthCm': [5.1, 4.9, 4.7],
+        'SepalWidthCm': [1.4, 1.4, 9999999.99],
+        'PetalWidthCm': [0.2, 0.2, 0.2],
+        'Species': ['Iris-setosa', 'Iris-virginica', 'Iris-virginica']
+    })
+
+    outlier_identifier(data)
+
+    """
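
Review note: the docstring leaves `method` as the placeholder "somefunction", so the actual outlier strategy is not defined in this commit. As one possible direction, the sketch below flags outliers with the common 1.5 x IQR rule. The function name `flag_outliers_iqr`, the parameter `k`, and the sample data are hypothetical and not part of eda_utils_py.

```python
import pandas as pd


def flag_outliers_iqr(dataframe, columns=None, k=1.5):
    """Return a boolean DataFrame marking values outside
    [Q1 - k * IQR, Q3 + k * IQR] for each selected column.

    Hypothetical helper for illustration; not the package's implementation.
    """
    if columns is None:
        # Assumption: "all columns" means all numeric columns.
        columns = dataframe.select_dtypes(include="number").columns
    subset = dataframe[list(columns)]
    q1 = subset.quantile(0.25)
    q3 = subset.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame columns.
    return (subset < q1 - k * iqr) | (subset > q3 + k * iqr)


# Hypothetical data with one obviously extreme value.
data = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 5.0, 5.4, 4.6],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6, 9999999.99],
})

flags = flag_outliers_iqr(data)
print(flags)
```

The sample uses six rows rather than the three in the docstring example because quantile-based bounds are unreliable on very small samples: with only three values, the extreme value itself inflates the IQR and would not be flagged.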
