Fill in missing values in a dataframe by imputation.

Parameters
----------
df : pandas.core.frame.DataFrame
    A dataframe that might contain missing data.
strategy : str, default="mean"
    The imputation strategy.

    - If "mean", replace missing values using the mean along each column. Can only be used with numeric data.
    - If "median", replace missing values using the median along each column. Can only be used with numeric data.
    - If "most_frequent", replace missing values using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is used.
    - If "constant", replace missing values with fill_value. Can be used with strings or numeric data.

fill_value : numerical value, default=None
    When strategy == "constant", fill_value is used to replace all missing values. If left to the default, fill_value will be 0 when imputing numerical data.

Returns
-------
pandas.core.frame.DataFrame
    A dataframe that contains no missing data.

Examples
---------
>>> import pandas as pd
>>> from eda_utils_py import imputer

>>> data = pd.DataFrame({
...     'SepalLengthCm': [5.1, 4.9, 4.7],
...     'SepalWidthCm': [1.4, 1.4, 1.3],
...     'PetalWidthCm': [0.2, None, 0.2]
... })

>>> imputer(data)
   SepalLengthCm  SepalWidthCm  PetalWidthCm
0            5.1           1.4           0.2
1            4.9           1.4           0.2
2            4.7           1.3           0.2
"""
# Tests whether input data is of pd.DataFrame type
if not isinstance(df, pd.DataFrame):
    raise TypeError("The input dataframe must be of pd.DataFrame type")

# Tests whether input strategy is of type str
if not isinstance(strategy, str):
    raise TypeError("strategy must be of type str")

# Tests whether input fill_value is of type numbers or None
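The four strategies documented in the docstring map directly onto pandas column reductions. Below is a minimal sketch under that assumption, using `DataFrame.mean`, `DataFrame.median`, `DataFrame.mode` and `DataFrame.fillna`. The function name `impute_sketch` is hypothetical, and this is not eda_utils_py's actual implementation.

```python
import pandas as pd


def impute_sketch(df, strategy="mean", fill_value=None):
    """Hypothetical sketch of the documented strategies; not eda_utils_py's code."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("The input dataframe must be of pd.DataFrame type")
    if strategy not in ("mean", "median", "most_frequent", "constant"):
        raise ValueError(f"Unknown strategy: {strategy!r}")

    if strategy == "mean":
        # Per-column means, skipping NaN (numeric columns only, as documented)
        fills = df.mean()
    elif strategy == "median":
        # Per-column medians, skipping NaN (numeric columns only)
        fills = df.median()
    elif strategy == "most_frequent":
        # DataFrame.mode() lists each column's modes in sorted order,
        # so row 0 holds the smallest value when several are tied
        fills = df.mode().iloc[0]
    else:
        # "constant": fall back to the documented default of 0
        fills = 0 if fill_value is None else fill_value

    # fillna takes a scalar, or a Series keyed by column name
    return df.fillna(fills)


data = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7],
    "SepalWidthCm": [1.4, 1.4, 1.3],
    "PetalWidthCm": [0.2, None, 0.2],
})
print(impute_sketch(data))
```

With the docstring's example data, the mean strategy fills the missing `PetalWidthCm` entry with 0.2, the mean of the two observed values. Passing a per-column Series to `fillna` keeps each column's statistic separate. If a fit/transform interface is preferred, scikit-learn's `SimpleImputer` offers the same four strategy names.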
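The final comment in the function body describes a check that `fill_value` is a number or None. The sketch below shows one way such a check could be written. The helper name `check_fill_value` is hypothetical. Treating "numerical value" as `numbers.Real` and rejecting `bool` are assumptions, not necessarily what the package does.

```python
import numbers


def check_fill_value(fill_value):
    """Hypothetical helper: raise TypeError unless fill_value is a real number or None."""
    if fill_value is None:
        return
    # bool is a subclass of int, so reject it explicitly before the Real check
    if isinstance(fill_value, bool) or not isinstance(fill_value, numbers.Real):
        raise TypeError("fill_value must be a numerical value or None")


check_fill_value(None)  # accepted
check_fill_value(0)     # accepted
check_fill_value(2.5)   # accepted
```

NumPy registers its floating and integer scalar types with `numbers.Real`, so values such as `numpy.float64(0.5)` also pass. Strings, lists and complex numbers raise `TypeError`.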