Question regarding NaN value handling and filtering order in preprocessing #2

@slingyu918

Hello,
I am studying the AIOSA project and the associated paper "AIOSA: An Approach to the automatic identification of obstructive sleep apnea events based on deep learning." I have a question regarding the details of data preprocessing, specifically concerning the handling of NaN values in signals and the order of filtering operations.
In Section 3.2 (Data preprocessing) of the paper, the following is stated for the Stroke Unit dataset:
- A Butterworth bandpass filter was applied to the ECG waveform signals.
- In a subsequent step, missing values were addressed by removing instances where the ECG or SpO2 signals had more than 50% NaN values; the remaining NaN values were then replaced with -1.
This sequence suggests that filtering was performed before NaN values were replaced with -1.
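To make my reading of that missing-value step concrete, here is a minimal sketch. It assumes that an "instance" is one fixed-length window and that the 50% threshold is applied per window; the function name `handle_missing` and the toy values are hypothetical and are not taken from the paper or the repository.

```python
import numpy as np

def handle_missing(windows, max_nan_fraction=0.5, fill_value=-1.0):
    """Drop windows with too many NaNs, then fill the remaining NaNs.

    windows: 2-D array of shape (n_windows, n_samples).
    The threshold and fill value follow my reading of Section 3.2.
    """
    windows = np.asarray(windows, dtype=float)
    # Fraction of missing samples in each window.
    nan_fraction = np.isnan(windows).mean(axis=1)
    # Keep only windows at or below the allowed fraction.
    kept = windows[nan_fraction <= max_nan_fraction]
    # Replace whatever NaNs are left with the sentinel value.
    return np.where(np.isnan(kept), fill_value, kept)

# Toy data: the first window is 25% NaN, the second is 75% NaN.
windows = np.array([
    [1.0, np.nan, 3.0, 4.0],
    [np.nan, np.nan, np.nan, 4.0],
])
cleaned = handle_missing(windows)
```

If this reading is right, the -1 replacement only happens after windowing, which is why the position of the filtering step matters.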
However, in the preprocess_dataset.ipynb file:
- Cell 4, which generates patient_map_features, concatenates raw data segments (which may contain NaN values) using np.concatenate(temp[col].values). This implies that NaN values are preserved in the resulting continuous time series.
- Cell 5, which creates windowed data based on patient_map_features, also shows no explicit NaN filling or filtering steps.
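As a small illustration of the first point, the sketch below concatenates two toy segments the way I understand Cell 4 to do. The segment values are made up, and I am assuming that temp[col].values holds one array per segment.

```python
import numpy as np

# Hypothetical per-segment arrays standing in for temp[col].values.
segments = [
    np.array([0.1, 0.2, np.nan]),
    np.array([np.nan, 0.5]),
]

# np.concatenate only joins the arrays; it does not drop or fill NaNs.
signal = np.concatenate(segments)
nan_count = int(np.isnan(signal).sum())
```

Both missing samples are still present in `signal` after concatenation.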
My questions are:
1. If NaN values are present in the ECG signals, applying a bandpass filter directly would typically lead to more NaNs or inaccurate results in the filtered output. Was there any form of temporary handling for NaNs (e.g., interpolation) in the ECG signals before the filtering step described in the paper? Or does the Butterworth filter implementation used have specific robust handling for NaNs?
2. How does the waveform_data generated in preprocess_dataset.ipynb (which may contain NaNs) align with the paper's workflow of filtering first and then replacing NaNs with -1? At what stage, and how, is the filtering operation applied to raw signals that might initially contain NaNs?
I've encountered issues with NaN propagation when trying to filter signals containing NaNs using libraries like MNE. Therefore, I would appreciate clarification on how this process was specifically handled in this project to better understand and reproduce the results.
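For reference, here is a minimal sketch of the behaviour I mean, using SciPy instead of MNE. The sampling rate, band edges, filter order, and synthetic signal are placeholders and not values from the paper. The interpolation step is only one possible workaround that I am guessing at, not something I found in the repository.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 250.0  # assumed sampling rate in Hz (placeholder)
t = np.arange(0, 4.0, 1.0 / fs)
ecg = np.sin(2 * np.pi * 5.0 * t)  # synthetic stand-in for an ECG trace
ecg[300:310] = np.nan              # a short gap of 10 missing samples

# Butterworth bandpass as second-order sections (placeholder band edges).
sos = butter(4, [0.5, 40.0], btype="bandpass", fs=fs, output="sos")

# Naive filtering: the recursive filter carries NaN forward (and, with
# zero-phase filtering, backward), so the gap spreads far beyond 10 samples.
naive = sosfiltfilt(sos, ecg)

# Possible workaround: linearly interpolate the gap, filter, then restore
# the NaNs so that later steps can still count or replace them.
mask = np.isnan(ecg)
filled = ecg.copy()
filled[mask] = np.interp(t[mask], t[~mask], ecg[~mask])
filtered = sosfiltfilt(sos, filled)
filtered[mask] = np.nan

print("NaNs in input:          ", int(mask.sum()))
print("NaNs after naive filter:", int(np.isnan(naive).sum()))
print("NaNs after workaround:  ", int(np.isnan(filtered).sum()))
```

With the workaround, the number of NaNs after filtering matches the input, so the 50% check and the -1 replacement could still be applied afterwards. I would like to know whether the project did something along these lines or something different.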
Thank you for your time and assistance!
