@mraves2 (Contributor) commented Jan 16, 2026

This feature adds extra QC information from the DIMS pipeline to the final email, so that the user can assess the quality of the run at a glance.
Several pipeline steps, notably AverageTechReplicates and GenerateQCOutput, generate additional txt files, which are included as content in DIMS.nf.

@BasMonkey (Contributor) left a comment

I've made some remarks about performance and code duplication. Please see the comments left on the code.

file = paste(outdir, "sample_names_nodata.txt", sep = "/"),
row.names = FALSE, col.names = FALSE, quote = FALSE
)
if (!is.null(sample_names_nodata)) {
Contributor

This condition check seems redundant.

Contributor Author

Replaced the second if statement with an else branch.

Comment on lines 61 to 62
for (sample_name in sample_names_nodata) {
  repl_pattern[[sample_name]] <- NULL
Contributor

Right now, this for loop will also run for the sample "none" when there is no data, which is not very elegant. I would suggest leaving the value empty instead.

Contributor Author

Solved by adding an else branch to the if statement.
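For illustration, a minimal sketch of the "leave the value empty" suggestion. It assumes `sample_names_nodata` is an empty character vector when every sample has data, and it uses made-up sample names; the helper `drop_samples` is hypothetical and not part of the PR. With an empty vector, the assignment below is simply a no-op, so no placeholder such as "none" and no loop are needed.

```r
# Toy data: hypothetical sample names and replicate patterns.
repl_pattern <- list(
  sample_A = c("rep1", "rep2"),
  sample_B = c("rep3", "rep4"),
  sample_C = c("rep5", "rep6")
)

# Hypothetical helper: drop every sample without data in one assignment.
drop_samples <- function(repl_pattern, sample_names_nodata) {
  repl_pattern[sample_names_nodata] <- NULL
  repl_pattern
}

# One sample without data: that list element is removed.
with_missing <- drop_samples(repl_pattern, c("sample_B"))

# No samples without data: the list is returned unchanged.
none_missing <- drop_samples(repl_pattern, character(0))

print(names(with_missing))
print(names(none_missing))
```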

Comment on lines +204 to +242
if (dims_matrix == "Plasma") {
  # pos
  for (line_index in seq_len(nrow(is_pos_selection_subset))) {
    is_selected <- is_pos_selection_subset$HMDB_name[line_index]
    thresh_selected <- all_is_thresholds$plasma$pos[which(all_is_thresholds$names$pos == is_selected)]
    if (is_pos_selection_subset$Intensity[line_index] < thresh_selected) {
      is_below_threshold <- rbind(is_below_threshold, is_pos_selection_subset[line_index, ])
      scanmode_is <- c(scanmode_is, "pos")
    }
  }
  # neg
  for (line_index in seq_len(nrow(is_neg_selection_subset))) {
    is_selected <- is_neg_selection_subset$HMDB_name[line_index]
    thresh_selected <- all_is_thresholds$plasma$neg[which(all_is_thresholds$names$neg == is_selected)]
    if (is_neg_selection_subset$Intensity[line_index] < thresh_selected) {
      is_below_threshold <- rbind(is_below_threshold, is_neg_selection_subset[line_index, ])
      scanmode_is <- c(scanmode_is, "neg")
    }
  }
} else if (dims_matrix == "DBS") {
  # pos
  for (line_index in seq_len(nrow(is_pos_selection_subset))) {
    is_selected <- is_pos_selection_subset$HMDB_name[line_index]
    thresh_selected <- all_is_thresholds$dbs$pos[which(all_is_thresholds$names$pos == is_selected)]
    if (is_pos_selection_subset$Intensity[line_index] < thresh_selected) {
      is_below_threshold <- rbind(is_below_threshold, is_pos_selection_subset[line_index, ])
      scanmode_is <- c(scanmode_is, "pos")
    }
  }
  # neg
  for (line_index in seq_len(nrow(is_neg_selection_subset))) {
    is_selected <- is_neg_selection_subset$HMDB_name[line_index]
    thresh_selected <- all_is_thresholds$dbs$neg[which(all_is_thresholds$names$neg == is_selected)]
    if (is_neg_selection_subset$Intensity[line_index] < thresh_selected) {
      is_below_threshold <- rbind(is_below_threshold, is_neg_selection_subset[line_index, ])
      scanmode_is <- c(scanmode_is, "neg")
    }
  }
}
Contributor

This code is almost identical in both branches, apart from the name change. I believe there is a more efficient way to achieve the same result, for example by moving the for loop into a function and passing the type as an argument.
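As a rough sketch of that suggestion: the loop body can live in one helper that receives the matrix type and scan mode as arguments. The helper name `get_is_below_threshold`, the lookup vector `matrix_keys`, and all data values below are hypothetical. The loop body is kept as in the snippet above, except that the scan mode is stored as a column rather than in a separate `scanmode_is` vector, which is a design choice made only for this example.

```r
# Toy stand-ins for the objects in the snippet above; all values are made up.
all_is_thresholds <- list(
  names  = list(pos = c("IS_1", "IS_2"), neg = c("IS_3")),
  plasma = list(pos = c(100, 200), neg = c(50)),
  dbs    = list(pos = c(10, 20), neg = c(5))
)
is_pos_selection_subset <- data.frame(
  HMDB_name = c("IS_1", "IS_2"),
  Intensity = c(150, 150)
)
is_neg_selection_subset <- data.frame(HMDB_name = "IS_3", Intensity = 40)

# Hypothetical helper: rows of one scan mode that fall below their threshold.
get_is_below_threshold <- function(is_subset, thresholds, matrix_key, scanmode) {
  below <- data.frame()
  for (line_index in seq_len(nrow(is_subset))) {
    is_selected <- is_subset$HMDB_name[line_index]
    thresh_selected <- thresholds[[matrix_key]][[scanmode]][
      which(thresholds$names[[scanmode]] == is_selected)
    ]
    if (is_subset$Intensity[line_index] < thresh_selected) {
      below <- rbind(below, cbind(is_subset[line_index, ], scanmode = scanmode))
    }
  }
  below
}

# Map the matrix label to the key used in the thresholds list.
# Note: unlike the if / else if chain, an unknown label raises an error here.
matrix_keys <- c(Plasma = "plasma", DBS = "dbs")
dims_matrix <- "Plasma"
matrix_key <- matrix_keys[[dims_matrix]]

is_below_threshold <- rbind(
  get_is_below_threshold(is_pos_selection_subset, all_is_thresholds, matrix_key, "pos"),
  get_is_below_threshold(is_neg_selection_subset, all_is_thresholds, matrix_key, "neg")
)
print(is_below_threshold)
```

The four near-identical loops collapse into two calls, and supporting another matrix type would only need a new entry in `matrix_keys`.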

# pos
for (line_index in seq_len(nrow(is_pos_selection_subset))) {
  is_selected <- is_pos_selection_subset$HMDB_name[line_index]
  thresh_selected <- all_is_thresholds$plasma$pos[which(all_is_thresholds$names$pos == is_selected)]
Contributor

which() performs a linear search for every row. A more efficient approach is to use match(), which can look up all rows in a single vectorized call.
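A minimal sketch of the match() approach, using made-up names and values in place of the real thresholds. The variable `threshold_index` is introduced only for this example.

```r
# Toy stand-ins; names and values are hypothetical.
all_is_thresholds <- list(
  names  = list(pos = c("IS_1", "IS_2", "IS_3")),
  plasma = list(pos = c(100, 200, 300))
)
is_pos_selection_subset <- data.frame(
  HMDB_name = c("IS_3", "IS_1", "IS_2"),
  Intensity = c(250, 150, 250)
)

# One vectorized lookup: position of each HMDB_name in the threshold names.
threshold_index <- match(
  is_pos_selection_subset$HMDB_name,
  all_is_thresholds$names$pos
)
thresh_selected <- all_is_thresholds$plasma$pos[threshold_index]

# Element-wise comparison replaces the per-row if statement.
below <- is_pos_selection_subset$Intensity < thresh_selected
print(is_pos_selection_subset[below, ])
```

Two differences from which() are worth keeping in mind: match() returns only the first matching position, and it returns NA for names that are not found, so the comparison yields NA for those rows instead of failing inside an if statement.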

is_selected <- is_pos_selection_subset$HMDB_name[line_index]
thresh_selected <- all_is_thresholds$plasma$pos[which(all_is_thresholds$names$pos == is_selected)]
if (is_pos_selection_subset$Intensity[line_index] < thresh_selected) {
  is_below_threshold <- rbind(is_below_threshold, is_pos_selection_subset[line_index, ])
Contributor

Avoid rbind() in a loop: it repeatedly reallocates and copies the data frame, which is inefficient and can use a large amount of RAM for larger datasets. Consider collecting indices or rows first and binding once at the end.
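One possible shape for this, sketched with made-up data: record which rows qualify in a preallocated logical vector, then subset the data frame once after the loop. The vector `thresholds_pos` is hypothetical and assumed to be already aligned with the rows.

```r
# Toy stand-ins; names and values are hypothetical.
is_pos_selection_subset <- data.frame(
  HMDB_name = c("IS_1", "IS_2", "IS_3"),
  Intensity = c(150, 150, 50)
)
thresholds_pos <- c(100, 200, 300)  # assumed to be aligned with the rows above

# Preallocate a logical vector instead of growing a data frame row by row.
is_below <- logical(nrow(is_pos_selection_subset))
for (line_index in seq_len(nrow(is_pos_selection_subset))) {
  is_below[line_index] <-
    is_pos_selection_subset$Intensity[line_index] < thresholds_pos[line_index]
}

# Subset once at the end; the scan mode labels follow from the row count.
is_below_threshold <- is_pos_selection_subset[is_below, ]
scanmode_is <- rep("pos", sum(is_below))
print(is_below_threshold)
```

The data frame is copied once rather than once per matching row. Results for pos and neg could then be combined with a single rbind() call.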

3 participants