Description
I am rather confused about how MSPC handles the p-values in the input BED files, particularly files from MACS, where the score column is actually -log10(q-value)*10, as specified in their docs (emphasis mine):
- score - Indicates how dark the peak will be displayed in the browser (0-1000). Thus, it’s for the purpose of displaying on genome browser. In MACS3 callpeak output, we use the -log10qvalue*10. However, it may happen when the value in this column goes above 1000, and cause trouble while loading it in genome browsers. In this case, use the following awk command to fix: awk -F'\t' '{ if ($5 > 1000) $5=1000; OFS="\t"; print }' peak.narrowPeak
I don't think this is new in MACS3; I believe it has been the default for a while now (perhaps always).
I have looked at the parser configuration options, but the parser appears to expect something different from what MACS provides.
In typical MACS narrowPeak output, the -log10(p-value) is in the 8th column, not the score column. Would it be possible to expose the parser argument(s) as direct parameters in the rmspc R package rather than requiring a JSON file? That would make things simpler. Maybe it could just take a named list?
The vignette is confusing, as it's clearly using MACS files, but I don't know if it's appropriate given what the score values actually are (or if those files were adjusted/parsed upstream).
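For reference, if the files do need adjusting upstream, this is roughly what I would expect that step to look like. It is only a sketch: the function name `narrowpeak_to_pvalue_bed` is made up for illustration, and it assumes the goal is a 5-column BED whose value column carries the -log10(p-value) from narrowPeak column 8 rather than the display score from column 5. Whether that is the layout MSPC actually expects is exactly what I am unsure about.

```shell
# Hypothetical helper (not part of MACS or MSPC): reduce a MACS narrowPeak
# file to a 5-column BED whose value column holds -log10(p-value)
# (narrowPeak column 8) instead of the display score (narrowPeak column 5).
narrowpeak_to_pvalue_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    # Skip track/browser header lines and any line with fewer than 8 fields.
    /^(track|browser)/ || NF < 8 { next }
    # Keep chrom, start, end, name, then the -log10(p-value) column.
    { print $1, $2, $3, $4, $8 }' "$@"
}
```

Usage would be something like `narrowpeak_to_pvalue_bed peak.narrowPeak > peak.pvalue.bed` (the output file name is arbitrary). With no file argument the function reads from standard input.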