[Rabies] Initialize Lyssavirus rabies all-clades community dataset #333
xonq wants to merge 7 commits into nextstrain:master
Thanks! Seems to be working. As a dev, I can only review the technical side, and I will let our scientists check the sciency bits :) The virus seems quite diverse, with lots of mutations, but this is probably expected. If you have a public repo where you prepare trees and other data for the dataset, it would be a great help to users of your dataset if you added it to the README. We typically use a boilerplate like this in Nextstrain datasets: But that's not mandatory. All looks good to me. Smooth work!
Thank you. Rabies is indeed very diverse. I contemplated creating independent datasets for each clade, but the genotyping of this "all-clades" dataset has been sufficient for our SME partners. Additionally, issues with sub-clade metadata quality limit the improvements that more refined datasets could provide. I do not have a repository for tree building. I built the Nextclade dataset using the Nextstrain rabies build as a template, though I ended up deviating from it in the tree-building methodology and metadata acquisition. The methodology is hopefully adequately documented for users in this PR's README.
Thanks a lot for contributing this dataset! Overall, this looks very good. But I have a few suggestions to make it better.
Hey @rneher, just wanted to reply and let you know that I cannot return to this to address your concerns until a later date. Not sure when, but hopefully within the next several weeks. Thanks for your suggestions, and my apologies for my ignorance of some of the standardized procedures.

RE: alignment parameters: I'm not really certain how to systematically adjust these parameters. Do you have specific recommendations or procedures for determining which parameters are more suitable, or do you suggest dropping in the linked pathogen.json you sent?

RE: Apart from the tree building, the workflow was performed with AUGUR. With this in mind, do these steps deviate from Nextclade in the way you're suggesting?

Alignment:
Tree building: performed as discussed in the README
Refinement:
Trait application:
Nucleotide mutation calling:
Translation:
Clade mutation extraction (non-AUGUR):
Clade mutation application:
Export:
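The commands behind the step labels above did not survive in this conversation, so the following is only a rough sketch of how an Augur-based sequence of this shape is commonly laid out. It is not the author's actual workflow. All file paths, the `clade` column name, and the `augur_steps` helper are hypothetical. Tree building and the non-Augur clade mutation steps are left out, since the PR handles them outside Augur; a tree file is simply assumed to exist already. The sketch only assembles the command lines and does not run anything.

```python
import shlex


def augur_steps(workdir: str = "results") -> list[list[str]]:
    """Build Augur command lines for the Augur-based steps (hypothetical paths)."""
    # Assumed input files; none of these names come from the PR.
    sequences = "data/sequences.fasta"
    reference = "data/reference.gb"
    metadata = "data/metadata.tsv"
    raw_tree = f"{workdir}/tree_raw.nwk"  # assumed to be produced outside Augur

    # Intermediate and final outputs.
    aligned = f"{workdir}/aligned.fasta"
    tree = f"{workdir}/tree.nwk"
    branch_lengths = f"{workdir}/branch_lengths.json"
    traits = f"{workdir}/traits.json"
    nt_muts = f"{workdir}/nt_muts.json"
    aa_muts = f"{workdir}/aa_muts.json"
    auspice = f"{workdir}/tree.json"

    return [
        # Alignment against the reference.
        ["augur", "align", "--sequences", sequences,
         "--reference-sequence", reference, "--output", aligned],
        # Refinement of the externally built tree.
        ["augur", "refine", "--tree", raw_tree, "--alignment", aligned,
         "--metadata", metadata, "--output-tree", tree,
         "--output-node-data", branch_lengths],
        # Trait application ("clade" is a placeholder column name).
        ["augur", "traits", "--tree", tree, "--metadata", metadata,
         "--columns", "clade", "--output-node-data", traits],
        # Nucleotide mutation calling via ancestral reconstruction.
        ["augur", "ancestral", "--tree", tree, "--alignment", aligned,
         "--output-node-data", nt_muts],
        # Translation to amino-acid mutations.
        ["augur", "translate", "--tree", tree,
         "--ancestral-sequences", nt_muts,
         "--reference-sequence", reference, "--output-node-data", aa_muts],
        # Export to an Auspice v2 JSON.
        ["augur", "export", "v2", "--tree", tree, "--metadata", metadata,
         "--node-data", branch_lengths, traits, nt_muts, aa_muts,
         "--output", auspice],
    ]


if __name__ == "__main__":
    for cmd in augur_steps():
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch runnable without Augur installed; each printed line could be reviewed and adapted before being run by hand.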
This pull request initializes a Lyssavirus rabies (rabies) Nextclade dataset with clade-subclade resolution. It was created in collaboration with @kimandrews, with subject matter expertise and user input from the Massachusetts Department of Public Health. Please review the README.md for information on dataset creation and citations.