
@talwrii talwrii commented Sep 10, 2016

No description provided.

Tal Wrii added 2 commits September 10, 2016 00:19
In preparation for adding more options, namely
bootstrap replacement and samples for various
pseudorandom distributions
@bagrow
Owner

bagrow commented Sep 12, 2016

Prefer to avoid argparse dep. Possible to rewrite using the same argument parsing style as the other code?

@talwrii
Author

talwrii commented Sep 12, 2016

argparse is part of the standard library, so it isn't a dependency, but I agree there is conceptual overhead for non-Python programmers and non-programmers. Is this what you are concerned about?

Umm, so I was planning to add support for arbitrary distributions here. This was mostly me making room / splitting work into pieces.

I want to do things like:

rsample --distribution normal --mean 100 --stddev 10

The use case being, "I have no idea how strange this graph for my data is, I should see what it looks like with some normal data".
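A rough sketch of how that interface could be parsed with argparse. The `--distribution`, `--mean` and `--stddev` flags come from the example command above; the `--count` flag, the defaults, and the helper names `build_parser` and `draw` are assumptions made for illustration, not what the branch actually does:

```python
import argparse
import random


def build_parser():
    # --count and all defaults are hypothetical; only the other three
    # flag names appear in the example command.
    parser = argparse.ArgumentParser(prog="rsample")
    parser.add_argument("--distribution", choices=["normal"], default="normal")
    parser.add_argument("--mean", type=float, default=0.0)
    parser.add_argument("--stddev", type=float, default=1.0)
    parser.add_argument("--count", type=int, default=10)
    return parser


def draw(args, rng=random):
    """Return args.count samples from the requested distribution."""
    if args.distribution == "normal":
        return [rng.gauss(args.mean, args.stddev) for _ in range(args.count)]
    raise ValueError("unsupported distribution: " + args.distribution)


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--distribution", "normal", "--mean", "100", "--stddev", "10"]
)
for value in draw(args):
    print(value)
```

Adding another distribution would mean one more entry in `choices` and one more branch in `draw`, which is roughly where the flag list starts to grow.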

My experience suggests that this will become unreadable without argparse, and the documentation of argparse is valuable. However, we could split these things off into separate binaries like:

rbinom
rnorm
rpoisson

This has some impacts on documentation / discoverability, but does result in simpler programs that are more readable by non/semi-programmers.
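For comparison, a minimal sketch of what a standalone `rnorm` might look like with hand-rolled positional arguments and no argparse. The argument order (`COUNT [MEAN [STDDEV]]`) and the defaults are assumptions, not something datatools defines:

```python
import random


def rnorm(argv, rng=random):
    """Hypothetical rnorm: positional args are COUNT [MEAN [STDDEV]]."""
    if not 1 <= len(argv) <= 3:
        raise SystemExit("usage: rnorm COUNT [MEAN [STDDEV]]")
    count = int(argv[0])
    mean = float(argv[1]) if len(argv) > 1 else 0.0
    stddev = float(argv[2]) if len(argv) > 2 else 1.0
    return [rng.gauss(mean, stddev) for _ in range(count)]


# In a real script this would be rnorm(sys.argv[1:]); an explicit list is
# used here so the sketch runs on its own.
for value in rnorm(["5", "100", "10"]):
    print(value)
```

Each such script stays short, at the cost of the usage text and option handling being repeated per tool.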

Philosophical muttering

Opinions? I have a general misgiving that one might end up reimplementing R / numpy with pipes instead of broadcasting. There's a question about what this library represents in the shadow of tools like R and numpy. I mostly like the idea because I am loath to leave the shell, and am not terribly keen on all the state that comes along with using IPython notebook / babel.

I've hacked up a tool called RPipe before that works like so

seq 100 | RPipe 'diff(d)' | plot 

There's a comparable tool called pyline that does the same kind of thing with Python.
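To make the idea concrete, here is a small Python analogue of the `diff(d)` step in that pipeline. It is neither RPipe nor pyline, just a sketch; the names `diff` and `run_filter` are made up, and it assumes one number per input line:

```python
def diff(values):
    """Successive differences, like R's diff() on a numeric vector."""
    return [b - a for a, b in zip(values, values[1:])]


def run_filter(lines):
    # Assumes one number per input line; blank lines are skipped.
    data = [float(line) for line in lines if line.strip()]
    return [str(x) for x in diff(data)]


# In a pipeline this would be run_filter(sys.stdin); a literal list stands in here.
print("\n".join(run_filter(["1", "4", "9", "16"])))
```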

@talwrii
Author

talwrii commented Sep 20, 2016

Anyway, here's a branch where rsample selects from a normal distribution. See what you think:

https://github.com/talwrii/datatools/tree/talwrii--normal-data--2016-09-20

  • Does this functionality deserve to exist at all? (I couldn't find any tools to produce it on the command line.)
  • Would you prefer this to exist in a separate file called rnorm?
  • In this context, what's your opinion of an argparse dependency?

@bagrow
Owner

bagrow commented Sep 20, 2016

So for a while I had a package of scripts in parallel with datatools called randtools. These were about generating random numbers according to distributions, etc. After a while I found myself only using rsample, so I moved that into datatools and dropped randtools.

The way this reads to me is that you think randtools would be worthwhile. That's great! It turned out that I didn't need it, but you might, so go and build it (maybe I'll send some PRs!).

"There's a question about what this library represents in the shadow of [...]"

Yes, I agree. You like datatools for the same reason I do, staying inside the shell. However, R/Python are so good that baking too much into datatools isn't worth it because if what you're doing is complex enough it's better to do it in that context. This is my overriding motivation for keeping datatools small and focused.

"What's your opinion of an argparse dependency?"

Not in datatools please.

@talwrii
Author

talwrii commented Sep 21, 2016

Cool cool. My motivation for the pull requests is "here's a library for command line data analysis, it doesn't have the tools I want, I shall implement them, now I've implemented them I may as well give you a pull request".

Umm, so I'm going to implement a version of rsample, possibly with a different name, that generates data from different distributions. I'm assuming you don't want it in datatools, so I will put it in a differently named repo / leave it in my ~/bin. Just say if you actually want it.

Do you want sampling with replacement in rsample? If so I'll strip out the argparse dependency for you.
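As a rough idea of what that could look like without argparse: the sketch below adds a replacement switch with hand-rolled parsing. The `-r` flag name, the `K` positional argument, and the function names are assumptions for illustration, not the existing rsample interface:

```python
import random


def rsample(lines, k, replace=False, rng=random):
    """Pick k lines; with replace=True the same line may be drawn repeatedly."""
    lines = list(lines)
    if replace:
        return [rng.choice(lines) for _ in range(k)]
    return rng.sample(lines, k)


def parse_args(argv):
    # Hand-rolled parsing: an optional -r flag (hypothetical name) followed by K.
    replace = "-r" in argv
    positional = [a for a in argv if a != "-r"]
    if len(positional) != 1:
        raise SystemExit("usage: rsample [-r] K")
    return int(positional[0]), replace


# A literal list stands in for sys.stdin so the sketch runs on its own.
k, replace = parse_args(["-r", "5"])
print(rsample(["a", "b", "c"], k, replace))
```

With replacement, `K` can exceed the number of input lines, which is what a bootstrap resample needs.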

More generally, I'm probably going to carry on tweaking these tools here and making complementary tools as I go about my day-to-day activities. I don't know how you want to interact with them: your goal of minimality may be at odds with my goal of "create tools for all the things I do".

I could:

  • Carry on feeding you pull requests
  • Shove stuff in my fork so you can go looting when you feel bored.
  • Try to put new tools in a different repo ("moredatatools"!), to avoid the problem of "buggy, more feature-complete fork." Again you could go looting when bored.
