-
Notifications
You must be signed in to change notification settings - Fork 69
[Issue-308]: Add support for reading directly from file handles #309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
[Issue-308]: Add support for reading directly from file handles #309
Conversation
1c9535f to
c2060eb
Compare
|
hi, thanks a lot for the PR! It is really great! Is this ready or WIP? I read your PR quickly, and I have a couple of questions, a couple of asks and one suggestion: questions:
suggestions:
asks:
|
|
Hey @ofajardo , thanks for your comment. I think all of your observations are valid. I should've created the PR as a draft (I just converted it to one just now). I first wanna make sure our team can use it for what they need, and if it proves to be usable, then I'll get back to the PR and make it much tighter, and also address all of your concerns. |
|
Awesome! Thanks! |
version 1.3.2
af8d838 to
01bf38f
Compare
- Add custom I/O handlers in _readstat_parser.pyx to bridge Python file objects to ReadStat C library - Modify run_conversion() to detect file-like objects and use custom handlers - Works for all read functions (SAV, SAS7BDAT, DTA, XPORT, POR) - Add test in test_basic.py and test_narwhalified.py (pandas/polars compatible) - Add README documentation with zip archive example Addresses maintainer feedback: - Uses run_conversion() directly instead of separate wrapper - Preserves apply_value_formats and all other options - File handle support works for all formats, not just SAV
01bf38f to
d9e7200
Compare
Test demonstrates reading SAV files from remote sources via HTTP, showing real-world usage of file handle support. Uses built-in http.server - no external dependencies needed.
|
@ofajardo We finally tested this in the scope of our software, and it does what we want it to. I've updated the PR addressing all of your concerns. Do you think it's cool to put it from draft to an actual PR? |
|
sure please go ahead. I am a bit busy these days, but will try to review soon. |
|
I changed the base to the dev branch, as I would merge into there. I had a quick look at the changes and they look great! I am a bit confused though, because I only see 3 recent commits from a couple of days ago, and I do not see any of the changes we discussed before, for example, I do not see any change at all to pyreadstat.pyx ... or maybe you just discarded the previous commits as those changes were not needed anymore? BTW I see that the signature of run_readstat_parser has changed in the pyx file but in the pxd file has not been updated. |
|
OK, I have reviewed now and everything looks good. I assume you just got rid of the previous commits and changes. |
File Handle Support for SAV Files
Summary
Adds support for reading SAV files directly from file-like objects (e.g., zip archives, BytesIO) without extracting to disk.
Use Case
Read large SAV files from zip archives without temporary file extraction:
import zipfile
import pyreadstat
with zipfile.ZipFile('data.zip', 'r') as zf:
with zf.open('survey.sav') as f:
df, meta = pyreadstat.read_sav(f) # No extraction needed!
Implementation
Testing
Changes