5 changes: 4 additions & 1 deletion .github/workflows/lint-and-type-check.yml
Original file line number Diff line number Diff line change
@@ -46,8 +46,11 @@ jobs:
enable-cache: false
python-version: "3.13"

- name: Install project
run: uv sync --all-extras

- name: Run mypy
run: uv run mypy .
run: uv run mypy

- name: Run ruff lint
run: uv run ruff check .
16 changes: 10 additions & 6 deletions .github/workflows/tests.yml
@@ -49,13 +49,17 @@ jobs:
with:
python-version: ${{ matrix.python-version }}

- name: Install the project
run: uv sync --all-extras
- name: Run tests without extras
run: uv run coverage run -m pytest -vv

- name: Run tests and generate coverage
run: |
uv run coverage run -m pytest -vv
uv run coverage xml
- name: Run tests with extras
run: uv run --all-extras coverage run --append -m pytest -vv

- name: Show coverage
run: uv run coverage report -m

- name: Generate coverage
run: uv run coverage xml

- name: Upload coverage to Codecov
uses: codecov/codecov-action@18283e04ce6e62d37312384ff67231eb8fd56d24 # v5.4.3
1 change: 0 additions & 1 deletion .gitignore
@@ -160,5 +160,4 @@ cython_debug/
#.idea/

.coverage
_media.gql
test.py
94 changes: 94 additions & 0 deletions NOTES.md
@@ -0,0 +1,94 @@
A small notes file to keep track of the behaviors and quirks of different
archive formats, or the libraries I’m using for them, that I’ve discovered
during development.

I often found myself repeatedly testing various operations on each library to
see how they behaved, only to forget the results weeks later and do it all
over again. So I decided it was worth writing it all down in one place.

## Reading Non-Files (e.g., directories)

| Archive Type | Behavior when attempting to read a directory |
| ------------------ | -------------------------------------------- |
| tarfile.TarFile | Returns `None` |
| zipfile.ZipFile | Returns empty `b""` |
| rarfile.RarFile | Raises `io.UnsupportedOperation` |
| py7zr.SevenZipFile | Raises `KeyError` |

I find returning empty bytes unacceptable, since you can no longer tell
an empty file apart from a directory. That leaves either returning
`None` or raising an exception. I ended up going with an exception (I think
I'll call it `ArchiveMemberNotAFileError`) because 2 out of 4 libraries do
it and it seems the most "correct" behavior to me. Reading a directory is
an error (pathlib will also raise if you try reading a directory with
`.read_bytes()`) and it should be treated as such.
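
The two standard-library rows of the table can be reproduced with in-memory archives. This is only a small sketch: the helper names `read_dir_from_tar` and `read_dir_from_zip` are made up for illustration, and the `rarfile` and `py7zr` rows are left out because they need third-party packages and real archive files.

```python
from __future__ import annotations

import io
import tarfile
import zipfile


def read_dir_from_tar() -> object:
    """Build a tar archive holding one directory entry, then try to read it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo("foo")
        info.type = tarfile.DIRTYPE  # mark the entry as a directory
        tf.addfile(info)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        # extractfile() yields no file object for a directory member.
        return tf.extractfile("foo")


def read_dir_from_zip() -> bytes:
    """Build a zip archive holding one directory entry, then try to read it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("foo/", b"")  # a trailing "/" makes this a directory entry
    with zipfile.ZipFile(buf) as zf:
        return zf.read("foo/")


print(read_dir_from_tar())  # None
print(read_dir_from_zip())  # b''
```

The zip result is exactly what an empty regular file would return, which is the ambiguity described above.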

## Trailing `/` in Directory Names: Who Cares?

1. `tarfile.TarFile`

- A trailing `/` doesn’t matter.
- Methods like `getmember(name)` automatically strip it (See: [`tarfile.py#L2059`](https://github.com/python/cpython/blob/be388836c0d4a970e83ca5540c512d94afd13435/Lib/tarfile.py#L2059)).
- `getnames()` and `TarInfo.name` never include the trailing `/`.

2. `zipfile.ZipFile`

- A trailing `/` is significant.
- `getinfo(name)` requires the exact directory name; omitting the `/` raises a `KeyError`.
- `namelist()` and `ZipInfo.filename` preserve the trailing `/`.

3. `py7zr.SevenZipFile`

- Behaves much like `tarfile.TarFile`.
- `getinfo(name)` strips the trailing slash (See: [`py7zr.py#L944`](https://github.com/miurahr/py7zr/blob/9a5a5b9bc39bc0afaac60f3cc7ee6842bd167f35/py7zr/py7zr.py#L944)).
- `namelist()` and `FileInfo.filename` do not include trailing `/`.

4. `rarfile.RarFile`

- Strips trailing `/` for lookups (See: [`rarfile.py#L1093`](https://github.com/markokr/rarfile/blob/db1df339574e76dafb8457e848a09c3c074b03a0/rarfile.py#L1093)).
- But `getinfo().filename` and `namelist()` keep the trailing `/`.

### Summary

| Archive Type | `/` Stripped for Lookup | `/` Preserved in Listing |
| ------------------ | ----------------------- | ------------------------ |
| tarfile.TarFile | ✅ Yes | ❌ No |
| zipfile.ZipFile | ❌ No | ✅ Yes |
| py7zr.SevenZipFile | ✅ Yes | ❌ No |
| rarfile.RarFile | ✅ Yes | ✅ Yes |

For `ArchiveFile`, I cannot simply discard the trailing `/`
because `zipfile.ZipFile` requires it. Under the hood,
names are normalized for lookups so that calls like:

```python
archive_file.get_member("foo/bar/")
```

will work regardless of the archive format.

However, normalization alone doesn’t solve everything.
Consider this snippet:

```python
member = archive_file.get_member("foo/bar/")
assert "foo/bar/" in archive_file.get_names()
assert member.name == "foo/bar/"
```

The goal is for this code to pass consistently,
no matter which archive format is used. This is achieved by:

1. Normalizing lookups according to the preferences of the underlying archive format.
- For example, `/` is stripped for SevenZipFile, but kept for ZipFile.

2. Preserving trailing `/` in listings so that checks like:

```python
"foo/bar/" in archive_file.get_names()
archive_file.get_member("foo/bar/").name == "foo/bar/"
```
work reliably. For formats that discard the `/` (TarFile and SevenZipFile), we add it back in.

Not doing this would force users to know the details of the
underlying archive format just to perform a simple check,
which would be tedious and error-prone.
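
The two rules above could be sketched roughly as follows. None of this is the actual `ArchiveFile` implementation: `lookup_name`, `listing_name`, the format keys, and the `is_dir` flag are all hypothetical names chosen for illustration, and the sketch assumes the caller already knows whether a member is a directory.

```python
from __future__ import annotations

# Hypothetical format keys. Lookups in these backends ignore a trailing "/"
# (see the summary table); zipfile is the only one that needs it kept.
LOOKUP_IGNORES_SLASH = frozenset({"tar", "7z", "rar"})


def lookup_name(name: str, fmt: str) -> str:
    """Rule 1: shape the name the way the underlying backend expects it."""
    return name.rstrip("/") if fmt in LOOKUP_IGNORES_SLASH else name


def listing_name(name: str, *, is_dir: bool) -> str:
    """Rule 2: directories always end with exactly one "/" in listings."""
    bare = name.rstrip("/")
    return f"{bare}/" if is_dir else bare


print(lookup_name("foo/bar/", "7z"))  # foo/bar
print(lookup_name("foo/bar/", "zip"))  # foo/bar/
print(listing_name("foo/bar", is_dir=True))  # foo/bar/
```

With helpers like these, a name reported by a tar or 7z backend as `foo/bar` would be exposed as `foo/bar/`, so the membership check gives the same answer for every format.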
16 changes: 16 additions & 0 deletions justfile
@@ -0,0 +1,16 @@
set shell := ["sh", "-c"]

default: lint test

lint:
uv run ruff check . --fix
uv run ruff format .
uv run mypy

test *args:
rm -f .coverage
uv sync --locked
uv run coverage run -m pytest {{args}}
uv sync --all-extras --locked
uv run coverage run --append -m pytest {{args}}
uv run coverage report -m
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -9,6 +9,7 @@ repo_url: https://github.com/Ravencentric/archivefile
edit_uri: edit/main/docs/

theme:
language: en
icon:
repo: fontawesome/brands/github
edit: material/pencil
101 changes: 63 additions & 38 deletions pyproject.toml
@@ -7,34 +7,30 @@ requires-python = ">=3.10"
readme = "README.md"
license = "Unlicense"
keywords = [
"archive",
"archivefile",
"zipfile",
"tarfile",
"sevenzip",
"rarfile",
"archive",
"archivefile",
"zipfile",
"tarfile",
"sevenzip",
"7z",
"rarfile",
]
classifiers = [
"License :: OSI Approved :: The Unlicense (Unlicense)",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: System :: Archiving",
"Topic :: System :: Archiving :: Compression",
"Typing :: Typed",
]
dependencies = [
"rarfile>=4.2",
"py7zr>=0.21.1",
"pydantic>=2.8.2",
"typing-extensions>=4.12.2",
"License :: OSI Approved :: The Unlicense (Unlicense)",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Topic :: System :: Archiving",
"Topic :: System :: Archiving :: Compression",
"Typing :: Typed",
]
dependencies = []

[project.optional-dependencies]
bigtree = ["bigtree>=0.19.3"]
rich = ["rich>=13.7.1"]
all = ["bigtree>=0.19.3", "rich>=13.7.1"]
rar = ["rarfile>=4.2"]
7z = ["py7zr>=1.0.0"]

[project.urls]
Repository = "https://github.com/Ravencentric/archivefile"
@@ -46,11 +42,13 @@ docs = [
"mkdocs-material>=9.5.50",
"mkdocstrings[python]>=0.27.0",
]
test = [
"coverage[toml]>=7.6.10",
"pytest>=8.3.4",
test = ["coverage[toml]>=7.6.10", "pytest>=8.3.4"]
lint = [
"mypy>=1.16.0",
"ruff>=0.11.12",
"typing-extensions>=4.12.2",
{ include-group = "test" },
]
lint = ["mypy>=1.16.0", "ruff>=0.11.12", { include-group = "test" }]
dev = [
{ include-group = "docs" },
{ include-group = "test" },
@@ -61,36 +59,63 @@ dev = [
line-length = 120

[tool.ruff.lint]
extend-select = ["I"]
extend-select = [
"I", # https://docs.astral.sh/ruff/rules/#isort-i
"RUF", # https://docs.astral.sh/ruff/rules/#ruff-specific-rules-ruf
"UP", # https://docs.astral.sh/ruff/rules/#pyupgrade-up
"N", # https://docs.astral.sh/ruff/rules/#pep8-naming-n
"D4", # https://docs.astral.sh/ruff/rules/#pydocstyle-d
"B", # https://docs.astral.sh/ruff/rules/#flake8-bugbear-b
"FBT", # https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt
"C4", # https://docs.astral.sh/ruff/rules/#flake8-comprehensions-c4
"EM", # https://docs.astral.sh/ruff/rules/#flake8-errmsg-em
"ISC", # https://docs.astral.sh/ruff/rules/multi-line-implicit-string-concatenation/
"PIE", # https://docs.astral.sh/ruff/rules/#flake8-pie-pie
"RET", # https://docs.astral.sh/ruff/rules/#flake8-return-ret
"PL", # https://docs.astral.sh/ruff/rules/#pylint-pl
"E", # https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
"W", # https://docs.astral.sh/ruff/rules/#pycodestyle-e-w
"FURB", # https://docs.astral.sh/ruff/rules/#refurb-furb
"TC", # https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc
"TID253", # https://docs.astral.sh/ruff/rules/banned-module-level-imports/
]
fixable = ["ALL"]

[tool.ruff.lint.flake8-tidy-imports]
# These modules are optional dependencies,
# so we don't want to import them at the module level
banned-module-level-imports = ["py7zr", "rarfile"]

[tool.ruff.lint.extend-per-file-ignores]
"tests/*" = ["D", "FBT", "PL", "C416"]

[tool.ruff.lint.isort]
required-imports = ["from __future__ import annotations"]

[tool.ruff.format]
docstring-code-format = true

[tool.mypy]
strict = true
pretty = true
files = ["src/**/*.py", "tests/**/*.py"]
enable_error_code = ["ignore-without-code"]

[[tool.mypy.overrides]]
module = ["rarfile"]
ignore_missing_imports = true

[tool.pytest.ini_options]
filterwarnings = ["ignore::DeprecationWarning:tarfile"]

[tool.coverage.run]
omit = ["src/archivefile/_version.py", "tests/*"]
addopts = ["-ra", "--showlocals", "--strict-markers", "--strict-config"]
filterwarnings = ["error"]
log_cli_level = "INFO"
testpaths = ["tests"]

[tool.coverage.report]
exclude_also = [
"if TYPE_CHECKING:", # Only used for type-hints
"raise NotImplementedError", # Can't test what's not implemented
"def print_table", # Function that pretty much calls another third party function
"def print_tree", # Function that pretty much calls another third party function
"def __repr__", # For debugging
"if TYPE_CHECKING:", # Only used for type-hints
]


[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
21 changes: 12 additions & 9 deletions src/archivefile/__init__.py
@@ -1,17 +1,20 @@
from __future__ import annotations

from archivefile._core import ArchiveFile
from archivefile._enums import CompressionType
from archivefile._models import ArchiveMember
from archivefile._utils import is_archive
from archivefile._version import Version, _get_version

__version__ = _get_version()
__version_tuple__ = Version(*[int(i) for i in __version__.split(".")])
from ._core import ArchiveFile, is_archive
from ._errors import (
ArchiveFileError,
ArchiveMemberNotAFileError,
ArchiveMemberNotFoundError,
UnsupportedArchiveFormatError,
)
from ._models import ArchiveMember

__all__ = [
"ArchiveFile",
"ArchiveFileError",
"ArchiveMember",
"ArchiveMemberNotAFileError",
"ArchiveMemberNotFoundError",
"UnsupportedArchiveFormatError",
"is_archive",
"CompressionType",
]