diff --git a/.exp/design-workflow-1-grok-1-inference-and-sampling.md b/.exp/design-workflow-1-grok-1-inference-and-sampling.md
index 31cae7b..84b9e22 100644
--- a/.exp/design-workflow-1-grok-1-inference-and-sampling.md
+++ b/.exp/design-workflow-1-grok-1-inference-and-sampling.md
@@ -6,7 +6,7 @@ The "Grok-1 Inference and Sampling" workflow provides the machinery to load th
 Key inputs: Checkpoint in `./checkpoints/ckpt-0/`, `tokenizer.model`, GPU cluster, prompts as `Request` objects (prompt str, temperature float, nucleus_p float, rng_seed int, max_len int). Outputs: Generated text strings.
 
-Entry points: `run.py` for test run, or `InferenceRunner().run()` generator for streaming requests.
+Entry points: `./install.sh` for fully automated setup (dependency installation, model weight download via HF or torrent, and a test run via `run.py`) [PR #389](https://github.com/xai-org/grok-1/pull/389); `run.py` directly for inference and sampling, assuming the environment, checkpoints, and tokenizer are already prepared; the `InferenceRunner().run()` generator for streaming batched requests from custom code.
 
 Relevant files: `run.py`, `runners.py`, `model.py`, `checkpoint.py`, `tokenizer.model`.
 
 The workflow orchestrates model loading, compilation of sharded compute functions, prompt processing (prefill KV cache while sampling first token), and iterative single-token generation using cached attention keys/values, until max length or EOS.
@@ -46,13 +46,17 @@
 ```mermaid
 sequenceDiagram
     participant User
+    participant InstallSh as install.sh
     participant RunPy as run.py
     participant IR as InferenceRunner
     participant MR as ModelRunner
     participant Model as model.py
     participant Checkpoint as checkpoint.py
     participant JAX as JAX Runtime
-    User->>RunPy: Execute main()
+    User->>InstallSh: Execute ./install.sh
+    InstallSh->>InstallSh: Install system deps, Python pkgs from requirements.txt
+    InstallSh->>InstallSh: Download checkpoints & tokenizer via HF/torrent
+    InstallSh->>RunPy: python3 run.py
     RunPy->>IR: Create with config, MR, paths, meshes
     IR->>MR: initialize(dummy_data, meshes)
     MR->>Model: model.initialize(), fprop_dtype=bf16
@@ -64,6 +68,7 @@ sequenceDiagram
     IR->>IR: Precompile with dummy prompts for pad_sizes
     RunPy->>IR: gen = run() // generator setup with initial memory, settings, etc.
 ```
+Note: Updated to include automated setup via install.sh (PR #389).
 
 ## Inference and Sampling Sequence
 
@@ -127,6 +132,6 @@
 - **Error/Edge Cases**: Assumes sufficient memory/GPUs; handles long contexts by left-truncation/padding. No built-in EOS handling (relies on max_len or app logic). Quantized weights require custom unpickling.
 - **Performance Notes**: MoE router/experts use JAX vmap/shard_map (serial per-token, inefficient for prod). Focus on correctness/single-host validation.
 - **Extensibility**: Modular Haiku design allows custom configs/modules. Generator interface suits serving multiple prompts concurrently.
-- **Dependencies & Setup**: `requirements.txt` (jax[cuda12_pip], haiku, etc.). Download ckpt via torrent/HF, place in checkpoints/.
+- **Dependencies & Setup**: `./install.sh` [PR #389](https://github.com/xai-org/grok-1/pull/389) automates the process: it detects the OS and installs system tools (git, python3, pip, transmission-cli) via apt/dnf/brew, or merely checks for them on Windows; installs Python dependencies via `pip install -r requirements.txt` plus `huggingface_hub`; downloads `tokenizer.model` and `checkpoints/ckpt-0/*` using `huggingface-cli` from [xai-org/grok-1](https://huggingface.co/xai-org/grok-1), and the full weights via the torrent magnet link with `transmission-cli`; finally, it runs `python run.py` to test the workflow. Manual setup: install the packages from `requirements.txt` (jax[cuda12_pip], haiku, sentencepiece, numpy, etc.); download the checkpoint from [Hugging Face](https://huggingface.co/xai-org/grok-1) or the torrent magnet `magnet:?xt=urn:btih:5f96d43576e3d386c9ba65b883210a393b68210e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce`, and place it in `./checkpoints/`.
 
 This document captures the high-level design, derived from code analysis.
\ No newline at end of file
diff --git a/pr-analysis-389.md b/pr-analysis-389.md
new file mode 100644
index 0000000..7790041
--- /dev/null
+++ b/pr-analysis-389.md
@@ -0,0 +1,43 @@
+# PR #389: Workflow Design Impact Analysis
+
+## Affected Workflows
+- **Grok-1 Inference and Sampling**: The PR adds `install.sh`, which automates the dependency and setup instructions documented in `.exp/design-workflow-1-grok-1-inference-and-sampling.md`, including installing the packages from `requirements.txt` and downloading checkpoints to `./checkpoints/`. The script culminates in executing `run.py`, the primary entry point for this workflow (see [workflows.json](.exp/workflows.json)). It introduces a new automated prerequisite phase to the workflow design without modifying core code.
+
+No other workflows (Model Loading and Initialization, Model Forward Pass and Logits Computation) are impacted: their designs cover runtime operations that assume a prepared environment and files, and make no reference to installation.
+
+## Grok-1 Inference and Sampling Analysis
+
+### Summary of design changes
+The PR introduces `install.sh`, a Bash script that automates user setup for running the workflow. It detects the operating system, installs the necessary system packages and Python dependencies, clones the repository if needed, downloads model weights via the Hugging Face CLI and torrent, and launches `run.py` for an inference test. This affects the documented design by:
+
+- Adding a new pre-initialization phase for environment and asset preparation, previously manual.
+- Extending the entry points to include `install.sh` as a user-friendly wrapper around `run.py`.
+- Enhancing the setup documentation with automation details, links to resources, and manual alternatives.
+
+The implementation uses conditional OS logic, external tools (apt, brew, dnf, transmission-cli, huggingface-cli), and error-checked commands for reliability, as sketched below. Benefits include cross-platform accessibility, reduced manual effort, and alternative download paths for robustness. Implications include a potential need for elevated privileges, network dependencies, and minor script quirks (e.g., redundant cloning when the script is run from inside the repo).
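+
+A condensed sketch of that flow (illustrative only, not the PR's verbatim script; the exact package names and `huggingface-cli`/`transmission-cli` invocations are assumptions):
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail  # abort on the first failed step
+
+# 1. OS-specific system tools (git, python3, pip, transmission-cli).
+if command -v apt-get >/dev/null 2>&1; then
+  sudo apt-get update && sudo apt-get install -y git python3 python3-pip transmission-cli
+elif command -v dnf >/dev/null 2>&1; then
+  sudo dnf install -y git python3 python3-pip transmission-cli
+elif command -v brew >/dev/null 2>&1; then
+  brew install git python3 transmission-cli
+fi
+
+# 2. Python dependencies for the inference workflow.
+pip3 install -r requirements.txt
+pip3 install huggingface_hub
+
+# 3. Tokenizer and checkpoint from the Hugging Face repo.
+huggingface-cli download xai-org/grok-1 --include tokenizer.model --local-dir .
+huggingface-cli download xai-org/grok-1 --include 'ckpt-0/*' --local-dir checkpoints
+
+# 4. Smoke-test the workflow end to end.
+python3 run.py
+```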
+
+The design document has been updated accordingly: the entry points were expanded, the setup section was detailed with the PR reference, and the initialization sequence diagram was revised to prepend the setup steps.
+
+```mermaid
+graph LR
+    subgraph before ["Before PR"]
+        U1[User] --> R[Execute run.py]
+        R --> I["Initialization & Inference Setup"]
+    end
+    subgraph after ["After PR"]
+        U2[User] --> IS[Execute install.sh]
+        IS --> SD["System deps install (OS-specific)"]
+        IS --> PP[Python packages from requirements.txt]
+        IS --> DW["Download checkpoints & tokenizer"]
+        DW --> R2[Execute run.py]
+        R2 --> I2["Initialization & Inference Setup"]
+    end
+    before -.->|New setup phase added| after
+    style IS fill:#90EE90,stroke:#333,stroke-width:4px
+    style SD fill:#90EE90,stroke:#333,stroke-width:4px
+    style PP fill:#90EE90,stroke:#333,stroke-width:4px
+    style DW fill:#90EE90,stroke:#333,stroke-width:4px
+    style R2 fill:#FFFF00,stroke:#333,stroke-width:4px
+```
+
+**Legend**: Green rectangles mark additions from the PR (new setup components); yellow marks a change (the altered execution path to run.py).
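+
+For comparison, the manual path that the script automates (per the updated setup section) reduces to roughly the following; the exact flags are again illustrative:
+
+```bash
+pip3 install -r requirements.txt  # jax[cuda12_pip], haiku, sentencepiece, numpy, ...
+
+# Option A: pull the checkpoint from Hugging Face.
+huggingface-cli download xai-org/grok-1 --include 'ckpt-0/*' --local-dir checkpoints
+
+# Option B: fetch the weights over BitTorrent into ./checkpoints/.
+transmission-cli -w ./checkpoints "magnet:?xt=urn:btih:5f96d43576e3d386c9ba65b883210a393b68210e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce"
+
+python3 run.py  # run the built-in test prompt
+```
\ No newline at end of file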