Unified Framework Diagram
Table of Contents
- Overview
- Installation
- Configuration
- Datasets
- Usage
- System Architecture
- Example Scripts
- Output Format
- Configuration Files
- Development
- Citation
- Acknowledgments
- Contact
This repository contains the code for our paper: "Does Memory Need Graphs? A Unified Framework and Empirical Analysis for Long-Term Dialog Memory". We conduct experimental and system-oriented analysis of long-term dialog memory architectures. We propose a unified framework that deconstructs dialog memory systems into core components and supports both graph-based and non-graph (flat) approaches. Under this framework, we perform staged refinement experiments on the LongMemEval and HaluMem datasets, identifying stable and strong baselines that enable fair comparisons and practical, well-informed component choices.
This repository implements a unified framework for building and evaluating agentic, updatable memory systems for long-term dialog. Key capabilities:
- Flat and Graph-Based Indexing — cover mainstream non-graph (flat) and graph-based approaches for memory organization.
- Structured Memory Extraction — extract session-level summaries, salient keyphrases, and user facts.
- Dynamic Memory Management — decide when to create, update, or skip memory points using similarity search and LLM judgments.
- Flexible Backends — support multiple embedding, retrieval, and generation backends (Contriever, Stella, GTE, SentenceTransformers, OpenAI, BM25, gpt-4o-mini, Llama-3.1-8B, OpenAI API server).
- Evaluation Pipelines — end-to-end tooling for LongMemEval and HaluMem: retrieval logs, recall metrics, and optional answer generation.
Requirements (suggested)
- Python 3.10+ (3.11 recommended)
- conda or pip + virtualenv/venv
- GPU + CUDA for local large-model experiments (optional)
Quick setup — using conda (recommended)
# Create and activate a conda environment (example)
conda create -n meminsight python=3.11 -y
conda activate meminsight
# Install dependencies
pip install -r requirements.txt
Environment variable examples (Note: environment variables may be overridden by settings in the configuration files):
# Set necessary environment variables
export OPENAI_API_KEY="your-api-key" # For OpenAI models
export HF_HOME="/path/to/cache" # Cache directory for HuggingFace models
This repository supports layered .env files. The loader checks (and loads) these files in order:
- Terminal environment variables
- .env in the project root directory
- .env in subdirectories
- Command line arguments
Recommended environment variables (see .env.example):
- OPENAI_API_KEY: API key for OpenAI-compatible LLM backends.
- OPENAI_BASE_URL: Base URL for a local or third-party vLLM/OpenAI-compatible server (e.g. http://localhost:8001/v1).
- LLM_MODEL: Default model for generation (e.g. gpt-4o-mini or meta-llama/Meta-Llama-3.1-8B-Instruct).
- EMBEDDING_MODEL / EMBEDDING_RETRIEVER: Defaults for embedding selection.
- EMBEDDING_API_URL, EMBEDDING_API_KEY: Parameters for third-party embedding models.
For detailed environment variable configurations, refer to the Configuration Files section.
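As an illustration of the layered loading order above, here is a rough Python sketch using python-dotenv. It is not the repository's src/config.py; it assumes that later sources override earlier ones, consistent with the note above that configuration files may override environment variables:

```python
# Illustrative sketch of the layered configuration precedence described above.
# Assumption: later sources override earlier ones (terminal env < root .env
# < subdirectory .env < command-line arguments). Not the actual src/config.py.
import os
from dotenv import load_dotenv  # pip install python-dotenv

# 1. Terminal environment variables are already present in os.environ.
# 2. The root .env overrides them.
load_dotenv(".env", override=True)
# 3. A subdirectory .env (e.g. evals/.env) overrides the root file.
load_dotenv("evals/.env", override=True)
# 4. Command-line arguments (parsed by the individual scripts) win last.

print(os.getenv("LLM_MODEL", "gpt-4o-mini"))
```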
LongMemEval (s/m) contains 500 questions, each associated with numerous dialogue sessions. For download guidance, refer to the original repository. The default storage location for the dataset is {project_root}/data/longmemeval-cleaned.
Once downloaded, run:
python data_preprocessing/lme_deduplicate.py
This deduplicates session IDs for each question (fixing duplicate session metadata before indexing).
HaluMem evaluates memory systems' ability to handle hallucinations and memory updates.
Dataset Variants:
- HaluMem-Medium: data/HaluMem/HaluMem-Medium.jsonl — moderate conversation length
- HaluMem-Long: data/HaluMem/HaluMem-Long.jsonl — extended conversations with distractors
To download HaluMem, follow the official repository link. The default storage location is {project_root}/data/HaluMem. In our experiments, we only considered HaluMem-Medium as it is already sufficiently challenging.
python data_preprocessing/lme_extract_summ.py
python data_preprocessing/lme_extract_keyphrase.py
python data_preprocessing/lme_extract_userfact.py
Note that you need to set the input data path and output expansion path in the extraction scripts or via their CLI arguments. For environment variable settings, refer to the Configuration and Configuration Files sections.
./scripts/lme_run_retrieval.sh
The script supports both no expansion and multiple expansions.
Key Arguments:
- --retriever: Choose from flat-bm25, flat-contriever, flat-stella, flat-gte, flat-openai
- --granularity: Currently supports session (session-level retrieval)
- --index_expansion_method: Comma-separated list of expansion methods:
  - none: No expansion (original sessions only)
  - session-summ: Session summaries
  - session-keyphrase: Session keyphrases
  - session-userfact: User facts from sessions
  - Combine multiple: session-userfact,session-keyphrase,session-summ
- --index_expansion_result_join_mode: How to combine expansions with original content:
  - none: No expansion
  - separate: Keep expansions separate (retrieve from each)
  - merge: Merge expansions with original sessions; two embeddings are computed, one for the merged expansion and one for the original session
  - merge_raw: Merge raw sessions into expansions; only one embedding is computed
- --index_expansion_result_cache: Comma-separated cache file paths (must match the expansion methods)
- --use_raw_session_as_key: Also use the original session dialogue for retrieval
- --value_expansion_join_mode: Expand retrieved values with expansion content (none or merge)
Key Differences from Original LongMemEval:
- Support multiple expansions
- Support multiple expansion join methods
- Fix the issue in LongMemEval where empty index expansions led to incorrectly discarding the original session
- Fix the issue in LongMemEval where the altered session id (changing the 'answer' prefix) led to mismatched id lookups
python evals/lme_compute_recall.py \
--in_file <retrieval_log_file> \
--oracle_file data/longmemeval-cleaned/longmemeval_oracle_deduplicate.json \
--haystack_file data/longmemeval-cleaned/longmemeval_s_cleaned_deduplicate.json \
--out_file <out_file>
python evals/lme_run_generation.py \
--in_file <retrieval_log_file> \
--model_name meta-llama/Meta-Llama-3.1-8B-Instruct \
--topk_context 5 \
--cot true \
--out_dir <output_directory>
Generation Arguments:
- --topk_context: Number of top retrieved contexts to use
- --history_format: Format for history (json or nl)
- --useronly: Use only user utterances (true or false)
- --cot: Enable chain-of-thought reasoning (true or false)
- --merge_key_expansion_into_value: How to merge expansions (none, merge, replace)
Example:
python evals/lme_compute_qa.py gpt-4o <generation_output_file> data/longmemeval-cleaned/longmemeval_oracle_deduplicate.json
Graph retrieval consists of two steps: first construct the graph, then run the graph retrieval script. Conceptually, flat's --retriever/--index_expansion_method corresponds to graph's --embedding/--graphrag-mode, and so on.
# Construct the graph (if not already built)
./scripts/graph_lme_construct.sh \
--in-file data/longmemeval-cleaned/longmemeval_s_cleaned.json \
--out-dir data/graph_s-gpt-4o-mini \
--embedding text-embedding-3-small \
--entity-namespace openai_name_entities
# Run graph retrieval
./scripts/graph_lme_run_retrieval.sh \
--in-file data/longmemeval-cleaned/longmemeval_s_cleaned.json \
--out-dir results/graph_lme/ \
--embedding text-embedding-3-small \
--graphrag-mode entity,chunk,one-hot-expand \
--only-need-context
The graph retrieval output (e.g., graph_retrieval_results-*.json) has a structure compatible with flat retrieval logs, so evals/lme_compute_recall.py can be reused directly to compute recall.
python evals/lme_compute_recall.py \
--in_file <retrieval_log_file> \
--oracle_file data/longmemeval-cleaned/longmemeval_oracle_deduplicate.json \
--haystack_file data/longmemeval-cleaned/longmemeval_s_cleaned_deduplicate.json \
--out_file <out_file>
When using generation (QA) for evaluation, if the input is the graph_retrieval_results-*.json obtained from graph retrieval, it can be passed to lme_run_generation.py. The generation process is identical to flat.
python evals/lme_run_generation.py \
--in_file <retrieval_log_file> \
--model_name meta-llama/Meta-Llama-3.1-8B-Instruct \
--topk_context 5 \
--cot true \
--out_dir <output_directory>
python evals/lme_compute_qa.py gpt-4o <generation_output_file> data/longmemeval-cleaned/longmemeval_oracle_deduplicate.json
./scripts/halu_run.sh
Key Arguments:
Data Paths:
- --data_path: Path to the HaluMem dataset (.jsonl format)
- --out_dir: Output directory for results
- --cache_dir: Cache directory for models
Memory System Configuration:
- --embedding_model: Embedding model for retrieval (contriever, stella, gte, all-MiniLM-L6-v2, etc.)
- --retrieve_method: Memory retrieval strategy:
  - merge: Combine all memory components (summary, keywords, facts) into a single retriever
  - separate: Separate retrievers for each component, then combine scores
  - merge_raw: Merge with raw session text
- --llm_model: LLM for memory operations (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct, gpt-4o-mini)
- --llm_backend: LLM backend (openai for OpenAI-compatible APIs)
- --base_url: Base URL for the LLM API (e.g., http://localhost:8001/v1 for vLLM)
- --temperature: Temperature for LLM generation (default: 0.0)
Memory Operations:
- --enable_update: Enable memory update operations (create/update/skip decisions); if False, memories are always added to the system.
- --keep_update_note: Keep the original memory after an update operation (default: True)
- --enable_link: Enable linking related memories by prompting an LLM
- --use_neighbour_memories: Use neighboring memories to compose the top-k context (requires --enable_link)
Retrieval Configuration:
- --top_k: Number of memories to retrieve for QA (default: 20)
- --qa_retrieve_method: QA retrieval method:
  - flatten: Flatten all memory points for retrieval; an individual embedding is computed for each point (default)
  - default: Use the retrieve_method setting
- --use_raw_session_as_key: Include the raw session dialogue in retrieval; must be True if retrieve_method is merge_raw
- --include_point_type: Include the point type (summary/keyword/fact) in the retrieved context
QA Configuration:
- --qa_llm: LLM model for QA (defaults to llm_model)
- --qa_api_base: API base URL for the QA LLM
- --skip_qa: Skip QA generation (only extract and retrieve memories)
Processing Configuration:
- --version: Version identifier for output files
- --resume: Resume from existing progress
- --use_metadata_cache: Use cached extracted metadata
- --metadata_cache_dir: Directory for the metadata extraction cache
- --device: Device for the embedding model (auto-detected if not specified)
Multiprocessing:
- --num_workers: Number of parallel workers for processing users (default: 1)
- --gpu_ids: Comma-separated GPU IDs (e.g., "0,1,2,3")
Default configurations suggested by HaluMem are provided in evals/.env/.
python evals/halu_eval.py \
--file_path <structure_eval_results.jsonl>
# Construct and run graph retrieval
./scripts/graph_halu_construct.sh --in-file data/HaluMem/HaluMem-Medium.jsonl --llm-model gpt-4o-mini
./scripts/graph_halu_run_retrieval.sh \
--graph-root data/nc-graph_halu_mem_medium-4o-mini \
--out-dir results/graph_halu/ \
--embedding text-embedding-3-small \
--graphrag-mode entity,chunk,one-hot-expand \
--only-need-context
The original evaluation code for HaluMem used an online approach, building the graph while recording experimental results. For modularity, we recommend an offline approach for this part of the evaluation. Offline evaluation for the graph method requires first generating intermediate files (e.g., add_memory_by_session.json) using the --mode parameter of evals/halu_graph_eval.py. Then, merge the retrieval results with these intermediate files to form the offline evaluation input, and finally run the evaluation script to compute metrics. Recommended steps:
- Generate add_memory (parse GraphML, output add_memory_by_session.json):
python evals/halu_graph_eval.py --mode add_memory \
--graph_root data/nc-graph_halu_mem_medium-4o-mini \
--out_path <output_file_directory>
This command writes add_memory_by_session.json to the location specified by --out_path (if --out_path is not provided, the script defaults to writing under --graph_root).
- Specify the retrieval result file (filename or full path) with --retrieve_file_path, or place the graph retrieval output (graph_retrieval_results-*.json) in the same directory (data/nc-graph_halu_mem_medium-4o-mini/). Then run the merge to generate the offline evaluation input (*_test_eval_results.jsonl). It is recommended to explicitly specify the output file via --out_path on the command line:
python evals/halu_graph_eval.py --mode gen_eval \
--graph_root data/nc-graph_halu_mem_medium-4o-mini \
--retrieve_file_path <retrieval_file_path> \
--out_path <output_file_directory> \
--dataset_path <HaluMem_dataset_path> \
[--use_entity]
- --retrieve_file_path: Path or filename of the retrieval result file (relative filenames are searched under --graph_root)
- --out_path: Output directory. The script writes the result file inferred from the retrieval filename under this directory, e.g., graph_retrieval_results-xxx.json → graph_retrieval_results-xxx_test_eval_results.jsonl. If a path ending with .json or .jsonl is provided, it is treated as a full file path (backward compatibility).
- --out_suffix: Suffix for the merged output file (default _test_eval_results.jsonl; used to infer the filename only when a specific output filename is not provided)
- --dataset_path: Path to the HaluMem dataset
- --use_entity: Optional; enable entity-based context construction (otherwise chunk-based context is used)
This step writes a merged JSONL file (one line per user's evaluation input) in the directory specified by --out_path (or the default location next to the retrieval file), to be used for downstream offline evaluation.
- Run Offline Evaluation
Provide the JSONL generated in the previous step to evals/halu_eval.py for metric calculation. For example:
python evals/halu_eval.py --file_path <path/to/*_eval_results.jsonl>
Depending on the actual file organization, you can also move the generated *_test_eval_results.jsonl to a separate results directory and point --file_path to its location.
Notes:
- --mode supports add_memory (parse GraphML), gen_eval (merge and generate the evaluation JSONL), and test_llm (quick test of LLM calls).
- Ensure that the --out_dir from graph_halu_run_retrieval.sh, or the location where you move/copy the retrieval results, matches the path expected by halu_graph_eval.py (data/nc-graph_halu_mem_medium-4o-mini/). Alternatively, adjust the script parameters or the PROJECT_ROOT environment variable so that the retrieval results are read correctly and the offline evaluation files are generated.
The evaluation scripts under the evals/ directory in this repository are common to both flat and graph retrieval/recall evaluations. The main differences lie in how the retrieval output is generated and the additional graph files (GraphML).
General Workflow (Applicable to LongMemEval and HaluMem):
- Run Retrieval (flat or graph) to generate retrieval logs (flat typically outputs retrieval logs in JSON/JSONL; graph outputs graph_retrieval_results-*.json and additionally produces GraphML files as graph construction results).
- Compute Recall: Use scripts like evals/lme_compute_recall.py.
- Generate Answers and Evaluate QA (optional): Use the same generation scripts as flat (lme_run_generation.py, evals/lme_compute_qa.py, or halu_eval.py).
Key Points:
- Commonality: Both flat and graph produce retrieval logs that can be processed by the generic scripts under evals/ (the recall/QA pipeline can be reused).
- Differences: The graph pipeline additionally outputs GraphML files as graph construction results. For HaluMem evaluation, an offline approach is used. The graph retrieval script names and output paths typically differ (scripts with the graph_* prefix).
- Recommendation: When comparing flat and graph, keep the same oracle/haystack input files and top_k settings to ensure comparability of evaluation results.
- GraphRAG (src/graph/graphrag.py): Core implementation of graph retrieval, responsible for graph storage selection, the graph construction flow, clustering, and query parameter management.
- Graph Construction Entry Points (src/graph/lme_construct_graph.py, src/graph/halu_construct_graph.py): Convert LongMemEval / HaluMem sessions or questions into graph representations and write them to the working directory (including GraphML files and graph storage).
- Graph Retrieval Entry Points (src/graph/lme_run_retrieval.py, src/graph/halu_run_retrieval.py): Use GraphRAG to perform graph-level queries, aggregate retrieval results on the graph, and output JSON retrieval logs.
- Entity and Graph Operations (src/graph/entity_extraction/extract.py, src/graph/_op.py): Concrete implementations for extracting entities/relations from session text, merging nodes/edges, and linking to the knowledge graph.
- LLM and Embedding Adapters (src/graph/_llm.py, src/graph/_utils.py): Embedding dispatch, adapter logic for calling OpenAI / local LLMs, and graph-related utility functions.
- Graph Storage Base (src/graph/base.py): Abstract graph storage interface and basic implementations (e.g., a NetworkX storage adapter) for reading/writing nodes/edges, querying, and clustering support.
Note: The GraphML files produced by the graph components can be used for offline inspection and visualization. evals/halu_graph_eval.py provides an evaluation flow based on GraphML.
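For quick offline inspection, a GraphML file can be loaded with NetworkX; the sketch below uses a placeholder filename under an example output directory:

```python
# Minimal sketch for inspecting a GraphML file produced by the graph pipeline.
# The path below is a placeholder; point it at an actual file in your graph
# output directory (e.g. under data/graph_s-gpt-4o-mini/).
import networkx as nx

g = nx.read_graphml("data/graph_s-gpt-4o-mini/example.graphml")
print(f"{g.number_of_nodes()} nodes, {g.number_of_edges()} edges")

# Show a few nodes with their attributes (entity names, descriptions, ...).
for node, attrs in list(g.nodes(data=True))[:5]:
    print(node, attrs)
```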
- AgenticMemorySystem (agentic_memory_system.py): Core memory management system
  - Structured memory notes with summary, keywords, and facts
  - Dynamic memory operations (create/update/merge)
  - Multiple embedding model support
  - Memory linking and neighbor-aware operations
- LLMController (llm_controller.py): LLM interface abstraction
  - Unified interface for different LLM backends
  - Structured output parsing with retry logic
  - Token counting and context management
- StructuredMemoryRetriever (halu_utils.py): HaluMem-specific wrapper
  - Session-based memory management
  - Metadata extraction and caching
  - Memory operation tracking
- FlexibleEmbeddingRetriever (agentic_memory_system.py): Retrieval backend
  - Multi-model support (Contriever, Stella, GTE, SentenceTransformer, OpenAI, BM25)
  - Efficient batch encoding
  - Similarity-based retrieval
- merge: Combine all memory components into a single retrieval index
- separate: Maintain separate indices for summary, keywords, and facts, then aggregate scores
- merge_raw: Include raw session text along with the structured components
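The toy sketch below illustrates the conceptual difference between merge and separate scoring; the embed function and the score aggregation (a simple sum) are illustrative placeholders, not the repository's retrieval code:

```python
# Toy illustration (not the repository implementation) of 'merge' vs.
# 'separate' scoring over the structured memory components.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for whichever embedding backend is configured.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=8)
    return v / np.linalg.norm(v)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)

query = embed("what did the user order for lunch?")
memory = {
    "summary": "User ordered a chicken salad.",
    "keywords": "lunch order salad",
    "facts": "The user prefers light meals.",
}

# merge: one embedding over all components concatenated together.
merge_score = cosine(query, embed(" ".join(memory.values())))

# separate: one embedding per component; scores are then aggregated
# (a simple sum here; the actual combination may differ).
separate_score = sum(cosine(query, embed(v)) for v in memory.values())

print(merge_score, separate_score)
```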
When a new session is processed, the system:
- Extracts structured metadata (summary, keywords, facts)
- Retrieves similar existing memories
- Asks the LLM to decide: create a new memory, update an existing one, or skip (redundant)
- Updates are merged while preserving all original session IDs for evaluation
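The sketch below summarizes this flow in pseudocode; the helper names (extract, search, decide, add, merge) are illustrative rather than the actual APIs in agentic_memory_system.py:

```python
# Pseudocode sketch of the create/update/skip flow described above.
# Helper names are illustrative; the real logic lives in
# src/flat/agentic_memory_system.py and src/flat/halu_utils.py.

def process_session(session, memory_store, retriever, llm):
    # 1. Extract structured metadata (summary, keywords, facts).
    metadata = llm.extract(session)

    # 2. Retrieve similar existing memories via embedding search.
    candidates = retriever.search(metadata, top_k=5)

    # 3. The LLM decides how to handle the new content.
    decision = llm.decide(metadata, candidates)  # "create" | "update" | "skip"

    if decision == "create":
        memory_store.add(metadata, session_id=session.id)
    elif decision == "update":
        # 4. Merge into the matched memory, keeping all original session IDs
        #    so evaluation can still trace which sessions contributed.
        memory_store.merge(candidates[0], metadata, session_id=session.id)
    # "skip": the content is redundant and nothing is stored.
```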
bash scripts/lme_run_retrieval.sh \
data/longmemeval-cleaned/longmemeval_m_cleaned.json \
flat-stella \
session \
session-userfact,session-keyphrase,session-summ \
merge \
"data/longmemeval-cleaned/expansions-llama3.1_8b/session-userfact.json,data/longmemeval-cleaned/expansions-llama3.1_8b/session-keyphrase.json,data/longmemeval-cleaned/expansions-llama3.1_8b/session-summ.json" \
llama-3.1-8b-instruct-ICL \
my_experiment
# Run evaluation
bash scripts/halu_run.sh \
--dataset long \
--embedding-model stella \
--retrieve-method merge \
--llm-model meta-llama/Meta-Llama-3.1-8B-Instruct \
--base-url http://localhost:8001/v1 \
--enable-update \
--keep-update-note \
--use-metadata-cache \
--resume \
--version full_pipeline_exp
The following shows how to use the scripts provided in the repository to construct graphs and run graph-based retrieval (for LongMemEval / HaluMem). The output includes GraphML files per question/session and graph_retrieval_results-*.json.
# Construct graph for LongMemEval
./scripts/graph_lme_construct.sh \
--in-file data/longmemeval-cleaned/longmemeval_s_cleaned.json \
--out-dir data/graph_s-gpt-4o-mini \
--embedding text-embedding-3-small \
--entity-namespace openai_name_entities
# Run graph-based retrieval for LongMemEval
./scripts/graph_lme_run_retrieval.sh \
--in-file data/longmemeval-cleaned/longmemeval_s_cleaned.json \
--out-dir results/graph_lme/ \
--embedding text-embedding-3-small \
--graphrag-mode entity,chunk,one-hot-expand \
--only-need-context
# Construct graph and run retrieval for HaluMem
./scripts/graph_halu_construct.sh \
--in-file data/HaluMem/HaluMem-Medium.jsonl \
--llm-model gpt-4o-mini
./scripts/graph_halu_run_retrieval.sh \
--graph-root data/nc-graph_halu_mem_medium-4o-mini \
--out-dir results/graph_halu/ \
--embedding text-embedding-3-small \
--graphrag-mode entity,chunk,one-hot-expand \
--only-need-context
Notes:
- Use the --graphrag-mode flag (or the GRAPHRAG_MODE environment variable) to select graph components (examples: entity, chunk, one-hot-expand, rank-entity).
- The output directory contains GraphML files for offline inspection and graph_retrieval_results-*.json retrieval logs for use by evals/halu_graph_eval.py.
Each line in the output file contains:
{
"question_id": "user123_session5_q1",
"question_type": "single_event",
"question": "What did I order for lunch?",
"answer": "You ordered a chicken salad.",
"question_date": "2024-03-15",
"haystack_dates": ["2024-03-10", "2024-03-12", ...],
"haystack_session_ids": ["user123_session1", "user123_session2", ...],
"answer_session_ids": ["user123_session3"],
"retrieval_results": {
"query": "What did I order for lunch?",
"ranked_items": [
{
"corpus_id": "user123_session3",
"text": "...",
"timestamp": "2024-03-12",
"is_original": true,
"expansion_type": "original"
},
...
]
}
}
Results are saved in <out_file_path>/:
- *_eval_results.jsonl: Per-question results with retrieved contexts and generated answers
- user_*.json (in tmp/): Intermediate user-level results with memory operations
- Memory operation logs and statistics
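As a quick sanity check, the per-question retrieval log shown above can be consumed directly; the snippet below (file path and top-k are placeholders) counts how often a gold session appears among the top-k ranked items. The official recall numbers should come from evals/lme_compute_recall.py.

```python
# Illustrative only: scan a retrieval log (JSONL, one question per line) and
# report a simple top-k hit rate against answer_session_ids.
import json

TOP_K = 5
hits = total = 0
with open("results/retrieval_log.jsonl") as f:  # placeholder path
    for line in f:
        rec = json.loads(line)
        ranked = rec["retrieval_results"]["ranked_items"][:TOP_K]
        retrieved = {item["corpus_id"] for item in ranked}
        hits += bool(retrieved & set(rec["answer_session_ids"]))
        total += 1
print(f"top-{TOP_K} hit rate: {hits / max(total, 1):.3f}")
```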
This repository drives runtime configuration through environment variables and layered .env files. Configuration loading order:
- Terminal environment variables
- .env in the project root directory
- .env in subdirectories (e.g., evals/)
- Command line arguments
Below are the commonly used environment variables in the repository (defaults can be found in .env.example in the repository root):
- OPENAI_API_KEY: API key for OpenAI or OpenAI-compatible backends (default: empty string).
  - Purpose: calling the OpenAI API, third-party compatible servers, or proxies (e.g., vLLM/OpenAI-compatible services).
- OPENAI_BASE_URL: Base URL for OpenAI-compatible services (default: http://localhost:8001/v1).
  - Purpose: set when using self-hosted vLLM/OpenAI-compatible services, e.g., http://localhost:8001/v1.
- LLM_MODEL: Default model for generation (default: gpt-4o-mini).
  - Examples: gpt-4o-mini, gpt-4o, meta-llama/Meta-Llama-3.1-8B-Instruct.
- EMBEDDING_MODEL: Embedding model for vectorization (default: text-embedding-3-small).
- EMBEDDING_RETRIEVER: Embedding retriever selection (default: flat-openai).
- EMBEDDING_API_URL: Optional third-party embedding service URL (default: empty).
- EMBEDDING_API_KEY: API key for EMBEDDING_API_URL (default: empty).
  - Note: when using an external embedding provider, the system POSTs { "model": ..., "input": [...] } and expects an OpenAI-like response structure (see the sketch after this list).
- CACHE_DIR: Model/data cache directory (default: data/cache).
- NUM_WORKERS: Default number of processes for multiprocessing/parallel tasks (default: 64; can be adjusted for data preprocessing scenarios).
- SAVE_EVERY: Checkpoint saving frequency for long tasks (default: 256).
- LLM_TEMPERATURE: Default temperature for LLM inference (default: 0.0).
- QA_LLM: Default LLM for QA (default: gpt-4o-mini).
- KEYPHRASE_MAX_TOKENS, KEYPHRASE_TEMPERATURE: Default parameters for keyphrase extraction (defaults: 100 and 0.0).
- SUMMARY_MAX_TOKENS, SUMMARY_TEMPERATURE: Default parameters for summary extraction (defaults: 500 and 0.0).
- USERFACT_MAX_TOKENS, USERFACT_TEMPERATURE: Default parameters for user fact extraction (defaults: 2000 and 1.0).
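The sketch below illustrates the request/response shape implied by the EMBEDDING_API_URL / EMBEDDING_API_KEY note above; the bearer-token header and the model name are assumptions, so adapt them to your provider:

```python
# Illustrative request to an external, OpenAI-compatible embedding endpoint.
# The Authorization header and model name are assumptions; the POST body
# { "model": ..., "input": [...] } follows the note above.
import os
import requests

resp = requests.post(
    os.environ["EMBEDDING_API_URL"],
    headers={"Authorization": f"Bearer {os.environ.get('EMBEDDING_API_KEY', '')}"},
    json={"model": "text-embedding-3-small", "input": ["example text to embed"]},
    timeout=30,
)
resp.raise_for_status()
# OpenAI-like response: {"data": [{"index": 0, "embedding": [...]}, ...]}
print(len(resp.json()["data"][0]["embedding"]))
```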
Example: Setting environment variables in Linux/macOS shell:
export OPENAI_API_KEY="your_api_key"
export OPENAI_BASE_URL="http://localhost:8001/v1"
Recommendation: Place project-level defaults in the root directory .env file (the repository already includes .env.example), and place overriding .env files specific to sub-processes (e.g., evals/, data_preprocessing/) in the respective subdirectories.
The repository currently supports the following models:
Embedding Models:
- contriever: Facebook Contriever
- stella: Stella-1.5B-v5
- gte: GTE-Qwen2-7B-instruct
- all-MiniLM-L6-v2, all-mpnet-base-v2: SentenceTransformers
- openai: OpenAI embeddings (defaults to text-embedding-3-small)
- bm25: BM25 sparse retrieval
LLM Models:
- OpenAI models: gpt-4o-mini, gpt-4, etc.
- Local models via vLLM: meta-llama/Meta-Llama-3.1-8B-Instruct, etc.
- Third-party API services configured via environment variables
.
├── README.md # documentation
├── data/ # Raw data and processed outputs (HaluMem, LongMemEval, etc.)
├── data_preprocessing/ # Data preprocessing and expansion scripts
│ └── lme_deduplicate.py # LongMemEval deduplication example
├── src/ # Core code: flat and graph pipelines
│ ├── config.py # Environment and configuration loader (layered .env)
│ ├── flat/ # Non-graph (embedding-based) implementation
│ │ ├── agentic_memory_system.py
│ │ ├── halu_run.py
│ │ ├── halu_utils.py
│ │ └── llm_controller.py
│ └── graph/ # Graph-based pipeline (GraphRAG, etc.)
│ ├── graphrag.py
│ ├── lme_construct_graph.py
│ ├── halu_construct_graph.py
│ └── lme_run_retrieval.py
├── evals/ # Evaluation scripts (recall, QA, graph eval)
│ ├── halu_eval.py
│ ├── halu_graph_eval.py
│ ├── lme_compute_recall.py
│ └── lme_compute_qa.py
├── scripts/ # Example run scripts (construct/retrieve/evaluate)
│ ├── halu_run.sh
│ ├── lme_run_retrieval.sh
│ ├── graph_lme_construct.sh
│ └── graph_lme_run_retrieval.sh
├── model_cache/ # Local model cache (hub)
├── sample_data/ # Small sample data
└── README.assets/ # Documentation images/resources
Note: The above is a simplified view of the repository. The actual directory may contain additional scripts, configuration files, and output directories (e.g., data/*, model snapshots under checkpoints/, etc.).
To use local LLMs via vLLM:
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Meta-Llama-3.1-8B-Instruct \
--port 8001 \
--tensor-parallel-size 1
# Run evaluation pointing to vLLM server
python halu_run.py \
--llm_model meta-llama/Meta-Llama-3.1-8B-Instruct \
--base_url http://localhost:8001/v1 \
    ...
This work builds upon: