Questions generator & validator #446
Open
kiyro7 wants to merge 24 commits into master from questions_generator
24 commits
- 835b5ce initial commit (kiyro7)
- 39d54cf first prototype (kiyro7)
- d813c88 added LLM questions marker (kiyro7)
- 8e42c8e removed methodology (kiyro7)
- 48ed43c requirements.txt added versions (kiyro7)
- 1bcf046 simplified docker (kiyro7)
- 8a54af1 heuristic patterns update (kiyro7)
- d7a57d7 updated questions ranking and added examples (kiyro7)
- e20a3e0 docker-compose finally done (kiyro7)
- 5694ae7 interactive mode (kiyro7)
- 6ec4877 logging added (kiyro7)
- 0b28da7 logging update (kiyro7)
- 39f7626 docker fix (builds aprox 40 mins) (kiyro7)
- bee9a7a fixed heuristic questions generation (kiyro7)
- a16784b clearing (kiyro7)
- c2df6e4 created static folder (kiyro7)
- 666535d full logs refactor and translation to russian (kiyro7)
- e92b6ac stashed new question generator code for future updates (kiyro7)
- e7e72da docker update - llm & stuff and code separated (kiyro7)
- 21b960b global question generator refactor (kiyro7)
- a89eb4d added instructions (kiyro7)
- dc1cbd4 docx_parser from document_insight_system prototype (works) & docker f… (kiyro7)
- 9adab60 testing paragraphs max nesting (founded max depth - 1) (kiyro7)
- 7db5faf first prototype of chapters detection (works) (kiyro7)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| FROM python:3.10-slim | ||
| | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
|     git wget gcc g++ \ | ||
|     libprotobuf-dev protobuf-compiler \ | ||
|     libreoffice-core \ | ||
|     libreoffice-writer \ | ||
|     && rm -rf /var/lib/apt/lists/* | ||
| | ||
| WORKDIR /app | ||
| | ||
| ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \ | ||
|     PIP_DEFAULT_TIMEOUT=120 | ||
| | ||
| COPY requirements.txt ./ | ||
| RUN pip install --no-cache-dir -r requirements.txt | ||
| | ||
| COPY . . | ||
| | ||
| CMD ["bash"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,94 @@ | ||
| ## Running | ||
| | ||
| ### Build | ||
| - `docker-compose up init-llm` - downloads the model and installs the dependencies for the LLM module | ||
| | ||
| ### Usage | ||
| - `docker-compose up llm` - starts the container with the LLM module | ||
| - `docker-compose up app` - starts the container with the main application | ||
| - `docker compose exec app python run.py /app/static/vkr_examples/VKR1.docx --no-overflow-logs` - runs question generation for a thesis (VKR) file | ||
| | ||
| | ||
| ## Example of questions generated from a thesis (VKR) text | ||
| | ||
| [✔ OK] How are the goal and objectives formulated in the introduction reflected in the final findings of the conclusion? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✔ OK] Which terms and approaches from the domain overview formed the basis of the formal problem statement? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✖ FAIL] In which solution requirements listed in the problem statement are the goals of the work reflected? | ||
| - relevance: False | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✖ FAIL] Which quantitative or qualitative properties of the solution are confirmed in the "Research" section, and how do they relate to the objectives in the introduction? | ||
| - relevance: True | ||
| - clarity: False | ||
| - complexity: False | ||
| | ||
| [✔ OK] How does the practical significance of the work follow from the objectives and results of the research? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✔ OK] Which limitations of the solution method are stated in the text, and how do they affect achieving the goal? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| --- rut5-base-multitask questions --- | ||
| | ||
| [✖ FAIL] What constitutes a relevant problem? | ||
| - relevance: True | ||
| - clarity: False | ||
| - complexity: False | ||
| | ||
| [✔ OK] What will allow the user to obtain information about objects located on the map? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✔ OK] What makes it possible to make more well-founded and optimal decisions? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✖ FAIL] What determines the position of points on the ellipsoid? | ||
| - relevance: True | ||
| - clarity: False | ||
| - complexity: False | ||
| | ||
| [✔ OK] What determines the relative position of a point on the plane? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✔ OK] In which coordinate system does the geographic coordinate system lie? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✔ OK] By problem orientation: do universal geographic ones solve general problems? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✔ OK] Which GIS are created at a scale of 1:4 000 000 and smaller? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False | ||
| | ||
| [✖ FAIL] What is a specialized GIS intended for? | ||
| - relevance: True | ||
| - clarity: False | ||
| - complexity: False | ||
| | ||
| [✔ OK] The following criteria were chosen to compare the presented GIS? | ||
| - relevance: True | ||
| - clarity: True | ||
| - complexity: False |
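The listing above has a regular shape: a verdict line in square brackets followed by three `- key: value` flags. A minimal sketch of reading such a listing back into records, assuming exactly this layout; the `parse_report` name and the dictionary shape are hypothetical and not part of this PR, and the flag labels are stored as-is without being interpreted:

```python
import re

# Matches a verdict line such as "[✔ OK] question text" or "[✖ FAIL] question text".
VERDICT_RE = re.compile(r"^\[\S+\s+(OK|FAIL)\]\s*(.+)$")
# Matches a flag line such as "- key: True" or "- key:False".
FLAG_RE = re.compile(r"^-\s*([^:]+):\s*(True|False)$")


def parse_report(text):
    """Parse the printed listing into a list of dicts (hypothetical helper)."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        verdict = VERDICT_RE.match(line)
        if verdict:
            records.append({
                "ok": verdict.group(1) == "OK",
                "question": verdict.group(2),
                "flags": {},
            })
            continue
        flag = FLAG_RE.match(line)
        if flag and records:
            # Attach the flag to the most recent question.
            records[-1]["flags"][flag.group(1).strip()] = flag.group(2) == "True"
    return records
```

Separator lines such as the `--- rut5-base-multitask ... ---` marker match neither pattern and are skipped.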
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| services: | ||
|   init-llm: | ||
|     build: ./llm_service | ||
|     entrypoint: [ "/bin/sh", "-c" ] | ||
|     command: [ "/usr/local/bin/init-volumes.sh" ] | ||
|     volumes: | ||
|       - rut5_model:/models/rut5 | ||
|     restart: "no" | ||
| | ||
|   llm: | ||
|     build: ./llm_service | ||
|     depends_on: | ||
|       init-llm: | ||
|         condition: service_completed_successfully | ||
|     volumes: | ||
|       - rut5_model:/models/rut5 | ||
|     ports: | ||
|       - "8000:8000" | ||
| | ||
|   app: | ||
|     build: . | ||
|     stdin_open: true | ||
|     tty: true | ||
|     volumes: | ||
|       - nltk_data:/nltk_data | ||
|       - ./static/vkr_examples:/app/static/vkr_examples | ||
|     command: ["bash", "-lc", "sleep infinity"] | ||
| | ||
| volumes: | ||
|   rut5_model: | ||
|   nltk_data: |
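The compose file publishes the `llm` service on port 8000, but nothing in it tells a caller when that service has finished loading. A minimal sketch of a readiness check, assuming the service accepts TCP connections on the published port once it is up; `wait_for_port` is a hypothetical helper, not something this PR provides:

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

From the host, a call such as `wait_for_port("localhost", 8000)` could be made before sending the first request. An open port only shows that the process is listening, not that the model is fully loaded.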
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| # Files taken on 26.01.2026 from https://github.com/moevm/document_insight_system/tree/master/app/main/reports | ||
| | ||
| ### Staff | ||
| python -m document_parsers.document --help | ||
| python -m document_parsers.docx_uploader docx_parser --help | ||
| | ||
| (open() can see the file this way, but docx.Document cannot) | ||
| python -m document_parsers.docx_uploader docx_parser --file static/vkr_examples/VKR1.docx | ||
| | ||
| | ||
| | ||
| # Running and testing | ||
| | ||
| Prerequisites: `argparse`, `python-docx`, `docx2python`, `re`, `subprocess`, `markdown`. Parsing `.doc` files requires | ||
| LibreOffice. | ||
| | ||
| Here and below, the repository root directory is assumed to be on `PYTHONPATH`. | ||
| | ||
| The code for checking text documents is split into Python packages: | ||
| | ||
| ## `docx_uploader` | ||
| | ||
| Proof-of-concept parsing of `.docx` files that prints the file | ||
| structure as text to stdout. | ||
| | ||
| Run: `python3 -m app.main.mse22.docx_uploader [--help|-h] docx_parser --file <path_to_docx_file>` | ||
| | ||
| Concrete examples: | ||
| | ||
| `python3 -m app.main.mse22.docx_uploader docx_parser --file ~/my/beatiful/file.docx` | ||
| - parses the file `~/my/beatiful/file.docx`; | ||
| | ||
| `python3 -m app.main.mse22.docx_uploader --help` | ||
| - shows brief help; | ||
| | ||
| `python3 -m app.main.mse22.docx_uploader docx_parser --file ~/my/beatiful/file.docx > /dev/null && echo $?` | ||
| - checks that the package runs without errors on the file `~/my/beatiful/file.docx`, without | ||
| printing the file contents. | ||
| | ||
| ## `doc` | ||
| | ||
| Converts `.doc` and `.odt` files to `.docx` using a third-party program (LibreOffice) for subsequent parsing. | ||
| | ||
| `python3 -m app.main.mse22.converter_to_docx convert --filename <path_to_file>` | ||
| | ||
| Example: `python3 -m app.main.mse22.converter_to_docx convert --filename ~/my/beatiful/file.doc` | ||
| | ||
| ## `document` | ||
| | ||
| Parses files and builds auxiliary structures that will be | ||
| used for checking documents, printing the result to stdout. | ||
| | ||
| Run: `python3 -m app.main.mse22.document [-h|--help] --filename <path_to_docx_file> --type <type_of_file>` | ||
| | ||
| File type: | ||
| | ||
| - LR - lab report | ||
| - FWQ - final qualifying work (thesis) | ||
| | ||
| Concrete examples: | ||
| | ||
| `python3 -m app.main.mse22.document --help` | ||
| - shows brief help; | ||
| | ||
| `python3 -m app.main.mse22.document --filename ~/my/beatiful/file.docx --type FWQ` | ||
| - parses the file `~/my/beatiful/file.docx`; | ||
| | ||
| `python3 -m app.main.mse22.document --filename ~/my/beatiful/file.docx --type FWQ > /dev/null && echo $?` | ||
| - checks that the package runs without errors on the file `~/my/beatiful/file.docx`, without | ||
| printing the file contents. | ||
| | ||
| ## `PDF` | ||
| | ||
| Extracts text page by page from a file by converting the file to PDF. | ||
| | ||
| ```bash | ||
| $ python3 -m app.main.mse22.pdf_document text_from_pages --filename path_to_file | ||
| ``` | ||
| ## `MD` | ||
| | ||
| Parses `.md` files and prints the file structure as text to stdout. | ||
| | ||
| ```bash | ||
| $ python3 -m app.main.reports.md_uploader md_parser --mdfile path_to_md_file | ||
| ``` |
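The README's `> /dev/null && echo $?` examples check that a parser exits cleanly while discarding its output. The same check can be sketched in Python, for example inside a test script. Both function names below are hypothetical helpers, not part of the imported packages:

```python
import subprocess
import sys


def python_module_argv(module, *args):
    """Build an argument list equivalent to `python -m <module> <args...>`."""
    return [sys.executable, "-m", module, *args]


def runs_cleanly(argv):
    """Run a command, discard its stdout, and report whether it exited with code 0."""
    result = subprocess.run(argv, stdout=subprocess.DEVNULL)
    return result.returncode == 0
```

With the module path used earlier in this README, a call might look like `runs_cleanly(python_module_argv("document_parsers.docx_uploader", "docx_parser", "--file", "static/vkr_examples/VKR1.docx"))`. Error output still reaches stderr, as it does with the shell form.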
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| import os | ||
| import subprocess | ||
| from os.path import dirname | ||
| | ||
| | ||
| def run_process(cmd: str): return subprocess.run(cmd.split(' ')) | ||
| | ||
| | ||
| def convert_to(filepath, target_format='pdf'): | ||
|     new_filename, outdir = None, dirname(filepath) | ||
|     convert_cmd = { | ||
|         'pdf': f"soffice --headless --convert-to pdf --outdir {outdir} {filepath}", | ||
|         'docx': f"soffice --headless --convert-to docx --outdir {outdir} {filepath}", | ||
|         'pptx': f"soffice --headless --convert-to pptx --outdir {outdir} {filepath}", | ||
|     }[target_format] | ||
| | ||
|     if run_process(convert_cmd).returncode == 0: | ||
|         # success conversion | ||
|         new_filename = "{}.{}".format(filepath.rsplit('.', 1)[0], target_format) | ||
| | ||
|     return new_filename | ||
| | ||
| | ||
| def open_file(filepath, remove=False): | ||
|     file = open(filepath, 'rb') | ||
|     if remove: os.remove(filepath) | ||
|     return file |
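The converter above builds the `soffice` command as one string and splits it on spaces, so a path such as `my thesis.doc` would be split into two arguments. One possible alternative, offered as a sketch rather than as the PR's code, passes the arguments as a list. `build_convert_cmd` and `SUPPORTED_FORMATS` are hypothetical names, and raising `ValueError` for an unknown format is an assumption (the original raises `KeyError` from the dict lookup):

```python
import subprocess
from os.path import dirname

SUPPORTED_FORMATS = ("pdf", "docx", "pptx")


def build_convert_cmd(filepath, target_format="pdf"):
    """Build the soffice argument list without splitting a command string."""
    if target_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported target format: {target_format}")
    # An empty dirname means the file is in the current directory.
    outdir = dirname(filepath) or "."
    return ["soffice", "--headless", "--convert-to", target_format,
            "--outdir", outdir, filepath]


def convert_to(filepath, target_format="pdf"):
    """Return the converted file's path, or None if soffice reports failure."""
    cmd = build_convert_cmd(filepath, target_format)
    if subprocess.run(cmd).returncode == 0:
        return "{}.{}".format(filepath.rsplit(".", 1)[0], target_format)
    return None
```

Keeping command construction in its own function lets it be checked without LibreOffice installed.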
2 changes: 2 additions & 0 deletions
app/questions_generator/document_parsers/document/__init__.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| from .chapter import * | ||
| from .document import Document |
20 changes: 20 additions & 0 deletions
app/questions_generator/document_parsers/document/__main__.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| import argparse | ||
| | ||
| from .document import main as document_main | ||
| | ||
| | ||
| def parse_args(): | ||
|     parser = argparse.ArgumentParser(description="File name") | ||
|     parser.add_argument("--filename", type=str, required=True, help="path to .docx file") | ||
|     parser.add_argument("--type", type=str, required=True, help="LR or FWQ") | ||
|     parser.set_defaults(func=document_main) | ||
|     return parser.parse_args() | ||
| | ||
| | ||
| def main(): | ||
|     args = parse_args() | ||
|     args.func(args) | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
|     main() |
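The `--type` option above accepts any string, although its help text documents only `LR` and `FWQ`. A minimal sketch of letting `argparse` enforce that with its standard `choices` parameter; `build_parser` and the description text are hypothetical and not taken from this PR:

```python
import argparse


def build_parser():
    """Build a parser that rejects document types other than LR and FWQ."""
    parser = argparse.ArgumentParser(description="Parse a document and print its structure")
    parser.add_argument("--filename", type=str, required=True, help="path to .docx file")
    # choices makes argparse exit with a usage error for any other value.
    parser.add_argument("--type", type=str, required=True, choices=["LR", "FWQ"],
                        help="LR or FWQ")
    return parser
```

Returning the parser instead of the parsed arguments also makes it possible to pass an explicit argument list in tests, for example `build_parser().parse_args(["--filename", "a.docx", "--type", "FWQ"])`.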
3 changes: 3 additions & 0 deletions
app/questions_generator/document_parsers/document/chapter/__init__.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| from .chapter import Chapter | ||
| from .chapter_creator import ChapterCreator | ||
| from .chapter_object import * |
4 changes: 4 additions & 0 deletions
app/questions_generator/document_parsers/document/chapter/chapter.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| class Chapter: | ||
|     def __init__(self, title, objects): | ||
|     	self.header = title | ||
|     	self.pageObjects = objects |
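`Chapter` is a plain container: a header plus the list of objects that belong to it. The `ChapterCreator` that fills it is not visible in this part of the diff, so the following is only an illustrative sketch of how such containers might be populated. `split_into_chapters` and its `(is_heading, text)` input format are assumptions made for the example:

```python
class Chapter:
    # Same shape as the class in the diff above.
    def __init__(self, title, objects):
        self.header = title
        self.pageObjects = objects


def split_into_chapters(items):
    """Group (is_heading, text) pairs into Chapter objects (hypothetical helper).

    Text that appears before the first heading is skipped.
    """
    chapters = []
    for is_heading, text in items:
        if is_heading:
            # Each heading starts a new chapter with an empty object list.
            chapters.append(Chapter(text, []))
        elif chapters:
            chapters[-1].pageObjects.append(text)
    return chapters
```

Each chapter gets its own list, so appending to one chapter never affects another.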