smart_importer

https://github.com/beancount/smart_importer

Augments Beancount importers with machine learning functionality.

Status

Working prototype, development status: beta

Installation

The smart_importer can be installed from PyPI:

pip install smart_importer

Quick Start

This package provides import hooks that can modify the imported entries. When running the importer, the existing entries will be used as training data for a machine learning model, which will then predict entry attributes.

The following example shows how to apply the PredictPostings hook to an existing CSV importer:

from beangulp.importers import csv
from beangulp.importers.csv import Col

from smart_importer import PredictPostings


class MyBankImporter(csv.Importer):
    '''Conventional importer for MyBank'''

    def __init__(self, *, account):
        super().__init__(
            {Col.DATE: 'Date',
             Col.PAYEE: 'Transaction Details',
             Col.AMOUNT_DEBIT: 'Funds Out',
             Col.AMOUNT_CREDIT: 'Funds In'},
            account,
            'EUR',
            (
                'Date, Transaction Details, Funds Out, Funds In'
            )
        )


CONFIG = [
    MyBankImporter(account='Assets:MyBank:MyAccount'),
]

HOOKS = [
    PredictPostings().hook
]

Documentation

This section explains in detail the relevant concepts and artifacts needed for enhancing Beancount importers with machine learning.

Beancount Importers

Let's assume you have created an importer for "MyBank" called MyBankImporter:

from beangulp import importer

class MyBankImporter(importer.Importer):
    """My existing importer"""
    # the actual importer logic would be here...

Note: This documentation assumes you already know how to create Beancount/Beangulp importers. Relevant documentation can be found in the Beancount import documentation. With beangulp, users can write their own importers and use them to convert downloaded bank statements into lists of Beancount entries. Examples are provided as part of beangulp's source code under examples/importers.

smart_importer only works by appending postings to incomplete transactions that have a single posting (i.e., it will not modify postings that use placeholder accounts like "Expenses:TODO"). The extract method in the importer should follow the latest interface and include an existing_entries argument.

Using smart_importer as a beangulp hook

Beangulp has the notion of hooks; for a detailed example, see the beangulp hook example <https://github.com/beancount/beangulp/blob/ead8a2517d4f34c7ac7d48e4ef6d21a88be7363c/examples/import.py#L50>. Hooks can be used to apply smart_importer to all importers.

PredictPostings - predict the list of postings.
PredictPayees - predict the payee of the transaction.

For example, to convert an existing MyBankImporter into a smart importer:

from your_custom_importer import MyBankImporter
from smart_importer import PredictPayees, PredictPostings

CONFIG = [
    MyBankImporter('whatever', 'config', 'is', 'needed'),
]

HOOKS = [
    PredictPostings().hook,
    PredictPayees().hook
]
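
For orientation, a hook is just a callable. The sketch below is a minimal, stdlib-only illustration. It assumes the calling convention of the linked beangulp example: the hook receives a list of (filename, entries, account, importer) tuples plus the existing ledger entries, and returns a list of the same shape. The name tag_source_file and the dict-based stand-in entries are hypothetical; real entries are Beancount directives.

```python
def tag_source_file(extracted_entries_list, existing_entries=None):
    """Hypothetical hook: record the source file name on every entry."""
    result = []
    for filename, entries, account, importer in extracted_entries_list:
        # Build new entries instead of mutating the input in place.
        tagged = [{**entry, "source": filename} for entry in entries]
        result.append((filename, tagged, account, importer))
    return result


# Stand-in data: plain dicts instead of Beancount directives.
extracted = [("bank.csv", [{"payee": "Bakery"}], "Assets:MyBank:MyAccount", None)]
processed = tag_source_file(extracted, existing_entries=[])
```

A function like this could be listed in HOOKS alongside PredictPostings().hook.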

Wrapping an importer to become a smart_importer

Instead of using a beangulp hook, it's possible to wrap any importer to become a smart importer. This modifies only the wrapped importer.

PredictPostings - predict the list of postings.
PredictPayees - predict the payee of the transaction.

For example, to convert an existing MyBankImporter into a smart importer:

from your_custom_importer import MyBankImporter
from smart_importer import PredictPayees, PredictPostings

CONFIG = [
    PredictPostings().wrap(
        PredictPayees().wrap(
            MyBankImporter('whatever', 'config', 'is', 'needed')
        )
    ),
]

HOOKS = [
]
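
Conceptually, wrapping is a delegation pattern: the wrapper exposes the same interface as the original importer and post-processes the result of extract. The following stdlib-only sketch illustrates the idea with hypothetical names (FakeImporter, PostProcessingWrapper, add_second_posting) and dict-based stand-in entries. It is not smart_importer's actual implementation.

```python
class FakeImporter:
    """Hypothetical stand-in for a real importer."""

    def extract(self, filepath, existing):
        return [{"payee": "Bakery", "postings": ["Assets:MyBank:MyAccount"]}]


class PostProcessingWrapper:
    """Delegates to the wrapped importer, then post-processes its entries."""

    def __init__(self, importer, postprocess):
        self.importer = importer
        self.postprocess = postprocess

    def extract(self, filepath, existing):
        entries = self.importer.extract(filepath, existing)
        return [self.postprocess(entry, existing) for entry in entries]

    def __getattr__(self, name):
        # Forward any other attribute lookups to the wrapped importer.
        return getattr(self.importer, name)


def add_second_posting(entry, existing):
    # Placeholder for a prediction step: append a fixed second posting.
    return {**entry, "postings": entry["postings"] + ["Expenses:Food"]}


wrapped = PostProcessingWrapper(FakeImporter(), add_second_posting)
entries = wrapped.extract("bank.csv", existing=[])
```

Because only the wrapped object is affected, other importers in CONFIG keep their original behavior.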

Specifying Training Data

The smart_importer hooks need training data, i.e., an existing list of transactions, in order to be effective. Training data can be specified by calling the import script's extract command with an argument that references existing Beancount transactions, e.g., import.py extract -e existing_transactions.beancount. When using the importer in Fava, the existing entries are used as training data automatically.

Usage with Fava

Smart importers play nice with Fava. This means you can use smart importers together with Fava in the exact same way as you would do with a conventional importer. See Fava's help on importers for more information.

Development

Pull requests welcome!

Executing the Unit Tests

Simply run (requires tox):

make test

Configuring Logging

Python's logging module is used by the smart_importer module. The log level can be changed as follows:

import logging
logging.getLogger('smart_importer').setLevel(logging.DEBUG)
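
Setting the logger's level alone may not make debug messages visible: if no handler is configured, Python's fallback handler only emits messages at WARNING level and above. A minimal sketch that also configures a handler, using only the standard library (the format string is just an example):

```python
import logging

# Configure a root handler; without one, Python's last-resort handler
# only emits messages at WARNING level and above.
logging.basicConfig(
    level=logging.WARNING,  # keep other libraries quiet
    format="%(levelname)s %(name)s: %(message)s",
)

# Raise verbosity for smart_importer only.
logger = logging.getLogger("smart_importer")
logger.setLevel(logging.DEBUG)
```

Placing this near the top of the import script should apply it before any importer or hook runs.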

Using Tokenizer

Custom tokenizers can let smart_importer support more languages, e.g., Chinese.

If you are looking for a Chinese tokenizer, you can follow this example:

First make sure that jieba is installed in your Python environment:

pip install jieba

In your importer code, you can then pass jieba to be used as tokenizer:

from smart_importer import PredictPostings
import jieba

jieba.initialize()
tokenizer = lambda s: list(jieba.cut(s))

predictor = PredictPostings(string_tokenizer=tokenizer)
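
A tokenizer here is simply a callable that takes a string and returns a list of tokens, as the jieba lambda above does. As a dependency-free illustration, the sketch below defines a hypothetical simple_tokenizer that lowercases the text and splits it into runs of letters, digits, and underscores. Whether this improves predictions for a given ledger is untested.

```python
import re

_TOKEN_RE = re.compile(r"\w+")


def simple_tokenizer(text):
    """Lowercase the text and split it into word-like tokens."""
    return _TOKEN_RE.findall(text.lower())


tokens = simple_tokenizer("REWE Markt GmbH, Berlin-Mitte 0815")
```

It could then be passed in the same way as the jieba tokenizer: PredictPostings(string_tokenizer=simple_tokenizer).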

Privacy

smart_importer uses machine learning (artificial intelligence, AI) algorithms in an ethical, privacy-conscious way: All data processing happens on the local machine; no data is sent to or retrieved from external servers or the cloud. All the code, including the machine learning implementation, is open-source.

Model: The machine learning model used in smart_importer is a classification model. The goal of the classification model is to predict transaction attributes, such as postings/accounts and payee names, in order to reduce the manual effort when importing transactions. The model is implemented using the open-source scikit-learn library, specifically using scikit-learn's SVC (support vector machine) implementation.

Training data: The model is trained on historical transactions from your Beancount ledger. This training happens on-the-fly when the import process is started, by reading existing_entries from the importer. The trained model is used locally on your machine during the import process, as follows.

Input: The input data are the transactions to be imported. Typically, these are transactions with a single posting, where one posting (e.g., the bank account) is known and the other one is missing.

Output: The output data are transactions with predicted second postings and/or other predicted transaction attributes.
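
To make the training, input, and output steps concrete, here is a deliberately simplified, stdlib-only sketch. It does not use scikit-learn's SVC: it swaps in a naive token-overlap vote, and it uses plain dicts instead of Beancount transactions. All names (train, predict_account) are hypothetical.

```python
from collections import Counter, defaultdict


def train(existing):
    """Count how often each narration token occurs with each account."""
    model = defaultdict(Counter)
    for txn in existing:
        for token in txn["narration"].lower().split():
            model[token][txn["account"]] += 1
    return model


def predict_account(model, narration):
    """Return the account with the most token votes, or None if unseen."""
    votes = Counter()
    for token in narration.lower().split():
        votes.update(model.get(token, {}))
    return votes.most_common(1)[0][0] if votes else None


# Training data: historical transactions with a known second account.
existing = [
    {"narration": "Bakery Sunrise", "account": "Expenses:Food"},
    {"narration": "City Bakery", "account": "Expenses:Food"},
    {"narration": "Metro ticket", "account": "Expenses:Transport"},
]
model = train(existing)

# Input: new transactions whose second posting is missing.
predicted = predict_account(model, "Bakery downtown")
unknown = predict_account(model, "Bookshop")
```

Here the first narration shares a token with known food expenses, while the second has no overlap with the training data, so no account is suggested for it.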

Accuracy and Feedback Loops: The effectiveness of the model depends on the volume and diversity of your historical data — small or homogeneous datasets may result in poor predictions. Predictions are made automatically when importing new transactions, but users should always review them for accuracy before committing them to the ledger. Users can manually adjust predictions (e.g., change the payee or account) and save the corrected transactions to their ledger. These corrections are then used as training data for future predictions, allowing the accuracy to improve over time.

The smart_importer project is fully open source, meaning you can inspect and modify the code as needed.
