Added basic description of evaluation procedure to developer documentation

KarlLundengaard · KarlLundengaard · commit 1a94457d45f1 · 2025-03-19T11:05:24.000Z
diff --git a/app/docs/dev.md b/app/docs/dev.md
@@ -1,6 +1,16 @@
 # CompareExpressions
 This function utilises [`SymPy`](https://docs.sympy.org/latest/index.html) to provide a maths-aware evaluation of a learner's response.
 
+## Architecture overview
+The execution of the evaluation function follows this pattern:
+
+- Determine context
+- Parse response and answer data
+- Parse criteria
+- Store input parameters and parsed responses in a key-value store that allows adding new fields but not editing existing fields
+- Execute the feedback generation procedure provided by the context to generate written feedback and tags
+- Serialise generated feedback and tags in a suitably formatted dictionary
+
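As an illustration of the key-value store described in the list above (new fields can be added, existing fields cannot be edited), here is a minimal Python sketch. The class name `AppendOnlyStore` and the field names are hypothetical and are not taken from the repository.

```python
class AppendOnlyStore:
    """Key-value store that accepts new keys but refuses to overwrite existing ones."""

    def __init__(self, **initial):
        self._data = dict(initial)

    def __setitem__(self, key, value):
        # Reject any attempt to edit a field that has already been stored.
        if key in self._data:
            raise KeyError(f"field '{key}' already exists and cannot be edited")
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data


store = AppendOnlyStore(response="x+1", answer="1+x")
store["parsed_response"] = "parsed x+1"  # adding a new field is allowed
try:
    store["response"] = "x+2"  # editing an existing field is rejected
    overwrite_rejected = False
except KeyError:
    overwrite_rejected = True
```

A store like this makes it harder for a later step to silently change what an earlier step recorded, which is presumably the reason for the restriction.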
 ## Evaluation function
 
 The main evaluation function is found in `evaluation.py` and has the following signature:
@@ -37,7 +47,7 @@ The overall flow of the evaluation procedure can be described as follows:
 ### Context
 
 The context is a data structure that contains at least the following seven pieces of information:
-- `default_parameters` A dictionary where the keys are parameter names and the values are the default values that the evaluation function will use unless another value is provided together with the response. The only required field is the 
+- `default_parameters` A dictionary where the keys are parameter names and the values are the default values that the evaluation function will use unless another value is provided together with the response. The required fields are context-dependent. Currently all contexts use the default parameters found in `utility\expression_utilities.py`, and the `physical_quantity` context adds a few extra fields; see the default parameters defined in `context\physical_quantity.py`.
 - `expression_parse` function that parses expressions (i.e. the `response` and `answer` inputs) into the form used by the feedback generation procedure.
 - `expression_preprocess` function that performs string manipulations that ensure that correctly written input expressions follow the conventions expected by `expression_parse`.
 - `expression_preview` is a function that generates a string that can be turned into a human-readable representation of how the evaluation function interpreted the response.
@@ -50,12 +60,13 @@ The context can also contain other fields if necessary.
 **Remark:** The current implementation uses a dictionary rather than a dedicated class for ease of iteration during the initial development phase.
 
 There are currently two different contexts:
-- `symbolic` Handles comparisons of various symbolic expressions. Defined in `context\symbolic.py`.
-- `physical_quantity` Handles comparisons of expressions involving units. Defined in `context\physical_quantity.py`.
+- `symbolic`: Handles comparisons of various symbolic expressions. Defined in `context\symbolic.py`.
+- `physical_quantity`: Handles comparisons of expressions involving units. Defined in `context\physical_quantity.py`.
 
 **Remark:** Handwritten expressions are sent as latex, which requires extra preprocessing before the right context can be determined in some cases. It should be considered whether a new context, perhaps called `handwritten`, should be created for this purpose.
 
-**TODO** Describe currently available contexts
+**TODO** Describe currently available contexts in detail
+
 #### `symbolic` - Comparison of symbolic expressions
 
 **Remark:** The `symbolic` context should probably be split into several smaller contexts, the following subdivision is suggested:
@@ -65,18 +76,138 @@ There are currently two different contexts:
 - `inequality`: Same as `equality` except for mathematical inequalities (which will require different choices when it comes to what can be considered equivalence). It might be appropriate to combine `equality` and `inequality` into one context (called `statements` or similar).
 - `collection`: Comparison of collections (e.g. sets, lists or intervals of the number line). Likely to consist mostly of code for handling comparison of individual elements using the other contexts, and configuring what counts as equivalence between different collections.
 
+##### `symbolic` Criteria commands and grammar
+
+The criteria commands use the following productions:
+```
+    START -> BOOL
+    BOOL -> EQUAL
+    BOOL -> ORDER
+    BOOL -> EQUAL where EQUAL
+    BOOL -> EQUAL where EQUAL_LIST
+    BOOL -> RESERVED written as OTHER
+    BOOL -> RESERVED written as RESERVED
+    BOOL -> RESERVED contains OTHER
+    BOOL -> RESERVED contains RESERVED
+    EQUAL_LIST -> EQUAL;EQUAL
+    EQUAL_LIST -> EQUAL_LIST;EQUAL
+    EQUAL -> OTHER = OTHER
+    EQUAL -> RESERVED = OTHER
+    EQUAL -> OTHER = RESERVED
+    EQUAL -> RESERVED = RESERVED
+    EQUAL -> OTHER ORDER OTHER
+    EQUAL -> RESERVED ORDER OTHER
+    EQUAL -> OTHER ORDER RESERVED
+    EQUAL -> RESERVED ORDER RESERVED
+    OTHER -> RESERVED OTHER
+    OTHER -> OTHER RESERVED
+    OTHER -> OTHER OTHER
+```
+along with the following base tokens:
+
+- `START`: Formal token used to indicate the start of an expression (in practice: any expression that can be reduced to a single `START` is a parseable criterion).
+- `END`: Formal token that indicates the end of a tokenized string.
+- `NULL`: Formal token that denotes a token without meaning; it should not appear when an expression is tokenized.
+- `BOOL`: Expression that can be reduced to either `True` or `False`.
+- `EQUAL`: Token that denotes symbolic equality between the mathematical expressions.
+- `EQUALITY`: Token that denotes the equality operator `=`.
+- `EQUAL_LIST`: Token that denotes a list of equalities.
+- `RESERVED`: Token that denotes a reserved name for an expression. Reserved names include `response` and `answer`.
+- `ORDER`: Token that denotes an order operator. Order operators include `>`, `<`, `>=` and `<=`.
+- `WHERE`: Token that denotes the separation of a criterion and a list of equalities that describe substitutions that should be done before the criterion is checked.
+- `WRITTEN_AS`: Token that denotes that a syntactical comparison should be done.
+- `CONTAINS`: Token that denotes that a mathematical expression is dependent on a symbol or subexpression.
+- `SEPARATOR`: Token that denotes which symbol is used to separate the list of equalities used by `WHERE`.
+- `OTHER`: Token that denotes any substring that will be passed on for more context specific parsing (e.g. explicit mathematical expressions for symbolic comparisons).
+
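To make the base tokens above more concrete, here is a minimal Python sketch of a tokenizer for criterion strings. It only splits a string into tokens and does not apply the productions. The token table, the function name `tokenize` and the regular expressions are assumptions made for illustration and are not the repository's implementation.

```python
import re

# Hypothetical token table; the literal spellings follow the productions above.
TOKEN_PATTERNS = [
    ("WHERE", r"\bwhere\b"),
    ("WRITTEN_AS", r"\bwritten as\b"),
    ("CONTAINS", r"\bcontains\b"),
    ("RESERVED", r"\b(?:response|answer)\b"),
    ("ORDER", r">=|<=|>|<"),
    ("EQUALITY", r"="),
    ("SEPARATOR", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS))


def tokenize(criterion):
    """Split a criterion string into (token_name, text) pairs, ending with END."""
    tokens = []
    position = 0
    for match in MASTER.finditer(criterion):
        # Anything between two recognised tokens is passed on as OTHER.
        other = criterion[position:match.start()].strip()
        if other:
            tokens.append(("OTHER", other))
        tokens.append((match.lastgroup, match.group()))
        position = match.end()
    tail = criterion[position:].strip()
    if tail:
        tokens.append(("OTHER", tail))
    tokens.append(("END", ""))
    return tokens
```

For example, `tokenize("response = answer")` yields a `RESERVED`, an `EQUALITY` and a second `RESERVED` token followed by `END`, which corresponds to the right-hand side of the production `EQUAL -> RESERVED = RESERVED`.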
+##### Examples of commonly used criteria
+
+**TODO** Add examples
+
 #### `physical_quantity` - Comparison of expressions that involve units
 
+##### `physical_quantity` Criteria commands and grammar
+
+The criteria commands use the following productions:
+```
+    START -> BOOL
+    BOOL -> EQUAL
+    BOOL -> ORDER
+    BOOL -> EQUAL where EQUAL
+    BOOL -> EQUAL where EQUAL_LIST
+    BOOL -> RESERVED written as OTHER
+    BOOL -> RESERVED written as RESERVED
+    BOOL -> RESERVED contains OTHER
+    BOOL -> RESERVED contains RESERVED
+    EQUAL_LIST -> EQUAL;EQUAL
+    EQUAL_LIST -> EQUAL_LIST;EQUAL
+    EQUAL -> OTHER = OTHER
+    EQUAL -> RESERVED = OTHER
+    EQUAL -> OTHER = RESERVED
+    EQUAL -> RESERVED = RESERVED
+    EQUAL -> OTHER ORDER OTHER
+    EQUAL -> RESERVED ORDER OTHER
+    EQUAL -> OTHER ORDER RESERVED
+    EQUAL -> RESERVED ORDER RESERVED
+    OTHER -> RESERVED OTHER
+    OTHER -> OTHER RESERVED
+    OTHER -> OTHER OTHER
+```
+along with the following base tokens:
+
+- `START`: Formal token used to indicate the start of an expression (in practice: any expression that can be reduced to a single `START` is a parseable criterion).
+- `END`: Formal token that indicates the end of a tokenized string.
+- `NULL`: Formal token that denotes a token without meaning; it should not appear when an expression is tokenized.
+- `BOOL`: Expression that can be reduced to either `True` or `False`.
+- `QUANTITY`: Token that denotes a physical quantity, which can be given as both a value and units, only a value (i.e. a dimensionless quantity), or only units.
+- `DIMENSION`: Token that denotes an expression only containing physical dimensions.
+- `START_DELIMITER`: Token that denotes a list of equalities.
+- `INPUT`: Token that denotes any substring that will be passed on for more context specific parsing (e.g. explicit mathematical expressions for symbolic comparisons).
+- `matches`: Token for the operator that checks if two quantities match, i.e. whether their values are equal (up to the chosen tolerance) when they are rewritten using the same units.
+- `dimension`: Token for an expression involving only dimensions (i.e. no values or units).
+- `=`: Token for the operator that checks equality (i.e. checks separately whether the values are identical and whether the units are identical).
+- `<=`: Token for the operator that checks if a quantity's value is less than or equal to another quantity's value (after both quantities are rewritten in the same units).
+- `>=`: Token for the operator that checks if a quantity's value is greater than or equal to another quantity's value (after both quantities are rewritten in the same units).
+- `<`: Token for the operator that checks if a quantity's value is less than another quantity's value (after both quantities are rewritten in the same units).
+- `>`: Token for the operator that checks if a quantity's value is greater than another quantity's value (after both quantities are rewritten in the same units).
+
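The difference between `matches`, `=` and the order operators described above can be illustrated with a small Python sketch. Quantities are represented here as `(value, unit)` tuples, and the conversion table `TO_METRES` and all function names are hypothetical; the actual context handles units through its own parsing code.

```python
import math

# Hypothetical conversion factors to a common base unit (metres); illustration only.
TO_METRES = {"m": 1.0, "cm": 0.01, "km": 1000.0}


def in_base_units(value, unit):
    """Rewrite a (value, unit) pair as a plain value in the base unit."""
    return value * TO_METRES[unit]


def matches(first, second, rel_tol=1e-9):
    """True if both quantities have equal values once rewritten in the same units."""
    return math.isclose(in_base_units(*first), in_base_units(*second), rel_tol=rel_tol)


def identical(first, second):
    """True only if the values are identical and the units are identical (the `=` operator)."""
    return first[0] == second[0] and first[1] == second[1]


def less_than(first, second):
    """Compare values after both quantities are rewritten in the same units (the `<` operator)."""
    return in_base_units(*first) < in_base_units(*second)
```

Under these assumptions `(100, "cm")` and `(1, "m")` satisfy `matches` but not `identical`, which is the distinction the two operators are meant to capture.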
+##### Examples of commonly used criteria
+
+**TODO** Add examples
+
 #### Code shared between different contexts
 
-### Criteria
+##### Expression parsing
+
+**TODO** Describe shared code for expression preprocessing and parsing
+
+**TODO** Describe shared code for expression parsing parameters
 
-**TODO** Describe currently available criteria
+##### Other shared code
+
+**TODO** Describe shared default parameters
+
+## Feedback and tag generation
+
+- Generate feedback procedures from criteria. Each procedure returns a boolean that indicates whether the corresponding criterion is satisfied, a string intended to be shown to the student, and a list of tags indicating what was found when checking the criterion.
+- For each criterion, run the corresponding procedure and store the result, the feedback string and the list of tags.
+- If all criteria are found to be true, then the response is considered correct.
+
+### Tag conventions
+The feedback procedures consist of a series of function calls, the specifics of which are determined by the particular criteria. Each function call returns a list of strings (called tags). Each tag indicates what further function calls must be performed to continue the evaluation, as well as what feedback string (if any) should be generated. When there are no remaining function calls, the feedback procedure is complete. The tags are formatted as *criterion*`_`*name of function call outcome*. For tags that are not connected to a specific criterion (e.g. tags that indicate an issue with expression parsing), the criterion name and underscore are omitted.
+
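The tag convention and the "run until no function calls remain" loop described above could look roughly like the following Python sketch. The function names, the `calls` and `follow_up` dictionaries and the outcome names are all hypothetical; they only illustrate the naming scheme and control flow, not the repository's data structures.

```python
def format_tag(outcome, criterion=None):
    """Build a tag as <criterion>_<outcome>, or just <outcome> when no criterion applies."""
    if criterion is None:
        return outcome
    return f"{criterion}_{outcome}"


def run_procedure(start, calls, follow_up, criterion):
    """Run function calls until no outcome points to a further call; collect all tags."""
    pending = [start]
    tags = []
    while pending:
        call = calls[pending.pop(0)]
        for outcome in call():
            tags.append(format_tag(outcome, criterion))
            # An outcome may require a further function call before evaluation is done.
            if outcome in follow_up:
                pending.append(follow_up[outcome])
    return tags


# Hypothetical procedure: an equality check that, when true, triggers a form check.
calls = {"is_equal": lambda: ["TRUE"], "same_form": lambda: ["SAME_FORM"]}
follow_up = {"TRUE": "same_form"}
tags = run_procedure("is_equal", calls, follow_up, "response=answer")
```

Here `tags` ends up containing `response=answer_TRUE` and `response=answer_SAME_FORM`, while a tag that is not tied to a criterion, such as `format_tag("PARSE_ERROR")`, is just the outcome name.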
+## Returning final results
+The function returns a result dictionary with the following fields:
+- `is_correct` is a boolean value that is set to `True` if all criteria are satisfied.
+- `feedback` is a string that is created by joining all strings generated by the feedback procedures with a line break between each string.
+- `tags` is a list of strings that is generated by joining all lists of tags generated by feedback procedures and removing duplicates.
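The assembly of the result dictionary described above can be sketched in Python as follows. The function name `build_result` and the input format (a list of `(is_satisfied, feedback, tags)` triples) are assumptions, as is the choice of `"\n"` as the line break and the decision to skip empty feedback strings.

```python
def build_result(outcomes):
    """Combine per-criterion (is_satisfied, feedback, tags) triples into one result."""
    is_correct = all(satisfied for satisfied, _, _ in outcomes)
    # Join the feedback strings with a line break, skipping empty ones (an assumption).
    feedback = "\n".join(text for _, text, _ in outcomes if text)
    # Concatenate all tag lists and drop duplicates while keeping first-seen order.
    all_tags = [tag for _, _, tags in outcomes for tag in tags]
    unique_tags = list(dict.fromkeys(all_tags))
    return {"is_correct": is_correct, "feedback": feedback, "tags": unique_tags}


result = build_result([
    (True, "Correct value.", ["a_TRUE"]),
    (False, "Wrong form.", ["b_FALSE", "a_TRUE"]),
])
```

Using `dict.fromkeys` removes duplicate tags without reordering them, which keeps the tag list stable between runs.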
 
-#### Criteria command and grammar
+# Preview function
 
-#### Examples of commonly used criteria
+When the evaluation function preview is called, the code in `preview.py` will be executed. Since different contexts interpret responses in different ways, they also have their own preview functions. The context-specific preview functions can be found in `preview_implementations`.
 
-### Feedback generation
+**Remark**: Since it is likely that there will be significant overlap between the response preview and the response evaluation (e.g. code for parsing and interpreting the response), it is good practice if they can share as much code as possible to ensure consistency. For this reason it might be better to move the preview functions fully inside the context (either by making a `preview` subfolder in the `context` folder, or by moving the implementation of the preview function inside the context files themselves). In this case the `preview.py` and `evaluation.py` could also share the same code for determining the right context to use.
 
-### Returning final results
+# Tests
+There are two main groups of tests: evaluation tests and preview tests. The evaluation tests can be run by calling `evaluation_tests.py`.