Commit 1a94457

Added basic description of evaluation procedure to developer documentation
1 parent 47ce634 commit 1a94457

1 file changed

+141
-10
lines changed

app/docs/dev.md

Lines changed: 141 additions & 10 deletions
@@ -1,6 +1,16 @@
11
# CompareExpressions
22
This function utilises [`SymPy`](https://docs.sympy.org/latest/index.html) to provide a maths-aware evaluation of a learner's response.
33

4+
## Architecture overview
5+
The execution of the evaluation function follows this pattern:
6+
7+
- Determine context
8+
- Parse response and answer data
9+
- Parse criteria
10+
- Store input parameters, parsed responses in a key-value store that allows adding new fields, but not editing existing fields
11+
- Execute generation feedback procedure provided by the context to generate written feedback and tags
12+
- Serialise generated feedback and tags in a suitably formatted dictionary
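The key-value store mentioned in the steps above can be illustrated with a minimal sketch. The class name `AppendOnlyStore` and its `add` method are hypothetical and chosen only for illustration; the actual implementation may look different.

```python
class AppendOnlyStore:
    """Key-value store where fields can be added but never overwritten.

    Hypothetical illustration, not the class used by the evaluation function.
    """

    def __init__(self, **initial):
        self._data = dict(initial)

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data

    def add(self, key, value):
        # Refuse to edit a field that has already been stored.
        if key in self._data:
            raise KeyError(f"field {key!r} already exists and cannot be edited")
        self._data[key] = value


# Input parameters are stored first, parsed values are added later.
store = AppendOnlyStore(response="x+1", answer="1+x")
store.add("parsed_response", "x + 1")
```

Making the store append-only means that a later step cannot silently change what an earlier step recorded.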
13+
414
## Evaluation function
515

616
The main evaluation function is found in `evaluation.py` and has the following signature:
@@ -37,7 +47,7 @@ The overall flow of the evaluation procedure can be described as follows:
3747
### Context
3848

3949
The context is a data structure that contains at least the following seven pieces of information:
40-
- `default_parameters` A dictionary where the keys are parameter names and the values are the default values that the evaluation function will use unless another value is provided together with the response. The only required field is the
50+
- `default_parameters` A dictionary where the keys are parameter names and the values are the default values that the evaluation function will use unless another value is provided together with the response. The required fields are context-dependent. Currently, all contexts use the default parameters found in `utility\expression_utilities.py`, and the `physical_quantity` context adds a few extra fields; see the default parameters defined in `context\physical_quantity.py`.
4151
- `expression_parse` function that parses expressions (i.e. the `response` and `answer` inputs) into the form used by the feedback generation procedure.
4252
- `expression_preprocess` function that performs string manipulations that ensure that correctly written input expressions follow the conventions expected by `expression_parse`.
4353
- `expression_preview` is a function that generates a string that can be turned into a human-readable representation of how the evaluation function interpreted the response.
@@ -50,12 +60,13 @@ The context can also contain other fields if necessary.
5060
**Remark:** The current implementation uses a dictionary rather than a dedicated class for ease of iteration during the initial development phase.
5161

5262
There are currently two different contexts:
53-
- `symbolic` Handles comparisons of various symbolic expressions. Defined in `context\symbolic.py`.
54-
- `physical_quantity` Handles comparisons of expressions involving units. Defined in `context\physical_quantity.py`.
63+
- `symbolic`: Handles comparisons of various symbolic expressions. Defined in `context\symbolic.py`.
64+
- `physical_quantity`: Handles comparisons of expressions involving units. Defined in `context\physical_quantity.py`.
5565

5666
**Remark:** Handwritten expressions are sent as LaTeX, which in some cases requires extra preprocessing before the right context can be determined. It should be considered whether a new context, perhaps called `handwritten`, should be created for this purpose.
5767

58-
**TODO** Describe currently available contexts
68+
**TODO** Describe currently available contexts in detail
69+
5970
#### `symbolic` - Comparison of symbolic expressions
6071

6172
**Remark:** The `symbolic` context should probably be split into several smaller contexts. The following subdivision is suggested:
@@ -65,18 +76,138 @@ There are currently two different contexts:
6576
- `inequality`: Same as `equality` except for mathematical inequalities (which will require different choices when it comes to what can be considered equivalence). It might be appropriate to combine `equality` and `inequality` into one context (called `statements` or similar).
6677
- `collection`: Comparison of collections (e.g. sets, lists or intervals of the number line). Likely to consist mostly of code for handling comparison of individual elements using the other contexts, and configuring what counts as equivalence between different collections.
6778

79+
##### `symbolic` Criteria commands and grammar
80+
Criteria
81+
82+
The criteria commands use the following productions:
83+
```
84+
START -> BOOL
85+
BOOL -> EQUAL
86+
BOOL -> ORDER
87+
BOOL -> EQUAL
88+
BOOL -> EQUAL
89+
BOOL -> RESERVED written as OTHER
90+
BOOL -> RESERVED written as RESERVED
91+
BOOL -> RESERVED contains OTHER
92+
BOOL -> RESERVED contains RESERVED
93+
EQUAL_LIST -> EQUAL;EQUAL
94+
EQUAL_LIST -> EQUAL_LIST;EQUAL
95+
EQUAL -> OTHER = OTHER
96+
EQUAL -> RESERVED = OTHER
97+
EQUAL -> OTHER = RESERVED
98+
EQUAL -> RESERVED = RESERVED
99+
EQUAL -> OTHER ORDER OTHER
100+
EQUAL -> RESERVED ORDER OTHER
101+
EQUAL -> OTHER ORDER RESERVED
102+
EQUAL -> RESERVED ORDER RESERVED
103+
OTHER -> RESERVED OTHER
104+
OTHER -> OTHER RESERVED
105+
OTHER -> OTHER OTHER
106+
```
107+
along with the following base tokens:
108+
109+
- `START`: Formal token used to indicate the start of an expression (in practice: any expression that can be reduced to a single `START` is a parseable criterion).
110+
- `END`: Formal token that indicates the end of a tokenized string.
111+
- `NULL`: Formal token that denotes a token without meaning, should not appear when an expression is tokenized.
112+
- `BOOL`: Expression that can be reduced to either `True` or `False`.
113+
- `EQUAL`: Token that denotes symbolic equality between the mathematical expressions.
114+
- `EQUALITY`: Token that denotes the equality operator `=`.
115+
- `EQUAL_LIST`: Token that denotes a list of equalities.
116+
- `RESERVED`: Token that denotes a reserved name for an expression. Reserved names include `response` and `answer`.
117+
- `ORDER`: Token that denotes an order operator. Order operators include `>`, `<`, `>=` and `<=`.
118+
- `WHERE`: Token that denotes the separation of a criterion and a list of equalities that describe substitutions that should be done before the criterion is checked.
119+
- `WRITTEN_AS`: Token that denotes that syntactical comparison should be done.
120+
- `CONTAINS`: Token that denotes that a mathematical expression is dependent on a symbol or subexpression.
121+
- `SEPARATOR`: Token that denotes which symbol is used to separate the list of equalities used by `WHERE`.
122+
- `OTHER`: Token that denotes any substring that will be passed on for more context specific parsing (e.g. explicit mathematical expressions for symbolic comparisons).
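As a rough illustration of how a criterion string could be split into the base tokens listed above, here is a minimal tokenizer sketch. The regular expressions, the `tokenize` function and the `(token, text)` output format are assumptions made for this example and are not taken from the actual implementation.

```python
import re

# Hypothetical token patterns; longer operators come first so that
# ">=" is not read as ">" followed by "=".
TOKEN_PATTERNS = [
    ("WRITTEN_AS", r"\bwritten as\b"),
    ("CONTAINS", r"\bcontains\b"),
    ("WHERE", r"\bwhere\b"),
    ("RESERVED", r"\b(?:response|answer)\b"),
    ("ORDER", r">=|<=|>|<"),
    ("EQUALITY", r"="),
    ("SEPARATOR", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS))


def tokenize(criterion):
    """Split a criterion string into (token, text) pairs, ending with END."""
    tokens = []
    position = 0
    for match in MASTER.finditer(criterion):
        # Anything between two recognised tokens is passed on as OTHER.
        other = criterion[position:match.start()].strip()
        if other:
            tokens.append(("OTHER", other))
        tokens.append((match.lastgroup, match.group()))
        position = match.end()
    tail = criterion[position:].strip()
    if tail:
        tokens.append(("OTHER", tail))
    tokens.append(("END", ""))
    return tokens


print(tokenize("response = answer"))
print(tokenize("response - answer > 0"))
```

In this sketch everything that is not a recognised keyword or operator becomes an `OTHER` token, matching the description that such substrings are passed on for more context-specific parsing.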
123+
124+
##### Examples of commonly used criteria
125+
126+
**TODO** Add examples
127+
68128
#### `physical_quantity` - Comparison of expressions that involve units
69129

130+
##### `physical_quantity` Criteria commands and grammar
131+
132+
The criteria commands use the following productions:
133+
```
134+
START -> BOOL
135+
BOOL -> EQUAL
136+
BOOL -> ORDER
137+
BOOL -> EQUAL where EQUAL
138+
BOOL -> EQUAL where EQUAL_LIST
139+
BOOL -> RESERVED written as OTHER
140+
BOOL -> RESERVED written as RESERVED
141+
BOOL -> RESERVED contains OTHER
142+
BOOL -> RESERVED contains RESERVED
143+
EQUAL_LIST -> EQUAL;EQUAL
144+
EQUAL_LIST -> EQUAL_LIST;EQUAL
145+
EQUAL -> OTHER = OTHER
146+
EQUAL -> RESERVED = OTHER
147+
EQUAL -> OTHER = RESERVED
148+
EQUAL -> RESERVED = RESERVED
149+
EQUAL -> OTHER ORDER OTHER
150+
EQUAL -> RESERVED ORDER OTHER
151+
EQUAL -> OTHER ORDER RESERVED
152+
EQUAL -> RESERVED ORDER RESERVED
153+
OTHER -> RESERVED OTHER
154+
OTHER -> OTHER RESERVED
155+
OTHER -> OTHER OTHER
156+
```
157+
along with the following base tokens:
158+
159+
- `START`: Formal token used to indicate the start of an expression (in practice: any expression that can be reduced to a single `START` is a parseable criterion).
160+
- `END`: Formal token that indicates the end of a tokenized string.
161+
- `NULL`: Formal token that denotes a token without meaning, should not appear when an expression is tokenized.
162+
- `BOOL`: Expression that can be reduced to either `True` or `False`.
163+
- `QUANTITY`: Token that denotes a physical quantity, that can be either given as both a value and units, only value (i.e. a dimensionless quantity) or only units.
164+
- `DIMENSION`: Token that denotes an expression only containing physical dimensions.
165+
- `START_DELIMITER`: Token that denotes a list of equalities.
166+
- `INPUT`: Token that denotes any substring that will be passed on for more context specific parsing (e.g. explicit mathematical expressions for symbolic comparisons).
167+
- `matches`: Token for operator that checks if two quantities match, i.e. whether their values are equal (up to the chosen tolerance) when they are rewritten using the same units.
168+
- `dimension`: Token for expression only involving dimensions (i.e. no values or units).
169+
- `=`: Token for operator that checks equality (i.e. checks separately whether the values are identical and whether the units are identical).
170+
- `<=`: Token for operator that checks if a quantity's value is less than or equal to another quantity's value (after both quantities are rewritten in the same units).
171+
- `>=`: Token for operator that checks if a quantity's value is greater than or equal to another quantity's value (after both quantities are rewritten in the same units).
172+
- `<`: Token for operator that checks if a quantity's value is less than another quantity's value (after both quantities are rewritten in the same units).
173+
- `>`: Token for operator that checks if a quantity's value is greater than another quantity's value (after both quantities are rewritten in the same units).
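The difference between `matches`, `=` and the order operators can be sketched as follows. This is a simplified illustration that assumes quantities are `(value, unit)` tuples and that only a handful of length units exist; the conversion table and the function names are hypothetical and do not reflect how the `physical_quantity` context is implemented.

```python
import math

# Hypothetical conversion factors to a common base unit (metres).
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001, "km": 1000.0}


def in_base_units(quantity):
    """Rewrite a (value, unit) pair as a plain value in metres."""
    value, unit = quantity
    return value * TO_METRES[unit]


def matches(a, b, rel_tol=1e-9):
    # Values are compared after rewriting both quantities in the same units.
    return math.isclose(in_base_units(a), in_base_units(b), rel_tol=rel_tol)


def strictly_equal(a, b):
    # The `=` operator: value and unit must each be identical.
    return a[0] == b[0] and a[1] == b[1]


def less_than(a, b):
    # Order operators also compare values in the same units.
    return in_base_units(a) < in_base_units(b)


one_metre = (1.0, "m")
hundred_cm = (100.0, "cm")
print(matches(one_metre, hundred_cm), strictly_equal(one_metre, hundred_cm))
```

Under these assumptions `1 m` and `100 cm` match but are not equal in the sense of `=`, since their values and units differ when compared separately.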
174+
175+
##### Examples of commonly used criteria
176+
177+
**TODO** Add examples
178+
70179
#### Code shared between different contexts
71180

72-
### Criteria
181+
##### Expression parsing
182+
183+
**TODO** Describe shared code for expression preprocessing and parsing
184+
185+
**TODO** Describe shared code for expression parsing parameters
73186

74-
**TODO** Describe currently available criteria
187+
##### Other shared code
188+
189+
**TODO** Describe shared default parameters
190+
191+
## Feedback and tag generation
192+
193+
- Generate feedback procedures from the criteria. Each procedure returns a boolean that indicates whether the corresponding criterion is satisfied, a string intended to be shown to the student, and a list of tags indicating what was found when checking the criterion.
194+
- For each criterion, run the corresponding procedure and store the result, the feedback string and the list of tags.
195+
- If all criteria are found to be true, then the response is considered correct.
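The three steps above can be sketched as follows. The names `check_equal` and `run_procedures`, the tag strings, and the string comparison used as a stand-in for a symbolic check are all assumptions made for illustration.

```python
def check_equal(response, answer):
    """Hypothetical feedback procedure for the criterion `response=answer`."""
    # Stand-in check: the real procedure would compare the expressions symbolically.
    satisfied = response.replace(" ", "") == answer.replace(" ", "")
    tag = "response=answer_TRUE" if satisfied else "response=answer_FALSE"
    feedback = "" if satisfied else "The response does not match the expected answer."
    return satisfied, feedback, [tag]


def run_procedures(procedures, response, answer):
    """Run one procedure per criterion and store (result, feedback, tags)."""
    outcomes = {}
    for criterion, procedure in procedures.items():
        satisfied, feedback, tags = procedure(response, answer)
        outcomes[criterion] = (satisfied, feedback, tags)
    # The response is correct only if every criterion is satisfied.
    is_correct = all(satisfied for satisfied, _, _ in outcomes.values())
    return is_correct, outcomes


procedures = {"response=answer": check_equal}
is_correct, outcomes = run_procedures(procedures, "x + 1", "x+1")
```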
196+
197+
### Tag conventions
198+
The feedback procedures consist of a series of function calls, whose specifics are determined by the particular criteria. Each function call returns a list of strings (called tags). Each tag indicates what further function calls must be performed to continue the evaluation, as well as what feedback string (if any) should be generated. When there are no remaining function calls, the feedback procedure is complete. Tags are formatted as *criteria*`_`*name of function call outcome*. For tags that are not connected to a specific criterion (e.g. tags that indicate an issue with expression parsing), the criteria name and underscore are omitted.
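A minimal sketch of this convention is shown below, assuming that each step is a function without arguments that returns a list of tags, and that a tag without a registered follow-up step ends that branch. The helper names `make_tag` and `run_procedure`, the `START` entry point and the outcome names `PARSE_OK` and `TRUE` are hypothetical.

```python
def make_tag(outcome, criterion=None):
    """Format a tag as <criterion>_<outcome>, or just <outcome> without a criterion."""
    return outcome if criterion is None else f"{criterion}_{outcome}"


def run_procedure(steps, start="START"):
    """Follow tags until no function calls remain; return all tags produced."""
    pending = [start]
    collected = []
    while pending:
        current = pending.pop(0)
        step = steps.get(current)
        if step is None:
            continue  # no further function call is attached to this tag
        new_tags = step()
        collected.extend(new_tags)
        pending.extend(new_tags)
    return collected


criterion = "response=answer"
steps = {
    # Parsing is not tied to a criterion, so its tag has no prefix.
    "START": lambda: [make_tag("PARSE_OK")],
    # A successful parse triggers the check of the criterion itself.
    "PARSE_OK": lambda: [make_tag("TRUE", criterion)],
}
tags = run_procedure(steps)
```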
199+
200+
## Returning final results
201+
The function returns a result dictionary with the following fields:
202+
- `is_correct` is a boolean value that is set to `True` if all criteria are satisfied.
203+
- `feedback` is a string that is created by joining all strings generated by the feedback procedures with a line break between each string.
204+
- `tags` is a list of strings that is generated by joining all lists of tags generated by feedback procedures and removing duplicates.
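The assembly of the result dictionary can be sketched as follows. The helper name `build_result`, the input format (a list of `(satisfied, feedback, tags)` tuples), the use of `"\n"` as the line break and the choice to skip empty feedback strings are assumptions made for this example.

```python
def build_result(outcomes):
    """Combine per-criterion outcomes into the final result dictionary."""
    is_correct = all(satisfied for satisfied, _, _ in outcomes)
    # Join non-empty feedback strings with a line break between each string.
    feedback = "\n".join(text for _, text, _ in outcomes if text)
    # Merge all tag lists and remove duplicates while keeping first-seen order.
    all_tags = [tag for _, _, tags in outcomes for tag in tags]
    tags = list(dict.fromkeys(all_tags))
    return {"is_correct": is_correct, "feedback": feedback, "tags": tags}


result = build_result([
    (True, "Good work.", ["PARSE_OK", "response=answer_TRUE"]),
    (False, "Check the units.", ["PARSE_OK", "units_FALSE"]),
])
```

Using `dict.fromkeys` removes duplicate tags without changing the order in which they were first generated.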
75205

76-
#### Criteria command and grammar
206+
# Preview function
77207

78-
#### Examples of commonly used criteria
208+
When the evaluation function preview is called, the code in `preview.py` will be executed. Since different contexts interpret responses in different ways, they also have their own preview functions. The context-specific preview functions can be found in `preview_implementations`.
79209

80-
### Feedback generation
210+
**Remark**: Since it is likely that there will be significant overlap between the response preview and the response evaluation (e.g. code for parsing and interpreting the response), it is good practice if they can share as much code as possible to ensure consistency. For this reason it might be better to move the preview functions fully inside the context (either by making a `preview` subfolder in the `context` folder, or by moving the implementation of the preview function inside the context files themselves). In this case the `preview.py` and `evaluation.py` could also share the same code for determining the right context to use.
81211

82-
### Returning final results
212+
# Tests
213+
There are two main groups of tests: evaluation tests and preview tests. The evaluation tests can be run by running `evaluation_tests.py`.
