-
Notifications
You must be signed in to change notification settings - Fork 1.1k
RFC: proposal for declarative VDUs #5792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
8f5cc3a to
9e6d22c
Compare
ricellis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of initial thoughts on where the proposed APIs could be painful.
| fields: | ||
|
|
||
| - `language`: must have the value `"query"` | ||
| - `validate_doc_update`: contains an _Extended Mango_ expression that encodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of polymorphism pattern that occurs in CouchDB APIs increases the complexity of modelling Couch types in strongly typed languages (i.e. Go, Java) and with technologies like OpenAPI.
| ddoc language | validate_doc_update type |
|---|---|
javascript |
string |
query |
object |
One could argue for, say, the use of oneOf and a discriminator based on language in OpenAPI, but such a change is a breaking one for anyone with an already existing model type representing a design document.
I am, sadly, very aware that this string/object problem already exists for the map property of a design document, see for example IBM/cloudant-go-sdk#507, and I'd be reluctant to see it propagate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a way around this would be to allow a single _design/_validate doc that doesn’t need a language field. That’d be a bit of a departure from how things work today, but I personally found the “multiple VDUs” always a bit awkward
| - `reason`: this contains either a custom error message, or the list of failures | ||
| generated by the first non-matching selector. | ||
|
|
||
| If no custom `$reason` is set, then the `reason` field contains a list of | ||
| failures like so: | ||
|
|
||
| { | ||
| "error": "forbidden", | ||
| "reason": { | ||
| "failures": [ | ||
| { | ||
| "path": ["$newDoc", "type"], | ||
| "type": "in", | ||
| "params": ["movie", "director"] | ||
| } | ||
| ] | ||
| } | ||
| } | ||
|
|
||
| This is consistent with the current working of JavaScript VDUs. Such functions | ||
| can call `throw({ forbidden: obj })` where `obj` is an object, and it will be | ||
| passed back to the client as JSON, i.e. it is already possible for user-defined | ||
| VDUs to generate responses like that above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whilst I see the argument that it is already possible for JS VDUs to pass any error structure they like I feel that formally specifying object types into the reason property is a breaking API change from the existing string usages enshrined in system defined error responses. Just because someone can change that already with their own functions doesn't mean that they should.
My concern here is that this change means using declarative VDU comes with a burden to check and potentially rewrite all post-write error handling code to handle both cases. That feels like a barrier to adoption or an avenue for unanticipated breakages.
I accept the value of providing more validation information, but I don't agree with overloading the reason property with a different type to do it. Keeping reason a string at least would mean existing code continues to work on new VDU validation failures even if it doesn't provide as much information. Applications that want to use the extra information have coding to do anyway so they could as easily obtain that from some other part of the error structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ricellis that’s a fair point. For now the design brief was: allow porting currently built-in validations into this, plus what we have as examples in the docs.
I personally would not mind a simplification here.
nickva
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally got a chance to take a better look. It's a very well written RFC, well done!
I added some comments in-line. If I had to summarize them, the idea I was going with was to try to make everything simpler and see how to avoid having two evaluation modes, two different separate syntaxes, avoid too many new query language features, and maybe use new features for both indexing /filtering selectors as well as to the VDU ones.
I'd basically be happy with 80/20 or 70/30 solution with fewer new features vs a 95/5 but at the expense of more complexity.
| `"director"`. If it does not, return a 403 response to the client. | ||
| - Otherwise, accept the write and return a 201 or 202. | ||
|
|
||
| The body of the response contains two fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if we started with something simpler so that if the selector doesn't match then we just throw a forbidden error (maybe with design doc id to indicate which one failed)...
| The intent of this interface is that each individual selector expression | ||
| produces a complete list of _all_ the ways in which the input did not match the | ||
| selector expression, so that the client can show all the validation errors to | ||
| the user in one go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's more flexible, I think it might make it harder for users to understand as it departs from a simpler filter / selectors behavior by returning a complex list (union) of failures as opposed a simple boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while true, imagine writing software against this:
- a user enters some form data
- the app sends the data to CouchDB, which validates it. say 3/10 fields fail validation, but we only return the first error
- user gets presented the one error, fixes it, re-submits
- CouchDB validates, returns error 2/3
- and so on
I think those ergonomics would be very unfortunate and I’d be okay with the added complexity here
| defining VDUs. Some of these features _only_ make sense for VDUs and should only | ||
| be allowed in this context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to minimize the differences between the VDU type selectors and indexing / filtering selectors. It would be nicer, I think, if users just learned one query syntax and could use it everywhere, as opposed to having to remember this works in VDUs but that works in selectors only and so on. This could mean we'd add new operators for indexing (type checking, range checking, etc) as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be preferable, let’s see if it can be done
| An Extended Mango expression is considered to match a given input if its | ||
| evaluation returns an empty list. | ||
|
|
||
| To produce the `path` field, the `match` function will need to track the path it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional implementation complexity is why I think at first we can probably get away with covering a good amount use cases just with the existing matchers, maybe with the design doc ids indicating which VDU failed. It's not as flexible as Javascript but if users need more flexibility can just use Javascript, that's not going away.
| not match, and all failures from all fields should be returned to the caller. | ||
|
|
||
|
|
||
| ### `$if`/`$then`/`$else` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, aesthetically if/then/else seem a bit out of place in what looks like a declarative query language. It makes me think of an imperative language like Javascript or Python. Especially as the ifs/thens can be translated to a bunch of ANDs and ORs user could still build the same logic with those, and it be nice to minimize change to our custom selector syntax and grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can bike sheds the names :)
| } | ||
| ] | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simpler way I could see the above example working would be to have two design docs: one checking for admin and the other for types
_design/type_checks:
"validate_doc_update": { "$newDoc.type": { "$in": ["movie", "director"] }}
| The most "obvious" way to achieve this would be simply allow | ||
| `validate_doc_update` (VDU) functions to be written in Erlang, as we do for | ||
| map-reduce index definitions. However, this is not especially accessible to most | ||
| users, and allows them to execute arbitrary code inside the database engine. A | ||
| better solution would be to design a way for validation rules to be expressed in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about Erlang, while it is a full / proper language, it's not something users will probably want to write VDUs in. I could see Lua or something more widespread perhaps. But we do have Javascript and that's not going away so there is always a way to defer complex cases to that. We don't have to invent a 100% replacements for JS VDUs, which makes our jobs easier a bit!
However, that said, when it comes to us (CouchDB devs) a good number of magically auto-injected VDU docs we use as system VDUs could be (many already were) rewritten in Erlang. In CouchDB internals those are called a bit differently -- they are BDUs ("before-document-update") callbacks.
| context_. The path to the current value is just one thing we will need to store | ||
| in the match context as selectors are evaluated. | ||
|
|
||
| One other thing we may want to store in the context is a "mode" that indicates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Evaluation modes I think adds too much complexity for users. Maybe if we pick one new mode that works the same everywhere, but having to juggle two evaluation methods in a custom query language I think is too tricky.
| Strictly speaking, this is an accurate description of the intended validation | ||
| rules. However, it will not give good feedback. Under the evaluation rules | ||
| described above, where all operators return a possibly-empty list of failures, | ||
| the `$or` operator would work be collecting a list of failures for each of its | ||
| sub-selectors, and then if any of these lists is empty, then `$or` returns an | ||
| empty list. If none of the lists is empty, then the most reasonable thing for | ||
| `$or` to do would be to return the combined list of all failures from its | ||
| sub-selectors, since it has no idea what any of them mean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think compared to the complexity added (if/then statements, guards) I'd rather have it be simper in the first go-around with less feedback details. For example, the above could be in a design document named _design/type_checks with that $or:[...] structure and it can do the job without adding new constructs to the query language. If users want detailed feedback, there is always Javascript.
| To illustrate how the features proposed here could work in practice, we'll now | ||
| work through an example of translating an exiting JavaScript VDU to a | ||
| declarative form. Specifically we will look at the [example from the CouchDB VDU | ||
| docs](https://docs.couchdb.org/en/stable/ddocs/ddocs.html#validate-document-update-functions). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The system _design/_auth VDU I think is a great candidate to rewrite as an Erlang BDU. For query based VDUs example simple type checks (is this a move, is this a director, is the year and integer in the right range, etc) would probably look better.
|
We also had a discussion in the last dev meeting and I’m posting relevant bits from the transcript here (some of what Nick is covering he’s already added above). re if/then/else naming of terms:
re adoption:
sidebar about just the performance aspect of running JS VDUs:
re can this be simplified?
re implementation:
Bob sums up his thoughts:
|
This PR contains the design documents we have produced as part of the M4 milestone for the STA project to design a declarative way to express document update validation functions. This was originally written as an "exploratory" or "rationale" document to compare different approaches and discover features that would be necessary to adequately cover the needed functionality, and then a shorter "specification" describing just the features we're proposing to build. At @janl's suggestion these documents form the "Advantages and Disadvantages" and "Detailed Description" parts of the RFC respectively.