Skip to content

Conversation

@jcoglan
Copy link
Contributor

@jcoglan jcoglan commented Nov 27, 2025

This PR contains the design documents we have produced as part of the M4 milestone for the STA project to design a declarative way to express document update validation functions. This was originally written as an "exploratory" or "rationale" document to compare different approaches and discover features that would be necessary to adequately cover the needed functionality, and then a shorter "specification" describing just the features we're proposing to build. At @janl's suggestion these documents form the "Advantages and Disadvantages" and "Detailed Description" parts of the RFC respectively.

@jcoglan jcoglan force-pushed the rfc-declarative-vdu branch from 8f5cc3a to 9e6d22c Compare November 27, 2025 15:14
Copy link

@ricellis ricellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of initial thoughts on where the proposed APIs could be painful.

fields:

- `language`: must have the value `"query"`
- `validate_doc_update`: contains an _Extended Mango_ expression that encodes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of polymorphism pattern that occurs in CouchDB APIs increases the complexity of modelling Couch types in strongly typed languages (i.e. Go, Java) and with technologies like OpenAPI.

ddoc language validate_doc_update type
javascript string
query object

One could argue for, say, the use of oneOf and a discriminator based on language in OpenAPI, but such a change is a breaking one for anyone with an already existing model type representing a design document.

I am, sadly, very aware that this string/object problem already exists for the map property of a design document, see for example IBM/cloudant-go-sdk#507, and I'd be reluctant to see it propagate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a way around this would be to allow a single _design/_validate doc that doesn’t need a language field. That’d be a bit of a departure from how things work today, but I personally found the “multiple VDUs” always a bit awkward

Comment on lines +153 to +175
- `reason`: this contains either a custom error message, or the list of failures
generated by the first non-matching selector.

If no custom `$reason` is set, then the `reason` field contains a list of
failures like so:

{
"error": "forbidden",
"reason": {
"failures": [
{
"path": ["$newDoc", "type"],
"type": "in",
"params": ["movie", "director"]
}
]
}
}

This is consistent with the current working of JavaScript VDUs. Such functions
can call `throw({ forbidden: obj })` where `obj` is an object, and it will be
passed back to the client as JSON, i.e. it is already possible for user-defined
VDUs to generate responses like that above.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst I see the argument that it is already possible for JS VDUs to pass any error structure they like I feel that formally specifying object types into the reason property is a breaking API change from the existing string usages enshrined in system defined error responses. Just because someone can change that already with their own functions doesn't mean that they should.

My concern here is that this change means using declarative VDU comes with a burden to check and potentially rewrite all post-write error handling code to handle both cases. That feels like a barrier to adoption or an avenue for unanticipated breakages.

I accept the value of providing more validation information, but I don't agree with overloading the reason property with a different type to do it. Keeping reason a string at least would mean existing code continues to work on new VDU validation failures even if it doesn't provide as much information. Applications that want to use the extra information have coding to do anyway so they could as easily obtain that from some other part of the error structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricellis that’s a fair point. For now the design brief was: allow porting currently built-in validations into this, plus what we have as examples in the docs.

I personally would not mind a simplification here.

Copy link
Contributor

@nickva nickva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got a chance to take a better look. It's a very well written RFC, well done!

I added some comments in-line. If I had to summarize them, the idea I was going with was to try to make everything simpler and see how to avoid having two evaluation modes, two different separate syntaxes, avoid too many new query language features, and maybe use new features for both indexing /filtering selectors as well as to the VDU ones.

I'd basically be happy with 80/20 or 70/30 solution with fewer new features vs a 95/5 but at the expense of more complexity.

`"director"`. If it does not, return a 403 response to the client.
- Otherwise, accept the write and return a 201 or 202.

The body of the response contains two fields:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we started with something simpler so that if the selector doesn't match then we just throw a forbidden error (maybe with design doc id to indicate which one failed)...

Comment on lines +180 to +183
The intent of this interface is that each individual selector expression
produces a complete list of _all_ the ways in which the input did not match the
selector expression, so that the client can show all the validation errors to
the user in one go.
Copy link
Contributor

@nickva nickva Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's more flexible, I think it might make it harder for users to understand as it departs from a simpler filter / selectors behavior by returning a complex list (union) of failures as opposed a simple boolean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while true, imagine writing software against this:

  • a user enters some form data
  • the app sends the data to CouchDB, which validates it. say 3/10 fields fail validation, but we only return the first error
  • user gets presented the one error, fixes it, re-submits
  • CouchDB validates, returns error 2/3
  • and so on

I think those ergonomics would be very unfortunate and I’d be okay with the added complexity here

Comment on lines +191 to +192
defining VDUs. Some of these features _only_ make sense for VDUs and should only
be allowed in this context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to minimize the differences between the VDU type selectors and indexing / filtering selectors. It would be nicer, I think, if users just learned one query syntax and could use it everywhere, as opposed to having to remember this works in VDUs but that works in selectors only and so on. This could mean we'd add new operators for indexing (type checking, range checking, etc) as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be preferable, let’s see if it can be done

An Extended Mango expression is considered to match a given input if its
evaluation returns an empty list.

To produce the `path` field, the `match` function will need to track the path it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional implementation complexity is why I think at first we can probably get away with covering a good amount use cases just with the existing matchers, maybe with the design doc ids indicating which VDU failed. It's not as flexible as Javascript but if users need more flexibility can just use Javascript, that's not going away.

not match, and all failures from all fields should be returned to the caller.


### `$if`/`$then`/`$else`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, aesthetically if/then/else seem a bit out of place in what looks like a declarative query language. It makes me think of an imperative language like Javascript or Python. Especially as the ifs/thens can be translated to a bunch of ANDs and ORs user could still build the same logic with those, and it be nice to minimize change to our custom selector syntax and grammar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can bike sheds the names :)

}
]
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler way I could see the above example working would be to have two design docs: one checking for admin and the other for types

_design/type_checks:

 "validate_doc_update": { "$newDoc.type": { "$in": ["movie", "director"] }}

Comment on lines +628 to +632
The most "obvious" way to achieve this would be simply allow
`validate_doc_update` (VDU) functions to be written in Erlang, as we do for
map-reduce index definitions. However, this is not especially accessible to most
users, and allows them to execute arbitrary code inside the database engine. A
better solution would be to design a way for validation rules to be expressed in
Copy link
Contributor

@nickva nickva Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about Erlang, while it is a full / proper language, it's not something users will probably want to write VDUs in. I could see Lua or something more widespread perhaps. But we do have Javascript and that's not going away so there is always a way to defer complex cases to that. We don't have to invent a 100% replacements for JS VDUs, which makes our jobs easier a bit!

However, that said, when it comes to us (CouchDB devs) a good number of magically auto-injected VDU docs we use as system VDUs could be (many already were) rewritten in Erlang. In CouchDB internals those are called a bit differently -- they are BDUs ("before-document-update") callbacks.

context_. The path to the current value is just one thing we will need to store
in the match context as selectors are evaluated.

One other thing we may want to store in the context is a "mode" that indicates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evaluation modes I think adds too much complexity for users. Maybe if we pick one new mode that works the same everywhere, but having to juggle two evaluation methods in a custom query language I think is too tricky.

Comment on lines +1560 to +1567
Strictly speaking, this is an accurate description of the intended validation
rules. However, it will not give good feedback. Under the evaluation rules
described above, where all operators return a possibly-empty list of failures,
the `$or` operator would work be collecting a list of failures for each of its
sub-selectors, and then if any of these lists is empty, then `$or` returns an
empty list. If none of the lists is empty, then the most reasonable thing for
`$or` to do would be to return the combined list of all failures from its
sub-selectors, since it has no idea what any of them mean.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think compared to the complexity added (if/then statements, guards) I'd rather have it be simper in the first go-around with less feedback details. For example, the above could be in a design document named _design/type_checks with that $or:[...] structure and it can do the job without adding new constructs to the query language. If users want detailed feedback, there is always Javascript.

Comment on lines +1758 to +1761
To illustrate how the features proposed here could work in practice, we'll now
work through an example of translating an exiting JavaScript VDU to a
declarative form. Specifically we will look at the [example from the CouchDB VDU
docs](https://docs.couchdb.org/en/stable/ddocs/ddocs.html#validate-document-update-functions).
Copy link
Contributor

@nickva nickva Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The system _design/_auth VDU I think is a great candidate to rewrite as an Erlang BDU. For query based VDUs example simple type checks (is this a move, is this a director, is the year and integer in the right range, etc) would probably look better.

@janl
Copy link
Member

janl commented Dec 19, 2025

We also had a discussion in the last dev meeting and I’m posting relevant bits from the transcript here (some of what Nick is covering he’s already added above).

re if/then/else naming of terms:

  • I mean I had similar thought. I mean I read the whole thing. I was reading it for the last hour and I didn't get all the way through to the end because there's an awful lot in there. and the one that stood out for me was if then else. Just seems very strange. Those aren't the right words, I don't think, for what it does. I get the intention of what if then else does, but if is run if the thing that matches matches and then is then done afterwards it was very odd to me. I don't think that's quite the right terms for that stuff. But that's detail. (Bob)

  • My concern really is that I think we're inventing a new language just in a fairly obuse manner and maybe that's fine. is there any precedent for it though? does Mongo itself have this syntax and we're just aping it or… (Bob)

re adoption:

  • I mean I realize the barrier of entry for people to write VDUs writing JavaScript there are people that do like doing that and there are quite a lot of people we have that don't like doing that. are they going to doing this better given how poorly they like writing selectors in my experience as so I mean I don't want to it's a great proposal…but I just wonder if it's the right way to what are we trying to solve the fact that doing the JavaScript thing is slow or is it that JavaScript is an unpleasant way to write it I guess it's both… (Bob)

  • The point I was trying to make with all this [references] is that I could envision a future where we have a lot of example documentation for all these selectors.
    Jan Lehnardt: sorry for the validation selectors that for the lack of a better term and maybe even have support in Photon to put them together and then make it easy to construct these design documents so people don't actually have to go into the intricate details of it all but there's this web standard subcategory of micro formats whereas it's just like an here's how you do an email address how you do the HTML markup and you just import this you give it a bunch of values and then it does the right and that it's a little bit more machine readable across the web. so my thinking is we'll build a bunch of quote unquote types for typical stuff that people would want to validate and then they can just plug that together any way they want. and then it's a little bit easier than having to learn all the intricate details of all this kind of stuff. And then there's going to be plenty of examples of people can know… (Jan)

sidebar about just the performance aspect of running JS VDUs:

  • I mean, we could be running quickjs in process, could we not? (Bob)
  • We could right I mean it's be kind of a bit dangerous like it we'll try to avoid that for now. (Nick)
  • Jan translates this as “technically possible but maybe practically not feasible for CouchDB’s stability needs”
  • Bob recounts lua in haproxy as a viable option, but that’s not somnething that fits CouchDB

re can this be simplified?

  • I'm thinking from the perspective of what if we just did simplify it and it might not cover 100% but it might cover 80%. let's say even with a simple selector if you just say I'd like to mash or not and then if it doesn't it throws a formid like you don't get a choice to do these paths and do but let's just do the simplest thing could you do that and then we can look at maybe a whole bunch of examples and we see we need to really do this string ends with we need to cap together maybe two things or we need to get the type of something like it's really crucial that when you v you do this video use I want to know the type is a number here and we don't really have is number so maybe there's just sort of a smaller set of minimal kind of checks or things but we can also use them in a regular indexer as well an indexing can say I'd like to if it's a number I'm doing something else I emit this way if it's not I do that for… (Nick)

  • And we already have So we have this select a language that we kind of impose and say it's there and I don't want to depart too far from it or add too much to it and I'd rather keep it as small as possible because already know selectors use selectors for more things as opposed to selectors plus it has a bunch of other things just for VD use maybe an extensions or… (Nick)

  • contra: I think that's really where I currently am is is there a way that we're adding lots of extra options and hoping we cover everything. It's inventing our own language, which is fine if you get it right. It's complexity. I mean, I'm not quite on Nick's page with a version of this that's just less complicated. I mean, I think if you're going to do this, you need to cover everything you can. Otherwise, who's going to use the damn thing? They'll just have to use JavaScript.
    (Bob)

re implementation:

  • Of course, you're concerned. I mean, I mean, it's not you [Nick] or I going to be writing the code. (Bob)

Bob sums up his thoughts:

  • where people try to use it, don't understand it. I'll have to learn it and then I'll have to fix the bugs in it and they'll have to add functions that we didn't think of. so there's the maintenance aspect that I'm concerned about.
  • I don't object to the idea. I don't think it's technically wrong. I can't think of an alternative, Literally, embedding lure would make me happy, but I mean wouldn't make anybody else happy. I think you can walk away. You can say yes, if you can write it in this language is complete. You can write anything in that. It's just a question of how much you want to write. if someone else is building it and it's going in and my only concern would be that it's a quality that it comes in, it's got lots of tests. coverage is there as complete as it can be that it's well written it's well documented and that it doesn't crash couch if you use it in stupid ways so from that point of view you fill your boots with my cloud and

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants