docs/manager.md
Lines changed: 94 additions & 74 deletions
@@ -54,125 +54,145 @@ There are four ways to do this:
 with postgres_manager(MyModel.myself.through) as manager:
     manager.upsert(...)

-## Upserting
-An "upsert" is an operation where a piece of data is inserted/created if it doesn't exist yet and updated (overwritten) when it already exists. Django has long provided this functionality through [`update_or_create`](https://docs.djangoproject.com/en/1.10/ref/models/querysets/#update-or-create). It does this by first checking whether the record exists and creating it if not.
+## Conflict handling
+The `PostgresManager` comes with full support for PostgreSQL's `ON CONFLICT DO ...`. This is an extremely useful feature for doing concurrency-safe inserts. Often, when you want to insert a row, you want to overwrite it if it already exists, or simply leave the existing data in place. Without `ON CONFLICT`, this would require a `SELECT` first and then possibly an `INSERT`. Between those two queries, another process might change the row. The alternative of trying to insert, ignoring the error and then doing an `UPDATE` is also not good, because it would result in a lot of write overhead (due to logging). Luckily, PostgreSQL offers `ON CONFLICT DO ...`, which allows you to specify what PostgreSQL should do if the row already exists.
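As a minimal sketch of the single-statement approach described above, the following uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. That substitution is an assumption made purely so the example is self-contained (SQLite 3.24+ accepts a very similar `ON CONFLICT` clause). The `mymodel` table and its `note` column are hypothetical, and the example does not go through `django-postgres-extra` itself.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL here; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel ("
    "id INTEGER PRIMARY KEY, myfield TEXT UNIQUE, note TEXT)"
)

# One statement either inserts the row or overwrites the conflicting one,
# so there is no gap between a SELECT and an INSERT for another writer to use.
UPSERT = """
INSERT INTO mymodel (myfield, note) VALUES (?, ?)
ON CONFLICT (myfield) DO UPDATE SET note = excluded.note
"""

conn.execute(UPSERT, ("beer", "first"))   # no conflict: the row is inserted
conn.execute(UPSERT, ("beer", "second"))  # conflict on myfield: the row is updated

rows = conn.execute("SELECT id, myfield, note FROM mymodel").fetchall()
print(rows)
```

Both calls run the same statement; only the second one hits the `UNIQUE` constraint on `myfield` and takes the `DO UPDATE` branch, so the table still holds a single row.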
 
-The major problem with this approach is the possibility of race conditions. In between the `SELECT` and `INSERT`, another process could perform the `INSERT`. The last `INSERT` would most likely fail because it would violate a `UNIQUE` constraint.
-
-In order to combat this, PostgreSQL added native upserts, also known as [`ON CONFLICT DO ...`](https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT). This allows a user to specify what to do when a conflict occurs.
-
-### upsert
-Attempts to insert a row with the specified data or updates (and overwrites) the duplicate row, and then returns the primary key of the row that was created/updated.
-
-Upserts work by catching conflicts. PostgreSQL needs to know which conflicts to react to. You have to specify the name of the column whose conflicts you want to react to. This is specified in the `conflict_target` parameter.
-
-You can only specify a single "constraint" in this field. You **cannot** react to conflicts in multiple fields. This is a limitation of PostgreSQL. Note that this means **single constraint**, not necessarily a single column. A constraint can cover multiple columns.
+`django-postgres-extra` brings full support for PostgreSQL's `ON CONFLICT DO ...`, allowing blazing fast and concurrency-safe inserts:
# insert or update if already exists, then fetch, all in a single query
+obj2 = (
+    MyModel.objects
+    .on_conflict(['myfield'], ConflictAction.UPDATE)
+    .insert_and_get(myfield='beer')
 )

-id2 = MyModel.objects.upsert(
-    conflict_target=['myfield'],
-    fields=dict(
-        myfield='beer'
-    )
+# insert, or do nothing if it already exists, then fetch
+obj1 = (
+    MyModel.objects
+    .on_conflict(['myfield'], ConflictAction.NOTHING)
+    .insert_and_get(myfield='beer')
 )

-assert id1 == id2
+# insert or update if already exists, then fetch only the primary key
+id = (
+    MyModel.objects
+    .on_conflict(['myfield'], ConflictAction.UPDATE)
+    .insert(myfield='beer')
+)

-Note that a single call to `upsert` results in a single `INSERT INTO ... ON CONFLICT DO UPDATE ...`. This fixes the problem outlined earlier about another process doing the `INSERT` in the meantime.
+### Constraint specification
+The `on_conflict` function's first parameter denotes the name of the column(s) in which the conflict might occur. Although you can specify multiple columns, these columns must be covered by a single constraint, for example a `unique_together` constraint.
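To illustrate the single-constraint rule at the SQL level, here is a hedged sketch that again uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (an assumption for the sake of a runnable example). The `membership` table and its columns are hypothetical. A conflict target naming both columns of the composite `UNIQUE` constraint is accepted, while a target naming only one of them matches no constraint and is rejected.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE membership ("
    "user_id INTEGER NOT NULL, group_id INTEGER NOT NULL, role TEXT NOT NULL, "
    "UNIQUE (user_id, group_id))"
)

# The conflict target lists exactly the columns of the composite constraint.
UPSERT = """
INSERT INTO membership (user_id, group_id, role) VALUES (?, ?, ?)
ON CONFLICT (user_id, group_id) DO UPDATE SET role = excluded.role
"""

conn.execute(UPSERT, (1, 10, "member"))
conn.execute(UPSERT, (1, 20, "member"))  # same user, other group: no conflict
conn.execute(UPSERT, (1, 10, "admin"))   # same pair: the existing row is updated

# A target covering only part of the constraint does not match any constraint.
try:
    conn.execute(
        "INSERT INTO membership (user_id, group_id, role) VALUES (1, 10, 'x') "
        "ON CONFLICT (user_id) DO NOTHING"
    )
    partial_target_rejected = False
except sqlite3.Error:
    partial_target_rejected = True

rows = conn.execute(
    "SELECT user_id, group_id, role FROM membership ORDER BY group_id"
).fetchall()
print(rows, partial_target_rejected)
```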
 
-#### unique_together
-As mentioned earlier, `conflict_target` expects a single column name, or multiple if the constraint you want to react to spans multiple columns. Django's [unique_together](https://docs.djangoproject.com/en/1.11/ref/models/options/#unique-together) creates such a constraint. If you want to react to this constraint that covers multiple columns, specify those columns in the `conflict_target` parameter:
+#### Multiple columns
+Specifying multiple columns is necessary in case of a constraint that spans multiple columns, such as when using Django's [unique_together](https://docs.djangoproject.com/en/1.11/ref/models/options/#unique-together):
Does the same thing as `upsert`, but returns a model instance rather than the primary key of the row that was created/updated. This also happens in a single query, using a `RETURNING` clause on the `INSERT INTO` statement:
+### insert vs insert_and_get
+After specifying `on_conflict` you can use either `insert` or `insert_and_get` to perform the insert.
+
+#### insert
+* Performs the insert, then returns the primary key of the row that was inserted or that it conflicted with.
+
+#### insert_and_get
+* Performs the insert, then returns the entire row that was inserted or that it conflicted with, in the form of a model instance.
+
+### Pitfalls
+The standard Django methods for inserting/updating are not affected by `on_conflict`. It was a conscious decision to not override or change their behavior. **The following completely ignores `on_conflict`**:
The same applies to methods such as `update`, `get_or_create`, `update_or_create` etc.
+
+### Conflict actions
+There are currently two actions that can be taken when encountering a conflict. The second parameter of `on_conflict` allows you to specify what should happen.
+
+#### ConflictAction.UPDATE
+* If the row does **not exist**, insert a new one.
+* If the row **exists**, update it.
+
+This is also known as an "upsert".
+
+#### ConflictAction.NOTHING
+* If the row does **not exist**, insert a new one.
+* If the row **exists**, do nothing.
+
+This is preferable when the data you're about to insert is the same as the data that already exists. It is more performant because it avoids a write if the row already exists.
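The "do nothing" behavior can be sketched at the SQL level as follows. As before, Python's built-in `sqlite3` is only a stand-in for PostgreSQL (an assumption to keep the example runnable), and the `mymodel` table with its `note` column is hypothetical. The conflicting insert leaves the existing row untouched and reports zero affected rows.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel ("
    "id INTEGER PRIMARY KEY, myfield TEXT UNIQUE, note TEXT)"
)
conn.execute("INSERT INTO mymodel (myfield, note) VALUES ('beer', 'original')")

# The row already exists, so DO NOTHING skips the insert instead of failing.
cursor = conn.execute(
    "INSERT INTO mymodel (myfield, note) VALUES (?, ?) "
    "ON CONFLICT (myfield) DO NOTHING",
    ("beer", "ignored"),
)
rows_written = cursor.rowcount  # rows affected by the skipped insert

rows = conn.execute("SELECT myfield, note FROM mymodel").fetchall()
print(rows, rows_written)
```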
+
+### Shorthand
+The `on_conflict`, `insert` and `insert_and_get` methods were only added in `django-postgres-extra` 1.6. Before that, only `ConflictAction.UPDATE` was supported in the following form:
These two shorthands still exist and **are not** deprecated. They behave exactly the same as `ConflictAction.UPDATE` and are there for convenience. It is up to you to decide which to use.