
Commit 09967b2

Update docs on new on_conflict :)
1 parent b7ea7f5 commit 09967b2


2 files changed, +96 -75 lines changed


README.rst

Lines changed: 2 additions & 1 deletion
@@ -21,7 +21,8 @@ Browse the documentation at: http://django-postgres-extra.readthedocs.io
2121

2222
Major features
2323
-----
24-
* Single query, concurrency safe upserts.
24+
* Single query, concurrency-safe upserts, or safely ignoring a duplicate insert.
25+
* Using PostgreSQL's ``ON CONFLICT DO ...``.
2526
* Unique and not null constraints for `HStoreField`.
2627
* Signals for updates.
2728

docs/manager.md

Lines changed: 94 additions & 74 deletions
@@ -54,125 +54,145 @@ There are four ways to do this:
5454
with postgres_manager(MyModel.myself.through) as manager:
5555
manager.upsert(...)
5656

57-
## Upserting
58-
An "upsert" is an operation where a piece of data is inserted/created if it doesn't exist yet and updated (overwritten) when it already exists. Django has long provided this functionality through [`update_or_create`](https://docs.djangoproject.com/en/1.10/ref/models/querysets/#update-or-create). It does this by first checking whether the record exists and creating it if not.
57+
## Conflict handling
58+
The `PostgresManager` comes with full support for PostgreSQL's `ON CONFLICT DO ...`. This is an extremely useful feature for doing concurrency-safe inserts. Often, when you want to insert a row, you want to overwrite it if it already exists, or simply leave the existing data there. Without `ON CONFLICT`, this requires a `SELECT` first and then possibly an `INSERT`. Between those two queries, another process might insert or change the row, so the `INSERT` can fail on a `UNIQUE` constraint. The alternative of trying to insert, ignoring the error and then doing an `UPDATE` is not good either: it results in a lot of write overhead (due to logging). Luckily, PostgreSQL offers `ON CONFLICT DO ...`, which allows you to specify what PostgreSQL should do in case the row already exists.
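A minimal sketch of that race, using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (SQLite 3.24 and later accept the same `INSERT ... ON CONFLICT` syntax). The `between` hook is a hypothetical way to simulate another process inserting the same row between the `SELECT` and the `INSERT`; the table and function names are illustrative and not part of `django-postgres-extra`.

```python
import sqlite3

# Assumption: sqlite3 (SQLite >= 3.24) stands in for PostgreSQL in this sketch


def make_db():
    # In-memory table with a UNIQUE column, mirroring MyModel.myfield
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, myfield TEXT NOT NULL UNIQUE)")
    return conn


def select_then_insert(conn, value, between=None):
    # Query 1: check whether the row already exists
    row = conn.execute("SELECT id FROM mymodel WHERE myfield = ?", (value,)).fetchone()
    if row is not None:
        return row[0]
    if between is not None:
        between()  # hypothetical hook: another process inserts the same row now
    # Query 2: raises IntegrityError if the row appeared in the meantime
    return conn.execute("INSERT INTO mymodel (myfield) VALUES (?)", (value,)).lastrowid


def insert_on_conflict(conn, value):
    # One INSERT that cannot fail on the UNIQUE constraint
    conn.execute(
        "INSERT INTO mymodel (myfield) VALUES (?) "
        "ON CONFLICT (myfield) DO UPDATE SET myfield = excluded.myfield",
        (value,),
    )
    # PostgreSQL could hand back the id via RETURNING; here it is looked up
    return conn.execute("SELECT id FROM mymodel WHERE myfield = ?", (value,)).fetchone()[0]


conn = make_db()


def other_process():
    conn.execute("INSERT INTO mymodel (myfield) VALUES ('beer')")


try:
    select_then_insert(conn, "beer", between=other_process)
    race_lost = False
except sqlite3.IntegrityError:
    race_lost = True  # the second INSERT hit the UNIQUE constraint

existing_id = insert_on_conflict(conn, "beer")  # no error, returns the existing row's id
```

The two-step version fails as soon as the row appears between its queries, while the single `INSERT ... ON CONFLICT` statement succeeds either way.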
5959

60-
The major problem with this approach is the possibility of race conditions. In between the `SELECT` and the `INSERT`, another process could perform the `INSERT`. The last `INSERT` would then most likely fail because it would violate a `UNIQUE` constraint.
61-
62-
In order to combat this, PostgreSQL added native upserts. Also known as [`ON CONFLICT DO ...`](https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT). This allows a user to specify what to do when a conflict occurs.
63-
64-
### upsert
65-
Attempts to insert a row with the specified data or updates (and overwrites) the duplicate row, and then returns the primary key of the row that was created/updated.
66-
67-
Upserts work by catching conflicts. PostgreSQL needs to know which conflicts to react to. You have to specify the name of the column whose conflicts you want to react to. This is specified in the `conflict_target` parameter.
68-
69-
You can only specify a single "constraint" in this field. You **cannot** react to conflicts in multiple constraints. This is a limitation of PostgreSQL. Note that this means a **single constraint**, not necessarily a single column. A constraint can cover multiple columns.
60+
`django-postgres-extra` brings full support for PostgreSQL's `ON CONFLICT DO ...`, allowing blazing fast and concurrency safe inserts:
7061

7162
from django.db import models
7263
from psqlextra.models import PostgresModel
64+
from psqlextra.query import ConflictAction
7365

7466
class MyModel(PostgresModel):
7567
    myfield = models.CharField(max_length=255, unique=True)
7668

77-
id1 = MyModel.objects.upsert(
78-
conflict_target=['myfield'],
79-
fields=dict(
80-
myfield='beer'
81-
)
69+
# insert or update if already exists, then fetch, all in a single query
70+
obj1 = (
71+
MyModel.objects
72+
.on_conflict(['myfield'], ConflictAction.UPDATE)
73+
.insert_and_get(myfield='beer')
8274
)
8375

84-
id2 = MyModel.objects.upsert(
85-
conflict_target=['myfield'],
86-
fields=dict(
87-
myfield='beer'
88-
)
76+
# insert, or do nothing if it already exists, then fetch
77+
obj2 = (
78+
MyModel.objects
79+
.on_conflict(['myfield'], ConflictAction.NOTHING)
80+
.insert_and_get(myfield='beer')
8981
)
9082

91-
assert id1 == id2
83+
# insert or update if already exists, then fetch only the primary key
84+
id = (
85+
MyModel.objects
86+
.on_conflict(['myfield'], ConflictAction.UPDATE)
87+
.insert(myfield='beer')
88+
)
9289

93-
Note that a single call to `upsert` results in a single `INSERT INTO ... ON CONFLICT DO UPDATE ...`. This fixes the problem outlined earlier about another process doing the `INSERT` in the mean time.
90+
### Constraint specification
91+
The first parameter of `on_conflict` denotes the name of the column(s) in which the conflict might occur. You can specify multiple columns, but together they must be covered by a single constraint, for example a `unique_together` constraint.
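The sketch below, again using `sqlite3` as a stand-in for PostgreSQL, illustrates why the columns must line up with one constraint: a conflict target covering the whole `UNIQUE (first_name, last_name)` constraint is accepted, while a single column from it is rejected before anything is inserted. PostgreSQL likewise rejects an `ON CONFLICT` target that matches no unique constraint. The `person` table and the `try_upsert` helper are illustrative assumptions.

```python
import sqlite3

# Assumption: sqlite3 (SQLite >= 3.24) stands in for PostgreSQL in this sketch
conn = sqlite3.connect(":memory:")
# One constraint spanning two columns, like Django's unique_together
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, "
    "UNIQUE (first_name, last_name))"
)

UPSERT = (
    "INSERT INTO person (first_name, last_name) VALUES (?, ?) "
    "ON CONFLICT ({target}) DO UPDATE SET first_name = excluded.first_name"
)


def try_upsert(target):
    # Returns None on success, or the error message if the target is rejected
    try:
        conn.execute(UPSERT.format(target=target), ("Henk", "Jansen"))
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


# Both columns together match the single UNIQUE constraint: accepted
ok = try_upsert("first_name, last_name")
# One column on its own matches no constraint: rejected
rejected = try_upsert("last_name")
```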
9492

95-
#### unique_together
96-
As mentioned earlier, `conflict_target` expects a single column name, or multiple if the constraint you want to react to spans multiple columns. Django's [unique_together](https://docs.djangoproject.com/en/1.11/ref/models/options/#unique-together) has this. If you want to react to this constraint that covers multiple columns, specify those columns in the `conflict_target` parameter:
93+
#### Multiple columns
94+
Specifying multiple columns is necessary in case of a constraint that spans multiple columns, such as when using Django's [unique_together](https://docs.djangoproject.com/en/1.11/ref/models/options/#unique-together):
9795

9896
from django.db import models
9997
from psqlextra.models import PostgresModel
from psqlextra.query import ConflictAction
10098

101-
class MyModel(PostgresModel):
99+
class MyModel(PostgresModel):
102100
    class Meta:
103-
        unique_together = ('myfield1', 'myfield2')
101+
        unique_together = ('first_name', 'last_name',)
104102

105-
    myfield1 = models.CharField(max_length=255)
106-
    myfield2 = models.CharField(max_length=255)
103+
    first_name = models.CharField(max_length=255)
104+
    last_name = models.CharField(max_length=255)
107105

108-
MyModel.objects.upsert(
109-
conflict_target=['myfield1', 'myfield2'],
110-
fields=dict(
111-
myfield1='beer',
112-
myfield2='moar beer'
113-
)
106+
obj = (
107+
MyModel.objects
108+
.on_conflict(['first_name', 'last_name'], ConflictAction.UPDATE)
109+
.insert_and_get(first_name='Henk', last_name='Jansen')
114110
)
115111

116-
#### hstore
117-
You can specify HStore keys that have a unique constraint as a `conflict_target`:
112+
#### HStore keys
113+
Catching conflicts in columns with a `UNIQUE` constraint on an `hstore` key is also supported:
118114

119115
from django.db import models
120116
from psqlextra.models import PostgresModel
121117
from psqlextra.fields import HStoreField
from psqlextra.query import ConflictAction
122118

123-
class MyModel(PostgresModel):
124-
# values in the key 'en' have to be unique
125-
myfield = HStoreField(uniqueness=['en'])
119+
class MyModel(PostgresModel):
120+
    name = HStoreField(uniqueness=['en'])
126121

127-
MyModel.objects.upsert(
128-
conflict_target=[('myfield', 'en')],
129-
fields=dict(
130-
myfield={'en': 'beer'}
131-
)
122+
id = (
123+
MyModel.objects
124+
.on_conflict([('name', 'en')], ConflictAction.NOTHING)
125+
.insert(name={'en': 'Swen'})
132126
)
133127

134-
It also supports specifying a "unique together" constraint on HStore keys:
128+
This also applies to "unique together" constraints in an `hstore` field:
135129

136-
from django.db import models
137-
from psqlextra.models import PostgresModel
138-
from psqlextra.fields import HStoreField
130+
class MyModel(PostgresModel):
131+
    name = HStoreField(uniqueness=[('en', 'ar')])
139132

140-
class MyModel(PostgresModel):
141-
# values in the key 'en' and 'ar' have to be
142-
# unique together
143-
myfield = HStoreField(uniqueness=[('en', 'ar')])
144-
145-
MyModel.objects.upsert(
146-
conflict_target=[('myfield', 'en'), ('myfield', 'ar')],
147-
fields=dict(
148-
myfield={'en': 'beer', 'ar': 'arabic beer'}
149-
)
133+
id = (
134+
MyModel.objects
135+
.on_conflict([('name', 'en'), ('name', 'ar')], ConflictAction.NOTHING)
136+
.insert(name={'en': 'Swen', 'ar': 'سوين'})
150137
)
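In PostgreSQL, a conflict target on an `hstore` key is an index expression rather than a plain column name, so each one is wrapped in its own parentheses, and it only matches if a unique index exists on that same expression. The hypothetical `build_conflict_target` helper below sketches how the lists passed to `on_conflict` could be rendered into that syntax; it is an illustration only, not how `django-postgres-extra` generates its SQL internally.

```python
def quote_ident(name):
    # Quote an SQL identifier, doubling any embedded double quotes
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    # Quote an SQL string literal, doubling any embedded single quotes
    return "'" + value.replace("'", "''") + "'"


def build_conflict_target(targets):
    """Render an ON CONFLICT clause from column names and (column, hstore key) tuples."""
    parts = []
    for target in targets:
        if isinstance(target, tuple):
            column, key = target
            # An hstore key becomes a parenthesized index expression: ("name" -> 'en')
            parts.append("(%s -> %s)" % (quote_ident(column), quote_literal(key)))
        else:
            # A plain column is just a quoted identifier
            parts.append(quote_ident(target))
    return "ON CONFLICT (%s)" % ", ".join(parts)


clause = build_conflict_target([('name', 'en'), ('name', 'ar')])
```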
151138

152-
### upsert_and_get
153-
Does the same thing as `upsert`, but returns a model instance rather than the primary key of the row that was created/updated. This also happens in a single query using `RETURNING` clause on the `INSERT INTO` statement:
139+
### insert vs insert_and_get
140+
After specifying `on_conflict` you can use either `insert` or `insert_and_get` to perform the insert.
141+
142+
#### insert
143+
* Performs the insert, then returns the primary key of the row that was inserted, or of the row it conflicted with.
144+
145+
#### insert_and_get
146+
* Performs the insert, then returns the entire row that was inserted, or the row it conflicted with, in the form of a model instance.
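The difference between the two lies only in what comes back. As a hedged illustration, the sketch below uses `sqlite3` as a stand-in for PostgreSQL and two hypothetical module-level functions named after the methods: `insert` yields just the primary key, `insert_and_get` the full row (as a plain `dict` here, where the real method returns a model instance). The docs state that the real methods fetch the row in the same query; this sketch uses a follow-up `SELECT` instead.

```python
import sqlite3

# Assumption: sqlite3 (SQLite >= 3.24) stands in for PostgreSQL in this sketch
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, myfield TEXT NOT NULL UNIQUE)")

UPSERT = (
    "INSERT INTO mymodel (myfield) VALUES (?) "
    "ON CONFLICT (myfield) DO UPDATE SET myfield = excluded.myfield"
)


def insert(value):
    # Hypothetical analogue of `insert`: perform the insert, return only the primary key
    conn.execute(UPSERT, (value,))
    return conn.execute("SELECT id FROM mymodel WHERE myfield = ?", (value,)).fetchone()["id"]


def insert_and_get(value):
    # Hypothetical analogue of `insert_and_get`: perform the insert, return the whole row
    conn.execute(UPSERT, (value,))
    row = conn.execute("SELECT * FROM mymodel WHERE myfield = ?", (value,)).fetchone()
    return dict(row)


pk = insert("beer")
obj = insert_and_get("beer")  # conflicts with the row above and returns it in full
```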
147+
148+
### Pitfalls
149+
The standard Django methods for inserting/updating are not affected by `on_conflict`. It was a conscious decision not to override or change their behavior. **The following completely ignores the `on_conflict`:**
150+
151+
obj = (
152+
MyModel.objects
153+
.on_conflict(['first_name', 'last_name'], ConflictAction.UPDATE)
154+
.create(first_name='Henk', last_name='Jansen')
)
155+
156+
The same applies to methods such as `update`, `get_or_create` and `update_or_create`.
157+
158+
### Conflict actions
159+
There are currently two actions that can be taken when encountering a conflict. The second parameter of `on_conflict` allows you to specify which one should happen.
160+
161+
#### ConflictAction.UPDATE
162+
* If the row does **not exist**, insert a new one.
163+
* If the row **exists**, update it.
164+
165+
This is also known as an "upsert".
166+
167+
#### ConflictAction.NOTHING
168+
* If the row does **not exist**, insert a new one.
169+
* If the row **exists**, do nothing.
170+
171+
This is preferable when the data you're about to insert is the same as the data that already exists. It is also more performant, because it avoids a write when the row already exists.
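The two actions differ only in the clause after `DO`. The sketch below, with `sqlite3` standing in for PostgreSQL and an illustrative `note` column that the docs' `MyModel` does not have, inserts a conflicting row twice: with `DO NOTHING` the existing row keeps its value, with `DO UPDATE` it is overwritten. The `insert_with_action` helper is hypothetical.

```python
import sqlite3

# Assumption: sqlite3 (SQLite >= 3.24) stands in for PostgreSQL in this sketch
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, myfield TEXT NOT NULL UNIQUE, note TEXT)"
)
conn.execute("INSERT INTO mymodel (myfield, note) VALUES ('beer', 'original')")


def insert_with_action(note, action):
    # `action` is the SQL after DO: either NOTHING or an UPDATE SET list
    conn.execute(
        "INSERT INTO mymodel (myfield, note) VALUES ('beer', ?) "
        "ON CONFLICT (myfield) DO " + action,
        (note,),
    )
    return conn.execute("SELECT note FROM mymodel WHERE myfield = 'beer'").fetchone()[0]


# Like ConflictAction.NOTHING: the existing row is left untouched
after_nothing = insert_with_action("ignored", "NOTHING")
# Like ConflictAction.UPDATE: the existing row is overwritten
after_update = insert_with_action("overwritten", "UPDATE SET note = excluded.note")
```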
172+
173+
### Shorthand
174+
The `on_conflict`, `insert` and `insert_and_get` methods were only added in `django-postgres-extra` 1.6. Before that, only `ConflictAction.UPDATE` was supported, in the following form:
154175

155176
from django.db import models
156177
from psqlextra.models import PostgresModel
157178

158179
class MyModel(PostgresModel):
159180
    myfield = models.CharField(max_length=255, unique=True)
160181

161-
obj1 = MyModel.objects.create(myfield='beer')
162-
obj2 = MyModel.objects.create(myfield='beer')
163-
164-
obj1 = MyModel.objects.upsert_and_get(
165-
conflict_target=['myfield'],
166-
fields=dict(
167-
myfield='beer'
182+
obj = (
183+
MyModel.objects
184+
.upsert_and_get(
185+
conflict_target=['myfield'],
186+
fields=dict(myfield='beer')
168187
)
169188
)
170189

171-
obj2 = MyModel.objects.upsert_and_get(
172-
conflict_target=['myfield'],
173-
fields=dict(
174-
myfield='beer'
190+
id = (
191+
MyModel.objects
192+
.upsert(
193+
conflict_target=['myfield'],
194+
fields=dict(myfield='beer')
175195
)
176196
)
177197

178-
assert obj1.id == obj2.id
198+
These two shorthands still exist and **are not** deprecated. They behave exactly the same as `ConflictAction.UPDATE` and are there for convenience. It is up to you to decide which to use.
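For comparison, the shorthand's `conflict_target`/`fields` pair maps naturally onto a single `INSERT ... ON CONFLICT ... DO UPDATE`. The `upsert` function below is a hypothetical, `sqlite3`-based sketch of that mapping (table and column names are assumed to be trusted identifiers, and the `note` column is illustrative); it is not the library's implementation.

```python
import sqlite3


def upsert(conn, table, conflict_target, fields):
    """Hypothetical helper mirroring the shorthand's conflict_target/fields arguments."""
    columns = list(fields)
    # Overwrite the non-target columns; if every column is a target, re-set the
    # targets themselves so the DO UPDATE SET list is never empty
    to_update = [c for c in columns if c not in conflict_target] or list(conflict_target)
    sql = "INSERT INTO {t} ({cols}) VALUES ({marks}) ON CONFLICT ({target}) DO UPDATE SET {sets}".format(
        t=table,
        cols=", ".join(columns),
        marks=", ".join("?" for _ in columns),
        target=", ".join(conflict_target),
        sets=", ".join("{0} = excluded.{0}".format(c) for c in to_update),
    )
    conn.execute(sql, [fields[c] for c in columns])
    # Look up the primary key of the inserted or updated row by its conflict target
    where = " AND ".join("{} = ?".format(c) for c in conflict_target)
    row = conn.execute(
        "SELECT id FROM {} WHERE {}".format(table, where),
        [fields[c] for c in conflict_target],
    ).fetchone()
    return row[0]


# Assumption: sqlite3 (SQLite >= 3.24) stands in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, myfield TEXT NOT NULL UNIQUE, note TEXT)")

id1 = upsert(conn, "mymodel", ["myfield"], dict(myfield="beer"))
id2 = upsert(conn, "mymodel", ["myfield"], dict(myfield="beer", note="cold"))
```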
