Debias #23

pepsi2222 · 2022-09-14T09:17:33Z

[feat&fix] add ExpoMF, PDA, DICE

XuHwang · 2022-09-15T06:19:05Z

recstudio/model/mf/dice.py

+    def _get_query_encoder(self, train_data):
+        int = torch.nn.Embedding(train_data.num_users, self.embed_dim, padding_idx=0)
+        pop = torch.nn.Embedding(train_data.num_users, self.embed_dim, padding_idx=0)
+        class DICEQueryEncoder(torch.nn.Module):


Could the int and pop embeddings in the query encoder be defined as a single Embedding with dimension 2*self.embed_dim? @pepsi2222

Suggested change

class DICEQueryEncoder(torch.nn.Module):

torch.nn.Embedding(train_data.num_users, 2*self.embed_dim, padding_idx=0)
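
A minimal sketch of what the single-table variant might look like, assuming the interest half occupies the first embed_dim columns and the popularity half the last embed_dim columns. The constructor arguments and the sizes used below are hypothetical and are not taken from the PR.

```python
import torch


class DICEQueryEncoder(torch.nn.Module):
    """Hypothetical single-table encoder: one embedding holds both halves."""

    def __init__(self, num_users: int, embed_dim: int):
        super().__init__()
        # Columns [0, embed_dim) are treated as the interest part,
        # columns [embed_dim, 2 * embed_dim) as the popularity part.
        self.embedding = torch.nn.Embedding(num_users, 2 * embed_dim, padding_idx=0)

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        # The output is already the concatenated vector, so no torch.cat is needed.
        return self.embedding(batch)


encoder = DICEQueryEncoder(num_users=10, embed_dim=4)
query = encoder(torch.tensor([1, 2, 3]))
# Split once where the two halves are needed separately.
query_int, query_pop = query.chunk(2, dim=-1)
```

The output shape matches the concatenation produced by the two-table version, so downstream code that calls chunk(2, -1) should not need to change.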

XuHwang · 2022-09-15T06:19:49Z

recstudio/model/mf/dice.py

+                self.pop = pop
+            def forward(self, batch):
+                return torch.cat((self.int(batch), self.pop(batch)), dim=-1)
+        return DICEItemEncoder(int, pop)


Same as the comment on the query encoder above.

XuHwang

Nice job! But some changes should be made according to the comments.

XuHwang · 2022-09-15T06:31:26Z

recstudio/model/mf/dice.py

+        return output
+
+    def _get_sampler(self, train_data):
+        class PopularSamplerWithMargin(Sampler):


To be discussed. I'm confused about what the pool means and why negative items are sampled only from the popular or the unpopular items when their count is smaller than pool. @pepsi2222

XuHwang · 2022-09-15T06:35:36Z

recstudio/model/mf/dice.py

+        output['mask'] = mask
+        output['score'] = {'pos_int_score': pos_int_score, 'pos_pop_score': pos_pop_score, 'pos_click_score': pos_click_score,
+                           'neg_int_score': neg_int_score, 'neg_pop_score': neg_pop_score, 'neg_click_score': neg_click_score}
+        output['query'] = {'query_int': query.chunk(2, -1)[0], 'query_pop': query.chunk(2, -1)[1]}


The chunk() operations are duplicated here; query_int and query_pop are already defined at line 117.

Suggested change

-        output['query'] = {'query_int': query.chunk(2, -1)[0], 'query_pop': query.chunk(2, -1)[1]}
+        output['query'] = {'query_int': query_int, 'query_pop': query_pop}

XuHwang · 2022-09-15T06:37:16Z

recstudio/model/mf/dice.py

+from recstudio.model.mf.bpr import BPR
+from recstudio.model import basemodel, loss_func
+import time
+


You'd better add a comment here with the title and URL of the paper corresponding to the model. @pepsi2222

XuHwang · 2022-09-15T06:39:51Z

recstudio/data/advance_dataset.py


-    def build(self, split_ratio, shuffle=True, split_mode='user_entry', **kwargs):
+    def build(self, split_ratio, shuffle=True, split_mode='user_entry', excluding_hist=False, **kwargs):
+        self.excluding_hist = excluding_hist


I think excluding_hist is not a good name here. How about return_hist? @pepsi2222 @Xiuchen519

XuHwang · 2022-09-15T06:46:37Z

recstudio/model/mf/dice.py

+
+                        if num_pop_items < self.pool:
+                            for cnt in range(num_neg):
+                                idx = torch.randint(num_unpop_items, (1,))


Why not use torch.randint(num_unpop_items, (num_neg,)) instead of the for loop?

Suggested change

-                                idx = torch.randint(num_unpop_items, (1,))
+                                idx = torch.randint(num_unpop_items, (num_neg,))
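
For comparison, a small sketch of the two sampling styles side by side. The sizes are hypothetical, and the loop is only an approximation of the one in the diff, whose body is not fully shown here.

```python
import torch

num_unpop_items, num_neg = 50, 8  # hypothetical sizes

# Loop version: one torch.randint call per negative sample.
loop_idx = torch.cat([torch.randint(num_unpop_items, (1,)) for _ in range(num_neg)])

# Vectorized version from the suggestion: a single call draws all negatives.
vec_idx = torch.randint(num_unpop_items, (num_neg,))
```

Both tensors have num_neg entries drawn uniformly from [0, num_unpop_items), so the rest of the sampler would then index with a tensor rather than one position at a time.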

XuHwang · 2022-09-15T06:54:54Z

recstudio/model/mf/expomf.py

+                # data to device
+                batch = self._to_device(batch, self.device)               
+                # update latent user/item factors
+                a = self._expectation(batch)


Maybe this method doesn't need to be overridden. You can add a conditional statement in the training_step method to implement the EM algorithm, as below:

if batch_idx % 2 == 0: do expectation
else: do maximization
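
A runnable toy version of that alternation, as a sketch only. The class and the _maximization method are hypothetical stand-ins (only _expectation appears in the diff), and the step bodies just record which step ran.

```python
class AlternatingEMTrainer:
    """Toy stand-in for the model; the step bodies are placeholders."""

    def __init__(self):
        self.log = []

    def _expectation(self, batch):
        self.log.append("E")
        return batch

    def _maximization(self, batch):
        self.log.append("M")
        return batch

    def training_step(self, batch, batch_idx):
        # Even-numbered batches run the E-step, odd-numbered batches the M-step.
        if batch_idx % 2 == 0:
            return self._expectation(batch)
        return self._maximization(batch)


trainer = AlternatingEMTrainer()
for batch_idx, batch in enumerate([[1], [2], [3], [4]]):
    trainer.training_step(batch, batch_idx)
```

One thing to check with this design: the E-step and the M-step then run on different batches, which may or may not be acceptable for ExpoMF.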

XuHwang · 2022-09-15T07:05:53Z

recstudio/model/mf/pda.py

+                                                                              excluding_hist=self.config.get('excluding_hist', False),
+                                                                              method=self.config.get('sampling_method', 'none'), return_query=True)
+            pos_score = self.score_func(query, pos_item_vec)
+            pos_score = pos_item_vec.split([pos_item_vec.shape[-1]-1, 1], dim=-1)[1] ** self.config['gamma'] * pos_score #


I wonder whether this line is duplicated, because the operation has already been done in the scorer at line 53. @pepsi2222
If so, the method doesn't need to be overridden.
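
To illustrate the concern with plain numbers, assuming the scorer really does apply the popularity weight already: multiplying by popularity ** gamma a second time changes the effective exponent to 2 * gamma. All values and the scorer function below are hypothetical.

```python
gamma = 0.5        # hypothetical popularity exponent
popularity = 0.25  # hypothetical item popularity
raw_score = 2.0    # hypothetical query-item score before weighting


def scorer(raw, pop):
    """Stand-in scorer that already applies the popularity weight."""
    return (pop ** gamma) * raw


weighted_once = scorer(raw_score, popularity)
# Re-applying the weight outside the scorer changes the exponent to 2 * gamma.
weighted_twice = (popularity ** gamma) * weighted_once
```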

XuHwang · 2022-09-15T07:07:35Z

recstudio/model/mf/pmf.py

+    def _get_query_encoder(self, train_data):
+        return torch.nn.Embedding(train_data.num_users, self.embed_dim, padding_idx=0)
+
+    def _init_parameter(self):


You can define init_method: normal and init_range: 0.1 in pmf.yaml instead of overriding the method.

XuHwang reviewed Sep 15, 2022

XuHwang requested changes Sep 15, 2022

XuHwang requested a review from Xiuchen519 March 14, 2023 13:08

pepsi2222 force-pushed the debias branch from 9e35941 to 9923d4c Compare March 22, 2023 06:19

debiasing with different backbones (v1)

a2263ab

pepsi2222 force-pushed the debias branch from 9923d4c to a2263ab Compare March 22, 2023 08:26
