
Homework 3

Jinho D. Choi edited this page Feb 21, 2018 · 11 revisions

Task 1: Email Extraction

Of the 16,321 entries in email_map.tsv, 2,513 show a mismatch between the number of authors and the number of email addresses (email_mismatch.tsv).

  • Write a program that extracts the missing email addresses from the text files.
  • Update the third column of email_map.tsv with the extracted email addresses, where the addresses are delimited by ;.
  • When updating, prioritize recent non-workshop publications.
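One possible starting point for the extraction step is sketched below. It is only a sketch under stated assumptions: the function name extract_emails and both regular expressions are illustrative and not part of the assignment, and the grouped form {user1, user2}@domain is assumed to occur in the text files alongside plain addresses. Real files will likely contain other formats that this does not cover.

```python
import re

# Plain addresses such as jinho@mathcs.emory.edu.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Assumed grouped form such as {jinho, choi}@emory.edu.
GROUPED = re.compile(r"\{([^{}@]+)\}\s*@\s*([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_emails(text):
    """Return lowercased email addresses in order of appearance, without duplicates."""
    found = []  # (position in text, address)
    for match in GROUPED.finditer(text):
        domain = match.group(2)
        # Expand each user name inside the braces into a full address.
        for user in re.split(r"[,;|\s]+", match.group(1).strip()):
            if user:
                found.append((match.start(), f"{user}@{domain}"))
    for match in EMAIL.finditer(text):
        found.append((match.start(), match.group(0)))

    # Restore document order; the sort is stable, so grouped users keep their order.
    found.sort(key=lambda pair: pair[0])

    seen, emails = set(), []
    for _, email in found:
        email = email.lower()
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


if __name__ == "__main__":
    sample = "Jinho Choi {jinho, choi}@emory.edu and Ann Lee ann.lee@cs.example.org."
    # Join with ; as required for the third column of email_map.tsv.
    print(";".join(extract_emails(sample)))
```

The result is joined with ; because that is the delimiter the third column expects. Matching the extracted addresses to the number of authors in each entry is left out of this sketch.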

Task 2: Institution Weighting

For each publication, weight individual institutions in terms of their contributions to the publication.

  • Write a program that computes, for each institution, the number of its email addresses in the email list divided by the total number of email addresses in the publication.
  • Come up with your own weighting scheme that makes sense to you.
  • Make sure to use the institution-level domain name (e.g., emory.edu), not a subdomain. For instance, given the email list [jinho@mathcs.emory.edu, choi@emory.edu], your program should give emory.edu:1.0, not mathcs.emory.edu:0.5;emory.edu:0.5.
  • Update the fourth column of email_map.tsv with the weights, where the institution and its weight are delimited by : and (institution, weight) pairs are delimited by ;.
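The frequency-based weighting described above could be sketched as follows. This is a minimal illustration, not a reference solution: the function names and the GENERIC_SECOND_LEVEL set are assumptions introduced here, and collapsing subdomains by keeping the last two labels (or three under country-code domains such as ac.uk) is only a heuristic that will misjudge some domains.

```python
from collections import Counter

# Assumption: labels that commonly sit directly under a two-letter country code,
# e.g. cam.ac.uk, so that three labels are kept instead of two.
GENERIC_SECOND_LEVEL = {"ac", "co", "com", "edu", "gov", "net", "org"}


def institution_domain(email):
    """Collapse an address to its institution-level domain (heuristic)."""
    labels = email.rsplit("@", 1)[-1].lower().split(".")
    under_country_code = (
        len(labels) >= 3
        and len(labels[-1]) == 2
        and labels[-2] in GENERIC_SECOND_LEVEL
    )
    keep = 3 if under_country_code else 2
    return ".".join(labels[-keep:])


def institution_weights(emails):
    """Map each institution to its share of the publication's email addresses."""
    counts = Counter(institution_domain(email) for email in emails)
    total = sum(counts.values())
    return {domain: count / total for domain, count in counts.items()}


def format_weights(weights):
    """Format as institution:weight pairs delimited by ; for the fourth column."""
    return ";".join(f"{domain}:{weight}" for domain, weight in weights.items())


if __name__ == "__main__":
    emails = ["jinho@mathcs.emory.edu", "choi@emory.edu"]
    print(format_weights(institution_weights(emails)))
```

For the example email list in the task, both addresses collapse to emory.edu, so the sketch yields emory.edu:1.0. A custom weighting scheme would replace the body of institution_weights while keeping the same output format.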

GitHub Submission

  • Create a GitHub account if you do not have one.
  • Fork the nlp-ranking repository by clicking the Fork button at the top-right corner.
  • Install GitHub Desktop on your local machine if you have not already.
  • Launch GitHub Desktop and sign in with your GitHub credentials.
  • Click the Clone a Repository button and choose the nlp-ranking repository you just forked.
  • Create dat/email_map_firstname_lastname.tsv (e.g., email_map_jinho_choi.tsv), commit, and push your changes to master.
  • Go to the nlp-ranking repository under your account (e.g., https://github.com/jdchoi77/nlp-ranking).
  • Click the New pull request button and on the following page, click the Create pull request button to make a pull request to elitcloud/nlp-ranking.

Canvas Submission

Practical Approaches to Data Science with Text

Instructor


Emory University
