Homework 4

Jinho D. Choi edited this page Apr 4, 2018 · 4 revisions

Task

Your task is to cluster all titles of the papers in the ACL Anthology using different vector space models and clustering algorithms.

Data

Use all entries from the bib files here.
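
As a starting point for loading the data, the sketch below pulls the title and author fields out of BibTeX text using only the standard library. The names `parse_bib` and `authors_of` are hypothetical, and the regular expression assumes field values contain no nested braces; a dedicated BibTeX parser would be more robust on real Anthology files.

```python
import re

# One `field = {value}` or `field = "value"` pair.
# Assumption: values contain no nested braces.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")', re.S)

def parse_bib(text):
    """Return one dict per entry that has a title, keyed by lower-cased field name."""
    entries = []
    # Split on the '@' that opens each entry; the first chunk is any preamble.
    for chunk in re.split(r'(?m)^\s*@', text)[1:]:
        fields = {}
        for name, braced, quoted in FIELD_RE.findall(chunk):
            value = braced if braced else quoted
            # Collapse line breaks and repeated spaces inside a value.
            fields[name.lower()] = " ".join(value.split())
        if "title" in fields:
            entries.append(fields)
    return entries

def authors_of(entry):
    """Split the BibTeX author field on its ' and ' separator."""
    return [a.strip() for a in entry.get("author", "").split(" and ") if a.strip()]
```

Calling `parse_bib(open(path).read())` on each bib file and concatenating the results would give one list of entries; `path` here stands for wherever the files were saved.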

Vector Space Models

Experiment with the following two vector space models:
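
The two models are not listed in this copy of the page. Purely to illustrate what a vector space model over titles looks like, the sketch below builds L2-normalized tf-idf bag-of-words vectors; the choice of tf-idf and the names `tokenize`, `tfidf_vectors`, and `cosine` are assumptions, not necessarily what the assignment asks for.

```python
import math
import re
from collections import Counter

def tokenize(title):
    """Lower-case a title and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", title.lower())

def tfidf_vectors(titles):
    """Return one sparse vector (dict: term -> weight) per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    # Document frequency: number of titles each term appears in.
    df = Counter(term for doc in docs for term in doc)
    n = len(docs)
    vectors = []
    for doc in docs:
        # Raw term count times log inverse document frequency.
        vec = {t: c * math.log(n / df[t]) for t, c in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())
```

With this weighting, a term that occurs in every title gets weight zero, so titles are compared only on the words that distinguish them.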

Clustering

Experiment with the following five clustering algorithms:

For each algorithm, tune the hyper-parameters to give the output that seems most reasonable based on your intuition. Also, identify the main topic of each cluster where possible.
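
The five algorithms are not listed in this copy of the page. As one hedged illustration of hyper-parameter tuning, the sketch below runs a plain k-means (Lloyd's algorithm) over dense vectors for several values of k and records the within-cluster sum of squares. Using k-means here is an assumption, and `kmeans` and `sweep_k` are hypothetical names; the score is only one signal to weigh alongside reading the clusters themselves.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm. Returns (labels, inertia)."""
    rng = random.Random(seed)
    # Initialize centers from k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for step in range(iters):
        # Assignment step: index of the nearest center for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, inertia

def sweep_k(points, ks):
    """Map each candidate k to the inertia k-means reaches with it."""
    return {k: kmeans(points, k)[1] for k in ks}
```

Inertia usually keeps falling as k grows, so a lower value alone does not mean a better clustering; looking for the point where the drop flattens, and then checking whether the clusters read as coherent topics, is closer to what the task asks for.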

Prediction

  • Extract titles of all papers written by "Jinho D. Choi".
  • For every clustering result above, find out which cluster each title belongs to.
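
The two steps above can be sketched as follows, assuming each entry is a dict with `title` and `author` keys and that `labels[i]` is the cluster assigned to `entries[i]`. That layout and the function names are hypothetical. The name normalization assumes authors are stored either as "Last, First" or "First Last".

```python
def normalize_name(name):
    """Turn 'Last, First' into 'First Last' and collapse whitespace."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return " ".join(name.split())

def titles_by_author(entries, labels, author):
    """Return (title, cluster) pairs for entries that list `author`."""
    hits = []
    for entry, label in zip(entries, labels):
        # BibTeX separates multiple authors with ' and '.
        names = [normalize_name(a) for a in entry.get("author", "").split(" and ")]
        if author in names:
            hits.append((entry["title"], label))
    return hits
```

Because the match is on the exact normalized string, variants such as a missing middle initial would not be found; checking the distinct spellings in the data first is advisable.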

Report

Submit a report that includes:

  • Reasoning behind the optimization you made for each clustering algorithm.
  • The main topic of each cluster and sample titles that represent the main topic.
  • The number of papers in each cluster.
  • For each algorithm, the number of papers by "Jinho D. Choi" in every cluster, and a prediction of his primary research topics with sample titles.
  • Any other interesting findings.
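
For the per-cluster figures in the report, a minimal sketch is given below. It counts the papers in each cluster and lists the most frequent non-stopword title terms as a rough hint at the main topic. The stopword list and the names `cluster_sizes` and `top_terms` are illustrative assumptions; frequent terms only suggest a topic and still need to be checked against sample titles.

```python
import re
from collections import Counter, defaultdict

# A deliberately small, illustrative stopword list.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to",
             "with", "using", "from", "by"}

def cluster_sizes(labels):
    """Number of papers assigned to each cluster label."""
    return Counter(labels)

def top_terms(titles, labels, n=3):
    """Most frequent non-stopword title terms for each cluster."""
    counts = defaultdict(Counter)
    for title, label in zip(titles, labels):
        tokens = re.findall(r"[a-z]+", title.lower())
        counts[label].update(t for t in tokens if t not in STOPWORDS)
    return {label: [t for t, _ in c.most_common(n)]
            for label, c in counts.items()}
```

The same `Counter` idea applies to a single author: counting the cluster labels of that author's titles gives the per-cluster numbers asked for above.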

https://canvas.emory.edu/courses/41979/assignments/127373

Practical Approaches to Data Science with Text

Instructor


Emory University
