DifferentSC/BDCSProgrammingAssignment

Overview

This is an implementation of L-BFGS for linear regression on REEF. It finds a linear model of the form a0 + a1 * x1 + a2 * x2 + a3 * x3 + ... that minimizes the loss function. It spreads the data across the worker nodes, computes a gradient on each node, and sums those gradients to get the overall gradient. Other than that, each node's operations are all local.
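
The gradient summation described above can be sketched as follows. This is a minimal illustration and not the repository's code: it assumes a squared-error loss, simulates the workers as plain lists inside one process, and every function name here is hypothetical.

```python
from typing import List, Sequence, Tuple

# One training row: (features x1..xn, target y). Hypothetical layout.
Row = Tuple[Sequence[float], float]


def predict(params: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate a0 + a1*x1 + a2*x2 + ... for one row."""
    return params[0] + sum(a * xi for a, xi in zip(params[1:], x))


def local_gradient(params: Sequence[float], rows: Sequence[Row]) -> List[float]:
    """Gradient of 0.5 * sum((prediction - y)^2) over one worker's rows.

    The squared-error loss is an assumption made for this sketch.
    """
    grad = [0.0] * len(params)
    for x, y in rows:
        residual = predict(params, x) - y
        grad[0] += residual  # derivative with respect to the intercept a0
        for j, xi in enumerate(x, start=1):
            grad[j] += residual * xi  # derivative with respect to aj
    return grad


def partition(rows: Sequence[Row], workers: int) -> List[List[Row]]:
    """Deal rows round-robin into one list per simulated worker."""
    return [list(rows[i::workers]) for i in range(workers)]


def summed_gradient(params: Sequence[float], rows: Sequence[Row],
                    workers: int = 3) -> List[float]:
    """Compute each worker's local gradient, then add them component-wise."""
    parts = [local_gradient(params, chunk) for chunk in partition(rows, workers)]
    return [sum(component) for component in zip(*parts)]
```

Because the data term of the loss is a sum over rows, adding the per-worker gradients gives the same vector as computing the gradient over the whole data set on a single node, which is why only the gradients need to be exchanged.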

Usage

You can execute MLPracticeClient by running ./bin/run.sh [command args] in a shell. Run the command from your home folder.

Commandline arguments

  • -iters (default = 10): maximum number of iterations
  • -workers (default = 3): number of worker tasks
  • -lambda (default = 0.001): degree of regularization
  • -input (default = /input.csv): location of input file on HDFS
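
As an illustration of what -lambda controls, the sketch below adds a penalty term to a squared-error loss. The exact loss used by this project is not stated here, so the squared-error form, the L2 penalty, the 0.5 scaling factors, and the choice to penalize the intercept are all assumptions, and the function name is hypothetical.

```python
from typing import Sequence, Tuple

# One training row: (features x1..xn, target y). Hypothetical layout.
Row = Tuple[Sequence[float], float]


def regularized_loss(params: Sequence[float], rows: Sequence[Row],
                     lam: float = 0.001) -> float:
    """Squared-error loss plus an L2 penalty scaled by lam (assumed form)."""
    data_term = 0.0
    for x, y in rows:
        prediction = params[0] + sum(a * xi for a, xi in zip(params[1:], x))
        data_term += 0.5 * (prediction - y) ** 2
    # Larger lam pulls the parameters harder toward zero.
    penalty = 0.5 * lam * sum(a * a for a in params)
    return data_term + penalty
```

With lam set to 0 the penalty vanishes and only the fit to the data matters; larger values trade some fit for smaller parameters.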

Input/Output

It reads input from the Hadoop file system (HDFS), whose address is fixed to "hdfs://localhost:9000". After execution, it writes its output to HDFS at "hdfs://localhost:9000/output.txt". The output file contains the parameter vector and the error value for each iteration.

I included the training set I used in this repo. It's from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.html).

References

Updating Quasi-Newton Matrices With Limited Storage. J. Nocedal. 1980

Fast B-spline curve fitting by L-BFGS. Wenni Zheng, Pengbo Bo, Yang Liu, Wenping Wang. 2012

Wikipedia article about backtracking line search (http://en.wikipedia.org/wiki/Backtracking_line_search)

About

This repository is for BDCS Programming Assignment
