
Commit db40ab7

Initiated 0.1.5 release & doc cleanup
1 parent dd6fef4 commit db40ab7

3 files changed (+33, -3 lines)

Project.toml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 name = "ParallelKMeans"
 uuid = "42b8e9d4-006b-409a-8472-7f34b3fb58af"
 authors = ["Bernard Brenyah", "Andrey Oskin"]
-version = "0.1.4"
+version = "0.1.5"
 
 [deps]
 Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
@@ -10,7 +10,7 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 
 [compat]
 StatsBase = "0.32, 0.33"
-julia = "1.3, 1.4"
+julia = "1.3"
 Distances = "0.8.2"
 MLJModelInterface = "0.2.1"
 

docs/src/index.md

Lines changed: 30 additions & 1 deletion
@@ -2,8 +2,13 @@
 
 ## Motivation
 
+<<<<<<< HEAD
 It's actually a funny story that led to the development of this package.
 What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after a heated discussion on the Julia Discourse forum when I asked for Julia optimization tips. Long story short, the Julia community is an amazing one! Andrey offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia package.
+=======
+It's actually a funny story led to the development of this package.
+What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after a heated discussion on the Julia Discourse forum when I asked for Julia optimization tips. Long story short, Julia community is an amazing one! Andrey offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia pacakge.
+>>>>>>> Initiated 0.1.5 release & doc cleanup
 
 Say hello to `ParallelKMeans`!
 
@@ -24,6 +29,22 @@ As a result, it is useful in practice to restart it several times to get the cor
 
 ## Installation
 
+If you are using Julia in the recommended [Juno IDE](https://junolab.org/), the number of threads is already set to the number of available CPU cores so multithreading enabled out of the box.
+For other IDEs, multithreading must be exported in your environment before launching the Julia REPL in the command line.
+
+*TIP*: One needs to navigate or point to the Julia executable file to be able to launch it in the command line.
+Enable multi threading on Mac/Linux systems via;
+
+```bash
+export JULIA_NUM_THREADS=n # where n is the number of threads/cores
+```
+
+For Windows systems:
+
+```bash
+set JULIA_NUM_THREADS=n # where n is the number of threads/cores
+```
+
 You can grab the latest stable version of this package from Julia registries by simply running;
 
 *NB:* Don't forget to invoke Julia's package manager with `]`
@@ -60,6 +81,7 @@ git checkout experimental
 with Consistent Speedup](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf)
 - [ ] Implementation of [Geometric methods to accelerate k-means algorithm](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf).
 - [ ] Support for other distance metrics supported by [Distances.jl](https://github.com/JuliaStats/Distances.jl#supported-distances).
+- [ ] Implementation of [Yinyang K-Means](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf).
 - [ ] Native support for tabular data inputs outside of MLJModels' interface.
 - [ ] Refactoring and finalizaiton of API desgin.
 - [ ] GPU support.
@@ -101,14 +123,21 @@ r.iterations # number of elapsed iterations
 r.converged # whether the procedure converged
 ```
 
-### Supported KMeans algorithm variations
+### Supported KMeans algorithm variations and recommended use cases
 
+<<<<<<< HEAD
 - [Lloyd()](https://cs.nyu.edu/~roweis/csc2515-2006/readings/lloyd57.pdf)
 - [Hamerly()](https://www.researchgate.net/publication/220906984_Making_k-means_Even_Faster)
 - [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf)
 - [Yinyang()](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf)
+=======
+- [Lloyd()](https://cs.nyu.edu/~roweis/csc2515-2006/readings/lloyd57.pdf) - Default algorithm but only recommended for very small matrices (switch to `n_threads = 1` to avoid overhead).
+- [Hamerly()](https://www.researchgate.net/publication/220906984_Making_k-means_Even_Faster) - Useful in most cases. If uncertain about your use case, use this!
+- [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf) - Recommended for high dimensional data.
+>>>>>>> Initiated 0.1.5 release & doc cleanup
 - [Geometric()](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf) - (Coming soon)
 - [MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - (Coming soon)
+- [Yinyang](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf) - (Coming soon)
 
 ### Practical Usage Examples
 

src/mlj_interface.jl

Lines changed: 1 addition & 0 deletions
@@ -154,6 +154,7 @@ function MMI.predict(m::KMeans, fitresult, Xnew)
     locations, cluster_labels, _ = fitresult
 
     Xarray = MMI.matrix(Xnew)
+    # TODO: Switch to non allocation method.
     (n, p), k = size(Xarray), m.k
 
     pred = zeros(Int, n)
