Commit ad9fefd

Initiated 0.1.5 release & doc cleanup
1 parent 7bc4bcd commit ad9fefd

3 files changed, +27 -7 lines changed

Project.toml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 name = "ParallelKMeans"
 uuid = "42b8e9d4-006b-409a-8472-7f34b3fb58af"
 authors = ["Bernard Brenyah", "Andrey Oskin"]
-version = "0.1.4"
+version = "0.1.5"
 
 [deps]
 Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
@@ -10,7 +10,7 @@ StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 
 [compat]
 StatsBase = "0.32, 0.33"
-julia = "1.3, 1.4"
+julia = "1.3"
 Distances = "0.8.2"
 MLJModelInterface = "0.2.1"
 

docs/src/index.md

Lines changed: 24 additions & 5 deletions
@@ -3,7 +3,7 @@
 ## Motivation
 
 It's actually a funny story led to the development of this package.
-What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after a heated discussion on the Julia Discourse forum when I asked for Julia optimizaition tips. Long story short, Julia community is an amazing one! Andrey offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia pacakge.
+What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after a heated discussion on the Julia Discourse forum when I asked for Julia optimization tips. Long story short, Julia community is an amazing one! Andrey offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia pacakge.
 
 Say hello to `ParallelKMeans`!
 
@@ -24,6 +24,22 @@ As a result, it is useful in practice to restart it several times to get the cor
 
 ## Installation
 
+If you are using Julia in the recommended [Juno IDE](https://junolab.org/), the number of threads is already set to the number of available CPU cores so multithreading enabled out of the box.
+For other IDEs, multithreading must be exported in your environment before launching the Julia REPL in the command line.
+
+*TIP*: One needs to navigate or point to the Julia executable file to be able to launch it in the command line.
+Enable multi threading on Mac/Linux systems via;
+
+```bash
+export JULIA_NUM_THREADS=n # where n is the number of threads/cores
+```
+
+For Windows systems:
+
+```bash
+set JULIA_NUM_THREADS=n # where n is the number of threads/cores
+```
+
 You can grab the latest stable version of this package from Julia registries by simply running;
 
 *NB:* Don't forget to Julia's package manager with `]`
@@ -58,6 +74,7 @@ git checkout experimental
 - [X] Full Implementation of Triangle inequality based on [Elkan - 2003 Using the Triangle Inequality to Accelerate K-Means"](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf).
 - [ ] Implementation of [Geometric methods to accelerate k-means algorithm](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf).
 - [ ] Support for other distance metrics supported by [Distances.jl](https://github.com/JuliaStats/Distances.jl#supported-distances).
+- [ ] Implementation of [Yinyang K-Means](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf).
 - [ ] Native support for tabular data inputs outside of MLJModels' interface.
 - [ ] Refactoring and finalizaiton of API desgin.
 - [ ] GPU support.
@@ -98,13 +115,14 @@ r.iterations # number of elapsed iterations
 r.converged # whether the procedure converged
 ```
 
-### Supported KMeans algorithm variations
+### Supported KMeans algorithm variations and recommended use cases
 
-- [Lloyd()](https://cs.nyu.edu/~roweis/csc2515-2006/readings/lloyd57.pdf)
-- [Hamerly()](https://www.researchgate.net/publication/220906984_Making_k-means_Even_Faster)
-- [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf)
+- [Lloyd()](https://cs.nyu.edu/~roweis/csc2515-2006/readings/lloyd57.pdf) - Default algorithm but only recommended for very small matrices (switch to `n_threads = 1` to avoid overhead).
+- [Hamerly()](https://www.researchgate.net/publication/220906984_Making_k-means_Even_Faster) - Useful in most cases. If uncertain about your use case, use this!
+- [Elkan()](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf) - Recommended for high dimensional data.
 - [Geometric()](http://cs.baylor.edu/~hamerly/papers/sdm2016_rysavy_hamerly.pdf) - (Coming soon)
 - [MiniBatch()](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) - (Coming soon)
+- [Yinyang](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ding15.pdf) - (Coming soon)
 
 ### Practical Usage Examples
 
@@ -176,6 +194,7 @@ ________________________________________________________________________________
 - 0.1.1 Added interface for MLJ.
 - 0.1.2 Added Elkan algorithm.
 - 0.1.3 Faster & optimized execution.
+- 0.1.4 Updated interface for MLJ with a predict function.
 
 ## Contributing
 
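
Note on the threading instructions added to `docs/src/index.md` in this commit: the docs leave `n` as a placeholder. A minimal sketch of one way to fill it in on Mac/Linux is shown here, assuming `getconf _NPROCESSORS_ONLN` is available on the system (the fallback to 1 and the variable name `n` are illustrative choices, not something the package prescribes).

```bash
# Sketch only: derive the thread count from the number of online CPU cores.
# Falls back to 1 if getconf cannot report the value (assumption for portability).
n="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

# Julia reads this variable at startup, so export it before launching the REPL.
export JULIA_NUM_THREADS="$n"
echo "JULIA_NUM_THREADS=$JULIA_NUM_THREADS"

# To confirm from the same shell (requires julia on the PATH):
#   julia -e 'println(Threads.nthreads())'
```

For the Windows snippet, note that `cmd` does not treat `#` as a comment marker, so the trailing comment in the documented `set` line would most likely end up inside the variable's value. Typing only the assignment, for example `set JULIA_NUM_THREADS=4`, avoids that.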

src/mlj_interface.jl

Lines changed: 1 addition & 0 deletions
@@ -154,6 +154,7 @@ function MMI.predict(m::KMeans, fitresult, Xnew)
     locations, cluster_labels, _ = fitresult
 
     Xarray = MMI.matrix(Xnew)
+    # TODO: Switch to non allocation method.
     (n, p), k = size(Xarray), m.k
 
     pred = zeros(Int, n)
