
Calling store_vector with MemoryStorage on scipy.sparse.csr_matrix allocates memory when it shouldn't. #93

@Apkar029

Description


I have input samples as a sparse matrix of shape (531990 samples, 85765 features).

The size of this matrix in memory is 56KB. The same matrix as a dense numpy array would be approximately 340GB.
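The gap between the sparse and dense footprints can be checked with a small sketch (hypothetical, much smaller dimensions than the real matrix; the idea is the same):

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Small stand-in for the reporter's matrix; the real one is
# 531990 x 85765 with very few nonzero entries.
m = sparse_random(5000, 8000, density=1e-5, format='csr')

# CSR stores only the nonzero values plus index arrays.
sparse_bytes = m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

# A dense float64 array of the same shape stores every entry.
dense_bytes = m.shape[0] * m.shape[1] * 8

print(sparse_bytes, dense_bytes)
```

The CSR footprint grows with the number of nonzeros, while the dense footprint grows with rows times columns, which is why materializing the full array is not an option here.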

When I use the MemoryStorage option I run out of memory. This is caused by the vec = vec.tocsr() call in the unitvec function. The vectors I pass to store_vector are scipy.sparse.csr.csr_matrix of shape (85765, 1), because trying to store them with shape (1, 85765) gives:

File "nearpy/engine.py", line 96, in store_vector
  for bucket_key in lshash.hash_vector(v):
File "nearpy/hashes/randombinaryprojections.py", line 74, in hash_vector
  projection = self.normals_csr.dot(v)
File "scipy/sparse/base.py", line 359, in dot
  return self * other
File "scipy/sparse/base.py", line 479, in __mul__
  raise ValueError('dimension mismatch')
ValueError: dimension mismatch

Removing the vec = vec.tocsr() line solves the problem for matrices of shape (85765, 1), and no extra memory is allocated. This is strange behavior and might be a scipy bug, but what is the point of the .tocsr() conversion anyway?
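The orientation constraint behind the traceback can be reproduced with a minimal sketch (hypothetical sizes, standing in for the projection step in randombinaryprojections.py):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
dim = 1000       # stand-in for the 85765 features
num_bits = 10    # stand-in for the number of projection hyperplanes

# Random projection matrix, analogous to normals_csr: shape (num_bits, dim).
normals = csr_matrix(rng.standard_normal((num_bits, dim)))

# A sparse column vector of shape (dim, 1) multiplies cleanly.
v_col = csr_matrix(rng.standard_normal((dim, 1)))
projection = normals.dot(v_col)
print(projection.shape)  # (num_bits, 1)

# The row orientation (1, dim) has incompatible inner dimensions
# and raises the error from the traceback.
v_row = v_col.T
try:
    normals.dot(v_row)
except ValueError as e:
    print(e)
```

Since normals has shape (num_bits, dim), sparse matrix multiplication only accepts a (dim, 1) operand on the right, which is why the vectors must be stored as columns.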
