Inconsistent Results with Cosine LSH KNN Algorithm on Large Indices #746

### Support guidelines

- [X] I've read the [support guidelines](https://github.com/alexklibisz/elastiknn/blob/main/readme.md#support)

### Background

I am running a 3-node Elasticsearch cluster on AWS servers. Each index contains between 100,000 and 1 million documents, with potential for further growth.

### Bug

Identical queries return different sets of documents on repeated executions, so end users see inconsistent results for the same search. This is negatively impacting the user experience.

**Configuration:**  
I am currently using Cosine LSH for dense vector search with the following mapping:

```json
"chunkVector_1024": {
  "type": "elastiknn_dense_float_vector",
  "elastiknn": {
    "model": "lsh",
    "similarity": "angular",
    "dims": 1024,
    "L": 99,
    "k": 1
  }
}
```
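
For readers unfamiliar with the `L` and `k` settings, here is a minimal sketch of random-hyperplane (sign) hashing, the general technique behind angular LSH. It is an illustration only, not Elastiknn's actual implementation. The function names are hypothetical, and it assumes `L` is the number of hash tables and `k` the number of hyperplane bits combined into each table's hash. A small `dims` is used to keep the example fast.

```python
import random

def make_hyperplanes(dims, L, k, seed=0):
    """Draw L * k random Gaussian hyperplanes, grouped as L tables of k planes."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(k)]
        for _ in range(L)
    ]

def lsh_hashes(vec, tables):
    """Return one (table_index, bits) hash per table for the given vector."""
    hashes = []
    for t, planes in enumerate(tables):
        # Each bit records which side of a hyperplane the vector falls on.
        bits = tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )
        hashes.append((t, bits))
    return hashes

def matching_hashes(a, b):
    """Count the tables in which two vectors received the same hash."""
    return sum(1 for x, y in zip(a, b) if x == y)

# Illustrative settings: L and k match the mapping above, dims does not.
tables = make_hyperplanes(dims=8, L=99, k=1)
query = [1.0, 0.5, -0.2, 0.0, 0.3, -0.7, 0.1, 0.9]
hashes = lsh_hashes(query, tables)
print(len(hashes))  # one hash per table
```

With `k=1`, each table contributes a single bit, so the number of matching hashes between a query and a document can only take values from 0 to 99.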

**Query:**  
```json
{
  "elastiknn_nearest_neighbors": {
    "field": "chunkVector_1024",
    "vec": {"values": {{vector_values}}},
    "model": "lsh",
    "similarity": "angular",
    "candidates": 100
  }
}
```
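
As a minimal sketch of how the `{{vector_values}}` placeholder might be filled in from application code, the helper below builds the same clause as a Python dict. The name `build_lsh_query` and the sample vector are hypothetical; only the fields shown in the query above are used.

```python
import json

def build_lsh_query(vector, field="chunkVector_1024", candidates=100):
    """Build the elastiknn_nearest_neighbors clause shown above for one vector."""
    return {
        "elastiknn_nearest_neighbors": {
            "field": field,
            "vec": {"values": [float(x) for x in vector]},
            "model": "lsh",
            "similarity": "angular",
            "candidates": candidates,
        }
    }

# Placeholder vector for illustration; a real one comes from the embedding model
# and must have 1024 dimensions to match the mapping.
example = build_lsh_query([0.1] * 1024)

# In a search request, the clause goes under the top-level "query" key.
body = {"query": example}
print(json.dumps(body)[:60])
```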

**Observed Behavior:**  
The results for the same query vary with each attempt, making the responses unpredictable.
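
One way to put a number on this variation is to compare the document IDs returned by repeated runs of the same query. The sketch below assumes the IDs from each run have already been collected into lists. The helper names and the sample IDs are hypothetical, not data from this cluster.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two ID collections: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_overlap(runs):
    """Average Jaccard overlap across every pair of runs (1.0 means identical sets)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)

# Made-up IDs for illustration only.
runs = [
    ["doc1", "doc2", "doc3", "doc4"],
    ["doc1", "doc2", "doc3", "doc5"],
    ["doc1", "doc2", "doc3", "doc4"],
]
print(round(mean_pairwise_overlap(runs), 3))
```

A value of 1.0 would mean every run returned the same set of documents; lower values indicate more churn between runs.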

**Investigation and Benchmarking:**  
Switching to the exact kNN approach (as documented for the Elastiknn plugin) resolves the inconsistency but increases latency to approximately double that of the Cosine LSH method.

**Latency Comparison:**  
The benchmarks below were run on an index with 1 shard and 1 replica, containing ~15k documents.

| **Query**                       | **Avg Response Time (Cosine LSH)** | **Avg Response Time (Exact kNN)** |
|----------------------------------|------------------------------------|-----------------------------------|
| What is mutual fund?             | 10.97 ms                           | 20.38 ms                          |
| How can I invest in NPS?         | 10.29 ms                           | 18.70 ms                          |
| Advantages of mutual funds?      | 8.24 ms                            | 19.58 ms                          |
| How to open savings account?     | 10.27 ms                           | 19.41 ms                          |
| What are debt funds?             | 10.96 ms                           | 18.22 ms                          |
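
As a quick check of the "approximately double" figure, the ratios can be computed directly from the table. The numbers below are copied from the table above; the variable names are arbitrary.

```python
# (query, cosine_lsh_ms, exact_knn_ms), copied from the table above
rows = [
    ("What is mutual fund?", 10.97, 20.38),
    ("How can I invest in NPS?", 10.29, 18.70),
    ("Advantages of mutual funds?", 8.24, 19.58),
    ("How to open savings account?", 10.27, 19.41),
    ("What are debt funds?", 10.96, 18.22),
]

# Per-query slowdown of exact kNN relative to Cosine LSH.
ratios = {query: exact / lsh for query, lsh, exact in rows}

# Ratio of the mean latencies across all five queries.
mean_lsh = sum(lsh for _, lsh, _ in rows) / len(rows)
mean_exact = sum(exact for _, _, exact in rows) / len(rows)
overall = mean_exact / mean_lsh

for query, ratio in ratios.items():
    print(f"{query}: {ratio:.2f}x")
print(f"overall: {overall:.2f}x")
```

The per-query ratios fall between about 1.66x and 2.38x, and the ratio of the mean latencies is about 1.90x.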

**Request for Recommendations:**  
Given the large indices and the need for low latency, how can I optimize the Cosine LSH setup to ensure consistent results while maintaining performance? Are there any adjustments or alternative configurations you would recommend? I would be happy to provide more details if needed.
@alexklibisz 

### Elastiknn Version

7.17.7

### Platform

AWS servers

### Steps to reproduce

_No response_

### Additional info

_No response_
