add recall metric #209

Open · wants to merge 2 commits into base: master

20 changes: 16 additions & 4 deletions engine/base_client/search.py
@@ -47,11 +47,22 @@ def _search_one(cls, query: Query, top: Optional[int] = None):
         end = time.perf_counter()

         precision = 1.0
+        recall = 1.0
         if query.expected_result:
             ids = set(x[0] for x in search_res)
-            precision = len(ids.intersection(query.expected_result[:top])) / top
+            # precision equals True Positives / (True Positives + False Positives)
+            # recall equals True Positives / (True Positives + False Negatives)
+            # See https://en.wikipedia.org/wiki/Precision_and_recall
+            true_positives = len(ids.intersection(query.expected_result[:top]))
+            precision = (
+                true_positives / top
+            )  # 1 means that the results consist of expected elements.
+            recall = true_positives / len(
+                query.expected_result
+            )  # 1 means that all expected elements are in the results.

@jonbesga jonbesga Nov 5, 2024

The formula for recall is correct, but I wonder how useful it is when we are doing a top-k search.

If we specify top=5 and there are 100 neighbors in query.expected_result, the recall will always be at most 0.05, which doesn't seem to be a useful metric?
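
A quick numeric illustration of that cap, using hypothetical values and the recall formula from this diff:

    # Illustrative only: with 100 expected neighbors and top=5, recall cannot exceed 0.05.
    expected_result = list(range(100))          # hypothetical ground truth: 100 neighbors
    search_res = [(i, 0.0) for i in range(5)]   # top-5 results, all of them correct
    top = 5

    ids = set(x[0] for x in search_res)
    true_positives = len(ids.intersection(expected_result[:top]))   # 5
    precision = true_positives / top                 # 1.0
    recall = true_positives / len(expected_result)   # 0.05, the cap described above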

-        return precision, end - start
+        return precision, recall, end - start

@SebanDan SebanDan (Contributor) Oct 7, 2024

Maybe returning a data structure for all metrics, for instance {"precision": ..., "recall": ...}, might be more suitable and flexible, as it does not change the output of the function every time a new metric is added. Also, it seems that here, precision and recall are the same.
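
A rough sketch of that shape, reusing names already in this diff (the dict keys are only a suggestion, not code from this PR); the end of _search_one could then look roughly like:

    metrics = {"precision": 1.0, "recall": 1.0}
    if query.expected_result:
        ids = set(x[0] for x in search_res)
        true_positives = len(ids.intersection(query.expected_result[:top]))
        metrics["precision"] = true_positives / top
        metrics["recall"] = true_positives / len(query.expected_result)

    return metrics, end - start

search_all would then look up named keys instead of unpacking positional tuples, so adding a new metric would not change the return arity of _search_one.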

@jonbesga jonbesga Oct 30, 2024

Taking into account that in line 39 we have

            top = (
                len(query.expected_result)
                if query.expected_result is not None and len(query.expected_result) > 0
                else DEFAULT_TOP
            )

Is there any difference in recall and precision if top is not specified?
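
A small worked example of that case, with hypothetical values, assuming top falls back to len(query.expected_result) as in line 39:

    # Illustrative only: when top defaults to len(expected_result),
    # precision and recall share the same denominator and are always equal.
    expected_result = [1, 2, 3, 4]                         # hypothetical ground truth
    search_res = [(1, 0.0), (2, 0.0), (8, 0.0), (9, 0.0)]  # two hits, two misses
    top = len(expected_result)                             # 4, the default

    ids = set(x[0] for x in search_res)
    true_positives = len(ids.intersection(expected_result[:top]))   # 2
    precision = true_positives / top                 # 0.5
    recall = true_positives / len(expected_result)   # 0.5, identical to precision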


     def search_all(
         self,
@@ -71,7 +82,7 @@ def search_all(

         if parallel == 1:
             start = time.perf_counter()
-            precisions, latencies = list(
+            precisions, recalls, latencies = list(
                 zip(*[search_one(query) for query in tqdm.tqdm(queries)])
             )
         else:
@@ -90,7 +101,7 @@ def search_all(
                 if parallel > 10:
                     time.sleep(15)  # Wait for all processes to start
                 start = time.perf_counter()
-                precisions, latencies = list(
+                precisions, recalls, latencies = list(
                     zip(*pool.imap_unordered(search_one, iterable=tqdm.tqdm(queries)))
                 )

@@ -109,6 +120,7 @@ def search_all(
             "p95_time": np.percentile(latencies, 95),
             "p99_time": np.percentile(latencies, 99),
             "precisions": precisions,
+            "recalls": recalls,
             "latencies": latencies,
         }
