Enhancing Related Hadith Suggestions Using Sentence Transformers

Published Date: 24 May 2024
Authors: Mohammad Galib Shams, Nabil Mosharraf Hossain

At Greentech Apps Foundation, we constantly look for ways to help users explore the Hadith literature more meaningfully. One of our goals is to connect users with related Hadiths (narrations that share similar meanings, themes, or wording) across various collections. However, our existing system for identifying related Hadiths had two major limitations:

🛠️ Problem Statement

  1. Limited Coverage: The existing related Hadith dataset lacked depth. Many meaningful connections were missing, especially from collections like Musnad Ahmad.
  2. Missed Connections: The previous inference method relied heavily on metadata rather than the actual meaning of the Hadiths, so some strong thematic links were never surfaced.

To address this, we explored a more semantic approach—finding related Hadiths based on their English meaning using Sentence Transformers and vector similarity techniques.

🧪 Our Approach

We adopted a machine learning pipeline inspired by recent advances in semantic search:

  1. Sentence Embedding: We used a Sentence Transformers model to encode Hadiths into vector representations, allowing us to capture their semantic meaning.
  2. Preprocessing: Before embedding, we cleaned the text to reduce noise. This included removing common or unhelpful phrases that often appeared across multiple Hadiths.
  3. Vector Indexing: We indexed the embeddings with HNSW (Hierarchical Navigable Small World), an approximate nearest-neighbor algorithm, and ran K-Nearest Neighbors (KNN) queries against that index to find semantically similar Hadiths quickly.
  4. Similarity Filtering: For each query Hadith, we retrieved the top matches and kept only those with a cosine similarity of 0.87 or higher to ensure relevance.
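
To make steps 3 and 4 concrete, here is a minimal sketch of nearest-neighbor retrieval followed by the 0.87 cosine-similarity cutoff. It is an illustration under stated assumptions, not our production code: the search is exact brute force using only the Python standard library (an HNSW index would replace that loop at scale), the three-dimensional vectors are toy stand-ins for real sentence embeddings, and the names `related_hadiths`, `toy_embeddings`, and the `hadith_a`-style IDs are hypothetical.

```python
import math

SIMILARITY_THRESHOLD = 0.87  # cutoff stated in the article


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def related_hadiths(query_id, embeddings, k=10, threshold=SIMILARITY_THRESHOLD):
    """Return up to k (hadith_id, score) pairs whose score is >= threshold.

    Assumption: exact brute-force search stands in for an HNSW index here.
    """
    query_vec = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query_vec, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id  # never suggest a Hadith as related to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    top_k = scored[:k]
    return [(hid, score) for hid, score in top_k if score >= threshold]


# Toy vectors (hypothetical); real embeddings have hundreds of dimensions.
toy_embeddings = {
    "hadith_a": [0.90, 0.10, 0.00],
    "hadith_b": [0.88, 0.12, 0.02],  # points almost the same way as hadith_a
    "hadith_c": [0.10, 0.90, 0.20],  # points in a very different direction
}

matches = related_hadiths("hadith_a", toy_embeddings)
for hadith_id, score in matches:
    print(f"{hadith_id}: {score:.3f}")
```

In this toy data only `hadith_b` clears the threshold for `hadith_a`. Filtering after the top-k cut means a query can return fewer than k suggestions, or none at all, which is preferable to showing weak matches.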

⚠️ Challenges Faced

We encountered several challenges during development:

  • Common Phrases as Noise: Phrases frequently used in Hadiths (like “Messenger of Allah said”) often diluted the semantic uniqueness of each Hadith. Identifying and cleaning these was crucial.
  • Irrelevant Entries in Database: Many entries in our Hadith database were not actual Hadiths but references or placeholders. We tackled this with a three-pronged approach:
    • Identifying and deleting known reference patterns.
    • Using an LLM and few-shot prompting to classify irrelevant entries.
    • Manually reviewing and removing residual noise.

To keep this cleanup fast, we discarded some entries and their connections without full verification, so a few relevant links may have been lost.
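
The phrase cleaning and reference-pattern filtering described above could look roughly like the sketch below. Only the phrase "Messenger of Allah said" comes from this article; the other phrase, both reference patterns, and the function names are hypothetical examples chosen for illustration, and the bracketed placeholder stands in for real narration text.

```python
import re

# Illustrative phrase list: only "Messenger of Allah said" is named in the
# article; "the Prophet said" is an assumed extra entry.
COMMON_PHRASES = [
    "the Messenger of Allah said",
    "Messenger of Allah said",
    "the Prophet said",
]

# Hypothetical patterns for entries that are cross-references, not Hadith text.
REFERENCE_PATTERNS = [
    re.compile(r"^\s*see\s+hadith\b", re.IGNORECASE),
    re.compile(r"^\s*\(?\s*same as above\b", re.IGNORECASE),
]

# One regex for all phrases, longest first, also consuming a trailing "," or ":".
_PHRASE_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(COMMON_PHRASES, key=len, reverse=True))
    + r")\s*[,:]?",
    re.IGNORECASE,
)


def clean_text(text):
    """Remove boilerplate phrases and collapse the leftover whitespace."""
    without_phrases = _PHRASE_RE.sub(" ", text)
    collapsed = re.sub(r"\s+", " ", without_phrases)
    return collapsed.strip(" ,:;")


def is_reference_entry(text):
    """True if the entry looks like a pointer to another Hadith."""
    return any(pattern.search(text) for pattern in REFERENCE_PATTERNS)


def prepare_for_embedding(entries):
    """Drop reference-like entries, then clean the remaining text."""
    return [clean_text(e) for e in entries if not is_reference_entry(e)]


sample_entries = [
    "The Messenger of Allah said: [narration text]",
    "See Hadith 12 for the same wording.",
]
print(prepare_for_embedding(sample_entries))
```

Rules like these only catch known patterns, which is why the LLM classification and manual review steps remain necessary for whatever slips through.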


✅ Results So Far

We tested our model internally and externally:

  • Internal team review (100 Hadiths): ~95% accuracy
  • Content team review (100 Hadiths): ~98% accuracy

These results are promising and confirm the potential of our semantic matching approach.


🔧 What’s Next

While our initial results are strong, we know there’s room to improve:

  1. Full DB Review: There are still outdated related Hadiths from the old system and inaccurate entries due to translation noise. We plan to use LLMs to comprehensively recheck related Hadith suggestions across the entire database.
  2. Model Improvements:
    • Try stronger embedding models.
    • Explore ScaNN, Google’s fast similarity search framework.
    • Further refine preprocessing (e.g., stopword removal, better text normalization).
  3. App Integration:
    • Investigate how this semantic search model could power in-app Hadith search features for a more intelligent and user-friendly experience.

💬 Conclusion

This project is a step forward in creating a more interconnected Hadith learning experience. By leveraging modern NLP techniques like Sentence Transformers, HNSW, and LLMs, we aim to build smarter tools that reflect the rich interconnectedness of the Hadith tradition.

We’re excited to continue improving this feature, and we welcome feedback from our users and collaborators who want to contribute to building better Islamic educational technology.


📖 Learn more about our R&D work on customer support AI here:
How We Use AI to Streamline Customer Support and Reply to Over 60,000 User Reviews
