A surprising experiment to demonstrate the importance of details in machine learning applications
With the plethora of embedding models available, selecting the right one can be a daunting task. Luckily, the MTEB leaderboard offers a wide array of ranking metrics for various natural language processing tasks.
Top 5 embedding models from the MTEB leaderboard as of May 17th, 2024
Upon visiting the site, it becomes apparent that the top five embedding models are Generative Pre-trained Transformers (GPTs). This might lead one to believe that GPT models are the superior choice for embeddings. However, is this assumption accurate? Let’s conduct an experiment to find out.
Embeddings are dense vector representations of text: token IDs are projected into a continuous tensor space. By feeding text into a neural network and running a forward pass, embedding vectors can be obtained. The process, however, is slightly more intricate. Let’s break it down step by step:
1. Convert the text into token IDs
2. Pass the token IDs into a neural network
3. Retrieve the outputs of the neural network
To demonstrate this process, I will use a tokenizer to convert the text “some questions” into a tensor of token IDs.
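The first step can be sketched with a toy tokenizer. Real GPT models use subword (BPE) tokenizers rather than the whitespace vocabulary below, which is purely illustrative; the principle is the same: text becomes a sequence of integer IDs.

```python
# Toy sketch of step 1: real GPT tokenizers are subword-based (BPE),
# but either way the text is converted into integer token IDs.
def build_vocab(corpus):
    """Assign an integer ID to each unique whitespace-separated token."""
    vocab = {}
    for text in corpus:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Convert text into a list of token IDs."""
    return [vocab[word] for word in text.lower().split()]

vocab = build_vocab(["some questions", "some answers"])
print(encode("some questions", vocab))  # -> [0, 1]
```

With a real model, the equivalent call is typically a single line such as `tokenizer("some questions", return_tensors="pt")`, which returns the token IDs plus an attention mask.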
The second step involves passing the model inputs into a neural network for a forward pass to obtain the logits of generated tokens.
The third step is more complex due to the autoregressive nature of GPT models. Because attention is causal, only the final token of a completed sentence has attended to every preceding token, so its output aggregates information from the whole sequence, making it the crucial output to consider.
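The three steps can be sketched with the Hugging Face Transformers API. This is a minimal sketch, assuming `transformers` and `torch` are installed; the default model name matches the one used in the experiment below, but any causal decoder checkpoint on the Hub should work the same way.

```python
import torch
from transformers import AutoModel, AutoTokenizer

def last_token_embedding(text, model_name="mistralai/Mistral-7B-Instruct-v0.1"):
    """Embed `text` as the hidden state of its final token.

    Sketch only: the model name is an assumption based on the article's
    experiment, and loading a 7B model requires substantial memory.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt")   # step 1: token IDs
    with torch.no_grad():
        outputs = model(**inputs)                   # step 2: forward pass
    # Step 3: with causal attention, only the last position has seen the
    # whole sentence, so its hidden state serves as the sentence embedding.
    return outputs.last_hidden_state[0, -1, :]
```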
To measure the quality of GPT embeddings, cosine similarity can be utilized. Higher cosine similarity indicates closer semantic meaning between sentences.
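Cosine similarity is straightforward to compute with NumPy; the vectors below are made-up stand-ins for real embeddings.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (range [-1, 1])."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

q = np.array([1.0, 2.0, 3.0])   # hypothetical question embedding
a = np.array([2.0, 4.0, 6.0])   # same direction as q, just scaled
print(cosine_similarity(q, a))  # -> 1.0 (identical direction)
```

Because cosine similarity ignores vector magnitude, it compares only the direction of the embeddings, which is why it is the standard choice for semantic comparison.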
In an experiment using Mistral 7b v0.1 instruct, cosine similarities were calculated for various question-and-answer pairs. The results highlighted the limitations of using GPT models for embeddings without fine-tuning.
Further evaluation with a different model, e5-mistral-7b-instruct, revealed significant improvements in cosine similarity for relevant question and answer pairs. This enhancement was attributed to the use of contrastive loss in fine-tuning the model.
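The idea behind contrastive training can be sketched as an InfoNCE-style loss: the query embedding is pulled toward its positive (relevant) document and pushed away from negatives. This is an illustrative NumPy sketch, not the exact objective or temperature used to train e5-mistral-7b-instruct.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style contrastive loss for one (query, positive) pair.

    Illustrative sketch: rewards high similarity to the positive and
    low similarity to the negatives. Temperature value is an assumption.
    """
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Similarities, with the positive placed at index 0.
    sims = np.array([cos(query, positive)] + [cos(query, n) for n in negatives])
    logits = sims / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))  # cross-entropy with positive at index 0
```

Minimizing this loss over many (question, relevant answer) pairs is what makes the fine-tuned model place matching pairs close together in embedding space.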
Fine-tuning GPT models with a task-appropriate objective such as contrastive loss yields more meaningful and discriminative embeddings. Weighing these strengths and limitations makes it possible to choose embedding models for machine learning projects on an informed basis. This experiment underscores how much attention to detail matters in machine learning applications, and how tailored training objectives shape model quality.