No need for fine-tuning or a stronger model: rStar does not rely on fine-tuning or on guidance from a more capable model. It improves the reasoning ability of a small language model (SLM) on its own, so the gains come without additional training resources. Experiments show that rStar handles a range of reasoning problems and substantially improves accuracy across multiple SLMs. For example, on the GSM8K dataset, rStar raises the accuracy of LLaMA2-7B from 12.51% to 63.91% and that of Mistral-7B from 36.46% to 81.88%.

A. Generation Phase (MCTS Rollout):

Action Space: rStar introduces five human-like reasoning actions that mimic how people work through a problem:
A1: Propose the next single reasoning step.
A2: Propose all of the remaining reasoning steps.
A3: Ask the next sub-question and answer it.
A4: Answer the sub-question again.
A5: Rephrase the question/sub-question (ask it again).

MCTS Search: rStar uses Monte Carlo Tree Search (MCTS) to build candidate reasoning paths step by step, expanding from the current state with the actions above.

Reward function: An SLM-adapted reward function estimates how much each reasoning step contributes to the final answer and guides the expansion of the MCTS tree.

Why these actions help the model explore the solution space better:
Variety: A richer set of action types lets the model try different reasoning strategies instead of getting stuck in one fixed pattern.
Flexibility: The model can pick the action that suits its current state and adapt to different problems.
Decomposition: Breaking a complex problem into sub-questions lets the model solve it step by step, which makes the reasoning easier.
Verification: Answering a sub-question again lets the model check whether its earlier answer was correct, which improves the accuracy of the reasoning.

B. Discrimination Phase (Mutual Consistency):

Discriminator SLM: A second SLM with capabilities similar to the target SLM acts as a discriminator and evaluates the candidate reasoning paths produced in the generation phase.
Partial hint: Part of a candidate reasoning path is given to the discriminator as a hint, and the discriminator completes the remaining reasoning steps.
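The two phases can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not rStar's actual implementation: `expand` and `discriminator` are hypothetical stand-ins for calls to the generator SLM and the discriminator SLM, the action labels are our own shorthand, selection uses the standard UCT formula, and the terminal reward (the share of distinct finished paths that reached the same answer, a self-consistency-style vote) is an assumed simplification of the SLM-adapted reward. In this sketch a path survives the discrimination phase only if the discriminator, given a random prefix as a hint, completes it to the same answer; choosing the surviving path with the highest mean reward is also our assumption.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Shorthand labels for actions A1-A5 (names are our own, not rStar's).
ACTIONS = ["A1_one_step", "A2_remaining_steps", "A3_subquestion",
           "A4_reanswer_subquestion", "A5_rephrase_question"]
# Hypothetical SLM call: (steps so far, action) -> (new step, final answer or None).
Expand = Callable[[List[str], str], Tuple[str, Optional[str]]]

@dataclass(eq=False)  # identity equality: each node is a distinct tree position
class Node:
    steps: List[str]
    parent: Optional["Node"] = None
    answer: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def uct(self, c: float = 1.4) -> float:
        if self.visits == 0:
            return math.inf  # try every unvisited child once
        mean = self.total_reward / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def mcts_generate(question: str, expand: Expand,
                  n_rollouts: int = 16, max_depth: int = 5) -> List[Node]:
    """Generation phase: MCTS rollouts; returns terminal nodes (candidate paths)."""
    root = Node(steps=[question])
    votes: Counter = Counter()
    terminals: List[Node] = []
    for _ in range(n_rollouts):
        node = root
        # Selection/expansion: descend by UCT until an answer or the depth limit.
        while node.answer is None and len(node.steps) <= max_depth:
            if not node.children:
                for action in ACTIONS:
                    step, answer = expand(node.steps, action)
                    node.children.append(
                        Node(node.steps + [step], parent=node, answer=answer))
            node = max(node.children, key=lambda ch: ch.uct())
        reward = 0.0
        if node.answer is not None:
            if node not in terminals:  # count each distinct path's answer once
                terminals.append(node)
                votes[node.answer] += 1
            # Assumed reward: share of distinct finished paths with this answer.
            reward = votes[node.answer] / sum(votes.values())
        # Backpropagation along the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return terminals

def mutually_consistent(path: Node, discriminator: Callable[[List[str]], str],
                        rng: random.Random) -> bool:
    """Discrimination phase: hint with a prefix, check the completed answer agrees."""
    cut = rng.randint(1, max(1, len(path.steps) - 1))  # keep question, drop >= 1 step
    return discriminator(path.steps[:cut]) == path.answer

def select_answer(question: str, expand: Expand,
                  discriminator: Callable[[List[str]], str],
                  seed: int = 0) -> Optional[str]:
    rng = random.Random(seed)
    candidates = [p for p in mcts_generate(question, expand)
                  if mutually_consistent(p, discriminator, rng)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.total_reward / p.visits).answer

def toy_expand(steps: List[str], action: str) -> Tuple[str, Optional[str]]:
    # Hypothetical stand-in for the generator SLM on "What is 2 + 3 * 4?".
    if action == "A2_remaining_steps" or len(steps) >= 2:
        answer = "20" if action == "A5_rephrase_question" else "14"
        return f"{action}: so the answer is {answer}", answer
    return f"{action}: 3 * 4 = 12", None

def toy_discriminator(prefix: List[str]) -> str:
    # Hypothetical discriminator SLM that always completes the reasoning to 14.
    return "14"

print(select_answer("What is 2 + 3 * 4?", toy_expand, toy_discriminator))  # prints 14
```

With these toy stubs the discriminator always answers 14, so any explored path ending in 20 fails the mutual-consistency check and is dropped. Replacing the two hypothetical callables with real SLM calls is the only change needed to experiment with the overall flow.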