🎉 #Gate Alpha 3rd Points Carnival & ES Launchpool# Joint Promotion Task is Now Live!
Total Prize Pool: 1,250 $ES
This campaign aims to promote the Eclipse ($ES) Launchpool and Alpha Phase 11: $ES Special Event.
📄 For details, please refer to:
Launchpool Announcement: https://www.gate.com/zh/announcements/article/46134
Alpha Phase 11 Announcement: https://www.gate.com/zh/announcements/article/46137
🧩 [Task Details]
Create content around the Launchpool and Alpha Phase 11 campaign and include a screenshot of your participation.
📸 [How to Participate]
1️⃣ Post with the hashtag #Gate Alpha 3rd
The trustworthiness assessment of GPT models reveals potential vulnerabilities and privacy risks.
Research on the Credibility Assessment of Language Models
The University of Illinois at Urbana-Champaign, in collaboration with several universities and research institutions, has launched a large language model (LLMs) comprehensive trustworthiness assessment platform, which is introduced in the latest paper "Decoding Trust: A Comprehensive Assessment of the Trustworthiness of GPT Models."
The research team conducted a comprehensive credibility assessment of the GPT model and discovered some previously unpublished vulnerabilities. For example, the GPT model is prone to generating toxic and biased outputs and may leak private information from training data and conversation history. Although GPT-4 is generally more reliable than GPT-3.5 in standard tests, it is more susceptible to attacks when faced with maliciously designed prompts, possibly because it adheres more strictly to misleading instructions.
The study conducted a comprehensive evaluation of the GPT model from eight credibility perspectives, including its robustness against text adversarial attacks, adaptability to different task instructions, and system prompts. The evaluation utilized various scenarios, tasks, metrics, and datasets.
The results show that the GPT model performs excellently in certain aspects, such as being unmisled by counterfactual examples. However, there are also some issues, such as being susceptible to misleading system prompts that can produce biased content, especially GPT-4 is more easily influenced. The degree of bias in the model is also related to specific topics, with less bias on certain sensitive topics.
In terms of privacy protection, research has found that GPT models may leak sensitive information from training data, such as email addresses. GPT-4 is more robust than GPT-3.5 in protecting personal identification information, but may still leak privacy in certain cases.
This study provides a comprehensive benchmark for assessing the credibility of language models, helping to identify potential vulnerabilities and driving the development of more reliable models. The research team hopes that this work will encourage academia to continue in-depth research based on this foundation, working together to create more powerful and trustworthy language models.