Help! My chatbot is Racist

Jailbreaking is the process of removing software restrictions imposed by manufacturers or operating system providers on electronic devices like smartphones, tablets, or gaming consoles. It allows users to gain root access or administrative privileges, granting them greater control over the device’s operating system. As a result, they can customize the device, install unauthorized apps, and modify system settings that are typically inaccessible.

Jailbreaking offers both advantages and disadvantages. On the positive side, it provides users with greater customization options, enabling them to personalize their devices according to their preferences. Jailbreaking can also lead to instability or performance issues, potentially affecting the device’s overall functionality. Furthermore, the act of jailbreaking can void warranties provided by the manufacturer or operating system provider. Additionally, there is a risk of inadvertently installing malicious or pirated software, which can compromise the device’s security and stability.

Large language models (LLMs) have made remarkable advancements in natural language processing (NLP) and have found application in various fields such as healthcare, therapy, education, and customer service. Considering that users, including students and patients, engage with chatbots, the safety of these systems is of utmost importance. In this context, a study systematically evaluated toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM. The study found that assigning ChatGPT a persona, such as that of the boxer Muhammad Ali, significantly increased the toxicity of its responses. Such outputs can be harmful to unsuspecting users[1].

Most AI chatbots have built-in safety measures known as “guardrail mechanisms” to ensure responsible and safe usage of AI technology. These mechanisms prevent the generation of harmful or inappropriate content and guide the model to provide useful and appropriate responses. The guardrail mechanisms include pre-training on curated data, where the training data is carefully selected and filtered to minimize exposure to potentially harmful or biased content. After pre-training, the models undergo fine-tuning, a process involving human reviewers who review and rate possible model outputs based on guidelines and policies. This feedback loop helps the model improve its responses. Safety mitigations are also implemented, including reinforcement learning from human feedback (RLHF) and the use of the Moderation API to identify and warn or block certain types of unsafe or inappropriate content. OpenAI actively encourages user feedback to identify problematic outputs or false positives/negatives from the content filter, enabling continuous improvement and making the system safer and more effective.

Recent discourse in the legal circle has often discussed the rapid technological growth of AI and what it could mean for the future employment of lawyers. What we fail to notice in these times is that AI like everyone of man’s creations is not exactly perfect, it is simply the execution of a greater idea of a machine that can understand humans almost as well as other human beings do. Like every other invention this has also been subject to misuse and weaponization, which is what brings us to the conclusion and the lesson learned from this entire event. The author is of the opinion that technology will never pose a threat to lawyers and will only be a tool to make law and enforcement more accessible. This is the cause no matter how far we advance we will always require a set of human beings with the knowledge of law to monitor the growth and lay down guidelines for the use of such technologies. It is essential to recognize that while these guardrail mechanisms are in place, they may not be perfect, and there is always a possibility of errors or limitations in the AI’s responses. This is why it is safe to say that AI will not be replacing lawyers anytime soon.

WRITTEN BY –

SHARANYA CHOWDHURY

2ND YEAR, BALLB

RML NLU, LUCKNOW


[1] https://arxiv.org/abs/2304.05335

Leave a Reply

Your email address will not be published. Required fields are marked *