
Spam Classifier: A Natural Language Processing Project

What is Natural Language Processing?

NLP is a set of methods that let a computer interpret human language and perform tasks with it. Alexa and Siri are some examples.

Let’s start with the Spam Classifier:

The spam classifier predicts whether a received message is ham or spam.

Let’s start with the dataset. It consists of 5572 messages, each labeled either “ham” or “spam”.

import pandas as pd
messages = pd.read_csv('SMSSpamClassifier', sep='\t', names=['label', 'message'])

Next, the labels need to be converted to 0 and 1, which can be done with the get_dummies() function of the pandas library.

y = pd.get_dummies(messages['label'], dtype=int)
y = y.iloc[:, 1].values  # second column is the "spam" indicator

Here, y will contain 0 for “ham” labels and 1 for “spam” labels.
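
To make the encoding concrete, here is a minimal sketch in plain Python of what the one-hot step produces. The toy labels are made up for illustration and are not taken from the dataset; the columns are built in sorted order, which is assumed to match the order pandas uses here (“ham” before “spam”).

```python
# Toy labels standing in for messages['label']; not from the real dataset.
labels = ["ham", "spam", "ham", "ham", "spam"]

# One-hot encode by hand: one column per distinct label, in sorted order.
columns = sorted(set(labels))
one_hot = [[int(label == col) for col in columns] for label in labels]

# Keeping only the "spam" column gives the 0/1 target vector.
spam_index = columns.index("spam")
y_toy = [row[spam_index] for row in one_hot]

print(columns)  # ['ham', 'spam']
print(y_toy)    # [0, 1, 0, 0, 1]
```

Selecting a single column is enough because the two indicator columns are complements of each other.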

Now let’s look at the independent data, i.e. x. First we have to clean the message data: remove stopwords, lowercase the text, group words of the same type, and so on. For this we will use WordNetLemmatizer. The main reason for using a lemmatizer instead of a stemmer is that it produces meaningful words.

Now the code for it is:

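import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads of the NLTK data used below
nltk.download('stopwords')
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the set once, not per word
corpus = []
for i in range(len(messages)):
    # Keep letters only, lowercase, and split into words
    review = re.sub('[^a-zA-Z]', ' ', messages['message'][i])
    review = review.lower()
    review = review.split()
    # Drop stopwords and lemmatize the remaining words
    review = [lemmatizer.lemmatize(word) for word in review if word not in stop_words]
    review = ' '.join(review)
    corpus.append(review)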
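
To see what these cleaning steps do to a single message, here is a small sketch that uses only the standard library. It is an illustration under assumptions: TOY_STOPWORDS is a hypothetical mini list (NLTK’s English list is much longer), the sample message is invented, and lemmatization is left out so the snippet runs without NLTK data.

```python
import re

# Hypothetical mini stopword list for illustration only.
TOY_STOPWORDS = {"you", "have", "a", "to", "the", "now"}

def clean_message(text, stop_words=TOY_STOPWORDS):
    """Apply the regex, lowercase, split and stopword steps (no lemmatization)."""
    text = re.sub("[^a-zA-Z]", " ", text)              # replace non-letters with spaces
    words = text.lower().split()                        # lowercase, split on whitespace
    kept = [w for w in words if w not in stop_words]    # drop stopwords
    return " ".join(kept)

cleaned = clean_message("WINNER!! You have won a £1000 prize. Call 09061701461 now!")
print(cleaned)  # winner won prize call
```

Note that digits and punctuation disappear entirely, so phone numbers and prize amounts do not reach the model as features.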

Here, corpus holds all the cleaned messages. The cleaning loop lowercases each message, removes the stopwords, and keeps the important words needed for prediction. Next we use Term Frequency-Inverse Document Frequency (TF-IDF), via TfidfVectorizer, to turn the words into vectors. Each TF-IDF vector holds one weight per word, reflecting how important that word is to the message.

from sklearn.feature_extraction.text import TfidfVectorizer
cv = TfidfVectorizer(max_features=5000)
x = cv.fit_transform(corpus).toarray()
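
For intuition about the numbers in x, the sketch below computes TF-IDF by hand in plain Python. It assumes the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization of each row, which is scikit-learn’s documented default. The tokenization here is a plain whitespace split, which is simpler than what TfidfVectorizer does, and the toy corpus is invented.

```python
import math
from collections import Counter

def tfidf(corpus):
    """Return (vocab, rows) where rows[i][j] is the tf-idf of term j in document i."""
    docs = [doc.split() for doc in corpus]
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(w for doc in docs for w in set(doc))
    # Smoothed inverse document frequency.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        raw = [counts[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # L2-normalize each row
        rows.append([v / norm for v in raw])
    return vocab, rows

toy_corpus = ["free prize call", "call later", "free free entry"]
vocab, rows = tfidf(toy_corpus)
print(vocab)
print([round(v, 3) for v in rows[0]])
```

In the first toy document, “prize” gets a higher weight than “free” or “call” because it appears in fewer documents, which is the “importance” the vector encodes.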

The features are now in ‘x’ and can be used to train the model. We will use the Naive Bayes algorithm, which is a simple and commonly used choice for text classification.

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state=0)
spam_detect_model = MultinomialNB().fit(X_train, y_train)
y_pred = spam_detect_model.predict(X_test)
print(accuracy_score(y_test,y_pred))
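
To show roughly what a multinomial Naive Bayes classifier does internally, here is a simplified sketch in plain Python. It is not the scikit-learn implementation: it works on raw word counts rather than TF-IDF weights, the function names train_nb and predict_nb are made up for this example, and the toy messages are invented. It uses add-one (Laplace) smoothing, with alpha=1.0 chosen to mirror the MultinomialNB default.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    vocab = {w for doc in docs for w in doc.split()}
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # word counts per class
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    model = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        log_prior = math.log(class_docs[label] / len(docs))
        # Add-alpha smoothing keeps unseen words from zeroing out a class.
        log_lik = {
            w: math.log((word_counts[label][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
        model[label] = (log_prior, log_lik)
    return model

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unknown words are ignored."""
    scores = {}
    for label, (log_prior, log_lik) in model.items():
        scores[label] = log_prior + sum(log_lik[w] for w in doc.split() if w in log_lik)
    return max(scores, key=scores.get)

toy_docs = ["free prize call", "win free entry", "see you later", "call me later"]
toy_labels = ["spam", "spam", "ham", "ham"]
toy_model = train_nb(toy_docs, toy_labels)
print(predict_nb(toy_model, "free entry prize"))  # spam
print(predict_nb(toy_model, "see you later"))     # ham
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numerical underflow on long messages.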

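The model reaches an accuracy of around 98%. To classify a new message, pass it through the same fitted vectorizer: spam_detect_model.predict(cv.transform([user_input]).toarray()), where user_input is a string cleaned the same way as the training messages.

All resources and code are available at: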

Darkshadow9799/Sms-Spam-Classifier

For a description of NLP, click here.


Spam Classifier: A Natural Language Processing Project was originally published in Chatbots Life on Medium, where people are continuing the conversation by highlighting and responding to this story.