Introduction
One of the social media applications that made it into the top eight in 2023 is TikTok. This application has revolutionized the way we watch short videos. Those who enjoy entertaining and funny short videos are likely already familiar with this app. However, not everyone is a fan. Some praise it, while others criticize it. The main objectives of performing sentiment analysis on the TikTok app are:
To assess user sentiment about the application, as expressed in user reviews and comments.
To gain insight into how users feel about the app, whether they have positive or negative experiences, and what they like or dislike.
For developers to use the results to improve the application's functionality, solve problems, and respond to user concerns.
For businesses to adjust their marketing strategy on TikTok.
From that perspective, we can use Python to perform sentiment analysis on TikTok reviews and see how people react to this application. The availability of various Python modules for analysis gives us a big advantage and significantly speeds up the research process.
So, if you want to know how users respond to this application, you're in the right place. Now, let's perform sentiment analysis by following the steps in this article.
This article was published as a part of the Data Science Blogathon.
Table of Contents
Step 1: Import Libraries
Step 2: Read the Data
Step 3: Data Preprocessing
Step 4: Sentiment Analysis
Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
import string
import re
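# Download NLTK's English stop word list (needed once per environment)
nltk.download('stopwords')
# Snowball stemmer for English, used later to reduce words to their stems
stemmer = nltk.SnowballStemmer("english")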
WordCloud is a class from the wordcloud package that creates a text visualization in which each word's size reflects how often it appears, so the most frequent words are easy to spot.
STOPWORDS is a built-in set of common English stop words from the same package, that is, unimportant words such as prepositions and conjunctions that are usually removed from documents or text. The main goal of removing stop words is to reduce the number of words in a document, which can improve the speed and performance of NLP (natural language processing).
ImageColorGenerator colors the words of a word cloud using the colors of a reference image, for example, an image related to the text's topic.
SentimentIntensityAnalyzer is NLTK's implementation of the VADER sentiment analyzer. Its polarity_scores() method returns the proportions of positive, negative, and neutral content in a text, along with a normalized compound score, which can be used to decide whether the text falls into the positive, negative, or neutral category.
So, those are the main functions of the libraries above: we can create eye-catching visualizations, remove unnecessary words, generate topic-based colors, and evaluate text sentiment. Now, let's move on to the next step.
Step 2: Read the Data
This second step is the most important part because, without relevant data, the analysis can be inaccurate. The dataset we'll use is a collection of TikTok reviews from the Google Play Store, downloaded from Kaggle, together with the rating each reviewer gave. Now let's look at the contents of the dataset.
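The code for reading the data is not reproduced here. Assuming the Kaggle reviews were saved as a local CSV file, a minimal sketch with pandas might look like the following; the file name and the helper load_reviews are hypothetical placeholders, not part of the original article.

```python
import pandas as pd


def load_reviews(path: str) -> pd.DataFrame:
    """Read the review CSV and print a quick overview of its contents.

    Hypothetical helper: `path` should point at your local copy of the
    Kaggle TikTok reviews file.
    """
    reviews = pd.read_csv(path)
    print(reviews.shape)             # (number of rows, number of columns)
    print(reviews.columns.tolist())  # column names
    print(reviews.head())            # first five reviews
    return reviews
```

Usage, with a hypothetical file name: data = load_reviews("tiktok_google_play_reviews.csv"). The later steps assume the resulting DataFrame is stored in a variable named data.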
It turns out there are ten columns in the dataset: reviewId, userName, userImage, content, score, thumbsUpCount, reviewCreatedVersion, at, replyContent, and revisedAt. However, not all of these columns are needed for sentiment analysis. We'll discuss this in the next step.
Step 3: Data Preprocessing
Data preprocessing is a crucial step in sentiment analysis. It involves cleaning and preparing the data to ensure accurate and effective sentiment analysis results. We will use some of the libraries imported earlier at this point. Preprocessing techniques include removing unwanted characters, such as punctuation, and converting all the text to lowercase to make the analysis easier.
Another important preprocessing step is removing stop words, which are common words that are not significant in determining the sentiment of a text. Stop words include words like "the," "is," and "and." Removing these words can help reduce noise and improve the accuracy of the sentiment analysis. Note, however, that NLTK's English stop word list also includes negations such as "no" and "not", which can flip the sentiment of a review.
Other preprocessing techniques include tokenization, which breaks the text up into individual words or phrases, and stemming or lemmatization, which reduces words to their base form to account for variations in spelling and word usage.
Overall, proper data preprocessing is essential for accurate and effective sentiment analysis, and it is a vital step in any natural language processing task.
As I stated previously, we don't use all of the dataset's columns. Only two columns will be used: content and score. Therefore, we will create a new dataset containing only these two columns.
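To make tokenization and stop word removal concrete, here is a small pure-Python sketch. It uses a regular expression as a simple tokenizer and a tiny hand-picked stop word set, which is an assumption for illustration and not NLTK's list; stemming is left to NLTK's SnowballStemmer, as in the article.

```python
import re

# Tiny, hand-picked stop word set for illustration only; the article itself
# uses NLTK's much larger English list via stopwords.words("english").
EXAMPLE_STOPWORDS = {"the", "is", "and", "a", "this", "it", "to", "of"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the example stop word set."""
    return [tok for tok in tokens if tok not in EXAMPLE_STOPWORDS]


tokens = tokenize("This app is fun and the videos are short.")
print(tokens)                    # ['this', 'app', 'is', 'fun', 'and', 'the', 'videos', 'are', 'short']
print(remove_stopwords(tokens))  # ['app', 'fun', 'videos', 'are', 'short']
```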
data = data[["content", "score"]]
print(data.head())

At first glance, I noticed that some columns in the dataset had null values. Let's check whether the two columns we use for the sentiment analysis of TikTok reviews have any null values.
print(data.isnull().sum())

It turns out there are four null values in the content column and one in the score column. Let's drop these null values and take our analysis further.
data = data.dropna()
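The two calls above can be checked on a toy frame. This sketch uses invented rows, not the real dataset, to show what isnull().sum() reports and which rows dropna() keeps.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration; the real dataset has many more rows.
reviews = pd.DataFrame({
    "content": ["Love it", None, "Too many ads", "Fun videos"],
    "score": [5, 4, np.nan, 5],
})

# Count missing values per column, as in the article.
print(reviews.isnull().sum())  # content: 1, score: 1

# Drop every row that has a missing value in any column.
cleaned = reviews.dropna()
print(cleaned)  # keeps only the rows with index 0 and 3
```

If the frame still had more columns, dropna(subset=["content", "score"]) would restrict the check to just those two columns.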
Now let's prepare this data for the sentiment analysis task. Here, we need to clean up the text in the content column to ensure an accurate analysis.
stopword = set(stopwords.words('english'))

def clean(text):
    """Lowercase, strip noise, remove stop words, and stem a review."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', ' ', text)
    text = re.sub(r'\w*\d\w*', '', text)
    text = [word for word in text.split() if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split()]
    text = " ".join(text)
    return text

data["content"] = data["content"].apply(clean)
The Python code above defines a function named clean, which accepts a parameter named text. It performs a series of text-cleaning operations on the input to prepare it for sentiment analysis.
Here's what each line of the function does:
str(text).lower(): Converts the input to a string and lowercases it.
re.sub(r'\[.*?\]', '', text): Removes any text inside square brackets, which is often used to denote tags or links.
re.sub(r'https?://\S+|www\.\S+', '', text): Removes any URLs.
re.sub(r'<.*?>+', '', text): Removes any HTML tags.
re.sub('[%s]' % re.escape(string.punctuation), '', text): Removes any punctuation.
re.sub(r'\n', ' ', text): Replaces newlines with spaces so that words on adjacent lines stay separate.
re.sub(r'\w*\d\w*', '', text): Removes any words containing digits.
[word for word in text.split() if word not in stopword]: Removes stop words, which are common words that add little meaning to the text (e.g., "the", "and").
" ".join(text): Joins the remaining words back together into a single string.
[stemmer.stem(word) for word in text.split()]: Applies stemming, which reduces words to their base form (e.g., "running" becomes "run").
" ".join(text): Joins the stemmed words back together into a single string.
return text: Returns the cleaned text as the function's output.
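To see what each regular expression does, the sketch below applies the same kind of substitutions, in the order listed above, to one sample review and prints the intermediate results. It is an illustration only: stemming is skipped so each step stays visible, newlines are replaced with a space, and a tiny hand-picked stop word set stands in for NLTK's list. The name trace_clean is hypothetical.

```python
import re
import string

# Tiny stop word set for illustration only (the article uses NLTK's list).
EXAMPLE_STOPWORDS = {"the", "is", "and", "a", "this", "it", "to", "of"}

# (description, pattern, replacement), applied in the order described above.
STEPS = [
    ("remove [bracketed] text", r"\[.*?\]", ""),
    ("remove URLs", r"https?://\S+|www\.\S+", ""),
    ("remove HTML tags", r"<.*?>+", ""),
    ("remove punctuation", "[%s]" % re.escape(string.punctuation), ""),
    ("replace newlines", r"\n", " "),
    ("remove words containing digits", r"\w*\d\w*", ""),
]


def trace_clean(text: str) -> str:
    """Apply the regex steps one at a time, printing the text after each step."""
    text = str(text).lower()
    for label, pattern, replacement in STEPS:
        text = re.sub(pattern, replacement, text)
        print(f"{label:32} -> {text!r}")
    words = [w for w in text.split() if w not in EXAMPLE_STOPWORDS]
    return " ".join(words)


print(trace_clean("Great app!! [edited] Visit https://example.com <br>\nVersion 2 is fun"))
# last line printed: great app visit version fun
```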
Let's find the percentage of each rating given to TikTok on the Google Play Store!
ratings = data["score"].value_counts()
numbers = ratings.index
quantity = ratings.values
import plotly.express as px
# Pass the counts directly; no data_frame is needed for precomputed values
figure = px.pie(values=quantity,
                names=numbers, hole=0.5)
figure.show()

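The pie chart shows each rating's share visually. If you want the exact percentages instead, a sketch with value_counts(normalize=True) might look like this; the ratings here are toy values, and with the article's data you would use data["score"] instead.

```python
import pandas as pd

# Toy ratings invented for illustration; use data["score"] from the article instead.
scores = pd.Series([5, 5, 5, 1, 4, 5, 3, 5, 1, 5])

# normalize=True returns each rating's share of the total instead of raw counts;
# multiplying by 100 and rounding turns the shares into percentages.
shares = scores.value_counts(normalize=True).mul(100).round(1)
print(shares)  # 5 -> 60.0, 1 -> 20.0, 4 -> 10.0, 3 -> 10.0
```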
TikTok has garnered an impressive 74% of five-star ratings from users, with only 12.9% giving it a one-star rating. Let's now take a closer look at the kinds of words used by TikTok reviewers.
text = " ".join(i for i in data.content)
wc_stopwords = set(STOPWORDS)  # renamed so it does not shadow nltk.corpus.stopwords
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(text)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

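A word cloud sizes each word by how often it occurs. The sketch below counts word frequencies with collections.Counter on a few invented, already-cleaned reviews, which is a quick way to inspect the same information as a ranked list.

```python
from collections import Counter

# Toy cleaned reviews invented for illustration.
reviews = [
    "love app fun video",
    "app crash video",
    "fun app love",
]

# Count how often each word occurs across all reviews.
word_counts = Counter(word for review in reviews for word in review.split())
print(word_counts.most_common(3))  # [('app', 3), ('love', 2), ('fun', 2)]
```

If you already have counts like these, WordCloud.generate_from_frequencies() accepts such a mapping directly instead of raw text.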
Step 4: Sentiment Analysis
We have now reached the final step, sentiment analysis. First, we'll score the text of each review with VADER and add three new columns, Positive, Negative, and Neutral, holding the proportion of positive, negative, and neutral sentiment in that review. This is done to gain a thorough understanding of each review. Let's get started.
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
polarity = [sentiments.polarity_scores(i) for i in data["content"]]  # score each review once
data["Positive"] = [s["pos"] for s in polarity]
data["Negative"] = [s["neg"] for s in polarity]
data["Neutral"] = [s["neu"] for s in polarity]
data = data[["content", "Positive", "Negative", "Neutral"]]
print(data.head())

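The three columns keep the positive, negative, and neutral proportions separately. polarity_scores() also returns a compound score, and if you would rather have a single label per review, a common convention from the VADER project's documentation is to threshold it at ±0.05. The helper label_from_scores below is a hypothetical sketch of that idea, tested on hand-written score dictionaries rather than real output.

```python
def label_from_scores(scores: dict) -> str:
    """Map a VADER-style score dict to a single sentiment label.

    Thresholds follow the convention suggested for VADER's compound score:
    >= 0.05 is positive, <= -0.05 is negative, anything in between is neutral.
    """
    compound = scores["compound"]
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"


# Hand-written dicts shaped like polarity_scores() output (values made up).
examples = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.8},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.6},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]
print([label_from_scores(s) for s in examples])  # ['Positive', 'Negative', 'Neutral']
```

With the article's objects, this could be applied as data["Label"] = [label_from_scores(sentiments.polarity_scores(t)) for t in data["content"]], where the Label column is a hypothetical addition.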
Let's now take a closer look at the kinds of words used in positive reviews of TikTok.
positive = " ".join([i for i in data['content'][data['Positive'] > data["Negative"]]])
wc_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(positive)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

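The selection inside join() relies on boolean indexing in pandas: comparing two columns yields a True/False mask, and indexing the content column with it keeps only the matching rows. The sketch below shows the same pattern on an invented frame. Note that reviews whose Positive and Negative proportions are equal, such as fully neutral reviews scoring 0 on both, end up in neither word cloud.

```python
import pandas as pd

# Toy scored reviews invented for illustration.
scored = pd.DataFrame({
    "content": ["love fun app", "ads annoy", "okay app"],
    "Positive": [0.8, 0.0, 0.2],
    "Negative": [0.0, 0.7, 0.2],
})

# Boolean mask: True where the positive proportion beats the negative one.
is_positive = scored["Positive"] > scored["Negative"]

# Index the content column with each mask, then join into one string for WordCloud.
positive_text = " ".join(scored.loc[is_positive, "content"])
negative_text = " ".join(scored.loc[scored["Negative"] > scored["Positive"], "content"])
print(positive_text)  # love fun app
print(negative_text)  # ads annoy
```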
Let's now explore the words commonly used in negative reviews of TikTok.
negative = " ".join([i for i in data['content'][data['Negative'] > data["Positive"]]])
wc_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(negative)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Conclusion
TikTok has taken the world by storm with its short, amusing videos that people can't get enough of. But not everyone is a fan of the app. In this post, we have discussed the following:
How to use Python to preprocess text data
How to use Python to analyze the sentiment of TikTok reviews
How to explore the words used in positive and negative reviews
Whether you're a TikTok fan or not, the range of viewpoints is fascinating. Did you find this article on the sentiment analysis of TikTok reviews helpful?
If you have any questions or comments, please leave them below.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.