
R for Data Analysis: How to Find the Perfect Cocomelon Video for Your Kids | by Chengzhi Zhao | Mar, 2023

March 5, 2023


How to Build an End-to-End Data Project Exploring New Trending Cocomelon Videos from Scratch Using R

Photo by Tony Sebastian on Unsplash

Cocomelon - Nursery Rhymes is the world's second-largest YouTube channel (155M+ subscribers). It is such a popular and helpful channel that it is an inevitable topic for toddlers and parents. I enjoy spending time watching Cocomelon together with my son.

After watching Cocomelon videos for a month, I noticed the same videos are repeatedly recommended on YouTube. Videos like "Wheels on the Bus" and "Bath Song" are popular and fun to watch, but they were published years ago, and kids get bored watching them repeatedly. As a father, I want to show my son some newer but good-quality videos from the Cocomelon channel. As a data professional, I also want to explore the data of the world's second-largest YouTube channel to gain more insights and find something interesting in the data available.

The video list of a YouTube channel gives users only two sort options: recently uploaded (ordered by time) and popular (ordered by views). I could go to the recently uploaded tab and click one video after another. However, the Cocomelon channel has 800+ videos, so that would be time-consuming.

The good thing is that I'm an engineer and know how to build something with data. So I started writing code: gathering data, cleaning it up, visualizing it, and gaining more insights. I'll share my journey of using R for data analysis: building an end-to-end solution for exploring trending Cocomelon videos using R from scratch.

Note: although the example code is written in R and the YouTube channel is Cocomelon, those are my preferences. You can also write it in Python or Rust with their data analysis tools, and the way I show to get data from YouTube applies to other channels as well.

The data source is always the starting point of any data project. I made several attempts before arriving at my final solution.

I first searched Google for the term "Youtube views stats for Cocomelon". The results show some statistics about the channel, but none cover more detailed data for each video. Those sites are heavily flooded with ads, and web scraping might be challenging.

Then I looked at the public datasets on Kaggle, where CC0 datasets like Trending YouTube Video Statistics could be a good option. However, after exploring the dataset, I found two issues:

  • It doesn't contain Cocomelon.
  • The content was retrieved years ago and lacks the newer videos I wanted to search for.

My only option is to pull the most up-to-date data directly from YouTube. There are also two options here:

  • Web scraping: I could set up a crawler or find a project on GitHub and use it directly. My concern here is that if the crawler is aggressive, YouTube might block my account. Crawling also isn't very efficient when there are many videos to pull.
  • YouTube API: I finally landed on this solution. It's efficient and provides some basic statistics on videos: number of views and number of likes. We can further use this information to build our data analysis project.

Get a YouTube API Key to Pull Data

A YouTube API key grants you permission to pull data from YouTube. You first need to go to Google Cloud and "create credentials" to get the API key. The default key isn't restricted; you can restrict the API key so it can be used only for YouTube.

Google Cloud Create Credentials | Image by Author
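Since the key is a credential, it may be worth keeping it out of the script itself. Below is a minimal sketch, assuming the key is stored in an environment variable; the variable name YOUTUBE_API_KEY and the helper get_api_key are hypothetical names chosen for illustration, not anything Google or the original code defines.

```r
# Hypothetical helper: read the API key from an environment variable.
# The variable name YOUTUBE_API_KEY is an assumption made for this sketch.
get_api_key <- function(var = "YOUTUBE_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable first", call. = FALSE)
  }
  key
}

# For demonstration only; in practice the variable could be set in ~/.Renviron
Sys.setenv(YOUTUBE_API_KEY = "demo-key")
key <- get_api_key()
print(key)
```

With this in place, the `key <- "to_be_replace"` line used later could be swapped for `key <- get_api_key()`.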

Get the YouTube Channel Playlist in R

Once you have the API key, refer to the YouTube Data API documentation for more detail on the data it supports. To examine the API at a queryable level, we can use tools like Postman or directly copy the full URL.

For example, we would like to pull the channel information for Cocomelon. Somehow, I didn't find its channel id by inspecting its URL, but I found it through a Google search.

https://www.youtube.com/channel/UCbCmjCuTUZos6Inko4u57UQ

Now we can use the channel id to construct the GET request and fill the API key into the key field:

https://www.googleapis.com/youtube/v3/channels?part=snippet,contentDetails,statistics&id=UCbCmjCuTUZos6Inko4u57UQ&key=

From the returned JSON, the most important piece of information is the uploads playlist id, which leads us to all the videos.

"contentDetails": {"relatedPlaylists": {"likes": "", "uploads": "UUbCmjCuTUZos6Inko4u57UQ"}}

Because the API paginates results and the maximum number of items on one page is 50, calling playlistItems takes several requests to reach the complete list. We need to use the current page's token to retrieve the next page until no next one is found. We can put everything together in R.

library(shiny)
library(vroom)
library(dplyr)
library(tidyverse)
library(httr)
library(jsonlite)
library(ggplot2)
library(ggthemes)
library(stringr)

key <- "to_be_replace"
playlist_url <- paste0(
  "https://www.googleapis.com/youtube/v3/playlistItems?part=snippet,contentDetails,status&maxResults=50&playlistId=UUbCmjCuTUZos6Inko4u57UQ&key=",
  key
)

# Fetch the first page and inspect the paging information
api_result <- GET(playlist_url)
json_result <- content(api_result, "text", encoding = "UTF-8")
videos.json <- fromJSON(json_result)
videos.json$nextPageToken
videos.json$pageInfo$totalResults

# The first page is already stored at index 1, so the counter starts there
pages <- list(videos.json$items)
counter <- 1

# Follow nextPageToken until the API stops returning one
while (!is.null(videos.json$nextPageToken)) {
  next_url <- paste0(playlist_url, "&pageToken=", videos.json$nextPageToken)
  api_result <- GET(next_url)
  print(next_url)
  message("Retrieving page ", counter + 1)
  json_result <- content(api_result, "text", encoding = "UTF-8")
  videos.json <- fromJSON(json_result)
  counter <- counter + 1
  pages[[counter]] <- videos.json$items
}

## Combine all the data frames into one
all_videos <- rbind_pages(pages)
## Get a vector of video ids
videos <- all_videos$contentDetails$videoId

all_videos should give us all the fields for each video. All we care about at this stage is the videoId so we can fetch detailed information on each video.
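The token-following pattern can also be isolated from the HTTP details, which makes it easy to try offline. The following is a minimal sketch, not the code used for this post: fetch_all_pages, fake_fetch, and the fake page data are hypothetical, and the stand-in fetcher simply imitates a response that carries items and, when more pages exist, nextPageToken.

```r
# Generic token-following loop. `fetch_page` is a hypothetical function that
# takes a page token (NULL for the first page) and returns a list with
# `items` and, when more pages exist, `nextPageToken`.
fetch_all_pages <- function(fetch_page, max_pages = 100) {
  pages <- list()
  token <- NULL
  repeat {
    page <- fetch_page(token)
    pages[[length(pages) + 1]] <- page$items
    token <- page$nextPageToken
    # Stop when no further page is reported, or at the safety cap
    if (is.null(token) || length(pages) >= max_pages) break
  }
  pages
}

# Offline stand-in for the API: three fake pages keyed by token
fake_pages <- list(
  start = list(items = c("a", "b"), nextPageToken = "p2"),
  p2    = list(items = c("c", "d"), nextPageToken = "p3"),
  p3    = list(items = "e")  # last page: no nextPageToken
)
fake_fetch <- function(token) {
  fake_pages[[if (is.null(token)) "start" else token]]
}

all_items <- unlist(fetch_all_pages(fake_fetch))
print(all_items)
```

The max_pages cap is a design choice for this sketch: it keeps the loop from running forever if a response ever repeats a token.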

Iterate the Video List and Fetch Data for Each Video in R

Once all the video ids are stored in a vector, we can repeat a process similar to the one we used for the playlist. It will be much easier this time since we don't have to handle pagination.

At this stage, we care more about the data we'll eventually pull from the video API call. I chose the fields needed for our later data analysis and visualization. To save time pulling this data again, it's better to persist the data into a CSV file, so we don't have to run the API calls multiple times.

videos_df <- data.frame()
video_url <- paste0(
  "https://www.googleapis.com/youtube/v3/videos?part=contentDetails,id,liveStreamingDetails,localizations,player,recordingDetails,snippet,statistics,status,topicDetails&key=",
  key
)

# One request per video id; keep only the fields needed for the analysis
for (v in videos) {
  a_video_url <- paste0(video_url, "&id=", v)
  print(v)
  print(a_video_url)
  api_result <- GET(a_video_url)
  json_result <- content(api_result, "text", encoding = "UTF-8")
  videos.json <- fromJSON(json_result, flatten = TRUE)
  # colnames(videos.json$items)
  video_row <- videos.json$items %>%
    select(
      snippet.title,
      snippet.publishedAt,
      snippet.channelTitle,
      snippet.thumbnails.default.url,
      player.embedHtml,
      contentDetails.duration,
      statistics.viewCount,
      statistics.commentCount,
      statistics.likeCount,
      statistics.favoriteCount,
      snippet.tags
    )
  videos_df <- rbind(videos_df, video_row)
}

# snippet.tags is a list column; collapse it to text so it can be written to CSV
videos_df$snippet.tags <- vapply(videos_df$snippet.tags, paste, character(1), collapse = "|")

write.csv(videos_df, "~/cocomelon.csv", row.names = TRUE)
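The loop above issues one request per video. As an optional variation, the videos endpoint accepts a comma-separated list in its id parameter (to my knowledge up to 50 ids per request), so the ids could be grouped first. This is a minimal sketch of the grouping step only; batch_ids and the sample ids are hypothetical.

```r
# Split a vector of video ids into comma-separated batches of at most `size`
batch_ids <- function(ids, size = 50) {
  groups <- split(ids, ceiling(seq_along(ids) / size))
  unname(vapply(groups, paste, character(1), collapse = ","))
}

ids <- paste0("video", 1:120)  # hypothetical ids, for illustration only
batches <- batch_ids(ids)

# Each element could then be appended to the request URL as "&id=<batch>"
print(length(batches))
```

Each batched response would then contain several rows in items, so the same select() step applies without further changes.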

The data is ready for our next stage: exploring the Cocomelon YouTube videos. Now it's time to perform some cleanup and create visualizations to show the findings.

The API returns the counts and dates as text, which doesn't work well with the later sorting, so we need to convert some columns to numeric or date types.

videos_df <- videos_df %>%
  transform(
    statistics.viewCount = as.numeric(statistics.viewCount),
    statistics.likeCount = as.numeric(statistics.likeCount),
    statistics.favoriteCount = as.numeric(statistics.favoriteCount),
    snippet.publishedAt = as.Date(snippet.publishedAt)
  )
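One detail worth noting about this conversion: as.numeric is the safer choice over as.integer here, because R integers are 32-bit and a view count in the billions does not fit. A small sketch with a sample count of that size:

```r
views_chr <- "6053444903"  # a view count in the billions, stored as text

# as.integer() cannot represent values above .Machine$integer.max (2147483647)
as_int <- suppressWarnings(as.integer(views_chr))
# as.numeric() stores the value as a double and keeps it intact
as_num <- as.numeric(views_chr)

print(is.na(as_int))
print(as_num > .Machine$integer.max)
```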

What are the top 5 most viewed Cocomelon videos?

This part is straightforward. We select the fields we're interested in, then sort the videos in descending order by the viewCount field.

videos_df %>%
  select(snippet.title, statistics.viewCount) %>%
  arrange(desc(statistics.viewCount)) %>%
  head(5)

# Output:
#   snippet.title                                                      statistics.viewCount
# 1 Bath Song | CoComelon Nursery Rhymes & Kids Songs                  6053444903
# 2 Wheels on the Bus | CoComelon Nursery Rhymes & Kids Songs          4989894294
# 3 Baa Baa Black Sheep | CoComelon Nursery Rhymes & Kids Songs        3532531580
# 4 Yes Yes Vegetables Song | CoComelon Nursery Rhymes & Kids Songs    2906268556
# 5 Yes Yes Playground Song | CoComelon Nursery Rhymes & Kids Songs    2820997030

If you've watched Cocomelon videos before, it isn't surprising to see that "Bath Song," "Wheels on the Bus," and "Baa Baa Black Sheep" rank in the top 3. This matches the Cocomelon popular tab on YouTube. Also, "Bath Song" has been played 20%+ more times than the second video, "Wheels on the Bus." I can see that many toddlers struggle with taking a bath, and having kids watch this video could give them an idea of how to take a bath and help calm them down.
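The "20%+" figure can be checked directly from the two view counts in the output; a quick calculation in R:

```r
bath_song <- 6053444903
wheels_on_the_bus <- 4989894294

# Relative difference between the top two videos, in percent
pct_more <- (bath_song / wheels_on_the_bus - 1) * 100
print(round(pct_more, 1))  # 21.3
```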

We also create a bar chart with the top 5 videos:

# Keep the top 5 videos by view count for the chart
chart_df <- videos_df %>%
  select(snippet.title, statistics.viewCount) %>%
  arrange(desc(statistics.viewCount)) %>%
  head(5)

ggplot(data = chart_df, mapping = aes(x = reorder(snippet.title, statistics.viewCount), y = statistics.viewCount)) +
  geom_bar(stat = "identity", fill = "lightgreen") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 16)) +
  theme_minimal()

The Top 5 most viewed Cocomelon videos | Image by Author

Are the number of views and likes correlated? Is a video more likely to get a thumbs up (like) with more views?

We can use the data to check this. First, rescale viewCount and likeCount so they fit better in the visualization. Second, compute the number of days since each video was uploaded to see when the popular videos were created.

chart_df <- videos_df %>%
  mutate(
    views = statistics.viewCount / 1000000,
    likes = statistics.likeCount / 10000,
    number_days_since_publish = as.numeric(Sys.Date() - snippet.publishedAt)
  )

ggplot(data = chart_df, mapping = aes(x = views, y = likes)) +
  geom_point() +
  geom_smooth(method = lm) +
  theme_minimal()

cor(chart_df$views, chart_df$likes, method = "pearson")
## 0.9867712

Cocomelon videos correlation between views and likes | Image by Author

The correlation coefficient is about 0.99, so views and likes are very highly correlated: the more views a video has, the more thumbs up it is likely to get. It's also interesting that only six videos have over 2B views: parents and kids enjoy these six videos and probably watch them many times.
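One caveat when reproducing this on another channel: cor() returns NA as soon as either column contains a missing value, which could happen if a like count is absent for some video. A minimal sketch with made-up numbers (not Cocomelon data) showing the use argument that skips incomplete pairs:

```r
# Toy values for illustration only
views <- c(10, 20, 30, 40)
likes <- c(1, 2, NA, 4)

r_default  <- cor(views, likes)                        # NA because of the missing like count
r_complete <- cor(views, likes, use = "complete.obs")  # uses only the complete pairs

print(r_default)
print(r_complete)
```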

We can further plot the popular videos and find that the most popular videos are 1,500 to 2,000 days old, which shows these videos were created around 2018 or 2019.

Number of days since published by views | Image by Author

The popular videos are easy to retrieve. However, popular videos created 4 or 5 years ago can still be trending because they keep getting many daily views.

How about finding new Cocomelon videos that are gaining views? Since the YouTube API only gives us the number of views at the current moment, we need to store snapshots of the data by pulling from the API with some days in between.
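One simple way to keep such snapshots apart is to put the pull date in the file name. This is a minimal sketch, assuming the same naming pattern as the two files read below (cocomelon_YYYY_M_D.csv); the helper name snapshot_path is hypothetical.

```r
# Build a date-stamped path such as "~/cocomelon_2023_2_28.csv"
snapshot_path <- function(date = Sys.Date(), dir = "~") {
  d <- as.POSIXlt(date)
  file.path(dir, sprintf("cocomelon_%d_%d_%d.csv", d$year + 1900L, d$mon + 1L, d$mday))
}

print(snapshot_path(as.Date("2023-02-28")))

# Possible usage after each pull:
# write.csv(videos_df, snapshot_path(), row.names = TRUE)
```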

# Two snapshots of the same data, pulled a few days apart
df1 <- read_csv("~/cocomelon_2023_2_28.csv")
df2 <- read_csv("~/cocomelon_2023_3_2.csv")

df1 <- df1 %>%
  transform(statistics.viewCount = as.numeric(statistics.viewCount))

df2 <- df2 %>%
  transform(
    statistics.viewCount = as.numeric(statistics.viewCount),
    snippet.publishedAt = as.Date(snippet.publishedAt)
  )

df1 <- df1 %>% select(snippet.title, statistics.viewCount)
df2 <- df2 %>% select(snippet.title, snippet.publishedAt, statistics.viewCount)

# Join data by snippet.title
joined_df <- inner_join(df1, df2, by = "snippet.title")
joined_df <- joined_df %>%
  mutate(
    view_delta = statistics.viewCount.y - statistics.viewCount.x,
    number_days_since_publish = as.numeric(Sys.Date() - snippet.publishedAt)
  )

# Recent videos uploaded within 200 days, top 5 of them by view delta
chart_df <- joined_df %>%
  filter(number_days_since_publish <= 200) %>%
  select(snippet.title, view_delta) %>%
  arrange(desc(view_delta)) %>%
  head(5)

ggplot(data = chart_df, mapping = aes(x = reorder(snippet.title, view_delta), y = view_delta)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 16)) +
  theme_minimal()

# Output
#   snippet.title                                                                     view_delta
# 1 🔴 CoComelon Songs Live 24/7 - Bath Song + More Nursery Rhymes & Kids Songs       2074257
# 2 Yes Yes Fruits Song | CoComelon Nursery Rhymes & Kids Songs                       1709434
# 3 Airplane Song | CoComelon Nursery Rhymes & Kids Songs                             977383
# 4 Bingo's Bath Song | CoComelon Nursery Rhymes & Kids Songs                         951159
# 5 Fire Truck Song - Trucks For Kids | CoComelon Nursery Rhymes & Kids Songs         703467

New Trending Cocomelon Video | Image by Author
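A note on the join key: matching the two snapshots by title works as long as titles are unique. If two uploads ever shared a title, the join would pair every copy with every other copy and the deltas would be mixed up; joining on the video id avoids that, provided the id column is kept when saving the snapshots. The following is a minimal base R sketch with made-up data (the ids, titles, and counts are hypothetical):

```r
# Two toy snapshots; "Song A" deliberately appears under two different ids
snap1 <- data.frame(
  id    = c("v1", "v2", "v3"),
  title = c("Song A", "Song A", "Song B"),
  views = c(100, 200, 300)
)
snap2 <- data.frame(
  id    = c("v1", "v2", "v3"),
  title = c("Song A", "Song A", "Song B"),
  views = c(150, 260, 390)
)

# Joining by title pairs each "Song A" row with both copies: 2 * 2 + 1 = 5 rows
by_title <- merge(snap1, snap2, by = "title")

# Joining by id keeps exactly one row per video
by_id <- merge(snap1, snap2[, c("id", "views")], by = "id")
by_id$view_delta <- by_id$views.y - by_id$views.x

print(nrow(by_title))
print(by_id[, c("id", "title", "view_delta")])
```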

The top trending video is 🔴 CoComelon Songs Live 24/7. This suggests that parents can keep the kids watching with videos rotating automatically, without switching videos explicitly. The remaining videos are also potentially good single songs that make good recommendations.

There are many videos for kids to watch on YouTube. Cocomelon has many videos, and I want to show my kid the good ones within the limited time he's allowed to watch each day. Finding these trending videos is a fascinating exploration for data professionals.

I hope my post is helpful to you. As the next step, I'll continue my journey in R and use Shiny to build an interactive application for users.


