Friday, March 24, 2023
Get the latest A.I News on A.I. Pulses

Scraping Streaming Videos Using Selenium + Network logs and YT-dlp Python

January 21, 2023


Introduction

Extracting video URLs, image URLs, and text from a webpage can be done easily with Selenium and Beautiful Soup in Python. If the src attribute holds a regular, directly reachable URL, we can access those videos directly.

However, many websites use blob-format URLs, where the src begins with "blob:". We can extract them using Selenium + bs4, but we cannot access them directly because they are generated internally by the browser.
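
As a rough illustration of the distinction above, the sketch below collects the src values of <video> and <source> tags and separates blob: URLs from directly accessible ones. It uses the standard library's html.parser in place of Beautiful Soup so that it runs without extra installs; the class name VideoSrcCollector, the helper split_video_sources, and the sample HTML are all made up for this example.

```python
from html.parser import HTMLParser


class VideoSrcCollector(HTMLParser):
    """Collect src attributes from <video> and <source> tags."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("video", "source"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def split_video_sources(html):
    """Return (direct_urls, blob_urls) found in the given HTML string."""
    collector = VideoSrcCollector()
    collector.feed(html)
    blob_urls = [s for s in collector.sources if s.startswith("blob:")]
    direct_urls = [s for s in collector.sources if not s.startswith("blob:")]
    return direct_urls, blob_urls


# Hypothetical page fragment, for illustration only.
SAMPLE_HTML = """
<video src="https://example.com/media/clip.mp4"></video>
<video src="blob:https://example.com/0f3c2a1e-1111-2222-3333-444455556666"></video>
"""

direct, blobs = split_video_sources(SAMPLE_HTML)
print("direct:", direct)
print("blob:", blobs)
```

With Selenium, the string passed to split_video_sources would typically be driver.page_source.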

What are BLOB URLs?

Blob URLs can only be generated internally by the browser. URL.createObjectURL() creates a special reference to a Blob or File object, which can later be released using URL.revokeObjectURL(). These URLs can only be used locally, in a single instance of the browser and in the same session.

Blob URLs are often used to display or play multimedia content, such as videos, directly in a web browser or media player, without the need to download the content to the user's local machine. They are usually used together with HTML5 video elements, which allow web developers to embed video content directly into a web page using a simple <video> tag.

To overcome the above issue, we have found two methods that can help to extract the video URL directly:

  • YT-dlp
  • Selenium + Network logs

YT-dlp

YT-dlp is a very handy module for downloading YouTube videos. It also extracts other attributes of YouTube videos, such as titles, descriptions, and tags. We have found a way to extract videos from normal (non-YouTube) web pages by passing some extra options to it. Below are the steps and sample code for using it.

Install the yt-dlp module on Ubuntu

sudo snap install yt-dlp

Below is simple code for video URL extraction using yt-dlp with the Python subprocess module. We are using extra options such as -f, -g, and -q. The description of these options can be found on the yt-dlp GitHub page.

import subprocess

def get_video_urls(url):
    """Return the media URLs that yt-dlp reports for the given page URL."""
    videos_url = []
    # -f all: consider every format; -g: print URLs instead of downloading;
    # -q: quiet mode. Errors and warnings are suppressed.
    youtube_subprocess = subprocess.Popen(
        ["yt-dlp", "-f", "all", "-g", "-q", "--ignore-errors", "--no-warnings", url],
        stdout=subprocess.PIPE)
    try:
        output = youtube_subprocess.communicate(timeout=15)[0]
        video_url_list = output.decode("utf-8").split("\n")
        # Prefer direct media files.
        for video in video_url_list:
            if video.endswith((".mp4", ".mp3", ".mov", ".webm")):
                videos_url.append(video)

        # Fall back to streaming playlists if no direct file was found.
        if len(videos_url) == 0:
            for video in video_url_list:
                if video.endswith(".m3u8"):
                    videos_url.append(video)
    except subprocess.TimeoutExpired:
        youtube_subprocess.kill()

    return videos_url

print(get_video_urls(url="..."))
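
The endswith checks in get_video_urls miss URLs that carry a query string (for example a signed token after .mp4), because such a URL no longer ends with the extension. One possible variation, sketched below under that assumption, compares only the path component of each URL. The function name pick_video_urls and the sample lines are hypothetical; the preference order (direct files first, .m3u8 as a fallback) mirrors get_video_urls.

```python
from urllib.parse import urlparse

DIRECT_EXTENSIONS = (".mp4", ".mp3", ".mov", ".webm")


def pick_video_urls(candidates):
    """Prefer direct media files; fall back to .m3u8 playlists."""
    def path_of(u):
        # Only the path is compared, so "?token=..." does not hide the extension.
        return urlparse(u).path.lower()

    cleaned = [c.strip() for c in candidates if c.strip()]
    direct = [u for u in cleaned if path_of(u).endswith(DIRECT_EXTENSIONS)]
    if direct:
        return direct
    return [u for u in cleaned if path_of(u).endswith(".m3u8")]


# Hypothetical yt-dlp output lines, for illustration only.
sample = [
    "https://cdn.example.com/v/master.m3u8?token=abc",
    "https://cdn.example.com/v/720p.mp4?token=abc",
    "",
]
print(pick_video_urls(sample))
```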

Selenium + Network logs

Whenever blob-format URLs are used on a website and the video is playing, we can access the streaming URL (.m3u8) for that video in the browser's network tab. We can use the network and performance logs to find the streaming URLs.

What is M3U8?

M3U8 is a text file that uses UTF-8-encoded characters to specify the locations of one or more media files. It is commonly used to specify a playlist of audio or video files for streaming over the internet, using a media player that supports the M3U8 format, such as VLC, Apple's iTunes, and QuickTime. The file typically has the ".m3u8" file extension and, in its extended form, starts with an #EXTM3U header line followed by tag lines (beginning with #) that carry attribute information. Each remaining line typically specifies a single media file, optionally preceded by its title and length, or a reference to another M3U8 file for streaming a playlist of media files.
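
To make the file layout concrete, here is a minimal sketch of reading a playlist. The sample playlist text and the function name parse_m3u8 are invented for illustration, and the parser only understands the #EXTM3U header and #EXTINF lines; real HLS playlists carry many more tags, which this sketch simply skips.

```python
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:9.009,
segment0.ts
#EXTINF:9.009,
segment1.ts
#EXT-X-ENDLIST
"""


def parse_m3u8(text):
    """Return a list of (uri, duration_in_seconds_or_None) entries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an extended M3U playlist")

    entries = []
    pending_duration = None
    for line in lines[1:]:
        if line.startswith("#EXTINF:"):
            # Format: "#EXTINF:<duration>,<optional title>"
            pending_duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif not line.startswith("#"):
            # Any non-tag line is a URI (a media segment or another playlist).
            entries.append((line, pending_duration))
            pending_duration = None
    return entries


for uri, duration in parse_m3u8(SAMPLE_PLAYLIST):
    print(uri, duration)
```

In a playlist that only points at other .m3u8 files, there are no #EXTINF lines, so the duration of each entry comes back as None.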

We can extract the network and performance logs using Selenium with some advanced options. Perform the following steps to install all the required packages:

pip install selenium
pip install webdriver_manager

Below is the sample code for getting the streaming URL (.m3u8) using Selenium and network logs:

import json
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()

# Ask Chrome to record performance (network) logs.
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})

options.add_argument("--no-sandbox")
options.add_argument("--headless")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("start-maximized")
options.add_argument("--autoplay-policy=no-user-gesture-required")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--ignore-certificate-errors")
options.add_argument("--mute-audio")
options.add_argument("--disable-notifications")
options.add_argument("--disable-popup-blocking")

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                          options=options)

def get_m3u8_urls(url):
    driver.get(url)
    driver.execute_script("window.scrollTo(0, 10000)")
    # Give the player time to start and request the stream.
    time.sleep(20)
    logs = driver.get_log("performance")
    url_list = []

    for log in logs:
        # Each entry wraps a JSON-encoded DevTools event.
        network_log = json.loads(log["message"])["message"]
        if ("Network.response" in network_log["method"]
                or "Network.request" in network_log["method"]
                or "Network.webSocket" in network_log["method"]):
            request_url = network_log["params"].get("request", {}).get("url", "")
            # Keep streaming playlists only; blob URLs cannot be used outside the browser.
            if ".m3u8" in request_url and "blob" not in request_url:
                url_list.append(request_url)

    driver.quit()
    return url_list

if __name__ == "__main__":

    url = "..."
    url_list = get_m3u8_urls(url)
    print(url_list)
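
The filtering step in get_m3u8_urls can be tried without launching a browser. The sketch below is only an approximation under stated assumptions: make_entry builds simplified stand-ins for performance log entries (real entries carry many more fields), the URLs are invented, and extract_m3u8_from_logs is a hypothetical helper that also removes duplicates, which the code above does not do.

```python
import json


def extract_m3u8_from_logs(logs):
    """Return unique .m3u8 request URLs found in performance log entries."""
    urls = []
    for entry in logs:
        # The "message" field is itself a JSON string wrapping the event.
        message = json.loads(entry["message"])["message"]
        if not message.get("method", "").startswith("Network."):
            continue
        request_url = message.get("params", {}).get("request", {}).get("url", "")
        if ".m3u8" in request_url and not request_url.startswith("blob:"):
            if request_url not in urls:
                urls.append(request_url)
    return urls


def make_entry(method, url):
    """Build a simplified, made-up log entry for offline testing."""
    inner = {"message": {"method": method, "params": {"request": {"url": url}}}}
    return {"level": "INFO", "message": json.dumps(inner), "timestamp": 0}


sample_logs = [
    make_entry("Network.requestWillBeSent", "https://cdn.example.com/v/master.m3u8"),
    make_entry("Network.requestWillBeSent", "https://cdn.example.com/site.css"),
    make_entry("Network.requestWillBeSent", "https://cdn.example.com/v/master.m3u8"),
]
print(extract_m3u8_from_logs(sample_logs))
```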

Once you have the streaming URL, it can be played in the VLC media player using the stream option.

The m3u8 URL can also be downloaded as a .mp4 file using FFmpeg. It can be installed on Ubuntu using:

sudo apt install ffmpeg

After installing FFmpeg, we can easily download the video using the command below:

ffmpeg -i -c copy -bsf:a aac_adtstoasc output.mp4
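
The same ffmpeg invocation can also be assembled from Python, with the stream URL passed as the -i input. This is a minimal sketch: the function name build_ffmpeg_command and the example URL are hypothetical, and the actual download call is left commented out because it needs ffmpeg on the PATH and a reachable stream.

```python
import shlex


def build_ffmpeg_command(m3u8_url, output_path="output.mp4"):
    """Return the argument list for copying an HLS stream into an MP4 file."""
    return [
        "ffmpeg",
        "-i", m3u8_url,            # input: the .m3u8 streaming URL
        "-c", "copy",              # copy streams without re-encoding
        "-bsf:a", "aac_adtstoasc",  # repackage AAC audio for the MP4 container
        output_path,
    ]


cmd = build_ffmpeg_command("https://cdn.example.com/v/master.m3u8")
print(shlex.join(cmd))

# To run the download for real:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the URL contains characters such as & or ?.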

We hope you like these two approaches to advanced video scraping. Do let us know if you have any queries.



Copyright © 2022 A.I. Pulses.
A.I. Pulses is not responsible for the content of external sites.
