5 Genuinely Useful Bash Scripts for Data Science

February 16, 2023


Image by author
 

Python, R, and SQL are often cited as the most-used languages for processing, modeling, and exploring data. While that may be true, there is no reason that other languages cannot be, or are not already being, used to do this work.

The Bash shell is a Unix and Unix-like operating system shell, along with the commands and programming language that go with it. Bash scripts are programs written in this Bash shell scripting language. These scripts are executed sequentially by the Bash interpreter, and can include all of the constructs typically found in other programming languages, including conditional statements, loops, and variables.
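
As a quick illustration of those constructs, here is a minimal sketch (not from any particular project, using a hypothetical ./data directory) showing a variable, a conditional, and a loop in action:

#!/bin/bash

# A variable holding a hypothetical data directory
data_dir="./data"

# A conditional: only proceed if the directory exists
if [ -d "$data_dir" ]; then
    # A loop: report the line count of each CSV file found
    for f in "$data_dir"/*.csv; do
        echo "$f: $(wc -l < "$f") lines"
    done
else
    echo "Directory $data_dir not found" >&2
fi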

Common Bash script uses include:

automating system administration tasks
performing backups and maintenance
parsing log files and other data
creating command-line tools and utilities

Bash scripting can also be used to orchestrate the deployment and management of complex distributed systems, making it an incredibly useful skill in the arenas of data engineering, cloud computing environments, and DevOps.
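
As a small taste of the log-parsing use case above (a minimal sketch, assuming a syslog-style log at a hypothetical path), a few lines of Bash can summarize the errors in a log file:

#!/bin/bash

# Hypothetical log file path; adjust for your system
log_file="/var/log/app.log"

# Count the lines containing ERROR
echo "Total errors: $(grep -c 'ERROR' "$log_file")"

# Show the five most frequent ERROR lines
grep 'ERROR' "$log_file" | sort | uniq -c | sort -rn | head -5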

In this article, we are going to look at five different data science-related tasks that lend themselves to scripting, where we should see just how flexible and useful Bash can be.

 

 

Clean and Format Raw Data

Here is an example bash script for cleaning and formatting raw data files:

#!/bin/bash

# Set the input and output file paths
input_file="raw_data.csv"
output_file="clean_data.csv"

# Remove any leading or trailing whitespace from each line
sed 's/^[ \t]*//;s/[ \t]*$//' "$input_file" > "$output_file"

# Replace the quoted-comma field separators with a placeholder
# (assumes the data itself contains no | characters)
sed -i 's/","/|/g' "$output_file"

# Join lines, replacing any newlines within quoted fields with a space
sed -i ':a;N;$!ba;s/"\n"/ /g' "$output_file"

# Remove the quotes around each field
sed -i 's/"//g' "$output_file"

# Restore the placeholder to the original quoted-comma separator
sed -i 's/|/","/g' "$output_file"

echo "Data cleaning and formatting complete. Output file: $output_file"

 

This script:

assumes that your raw data file is in a CSV file called raw_data.csv
saves the cleaned data as clean_data.csv
uses the sed command to:

remove leading/trailing whitespace from each line and replace the quoted-comma field separators with a placeholder
replace newlines within quoted fields with a space
remove the quotes around each field
replace the placeholder with the original quoted-comma separator

prints a message indicating that the data cleaning and formatting is complete, along with the location of the output file
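
Running it is straightforward; assuming you have saved the script as (hypothetically) clean_data.sh in the same directory as raw_data.csv:

# Make the script executable, then run it
chmod +x clean_data.sh
./clean_data.sh
# Data cleaning and formatting complete. Output file: clean_data.csv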

 

Automate Data Visualization

Here is an example bash script for automating data visualization tasks:

#!/bin/bash

# Set the input file path
input_file="data.csv"

# Create a line chart of column 1 vs column 2
gnuplot -e "set datafile separator ','; set term png; set output 'line_chart.png'; plot '$input_file' using 1:2 with lines"

# Create a bar chart of column 3
gnuplot -e "set datafile separator ','; set term png; set output 'bar_chart.png'; plot '$input_file' using 3:xtic(1) with boxes"

# Create a scatter plot of column 4 vs column 5
gnuplot -e "set datafile separator ','; set term png; set output 'scatter_plot.png'; plot '$input_file' using 4:5 with points"

echo "Data visualization complete. Output files: line_chart.png, bar_chart.png, scatter_plot.png"

 

The above script:

assumes that your data is in a CSV file called data.csv
uses the gnuplot command to create three different types of plots:

a line chart of column 1 vs column 2
a bar chart of column 3
a scatter plot of column 4 vs column 5

outputs the plots in png format, saving them as line_chart.png, bar_chart.png, and scatter_plot.png respectively
prints a message indicating that the data visualization is complete, along with the location of the output files

Please note that for this script to work, you will need to adjust the column numbers and types of charts based on your data and requirements.
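
One way to make those adjustments less painful (a sketch, using hypothetical variable names) is to pull the column numbers out into shell variables, since the shell expands them before gnuplot ever sees the command string:

#!/bin/bash

# Hypothetical parameters; change these to match your data
input_file="data.csv"
x_col=1
y_col=2

# The variables are expanded inside the double-quoted gnuplot command
gnuplot -e "set datafile separator ','; set term png; set output 'line_chart.png'; plot '$input_file' using $x_col:$y_col with lines"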

 

Statistical Analysis

Here is an example bash script for performing statistical analysis on a dataset:

#!/bin/bash

# Set the input file path
input_file="data.csv"

# Set the output file path
output_file="statistics.txt"

# Use awk to calculate the mean of column 1
mean=$(awk -F',' '{sum+=$1} END {print sum/NR}' "$input_file")

# Use awk to calculate the standard deviation of column 1
stddev=$(awk -F',' '{sum+=$1; sumsq+=$1*$1} END {print sqrt(sumsq/NR - (sum/NR)^2)}' "$input_file")

# Append the results to the output file
echo "Mean of column 1: $mean" >> "$output_file"
echo "Standard deviation of column 1: $stddev" >> "$output_file"

# Use awk to calculate the mean of column 2
mean=$(awk -F',' '{sum+=$2} END {print sum/NR}' "$input_file")

# Use awk to calculate the standard deviation of column 2
stddev=$(awk -F',' '{sum+=$2; sumsq+=$2*$2} END {print sqrt(sumsq/NR - (sum/NR)^2)}' "$input_file")

# Append the results to the output file
echo "Mean of column 2: $mean" >> "$output_file"
echo "Standard deviation of column 2: $stddev" >> "$output_file"

echo "Statistical analysis complete. Output file: $output_file"

 

This script:

assumes that your data is in a CSV file called data.csv
uses the awk command to calculate the mean and standard deviation of two columns
separates the data by a comma
saves the results to a text file statistics.txt
prints a message indicating that the statistical analysis is complete, along with the location of the output file

Note that you can add more awk commands to calculate other statistical values, or repeat the pattern for additional columns.
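
For instance, rather than repeating the awk block per column, a single awk loop over NF can cover every column in one pass (a sketch, assuming all columns are numeric):

# Compute the mean and standard deviation of every column at once
awk -F',' '
{
    for (i = 1; i <= NF; i++) { sum[i] += $i; sumsq[i] += $i * $i }
}
END {
    for (i = 1; i <= NF; i++)
        printf "Column %d: mean=%g stddev=%g\n", i, sum[i]/NR, sqrt(sumsq[i]/NR - (sum[i]/NR)^2)
}' data.csv >> statistics.txt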

 

Manage Python Package Dependencies

 

Here is an example bash script for managing and updating the dependencies and packages required for data science projects:

#!/bin/bash

# Set the path of the virtual environment
venv_path="venv"

# Activate the virtual environment
source "$venv_path/bin/activate"

# Update pip
pip install --upgrade pip

# Install required packages from requirements.txt
pip install -r requirements.txt

# Deactivate the virtual environment
deactivate

echo "Dependency and package management complete."

 

This script:

assumes that you’ve got a digital surroundings arrange, and a file named necessities.txt containing the bundle names and variations that you just wish to set up
makes use of the supply command to activate a digital surroundings specified by the trail venv_path.
makes use of pip to improve pip to the most recent model
installs the packages specified within the necessities.txt file
makes use of the deactivate command to deactivate the digital surroundings after the packages are put in
prints a message indicating that the dependency and bundle administration is full

This script should be run whenever you want to update your dependencies or install new packages for a data science project.
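
If you also want to snapshot the exact versions that ended up installed (not part of the script above, but a common companion step), pip freeze can re-pin requirements.txt:

# After installing/upgrading, record the exact installed versions
source venv/bin/activate
pip freeze > requirements.txt
deactivate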

 

Manage Jupyter Notebook Execution

 

Here is an example bash script for automating the execution of a Jupyter Notebook or other interactive data science environments:

#!/bin/bash

# Set the path of the notebook file
notebook_file="analysis.ipynb"

# Set the path of the virtual environment
venv_path="venv"

# Activate the virtual environment
source "$venv_path/bin/activate"

# Start Jupyter Notebook
jupyter-notebook "$notebook_file"

# Deactivate the virtual environment
deactivate

echo "Jupyter Notebook execution complete."

 

The above script:

assumes that you’ve got a digital surroundings arrange and Jupyter Pocket book put in in it
makes use of the supply command to activate a digital surroundings, specified by the trail venv_path
makes use of the jupyter-notebook command to begin Jupyter Pocket book and open the required notebook_file
makes use of the deactivate command to deactivate the digital surroundings after the execution of Jupyter Pocket book
prints a message indicating that the execution of Jupyter Pocket book is full

This script should be run whenever you want to execute a Jupyter Notebook or work in another interactive data science environment.
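
Note that jupyter-notebook opens the notebook interactively in the browser; if you instead want the notebook executed end to end unattended, one alternative (swapping in jupyter nbconvert, which the script above does not use) looks like this:

# Execute analysis.ipynb headlessly and save an executed copy
source venv/bin/activate
jupyter nbconvert --to notebook --execute analysis.ipynb --output analysis_executed.ipynb
deactivate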

I am hoping that these simple scripts have been enough to show you the simplicity and power of scripting with Bash. It might not be your go-to solution for every scenario, but it certainly has its place. Best of luck in your scripting.

Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.


