
Natural Language Processing for low-resource languages

January 23, 2023


A black keyboard at the bottom of the picture has an open book on it, with red words in labels floating on top, with a letter A balanced on top of them. The perspective makes the composition form a kind of triangle from the keyboard to the capital A. The AI filter makes it look messy, with a kind of cartoon style. Teresa Berndtsson / Better Images of AI / Letter Word Text Taxonomy / Licenced by CC-BY 4.0.

The vast majority of natural language processing (NLP) datasets and research at present focus on a small number of high-resource languages, with studies on English dominating the field. Clearly, such an imbalance is undesirable, putting those who don’t use English at a disadvantage.

In this article, we highlight some of the work and initiatives being carried out on low-resource languages.

Lanfrica

Africa is one of the most linguistically diverse regions in the world. Despite this, African languages are barely represented in technology and research. Lanfrica aims to mitigate the difficulty encountered in the discovery of African language resources by creating a centralised hub. The team at Lanfrica have built a language-focused search engine that makes it fast and easy to find information on the internet about resources relating to African languages. Now with more than 1000 resources, their aim is to catalogue and link all African language resources, one record at a time.

As well as this platform, Lanfrica also hosts regular online talks where you can hear from researchers in the field. This talk series provides a platform for anyone to share or showcase their efforts (research, projects, software, applications, datasets, models, initiatives, etc.) in NLP.

Masakhane

Masakhane is a grassroots organisation whose mission is to strengthen and spur NLP research in African languages. The organisation is currently engaged in a number of projects.

Urdu

Maaz Amjad, Sabur Butt, Hamza Imam Amjad, Alisa Zhila, Grigori Sidorov and Alexander Gelbukh outline their approach to participating in the shared task UrduFake@FIRE2021, which centred on fake news detection in Urdu. This shared task aimed to attract and encourage researchers working in different NLP domains to address the automatic fake news detection task and help to mitigate the proliferation of fake content on the web.

The team have also looked into tweets in Urdu, in their paper Threatening Language Detection and Target Identification in Urdu Tweets.

Indian regional languages

B. S. Harish and R. Kasturi Rangan present a comprehensive survey on Indian regional language processing, covering tasks such as machine translation, named entity recognition, sentiment analysis and parts-of-speech tagging.

Bengali

Md. Rajib Hossain and Mohammed Moshiul Hoque study Bengali word embedding in their paper Towards Bengali Word Embedding: Corpus Creation, Intrinsic and Extrinsic Evaluations. They present three embedding techniques with different hyperparameters, implemented on a Bengali corpus which consists of 180 million words.

Indigenous languages of the Americas

Introducing QuBERT: A Large Monolingual Corpus and BERT Model for Southern Quechua, by Rodolfo Zevallos et al., introduces a large combined corpus for deep learning of Quechua. The authors also provide a public, pre-trained BERT model called QuBERT. They have tested their corpus and its corresponding BERT model on two major tasks: (1) named-entity recognition (NER) and (2) part-of-speech (POS) tagging.

Manuel Mager et al. describe the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas. They report on the 214 submissions from eight teams, which focussed on 10 different languages: Asháninka, Aymara, Bribri, Guarani, Nahuatl, Otomí, Quechua, Rarámuri, Shipibo-Konibo, and Wixarika.

Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl, by Ximena Gutierrez-Vasques, Gerardo Sierra and Isaac Hernandez Pompa, presents a project which comprises a Spanish-Nahuatl parallel corpus and its search interface.

Gina Bustamante, Arturo Oncevay and Roberto Zariquiey introduce monolingual corpora for four indigenous and endangered languages from Peru (Shipibo-Konibo, Ashaninka, Yanesha and Yine) in their paper No Data to Crawl? Monolingual Corpus Creation from PDF Files of Truly Low-Resource Languages in Peru.

Dysarthric speech recognition

Karima Kadaoui is researching how to help speech-impaired people communicate. Part of her project is to build an application to “translate” speech which may be unclear. She talks about the inspiration behind her work, and what she plans to achieve, in a video.

Sign language

Steven Kolawole created a dataset for Nigerian sign language with the help of a TV sign language broadcaster and two schools. Using this dataset, he built a sign-to-speech model for the language. You can find out more in an interview with him.

In their position paper, Including Signed Languages in Natural Language Processing, Kayo Yin, Amit Moryossef, Julie Hochgesang, Yoav Goldberg, and Malihe Alikhani call on the NLP community to include signed languages as a research area with high social and scientific impact. They discuss the linguistic properties of signed languages, review the limitations of current sign language processing models, and identify the open challenges to extend NLP to signed languages.

In her paper Approaches to the Anonymisation of Sign Language Corpora, Amy Isard considers the state of the art for the anonymisation of sign language corpora. She explores the motivations behind anonymisation, and details the processes which can be used to anonymise both the video and the annotations belonging to a corpus.


Lucy Smith, Managing Editor for AIhub.



Copyright © 2022 A.I. Pulses.
A.I. Pulses is not responsible for the content of external sites.
