

EPFL student creates a new language-analysis program
Jonathan Besomi, a Master’s student at EPFL, has developed a program called Texthero that lets users generate representations of textual data with just a few lines of code, thereby simplifying the analysis of natural languages. Image: Christian Wiediger, Unsplash.



LAUSANNE.- We now live in a data-filled age that has ushered in its own distinct challenges. One of the biggest is how to analyze vast reams of information. In response, Besomi, a Master’s student in data science, has developed Texthero, a program that simplifies the task of analyzing textual data. It was created in the spring of 2020 under the supervision of Kenneth Younge, Chair of Technology and Innovation Strategy at EPFL’s Management of Technology & Entrepreneurship Institute. Designed as open-source software and written in the Python programming language, Texthero swiftly won over developers around the world.

“Texthero has been downloaded over 23,000 times so far, and has been awarded 2,000 stars on the GitHub platform,” says Besomi. “It got a lot of attention as soon as we released it – people even began sharing it on social media, primarily Twitter and LinkedIn. This indicates that there was strong demand for such a program in the Python/NLP [Natural Language Processing] community.”

Rapid visual representations
Using Texthero, developers can quickly visualize and understand text-based datasets. “Our program takes a text made up of unstructured data, cleans it up, generates a representation of it by converting it into digital format, and finally visualizes it. In other words, Texthero gives users an overall idea of the structure of a completely unfamiliar text,” explains Besomi.
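To make the clean-then-represent steps described above concrete, here is a minimal sketch in plain Python. It is not Texthero's code or API: the function names (clean, tfidf, top_terms), the tiny stop-word list and the three-sentence corpus are all assumptions made for illustration, and printing each text's highest-weighted terms merely stands in for the plot a real visualization step would draw.

```python
import math
import re
from collections import Counter

# Hypothetical toy stop-word list; real toolkits ship far larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on"}


def clean(text):
    """Lowercase, drop punctuation and digits, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_terms(vector, k=2):
    """Crude stand-in for a plot: the k highest-weighted terms."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Made-up corpus, used only to exercise the sketch.
corpus = [
    "The court ruled on the appeal.",
    "The striker scored in the final!",
    "The court heard the final appeal.",
]
cleaned = [clean(text) for text in corpus]
vectors = tfidf(cleaned)
for text, vec in zip(corpus, vectors):
    print(text, "->", top_terms(vec))
```

Terms shared by several texts get low weights, while terms unique to one text rise to the top, which is the kind of structure a visualization would then bring out.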




The rudiments of Texthero first came to Besomi when he was working with Professor Younge on Fastlaw, a program for analyzing legal texts. “Fastlaw is a ‘word-embedding’ tool that was trained on a large corpus of legal data provided by Harvard University’s Caselaw Access Project (CAP) – a project to make every ruling published by US courts freely available,” says Besomi. He and Younge presented their program to the Harvard Law School Library.

“As I was developing Fastlaw, I realized there was a need for software that could quickly pre-process, represent and visualize textual data,” says Besomi. Before Texthero, developers who wanted to process natural language were forced to use a series of applications, such as spaCy, scikit-learn, Gensim and NLTK. The process was both time-consuming and complex. “Now, with Texthero, just a few lines of code are enough to plot a text to be processed.”

A new version
To date, 16 developers have contributed to Texthero through pull requests on GitHub. They’ve fixed bugs, introduced new features and improved the documentation. “We’re about to release a new version (1.1) that will boost text processing speeds even further,” says Besomi.

Besomi now wants to consolidate and expand the Texthero community through blog posts and tutorials, in order to increase uptake of his program. “When I think about the billions of pieces of data around us that we can’t assimilate, it would seem that text analysis – in all its forms – is the wave of the future,” says Besomi, who is currently completing an in-company internship at IBM Research Zurich and writing a thesis on text analysis. “I’m fascinated by these issues and pleased to have created a simple, straightforward program that makes natural language processing easier.”








January 14, 2021

Editor & Publisher: Jose Villarreal
Art Director: Juan José Sepúlveda Ramírez


