


Stanford scientists tested free text-analysis tool on the web

The experiment allowed users to drag and drop text into a linguistic analysis tool powered by machine learning.

From left, computational linguistics researcher Rob Voigt, doctoral student Richard Socher, master's student Kai Sheng Tai and visiting scholar Romain Paulus check out the website etcML, which helped users classify words or phrases that embody viewpoints. (Photo: Norbert von der Groeben)

Ever wondered whether a certain TV show had a slant in favor of a political candidate?

Stanford computer scientists had created a website that gave anyone who could cut and paste the ability to answer such questions, systematically and for free.

The website was known as etcML, short for Easy Text Classification with Machine Learning. (Editor's note: The site has been taken offline now that the test project is completed.)

Machine learning is a field of computer science that develops systems that learn from examples rather than following hand-written rules, giving computers the ability to acquire new understanding in a more human-like way.

The etcML website was based on machine-learning techniques developed to analyze the meaning embodied in text and then gauge its overall positive or negative sentiment. To access this computational engine, users dragged and dropped text files into a dialog box.
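The article does not describe the models etcML ran internally. As a rough illustration of what "training a classifier" on labeled text involves, here is a minimal Python sketch using multinomial naive Bayes, a classic technique that is far simpler than anything etcML is likely to have used. The function names `train` and `classify` and the four toy reviews are hypothetical, chosen only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_count / total_docs)  # log prior
        for token in tokenize(text):
            if token in vocab:  # skip words never seen during training
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set of labeled snippets.
training_data = [
    ("a wonderful, moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a dull, boring mess", "negative"),
    ("terrible plot and bad acting", "negative"),
]
model = train(training_data)
print(classify(model, "a great story"))        # positive
print(classify(model, "boring and terrible"))  # negative
```

Naive Bayes fits in a few dozen lines of standard-library code, which is why it serves as the illustration here. It treats each text as an unordered bag of words, so it cannot capture negation or word order ("not good" looks much like "good"), limitations that more sophisticated sentiment models aim to overcome.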

“We wanted to make standard machine learning techniques available to people and researchers who may not be able to program,” said Richard Socher, a doctoral candidate in computer science at Stanford and lead developer of etcML.

Socher said the new site gives researchers and citizen activists in fields ranging from political science to linguistics an easy way to analyze news articles, social media posts, closed-caption transcripts of television newscasts and other texts of possible interest.

“All users have to do is copy and paste, or drop their text datasets into their browser and click,” Socher said.

Beta users of etcML include Stanford doctoral candidate Rebecca Weiss, whose studies include political polarization and media coverage. She said the website gives her an easy way to classify words or phrases that embody viewpoints, then sift through millions of news articles and broadcast transcripts looking for patterns.

“I can train a classifier and have it label all of my content, and I don't have to write a single line of code to do it,” Weiss said. “I can then share my classifiers with journalists or other researchers for use in their work.”

Rob Voigt, a researcher in computational linguistics at Stanford, has used etcML to evaluate pitches on Kickstarter, a website that provides a platform for artists, entrepreneurs and others who are seeking financial backing for their projects.

Voigt studies what makes a successful pitch. Using etcML, he has found that pitches using first-person plural pronouns – we, us, our – fare better than those written in the first-person singular. Likewise, short films seem to have done better than projects involving comic books, games or fashion.
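The article does not say how Voigt's analysis was set up beyond his use of etcML. As a hedged sketch of the kind of surface feature involved, the hypothetical function `pronoun_counts` below tallies first-person plural and singular pronouns in a pitch. The word lists are assumptions: the article names only "we, us, our," so "ours" and the singular forms are added for symmetry. A real analysis would feed counts like these into a trained classifier rather than compare them directly.

```python
import re

# Assumed word lists; the article itself names only "we", "us" and "our".
FIRST_PERSON_PLURAL = {"we", "us", "our", "ours"}
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine"}


def pronoun_counts(pitch):
    """Return (plural, singular) counts of first-person pronouns in a pitch."""
    # Letters-only tokens; a contraction like "I'm" splits into "i" and "m".
    tokens = re.findall(r"[a-z]+", pitch.lower())
    plural = sum(1 for t in tokens if t in FIRST_PERSON_PLURAL)
    singular = sum(1 for t in tokens if t in FIRST_PERSON_SINGULAR)
    return plural, singular


# Hypothetical pitch text.
print(pronoun_counts("We built this, and our team needs your help. I will ship it to us."))
# -> (3, 1)
```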

“We don’t claim that our analyses are definitive, but the classification paradigm at etcML does provide meaningful clues about the likelihood of success,” Voigt said.

Chinmay Kulkarni, a doctoral student in computer science at Stanford, used etcML to help grade short-answer tests for a free online course with roughly 2,000 students. Testing for the online course presented a challenge: Multiple-choice exams were easiest to grade automatically, but short answers offered a better measure of learning. Yet the instructor couldn't possibly read and grade 2,000 tests.

To solve this problem, the course required students to grade one another's exams. On average, four students ended up grading each exam, which added to every student's workload. Kulkarni used etcML to lighten that load. The software graded each test, and students still graded one another, but with the software in the grading loop, the average exam had to be read by only three or fewer students.

“We were able to get the same accuracy with less effort,” said Kulkarni, who has published a paper about the project.

Socher believes that making this drag-and-drop tool available to the public will allow many people to pursue interesting projects in semantic analysis while feeding back into the process of improving the computational engine behind the website.

“This is a free and powerful tool,” Socher said. “We hope people will use etcML and tell us what their problems are so that when we make further improvements to underlying algorithms they will have real life impact.”

The etcML development team was advised by Andrew Ng, a professor of computer science and director of the Stanford Artificial Intelligence Laboratory. Other team members include Stanford students Bryan McCann, Kai Sheng Tai and JiaJi Hu, and French visiting student Romain Paulus.

Andrew Myers, formerly of the Stanford Engineering news office, is a freelance science writer. Tom Abate is associate director of communications at Stanford Engineering.
