Steve Mussmann

Assistant Professor in Georgia Tech's School of Computer Science, starting Fall 2024.

Research interests include active labeling/learning, data selection, and data-centric ML.

Georgia Tech Email, Personal Email, Google Scholar, CV


Machine learning is being incorporated into a rapidly growing number and variety of systems and processes in society. My research aims to make ML easier to use, more effective, and more likely to be used in beneficial ways. It often takes the form of abstracting machine learning issues (data efficiency, interpretability, robustness, etc.) from specific application areas (computer vision, NLP, computational biology, etc.) to discover insights that lead to more useful algorithms and more reliable best practices. By combining theoretical and experimental techniques, my research takes a broad perspective while ensuring practical relevance.

Research on learning algorithms has seen remarkable progress over the past decade, especially with regard to text and images, and this progress has ignited interest in machine learning. While the learning algorithm is critical to an ML system, many other aspects are under-studied, including data sourcing, pre-processing, annotation, cleaning, validation, and monitoring, all of which significantly affect the reliability and usability of the system. My work often falls under the umbrella of data-centric machine learning, where the focus is on improving the quality of the data while the model architecture and optimization algorithm are held fixed.

Much of my previous work falls into one of two categories:


CS 8803 Fall 2024, Data-Centric Machine Learning

In Fall 2024, I am teaching CS 8803, a special topics course titled "Data-Centric Machine Learning". The course will focus on reading, reviewing, and discussing research papers and on working on a semester-long team research project. Students are expected to have a strong grasp of machine learning concepts, a solid background in probability and linear algebra, and the ability to implement algorithms and run experiments. This should not be your first course in machine learning; rather, it is meant to build knowledge in the sub-area of data-centric ML on top of a strong ML foundation.

Tentative preliminary schedule