Staff Machine Learning Engineer of Cerebral Michael Stefferson
Michael Stefferson / Arize AI
Michael Stefferson received his PhD in Physics from the University of Colorado before deciding to make the jump into machine learning (ML). He spent the last several years as a Machine Learning Engineer at Manifold, where he first started working on projects in the healthcare industry. Recently, Stefferson joined the team at Cerebral as a Staff Machine Learning Engineer and hopes to leverage data to make clinical improvements for patients that will improve their lives in meaningful ways. Here, he talks about use cases, best practices, and what he has learned along his journey into the field of ML.
What is your background and how did you first get into machine learning?
I have a PhD in physics, where I did computational and theoretical biophysics. I transitioned into ML after completing a fellowship at Insight Data Science. Then I worked for three years at Manifold before joining Cerebral.
Do you think having a physics background helped you transition to roles as a machine learning engineer?
My research wasn’t related to ML at all, but I found the transition to data science to be pretty smooth. I was already familiar with a lot of the math, and I think having experience working with research problems—where it’s not really clear how you’re going from point A to point B—was helpful.
What’s the biggest difference going from academia to industry?
Unlike in academia, people are actually paying attention to what I’m doing. Industry is more focused, the projects are more clearly defined, and there’s more support. In graduate school, I might spend months going off on my own and testing things. But now it’s all about working as a team, creating a plan, and executing against it to meet real deadlines.
What key skills did you have to develop to prepare for the transition from academia to industry?
When I was applying and interviewing, I was familiar with the concepts, but I wouldn’t have been able to tell you what precision or recall were. It was only after my time at Manifold that I realized you need to be able to talk the talk—learning the lingo used in ML, understanding the systems, figuring out how applications actually work. That was the biggest gap for me. I wrote a lot of code in grad school, but I didn’t really know how APIs worked or all the different types of databases used in the industry. These things aren’t hard to learn. It’s just hard to get that kind of exposure in academia.
Do you have any best practices to ensure a model is ready for production?
There’s usually some sort of baseline metric that you’re trying to achieve on test sets, and you can typically backtest in time to get a better sense of whether the model will be stable. Your data distribution can change, so it’s important to think through what could happen before you release the model. Once you do release it, you need to watch what’s going on and make sure it’s performing as expected. You also need the ability to pull the plug if you have to.
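One way to picture backtesting in time is to train only on older records, evaluate on the records that immediately follow, and then slide forward. The sketch below is a minimal illustration under stated assumptions, not Stefferson's or Cerebral's actual process: the function name `walk_forward_splits` and its parameters are hypothetical, and it assumes the records are already sorted by time.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in which the test block always
    comes after the training block in time (hypothetical helper)."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size
        # Train on everything seen so far, test on the next block only.
        yield list(range(train_end)), list(range(train_end, test_end))


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(10, n_folds=3, min_train=4):
        print(f"train up to index {train_idx[-1]}, test on {test_idx}")
```

If the metric stays near the baseline across every fold, that is some evidence the model is stable over time. A fold where it drops sharply is a hint that the data distribution shifted in that period.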
How do you prepare to pull the plug on a model that’s not working after you’ve dedicated time to it?
From a process standpoint, it’s good to establish metrics that you’re monitoring and, if the model does fail, to have something you can revert to. In a lot of cases, for example, there may be a rules engine that was already producing outputs, and you can revert to that.
From a creator standpoint, it’s good to remember that everything’s a learning experience. Even if the model doesn’t work, you can still learn from it and use what you learn to build a better model the next time. That learning could be an edge case you weren’t really thinking about, or maybe your model had bias you weren’t aware of, or perhaps your training data follows a different distribution than what you are seeing live. All of these things can happen, but you can learn from them for the next time.
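The process side of this answer (monitor a metric, keep a rules engine to revert to) can be sketched in a few lines. Everything here is an assumption made for illustration: the class name `GuardedModel`, the rolling-accuracy metric, the threshold, and the window size are hypothetical and not taken from any system described in the interview.

```python
from collections import deque
from typing import Any, Callable, Deque


class GuardedModel:
    """Serve model predictions, but revert to a rules engine when a
    monitored rolling accuracy falls below a threshold (hypothetical)."""

    def __init__(
        self,
        model: Callable[[Any], Any],
        rules: Callable[[Any], Any],
        min_accuracy: float = 0.8,
        window: int = 50,
    ) -> None:
        self.model = model
        self.rules = rules
        self.min_accuracy = min_accuracy
        self.outcomes: Deque[bool] = deque(maxlen=window)
        self.use_model = True

    def predict(self, x: Any) -> Any:
        # Route to the rules engine once the plug has been pulled.
        return self.model(x) if self.use_model else self.rules(x)

    def record_outcome(self, was_correct: bool) -> None:
        """Log whether a past prediction turned out to be correct."""
        self.outcomes.append(was_correct)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full:
            accuracy = sum(self.outcomes) / len(self.outcomes)
            if accuracy < self.min_accuracy:
                self.use_model = False  # pull the plug


if __name__ == "__main__":
    guarded = GuardedModel(lambda x: "model", lambda x: "rules", window=4)
    for correct in (True, True, False, False):
        guarded.record_outcome(correct)
    print(guarded.predict(None))
```

In this sketch the switch is one-way: once the model is disabled, it stays disabled until someone re-enables it. That keeps a person in the loop before the model goes back into service.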
How has the onboarding process been at Cerebral?
It’s been an interesting experience onboarding remotely. I think there are a lot of pros with remote work, but it’s hard because it’s important to have trust among the people you’re working with and it’s harder to build that remotely. Doing everything over Zoom or FaceTime isn’t my favorite—I much prefer in person—but I’ve met a lot of really great people and I’m excited about my immediate team and the projects I’m starting to work on.
Can you share some of Cerebral’s goals as a company and why you wanted to work there?
Cerebral is a telemental health company that offers services—therapy, counseling, online medication prescription—based on different mental health conditions, including depression, anxiety, and opioid use disorder. At Manifold I worked a bit in health care, and I think there’s a huge opportunity for data and software to help in this field. There are a lot of inefficiencies that can be made better, and I really think we can leverage data to make clinical improvements for patients and move the needle in meaningful ways that are directly affecting people’s lives.
What types of projects are you working on and how do Cerebral users interact with the product?
Cerebral has a clinician-facing application and a patient-facing application, so I’m working on projects that touch on the different apps. I’m starting to work on the clinical side, which entails implementing tools to help with clinical prescription monitoring and safety and ensuring that best practices are being upheld on the prescription side.
There are two main users. One is on the clinical side: these users are nurse practitioners, doctors, and therapists. The other is the patient-facing side, where Cerebral acts as the interface between you and your appointments or prescriptions and also features additional tools and support.
How do you select the best model for your use case?
I’m just starting to go through this process at Cerebral, but for most projects in general, there’s usually a domain expert who may not be familiar with ML but has a better sense of what’s important for a particular use case. So it’s really about figuring out what metrics are important for the given problem. For any regression or classification problem, there are different metrics you can measure. They’re all telling you something slightly different, and some of them might not be as appropriate for the problem at hand. It’s all about starting with the why, identifying what we want out of this, and then finding how to get it.
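To make the point about metrics concrete, here is a small sketch with made-up labels: a rare condition (5 positives in 100 cases) and a classifier that finds only one of them. The data and the helper name `classification_metrics` are hypothetical, and returning 0.0 when a denominator is zero is just a convention chosen for this example.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Convention for this sketch: report 0.0 if the denominator is zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up data: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95
# The classifier catches one positive and raises one false alarm.
y_pred = [1, 0, 0, 0, 0] + [1] + [0] * 94

if __name__ == "__main__":
    print(classification_metrics(y_true, y_pred))
```

Accuracy here is 0.95, which sounds strong, while recall is 0.2 and precision is 0.5. Which number matters depends on the cost of a missed case versus a false alarm, and that is the kind of judgment a domain expert is best placed to make.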
What are some key things to keep in mind when monitoring models in production in the healthcare industry in terms of model explainability, fairness, and bias?
When I started learning ML and data science, I was very focused on model performance. One thing I’ve learned is that most people, especially in healthcare, want to know the “why,” because there might be a dial they want to turn to improve the outcome. For example, if you’re looking at a health score and you see someone’s is higher than another person’s, you might want to know whether there is something you can do to improve that number. You care about the prediction, but you want to intervene and make it higher. I think SHAP is a great way to get a sense of that. A more academic way of doing this is through causal inference. It’s a very interesting field of mathematics that tries to get at why the score is the way it is.
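SHAP is built on Shapley values, which split a prediction's distance from a baseline prediction into per-feature contributions. The sketch below computes exact Shapley values for a toy "health score" by brute force over feature subsets. It does not use the SHAP library: the function `shapley_values`, the toy linear score, its weights, and the single-baseline setup are all assumptions made for illustration. The real library works with a background dataset and uses faster algorithms.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    f: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of f at `instance` relative to `baseline`.
    Cost grows exponentially with the number of features."""
    names = list(instance)
    n = len(names)

    def value(subset: set) -> float:
        # Features in `subset` take the instance value, the rest the baseline.
        mixed = {k: instance[k] if k in subset else baseline[k] for k in names}
        return f(mixed)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def health_score(x: Dict[str, float]) -> float:
    """Toy linear score with made-up weights (hypothetical)."""
    return 2.0 * x["sleep"] + 3.0 * x["exercise"] - 1.0 * x["stress"]


if __name__ == "__main__":
    person = {"sleep": 8.0, "exercise": 1.0, "stress": 5.0}
    typical = {"sleep": 7.0, "exercise": 0.0, "stress": 3.0}
    print(shapley_values(health_score, person, typical))
```

The contributions add up to the gap between this person's score and the baseline score, so they show which inputs push the score up or down. On their own they do not show what would happen if you changed an input, which is where causal inference comes in.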
Are there resources for causal inference or do you build it in-house?
Judea Pearl is one of the main contributors in the field of causal inference and he wrote “The Book of Why,” which is good. He also has a textbook called “Causality.” It’s a grad-level textbook, though, so I don’t know if I’d recommend starting there. Richard McElreath has a textbook called “Statistical Rethinking” which is a Bayesian stats course that talks about causal inference. These are all great resources for understanding the concept. And then Microsoft has a tool called DoWhy, which is a software package to help with this.
Personally, I think of causal inference as more of a framework for the development stage. A lot of covariates are going to be correlated with each other. So causal inference is asking more of a counterfactual question: If I were to tune this variable in this way, how would I expect that to change the outcome?
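A small simulation shows why correlated covariates make the counterfactual question different from a purely predictive one. In this made-up setup, a hidden factor `z` drives both the variable `x` and the outcome `y`, and the true effect of `x` on `y` is fixed at 1.0. All names, coefficients, and the data-generating process are assumptions for illustration. The sketch uses only the standard library, not DoWhy.

```python
import random
from typing import List, Tuple


def slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(
    n: int, intervene: bool, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Generate (x, y) pairs. If `intervene` is True, x is set
    independently of the hidden factor z, as in an experiment."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden common cause
        x = rng.gauss(0, 1) if intervene else z + rng.gauss(0, 1)
        y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)  # true effect of x is 1.0
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    observed = slope(*simulate(20000, intervene=False))
    intervened = slope(*simulate(20000, intervene=True))
    print(f"observational slope: {observed:.2f}")
    print(f"interventional slope: {intervened:.2f}")
```

In the observational data the slope comes out near 2, because `x` also carries information about `z`. When `x` is set independently, the slope comes out near 1, which is the answer to "if I tuned this variable, how would the outcome change?"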
Do you think there should be mandatory training around AI governance, AI ethics, and data privacy for anyone working with sensitive data in the healthcare industry?
From an infrastructure perspective, it’s not that difficult to build HIPAA-compliant systems. I don’t think there’s any reason why you can’t have the same sort of standards for encrypting data at rest or in transit for other systems. At Manifold, it’s what we did for everything even though we weren’t dealing with protected health information. Making sure that it’s secure and safe should be a top priority for most people, and training is a great way to get there.
You’ve worked for several years now as a machine learning engineer at two different companies. What is your favorite and what is the most challenging aspect of the role?
I love solving problems. I find engineering challenges and data problems interesting. I like the variety of being a machine learning engineer. I’m in a pretty unique position where I get to think about data, data engineering, and software engineering all in one job. I think it can be challenging to communicate data, especially to non-technical people. People problems can be challenging, but they’re also interesting.