Using Data Science to: Evaluate How My Zoom Meeting Went
*All of the code for this project is available on GitHub.
Introduction
Long before “shelter-in-place” was a term in our collective vocabulary, I was doing much of my work remotely. Part of my job was hosting Zoom calls to roll out new programs and discuss execution gaps in existing ones. The audience was regularly 100+ people, so typically everyone was on mute. This meant that I talked into my computer for a half-hour straight with very little feedback on how the call was going.
I spent a huge amount of time preparing for these calls, so I often wished that I could get some sort of objective feedback on how they went. I left every call with the same questions:
Was my audience interested in the topic?
Were they engaged?
Did I lose them at some point?
Did they love some particular aspect of the meeting?
I wished Zoom could generate some sort of scorecard for me so I could refine and improve, but nothing like that existed. Now that more people than ever are doing their work over video chat, though, I thought a product like this could have real value. So, I set out to build one.
The Approach
To build out this proof-of-concept, there were quite a few things I was going to have to do. I broke my plan down into the following steps:
1) Find training data.
2) Train a (moderately-sized) model on it.
3) Apply the model on the Zoom meeting.
4) Save/export the data.
5) Generate a report.
1) The Data
The first step was to track down a data set that would provide labeled images displaying multiple expressions. After some searching, I came across just what I was looking for, in the form of an old Kaggle competition. This data set has thousands of pictures across 7 different emotional categories: angry, disgusted, afraid, happy, sad, surprised, neutral.
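For anyone who wants to follow along, here is a minimal sketch of loading that kind of data. It assumes the CSV layout that Kaggle competition used (one row per 48×48 grayscale image, with an integer `emotion` label and a space-separated `pixels` string); the column names, image size, and file name are assumptions based on that format, not details confirmed in my pipeline.

```python
import numpy as np
import pandas as pd

# The 7 categories from the data set, in (assumed) label order 0-6.
EMOTIONS = ["angry", "disgusted", "afraid", "happy", "sad", "surprised", "neutral"]

def load_fer_csv(path):
    df = pd.read_csv(path)
    # Each row's "pixels" field is a space-separated string of 48x48
    # grayscale values; parse and scale to [0, 1].
    X = np.stack([
        np.asarray(row.split(), dtype=np.float32).reshape(48, 48)
        for row in df["pixels"]
    ]) / 255.0
    y = df["emotion"].to_numpy()     # integer class label
    return X[..., np.newaxis], y     # add a channel axis for the CNN

X, y = load_fer_csv("fer2013.csv")   # file name is an assumption
print(X.shape, np.bincount(y))       # quick sanity check of class balance
```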
2) The Model
With the data in hand, I could move on to building a model to classify these emotions. I would normally use transfer learning to harness the power of a huge pretrained model, but because I was going to be running this on live data, I was concerned about inference speed. So instead, I opted to train a model from scratch, which let me keep the parameter count down.
I landed on 6 convolutional layers of increasing size, 2 fully connected layers, and an output layer, all broken up by dropout and max pooling layers. The model is available here.
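For the curious, a Keras sketch of that architecture might look like the following. The exact filter counts, dropout rates, and dense-layer widths here are illustrative guesses, not the actual trained model (which is linked above):

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(48, 48, 1), n_classes=7):
    # Six convolutional layers of increasing width, broken up by
    # max pooling and dropout, then two fully connected layers and
    # a 7-way softmax output.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(256, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.fit(X, y, validation_split=0.1, epochs=30, batch_size=64)
```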
3) Applying it to Zoom
With the model trained, it was time to figure out how to apply it to a Zoom call. The catch is that I was not going to have access to the call’s video stream itself. So instead, I wrote a program that runs the model on a live capture of my screen. This way, I could just position the grid view of my call within the capture region, and the program should pick up the emotional states of my participants.
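A simplified version of that capture loop, assuming `mss` for screen grabs and OpenCV’s stock frontal-face Haar cascade for finding faces (the capture region coordinates are placeholders), could look like this:

```python
import time
import cv2
import numpy as np
from mss import mss

# Placeholder coordinates -- position the Zoom grid view inside this box.
REGION = {"top": 100, "left": 100, "width": 800, "height": 600}

# OpenCV's stock frontal-face detector conveniently only fires on
# (roughly) front-facing faces, which matters for step 4.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

with mss() as sct:
    try:
        while True:
            frame = np.array(sct.grab(REGION))              # BGRA screen pixels
            gray = cv2.cvtColor(frame, cv2.COLOR_BGRA2GRAY)
            faces = cascade.detectMultiScale(gray, 1.3, 5)
            for (x, y, w, h) in faces:
                crop = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
                # `model` is the trained CNN from the previous sketch.
                probs = model.predict(
                    crop[np.newaxis, ..., np.newaxis] / 255.0, verbose=0)[0]
                # ...log `probs` and len(faces) here (see step 4)...
            time.sleep(0.5)                                 # ~2 samples per second
    except KeyboardInterrupt:
        pass                                                # stop when the call ends
```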
This worked well for this proof of concept, but it raises major ethical concerns, since I could run it on any call without the participants’ knowledge or consent.
4) Save/Export the Data
With this up and running on my screen, the next task was to determine what data I could capture to generate analytics. Because the model outputs a confidence score for each of the 7 emotions, I could log those scores over time to track the emotional state of the group throughout the call.
I also thought it would be nice to get some of this information live while I was still on the call, so I built a pop-up chart to display the emotion distribution in real time.
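A minimal way to do that with matplotlib’s interactive mode, assuming the capture loop calls `update_chart` with the latest averaged scores and the `EMOTIONS` list from the data-loading sketch, would be:

```python
import matplotlib.pyplot as plt

plt.ion()                                    # interactive mode: non-blocking window
fig, ax = plt.subplots()
bars = ax.bar(EMOTIONS, [0.0] * len(EMOTIONS))
ax.set_ylim(0, 1)
ax.set_title("Live emotion distribution")

def update_chart(probs):
    # Called from the capture loop with the latest (averaged) scores.
    for bar, p in zip(bars, probs):
        bar.set_height(p)
    fig.canvas.draw_idle()
    plt.pause(0.01)                          # give the GUI event loop a tick
```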
The final thing I tracked was the number of faces captured at any moment. If a participant turns away or slouches so that they are clearly not paying attention, the model will not capture their face, since it was trained only on frontal images. When this happens, I report them as “distracted”.
By tracking the number of faces and comparing it to how many were expected, I can get an idea of engagement during the call.
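Expressed as code, that metric is just a ratio; capping it at 1.0 is my own assumption for the case where the detector fires on more faces than expected:

```python
def engagement(faces_detected, expected_participants):
    # Fraction of expected participants whose faces are currently frontal;
    # anyone the frontal-face detector misses counts as "distracted".
    return min(faces_detected / expected_participants, 1.0)
```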
Finally, I wrote everything to CSV once the session was closed, so I could generate an analytics report. Below is a sample of it running with a group of 3.
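Here is a sketch of that logging step, assuming pandas plus the `EMOTIONS` list and `engagement` helper from the earlier snippets (the three-person default simply matches this example):

```python
import time
import pandas as pd

rows = []   # appended to on every sampled frame inside the capture loop

def log_frame(probs, faces_detected, expected=3):
    rows.append({
        "timestamp": time.time(),
        **dict(zip(EMOTIONS, probs)),    # one column per emotion score
        "faces": faces_detected,
        "engagement": engagement(faces_detected, expected),
    })

# Once the session is closed, persist everything for the report step.
pd.DataFrame(rows).to_csv("session.csv", index=False)
```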
5) Generate an Analytics Report
The final step was to determine how to present all of this in a simple, concise, and anonymous format. The first thing I wanted to display was an overall representation of the emotions during the call, so I landed on a radar chart showing all 7 captured emotions.
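Matplotlib doesn’t ship a dedicated radar chart, but one can be built on a polar axis. This sketch assumes the `session.csv` and `EMOTIONS` columns from the logging step:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("session.csv")
means = df[EMOTIONS].mean().to_numpy()

# Close the polygon by repeating the first point at the end.
angles = np.linspace(0, 2 * np.pi, len(EMOTIONS), endpoint=False)
values = np.concatenate([means, means[:1]])
angles = np.concatenate([angles, angles[:1]])

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(EMOTIONS)
ax.set_title("Average emotion scores for the call")
plt.show()
```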
Because “neutral” was so overwhelmingly common, I was curious to see whether the remaining emotions skewed positive or negative. I combined angry, disgusted, afraid, and sad into a “negative” group, and happy and surprised into a “positive” group.
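With the per-emotion columns already logged, that grouping is a couple of lines over the `df` loaded in the previous sketch:

```python
NEGATIVE = ["angry", "disgusted", "afraid", "sad"]
POSITIVE = ["happy", "surprised"]

df["negative"] = df[NEGATIVE].sum(axis=1)
df["positive"] = df[POSITIVE].sum(axis=1)
```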
Next I wanted to evaluate engagement. As described before, I measured this by how many faces were registered in a frontal position, compared to how many faces were expected.
Finally, I was curious to understand how these all evolved over time. Were there periods of the call when everyone was very engaged, or periods where I said something that upset them? So I plotted positivity, negativity, and engagement over time to get an idea of how the meeting played out.
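A sketch of that timeline plot, assuming the `positive`/`negative` columns from the grouping step and a rolling mean to smooth frame-level jitter (the window size is arbitrary):

```python
import matplotlib.pyplot as plt

# Elapsed minutes since the start of the call.
df["minutes"] = (df["timestamp"] - df["timestamp"].iloc[0]) / 60

# Rolling mean so single-frame jitter doesn't dominate the picture.
smoothed = (df[["positive", "negative", "engagement"]]
            .rolling(30, min_periods=1).mean()
            .set_index(df["minutes"]))

ax = smoothed.plot()
ax.set_xlabel("Minutes into the call")
ax.set_ylabel("Score")
ax.set_title("How the meeting played out")
plt.show()
```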
Putting this all together, a sample analytics report could look something like this.
Potential Uses
With the proof of concept done, there are a few ways I could see this being used productively. It works best in meetings where a moderator or presenter does the bulk of the talking, so it would be less appropriate in a collaborative meeting among peers but could be very valuable for a presentation.
Taking it a step further, recorded Zoom meetings generate a timestamped transcript. So big changes in engagement or attitudes in the audience could be identified programmatically and then referenced against the transcript, so the host can see exactly what they said that led to the reaction.
The other ideal use-case for this would be in online education. With kids all over the country dialing into class for the first time ever, we are in uncharted waters when it comes to managing student engagement. A program like this could help give teachers insight into what is working and what isn’t, show them how their engagement compares with their peers’, and identify children who are routinely disengaged or upset.
However, there are major problems with all of this. First, in its current form, it’s a huge invasion of privacy. If this were built out, it would have to be disclosed to and approved by users, but that disclosure could be as simple as a new paragraph in a license agreement that none of us would read or notice.
Even if users were informed and signed off, we would likely start to see unintended behavior, like participants performatively faking emotions, or more creative forms of disengagement designed to feign participation.
Conclusion
The question I set out to answer this week was: “Can I use Data Science to evaluate how my Zoom meeting went?” It took some work, but I think that the answer here is yes. But this introduces an entirely new question: “Should I?”
Unfortunately, I don’t think that this is a question we’ll get to properly debate. The truth is that even if working remotely isn’t the new normal, it has definitely become a whole lot more normalized through this process. Once the pandemic is over, I foresee many jobs, and even educational programs, remaining at least partly remote.
Employers and educators are going to want new metrics for this new way of working, and this solution could meet those needs. So ultimately, I do not think that this is a matter of “if”, but “when”. I just hope that whoever does make this a reality stays cognizant of the risks and ethical concerns. And if that someone needs help, I’m just a phone call away.