Figure 1: Inspiration

Concept

This project aims to depict the ways in which body parts inspire media devices, like camera lenses and speakers, in their appearance and/or functionality. By tracking facial movements (i.e. blinking and speaking), the program triggers animations that illustrate the user as a cyborg.

Process

To track facial movements, the code uses the ml5 FaceMesh machine-learning model, which provides facial landmark detection by tracking 468 keypoints on the face (see Fig. 2). After importing the FaceMesh library, I isolated the keypoints for the eyes and the mouth, since these were the primary focus in creating the cyborg appearance.

Figure 2: Importing FaceMesh

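The actual import is shown in Fig. 2. As a rough sketch of what this kind of setup looks like in p5.js (assuming the ml5.js 1.x FaceMesh API, which may differ from the version used in this project):

```javascript
// Minimal p5.js + ml5.js FaceMesh setup (assumes the ml5 1.x API; the
// project's actual code in Fig. 2 may differ).
let faceMesh;
let video;
let faces = [];

function preload() {
  // Load the FaceMesh model; limit detection to a single face.
  faceMesh = ml5.faceMesh({ maxFaces: 1 });
}

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(640, 480);
  video.hide();
  // Run detection continuously on the webcam feed.
  faceMesh.detectStart(video, gotFaces);
}

function gotFaces(results) {
  // Each detected face carries a keypoints array of ~468 {x, y, z} points.
  faces = results;
}

function draw() {
  image(video, 0, 0, width, height);
}
```
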
To isolate those specific landmarks, I initially chose to target the keypoints for the eyelid and lash line on the eyes and for the outer perimeter of the lips. I could then track the distance between paired keypoints to detect when the user was blinking and/or talking (see Fig. 3 and Fig. 4). Based on this logic, the distance between the eyelid and the lash line would decrease when the user blinked, while the distance between the top lip and the bottom lip would increase when the user talked.

Figure 3: FaceMesh Keypoint Diagram

Figure 4: Planning how to trigger the animations

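As a rough illustration of the distance check described above (the keypoint indices below are hypothetical placeholders; the real ones come from the FaceMesh keypoint diagram in Fig. 3):

```javascript
// Illustrative distance checks between paired keypoints. The indices are
// hypothetical placeholders; the real ones come from the FaceMesh keypoint
// diagram (Fig. 3). dist() is p5's built-in Euclidean distance.
const UPPER_EYE = 159;  // hypothetical upper-eyelid point
const LOWER_EYE = 145;  // hypothetical lash-line point
const TOP_LIP = 13;     // hypothetical top-lip point
const BOTTOM_LIP = 14;  // hypothetical bottom-lip point

function eyeOpening(face) {
  const a = face.keypoints[UPPER_EYE];
  const b = face.keypoints[LOWER_EYE];
  return dist(a.x, a.y, b.x, b.y); // shrinks when the eye closes
}

function mouthOpening(face) {
  const a = face.keypoints[TOP_LIP];
  const b = face.keypoints[BOTTOM_LIP];
  return dist(a.x, a.y, b.x, b.y); // grows when the mouth opens
}
```
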
Testing

Once the targeted keypoints were isolated, I tested my theory about the distances by simply moving my mouth and blinking to see whether the keypoints moved significantly closer together or farther apart. This was only partially successful: the lip keypoints shifted enough in position to track talking, but the keypoints chosen for the eyes failed because their movement was too limited. So I targeted the eyebrow keypoints in lieu of the eyelids, which tracked the blinking motion slightly better. Following this change, I tested what the actual distance thresholds were for the blinking and talking movements (see Fig. 5). After applying these threshold values to boolean conditions, I was able to trigger the flash animation for blinking and a speaker animation for talking.

Figure 5: Testing the distance threshold

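Continuing the sketches above (and replacing the simple draw() from the setup sketch), the trigger logic might look roughly like the following; the eyebrow index, the threshold values, and the two animation helpers are placeholders standing in for the real values found during the testing in Fig. 5:

```javascript
// Turning the measured distances into booleans that trigger the animations,
// reusing LOWER_EYE and mouthOpening() from the earlier sketch. The eyebrow
// index, thresholds, and animation helpers are hypothetical; the real
// thresholds came from the testing shown in Fig. 5.
const BROW = 105;            // hypothetical eyebrow point, used instead of the eyelid
const BLINK_THRESHOLD = 12;  // px, placeholder value
const TALK_THRESHOLD = 18;   // px, placeholder value

function draw() {
  image(video, 0, 0, width, height);
  if (faces.length === 0) return;
  const face = faces[0];

  const brow = face.keypoints[BROW];
  const eye = face.keypoints[LOWER_EYE];
  const browGap = dist(brow.x, brow.y, eye.x, eye.y);
  const lipGap = mouthOpening(face);

  const isBlinking = browGap < BLINK_THRESHOLD; // gap shrinks on a blink
  const isTalking = lipGap > TALK_THRESHOLD;    // gap grows while talking

  if (isBlinking) drawFlash();   // hypothetical camera-flash animation
  if (isTalking) drawSpeaker();  // hypothetical speaker animation
}
```
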
Conclusion

By the end of the first iteration of this project, I found that I was able to develop the Cyborg concept at a basic level (see Fig. 6). However, there were a few things I found that could be revised to allow for a better user experience.

Figure 6: Results from first iteration

Room for Improvement

The biggest issue up to this point is capturing the distances without so much volatility. I found that the thresholds I used were not consistent: they varied with the user's depth (how close they are to the camera) as well as with the size of their facial features (e.g. thick lips vs. thin lips). This inconsistency results in animations triggering at the wrong time, or the complete opposite, where animations are not triggered when the user performs the appropriate action. To address this issue in the next iteration, I'm considering either using percentages to measure the distances instead of hardcoded numbers, or using another machine-learning model that I can train to detect facial movements.
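
As a sketch of what the percentage idea could look like, each distance could be divided by a reference length on the face (for example, the width between two outer face keypoints), so the threshold becomes a ratio that stays roughly stable as the user moves toward or away from the camera. The indices and ratio value below are placeholders:

```javascript
// Sketch of the ratio idea: normalize each gap by a reference length so
// thresholds no longer depend on how far the user is from the camera.
// Indices and the ratio value are hypothetical placeholders.
const LEFT_CHEEK = 234;   // hypothetical point on the left edge of the face
const RIGHT_CHEEK = 454;  // hypothetical point on the right edge of the face
const TALK_RATIO = 0.08;  // placeholder: lip gap > 8% of face width

function faceWidth(face) {
  const l = face.keypoints[LEFT_CHEEK];
  const r = face.keypoints[RIGHT_CHEEK];
  return dist(l.x, l.y, r.x, r.y);
}

function isTalkingNormalized(face) {
  // Same lip gap as before, but expressed relative to face width, so the
  // camera distance cancels out of the comparison.
  return mouthOpening(face) / faceWidth(face) > TALK_RATIO;
}
```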