Key Takeaways
- Writing an AR app on Android requires you to know how to work with a camera, how to use native code and algorithms for detecting faces and feature points, how to calculate pose and facial expression, and how to display results leveraging the GPU.
- Thanks to the VrFace library, you can simplify all the steps required to build an AR app.
- VrFace is based on OpenCV, a very popular computer vision library that implements methods to detect faces, and on dlib, a popular machine learning library that provides methods to detect facial feature points.
- You can extend VrFace’s capabilities by creating additional effects implemented as OpenGL shaders.
In this article, I’m going to explain how you can create augmented reality applications for Android using the open source VrFace library.
To follow along, you should have a basic understanding of Android app development, i.e. you should know how to build a Hello World app.
Once you can do that, within ten minutes you will be able to build an application that applies a mask to your face in real time. Moreover, this app will also track your facial expression.
Additionally, we will look at creating new effects for this library using shaders, with a very brief introduction to effects in case you are new to the topic.
At the end, we will take a look at how the library works internally.
A little history
Before we go deep into details, let me start explaining why this library was built. It was maybe 5 years ago when I first saw an app on the iPhone that could apply facial effects in real time.
I was really impressed by the quality of the app. But I couldn’t find such an app for Android with a similar quality level.
And for that reason I decided to write such an app.
It was a very tough process, since I’m a backend developer and don’t have professional experience with Android. I spent evenings and weekends solving technical problems one by one: how to use the camera, how to use native code, how to use native libraries, how to draw effects fast, and how to build everything together. Eventually all the problems were solved. The app was written, and together with some friends I tried to create a startup out of it. In the end we couldn’t find a way to make it a success, and I almost forgot about it. Finally, I decided to extract a library from the app and share it as open source, so anyone could use it in their own apps.
Using the VrFace library to build an app
First of all, you should add the VrFace library as a dependency to the project. The library format is AAR. It has a structure similar to an APK, i.e. it contains resources, DEX packages, and a native library built for the ARM architecture, which should work on most devices.
Overall, we need to add the dependency, initialize the library, and provide the configuration, the model, and the screen layout. There are four files we need to touch. Everything is already done in the example repository; you can fork and clone this repository to make it easier to follow the rest of the article.
Adding the library as a dependency
First of all, we need to add the VrFace Maven repository, hosted on GitHub Packages, to the repositories block of build.gradle:
maven {
url "https://maven.pkg.github.com/oleg-sta/VrFace"
credentials {
username GITHUB_USER
password GITHUB_TOKEN
}
}
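The repository block only tells Gradle where to look; the library itself still needs to be declared in the dependencies block. The coordinates below are placeholders, take the real group, artifact name, and version from the VrFace GitHub Packages page:
dependencies {
    // placeholder coordinates – replace with the published group:artifact:version
    implementation '<group>:<artifact>:<version>'
}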
Provide your credentials in gradle.properties:
GITHUB_USER=
GITHUB_TOKEN=
To get a GitHub token, go to your account settings and generate a token with the read:packages permission. To read more about credentials, please go here.
After adding this configuration, the library will be downloaded from the maven repository when you build the project for the first time.
Initialization and configuration
This library requires an external model to detect and track 68 facial feature points. Since this model is around 60 MB in size, it is not included in the library itself. Instead, you need to download the model and decompress it by running:
bzip2 -d shape_predictor_68_face_landmarks.dat.bz2
As a result, you should have the shape_predictor_68_face_landmarks.dat file. Rename it to sp68.dat and move it to the directory app/src/main/assets/.
Let’s now look in detail at how the library is initialized. This is done in your MainActivity class, which initializes all layouts, loads the library, and provides the required configuration.
For library initialization, we are going to use the OpenCV callback, which is invoked after OpenCV loads its own native library. This is the best place to load the native library for VrFace. For historical reasons, this library is called detection_based_tracker, as in the OpenCV example.
private BaseLoaderCallback mLoaderCallback = new BaseLoaderCallback(this) {
@Override
public void onManagerConnected(int status) {
switch (status) {
case LoaderCallbackInterface.SUCCESS: {
System.loadLibrary("detection_based_tracker");
Static.libsLoaded = true;
compModel.loadHaarModel(Static.resourceDetector[0]);
compModel.load3lbpModels(R.raw.lbp_frontal_face, R.raw.lbp_left_face, R.raw.lbp_right_face);
}
break;
default: {
super.onManagerConnected(status);
}
break;
}
}
};
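The callback above only runs once OpenCV initialization is actually started. In the stock OpenCV Android samples this is done from onResume(), roughly as in the sketch below; the exact wiring used with VrFace is in the example repository:
@Override
protected void onResume() {
    super.onResume();
    if (OpenCVLoader.initDebug()) {
        // OpenCV native libraries are bundled with the app: report success right away
        mLoaderCallback.onManagerConnected(LoaderCallbackInterface.SUCCESS);
    } else {
        // otherwise fall back to the OpenCV Manager service installed on the device
        OpenCVLoader.initAsync(OpenCVLoader.OPENCV_VERSION_3_0_0, this, mLoaderCallback);
    }
}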
Layout for screen
In order to use the camera, we need to add FastCameraView to layout.xml:
<com.stoleg.vrface.camera.FastCameraView
android:id="@+id/fd_fase_surface_view"
android:layout_width="match_parent"
android:layout_height="match_parent" />
This will make sure we get the frames from the camera in the format required by the library.
Additionally, we also specify a view element for the result of applying our effect to the camera preview:
<android.opengl.GLSurfaceView
android:id="@+id/fd_glsurface"
android:layout_width="match_parent"
android:layout_height="match_parent" />
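In MainActivity.onCreate(), the two views are then looked up and the GL surface is configured. The lines below are a minimal sketch, assuming the effects target OpenGL ES 2.0; the renderer that applies the chosen ShaderEffect comes from the example repository:
FastCameraView cameraView = (FastCameraView) findViewById(R.id.fd_fase_surface_view);
GLSurfaceView glView = (GLSurfaceView) findViewById(R.id.fd_glsurface);
glView.setEGLContextClientVersion(2);  // shaders require an OpenGL ES 2.0 context
// glView.setRenderer(...) then attaches the renderer that draws the effect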
Writing your own effects
To add effects, we need to extend the ShaderEffect class, as shown in the ShaderEffectMask class. This class implements a single effect, which applies a mask to a 3D face. To understand how to add more effects, you need to know what shaders are. In short, shaders are small programs that run on the GPU and take their input from textures, such as the image from the camera or any other picture that you want to apply to a 3D figure.
After all those steps, you can build and launch the app on your phone. On its first launch, the application takes about 30 seconds to initialize the library and the model; then it should apply the effect. A prebuilt APK binary can be found here.
How the library works
The VrFace library is made of a Java component that works with the camera, shaders that are used to apply effects, and a native library. There are four important parts in the native code: the OpenCV library, which provides a method to detect a face; camera positioning; the C++ dlib library, which provides a method to find the 68 facial feature points; and additional methods for finding the facial expression using a 3D model of the face.
Writing such a library is a very hard task and takes a lot of time. For this reason, we will cover only the basic steps:
- How to work with the Android camera
- How to use native C/C++ code on Android
- Using the OpenCV library to detect faces
- Using the dlib library to detect facial feature points
- How to detect pose and facial expression
- Using OpenGL to draw visual effects
Working with the camera
We are going to use the android.hardware.Camera API. The first thing we need to do is to get the number of cameras available on our device with Camera.getNumberOfCameras(). Then we need to get a handle to the required camera:
mCamera = Camera.open(cameraIndex);
Additionally, we configure preview parameters:
Camera.Parameters params = mCamera.getParameters();
params.setPreviewFormat(ImageFormat.NV21);
List<Camera.Size> psize = params.getSupportedPreviewSizes();
Among the list of available preview sizes, you should choose the most appropriate based on the size of the screen.
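For example, a simple way to pick the preview size closest to the screen could look like the sketch below (screenWidth is assumed to hold the device screen width in pixels):
Camera.Size best = psize.get(0);
for (Camera.Size s : psize) {
    // keep the size whose width is closest to the screen width
    if (Math.abs(s.width - screenWidth) < Math.abs(best.width - screenWidth)) {
        best = s;
    }
}
params.setPreviewSize(best.width, best.height);
mCamera.setParameters(params);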
The other slightly tricky part is starting the preview. You need to create a buffer and set up a preview callback:
int size = cameraWidth * cameraHeight;
size = size * ImageFormat.getBitsPerPixel(params.getPreviewFormat()) / 8;
mBuffer = new byte[size];
mCamera.addCallbackBuffer(mBuffer);
mCamera.setPreviewCallbackWithBuffer(preview);
For the preview, you need to implement only one method: void onPreviewFrame(byte[] data, Camera camera). Here, data is a preview frame from the camera.
One thing to mention is that our preview frame is in the NV21 format, which means it is split into two parts: a grey (luminance) image followed by the colour information. This is very convenient, since the next two steps, detecting the face and its feature points, are performed on the grey image. So we just need to take the first part of the buffer to get the grey image.
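A minimal sketch of the callback, assuming a hypothetical processGreyFrame() hook that hands the grey plane to the detection code:
Camera.PreviewCallback preview = new Camera.PreviewCallback() {
    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {
        // NV21: the first width * height bytes are the 8-bit grey (luminance) plane
        byte[] grey = new byte[cameraWidth * cameraHeight];
        System.arraycopy(data, 0, grey, 0, grey.length);
        processGreyFrame(grey);          // placeholder hook: face and feature point detection
        camera.addCallbackBuffer(data);  // return the buffer so the camera can reuse it
    }
};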
Using native C/C++ code on Android
Android gives you the ability to use native code written in C/C++. For this, you need to declare a native method in a Java class, e.g.:
private static native void nativeDetect(long thiz, long inputImage, long faces);
This is our glue between Java and native code. On the C/C++ side, the corresponding function is declared using the JNIEXPORT macro:
JNIEXPORT void JNICALL Java_com_stoleg_vrface_DetectionBasedTracker_nativeDetect
(JNIEnv *, jclass, jlong, jlong, jlong);
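On the Java side, the jlong parameters are typically the native addresses of OpenCV Mat objects, obtained with getNativeObjAddr(). A sketch of the call, where mNativeObj is a placeholder for the native tracker handle and grey is the grey plane from the previous section:
Mat greyFrame = new Mat(cameraHeight, cameraWidth, CvType.CV_8UC1);
greyFrame.put(0, 0, grey);                 // copy the grey plane into an OpenCV Mat
MatOfRect faces = new MatOfRect();
nativeDetect(mNativeObj, greyFrame.getNativeObjAddr(), faces.getNativeObjAddr());
// on the C/C++ side, the image and faces jlongs are cast back to cv::Mat* to access the data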
The last step is to configure the build script for the native code using the Android NDK build tools (e.g. ndk-build or CMake).
Detecting a face
There are different techniques to detect faces in an image, each with its own pros and cons. In our case, the method should be lightweight and fast, because we need to run it in real time, and still precise enough. There is no guarantee that any library can do this reliably in all cases. For this reason, we are going to use a well-known method based on Haar features.
This algorithm is already available in OpenCV, one of the most popular libraries for computer vision, written in C/C++. First, we need to train the model that will be used for face detection. To do this, we need thousands of photos with and without faces, i.e. positive and negative cases. Then we need to train the algorithm. The whole process is described here.
Luckily, the library gives you pre-trained models. If you are satisfied with the result provided by them, you are good to go.
In general, you need to load the model and apply it to an image:
CascadeClassifier face_cascade;
face_cascade.load( path_to_xml_model );
face_cascade.detectMultiScale( frame_gray, faces );
This returns all the faces detected in the image.
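The same detection can also be done from Java through OpenCV’s bindings (org.opencv.objdetect.CascadeClassifier). A sketch, where cascadePath is assumed to point to the Haar cascade XML copied to app storage and greyFrame is the grey Mat built from the preview:
CascadeClassifier faceCascade = new CascadeClassifier(cascadePath);
MatOfRect faces = new MatOfRect();
faceCascade.detectMultiScale(greyFrame, faces);
for (Rect face : faces.toArray()) {
    // face.x, face.y, face.width, face.height bound one detected face
}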
Detecting facial feature points
Within the detected face region, we need to find feature points in order to determine the face orientation and the facial expression. For this, we are going to use the dlib library, a popular machine learning library written in C++. You can find an example of how to detect facial feature points here. Here too, you can use a pre-trained model.
This library gives you 68 facial feature points. Moreover, it is very fast and can process a single frame in about 100 milliseconds.
Orientation and facial expression
Once we have the 68 feature points, we need to calculate the face orientation and the facial expression, e.g. an open jaw. For this, you need a 3D model of the face and of different expressions: smile, sadness, etc. An example of such a 3D model is CANDIDE. Between the points of the 3D model and the detected feature points we need a correspondence, e.g. we say that the point at the corner of the eye in the 2D image matches a specific point of the 3D model. Having such a correspondence, we can calculate the camera position using a method in the OpenCV library.
If we want to find facial expression, we should solve the following equation using SVD decomposition:
dst = argmin_X ‖ src1 · X − src2 ‖
Here:
- src1 — the 3D model points
- X — the transformations of the model, including the projection onto the image plane
- src2 — the feature points found in the image
In other words, we are trying to find the X that minimizes the error; its components are our face coefficients.
In the end, we should have the position of the camera and the face coefficients, which will be used later for drawing.
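As an illustration of both steps with OpenCV’s Java bindings (org.opencv.calib3d.Calib3d, org.opencv.core.Core): Calib3d.solvePnP is one standard way to recover the camera pose from the 2D–3D correspondences, and Core.solve with DECOMP_SVD solves the least-squares problem above. This is a sketch, not VrFace’s exact code; the input matrices are assumed to be filled from the CANDIDE model and the detected points:
MatOfPoint3f modelPoints = new MatOfPoint3f();   // 3D points of the face model (to be filled)
MatOfPoint2f imagePoints = new MatOfPoint2f();   // matching 2D feature points (to be filled)
Mat cameraMatrix = Mat.eye(3, 3, CvType.CV_64F); // set focal length and principal point here
MatOfDouble distCoeffs = new MatOfDouble();      // assume no lens distortion
Mat rvec = new Mat(), tvec = new Mat();
Calib3d.solvePnP(modelPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);
// rvec and tvec describe the head pose relative to the camera

Mat src1 = new Mat(), src2 = new Mat(), dst = new Mat();  // built from the model and the points
Core.solve(src1, src2, dst, Core.DECOMP_SVD);             // dst holds the expression coefficients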
Drawing effects
If we know the face orientation and the facial expression, we can draw the result using OpenGL (Open Graphics Library), a cross-platform API for programming GPUs (Graphics Processing Units) for 2D and 3D graphics. A GPU can run many simple computations in parallel.
The key concept in OpenGL is the shader. Shaders are programs that are executed on the GPU. There are two main types: vertex and fragment shaders. Vertex shaders do computations on points, i.e. for a given point in 3D space they calculate its projection on the screen. Fragment shaders provide the colour of each pixel, using the projected points as inputs.
Fragment shaders can use textures. A texture is a 2D image, for example a face mask or the input from the camera. By mixing these two textures, we create the effect.
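As an illustration, here is a minimal fragment shader, embedded as a Java string, that blends the camera texture with a mask texture using the mask’s alpha channel, together with the GLES20 calls that compile it. The uniform and varying names are illustrative, not the ones used by VrFace:
String fragmentShader =
        "precision mediump float;\n"
      + "varying vec2 vTexCoord;\n"
      + "uniform sampler2D uCamera;\n"   // texture holding the camera frame
      + "uniform sampler2D uMask;\n"     // texture holding the mask image
      + "void main() {\n"
      + "  vec4 cam  = texture2D(uCamera, vTexCoord);\n"
      + "  vec4 mask = texture2D(uMask, vTexCoord);\n"
      + "  gl_FragColor = mix(cam, mask, mask.a);\n"  // blend by the mask's alpha
      + "}\n";

// compile it on the GL thread, e.g. inside the renderer
int shader = GLES20.glCreateShader(GLES20.GL_FRAGMENT_SHADER);
GLES20.glShaderSource(shader, fragmentShader);
GLES20.glCompileShader(shader);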
Conclusion
In this article we showed how to build an Android app that creates facial effects using the VrFace library. To accomplish this, we added the library as a dependency to our project, performed the required initialization and configuration, and added some code.
We also learned a bit about how the library works by describing its main responsibilities:
- Working with the Android camera
- Using OpenCV to detect faces and to compute the camera position
- Using dlib to find facial feature points
- Using OpenGL and shaders to draw the effects
Building such a library requires a lot of effort; making it open source helps you build an AR app in a few simple steps, as described in this article.
About the Author
Oleg Stadnichenko is a backend software developer. With more than 10 years of experience, most of them in big banks and fintech, he is an AR/VR development enthusiast.