How Apple Does Realtime Recognition of Handwritten Chinese Characters

by Dylan Raithel on Dec 20, 2017. Estimated reading time: 2 minutes

Apple has detailed its real-time machine learning engine for recognizing handwritten Chinese characters, which supports an inventory of up to 30,000 characters. Accuracy reportedly degrades only gradually as the character inventory grows. This allowed researchers to recognize characters from large sets like GB18030-2005 with only slightly worse accuracy than with smaller sets like GB2312-80.

The Chinese National Standard GB18030-2005 contains 27,533 entries. An inventory that large has long made keyboard input challenging, so a recognizer that converts handwriting to encoded text is especially valuable for Chinese-speaking populations. Several Chinese character sets have been standardized over the years to account for variation in frequently used characters across time and geography. The large inventory of possible characters, combined with the variation in each person's handwriting style, makes for a challenging machine learning problem.
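The gap between a small and a large character set can be seen from the encodings themselves. The following is a minimal sketch, assuming Python's built-in `gb2312` and `gb18030` codecs as stand-ins for the two standards; the helper name `encodable` is illustrative. Note that the codec only tests whether a character can be encoded, which is not the same as membership in the 27,533-entry inventory cited above.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if the character can be represented in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# U+738B is a common character; U+3400 comes from CJK Extension A,
# which the smaller character set does not cover.
samples = ["\u738b", "\u3400"]
for ch in samples:
    print(f"U+{ord(ch):04X}",
          "gb2312:", encodable(ch, "gb2312"),
          "gb18030:", encodable(ch, "gb18030"))

# Count how much of the basic CJK block (U+4E00..U+9FA5) the smaller set covers.
basic_block = [chr(cp) for cp in range(0x4E00, 0x9FA6)]
covered = sum(encodable(ch, "gb2312") for ch in basic_block)
print(covered, "of", len(basic_block), "basic CJK ideographs are encodable in gb2312")
```

A recognizer targeting the larger standard has to distinguish every character that the second column accepts, not just the subset accepted by the first.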

Convolutional neural networks (CNNs) are typically used for machine learning problems focused on image recognition and labeling. The earlier research outlined in Apple's article evolved through several modeling approaches, with stroke order playing a significant part in narrowing the pool of candidate characters into smaller groups, in the hope of improving the odds of finding a match.

While early recognition algorithms mainly relied on structural methods based on individual stroke analysis, the need to achieve stroke-order independence later sparked interest into statistical methods using holistic shape information. This obviously complicates large-inventory recognition, as correct character classification tends to get harder with the number of categories to disambiguate.

A larger pool of characters exposed underlying problems with the stroke-order-based approach. Ambiguous handwriting styles, together with complexity and computational overhead that grow with the number of strokes per character, led Apple researchers to a more "shape-driven" approach that is agnostic of stroke order.

The approach Apple employed is similar to what works well on tasks such as MNIST handwritten digit recognition, where CNNs became the industry standard. But scaling a real-time CNN to 30,000 or more characters made this challenge different. Collisions and ambiguities within and between character inventories added further complexity.
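One way to get a feel for the scaling issue is to count the parameters in a classifier's final fully connected layer, which grows linearly with the number of classes. The sketch below is back-of-the-envelope arithmetic only: the hidden width of 1,000 is an assumed value chosen for illustration, not a figure from this article, and `output_layer_params` is a hypothetical helper.

```python
def output_layer_params(hidden_width: int, num_classes: int) -> int:
    """Weights plus biases of a dense layer mapping hidden_width -> num_classes."""
    return hidden_width * num_classes + num_classes

HIDDEN_WIDTH = 1000  # assumption for illustration, not Apple's actual value

inventories = {
    "MNIST digits": 10,
    "GB2312-80 hanzi": 6763,
    "GB18030-2005 entries": 27533,
    "30,000-character inventory": 30000,
}

for name, num_classes in inventories.items():
    params = output_layer_params(HIDDEN_WIDTH, num_classes)
    print(f"{name:28s} {num_classes:6d} classes -> {params:,} output-layer parameters")
```

Under this assumption the output layer alone grows from about ten thousand parameters for digits to about thirty million for the full inventory, which matters for a model that has to run in real time on a device.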

As a speedy input tends to drive toward cursive styles, it tends to increase ambiguity, e.g. between U+738B (王) and U+4E94 (五). Finally, increased internationalization sometimes introduces unexpected collisions: for example, U+4E8C (二), when cursively written, may conflict with the Latin characters “2” and “Z”.

Each handwritten input is rendered as a 48 x 48 pixel image representing the original character. This image is the input to the feed-forward convolutional network. Keeping the input at this modest resolution limits the overall size of the CNN needed to process it. The finite number of pixels, and of possible values for those pixels, places an upper bound on model complexity while still giving a reliable coarse representation of the input character, one that can be run through a trained network on small devices like the Apple Watch.
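To illustrate how a fixed-size image makes the input independent of stroke order, here is a minimal sketch that rasterizes pen strokes onto a 48 x 48 binary grid. It is not Apple's preprocessing, which this article does not describe in detail; the function name `rasterize`, the stroke format (lists of points normalized to the unit square), and the simple line-sampling scheme are all assumptions made for the example.

```python
SIZE = 48  # image side length cited in the article

def rasterize(strokes, size=SIZE):
    """Render strokes (lists of (x, y) points in [0, 1]) onto a size x size binary grid."""
    grid = [[0] * size for _ in range(size)]
    for stroke in strokes:
        if not stroke:
            continue
        # A one-point stroke (a dot) is drawn as a zero-length segment.
        segments = list(zip(stroke, stroke[1:])) or [(stroke[0], stroke[0])]
        for (x0, y0), (x1, y1) in segments:
            # Sample densely enough that consecutive samples land on adjacent pixels.
            steps = max(1, int(2 * size * max(abs(x1 - x0), abs(y1 - y0))))
            for i in range(steps + 1):
                t = i / steps
                col = min(size - 1, int((x0 + t * (x1 - x0)) * size))
                row = min(size - 1, int((y0 + t * (y1 - y0)) * size))
                grid[row][col] = 1
    return grid

# Two horizontal strokes, roughly the shape of U+4E8C.
two = [
    [(0.3, 0.3), (0.7, 0.3)],
    [(0.2, 0.7), (0.8, 0.7)],
]

image = rasterize(two)
reordered = rasterize(two[::-1])  # same strokes, written in the opposite order

print(len(image), "x", len(image[0]))
print("identical regardless of stroke order:", image == reordered)
```

Because the grid only records which pixels were touched, writing the strokes in a different order produces the same image, which is the property a shape-driven recognizer relies on.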

The training data set consisted of tens of millions of handwritten characters collected from a wide range of geographies and demographics throughout Chinese-speaking communities. Researchers noted that the resulting accuracy should be good enough for commercial use.
