Dartmouth College Computer Science
Technical Report series

SCML: A Structural Representation for Chinese Characters
Daniel G. Peebles
Dartmouth TR2007-592


Chinese characters are used daily by well over a billion people. They constitute the main writing system of China and Taiwan, form a major part of written Japanese, and are also used in South Korea. Anything more than a cursory glance at these characters reveals a high degree of structure, but computing systems currently have no means of operating on this structure. Existing character databases and dictionaries treat characters as numerical code points and associate with them additional 'hand-computed' data, such as stroke count, stroke order, and other information to aid in specific searches. Searching by a character's 'shape' is effectively impossible in these systems.

I propose a new approach to representing these characters, through an XML-based language called SCML. By encoding an abstract form of a character, this language allows the direct retrieval of important information such as stroke count and stroke order, and permits useful but previously impossible automated analysis of characters. In addition, the system allows the design of a view that takes abstract SCML representations as character models and outputs glyphs based on an aesthetic, facilitating the creation of 'meta-fonts' for Chinese characters. Finally, through the creation of a specialized database, SCML supports efficient structural queries against the body of inserted characters, allowing people to search by the most obvious of a character's characteristics: its shape.
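To make the idea of deriving stroke count and stroke order from a structural encoding concrete, here is a minimal sketch. It does not use the real SCML schema, which this page does not show: the element names (character, compose, component, stroke), the attributes, and the stroke labels are all hypothetical, chosen only to illustrate how such properties could be read from a structural description rather than stored by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT the actual SCML schema. Element names,
# attributes, and stroke labels are invented for illustration.
HAO_XML = """
<character name="好">
  <compose layout="left-right">
    <component name="女">
      <stroke type="piedian"/>
      <stroke type="pie"/>
      <stroke type="heng"/>
    </component>
    <component name="子">
      <stroke type="hengpie"/>
      <stroke type="wangou"/>
      <stroke type="heng"/>
    </component>
  </compose>
</character>
"""


def stroke_order(root):
    """Return stroke labels in document order (a depth-first walk)."""
    return [s.get("type") for s in root.iter("stroke")]


def stroke_count(root):
    """Derive the stroke count from the structure itself."""
    return len(stroke_order(root))


def has_component(root, name):
    """A toy structural query: does the character contain this component?"""
    return any(c.get("name") == name for c in root.iter("component"))


root = ET.fromstring(HAO_XML.strip())
print(stroke_count(root))
print(stroke_order(root))
print(has_component(root, "子"))
```

In this toy encoding, stroke count and order fall out of a tree walk, and a query such as "characters containing 子" becomes a search over components. How SCML itself encodes layout and strokes, and how its database indexes them, is described in the report rather than here.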

Note: Senior Honors Thesis. Advisor: Devin Balkcom.

PDF (1480KB)

Bibliographic citation for this report:
   Daniel G. Peebles, "SCML: A Structural Representation for Chinese Characters." Dartmouth Computer Science Technical Report TR2007-592, May 2007.


Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Technical reports collection maintained by David Kotz.