JJDic --Yet Another Java Jdic
Hank Cohen
email: h.cohen@computer.org
February 9, 1998
Table of Contents
1. Introduction
2. Acknowledgements
3. Copyrights
4. Installation
    4.1 Distribution format
    4.2 System Requirements
      4.2.1 Java
        4.2.1.1 Classpath
        4.2.1.2 Java Font Management
      4.2.2 Fonts & Japanese Language Support
      4.2.3 Locale
      4.2.4 Memory
    4.3 Initialization file
    4.4 Dictionaries
    4.5 Indices
5. Use
    5.1 Startup
    5.2 GUI Controls
      5.2.1 Menu Bar
        5.2.1.1 File Menu
        5.2.1.2 Dictionaries Menu
        5.2.1.3 Filters Menu
        5.2.1.4 Options Menu
        5.2.1.5 Help Menu
      5.2.2 Main Dictionaries
      5.2.3 Kanji Dictionary
        5.2.3.1 Lookup Methods: Character, Yomikata, English
        5.2.3.2 Lookup Methods: Bushu plus strokes
6.0 Known Problems
    6.1 Radical & stroke search is heuristic not deterministic
7. Things to Do
    7.1 Internationalization
    7.2 Printing
    7.3 Help
    7.4 Filters
    7.5 Client/Server
8. Reporting Problems
    8.1 Beta Test Issues
       

1. Introduction

JJDic is yet another jdic application written in Java. It provides a simple but powerful graphical user interface to access edict and other glossaries that follow the edict format. It differs from previous efforts in several ways. First of all, all text is represented internally as Unicode. This allows it to interface seamlessly with the host computer's input methods and to use some of Java's internal string processing functions. Java 1.1 provides conversion facilities to translate between various local language character encodings and Unicode, so interfacing with the Japanese input method to the extent required by this application took no additional programming effort.
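As a minimal illustration of the conversion facilities mentioned above (not code taken from JJDic), the sketch below turns the same two kanji into Shift JIS and EUC bytes and back into a Unicode String. It is written for a current JDK, so it uses java.nio.charset.Charset and the modern charset names "Shift_JIS" and "EUC-JP"; the Java 1.1 names were "SJIS" and "EUCJIS". The class name EncodingDemo is hypothetical.

```java
import java.nio.charset.Charset;

class EncodingDemo {
    /** Converts Unicode text to bytes in the named external encoding. */
    static byte[] encode(String text, String encoding) {
        return text.getBytes(Charset.forName(encoding));
    }

    /** Converts bytes in the named external encoding back to Unicode text. */
    static String decode(byte[] bytes, String encoding) {
        return new String(bytes, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        String kanji = "\u6f22\u5b57"; // the word "kanji" written in kanji
        for (String enc : new String[] {"Shift_JIS", "EUC-JP"}) {
            byte[] external = encode(kanji, enc);
            StringBuilder hex = new StringBuilder();
            for (byte b : external) hex.append(String.format("%02X ", b & 0xFF));
            System.out.println(enc + ": " + hex.toString().trim()
                    + ", round trip ok: " + kanji.equals(decode(external, enc)));
        }
    }
}
```

The external byte sequences differ, but both decode to the same internal String, which is why the rest of the program never needs to know which encoding a glossary used.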

The GUI design is intentionally very simple, with a minimum of controls and windows. Most interactions take place within a single frame. My expectation is that users will start up the application and leave it loaded in the background, using it from time to time as necessary. The user can cut and paste from other applications into the input area, and output can be copied out through the host system clipboard.

Kanjidic can be accessed either by radical+stroke count, Japanese readings or English key meanings. The radical+stroke entry can also be used to do kanji data entry.

I wrote JJDic as a pedagogic exercise to teach myself Java. As such it was a great success. It covers a surprising range of the features available in Java. I hope that it can also be a useful application. You, the users, can help it to be a success as an application by letting me know its shortcomings so that I can make it better.
 

2. Acknowledgements

As a wise FE once remarked over beers, "It all comes down to data entry." This could not be more true than in the area of online dictionaries. I have written an application that I hope may be useful to students of Japanese (myself included), but it could never have even been dreamed of had Jim Breen not compiled the underlying dictionary databases. His help in clarifying some of the more mysterious aspects of EDICT and Kanjidic has also been invaluable.

I am also indebted to the providers of all of the free software that I used to develop this program. Sun Microsystems has created a marvelous development language in Java and has generously made it available to developers for free with very reasonable licensing. Nobody involved with this project will have to pay any royalties for Java or for the runtime system. This application is 100% pure Java, but it does not carry any such logo because I don't want to pay for the verification of my test results.

I am also indebted to the GNU project and all of its contributors for emacs and the Java Development Environment (JDE). This was my first major development effort on Windoze, and I suppose I should thank Bill Gates for not providing a decent program editor with that system. It forced me to learn emacs, something that I had not bothered to do in 20 years of UNIX experience.

3. Copyrights

This is an area in which I am still actively searching my soul. My inclination is to abandon rights through the GNU General Public License. If I do so it will be for two reasons: first, the good example of Jim Breen, and second, because the wide availability of comparable applications makes commercialization impractical. Actually there is a third: people who pay good money for software should get the best, and although edict is good, it lacks a lot of information that a good dictionary really should have, like usage notes and etymology. Serious students of Japanese should invest in a really good dictionary; my favorite electronic dictionary is JISPA Grand from Gakken. The one reason that I have for not abandoning all rights is that one of my planned enhancements might make this into a commercially viable product: the client/server implementation, which would allow it to sit on a server and interface to applets on client browsers.

So for now I reserve all commercial rights. I am distributing all source code but use is restricted to non-commercial applications. A notice to this effect is included in each source file.

In any case the source code is included, and you are encouraged to read it and hack on it if the spirit moves you. It is fairly well commented, with explanations of the algorithms as well as my observations on Java and programming style.
 

4. Installation

    4.1 Distribution format

    JJDic is distributed as a single zip file and this HTML document. The zip file contains all of the Java source code for the application and a jar archive of the runtime objects for the application. If Sun is correct about "write once, run anywhere", then all anybody should need is this document and the jar file. If I had access to a UNIX machine I would also prepare a tar file with the same contents, but I don't have such access at the present time. Anybody care to help?

    4.2 System Requirements

      4.2.1 Java

       JJDic is an application, not an applet, so it cannot be run by the Java interpreter in your favorite browser. (It is also Java 1.1, not 1.0, so most browsers wouldn't run it anyway.) You will therefore need a Java interpreter and the associated class libraries. If you are interested in Java generally and want to do programming, you can get the entire Java Development Kit from http://java.sun.com/products/jdk/1.1/ or you can make do with the Java Runtime Environment from http://java.sun.com/products/jdk/1.1/jre/ . There is no charge for either product.

        4.2.1.1 Classpath

        Java uses the environment variable CLASSPATH to locate its runtime objects (classes, in the Java argot). When the Java Runtime Environment is installed on Windoze it records its installed location in the Registry, so it can always find its own runtime classes. The runtime classes for JJDic are stored in a "jar" (Java Archive) file, "jdic.jar". The CLASSPATH must include this jar file explicitly by name. Alternatively, if you have the jar utility, you could unpack all of the component class files into a directory that is in the CLASSPATH. If you plan to hack on the source code you will probably want to put your classes directory into the CLASSPATH.
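If the program will not start, a quick check is whether jdic.jar really made it onto the class path the interpreter sees. The sketch below is a hypothetical helper (ClasspathCheck is not part of JJDic) written for a current JDK; it reads the standard java.class.path system property, which the java launcher fills from CLASSPATH when no -classpath option is given, and compares entries by file name only.

```java
import java.io.File;

class ClasspathCheck {
    /** True if some class path entry is a file with exactly the given name. */
    static boolean containsJar(String classpath, String jarName) {
        if (classpath == null) return false;
        for (String entry : classpath.split(File.pathSeparator)) {
            if (new File(entry).getName().equals(jarName)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        System.out.println("java.class.path = " + cp);
        System.out.println(containsJar(cp, "jdic.jar")
                ? "jdic.jar is on the class path"
                : "jdic.jar is NOT on the class path; add its full path name to CLASSPATH");
    }
}
```

File.pathSeparator is ";" on Windoze and ":" on Unix, so the same check works on both without change.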

         4.2.1.2 Java Font Management

        There are many aspects of Java that make compromises in order to present a consistent environment on a diverse collection of different systems. As one might expect, portability means giving up a lot of access to the uniqueness of the different systems. One area where this is evident is access to fonts. Java only understands its own internal font names: Serif, SansSerif, Monospaced, Dialog, and DialogInput. It may also recognize TimesRoman, Courier, and ZapfDingbats, but these are aliases for the Java abstract names. There is a file in the Java/lib directory called font.properties.<locale>, which controls the mapping of host system fonts to the Java aliases. Within the program I can only see the pseudo-names, not the original font families, so you will not be able to tell from the font selection menu what font you will really get. There is a useful technical note on setting up this font.properties file on the Java website, http://www.javasoft.com/products/jdk/1.1/docs/guide/intl/fontprop.html. There is also another paper on Unicode and font management available at http://www.javasoft.com/products/jdk/1.1/docs/guide/intl/unicode_font.doc.html . In order to display Japanese characters there must be at least one Japanese font included in the standard font mappings.
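One way to find out which logical font names actually map to a Japanese font on a given machine is to ask each one whether it can render a kanji sample. This is a hypothetical diagnostic, not part of JJDic; it relies on Font.canDisplayUpTo, which appeared in Java 1.2, so it needs a newer runtime than the Java 1.1 the program targets.

```java
import java.awt.Font;

class LogicalFontCheck {
    // The five logical font names that Java maps through font.properties.
    static final String[] LOGICAL_NAMES = {"Serif", "SansSerif", "Monospaced", "Dialog", "DialogInput"};

    /** True if the font has glyphs for every character of the sample. */
    static boolean canShow(Font font, String sample) {
        return font.canDisplayUpTo(sample) == -1; // -1 means nothing was missing
    }

    public static void main(String[] args) {
        String sample = "\u6f22\u5b57\u304b\u306a"; // two kanji plus two hiragana
        for (String name : LOGICAL_NAMES) {
            Font font = new Font(name, Font.PLAIN, 14);
            System.out.println(name + ": "
                    + (canShow(font, sample) ? "shows Japanese" : "missing Japanese glyphs"));
        }
    }
}
```

A name that reports missing glyphs is one whose font.properties mapping lacks a Japanese font, so choosing it in the Options menu will give boxes instead of kanji.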

       4.2.2 Fonts & Japanese Language Support

      This application requires some level of Japanese language support from the underlying host system. At a minimum the system must have Japanese fonts installed. If the host has a Japanese input method then the user should be able to use that to enter Japanese strings directly into the application.

      I developed the application on the Japanese version of Windoze 95, and I know that all of the necessary components are there; I can only guess for other systems. I thought that Microsoft offered Japanese support for non-Japanese operating systems, but an hour of searching on their web site failed to turn it up. If you have access to the CD-ROM version of Office you can install Japanese language support from the file c\valuepack\Fareast\Jpnsupp.exe. This contains the Japanese fonts and, I believe, the IME front end processor.

      For Apple users the Japanese Language Kit provides a very nicely integrated way to run both Japanese and English applications on the same system. I am not sure exactly how you should register the application, since Java is really an interpreter. The class jdic.JJDic.class should certainly be a Japanese application, but the JLK's registration function surely doesn't know anything about Java classes, and you probably don't want to make the whole Java runtime system a Japanese application, although you might. I would be very interested to hear by email from Apple users, and particularly JDK users. Unfortunately the JLK will set you back about a hundred bucks, but if you are serious about studying Japanese you will probably have to spend it eventually.
      Unix users are probably running X Windows. Japanese fonts are available in the public domain for kterm and wnn, and these should probably work fine. If you can use kterm then JJDic should also work fine.
       

      4.2.3 Locale

      Proper display of Japanese fonts does not really depend on locale, but proper interpretation of kanji input does. The Java runtime system uses the locale to determine the mapping from the local encoding to Unicode. Unix users will usually use EUC, whereas Mac and Windoze users will use Shift JIS. The Java runtime will transparently translate between the internal and external encodings, but the locale must be properly set. How this is done depends on the particular system and I can't give much general guidance. If you are having problems I will try to help by email.
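To see what the runtime has concluded from your locale, a small diagnostic like the one below (hypothetical, not part of JJDic; it uses java.nio.charset from Java 1.4 and later) prints the default locale and default charset, and shows what happens when Shift JIS bytes are decoded with the wrong encoding. Note that Java 18 and later default to UTF-8 for file contents regardless of locale (JEP 400), so on those releases an explicit encoding matters even more.

```java
import java.nio.charset.Charset;
import java.util.Locale;

class LocaleReport {
    /** Summarizes the settings that decide how host bytes become Unicode. */
    static String describe() {
        return "locale=" + Locale.getDefault()
                + " file.encoding=" + System.getProperty("file.encoding")
                + " defaultCharset=" + Charset.defaultCharset().name();
    }

    /** Decodes with an explicitly named encoding instead of the locale default. */
    static String decode(byte[] bytes, String encoding) {
        return new String(bytes, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        System.out.println(describe());
        byte[] sjis = "\u6f22".getBytes(Charset.forName("Shift_JIS"));
        System.out.println("read as Shift_JIS: " + decode(sjis, "Shift_JIS"));
        System.out.println("read as EUC-JP:    " + decode(sjis, "EUC-JP")); // wrong guess garbles it
    }
}
```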

      4.2.4 Memory

      I have 16 MB in my system and it can thrash pretty severely with JJDic, emacs and a few other things loaded. JJDic with kanjidic, edict and a bunch of glossaries loaded occupies about 12 MB. However, loading the dictionaries uses some huge temporary buffers, and these push runtime memory requirements up to 30 MB. Unfortunately, once Java gets memory from the OS it never returns it, so the runtime process will grow to 30 MB and stay there. Obviously, the more physical memory you have the better.
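If you want to watch these numbers yourself, the Runtime class reports them. The sketch below is a hypothetical helper, not JJDic code; Runtime.maxMemory (the ceiling set by the -mx option) only exists from Java 1.4 on, while totalMemory and freeMemory were already in 1.1. totalMemory is the heap the JVM has obtained from the OS, which is the figure that, on the runtimes described above, grows during loading and then stays put.

```java
class MemoryReport {
    static final long MB = 1024L * 1024L;

    /** Heap actually occupied by objects, in whole megabytes. */
    static long usedMegabytes(Runtime rt) {
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("heap in use:       " + usedMegabytes(rt) + " MB");
        System.out.println("heap from the OS:  " + rt.totalMemory() / MB + " MB");
        System.out.println("heap limit (-mx):  " + rt.maxMemory() / MB + " MB");
    }
}
```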

    4.3 Initialization file

    JJDic uses an initialization file $HOME/.jjdic . The problem with this is that the definition of $HOME is OS dependent. There is a natural meaning on Unix and on Windoze-NT but the meaning on Windoze-95 or Mac-OS is vague at best. If there is no home directory defined the Java runtime system will assume that it is the same as the directory where the Java interpreter is installed. If the file is not found then an error message will be printed showing where we looked. The program will then continue and put up a file dialog so the user can tell the program where to find kanjidic. After kanjidic is loaded other dictionaries can be loaded from the File menu.

    The initialization file .jjdic is a Java properties file. It can contain definitions of two properties:

    1. Dictionaries: a list of full path names for the dictionaries and glossaries to be loaded automatically on startup. There is no default; if no Dictionaries property is present, the application assumes that dictionaries will be loaded from the menu interface. The property consists of the word "Dictionaries" followed by a space and a list of full path names separated by the system path separator character, which is ";" on Windoze and ":" on Unix (I don't know what it is on MacOS). Be careful with the backslash "\" directory separator used by Windoze: Java reads it as an escape character, so it must be doubled, e.g. c:\java\kohi must be written c:\\java\\kohi. Each entry can have an optional prefix, separated from the path by a comma ",", to specify the character encoding. The prefixes are those understood by the Java runtime system: EUCJIS, SJIS and JIS. If no encoding prefix is given then EUCJIS is assumed.
    2. Kanjidic: the Kanjidic property tells JJDic where to find kanjidic. It consists of the keyword "Kanjidic" followed by a blank and a single full path name. If the Kanjidic property is not present JJDic will prompt for its location, because JJDic cannot initialize without kanjidic.
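The rules above can be sketched in a few lines with java.util.Properties, which already handles the "keyword, blank, value" layout and the backslash escapes. This is a hypothetical reader written for a current JDK (JJDicConfig and DictSpec are made-up names, not the classes in the JJDic source); only the property names, the comma prefix and the EUCJIS default come from the text.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class JJDicConfig {
    static final String DEFAULT_ENCODING = "EUCJIS"; // default named in the text

    /** One Dictionaries entry: encoding prefix (or the default) plus full path name. */
    static final class DictSpec {
        final String encoding;
        final String path;
        DictSpec(String encoding, String path) { this.encoding = encoding; this.path = path; }
        @Override public String toString() { return encoding + "," + path; }
    }

    /** HOME/.jjdic, using the JVM's idea of the home directory. */
    static File defaultLocation() {
        return new File(System.getProperty("user.home"), ".jjdic");
    }

    /** Properties treats backslash as an escape, hence the doubling rule. */
    static Properties load(File file) throws IOException {
        Properties props = new Properties();
        try (Reader in = new FileReader(file)) {
            props.load(in);
        }
        return props;
    }

    /** Same as load, but from text already in memory. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    /** Splits Dictionaries on the platform path separator; "ENC,path" selects an encoding. */
    static List<DictSpec> dictionaries(Properties props) {
        List<DictSpec> result = new ArrayList<>();
        String value = props.getProperty("Dictionaries");
        if (value == null) return result; // dictionaries will come from the menu instead
        for (String item : value.split(File.pathSeparator)) {
            item = item.trim();
            if (item.isEmpty()) continue;
            int comma = item.indexOf(',');
            result.add(comma < 0
                    ? new DictSpec(DEFAULT_ENCODING, item)
                    : new DictSpec(item.substring(0, comma).trim(), item.substring(comma + 1).trim()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File ini = defaultLocation();
        if (!ini.exists()) {
            System.out.println("No initialization file at " + ini); // report where we looked
            return;
        }
        Properties props = load(ini);
        System.out.println("Kanjidic: " + props.getProperty("Kanjidic"));
        for (DictSpec d : dictionaries(props)) System.out.println("Dictionary: " + d);
    }
}
```

Because the separator comes from File.pathSeparator, a Windoze entry such as SJIS,c:\\java\\kohi is not split at the drive-letter colon.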

    4.4 Dictionaries

    JJDic must have kanjidic to properly initialize itself. It can then load any number of dictionaries in edict format. Dictionaries can use JIS and SJIS encoding as well as the standard EUCJIS encoding used by edict.
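For readers unfamiliar with the edict format: each line holds a headword, an optional kana reading in square brackets, and a list of English glosses delimited by slashes. The sketch below parses one such line after it has been decoded to Unicode; EdictEntry is a hypothetical class, not the parser in the JJDic source.

```java
import java.util.ArrayList;
import java.util.List;

class EdictEntry {
    final String headword;      // kanji (or kana) form
    final String reading;       // kana reading from [ ], or null when the headword is kana
    final List<String> glosses; // English meanings between slashes

    EdictEntry(String headword, String reading, List<String> glosses) {
        this.headword = headword;
        this.reading = reading;
        this.glosses = glosses;
    }

    /** Parses one line of the form "KANJI [KANA] /gloss 1/gloss 2/". */
    static EdictEntry parse(String line) {
        int slash = line.indexOf('/');
        if (slash < 0) throw new IllegalArgumentException("no glosses: " + line);
        String head = line.substring(0, slash).trim();
        String reading = null;
        int open = head.indexOf('[');
        int close = head.indexOf(']');
        if (open >= 0 && close > open) {
            reading = head.substring(open + 1, close).trim();
            head = head.substring(0, open).trim();
        }
        List<String> glosses = new ArrayList<>();
        for (String g : line.substring(slash + 1).split("/")) {
            if (!g.trim().isEmpty()) glosses.add(g.trim());
        }
        return new EdictEntry(head, reading, glosses);
    }
}
```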

    4.5 Indices

    Since JJDic uses Unicode internally it must generate its own indices. The indices for edict use the EUC internal representation and cannot be used by JJDic. Unlike other edict applications, JJDic will generate its own indices automatically if they cannot be located in the same directory as the dictionary. The index also contains a checksum of the dictionary that it indexes. If the dictionary is changed, or if the encoding translation changes, the checksum will detect a mismatch between the dictionary and the index and a new index will be generated. JJDic indices use the filename extension .jndx. (My apologies to Windoze 3.x users. I suggest that you upgrade, because I'll be damned if I restrict my names to 8.3.)
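The text does not say which checksum JJDic stores, so the sketch below simply assumes CRC-32 (java.util.zip.CRC32) as a stand-in. To catch both cases described above, a changed file and a changed encoding translation, it checksums the dictionary after decoding it to Unicode rather than the raw bytes. IndexCheck and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class IndexCheck {
    /** CRC-32 of the dictionary text after it has been decoded to Unicode. */
    static long checksum(String unicodeText) {
        byte[] utf8 = unicodeText.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(utf8, 0, utf8.length);
        return crc.getValue();
    }

    /** Reads and decodes a whole dictionary file, then checksums the decoded text. */
    static long checksum(Path dictionary, Charset encoding) throws IOException {
        return checksum(new String(Files.readAllBytes(dictionary), encoding));
    }

    /** A stored index is reusable only if its checksum still matches. */
    static boolean indexIsCurrent(long storedChecksum, String unicodeText) {
        return storedChecksum == checksum(unicodeText);
    }
}
```

When indexIsCurrent returns false, the index would be rebuilt and the new checksum written alongside it.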
       
    N.B. Index generation is rather time consuming so be prepared to wait for a good long while when you first load a new dictionary.

    Technical Note: Sorting the index. It appears that the index of edict is orderly enough to push quicksort performance towards its O(N²) worst case. After trying several variants of quicksort I finally gave up and used Shellsort. Its running time depends on the gap sequence (roughly O(n log² n) to O(n^1.5) for the usual choices), but unlike quicksort it does not degrade on nearly ordered input. The difference was immediately noticeable.
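For reference, here is a generic Shellsort using Knuth's gap sequence 1, 4, 13, 40, ..., whose worst case is O(n^1.5). The text does not say which gap sequence JJDic uses, so this is an illustrative sketch rather than the program's code; for an index, the Comparator would compare the dictionary text at two index offsets.

```java
import java.util.Comparator;

class ShellSort {
    /** Sorts a in place using Shellsort with Knuth's gaps 1, 4, 13, 40, ... */
    static <T> void sort(T[] a, Comparator<? super T> cmp) {
        int n = a.length;
        int gap = 1;
        while (gap < n / 3) gap = 3 * gap + 1; // largest Knuth gap below n/3
        for (; gap >= 1; gap /= 3) {
            // Gapped insertion sort: each chain a[i], a[i-gap], a[i-2*gap], ... ends up sorted.
            for (int i = gap; i < n; i++) {
                T item = a[i];
                int j = i;
                while (j >= gap && cmp.compare(a[j - gap], item) > 0) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = item;
            }
        }
    }
}
```

The final pass with gap 1 is a plain insertion sort, which is fast precisely when the data is already nearly in order, the case that hurt quicksort.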

5. Use

    5.1 Startup

    JJDic is started from the command line of Unix or DOS/Windoze with the following command:

    % java -mx30m jdic.JJDic

    I don't know how this works on the Macintosh, where there is no command line, but I hope somebody will write to tell me. The -mx30m argument sets the maximum heap size of the Java virtual machine to 30 MB (later Java releases spell this option -Xmx30m). Anything less and edict will not load.

    Regular users will probably want to make an alias or batch file. Under Windoze you can then make a shortcut to the batch file and add it to your Start menu. Icons I leave up to you.

    5.2 GUI Controls

    JJDic has a very simple user interface. Figure 1 shows the main dictionary window. Under the menubar there is a text input area where the user can enter words to be looked up in the dictionary. Next there is a checkbox that controls whether a match must occur at the beginning of a line or anywhere in the dictionary entry. Next there is a button for kanji data entry, and finally a button for kanjidic searches. The main central area is a text output area where the results of searches are displayed, and at the bottom of the screen there is a status bar to indicate what the application is doing. I will discuss each of these screen objects in turn.

      5.2.1 Menu Bar

      The Menu Bar holds menu items that affect the global environment and behavior of the application. It has five submenus attached: "File", "Dictionaries", "Filters", "Options", and "Help". The "Filters" and "Help" menus are not implemented in this release.

        5.2.1.1 File Menu

        The File menu has two options, Open Dictionary and Exit. Each of these options has a keyboard shortcut: Ctrl+O to open a new dictionary and Ctrl+X to exit the application. The Print option is not implemented in this release. The Exit option simply closes the application windows and exits; it is equivalent to clicking the close window icon on the window frame.

        The Open Dictionary option presents the user with a dialog for opening new glossary files. The dialog box can be seen in Figure 2. The text area at the top of the dialog box allows the user to enter a file or path name directly. The button labeled "Search" will bring up a file selection dialog. Either one can be used to enter the pathname of the dictionary to be opened. In the middle of the dialog is a set of radio buttons to select the input encoding of the dictionary file. The checkbox at the bottom should be checked if no index exists or if the index may be out of date; the only reason not to leave it checked is if the user does not want to wait for index generation. Finally, the user should click either the "OK" button to open the dictionary, or the "Cancel" button or the close button on the window frame to close the dialog without opening a dictionary. When a dictionary has been successfully opened, its name will appear on the Dictionaries menu.

        5.2.1.2 Dictionaries Menu

        The Dictionaries menu shows which dictionaries have been loaded and controls which dictionaries will be searched. Each loaded dictionary appears as a checkbox menu item in this menu. If the item is checked then the dictionary will be searched; if not, it won't. There is nothing to be gained by allowing dictionaries to be unloaded, because even if a dictionary were unloaded and garbage collected by the Java interpreter, the memory would not be returned to the OS and the process size of the application would not be reduced.

        5.2.1.3 Filters Menu

        The filters option is not implemented in this release. Most of the justification for filters has become moot since proper names and place names have been removed from edict. To ignore names the user can just disable the names dictionary from the dictionary menu.

        5.2.1.4 Options Menu

        There are two user-settable options in the Options menu. First, the user can set the font family and point size. Unfortunately the font family is listed as one of the Java pseudo family names "Dialog", "Serif", "SansSerif", etc., not the more useful true font names Osaka, Mincho, etc. You will just have to try them out and see what you get.

        The second option is "Showdic", a checkbox menu item that controls the listing of the dictionary name in the output area. In cases where multiple dictionaries are in use, the user may want to see which dictionary was the source of a particular answer. If "Showdic" is checked then each entry in the display area will be prefixed with the name of the dictionary in which it was found. Showdic is set true by default.
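Taken together, the Dictionaries menu and the Showdic option amount to "search only the checked dictionaries, and optionally tag each hit with its source". A minimal sketch of that behavior follows; DictionarySet is a hypothetical class, the in-memory line lists stand in for real indexed dictionaries, and the "Name:" prefix follows the "Edict:" output shown in the next section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class DictionarySet {
    // Insertion-ordered, so hits come back in the order the dictionaries were loaded.
    private final Map<String, List<String>> entries = new LinkedHashMap<>();
    private final Map<String, Boolean> enabled = new LinkedHashMap<>();
    boolean showDic = true; // Showdic is on by default

    /** A newly loaded dictionary starts out checked. */
    void add(String name, List<String> lines) {
        entries.put(name, lines);
        enabled.put(name, true);
    }

    /** Mirrors toggling the checkbox menu item for one dictionary. */
    void setEnabled(String name, boolean on) {
        enabled.put(name, on);
    }

    /** Searches only the checked dictionaries, optionally prefixing each hit with its source. */
    List<String> search(Predicate<String> matches) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> dict : entries.entrySet()) {
            if (!enabled.get(dict.getKey())) continue;
            for (String line : dict.getValue()) {
                if (matches.test(line)) hits.add(showDic ? dict.getKey() + ":" + line : line);
            }
        }
        return hits;
    }
}
```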

        5.2.1.5 Help Menu

        Help is for wusses! Use the Source, Luke!

      5.2.2 Main Dictionaries

      The normal way to use JJDic is to type something into the input text area and press Return to look it up. In the default lookup mode, a string matches only if it matches from the beginning of the dictionary entry. This is probably the best mode to use when searching for Japanese kanji or jukugo (kanji combination words). The other mode is equivalent to the kanji-within-compounds mode of xjdic: any match is displayed no matter where it occurs within the dictionary entry. This is useful if you want to see all uses of a particular character, since some characters occur much more often as the second or third character of a combination than as the first. This mode is also necessary to find entries by their yomikata (pronunciation) or to use the dictionary as an English/Japanese dictionary. The checkbox following the text input area switches between the two modes. If you look up the checkbox's own Japanese label, you will see its edict entry, glossed /prefix/ and preceded by "Edict:" to show which glossary it came from. If you are using only edict, or if you don't care which glossary a particular entry comes from, you may turn off "Showdic" in the Options menu.

      The input can include kanji, kana or romaji. Katakana and hiragana are not distinguished for standard dictionary searches. Romaji is not case sensitive.
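The two matching rules above (kana-insensitive, case-insensitive, and either anchored at the start of the entry or anywhere in it) can be sketched as follows. LookupMatch is a hypothetical class, not the JJDic search code. It folds katakana to hiragana using the fact that the katakana block U+30A1..U+30F6 sits exactly 0x60 above the hiragana U+3041..U+3096, and it lower-cases Latin text. It does not attempt romaji-to-kana conversion; Latin input is simply compared case-insensitively, which is what an English search needs.

```java
import java.util.Locale;

class LookupMatch {
    /** Folds katakana to hiragana and lower-cases Latin letters so both sides compare alike. */
    static String normalize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Katakana U+30A1..U+30F6 map onto hiragana U+3041..U+3096 by subtracting 0x60.
            if (c >= '\u30a1' && c <= '\u30f6') c = (char) (c - 0x60);
            out.append(c);
        }
        return out.toString().toLowerCase(Locale.ROOT);
    }

    /** anywhere=false is the default "match from the start" mode; true matches anywhere. */
    static boolean matches(String dictLine, String query, boolean anywhere) {
        String line = normalize(dictLine);
        String key = normalize(query);
        return anywhere ? line.contains(key) : line.startsWith(key);
    }
}
```

A real implementation would normalize each dictionary line once, when the index is built, rather than on every comparison.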

      Cut and paste will work between the system clipboard and JJDic. This makes it easy to look up unknown words when you are reading a Japanese document: just cut or copy the unknown string from the document and paste it into the text entry field on JJDic. The Java runtime system should take care of the necessary translation from the system default encoding to Unicode in the input area. Similarly, you can cut information from either the input text field or the output text area in JJDic and paste it into your other applications.

      If you have a Japanese input method you can type Japanese directly into the JJDic input field.

      N.B. Both the cut and paste of Japanese text and use of a Japanese input method depend on the correct translation from the local system encoding into Unicode. This capability is only present in Java 1.1. If you have an older Java 1.0 runtime system you must upgrade to 1.1 to use JJDic.

      5.2.3 Kanji Dictionary

      Kanjidic has a wealth of information on the individual kanji characters, but I am rather conservative about how much of it I actually present. Figure 3 shows an example of the information displayed from kanjidic. At the upper left is the character that we have looked up. Beside it, to the right, are the on and kun yomikata (readings) for the character. Below are the JIS code and Unicode value for the character, the Bushu (radical) and the total number of strokes. If there are multiple stroke counts then the first is preferred [sic.] and the others are common miscounts. After that come the Nelson and Halpern index numbers, from The Modern Reader's Japanese-English Character Dictionary, second revised edition, by Andrew N. Nelson (Tuttle), and the New Japanese-English Character Dictionary by Jack Halpern (Kenkyusha), respectively. Lastly you will see the English meanings for the character. Only the character, the JIS code and the Unicode value are always present: some characters lack on yomi or kun yomi, some are not listed in Nelson or Halpern, and some have no English meanings.
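The fields above come from a single kanjidic line: the character, its JIS code in hex, a series of code fields each tagged by upper-case letters (B for bushu, S for stroke count, N for Nelson, H for Halpern, and so on), readings in katakana (on) or hiragana (kun), and English meanings in braces. The sketch below is a hypothetical parser, not JJDic's own; the sample line in the test is illustrative, callers are assumed to skip kanjidic's "#" header line, and the T1 name-reading section is not treated specially.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KanjidicEntry {
    private static final Pattern MEANING = Pattern.compile("\\{([^}]*)\\}");

    final String kanji;
    final String jis;                                              // JIS code in hex
    final Map<String, List<String>> codes = new LinkedHashMap<>(); // e.g. "B" -> bushu, "S" -> strokes
    final List<String> onYomi = new ArrayList<>();                 // katakana readings
    final List<String> kunYomi = new ArrayList<>();                // hiragana readings
    final List<String> meanings = new ArrayList<>();               // English, from {braces}

    KanjidicEntry(String line) {
        Matcher m = MEANING.matcher(line);
        while (m.find()) meanings.add(m.group(1));
        String[] tokens = MEANING.matcher(line).replaceAll(" ").trim().split("\\s+");
        if (tokens.length < 2) throw new IllegalArgumentException("not a kanjidic line: " + line);
        kanji = tokens[0];
        jis = tokens[1];
        for (int i = 2; i < tokens.length; i++) {
            String t = tokens[i];
            int p = 0;
            while (p < t.length() && t.charAt(p) >= 'A' && t.charAt(p) <= 'Z') p++;
            if (p > 0) {
                // Code field: upper-case tag followed by its value, e.g. N43 or S7.
                codes.computeIfAbsent(t.substring(0, p), tag -> new ArrayList<>()).add(t.substring(p));
            } else {
                char kana = firstKana(t);
                if (kana >= '\u30a0' && kana <= '\u30ff') onYomi.add(t);
                else if (kana >= '\u3040' && kana <= '\u309f') kunYomi.add(t);
            }
        }
    }

    /** First kana character in a reading token (skipping marks such as '-' or '.'), or 0. */
    private static char firstKana(String t) {
        for (char c : t.toCharArray()) if (c >= '\u3040' && c <= '\u30ff') return c;
        return 0;
    }

    /** Stroke counts: the first is the accepted count, any others are common miscounts. */
    List<String> strokes() {
        return codes.getOrDefault("S", new ArrayList<>());
    }
}
```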

      My apologies if I have left out some field that you think indispensable. I left out the SKIP codes because I was not sure of my position on commercialization and those codes have restrictive covenants. Most of the other codes seem occasionally interesting but not of sufficient general recurrent interest to justify their inclusion in this sort of display. One feature that I am considering for a future enhancement is allowing the user to specify exactly which kanjidic fields will be displayed.

        5.2.3.1 Lookup Methods: Character, Yomikata, English

        There are two methods available for searching kanjidic. The first is perhaps the easiest. The user can type something into the input area, or highlight something in the input area or output area using the mouse, and press the kanjidic search button. This will search kanjidic for the selection. The selection rules are:
        1. Anything highlighted in the input area or output area; or if nothing is highlighted then
        2. anything in the input area.
        The user is then presented with a dialog showing all of the kanji that match her request.

        The selection can include kanji, kana (hiragana or katakana) or romaji. Kanji are looked up directly, and there can be as many as the user likes. Unlike the standard dictionary search, kanjidic searches distinguish katakana from hiragana: following kanjidic's own convention, katakana strings are treated as On yomikata and hiragana strings as Kun yomikata. The result dialog has four buttons across the top of the window, for On-Yomi, Kun-Yomi, Kanji and English; pressing one of these buttons displays the kanji that match in that category. The user can then click on a kanji to display detailed information about that character.
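The selection rules and the routing of a query to a kanjidic category can be sketched as below. KanjiQuery is a hypothetical class, not the JJDic code. It assumes that any kanji in the query makes it a character lookup, that katakana means on readings and hiragana means kun readings (kanjidic's convention), and, as a further assumption, that Latin text is treated as an English meaning search.

```java
class KanjiQuery {
    enum Kind { KANJI, ON_YOMI, KUN_YOMI, ENGLISH }

    /** Highlighted text wins; otherwise fall back to the input area. */
    static String chooseQuery(String highlighted, String inputArea) {
        if (highlighted != null && !highlighted.trim().isEmpty()) return highlighted.trim();
        return inputArea == null ? "" : inputArea.trim();
    }

    /** Decides which kanjidic category a query string should be matched against. */
    static Kind classify(String query) {
        boolean sawKatakana = false;
        boolean sawHiragana = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
                return Kind.KANJI; // kanji are looked up directly
            }
            if (c >= '\u30a0' && c <= '\u30ff') sawKatakana = true;
            else if (c >= '\u3040' && c <= '\u309f') sawHiragana = true;
        }
        if (sawKatakana) return Kind.ON_YOMI;  // kanjidic writes on readings in katakana
        if (sawHiragana) return Kind.KUN_YOMI; // and kun readings in hiragana
        return Kind.ENGLISH;                   // assumption: Latin text is an English meaning
    }
}
```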

        5.2.3.2 Lookup Methods: Bushu plus strokes

        JJDic provides a means to do kanjidic queries by bushu (radical) and stroke counts. Users of Nelson's character dictionary or the Canon Wordtank will be familiar with this procedure. This method can also be used to look up a kanji in kanjidic and enter it into the text input field, so that it can then be the key for a search of the other dictionaries.

        The user begins by pressing the kanji data entry button. The kanji input dialog, Figure 4, will be displayed. This dialog is persistent and can be kept open and used multiple times; it will persist until explicitly dismissed by the user. The Go button is used to select the stroke count value shown in the selection window. If the selection is changed the dialog proceeds automatically, so to choose the value that is already shown (1), either press Go or select another value first and then reselect 1.

        When the number of strokes in the radical part of the character is selected, another dialog will be displayed showing all of the radicals with that number of strokes. Figure 5 shows the dialog for 1-stroke radicals. It also illustrates a point. In JIS, and less so in Unicode, there are radical forms that have no representation in the character set. Furthermore, there are many Unicode characters that have no mapping to a JIS character. In these cases I have chosen a kanji that uses the radical part and as few additional strokes as possible to indicate the radical. I then darken the non-radical part of the character to highlight the radical part. It's a crude kludge but it gets the job done.

        After the user chooses which radical to use, the original Kanji Input Dialog will be updated to ask how many additional strokes are in the final character. The user should count how many strokes there are in the final character in addition to the strokes that form the radical part, and choose that number from the choice object in the dialog box. JJDic will then prepare a dialog containing all of the kanji with the specified radical and total strokes equal to the radical part plus the specified number of additional strokes. Figure 6 shows all of the steps involved in kanji input. The next step is to select a kanji from the final dialog box. The user can then either look up the character in kanjidic by pressing the lookup button, or enter the character into the input area by pushing the insert button. After entering a character in the input area she can search for it in the main dictionary.
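The final filtering step described above reduces to one comparison per kanji: same bushu number, and total strokes equal to the radical's strokes plus the additional strokes the user counted. The sketch below uses hypothetical names (BushuLookup, Kanji) and assumes each record carries kanjidic's B field and first S field. Because the B field does not record which variant form of a radical a character uses, the radical's stroke count has to come from the radical the user picked, which is the limitation discussed in section 6.1.

```java
import java.util.ArrayList;
import java.util.List;

class BushuLookup {
    /** Minimal record: the character, its bushu number (B) and accepted stroke count (first S). */
    static final class Kanji {
        final String ch;
        final int bushu;
        final int strokes;
        Kanji(String ch, int bushu, int strokes) { this.ch = ch; this.bushu = bushu; this.strokes = strokes; }
    }

    /** Kanji with the chosen radical whose total strokes = radical strokes + additional strokes. */
    static List<Kanji> candidates(List<Kanji> all, int bushu, int radicalStrokes, int extraStrokes) {
        int total = radicalStrokes + extraStrokes;
        List<Kanji> out = new ArrayList<>();
        for (Kanji k : all) {
            if (k.bushu == bushu && k.strokes == total) out.add(k);
        }
        return out;
    }
}
```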

6.0 Known Problems

Here are a few of the problems that I know about and am working on but haven't solved yet. They are not fatal errors, just less than ideal.

    6.1 Radical & stroke search is heuristic not deterministic

    Looking up characters by radical and stroke count requires more information than is present in kanjidic. The problem is that some radicals have several different forms with different stroke counts, but kanjidic only shows the main radical number, so it is impossible to distinguish between the different forms. For example and are both formed from radical B162 and both have five strokes, but in the first the radical part has three strokes while in the second only two. In order to disambiguate cases like this it would be necessary to add more information to the bushu numbers to indicate which variant was used. Even that is an imperfect solution, since the form of a character may be font dependent. Furthermore, there is room for debate as to how many strokes a particular radical or character contains. In the previous case some authorities might claim that the radical part of has two strokes (Andrew Nelson), whereas others hold that it is three (Jim Breen). At least in this case you can see the difference. Other cases are not at all so clear. One well known problem is with kusakanmuri . When used as a radical this can be counted as three or four strokes, even though many fonts make absolutely no distinction. For example the character has five strokes whereas has seven. This is obviously a natural language. Anyway, I do as well as can be done with the information available. Maybe someday, if I have lots of free time and nothing better to do, I will go through kanjidic and enhance it to reflect these distinctions.
     

7. Things to Do

This is my list of unfinished business. If you would like to add to the list or vote for your most indispensable feature please send me some mail.

    7.1 Internationalization

    One might have thought that this was an internationalized program, but you would have been wrong. Proper internationalization would require that I move all strings in the GUI into what Java calls resource bundles and load different bundles for different locales. I plan to do this someday because it is a part of the Java language that I want to study in detail. However, I don't really think that the little bit of kanji that I use in the GUI should be too difficult to manage for any serious student of Japanese, so it's not considered a high priority.

    7.2 Printing

    I imagine that some users might want to print stuff from the display area to a printer. I think that Java has a way to do this and I will be looking into how to manage it.

    7.3 Help

    Help may consist of showing this file in the display area.

    7.4 Filters

    Xjdic has user definable filters. I have only used them to eliminate proper names so the removal of names to another dictionary solves the problem for me but I will still consider adding them if I get enough votes.

    7.5 Client/Server

    This is the big one. I want to split the functions so that the GUI can be downloaded as an applet while the dictionary remains on a server. Java has a way to do this called Remote Method Invocation (RMI sounds a lot like RPC? Consider the source). This seems like a very useful thing to do. I was very careful while writing this application to strictly segregate the functions of the GUI from the dictionary search and management functions. There are three main classes in the application and a lot of subsidiary classes that they generate. The main control class, the one that starts execution, is jdic.JJDic. This class reads the properties file, starts the GUI and loads the dictionaries. It then passes requests between the GUI and the various dictionaries. There is a strict separation between the GUI and the dictionaries: all messages between the two are forwarded through the JJDic class. My expectation is that I will be able to download the GUI as an applet and then let it communicate through JJDic to the dictionaries. The other part of this will be to make the server side multi-threaded. I would be very interested in hearing from people who would like to use such a program.
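Because every GUI request already passes through one class, the RMI split could start from a single remote interface. The sketch below is a hypothetical design, not the planned JJDic code: DictionaryService and LocalDictionaryService are made-up names. An RMI interface extends java.rmi.Remote and every method declares RemoteException. The in-process implementation can be used directly today; a server would export the same object with java.rmi.server.UnicastRemoteObject.exportObject and bind it in a registry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical remote face of the controller: all the GUI applet would need to see. */
interface DictionaryService extends Remote {
    List<String> lookup(String query, boolean matchAnywhere) throws RemoteException;
}

/** In-process implementation; an RMI server would export an object like this one. */
class LocalDictionaryService implements DictionaryService {
    private final List<String> lines;

    LocalDictionaryService(List<String> lines) {
        this.lines = lines;
    }

    @Override
    public List<String> lookup(String query, boolean matchAnywhere) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (matchAnywhere ? line.contains(query) : line.startsWith(query)) hits.add(line);
        }
        return hits; // ArrayList is Serializable, so the result can travel back over RMI
    }
}
```

Keeping the return type a plain serializable List means the GUI side would not change whether it talks to the dictionaries locally or across the network.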

8. Reporting Problems

Please report any problems or comments by email to h.cohen@computer.org .

    8.1 Beta Test Issues

    This first release should be considered a beta test. It is very important to try out the program on a variety of different systems and configurations. I would very much appreciate it if everybody who tries out the program would send me an email with their configuration and whatever comments they might have.

    I have developed and tested the system using the Japanese version of Windoze 95. I am particularly interested in hearing from users of other systems: Solaris, Mac, other Unices, etc.