Bella and Yakov and Tillie's Panties: What I Learned in “Construction and Maintenance of Indexing Languages and Thesauri”

by Jeanette Ezzo

Jeanette Ezzo is the research director of JPS Enterprises, a company in Takoma Park, Maryland, that specializes in medical education materials. She took Dagobert Soergel's LBSC775 course at the University of Maryland, College Park, as a non-degree student when the company received a grant to develop an indexing language for studies on spirituality and health. The graduate course, "Construction and Maintenance of Index Languages and Thesauri," required developing a piece of a thesaurus, pilot testing the indexing language, evaluating thesauri and writing a paper on what was learned in the course. This paper fulfilled that final course requirement. Jeanette can be reached by email at jeanetteezzo@prodigy.net.

My name is Jeanette, and I am a middle-aged adult learner. There is a liberating confession in saying this, much as someone must feel walking into an AA meeting and saying, “I am so-and-so, and I am an alcoholic.” The confession is this: I do not learn in the same way that I learned in my 20s and 30s. In my youth, I, like all the other 20-somethings, got through college on rote memorization and use of mnemonic devices. The 12 cranial nerves were memorized with “On old Olympus' towering top, a Finn and German viewed a hop.” The bones of the wrist were memorized with the mnemonic, “Never lower Tillie’s panties; mother might come home.”

But now I am middle-aged, and memorization is not my strong suit anymore. If you ask me to name the bones of the wrist, I can tell you that the “N” stands for navicular, and the “L” stands for lunate. If you press me for the “T” and the “P” bones, I can only tell you apologetically that they are “Tillie’s panties.” So, how does an adult learner learn new material if not by rote memorization, and how does this apply to what I learned in LBSC775?

Research shows that adult learners learn by hanging their new knowledge onto their existing life experiences, and this appending is often done using similes and metaphors. I knew the first night of class, when the rest of the class bobbed their heads up and down in profound understanding during the lecture, that if I were to survive LBSC775, I would have to rely on similes and metaphors. Even Jesus spoke in parables. So, here are my 12 top lessons learned in LBSC775, necessarily told through the similes of an adult learner.

1. Creating a thesaurus without knowing the discipline well is like trying to navigate in a foreign country without speaking the language – you can quickly drown in a sea of words.

When my friends Bella and Yakov immigrated to the United States from Russia, they told me their greatest culture shock was city driving. Yakov would drive, and Bella would try desperately to read all the signs to him. There were so many signs! Yard signs! Campaign signs! Advertisements on billboards! Street signs! Stop signs! Lost dog signs tacked to telephone poles. Lacking a cultural “filter,” Bella described the anguish of having every word look equally important.

Selective perception is important in driving and in thesaurus development. You have to know your discipline well enough (or recruit experts who do know the discipline well enough) to know what, on your list of hundreds of terms, is not important enough to keep. You also have to know your topic well enough to discuss it at the elemental concept level. When we were developing the spirituality and health thesaurus, Dr. Soergel pressed us: "Can you have nonreligious spirituality?" What an apparently odd concept, yet not from an elemental concept point of view! "Can you have nonspiritual meditation? Nonspiritual yoga practice?"

2. Good scope notes are like MapQuest driving directions – they tell you exactly what to do. Bad scope notes are like getting driving directions from the 16-year-old at the service station who smoked up before he came to work – they will leave you going in circles for hours and may never get you where you want to go.

By the end of the first night, I had already had several valuable epiphanies. I never knew that there was a difference between a scope note and a definition, the former being much more of an evolving work in progress and written with an eye toward helping indexers, and the latter being a more succinct point of reference. My only experience with scope notes up to that point had been MeSH (the National Library of Medicine’s Medical Subject Headings), and the MeSH examples had seemed like definitions. Similarly, when I saw the American Psychological Association (APA) Thesaurus scope notes, I still could not see what the difference was between a scope note and a definition.

The epiphany came when I laid eyes on the Dewey Decimal Classification (www.oclc.org/dewey) later in the course. For me, it was love at first sight. I was enamored of the care that the authors of the scope notes had put into anticipating where indexers might get confused and providing explicit road signs at just those junctures. These are the model for scope notes written with an eye to the indexers and cataloguers.

3. When choosing how to display related terms (RTs), you will need to decide between the thrift store approach and the Nordstrom approach.

If you’ve ever shopped at a thrift store for clothes, you know that most thrift stores seem to take the same equally dreadful approach to clothing displays. Clothes are crammed as tightly as they can be on a rack and often arranged by color instead of size. As we reviewed various thesauri in class and analyzed their lists of related terms (RTs), I couldn’t help but think of those crammed clothing racks. Some thesauri, such as that for the Education Resources Information Center (ERIC) (www.eric.ed.gov/ or http://searcheric.org/) and the APA Thesaurus, are quite generous in providing RTs, but some users will find the resultant 12-15 alphabetized RTs under the descriptor about as functional as clothes arranged according to color.

A more user-friendly approach is the Nordstrom approach – showcase the most important items using special display features such as putting them at the top of the list and bolding them or using an asterisk. Nordstrom teaches an important lesson: People like pre-selected options especially when they are in a hurry.
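The Nordstrom approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name display_related_terms, the "RT" line prefix and the sample terms are all hypothetical, and which RTs count as "most important" is an editorial decision the code simply takes as input.

```python
def display_related_terms(related_terms, key_terms):
    """Return RT display lines with the key terms first, each marked with an asterisk.

    Hypothetical helper for illustration; not taken from any thesaurus software.
    """
    key = set(key_terms)
    # Showcase items go to the top of the list; the rest stay alphabetized below.
    featured = sorted(t for t in related_terms if t in key)
    others = sorted(t for t in related_terms if t not in key)
    return [f"RT *{t}" for t in featured] + [f"RT  {t}" for t in others]


# Invented sample data: five RTs under one descriptor, two flagged as most important.
rts = ["Meditation", "Faith", "Worship", "Hope", "Ritual"]
lines = display_related_terms(rts, key_terms=["Worship", "Faith"])
for line in lines:
    print(line)
```

The thrift store approach would be a plain sorted(rts); the only difference here is that the pre-selected terms are pulled to the top and marked.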

4. Relying on frequency analysis as the sole way to identify terms is as useful as relying on the evening paper to identify important world events: some topics will be mentioned ad nauseam, others not even once.

I went into this course believing that there was something inherently noble and righteous about frequency analysis. Perhaps the greatest epiphany on that first night was rethinking the relative value of the study participants that we had enrolled in a survey in order to generate search terms. Prior to the first night, I had been thinking that our 60 study participants would be the pivotal, central source of information for indexing terms. That first night, I learned that thesaurus development regularly relies on a variety of other sources – from textbooks to specialized dictionaries to other thesauri – and I began to get a sense of the enormity of the task. A frequency analysis of the study participants’ responses simply would not be adequate.

That night, upon returning home, I looked at the 700+ page hardback Handbook of Religion and Health with new eyes. Right there in the index was a wealth of information on terms and their interrelationships. Nearly an hour passed as I sat lost in the pages of the index of that book.

As a result of that first class, I began to view the study participants and the resultant frequency analysis as icing on the cake with perhaps the greatest contribution being not the breadth of terms needed but the ascertainment of preferred terms. Which terms out of a group of equivalent terms did participants use most frequently to capture a particular concept? This selection would be the likely candidate for the preferred term.
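To make the idea concrete, here is a minimal Python sketch of choosing a preferred term by frequency. Everything in it is an assumption made for illustration: the function name pick_preferred_terms, the sample responses and the equivalence groups are invented and do not come from the actual survey or from TermMaster.

```python
from collections import Counter


def pick_preferred_terms(responses, equivalence_groups):
    """For each group of equivalent terms, return the one participants used most often.

    Hypothetical helper for illustration only.
    """
    counts = Counter(term.lower() for term in responses)
    preferred = {}
    for concept, terms in equivalence_groups.items():
        # max() keeps the first-listed term when two terms are used equally often.
        preferred[concept] = max(terms, key=lambda t: counts[t.lower()])
    return preferred


# Invented participant responses and equivalence groups.
responses = [
    "prayer", "praying", "prayer",
    "faith healing", "spiritual healing", "faith healing",
    "prayer",
]
groups = {
    "prayer": ["praying", "prayer"],
    "healing": ["spiritual healing", "faith healing"],
}
preferred = pick_preferred_terms(responses, groups)
print(preferred)
```

Note that the frequency count only chooses among terms already grouped as equivalent; deciding which terms belong together still depends on the other sources described above.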

5. Creating a thesaurus is like doing a vaccination program in a developing country. You will always have those trips up the Andes on a donkey.

Having a background in public health, I always go back to lessons learned there. Vaccination programs in developing countries possess a kind of universal truth – you can use 60% of your resources vaccinating 90% of the kids and then use the other 40% of your resources climbing the mountains of the Andes to vaccinate the other 10%. Don't expect that expenditure of energy will be evenly spread over all pieces of the schema. Some will require the trip up the Andes.

Some tree structures in our schema, such as health conditions or mental disorders, could be easily modified from existing hierarchies such as MeSH or the DSM-IV (the Diagnostic and Statistical Manual of Mental Disorders), respectively. These pieces were the equivalent of vaccinating 90% of the kids. However, other structures, such as how to categorize all the Protestant denominations, were a real trip up the Andes.

Coincidentally, while I was working on this part of the schema, Dr. Soergel reminded us that we should be keeping track of our time spent on the thesaurus so that we have some idea how to estimate time on future projects. Thus, I was extremely aware while walking out of the Eisenhower Library at 2 a.m. that I had been there for three hours holed away in the religious history section trying to find one meaningful way to classify the plethora of Protestant denominations. I had not found one meaningful classification system, but I had read far more than I ever wanted to know about Martin Luther, John Knox and John Calvin. I probably spent an additional three hours searching the Web for classification systems. Some classifications used a liberal, moderate or conservative delineation. Others classified the denominations according to the time in history when they occurred, for instance, Reformation Era, Pietistic Era. There always seemed to be a couple of subgroups that didn't fit nicely anywhere. The temptation to alphabetize and be done with it once and for all became overwhelming, but then I would think of all those clothes on the thrift store rack arranged by color, and I would talk myself out of alphabetizing.

Wei Ching (my project companion) and I had several conversations about which system to use. I can surely understand why thesaurus developers choose to alphabetize rather than go through the arduous process of assigning meaningful proximity to the descriptors. We finally opted for the liberal-to-conservative classification because that is what was used the most. However, in our indexing pilot study, I found that I didn't know where to index "Baptist" because the liberal-to-conservative gradient had placed northern Baptists in one compartment and southern Baptists in another. More refinement to this part of the schema still needs to happen. We need another trip up the Andes.

6. Creating a thesaurus is like doing horticulture; sometimes only a hybrid will do.

This license to create a hybrid is part of thesaurus development. It made more sense to classify the Protestant denominations as mentioned above. However, when it came to non-Christian religions, we were persuaded to use a modification of the Dewey system, which classifies them according to geography of origin, so the Indic religions are classified together, and the African religions are classified together. The Abrahamic religions are classified together as coming from the Middle East. When one looks at the classification schema, it makes sense. One would probably not suggest that the country of origin most meaningfully classifies all those Protestant subgroups. Yet, the other religions do naturally classify that way, so we had to create a hybrid of two concepts to get the pieces to fit.

7. Creating a thesaurus is like building a swimming pool. It depends on your end-users. If the pool is for toddlers, you don't need a 12-foot depth.

If a swimming pool were being created for Olympic divers, you would opt for depth. If the pool were being built for toddlers, you would opt for shallowness. If the pool were going to serve a variety of populations, you would create sections of appropriate depth and breadth to meet the needs of all users.

The night Dr. Soergel explained breadth and depth being dependent on the purpose of the thesaurus, I had another epiphany. As he explained it, even the Harvard Business Thesaurus has a shallow section on health because it is needed for breadth, but the depth of that health section needs to be nowhere near the depth experienced in MeSH. Similarly, MeSH has a shallow section on economics that is not nearly to the depth of the Harvard Business Thesaurus but is adequate to represent the overlap of economics and medicine. The depth of respective sections should be determined by their use.

8. Creating a thesaurus is like doing marriage counseling. You have to give equal time and credibility to both sides.

In building a thesaurus, there are some terms that are marked terms. This distinction was another learning point for me. Marked terms show a polarity, like one pole of the magnet or one side of the coin. To be fair to the concept, which is really a concept on a continuum, you must show both anchor points. “Hope” is a marked term. It must be shown adjacent to “Hopelessness.” Emotional states are marked terms. For every happiness, there is a sadness. Personality traits are marked terms. For every introversion, there is an extroversion. For every shyness, there is a gregariousness.

9. Creating a thesaurus without doing semantic factoring is like trying to put together furniture from Ikea without following the instructions. You will get interesting configurations, but you will not save time.

From the start of the course, I felt a sense of urgency with trying to get everything done. How could we possibly have time to do semantic factoring when we were looking at a list of hundreds of terms? After we received our first output file from TermMaster, the thesaurus software, I began sorting and sorting terms. I assigned a code to things I could not sort, and I ended up with 30 pages of "unsortables." At my first meeting with Wei Ching I expressed how stuck I was. With her fine-tipped pencil, she began to draw semantic factors. It was as if a fog was clearing up. I could see the elemental concepts emerge under the tip of her pencil. Things never seemed that hard again. Each time I would get stuck trying to sort, I would think of Wei Ching's philosophy: When in doubt, do semantic factoring. The AOD (Alcohol and Other Drugs) Thesaurus (http://etoh.niaaa.nih.gov/AODVol1/Aodthome.htm) exemplifies a nicely semantically factored thesaurus.

10. Creating a thesaurus is like attending Alcoholics Anonymous. Sometimes you need the support of the group. Other times, you need one buddy you can call on at any time.

At the beginning of the course, Dr. Soergel strongly suggested that we work in groups. Having never developed a thesaurus, I could not grasp the gravity of what he was saying. Over the course of thesaurus development, I found having a colleague to talk things over with was the best thing. Moreover, I was grateful for the smallness of our class and class time devoted to discussing our projects. Wei Ching and I would save all our mutual questions to present to the class. I can still remember the day that Mary Catherine suggested using "Abrahamic religions" for Judaism, Christianity and Islam. Then there was the day that Dr. Soergel suggested putting Theosophy, Sufism and Babism close to Islam to imply their connection through proximity, but not to put them as subsets of Islam because that might be disputed by some. Then there was the day that Tamar suggested that rather than my interviewing 10 journalists to get their lists of suggested terms, I might want to peruse the New York Times index and the Washington Post index to see what terms they were using. The class discussions were like self-help sessions – go in with a problem, come out with at least one good solution. It would have been folly to try to do this task alone.

11. Creating a thesaurus is like creating a questionnaire. If you don't disambiguate your terms, you will get "Yes," "No" or "Not often enough," written in the space next to "Sex."

Whenever I fill out a demographic form, and the term sex is used instead of gender, I am tempted to write something like "not enough" or simply "yes." Ambiguity of terms invites mischief. The class discussions with the group developing the Judaica thesaurus were particularly interesting on this subject because they illustrated how important it was to disambiguate descriptors. One could not just use the term Spanish Judaica, for there is Judaica in the Spanish language, which is different from stories that take place in Spain, which are still different from stories about Spanish culture around the world.

Interestingly, when creating the spirituality and health thesaurus, I thought we were disambiguating terms until we did the indexing pilot study. It became obvious to me that Born again could be a pregnancy and birth classification, so we changed the term to Religious born again. We had Conversion, which we realized could also mean conversion reaction, so we changed it to Religious conversion. The lesson here in selecting terms is to find creative ways to disambiguate them. Identical terms should not be used for two different concepts.

12. Indexing articles is like trying to smell yourself. There's such a thing as getting so close to something that you lose your objectivity.

We have a saying among several of us who write professionally for a living. When we are too close to our own work, we pass the manuscript off to an editor who has never seen it before. We send a note along saying, "I can't edit my own work; it's like trying to smell myself."

In our indexing pilot test, I indexed 150 abstracts in order to gather pilot data for our funders as well as to improve the thesaurus. I remember getting the feeling, after I had done about 20 abstracts, that the process was now going very well.

However, when I looked at my indexing on the following day, I realized that the more articles I had done in a row, the worse my indexing had gotten. I had gotten too close to it, and my indexing had gotten sloppy. I had begun indexing based on terms used in the article instead of also indexing on the underlying concepts not explicitly stated. For example, in the sloppy stage, I might have indexed a study on prayer under “Prayer” but have been too mentally fatigued to see that I should have also indexed it under “Spiritual coping” since the article was about how terminally ill people use prayer to deal with their illnesses.

A fresh mind is needed to see implicit concepts and not just explicit terms. I learned that I shouldn't index articles for more than an hour at a time without taking a break and doing something else that draws on another part of my brain.

In short, mnemonics and rote memory aren't the only way to learn. I will never shop at a thrift store again without thinking of it as the most dreadful way to display RTs. I will never swim in a pool again without thinking of the shallow health section in the Harvard Business Thesaurus. Most of all, I will never be appalled that some odorous locker room does not offend the regular users, for I have learned that indexing is like a locker room – if you're in it long enough you just can't smell yourself anymore.

