Boolean, stemming, natural language: You’ve heard these terms before, but what do they mean, and how do they relate to the litigation support product you’re using? Many litigation support products are available today, and many use product-specific terms to describe the type of search they offer. With the creation of new tools come new terms to describe them. This glossary can act as a reference guide to which you may add new terms as you encounter them. Here’s a start:

Boolean search: A search that uses Boolean logic, which combines or excludes query terms with logical operators. The Boolean operators are AND (all of the words must be included), OR (at least one of the words must be included), and NOT (the word(s) must not be included). Most of the products with word search support Boolean search.

Categorizing: Grouping documents with similar characteristics together. For litigation support, those characteristics are most often extracted from the metadata, e.g. author, recipient, document type.

Comparison search: A “more like this” search. Users find, or create, a paragraph expressing what they are looking for. The system compares that paragraph to other documents in the database to find similar information. No queries are used. It is helpful for finding supporting data.

Concept search: A search that looks for the meaning of words in context, then finds paragraphs related to the query term. It is a variation of fuzzy search.

Field search: See filtered search.

Filtered search: Narrowing down the document set. Only documents containing specific words or dates in the selected database field(s) are returned. It is also called field search, metadata search, or categorization.

Full-text indexing notes the appearance of each word in the text of the documents. The two most common types of full-text indexing are word occurrence and semantic profile indexing.

Fuzzy search: A search that returns information similar to, but not necessarily matching, the query term.
It is useful, for example, when an exact spelling is not known or the OCR is inaccurate. Concept and comparison searches are the other types of fuzzy search.

Keyword search: The most common form of search used for legal discovery, it is a search within the content of the documents for specific word(s). Opposing counsel agree on specific words to include in the queries, and only documents containing those words are produced.

Metadata search: See filtered search.

Natural language search is not generally used in litigation support. It allows the user to type an entire sentence as the query. The system eliminates stop words (see below) and searches for the remaining words in the query.

Proximity search looks for words occurring close to each other. Some systems allow the searcher to specify the distance between the words. For adjacent words, many systems allow the use of quotation marks around phrases.

Query: A word, phrase or group of words, possibly combined with other syntax, such as Boolean operators, used to pass instructions to a search system.

Query expansion builds a new query from an old one. Two techniques are used to create the new query:
1. The system adds synonyms of query terms (as found in a thesaurus or discovered when the system produces semantic profiles).
2. It stems the query terms and returns their various inflections.

Relevance is the measure of how well the document meets the user’s information needs. Relevance ranking is subjective, and the criteria are different for each product that provides this feature.

Semantic profile/signature: A mathematical representation of the content in a paragraph or document. It considers the relationships among words.

Similarity search: See comparison search.

Stemming removes suffixes to discover the root word, then returns inflections, e.g. query = stemming, results = stem, stems, stemmed, stemming.
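As a rough illustration of the stemming entry above, the Python sketch below strips a few common suffixes to find a root, then returns every word in a small vocabulary that shares that root. The suffix list, the function names naive_stem and expand_query, and the sample vocabulary are all assumptions made for this example; commercial products use far more complete rule sets, and nothing here describes how any particular vendor implements stemming.

```python
# Hypothetical, deliberately tiny suffix list. Real stemmers apply many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Return a crude root by stripping one known suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        # Only strip when a reasonably long root remains.
        if word.endswith(suffix) and len(stem) >= 3:
            # Undo consonant doubling, e.g. "stemm" -> "stem".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def expand_query(term: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary words that share a root with the query term."""
    root = naive_stem(term)
    return sorted(w for w in vocabulary if naive_stem(w) == root)


if __name__ == "__main__":
    # Assumed sample of words found in a document set.
    vocabulary = ["stem", "stems", "stemmed", "stemming", "system", "steam"]
    print(expand_query("stemming", vocabulary))
    # prints ['stem', 'stemmed', 'stemming', 'stems']
```

Because the query is reduced to its root before matching, "system" and "steam" are left out even though they begin with similar letters.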
Stop words are words such as conjunctions, prepositions, articles or pronouns, which are ignored in a query because they are so commonly used that they don’t contribute to relevancy, e.g. I, me, and, the, you.

Text search: See keyword search.

Thesaurus search looks for synonyms.

Wildcard is a way for users to indicate their desire to expand the query term to include all inflections, e.g. query = stem* means to include any word beginning with those four letters. Wildcard search results can include completely irrelevant documents if the query root is too common.

Word occurrence is an aspect of word search. Systems often use the number of appearances of a word in a document to determine the relevance of that document to the query.

Word search is the most common type of search available in litigation support products. It involves searching for the occurrence of specific word(s) in the document set.

RESOURCES

Search Engine Glossary: www.searchenginewatch.com/facts/glossary.html
Glossary for Information Retrieval: www.cs.jhu.edu/~weiss/glossary.html
Dataflight Software Inc., Los Angeles: www.dataflight.com
Cricket Technologies, Reston, Va.: www.crickettechnologies.com
DolphinSearch Inc., Ventura, Calif.: www.dolphinsearch.com
Electronic Evidence Discovery, Seattle: www.eedinc.com
Fios Inc., Portland, Ore.: www.fiosinc.com
Ibis Consulting Inc., Providence, R.I.: www.ibisconsulting.com
Kroll Ontrack Inc., Eden Prairie, Minn.: www.krollontrack.com
Steelpoint Technologies Inc., Boston: www.steelpoint.com
Summation Legal Technologies Inc., San Francisco: www.summation.com
Syngence L.L.C., Dallas: www.syngence.com

Lynn Frances is director of technical communication for DolphinSearch Inc., based in Ventura, Calif. E-mail: [email protected]. Web: www.dolphinsearch.com.

