DATAssimilate Systems Inc., a provider of optical character recognition, imaging, coding and document processing software for legal professionals, unveiled a major upgrade to the company’s PowerSearch application at the International Legal Technology Association’s annual meeting in Washington, D.C., which ran from August 26 to 30. PowerSearch is designed to search and cull content acquired from multiple sources to form a collection.

The new version 5 includes a processing tab to manually include and exclude files by extension and size, as well as the ability to import the most current National Software Reference Library Reference Data Set (NSRL RDS) to deNIST a proposed collection, that is, to remove common computer files such as binary and executable files. The NSRL RDS contains metadata on computer files that can be used to uniquely identify the files and their origin using an MD5 hash. The new PowerSearch version also lets you add custom MD5 values to remove known files from a collection.
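The article does not describe how PowerSearch implements deNISTing, so the following is only a minimal Python sketch of the general idea: hash each file with MD5 and drop it if the hash appears in a set of known values, or if its extension is on an exclusion list. The function names and the way the known hashes are supplied are assumptions made for illustration, not PowerSearch's API or the NSRL file format.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def denist(paths, known_md5, excluded_extensions=frozenset()):
    """Split paths into (kept, removed).

    known_md5: hex MD5 strings of known files (hypothetical input, e.g.
    values loaded from a reference set plus custom additions).
    excluded_extensions: suffixes such as ".exe", compared case-insensitively.
    """
    known = {value.lower() for value in known_md5}
    extensions = {ext.lower() for ext in excluded_extensions}
    kept, removed = [], []
    for path in map(Path, paths):
        # The extension check runs first, so excluded types are never hashed.
        if path.suffix.lower() in extensions or md5_of(path) in known:
            removed.append(path)
        else:
            kept.append(path)
    return kept, removed
```

Calling `denist(files, known_hashes, {".exe", ".dll"})` would return the files to keep and the files removed, which could then feed an exclusion report.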

The new PowerSearch also has more advanced tagging capabilities, but has yet to incorporate any predictive coding. Girts Jansons, CEO of DATAssimilate, says predictive technology will be included in the next major release.

To further remove known, irrelevant files from a collection, PowerSearch can automatically exclude small files, from one to 50 bytes, that may be too small to hold content relevant to litigation or investigation. After you make all the file exclusions, PowerSearch can report on excluded files in the index tab, along with files that were not indexed. The report of excluded files is hidden by default, however; you have to clear a check box to surface it.
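As a rough illustration of the size rule, the sketch below separates files whose size falls between one and 50 bytes from the rest and keeps the excluded entries so they can be reported later. The function name and the (name, size) input shape are assumptions for the example; how PowerSearch treats zero-byte files is not stated, so here they are simply left in the kept list.

```python
def split_by_size(files, min_bytes=1, max_bytes=50):
    """Split (name, size_in_bytes) pairs into (kept, excluded).

    A file is excluded when min_bytes <= size <= max_bytes, mirroring the
    one-to-50-byte range mentioned above. Both bounds are inclusive.
    """
    kept, excluded = [], []
    for name, size in files:
        if min_bytes <= size <= max_bytes:
            excluded.append((name, size))
        else:
            kept.append((name, size))
    return kept, excluded
```

Keeping the excluded list, rather than discarding it, is what makes a later report of excluded files possible.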

Test Drive

The program installs into 235 megabytes of disk space, but space requirements grew as I collected and indexed content. Along with the PowerSearch program, DATAssimilate also installed onto my Lenovo T520 ThinkPad laptop (running Windows 7) a copy of Transym TOCR Viewer Pro 3.3 to extract text from images for indexing.

After the installation, I created a new project and selected file system sources for possible collection. When I earmarked directories on the local file system, PowerSearch identified archive files such as .zip and .pst for advanced processing. I skipped the Exchange mail collection and entered my Twitter handle to search and collect Tweets of interest to a query, litigation or investigation.

The selective content acquisition via keyword search of Tweets from my Twitter account LTNSeanDoherty was successful. Collecting messages from Gmail also went well, but I could not search the mailbox and collect only the matching messages. I had to download all the mail and then search, cull and collect it. PowerSearch extracted messages from all the Gmail system folders, e.g., All Mail, Sent Mail, Starred and Trash, without prompting.

When my content was set, I turned to the processing tab and received a dialog box that read: “Some plugins were not registered and will not run.” Make a note of any exceptions displayed in this window. Such plugins extend PowerSearch’s capabilities for viewing files and opening them for indexing.

After I viewed the content collected, I clicked on the search and cull tab to view a multifaceted search window with advanced tagging options and basic search functions for a word or phrase, with proximity searching. There are also separate tabs to home in on email, file and extended file metadata. New to version 5 are some advanced tagging features and query analytics.

The far left window pane provides direct access to indexed words. I started typing in the window at the top of the pane and words that matched the characters I typed appeared below the window for me to select and add to my query. With this feature, you can see how the word is used in indexed documents to begin your search. I typed in “secur,” highlighted the word “security,” and added it to my search.
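The type-ahead behavior described above can be approximated with a prefix lookup over a sorted list of indexed words. This is a minimal sketch of that idea, not PowerSearch's implementation; the function name and the assumption that the index words are lowercase and sorted are mine.

```python
import bisect


def words_with_prefix(sorted_words, prefix):
    """Return all words in a sorted, lowercase list that start with prefix."""
    prefix = prefix.lower()
    # Binary search finds the first word that is >= the prefix.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # sorted order means no later word can match
        matches.append(word)
    return matches
```

With an index containing "secure", "securities" and "security", typing "secur" would surface all three for selection.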

The “security” search returned 185 files out of 1,202 in the collection. I then used analytics and viewed the “security” word in the context of the documents returned to see if I could identify other relevant terms to search that can narrow or perhaps broaden my original inquiry.

After reviewing a number of documents with “security” in context, I identified “compliance” as a term that might work to narrow my search, i.e., search within the result set. On the query analytics screen, I highlighted “compliance,” right-clicked it, and added it to my search. My other options were to analyze the word alone in the document set or search for the word by itself, without “security.” Content analytics is driven by WordNet.
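Searching within a result set, as opposed to searching for a word by itself, amounts to intersecting two sets of document hits. The sketch below shows this with a toy inverted index that maps terms to document IDs; the index structure and function name are assumptions for illustration and say nothing about how PowerSearch or WordNet work internally.

```python
def search(index, term, within=None):
    """Return the set of document IDs containing term.

    index: dict mapping a lowercase term to a set of document IDs.
    within: optional earlier result set; when given, only documents that
    are also in that set are returned (a search within results).
    """
    hits = set(index.get(term.lower(), ()))
    if within is None:
        return hits
    return hits & set(within)
```

Running `search(index, "security")` and then `search(index, "compliance", within=first_results)` narrows the original results, while `search(index, "compliance")` alone corresponds to searching for the word without “security.”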

Tagging one or more documents in search results was accomplished with the right-click of the mouse. The search result windows did not reflect recently tagged documents until I refreshed the window by “applying” a tag configuration. When I was done tagging documents to include in my collection, I configured their export from PowerSearch with file details and parent document along with folder detail.

PowerSearch is free to use for selecting, extracting, indexing and searching documents. The program uses tokens to save, export and OCR documents. PowerSearch comes with 50 free tokens. Other tokens can be purchased via PayPal starting at $0.05 per token for 1,000 tokens, with bulk discounts available.

Product Information

Manufacturer: DATAssimilate

Product: PowerSearch

Version: 5.0.0 (Beta)

Price: Free software powered by micro-payment tokens.