Custody courts routinely rely upon the opinions of custody evaluators. Considering the significant impact of the custody decision on the future of the children who are the subject of these proceedings, nothing is more important than assuring that the evaluators’ opinions are in fact reliable. Not all of them are. Accordingly, evaluating the reliability of the methods that underlie the expert’s opinion is mission critical in these cases. This article will explore a current controversy pertaining to psychological testing that bears upon the reliability issue.

Reliability Is King (or Should Be)!

In 1993, the U.S. Supreme Court in Daubert v. Merrell Dow Pharmaceuticals (113 S.Ct. 2786 [1993]) rejected the unidimensional “general acceptance” approach to assessing the reliability of expert testimony that was established in Frye v. United States (293 F. 1013 [D.C. Cir. 1923]). Daubert put forth a more exacting, direct, multi-factored protocol that has since been embraced by some 40 states in one form or another. New York is not among them.

Just a few months after the Daubert decision came down, New York’s Court of Appeals decided People v. Wesley (83 N.Y.2d 417 [1994]), the seminal DNA decision. The court adhered to the Frye standard of assessing reliability by determining whether the principle or method in question has gained general acceptance in the relevant scientific community.

Frye (at 1014), in one of the most frequently quoted passages in evidentiary jurisprudence, articulated the general acceptance approach in the following terms:

Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.

As described by the Court of Appeals in Wesley, the Frye test poses the “elemental question of whether the accepted techniques, when properly performed, generate results accepted as reliable within the scientific community generally.” Wesley, at 423.

Putting aside Frye’s fatal foible that its method is akin to assessing the value of a used car based on what the salesman claims it to be, Frye had the virtue of making clear the vital distinction between “experimental” theories and empirically “demonstrable” knowledge. Only the latter is welcome in our courtrooms. In the custody context this is especially important given that the fate of children hangs in the balance. Courtrooms are not laboratories and children are not research rats scurrying about a psycho-legal maze for our amusement. Thus, an expert opinion based on anything less than empirically demonstrable knowledge should be unwelcome in the courtroom.

Controversy vs. Consensus

In Wesley (at 439), Chief Judge Judith Kaye penned an excellent concurring opinion that brought the general acceptance concept into clearer focus: it is a search for controversy. She wrote:

The point of noting controversy about the reliability of the forensic technique is not for our Court to determine whether the method was or was not reliable, but whether there was consensus in the scientific community as to its reliability. The Frye test emphasizes “counting scientists’ votes, rather than on verifying the soundness of a scientific conclusion.” Where controversy rages, a court may conclude that no consensus has been reached.

The message to the marketplace of expertise is clear: until you get your act together, don’t come knocking at the courthouse door.

Custody Evaluations: A Cauldron of Controversy

Turning now to forensic evaluations in child custody cases, one finds that there is very little that is not controversial, though one would hardly glean that by studying New York case law. Sadly, in the custody realm, there seems to be a distinct dearth of systemic interest in such trifling issues as evidentiary reliability and scientific validity. When one turns to the literature of the psychology discipline, however, controversy abounds. Everything from the unfettered impact of an evaluator’s “accumulated personal bias” masquerading as “accumulated clinical experience” (Wittmann, J.P., “Child Advocacy and the Scientific Model in Family Court: A Theory for Pre-Trial Self-Assessment,” The Journal of Psychiatry and Law, 13(1) [1985], pp. 77-78), to the unwarranted interjection of “diagnostic labels [that] are often more prejudicial than probative” (Martindale, D.A., “Diagnoses in Child Custody Evaluation Reports,” Matrimonial Strategist, 34(2) [2016]), to the lack of basis for opinions on the consequential issue of what custody outcome is in the child’s best interest (Tippins, T.M. & Wittmann, J.P., “Empirical and Ethical Problems with Custody Recommendations: A Call for Clinical Humility and Judicial Vigilance,” 43 Fam. Ct. Rev. 193 [2005]; see also Guidelines for Child Custody Evaluations in Family Law Proceedings, Section 13, American Psychological Association [2010]), has come under fire within the forensic community. High on the list of controversies is the use of psychological tests. As one leading text states, “psychological testing remains one of the most important yet controversial topics in psychology.” (Kaplan, R.M. & Saccuzzo, D.P., Psychological Testing: Principles, Applications & Issues, 8th Edition [Wadsworth, 2013], p. 21.)

The professional polemic is often as passionate as it is persistent. For example, Robyn M. Dawes, recipient of the American Psychological Association’s William James Award, offered the following trenchant advice to anyone facing a psychological evaluation:

If a professional psychologist is “evaluating” you in a situation in which you are at risk and asks you for responses to ink blots or to incomplete sentences, or for a drawing of anything, walk out of that psychologist’s office. Going through with such an examination creates the danger of having a serious decision made about you on totally invalid grounds.

(Dawes, R.M., House of Cards: Psychology and Psychotherapy Built on Myth [Free Press, 1994], pp. 152-153.)

While Dawes was speaking of projective instruments that are especially dubious due to their inherent subjectivity, structured tests, such as the Minnesota Multiphasic Personality Inventory, Second Edition (MMPI-2), are hardly immune to controversy.

In its most recent issue, the Journal of the American Academy of Matrimonial Lawyers (JAAML) published dueling articles that debate whether any psychological testing should be used in custody evaluations. (See, Garber, B.D., Simon, R.A., “Individual Adult Psychometric Testing and Child Custody Evaluations: If the Shoe Doesn’t Fit, Don’t Wear It,” JAAML, Vol. 30, No. 2 [2018]; and Rappaport, S.R., Gould, J.W., Dale, M.D., “Psychological Testing Can Be of Significant Value in Child Custody Evaluations: Don’t Buy the ‘Anti-Testing, Anti-Individual, Pro-Family Systems’ Woozle,” JAAML, Vol. 30, No. 2 [2018]).

In calling for a moratorium on the use of tests, Garber and Simon write: “Child custody evaluations that rely upon test data risk misleading the court, breaching relevant ethical rules, creating unnecessary, time-consuming and expensive legal straw-men, and doing harm to families and to the vulnerable children whose best interests the courts must serve.” (Garber & Simon, at p. 326.)

In response, Rappaport et al. posit: “Used within a multi-method approach to data gathering, psychological testing often helps evaluators develop hypotheses about the parties’ behavioral tendencies, mental health issues, and psychological functioning as they may affect parenting, parent-to-parent communication, and other custody-related areas of concern.” (Rappaport et al., at p. 405.)

Both articles make valid points, and one would expect that others in the forensic community will weigh in as well. While it is for that community to resolve the controversy, until it achieves consensus the very existence of the dispute is relevant to the question of evidentiary admissibility in a state such as New York that adheres to the Frye test. Accordingly, until the forensic community reaches consensus, custody lawyers may wish to bring Frye applications to exclude evaluations that incorporate test data.

A Point of Concordance

It is worth noting that there is a point of agreement among the authors of the aforesaid competing articles. Garber and Simon make the point that the risks associated with psychological testing are exacerbated when evaluators rely upon computerized interpretations provided by outside vendors. (Garber & Simon, at p. 327.) Rappaport et al. acknowledge the existence of “substantial professional practice and ethical concerns about relying on computer-generated interpretations.” (Rappaport et al., at p. 411.)

Not all evaluators personally score and interpret psychological test data as they were trained to do in their doctoral studies. Some instead farm the work out to commercial computerized services that provide a nicely written narrative interpretive report describing various personality traits discerned from the test data. It is not unusual to find that the custody evaluator lifts significant portions of that report verbatim and plops them into the evaluation report without quotation marks or footnotes to make clear to the reader that the interpretive statements are the product of a commercial computer program.

With or without quotation marks, an evaluator’s reliance on computerized reports presents significant hearsay issues that—when raised before a judge who takes the rules of evidence seriously—threaten admissibility of the evaluator’s conclusions. (Tippins, T.M. & DeLuca, L.K., “The Custody Evaluator Meets Hearsay: A Star-Crossed Romance,” JAAML, Vol. 30, No. 2 [2018].)

The problem here is that the computer-reliant expert is desperately treading water in a sea of ignorance. This becomes clear when one compares this practice with the process of personal interpretation.

When a psychologist personally interprets test data, he or she turns to the various manuals and treatises that provide guidance as to the meaning of that data based upon reported empirical research studies. The evaluator can be cross-examined with respect to the interpretive methodology employed, and any deficiencies in that process can be revealed. He or she can be cross-examined about the intellectual integrity of the research studies used as the basis for the conclusions drawn. Because the witness personally interpreted the data, he or she can at least explain how the conclusions were reached.

In contrast, when the evaluator relies upon computerized reports, he or she embraces conclusions that were generated by a computer program that is based upon programming algorithms (decision-making rules). These algorithms are closely guarded proprietary secrets that are unknown to the evaluator.

The evaluator reading the interpretive report does not know which precise scale scores on the test generated each descriptive statement. The evaluator does not know which research studies, if any, provided the basis for each stated conclusion. The evaluator does not know which interpretive statements came about because of specific research findings as opposed to those that were produced by someone’s unproven subjective judgment. As forensic psychologist David A. Martindale explains:

A practitioner utilizing a respected text and generating his/her own interpretive statements, can explain the bases for the statements offered and can, if called upon to do so, cite the specific pages on which pertinent information can be found. Companies that derive income from the sale of computer-generated interpretive reports protect their proprietary rights to the decision rules by which the computer programs operate. As a result, no user of an interpretive report is able to identify the data patterns that have triggered the computer to produce a particular descriptive statement.

(Martindale, D.A. [2004], “Child Custody Evaluation: Legal, Clinical, and Ethical Issues,” Presentation offered to the Department of Mental Health Law and Policy of the Florida Mental Health Institute, Tampa, FL, 10/6, 7.)

Another respected authority in the field of psychological testing in custody evaluations, James R. Flens, has written of the ethical dangers of reliance upon computer-generated interpretations. Noting that section 9.09(b) of the APA’s Ethical Principles of Psychologists and Code of Conduct requires that “[p]sychologists select scoring and interpretation services (including automated services) on the basis of evidence of the validity of the program and procedures as well as on other appropriate considerations,” Flens states:

A problem in the use of interpretive scoring programs provided by testing services is that the ethical criteria of 9.09(b) may be impossible to meet. Presently, the algorithms (i.e., the program logic and decision rules) used to generate the statements in the computer-generated test interpretations (CGTI) are proprietary secrets and not available for review by the evaluator. Therefore, it is not possible for evaluators to know how to answer important questions about how the program generates the statements found in CGTI’s.

(James R. Flens & Leslie Drozd, Psychological Testing In Child Custody Evaluations 17 [Routledge; 2005].)


The controversy concerning the use of psychological testing in custody evaluations is important. Its present status and future course should command the attention and scrutiny of bench and bar. With respect to the use of computerized interpretations, the concordance of viewpoints among testing proponents and opponents alike should be viewed as persuasive evidence of fatal infirmities that cast their forensic use into the dustbin of evidentiary exclusion.


Timothy M. Tippins is an adjunct professor at Albany Law School and is on the faculty of the American Academy of Forensic Psychology and on the Affiliate Postdoctoral Forensic Faculty at St. John’s University.