With the common goal of more accurately and consistently quantifying ambient concentrations of free metal ions and natural organic ligands in aquatic ecosystems, researchers from 15 laboratories that routinely analyze trace metal speciation participated in an intercomparison of statistical methods used to model their most common type of experimental dataset, the complexometric titration. All were asked to apply statistical techniques that they were familiar with to model synthetic titration data that are typical of those obtained by applying state-of-the-art electrochemical methods – anodic stripping voltammetry (ASV) and competitive ligand equilibration-adsorptive cathodic stripping voltammetry (CLE-ACSV) – to the analysis of natural waters. Herein, we compare their estimates for parameters describing the natural ligands, examine the accuracy of inferred ambient free metal ion concentrations ([Mf]), and evaluate the influence of the various methods and assumptions used on these results. The ASV-type titrations were designed to test each participant's ability to correctly describe the natural ligands present in a sample when provided with data free of measurement error, i.e., random noise. For the three virtual samples containing just one natural ligand, all participants were able to correctly identify the number of ligand classes present and accurately estimate their parameters. For the four samples containing two or three ligand classes, a few participants detected too few or too many classes and consequently reported inaccurate ‘measurements’ of ambient [Mf]. Since the problematic results arose from human error rather than any specific method of analyzing the data, we recommend that analysts should make a practice of using one's parameter estimates to generate simulated (back-calculated) titration curves for comparison to the original data. The root–mean–squared relative error between the fitted observations and the simulated curves should be comparable to the expected precision of the analytical method and upon visual inspection the distribution of residuals should not be skewed. Modeling the synthetic, CLE-ACSV-type titration dataset, which comprises 5 titration curves generated at different analytical windows or levels of competing ligand added to the virtual sample, proved to be more challenging due to the random measurement error that was incorporated. Comparison of the submitted results was complicated by the participants' differing interpretations of their task. Most adopted the provided ‘true’ instrumental sensitivity in modeling the CLE-ACSV curves, but several estimated sensitivities using internal calibration, exactly as is required for actual samples. Since most fitted sensitivities were biased low, systematic error in inferred ambient [Mf] and in estimated weak ligand (L2) concentrations resulted. The main distinction between the mathematical approaches taken by participants lies in the functional form of the speciation model equations, with their implicit definition of independent and dependent or manipulated variables. In ‘direct modeling’, the dependent variable is the measured [Mf] (or Ip) and the total metal concentration ([M]T) is considered independent. In other, much more widely used methods of analyzing titration data – classical linearization, best known as van den Berg/Ružić, and isotherm fitting by nonlinear regression, best known as the Langmuir or Gerringa methods – [Mf] is defined as independent and the dependent variable calculated from both [M]T and [Mf]. Close inspection of the biases and variability in the estimates of ligand parameters and in predictions of ambient [Mf] revealed that the best results were obtained by the direct approach. Linear regression of transformed data yielded the largest bias and greatest variability, while non-linear isotherm fitting generated results with mean bias comparable to direct modeling, but also with greater variability. Participants that performed a unified analysis of ACSV titration curves at multiple detection windows for a sample improved their results regardless of the basic mathematical approach taken. Overall, the three most accurate sets of results were obtained using direct modeling of the unified multiwindow dataset, while the single most accurate set of results also included simultaneous calibration. We therefore recommend that where sample volume and time permit, titration experiments for all natural water samples be designed to include two or more detection windows, especially for coastal and estuarine waters. It is vital that more practical experimental designs for multi-window titrations be developed. Finally, while all mathematical approaches proved to be adequate for some datasets, matrix-based equilibrium models proved to be most naturally suited for the most challenging cases encountered in this work, i.e., experiments where the added ligand in ACSV became titrated. The ProMCC program (Omanović et al., this issue) as well as the Excel Add-in based KINETEQL Multiwindow Solver spreadsheet (Hudson, 2014) have this capability and have been made available for public use as a result of this intercomparison exercise.With the common goal of more accurately and consistently quantifying ambient concentrations of free metal ions and natural organic ligands in aquatic ecosystems, researchers from 15 laboratories that routinely analyze trace metal speciation participated in an intercomparison of statistical methods used to model their most common type of experimental dataset, the complexometric titration. All were asked to apply statistical techniques that they were familiar with to model synthetic titration data that are typical of those obtained by applying state-of-the-art electrochemical methods – anodic stripping voltammetry (ASV) and competitive ligand equilibration-adsorptive cathodic stripping voltammetry (CLE-ACSV) – to the analysis of natural waters. Herein, we compare their estimates for parameters describing the natural ligands, examine the accuracy of inferred ambient free metal ion concentrations ([Mf]), and evaluate the influence of the various methods and assumptions used on these results. The ASV-type titrations were designed to test each participant's ability to correctly describe the natural ligands present in a sample when provided with data free of measurement error, i.e., random noise. For the three virtual samples containing just one natural ligand, all participants were able to correctly identify the number of ligand classes present and accurately estimate their parameters. For the four samples containing two or three ligand classes, a few participants detected too few or too many classes and consequently reported inaccurate ‘measurements’ of ambient [Mf]. Since the problematic results arose from human error rather than any specific method of analyzing the data, we recommend that analysts should make a practice of using one's parameter estimates to generate simulated (back-calculated) titration curves for comparison to the original data. The root–mean–squared relative error between the fitted observations and the simulated curves should be comparable to the expected precision of the analytical method and upon visual inspection the distribution of residuals should not be skewed. Modeling the synthetic, CLE-ACSV-type titration dataset, which comprises 5 titration curves generated at different analytical windows or levels of competing ligand added to the virtual sample, proved to be more challenging due to the random measurement error that was incorporated. Comparison of the submitted results was complicated by the participants' differing interpretations of their task. Most adopted the provided ‘true’ instrumental sensitivity in modeling the CLE-ACSV curves, but several estimated sensitivities using internal calibration, exactly as is required for actual samples. Since most fitted sensitivities were biased low, systematic error in inferred ambient [Mf] and in estimated weak ligand (L2) concentrations resulted. The main distinction between the mathematical approaches taken by participants lies in the functional form of the speciation model equations, with their implicit definition of independent and dependent or manipulated variables. In ‘direct modeling’, the dependent variable is the measured [Mf] (or Ip) and the total metal concentration ([M]T) is considered independent. In other, much more widely used methods of analyzing titration data – classical linearization, best known as van den Berg/Ružić, and isotherm fitting by nonlinear regression, best known as the Langmuir or Gerringa methods – [Mf] is defined as independent and the dependent variable calculated from both [M]T and [Mf]. Close inspection of the biases and variability in the estimates of ligand parameters and in predictions of ambient [Mf] revealed that the best results were obtained by the direct approach. Linear regression of transformed data yielded the largest bias and greatest variability, while non-linear isotherm fitting generated results with mean bias comparable to direct modeling, but also with greater variability. Participants that performed a unified analysis of ACSV titration curves at multiple detection windows for a sample improved their results regardless of the basic mathematical approach taken. Overall, the three most accurate sets of results were obtained using direct modeling of the unified multiwindow dataset, while the single most accurate set of results also included simultaneous calibration. We therefore recommend that where sample volume and time permit, titration experiments for all natural water samples be designed to include two or more detection windows, especially for coastal and estuarine waters. It is vital that more practical experimental designs for multi-window titrations be developed. Finally, while all mathematical approaches proved to be adequate for some datasets, matrix-based equilibrium models proved to be most naturally suited for the most challenging cases encountered in this work, i.e., experiments where the added ligand in ACSV became titrated. The ProMCC program (Omanović et al., this issue) as well as the Excel Add-in based KINETEQL Multiwindow Solver spreadsheet (Hudson, 2014) have this capability and have been made available for public use as a result of this intercomparison exercise.