Constraining parameters in state-of-the-art marine pelagic ecosystem models – is it actually feasible with typical observations of standing stocks?



Introduction
The challenges of a warming climate and our apparent inability to significantly reduce emissions of climate-relevant species to the atmosphere fuel a discussion on geoengineering options. Among the circulating ideas is ocean fertilisation on a large scale. There, the underlying assumption is that the supply of nutrients to the sun-lit, nutrient-depleted surface ocean will increase biotic carbon sequestration away from the atmosphere (e.g., Williamson et al., 2012). However, because of the often non-linear and complex entanglements of the relevant processes, an evaluation of such approaches is not straightforward. To this end, studies like Yool et al. (2009), Dutreuil et al. (2009) and Oschlies et al. (2010), which apply numerical earth system models comprising presumably most of the fundamental feedback mechanisms, appear best suited to address pressing questions.
On the other hand, projections of earth system models are uncertain because the current generation of pelagic ecosystem models is, in contrast to the physical modules, based on empirical relationships rather than being derived from first principles.
The associated problem is two-fold. First, there is no proof that the ubiquitous, so-called N-P-Z-D (nutrient-phytoplankton-zooplankton-detritus) pelagic ecosystem models, which are at the core of most biogeochemical modules, are an admissible mathematical description of the real world (e.g., Anderson, 2005). This specifically applies to systems that are exposed to extreme conditions by a warming climate or large-scale ocean fertilisation (e.g., Löptien, 2011). Second, biogeochemical models depend on several parameters, mostly describing rates (such as, e.g., the maximum growth rate of phytoplankton), which are per se not known, even though they exert crucial control on the model behaviour (e.g., Kriest et al., 2010). The need to address the latter issue is reflected in an increasing number of studies which aim to optimize the free model parameters from observations of standing stocks (e.g., Fan and Lv, 2009; Friedrichs et al., 2006; Rückelt et al., 2010; Schartau, 2003; Spitz et al., 1998; Tjiputra et al., 2007; Hemmings and Challenor, 2012; Matear, 1995; Ward et al., 2010; Yongjin and Friedrichs, 2014). Nevertheless, it is in practice often impossible to determine an optimal parameter set (Ward et al., 2010; Schartau et al., 2001; Rückelt et al., 2010). Several conclusions were drawn from these unsuccessful parameter estimation exercises. For example, Matear (1995) concludes that the optimization problem might be underdetermined, while Fasham et al. (1995) and Fennel et al. (2001) assume that the underlying equations do not represent actual processes and conditions. Due to these manifold problems, it is common practice in the 3-dimensional coupled ocean circulation ecosystem modeling community to assume that the model equations are accurate and to determine parameters during trial-and-error exercises which aim to achieve a "reasonable" similarity with observations of standing stocks of prognostic variables. We mimic this approach by focusing on the parameter uncertainty only, while we presume that the N-P-Z-D model equations are accurate.
Major problems associated with parameter allocation in 3-dimensional coupled ocean circulation biogeochemical models are caused by the sparseness of observations in space and time (e.g., Lawson et al., 1996). Also, biases and deficiencies in the physical models aggravate the comparison to real data (Friedrichs et al., 2006; Sinha et al., 2010; Dietze and Löptien, 2013). In addition, high computational costs hinder the search for the optimal parameter set because only a limited number of sets can be tested.
In this study we put to the test the assumption that parameter estimation relying on typical observations is feasible, provided that enough computational resources are available.
We use a simple modeling framework (box model) that is computationally cheap, such that many thousands of parameter combinations can be tested. We conduct twin experiments (i.e., we construct our own genuine truth that is consistent with our model equations). By this approach, we can control the sparseness and noise levels of the (synthetic) observations used to retrieve parameter sets with optimization techniques. Previous, similar approaches of Lawson et al. (1996), Schartau et al. (2001) and Spitz et al. (1998) indicate that it should be possible to recover most model parameters when using various sampling strategies or when the synthetic observations are disrupted with white noise. Our study adds to the discussion by using an even more idealized model setup, which makes the interpretation of the results easier and saves computational cost. Further, we use reddish noise to disrupt the genuine truth, as reddish noise is more typical for oceanic conditions than the white noise considered in earlier studies (Hasselmann, 1976).
In the following section, we describe the design of the numerical experiments: Sects. 2.1, 2.2 and 2.3 present the model equations, the external forcing and the optimization techniques, respectively. Section 2.4 introduces our quantitative measure of model performance, the so-called "cost function". The underlying observations, the synthetic data sets from the twin experiments, and the real-world set are described in Sect. 2.5. Section 2.6 lists all numerical experiments. In Sects. 3 and 4 we present and discuss our results. Section 5 closes with a summary.

Model
We use an N-P-Z-D model similar to those used by Yool et al. (2009), Dutreuil et al. (2009), Oschlies et al. (2010), Oschlies and Garcon (1999), Oschlies (2002), and Franks (2002). Our configuration is simpler in that it comprises one grid box only. This box resembles the surface mixed layer at a given location. Our approach is computationally cheap and enables us to test many thousands of parameter combinations during the search for optimal parameter sets.
Our prognostic variables are, as indicated by the name N-P-Z-D, nitrate (N), phytoplankton (P), zooplankton (Z) and detritus (D). In a nutshell, phytoplankton takes up nitrate during growth fuelled by photosynthetically available radiation (PAR). If phytoplankton lacks nitrate or PAR, or both, its growth is hampered. Phytoplankton is grazed on by zooplankton. Both P and Z contribute to particulate organic matter, here called detritus (D). These contributions represent, e.g., dead plankton and fecal pellets. D is remineralized back to nitrate, which closes the cycle. An additional, faster loop from P and Z to N represents, e.g., extracellular release and "messy feeding". It might be noteworthy that reliable observations of D and Z are rare. Even so, they are typically (as in our case) explicitly represented in order to simulate the effects of sinking organic matter and as a means to mimic the (typically) rapid termination of phytoplankton spring blooms.
Our prognostic equations (Eqs. 1–4) determine the temporal evolution of nitrate (N), phytoplankton (P), zooplankton (Z) and detritus (D). They involve, among other parameters, the maximum grazing rate, g_max, and the quotient of the maximum grazing and prey-capture rates, H_Z (i.e., H_Z is g_max divided by the prey-capture rate). The second, non-linear term in the zooplankton equation (Eq. 3) is the quadratic mortality m_ZD · Z².
The behaviour of the system of differential equations (1)–(4) is determined by parameters such as maximum growth rates (e.g., µ_max in Eqs. 1 and 2) and the MM (Michaelis–Menten) parameters (also referred to as half-saturation constants) H_PAR and H_N in Eqs. (1) and (2). These parameters are generally only poorly constrained and are thus optimized. All respective model parameters are listed in Table 1.
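To make the structure of such a model concrete, the following sketch implements a generic NPZD right-hand side in Python. The functional forms and parameter values are illustrative assumptions in the spirit of the description above, not the paper's actual Eqs. (1)–(4) or Table 1 values; the symbol `eps` for the prey-capture rate is likewise an assumption.

```python
import numpy as np

# Illustrative parameter values (assumptions, not the Table 1 optima)
PARAMS = {
    "mu_max": 1.0,   # maximum phytoplankton growth rate (1/day)
    "h_par": 20.0,   # MM (half-saturation) constant for PAR (W/m^2)
    "h_n": 0.5,      # MM constant for nitrate (mmol N/m^3)
    "g_max": 1.0,    # maximum grazing rate (1/day)
    "eps": 1.0,      # prey-capture rate; H_Z = g_max / eps
    "m_pn": 0.02,    # fast loop: phytoplankton -> nitrate (1/day)
    "m_pd": 0.05,    # phytoplankton -> detritus (1/day)
    "m_zn": 0.03,    # fast loop: zooplankton -> nitrate (1/day)
    "m_zd": 0.10,    # quadratic zooplankton mortality -> detritus
    "m_dn": 0.06,    # remineralisation: detritus -> nitrate (1/day)
}

def npzd_rhs(state, par, p=PARAMS):
    """Time derivatives of N, P, Z, D (mmol N m^-3 day^-1) in a closed box."""
    N, P, Z, D = state
    # Growth limited by PAR and nitrate, both in Michaelis-Menten form
    growth = p["mu_max"] * par / (par + p["h_par"]) * N / (N + p["h_n"]) * P
    h_z = p["g_max"] / p["eps"]
    grazing = p["g_max"] * P / (P + h_z) * Z
    dN = -growth + p["m_pn"] * P + p["m_zn"] * Z + p["m_dn"] * D
    dP = growth - grazing - p["m_pn"] * P - p["m_pd"] * P
    dZ = grazing - p["m_zn"] * Z - p["m_zd"] * Z**2
    dD = p["m_pd"] * P + p["m_zd"] * Z**2 - p["m_dn"] * D
    return np.array([dN, dP, dZ, dD])
```

Because every loss term of one compartment is a source term of another, the four tendencies sum to zero: total fixed nitrogen is conserved in the closed box, as required by the setup described below.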
To simplify the search procedure during optimization we define a net growth rate µ_new (Eq. 6). By substituting µ_max by µ_new, we guide the optimization algorithm to search only among those parameter combinations which yield net phytoplankton growth. Thus, definition (6) reduces the number of required model integrations considerably. In a similar manner, we merge the maximum grazing rate (g_max in Eq. 5) with the linear zooplankton mortality (m_ZN in Eqs. 1 and 3) into g_new (Eq. 7). These definitions do not reduce the number of free parameters (because the loss rates m_PN, m_PD and m_ZN are still optimized), but they make the optimizations computationally more efficient. A side aspect is that the obvious and strong dependencies between µ_max and m_PD (and between g_max and m_ZN) are avoided.
The 10 free model parameters are listed in Table 1. To avoid unrealistic values, we restrict their allowed ranges (Table 1), following, e.g., Schartau (2003). Note that, in our terminology, "model parameters" are constants and are not to be confused with the "prognostic variables" (N, P, Z and D).

Setup, external forcing and initialization
For the sake of simplicity and low computational cost, we consider a system that is homogeneous in space, both in the horizontal and in the vertical direction. As a direct consequence, we do not have to account for advective and diffusive transport processes because their divergences, owing to the spatial homogeneity, are always zero. We thus have a closed system, and a highly idealized one, also because we do not resolve the process of detritus sinking out of the euphotic zone (which is an important process contributing to the "biological pump" of carbon in the ocean). However, this implication does not affect our rather generic conclusions, which apply also, probably even more so, to models of higher complexity (i.e., models that comprise more prognostic variables, or are coupled to more complex physics). We prescribe the photosynthetically available radiation (PAR) averaged over 24 h and over the surface mixed layer. The seasonal variation, affected by the combination of solar zenith angle, water turbidity (Löptien and Meier, 2011), cloudiness and surface mixed layer depth (Hordoir and Meier, 2012), is idealized by a sinusoid. Our PAR ranges from 2 W m−2 in winter to 50 W m−2 in summer. This range is, as regards the order of magnitude, representative for mixed layer conditions observed in the Baltic Sea (Leppäranta and Myrberg, 2009). In addition to PAR, we prescribe the total amount of fixed nitrogen in the system. Our choice of 3 to 4.5 mmol N m−3 is typical for the sun-lit surface of the Baltic Sea (e.g., Neumann and Schernewski, 2008, their Fig. 10). The combination of prescribed PAR and the total amount of fixed nitrogen is subsequently referred to as the "forcing data set". We apply two forcing data sets in this study, which we name OPTI and SENSI. OPTI is applied whenever model parameters are optimized. We will find, in some cases, that very different parameter sets yield almost identical, or at least very similar, solutions when driven by OPTI. In those cases we will test if the apparent similarity among the models is preserved when the external forcing is slightly modified, i.e., when it is changed from OPTI to SENSI. The rationale behind the design of OPTI and SENSI is:

- As idealized as possible, but yet similar to real-world conditions in the Baltic Sea.
- OPTI and SENSI should be as similar to one another as possible, yet different enough to illustrate that parameter sets that feature similar solutions when driven by OPTI may well differ substantially when driven by SENSI.
Forcing data set OPTI is shown in Fig. 1: the PAR features a steady seasonal cycle until year 9, when the summer maximum starts to decrease at a rate of 3.6 W m−2 year−1.
The nitrate inventory is set constant at 3.3 mmol N m−3 until year 5, increases at a rate of 0.2 mmol N m−3 year−1 between years 5 and 11, and is set constant at 4.5 mmol N m−3 thereafter. Both the reduction of PAR and the increase of nutrients (each by ≈ 30 %) are of a magnitude similar to changes detected during the previous decades in the Baltic Sea (Sanden and Håkansson, 1996; Kratzer et al., 2003; Sanden and Rahm, 1993). This also applies to SENSI (Fig. 1b), which is, as discussed above, very similar to OPTI. The differences between OPTI and SENSI (Fig. 1e and f) are a slightly lower nitrate inventory with a weaker trend in SENSI, an earlier PAR decrease and a modified seasonal PAR cycle, as depicted in Fig. 1c. The initial values for phytoplankton, zooplankton, detritus and nitrate in OPTI are set to 0.1, 0.1, 0.1 and 3 mmol N m−3, respectively. Note, however, that our simple model configuration is not sensitive to the initial distribution of total nitrogen among the prognostic variables. After one year, the model solutions are barely distinguishable from one another (this holds, of course, only if zooplankton and phytoplankton are initialized with values greater than zero). Hence we discard the first year of simulation in all model-data misfits presented here. Thereafter the simulation is determined by the model parameters, the external forcing, and the initial total fixed nitrogen in the system, rather than being affected by the initial distribution of nitrogen among the prognostic variables.
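The OPTI forcing described above can be sketched as follows; the exact phase of the sinusoid and the daily discretisation are assumptions, while the numerical values (2 to 50 W m−2, decline of 3.6 W m−2 year−1 from year 9, nitrogen ramp from 3.3 to 4.5 mmol N m−3 between years 5 and 11) follow the text.

```python
import numpy as np

def opti_forcing(years=15, dt_days=1.0):
    """Sketch of the idealised OPTI forcing described in the text.

    PAR: sinusoid between 2 and 50 W m-2, with the summer maximum
    declining by 3.6 W m-2 yr-1 from year 9.  Nitrogen inventory:
    3.3 mmol N m-3 until year 5, rising at 0.2 mmol N m-3 yr-1 until
    year 11, then constant at 4.5 mmol N m-3.
    """
    t = np.arange(0.0, years * 365.0, dt_days)         # time in days
    yr = t / 365.0
    par_max = 50.0 - 3.6 * np.clip(yr - 9.0, 0.0, None)  # declining summer max
    par_mean = 0.5 * (par_max + 2.0)
    par_amp = 0.5 * (par_max - 2.0)
    # winter minimum at the turn of the year (phase is an assumption)
    par = par_mean - par_amp * np.cos(2.0 * np.pi * yr)
    n_total = 3.3 + 0.2 * np.clip(yr - 5.0, 0.0, 6.0)  # caps at 4.5
    return t, par, n_total
```

The winter floor of 2 W m−2 is kept fixed while only the summer maximum declines, which matches the description of the PAR reduction in the text.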


Cost function and optimization
This study explores the requirements for an estimation of model parameters based on parameter optimization. A precondition for such an approach is the definition of a numerical measure of the misfit between the model simulations and data (i.e., synthetic or real-world observations). Defining such a measure, or metric, is not straightforward and there are various approaches, including rather sophisticated ones based on correlations and Fourier transformations (e.g., Stow et al., 2009). That said, we pragmatically choose the sum of the weighted squared differences between model and data, for no other reasons than that (1) this metric is commonly applied in the field of optimization of pelagic ecosystem models (e.g., Ward et al., 2010; Schartau et al., 2001; Rückelt et al., 2010) and (2) systematic analyses of other cost functions are beyond the scope of this study. We define the cost function J as

J = [ (1/M) Σ_{m=1..M} W_m (1/N_m) Σ_{j=1..N_m} (a_j − â_j)² ]^(1/2),   (8)

where M denotes the number of prognostic variables, N_m the number of data values (i.e., synthetic or real-world observations) of each prognostic variable, a_j a data value at time j and â_j the corresponding model result. W_m determines the weight that each data–model pair contributes to the overall cost J. It is known that different weighting strategies can yield differing optimization results (e.g., Evans, 2003). The problem is that weighting (or no weighting) adds a subjective element to the optimization process.
As we convert all prognostic variables to the same units (mmol N m−3), we make a pragmatic choice and compute the cost J with equal weights, W_m = 1 (in agreement with, e.g., Prunet et al., 1996; Stow et al., 2009), which is usually referred to as the root mean squared difference (RMSE). If N_1 = N_2 = N_3 = N_4, this approach implicitly assumes that all state variables can be measured equally well at all time steps. Note that we consider some deviations from this assumption by investigating sparse data (which is equivalent to testing a different weighting), as described in Sect. 2.5. For the sparse data experiments, we calculate the RMSE based on only a part of the synthetic truth, assuming that the measurements are restricted to certain periods. All data a_j (synthetic and real-world observations) used in this study are converted to the same units (mmol N m−3) and the noise added to disrupt the synthetic truth is of the same order of magnitude for all prognostic variables. In some cases (Figs. 3, 5 and 7) we will consider the time evolution of the model–data misfit, which is equivalent to omitting the time average in the definition of the cost function. The cost is minimized by an automated numerical optimization, known as simulated annealing (Belisle, 1992), followed by a down-gradient search (Lagarias et al., 1998).
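A minimal sketch of such an equal-weight misfit, assuming the RMSE-type normalisation described above (average over variables and time steps, then square root); the dict-based interface is an illustrative choice, not the paper's implementation:

```python
import numpy as np

def cost(model, obs, weights=None):
    """Equal-weight RMSE-type misfit over M prognostic variables.

    `model` and `obs` map variable name -> time series (mmol N m^-3);
    variables that are never observed can simply be left out of `obs`,
    which mimics the MISSING-ZD situation described later.
    """
    names = list(obs)
    w = weights or {n: 1.0 for n in names}          # W_m = 1 by default
    total = 0.0
    for n in names:
        a, a_hat = np.asarray(obs[n]), np.asarray(model[n])
        total += w[n] * np.mean((a - a_hat) ** 2)   # mean over the N_m samples
    return float(np.sqrt(total / len(names)))       # average over M variables
```

Because every variable carries the same units, a cost of, say, 0.086 mmol N m−3 can be read directly as a typical concentration mismatch.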
We provide a more detailed description in Appendix A. Note that the results of Ward et al. (2010) indicate that the choice of the optimization method is not crucial if enough model integrations can be afforded (which is the case in our computationally cheap setups).
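The two-stage strategy (a stochastic global search followed by a local refinement) can be illustrated with a bare-bones simulated annealing loop. The cooling schedule, step size and iteration count below are illustrative assumptions, and the subsequent down-gradient (Nelder–Mead) stage is omitted for brevity:

```python
import numpy as np

def simulated_annealing(cost, x0, bounds, n_iter=2000, t0=1.0, seed=0):
    """Minimise `cost` over box-bounded parameters by simulated annealing.

    A sketch only: accept downhill moves always, uphill moves with
    probability exp(-delta/t) under a simple 1/(1+k) cooling schedule.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    fx = cost(x)
    best_x, best_f = x.copy(), fx
    for k in range(n_iter):
        t = t0 / (1.0 + k)                                # cooling schedule
        cand = x + rng.normal(0.0, 0.1, size=x.size)      # random proposal
        cand = np.clip(cand, bounds[:, 0], bounds[:, 1])  # respect Table 1 ranges
        fc = cost(cand)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = cand, fc                              # accept move
        if fx < best_f:
            best_x, best_f = x.copy(), fx                 # track overall best
    return best_x, best_f
```

The stochastic acceptance step is exactly why repeated optimizations (the five-member ensembles described below) can end up in different minima when the cost surface is ambiguous.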

Data – synthetic and real-world observations
We use two sorts of data, synthetic and real-world observations (cf. Table 2). The synthetic "observations" are constructed by subsampling a model simulation (Fig. 2). This gives, in contrast to using real-world observations, full control over sampling frequency and distortion by noise (which can be added to the synthetic samples at will). This "twin experiment" approach is generally used to test optimization techniques in an idealized environment which is free of problems associated with model deficiencies and structural errors. In this study, the approach is used to explore what kind of (synthetic) data is needed to retrieve the genuine parameter set that underlies a synthetic set of "observations". More specifically, we define, a priori, a genuine truth by integrating the model with a typical set of parameters (as defined in the second column of Table 3). By subsampling this model simulation we create a synthetic set of observations. In a second step we try to retrieve the genuine parameter set from the synthetic observations by minimizing the cost calculated with Eq. (8). By adding increasing levels of synthetic noise to our genuine truth we approach, step by step, realistic conditions typical for observations, which are generally distorted by reddish noise stemming from unresolved processes such as, e.g., mesoscale dynamics. A detailed description of the noise process is given below in Sect. 2.5.
The second sort of data we use are real-world observations from the surface of the Baltic Sea at 55.15° N, 15.59° E. There are long data gaps, and this is why we merge all data into a climatological seasonal cycle. This yields a relatively homogeneous data density throughout the climatological year for nitrate and chlorophyll a measurements, available roughly every three days. We use a constant chlorophyll a to nitrate ratio of 1.59 g Chl a (mol N)−1 by implicitly assuming a Chl a : C ratio of 1/50 g Chl a (g C)−1 and a carbon to nitrogen ratio of 6.625 mol C (mol N)−1. Here, we apply this constant ratio in order to stay consistent with typical N-P-Z-D modeling approaches (e.g., Chai et al., 1995; Gunson et al., 1999; Löptien et al., 2009; Spitz et al., 1998).
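The conversion factor follows from the two assumed ratios and the molar mass of carbon (12.011 g mol−1), as a short calculation confirms:

```python
# Chlorophyll a -> nitrate conversion implied by the assumed ratios
chl_to_c = 1.0 / 50.0   # g Chl a per g C (assumed Chl a : C ratio)
c_to_n = 6.625          # mol C per mol N (Redfield-type C:N ratio)
molar_c = 12.011        # g C per mol C (molar mass of carbon)

chl_to_n = chl_to_c * molar_c * c_to_n  # g Chl a per mol N
# chl_to_n evaluates to about 1.59, the constant ratio used in the text
```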

Parameter retrieval experiments
We perform a suite of numerical experiments where we strive to retrieve model parameters. The experiments differ with respect to the underlying data base, i.e., they are based on either (1) synthetic or real-world observations as described above in Sect. 2.4, or (2) daily or intermittent sampling, or (3) samples of all prognostic variables or just of a subset, or (4) observations distorted by differing artificial noise levels. In addition, the experiments differ with respect to the number of parameters that we aim to retrieve simultaneously (by optimizing a cost function as explained in Sect. 2.3). In order to test whether the optimization algorithm got trapped in a local minimum, each experiment comprises an ensemble of five identical parameter optimizations. Whenever the algorithm has problems identifying a unique global minimum, this approach can result in five differing parameter sets due to a stochastic element in our optimization algorithm and varying initial parameter guesses (Appendix A). Note that we define the term "global minimum" as the minimum within the parameter ranges given in Table 1 and that a unique global minimum may not exist.
Table 2 lists all experiments performed. Experiment EASY is based on daily synthetic "observations" of all prognostic variables. Because no noise is added, this resembles ideal conditions never to be attained in reality. The NOISE experiments are more realistic in this respect because there we add reddish noise, which is more typical for ocean processes than the white noise considered in earlier studies (Hasselmann, 1976). This noise mimics processes such as, e.g., mesoscale dynamics, which can add considerably to the misfit between model and observations because it is hard (and may even be impossible) to resolve the non-linear effects of eddies on a one-to-one basis.
To construct reddish noise time series, we use an autoregressive model and define an AR(3) process (E_t, t = 1, ..., n). The fourth noise time series, which is added to N, has the same characteristics but is chosen to be weakly negatively dependent (r² = 0.25) on the noise of P. Typical observations are not only noisy, i.e., affected by unresolved processes. In addition, observations are generally intermittent because of the enormous financial expenses associated with open-sea measurement campaigns. This intermittency applies to time, space and also to the number of measured variables (which is predominantly a consequence of the differing grades of automation of measurements). In order to mimic the combined effect of sparse observations and noise on parameter estimation efforts, we designed the experiments MISSING-ZD, SPARSE1 and SPARSE2. MISSING-ZD is based on daily samples of nitrate and phytoplankton only (i.e., zooplankton and detritus are not sampled, as was the case in the other experiments), while in SPARSE1 and SPARSE2 it is additionally assumed that nitrate and phytoplankton are only occasionally observed. While MISSING-ZD is tested for various noise levels, SPARSE1 and SPARSE2 are based on synthetic "observations" distorted by a very modest "level 1" noise.
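The AR(3) reddish-noise construction described above can be sketched as follows; since the original coefficients are not reproduced here, the values below are illustrative assumptions chosen to give a stationary, strongly autocorrelated ("reddish") series:

```python
import numpy as np

def ar3_noise(n, phi=(0.6, 0.2, 0.1), sigma=0.05, seed=1):
    """Generate an AR(3) noise series E_t, t = 1..n.

    The coefficients `phi` (summing to 0.9 < 1 for stationarity) and the
    innovation standard deviation `sigma` are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    e = np.zeros(n + 3)  # three zero pre-samples to start the recursion
    for t in range(3, n + 3):
        e[t] = (phi[0] * e[t - 1] + phi[1] * e[t - 2] + phi[2] * e[t - 3]
                + rng.normal(0.0, sigma))
    return e[3:]
```

Unlike white noise, consecutive samples of such a series are strongly correlated, which is exactly what makes it a better stand-in for unresolved mesoscale variability.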
Figure 2 shows the respective times of sampling in the SPARSE experiments: SPARSE1 covers the autumn only, and the average number of observations is roughly seven per year. SPARSE2 is based on a larger data set which, in addition to the SPARSE1 autumn data, comprises spring data with an average sampling rate of two observations a day.
The experiments OBS10 and OBS4 are based on real-world observations (Sect. 2.4). They differ in terms of the number of simultaneously optimized parameters. OBS10 aims to optimize all parameters listed in Table 1. OBS4 is less ambitious in that it strives to constrain a subset of 4 parameters (µ_new, m_PD, m_DN, g_new) only, while, a priori, setting the other parameters to the values flagged in the last column of Table 3.

EASY
True (i.e., underlying the genuine truth simulation) and retrieved (i.e., obtained by parameter optimization based on subsampling results from the genuine truth simulation) parameter values for each experiment are presented in Table 3. All five repetitions of the optimization lead to the same solution, namely the parameter set that underlies the genuine truth simulation. We conclude that in the absence of noise, the model parameters can be retrieved by sampling the standing stocks of the prognostic variables. This applies even to monthly, instead of daily, subsampling (not shown).

NOISE
Noise, even at the very modest "level 1", prevents our optimization procedure from reliably finding a unique global minimum of the cost function. Hence, as we repeat our optimization (which contains a stochastic element, cf. Appendix A) we keep getting different results with very similar low costs (0.083–0.087 mmol N m−3). Table 3 (fourth column) shows the result of the first optimization together with the range enveloped by all five repetitions which compose a parameter retrieval experiment (as described in Sect. 2.5). It is straightforward to argue that an optimization that repeatedly yields different results is indicative of either the non-existence of a unique global minimum or of a deficient optimization algorithm that is not up to the task of identifying the global minimum.
In our case, however, the situation is more complex: we set out with a genuine truth simulation, subsample it, and add noise to the subsamples. Ideally, a parameter optimization based on these subsamples would retrieve the original parameters that underlie the genuine truth simulation. If it did, the noise added to our synthetic "observations" would induce a cost of 0.086 mmol N m−3 (as calculated with Eq. 8). Surprisingly, the optimization algorithm finds minima associated with costs lower than that (e.g., Table 3, last entry of row 3, lower value in brackets). It is hard to overstate the implications of this finding. Obviously, the noise inherent to the synthetic "observations" has opened up a multitude of local minima, some of them smaller than the minimum that is associated with the original parameters that underlie the genuine truth simulation (and which we set out to retrieve). Hence, a new global minimum, distinct from the original parameter set, must have emerged or, alternatively, the global minimum is not unique within a certain precision. In any case, the existence of this new minimum implies that the ambiguity of the minimum that is associated with the genuine truth is not caused by a deficiency of our optimization algorithm, but highlights a generic over-fitting problem associated with noisy observations.
The problem associated with retrieving the genuine truth parameter set is especially pronounced for the parameters determining phytoplankton growth (µ, H_PAR, H_N as well as m_PN) and for the assimilation efficiency of herbivores H_Z (Table 3). These parameters show substantial differences between the repeated parameter retrieval experiments. For example, the mean percentage differences for the parameters influencing phytoplankton growth (H_PAR, H_N, µ_new) are in the range of 20–60 % while, at the same time, the corresponding costs are on an equally low level.
The question arises whether it makes any difference to use the parameter set which gives a better fit to the data instead of the one underlying the genuine truth.After all, parameters that are hard to constrain may be non-influential.To explore this question we change the external forcing from OPTI to SENSI.
When applying the forcing OPTI, the first ensemble member of the retrieval experiment NOISE based on noise at "level 1" is very similar to the genuine truth, and the cost remains on an equally low level throughout the whole simulation period. Nevertheless, the model behaviour deviates considerably from the synthetic truth when the forcing is changed to SENSI. For example, in the final year of the simulation the cost increases by a factor of 10 compared to the average cost during OPTI (Fig. 3), even though the total nitrate inventory is lower in SENSI. This is of concern because, clearly, the models' sensitivities to changes in the external conditions differ considerably. In the following, we explore the impact of noise more extensively by distorting the synthetic "observations" with noise at various levels. Note that even the highest noise level considered here is still within the range of what is inherent to real-world observations. All parameter estimates show an increase of the parameter misfit with increasing noise level (Fig. 4, gray shaded areas) and, in particular, the spread among the repeated parameter retrieval experiments increases, while at all noise levels the respective ensemble members feature similar costs (Table 4). The spread among the experiments is largest for the MM parameters, and at noise level 4 their estimates are scattered all over the permitted range. Thus, very different MM parameters can lead to very similar model simulations.
This finding is confirmed by an exaggerated example in Fig. 5, which illustrates the difficulties in estimating the MM parameters on the one hand and their influence on the model's sensitivity on the other hand. We compare the genuine truth to a simulation where H_PAR and H_N are strongly increased, much more than usually permitted (H_N is increased by a factor of 4, H_PAR by a factor of 3). In a second step, µ_new (µ_new = 3.6) was chosen to match the genuine truth as closely as possible under forcing conditions OPTI. Despite the extreme changes of the MM parameters, the simulation is relatively similar to the genuine truth (Fig. 5; black line). This, however, does not apply when switching to forcing data set SENSI (red line in Fig. 5), which leads to very different model behavior. We conclude that the MM parameters are particularly hard to constrain and that their estimates depend strongly on the forcing data set used during the optimization process while, at the same time, they are key to the model's sensitivity.

The problem with NOISE
The major difficulty in estimating the MM parameters is, apparently, their strong dependency on the maximum growth rate of phytoplankton (and on one another). Hence, an increased H_PAR or H_N can be compensated to a large extent by choosing an appropriately larger value of µ_new. Figure 6a illustrates this: the curves are normalized such that all curves cross at a nutrient concentration of 4 mmol N m−3 (which corresponds to a normalization of α = (4/(4+0.5)) / (4/(4+H_N))). This illustrates that, by normalization, all curves can be (roughly) squeezed into a ±0.1 envelope around the H_N = 0.5 curve (gray shaded area). Such a compensating normalization of the actual phytoplankton growth (µ_max · PAR/(PAR+H_PAR)) for distinct MM parameters can easily be achieved by changing the maximum growth rate of phytoplankton (µ_adapt = µ_max · α) accordingly. In our setup, we find for every choice of H_N a µ_adapt such that the overall phytoplankton growth is changed by typically less than 10 % relative to H_N = 0.5.
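This compensation can be checked numerically: over the nitrate range of the forcing (3 to 4.5 mmol N m−3), rescaling the Michaelis–Menten curve by the normalisation α keeps the deviation from the H_N = 0.5 reference within roughly ±10 %, even for H_N values far from the reference. The set of H_N values tested below is an illustrative choice.

```python
import numpy as np

def mm(n, h_n):
    """Michaelis-Menten nutrient limitation N / (N + H_N)."""
    return n / (n + h_n)

# Nitrate range of the forcing data set (mmol N m^-3)
n_grid = np.linspace(3.0, 4.5, 31)
ref = mm(n_grid, 0.5)  # reference limitation curve, H_N = 0.5

worst = 0.0
for h_n in (0.1, 1.0, 1.5, 2.0):          # illustrative alternative H_N values
    alpha = mm(4.0, 0.5) / mm(4.0, h_n)   # normalise: curves cross at N = 4
    dev = float(np.max(np.abs(alpha * mm(n_grid, h_n) / ref - 1.0)))
    worst = max(worst, dev)
# `worst` stays below 0.1: within the forcing's nitrate range, a rescaled
# growth rate (mu_adapt = mu_max * alpha) largely hides a wrong H_N
```

Over a wider nitrate range the curves separate again, which is exactly why the compensation weakens when nitrate variations in the forcing are larger.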
The extent to which a 10 % change of the overall phytoplankton growth causes a deviation from the genuine truth is, naturally, dependent on the choice (or retrieval) of the other parameters. By performing Monte Carlo simulations (as described in Appendix B) we derive a measure that is representative for the whole range of parameters (as listed in Table 1): a change of ±10 % of the actual phytoplankton growth results in a mean change of the cost function of less than 8 %. Reverse reasoning implies that, on average, a precondition for detecting a change of ±10 % in the actual phytoplankton growth is a cost function that can be determined with a precision higher than ±8 %. This is, however, unrealistic given the typical noise levels inherent to observations.
In summary, we conclude that in the presence of even only modest noise, different settings of the MM parameters (within the permitted range, Table 1) cannot be distinguished from one another if the maximum growth rate of phytoplankton is changed accordingly. Note that the level of potential compensation between MM parameters and maximum phytoplankton growth depends on the forcing and becomes more effective as the range of nitrate variations decreases (Fig. 6b). This implies, in turn, that the smaller the nitrate variations in the forcing data set used for parameter estimation, the larger the difficulties in retrieving the MM parameters.


MISSING-ZD, SPARSE1, SPARSE2
Sparse data per se are, surprisingly, not problematic. As mentioned in Sect. 3.1, a variation of experiment EASY with monthly samples of standing stocks succeeded in retrieving the genuine truth parameter set by optimization. Under real-world conditions, however, sparse and irregular sampling is typically accompanied by noise, and some prognostic variables may not be observed at all (e.g., detritus, zooplankton).
The effect of unavailable observations of Z and D, in combination with noise added to N and P, is illustrated by the red shaded areas in Fig. 4. (The experimental setup corresponding to these illustrations is identical to the setup used to plot the gray shaded areas, except that zooplankton and detritus are not incorporated in the cost; experiment MISSING-ZD.) In MISSING-ZD, the estimates of most parameters worsen, and this holds especially for the retrievals of the parameters included in the zooplankton growth equation (Eq. 3). Interestingly, this does not hold for the MM parameters (and µ_new, which is not independent, as described above), where the inclusion of D and Z observations into the cost function is of no benefit for the parameter retrieval. On the contrary, it appears as if the inclusion of Z and D "dilutes" the relevant information in the cost function and adds unnecessary, and potentially misleading, information for the estimation of the MM parameters.
Another real-world problem is data that are sparse in time, such as, e.g., a seasonal bias in the number of available observations. Often, the data coverage is characterized by strong seasonal differences. The experiments SPARSE1 and SPARSE2 are designed to explore this. In both cases "level 1" noise is added. SPARSE1 is based on subsamples originating mainly from autumn. SPARSE2 uses, in addition, observations from late winter and spring (cf. Fig. 2).
A technical detail is that our optimisation algorithms seem to be more prone to converge towards a local minimum associated with relatively high cost when sparse observations are used. As a consequence, we had to discard one estimate during experiment SPARSE1 and two estimates during experiment SPARSE2 (where the cost exceeded 0.2). Apart from this technical issue, we find that SPARSE1 leads to surprisingly good parameter estimates, i.e., only slightly worse than the estimates obtained in MISSING-ZD perturbed by noise at the same level, even though the data coverage is sparse. SPARSE2, even though based on more data and generally associated with lower costs, shows no overall improvement. On the contrary, the estimated parameters related to zooplankton grazing are further away from the genuine truth parameters than in SPARSE1 (Table 3). In particular, the estimates of g_new, m_ZN and m_ZD worsen considerably (while the ensemble mean estimates of m_PN and µ_new improve). Some deviations are substantial; e.g., some retrievals of g_new exceed the genuine truth parameters by more than 2.5-fold, and some retrievals of m_ZD by more than 10-fold. These increases are not independent from one another. The parameter estimates of µ_new and g_max are, e.g., correlated at 0.5 in the repeated experiments (statistically significant at the 0.05 % level). The correlation between µ_new and m_DN is 0.96, while the correlations between g_max and m_ZD (m_ZN) are, at 0.98 (0.99), even higher.
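Such correlations can be quantified with a plain Pearson coefficient; the following sketch uses hypothetical ensemble estimates (invented for illustration, not our actual retrievals) of a compensatory pair:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical estimates of g_max and m_ZD from five repeated retrievals:
# a compensatory pair drifts up and down together across the ensemble.
g_max_est = [0.9, 1.4, 2.1, 2.6, 3.0]
m_zd_est = [0.06, 0.11, 0.19, 0.24, 0.28]
r = pearson(g_max_est, m_zd_est)
```

A coefficient near one, as here, signals that the two parameters are effectively one degree of freedom for the cost function.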
This illustrates that many parameters are not independent from one another, in addition to obvious pairs such as growth and loss terms and the ones described in Sect. "The problem with NOISE". These additional (compensatory) dependencies are a consequence of constraining parameters with standing stocks, as is common practice. Standing stocks alone do not contain information on the residence times of the "base currency" nitrogen in the prognostic components. What they do contain is (only) information on the difference between in- and outgoing fluxes. Hence, standing stocks are not necessarily affected by accelerated or decelerated nutrient cycles. This causes dependencies among, e.g., µ_new and m_DN, as described above. Figure 7 shows an example where an increase of m_DN can, as regards its effect on the cost, in large part be compensated by increasing µ_new and m_PD accordingly (even though the model structure is nonlinear). Increasing all three parameters results in a strongly accelerated N-P-D loop (up to 200 %, as indicated by the red numbers) and, at the same time, in a simulation of all prognostic variables that is very similar to the genuine truth. Another example is the optimization study (based on real-world observations) of Oschlies and Schartau (2005), who found comparable model-data misfits for two model versions which featured a factor 2.5 difference in primary production. In line with this, Friedrichs et al. (2007) report "different element flow pathways" for similar model-data misfits.
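The flux-compensation argument can be illustrated with a minimal N-P-D loop (a deliberately reduced sketch with hypothetical rates, not our full model): speeding up µ, m_PD and m_DN together leaves the steady-state standing stocks unchanged while the flux through the loop doubles.

```python
def run_npd(mu, m_pd, m_dn, h_n=0.5, total=2.0, dt=0.01, steps=200_000):
    """Integrate a minimal N-P-D loop (Euler) and return its steady state.

    dP/dt = mu * N/(N + H_N) * P - m_PD * P
    dD/dt = m_PD * P - m_DN * D
    N     = total - P - D   (nitrogen is conserved)
    """
    p, d = 0.5, 0.5
    for _ in range(steps):
        n = total - p - d
        growth = mu * n / (n + h_n) * p
        p += dt * (growth - m_pd * p)
        d += dt * (m_pd * p - m_dn * d)
    n = total - p - d
    return n, p, d, m_dn * d        # standing stocks and remineralisation flux

slow = run_npd(0.65, 0.15, 0.08)                 # hypothetical rates (per day)
fast = run_npd(2 * 0.65, 2 * 0.15, 2 * 0.08)     # the whole loop sped up twofold
```

Identical standing stocks with a doubled remineralisation flux: an observation of the stocks alone cannot tell the two configurations apart.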

The problem with SPARSE
The comparison of the parameter retrieval experiments SPARSE1 and SPARSE2 illustrates that the timing of the observations is relevant, because not all times of the year contain information for the estimation of all parameters: SPARSE2 contains predominantly spring information, which improves the estimates of some phytoplankton growth parameters. For all other parameters, however, the strong seasonal bias in SPARSE2, i.e., the underrepresentation of the second half of the year in the cost, seriously hampers their successful retrieval. Figure 8 illustrates this in detail. Apparently, the periods that contain information about the limiting effects of light and nutrients, i.e., about the MM parameters, are very short. The "information-containing" period for H_PAR (dark yellow patch in Fig. 8) starts in spring, when the surface mixed layer shoals to less than the "critical depth" (Sverdrup, 1953). It ends as nutrient limitation kicks in (which at the same time marks the start of the "information-containing" period for H_N). Note that the changes in PAR covered by the dark yellow patch are only a fraction of the full amplitude of the seasonal cycle. The "information-containing" period for H_N (dark brown patch in Fig. 8) ends when the dominant control is exerted by top-down-controlling Z. Likewise, the information content for the other parameters is not equally distributed throughout the year. For example, the information about the zooplankton growth parameters is most pronounced in summer, when high abundances prevail. Obviously, loss rates are hard to assess during times when the respective prognostic variables are close to the limit of detection.
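The shortness of the "information-containing" period for H_N can be sketched with an idealized seasonal nitrate cycle (hypothetical values): the relative sensitivity |dg_N/d ln H_N| = H_N N/(N + H_N)^2 is appreciable only while N is of the order of H_N, i.e., during the brief drawdown and recovery transitions.

```python
H_N = 0.5  # assumed half-saturation constant, mmol N m-3

def nitrate(day):
    """Idealized seasonal surface nitrate cycle (hypothetical values)."""
    if day < 90:                                       # winter: replete
        return 8.0
    if day < 150:                                      # spring drawdown
        return 8.0 + (0.05 - 8.0) * (day - 90) / 60.0
    if day < 300:                                      # summer: depleted
        return 0.05
    return 0.05 + (8.0 - 0.05) * (day - 300) / 65.0    # autumn/winter recovery

def rel_sensitivity(n, h):
    """|d g_N / d ln H_N| for g_N = N / (N + H_N)."""
    return h * n / (n + h) ** 2

informative = [d for d in range(365) if rel_sensitivity(nitrate(d), H_N) > 0.1]
frac = len(informative) / 365.0
```

Only a small fraction of the year carries H_N information; during the long replete and depleted plateaus the sensitivity is negligible.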

OBS10, OBS4
Real-world observations are typically sparse as regards time, space and sampled variables. In addition, they are noisy. Hence, by using real-world observations we face a combination of the problems described in Sects. 3.2 and 3.3. We use quality-checked data with unusually high data coverage, but data gaps still exist and the noise level is considerable (Sect. 2.4). The typical noise level inherent to our real-world observations, approximated by the SD of observed P from July to December, amounts to 0.4 mmol N m−3, which corresponds to "level 3-4" noise as defined in Sect. 2.5. Since we now use real-world observations (and no genuine truth), the "true" parameter set is unknown, and we compare different parameter estimation strategies with one another. From the lessons learned above, we expect a similar model performance when estimating all free parameters simultaneously (OBS10) and when estimating only the subset µ, m_PD, m_DN and g_new (OBS4), because: (i) the MM parameters and H_Z cannot be constrained in the presence of noise (Sect. 3.2), i.e., one choice is as good as another, given that the remaining parameters are adjusted accordingly; (ii) the effects of changes in µ_max can be compensated by a similar change of m_PN, and hence, given the noise level and probable model deficiency, they cannot be constrained simultaneously; and (iii) the observational data do not contain information about Z and D, and we thus expect that prescribing the loss rates of Z (m_ZD and m_ZN) will not worsen the model-data misfit considerably.
Given the smaller number of degrees of freedom in OBS4, we expect in addition that the global minimum will be easier to detect. We find that indeed all expectations hold: when optimizing 10 parameters, repeated parameter retrievals lead to different parameter estimates associated with almost equal cost, while for OBS4 repeated optimization leads to strikingly similar parameter estimates. Figure 9 shows the simulations based on the "best" parameter estimates of OBS10 and OBS4, corresponding to costs of 0.764 and 0.783 mmol N m−3, respectively. Because the difference between the two simulations is much smaller than the misfit of each simulation to the (noisy) observations, we conclude that it is impossible to judge which parameter set performs better.

Discussion
All attempts to constrain all parameters of state-of-the-art pelagic ecosystem models simultaneously with in situ data have, as far as we know, been to no avail so far (e.g., Ward et al., 2010; Schartau, 2003; Matear, 1995; Spitz et al., 1998; Rückelt et al., 2010). We can, admittedly, not rule out that the definition of the model-data misfit (cost) is at the core of these problems, and this has to be investigated in studies to come. Nevertheless, satisfactory results for cost functions other than the one applied in this study appear unlikely, because the above-mentioned studies already comprise a wide range of cost functions and, even so, report similar problems. Among the conclusions generally drawn from these unsuccessful parameter estimation exercises are: the optimization problem is underdetermined (Matear, 1995), or even ill-posed because the underlying equations do not represent actual processes and conditions (Fasham et al., 1995; Fennel et al., 2001). This study adds to the discussion in that, by design, we can disentangle some of the potential problems: by using "twin experiments" (or, in other words, a subsampled synthetic "truth" rather than real-world observations), we can rule out the effects of a potentially deficient model formulation. Hence, we know that our problem is not ill-posed as regards the underlying equations, non-resolved processes, or uncertainties in the external forcing and boundary conditions other than those we willfully introduced. (Note that, even so, no unique solution may exist.) Further, the twin experiment approach gives us full control over the "observations", i.e., we can adjust the sampling (with respect to both variables and time) and the noise inherent to our "observations" at will. The advantages of this approach have been appreciated in previous studies already (e.g., Friedrichs, 2001; Gunson et al., 1999; Lawson et al., 1996; Schartau et al., 2001; Spitz et al., 1998). One major difference here, however, is the usage of reddish noise. Previous studies focused on data sampling and either neglected noise altogether or used white noise to mimic errors in the observational data. The difference in the noise structure is essential, probably because red noise imposes variability on the time scales (days, seasonal to interannual) relevant for biogeochemistry, rather than on the high frequencies (which we do not set out to model in the first place). This disrupts parameter estimation far more than white noise, which is in line with the findings of Friedrichs (2001), who rates systematic biases as much more detrimental than the presence of white noise. Note that the definition of noise is ambiguous. Generally, one would define noise as a random, unbiased modification of the genuine truth, effected by, e.g., limited measurement accuracy. In parameter optimization based on real-world observations, however, the definition of noise is often broadened and refers to the combined effects of all unresolved processes that can cause deviations between simulated and observed values. These unresolved processes comprise, e.g., the effects of uncertainties in the external forcing or boundary conditions that drive the model, as well as unresolved mesoscale processes in the ocean.
The amplitude and structure that come along with this broadened definition of noise are hard to assess, and we are not aware of any studies giving guidance on this. For the time being we assume reddish noise, which is typical for ocean processes (Hasselmann, 1976). As regards typical noise amplitudes, we find that the median of the relative standard error of surface nitrate in the global monthly climatology of Garcia et al. (2010) is 20 %.
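For reference, reddish noise of this kind can be generated as a first-order autoregressive (AR(1)) process; the sketch below is a generic construction (the coefficient rho = 0.9 and the 20 % standard deviation are illustrative choices, not our exact noise generator):

```python
import math
import random

def red_noise(length, rho=0.9, sd=0.2, seed=1):
    """AR(1) ('reddish') noise: x_t = rho * x_(t-1) + innovation.

    The innovation variance is chosen such that the stationary standard
    deviation equals `sd` (e.g. 20 % when applied as relative noise).
    """
    rng = random.Random(seed)
    innov_sd = sd * math.sqrt(1.0 - rho ** 2)
    x, out = 0.0, []
    for _ in range(length):
        x = rho * x + rng.gauss(0.0, innov_sd)
        out.append(x)
    return out

series = red_noise(5000)
mean = sum(series) / len(series)
lag1 = (sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
        / sum((a - mean) ** 2 for a in series))
```

The lag-one autocorrelation of the generated series recovers rho, i.e., the noise is persistent on exactly the short time scales that white noise leaves untouched.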
Returning to our experiments, we find that even a fraction of these typical noise levels already prevents any meaningful parameter retrieval, as illustrated in Sect. 3.2. The reason is that noise at this level can open up "spurious" minima which are associated with a cost lower than the cost associated with the genuine truth simulation. Such "spurious" minima are often related to parameter values very different from the genuine truth and might either be distortions of the minimum related to the genuine truth or open up in addition. Note that such difficulties were not reported by earlier studies using twin experiments disrupted by white noise. We conclude that the structure of the noise is relevant.
The parameter set associated with a spurious minimum may well imply differing sensitivities when the external forcing is changed (Sect. 3.2, Fig. 3), and the overall behaviour of such a model is reminiscent of extrapolating with an overfitted polynomial. In agreement with the early supposition of Matear (1995), who used real data and calculated the error-covariance matrix (via inversion of the Hessian matrix), we can relate major problems back to parameter dependencies. Here we broaden the common definition of "parameter dependencies" and refer to changes in a certain parameter that can, in large parts, be compensated by changing other parameters accordingly. Such dependencies are reflected by high correlations of some parameter estimates in repeated runs. Note, however, that the level of compensation depends on the forcing conditions.
To our knowledge, however, there is no technique to examine all dependencies in a way that is both formal and useful for the task at hand. Typical approaches, such as correlation analysis or those based on analyzing the Hessian matrix (e.g., Fennel et al., 2001; Kidston et al., 2011), have their limitations in our context, because the dependencies are manifold and include up to four-way parameter interactions. We thus have to rely on the model structure and on illustrative examples to demonstrate dependencies. Beside the obvious potential compensations between growth and loss terms, such dependencies occur, e.g., in the growth term of phytoplankton (and similarly in the growth term of zooplankton). Other strong dependencies are a consequence of optimizing the model parameters with standing stocks. Since standing stocks are not necessarily affected by accelerated or decelerated nutrient cycles, all loops in the model structure contain strong parameter dependencies (Fig. 7).
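As an illustration of the Hessian-based approach and of its outcome for a compensatory pair, the following toy calculation (not our actual analysis) fits an MM-type curve to hypothetical twin "observations" from a narrow nitrate range and derives the parameter correlation from the inverse of a finite-difference Hessian of the cost:

```python
import math

def cost(a, b, xs, ys):
    """Quadratic model-data misfit for an MM-type curve a * x / (x + b)."""
    return sum((a * x / (x + b) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical twin setup: 'observations' from a narrow nitrate range only.
xs = [1.8, 1.9, 2.0, 2.1, 2.2]
a0, b0 = 1.0, 0.5             # genuine truth (maximum rate, half-saturation)
ys = [a0 * x / (x + b0) for x in xs]

h = 1e-3

def second_derivative(i, j):
    """Central finite-difference Hessian entry of the cost at the truth."""
    def j_at(da, db):
        return cost(a0 + da, b0 + db, xs, ys)
    if i == j:
        da, db = (h, 0.0) if i == 0 else (0.0, h)
        return (j_at(da, db) - 2.0 * j_at(0.0, 0.0) + j_at(-da, -db)) / h ** 2
    return (j_at(h, h) - j_at(h, -h) - j_at(-h, h) + j_at(-h, -h)) / (4.0 * h ** 2)

hess = [[second_derivative(0, 0), second_derivative(0, 1)],
        [second_derivative(1, 0), second_derivative(1, 1)]]
det = hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0]
cov = [[hess[1][1] / det, -hess[0][1] / det],     # covariance ~ inverse Hessian
       [-hess[1][0] / det, hess[0][0] / det]]
corr = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
```

The near-unity correlation confirms the pairwise compensation, but such a two-by-two analysis cannot capture the three- and four-way interactions discussed above.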
A large part of the problems associated with the parameter retrievals in the presence of noise can be traced back to the Michaelis-Menten formulations, which determine most of the model's sensitivity to the external forcing (Sect. 3.2, Fig. 5). This finding is in agreement with Friedrichs et al. (2006), who point out that the half-saturation constant for nutrients appears highly correlated with the maximum phytoplankton growth rate. Surprisingly, we find that even an extraordinarily well sampled, full seasonal cycle of the prognostic variables does not contain enough information to constrain the sensitivity of an N-P-Z-D model such that it can unambiguously project system dynamics in, e.g., a warming world. In a nutshell, the reason is that the periods predominantly controlled by actual nutrient- and/or light-depleted conditions are short (Fig. 8). Generally, these periods are not ended by replete conditions but by other dominating processes, such as, e.g., top-down control by zooplankton. This means that only information from a limited range of conditions (corresponding to the short period) is actually contained in the seasonal cycle, and this limited range is not enough to constrain the system's behaviour, e.g., under anticipated climate change. An additional (somewhat related) problem is that a realistic simulation of standing stocks (i.e., within the bounds set by typical observational errors) does not ensure a correct cycling speed among the prognostic variables (Fig. 7). Hence, more sophisticated observations, such as rates or fractionating isotopes, are mandatory for constraining fluxes or transfer rates among the prognostic variables.

Summary and conclusions
To date, parameters associated with biogeochemical pelagic models, coupled to 3-D ocean circulation models, are, more often than not, assigned by rather subjective trial-and-error exercises. It seems straightforward to assume that automatic numerical optimization procedures, which minimize some quantitative measure of the deviation between observations and their simulated equivalents, are more objective and thus represent a reliable procedure for parameter allocation. In the past, such an approach was barred by excessive computational demands. Recent advances in so-called offline approaches (Khatiwala, 2007, 2008; Kriest et al., 2012) and optimization procedures (Prieß et al., 2013) have greatly reduced the demands and pushed the task within reach. However, per se, it is unclear how far such approaches will carry, even if the underlying model equations were exact. Our study provides some insight by applying optimization procedures which minimize a generic cost function (root mean squared errors) based on a synthetic set of "observations", produced by sampling a "genuine truth" simulation at specific times. Differing levels of artificial noise were added to the synthetic "observations" to mimic typical real-world conditions. By mimicking sampling strategies and the noise inherent to the observations, we systematically explored what kind of observations are required to retrieve the parameters of the "genuine truth" simulation by parameter optimization. This "twin experiment" approach (as it is often referred to) gives guidance on the question of what kind of information, or which model parameters, can be extracted from observations. The caveat here is that we implicitly assume that the underlying mathematical equations are exact, which is certainly an overoptimistic assumption, since the equations are not derived from first principles (cf. Smith et al., 2009).
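The generic cost used in such a twin setup is, in essence, the following (a schematic sketch with an idealized one-variable "truth"; the actual experiments use all prognostic variables and the noise models of Sect. 2.5):

```python
import math
import random

def rmse(sim, obs):
    """Root mean squared error: the generic cost of the twin experiments."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

# Hypothetical genuine truth: an idealized seasonal cycle of one standing stock.
truth = [1.0 + 0.5 * math.sin(2.0 * math.pi * day / 365.0) for day in range(365)]

sample_days = list(range(0, 365, 30))      # roughly monthly sampling
rng = random.Random(0)
obs = [truth[d] * (1.0 + rng.gauss(0.0, 0.1)) for d in sample_days]  # ~10 % noise

# Even the genuine truth no longer attains zero cost once noise is added,
# so another parameter set may well undercut it.
cost_truth = rmse([truth[d] for d in sample_days], obs)
```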
Our exercises suggest that monthly observations of all prognostic variables are sufficient to retrieve the actual correct model parameters. However, this does not hold if the observations are degraded by noise. Even modest noise levels (≈ 10 %) can already lead to minima in the cost function which are associated with a cost lower than that of the genuine truth simulation. The implication is that, in the presence of noise, the optimal parameter set in terms of cost is not necessarily the correct one. This is of concern because we find that, although the optimal solution and the genuine truth are similar under the given external forcing, they can feature very divergent behavior once the forcing is adjusted within the envelope of, e.g., anticipated changes in the Baltic Sea. Most of this behavior is apparently caused by the hyperbolic Michaelis-Menten (MM) formulation that is, although controversial (Smith et al., 2009), commonly used to describe nutrient and light limitation of phytoplankton growth (Fasham et al., 1993; Yool et al., 2009; Dutreuil et al., 2009; Oschlies et al., 2010).

Other than noise, we find that typical characteristics of real-world observations, such as irregular sampling in time or the absence of observations of simulated prognostic variables (e.g., zooplankton and detritus), also seriously hamper attempts to retrieve all model parameters simultaneously. An exception to the latter rule are the MM parameters, which are apparently easier to constrain when zooplankton and detritus observations are not part of the cost function. Hence, more data are not necessarily better data. The inclusion of more data into the cost function might even degrade the ability to constrain certain parameters.
As regards real-world conditions (where only noisy and irregularly sampled data of some prognostic variables are available to assemble some quantitative measure of model performance), our findings suggest a sequential, two-step approach, starting with an estimation that focuses on the MM constants, and taking care that the measure of model performance contains relevant information only. This applies both to the sampled variables and to the sampling times, and it should prevent unnecessary "dilution" of information, which is vital because the parameter estimation is so sensitive to noise. For example, it does not make sense to constrain MM constants, which determine growth limitation, with data characterized by a period of plenty or with observations of D and Z. Further, the larger the range of sampled conditions, the higher the probability to retrieve meaningful model parameters. A cross-validation with a second, independent data set as a last step can increase the confidence in the model further (Gregg et al., 2009).
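The mechanics of such a sequential strategy can be sketched on a toy MM fitting problem (hypothetical, noise-free "observations"; a plain grid search stands in for the actual optimization algorithms):

```python
def mm(n, mu, h_n):
    """Michaelis-Menten limited growth/uptake."""
    return mu * n / (n + h_n)

# Hypothetical noise-free toy 'observations' spanning depleted conditions.
mu_true, hn_true, mpd_true = 0.8, 0.5, 0.1
ns = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
uptake_obs = [mm(n, mu_true, hn_true) for n in ns]
net_obs = [u - mpd_true for u in uptake_obs]      # uptake minus linear loss

# Step 1: constrain the MM constant (together with mu) from data that
# actually contain nutrient-limitation information.
best = (float("inf"), 0.0, 0.0)
for i in range(61):
    for j in range(91):
        mu, hn = 0.5 + 0.01 * i, 0.1 + 0.01 * j
        c = sum((mm(n, mu, hn) - u) ** 2 for n, u in zip(ns, uptake_obs))
        if c < best[0]:
            best = (c, mu, hn)
_, mu_est, hn_est = best

# Step 2: hold the MM parameters fixed and estimate the remaining loss rate.
best2 = (float("inf"), 0.0)
for k in range(51):
    mpd = 0.01 * k
    c = sum((mm(n, mu_est, hn_est) - mpd - g) ** 2 for n, g in zip(ns, net_obs))
    if c < best2[0]:
        best2 = (c, mpd)
mpd_est = best2[1]
```

The point of the sketch is the ordering, not the optimizer: only data spanning depleted conditions enter step 1, and the loss rate is estimated afterwards with the MM constants held fixed.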
Our experiments with real-world observations imply that the other parameters may be estimated in a second step, during which the MM parameters are held constant at a priori values. This approach is seemingly in line with earlier studies which suggest estimating only a subset of the model parameters rather than all at once (e.g., Friedrichs et al., 2006; Kidston et al., 2011). One downside of such an approach is that generally unknown parameters have to be given assumed values before optimization, while prescribing incorrect values for a subset of the unknown model parameters prevents the correct estimation of all dependent parameters (which is particularly problematic for the MM parameters). This approach might thus result in a satisfactory fit to the observations, while it does not guarantee a correct model behaviour once the external forcing conditions of the model are changed. Nevertheless, such an approach can be reasonable, e.g., when comparing different models for hypothesis testing.

Appendix A: Optimization algorithm
Simulated annealing is a stochastic global optimization method. This meta-heuristic, adopted from annealing in metallurgy, is designed to find the global optimum in a large parameter space. Because of its design, it is less prone to end up in a local optimum than are, e.g., gradient search algorithms. The method is described in detail by, e.g., Kirkpatrick et al. (1983) and was successfully applied to optimize biogeochemical models by, e.g., Matear (1995). We use an initial "temperature" of 100 for each dimension, which is reduced iteratively by multiplication with 0.95. After 200 iterations the temperature is raised to a higher value (reannealing), the intention being to restart the search and leave a local minimum. We use three parameter sets as initial guesses: (1) µ_new = 0.65, H_PAR = 20, H_N = 0.6, g_new = 0.2, H_Z = 0.7, m = 0.08, m_ZN = 0.15, m_DN = 0.08, m_ZD = 0.17, m_PD = 0.15; (2) µ_new = 0.9, H_PAR = 5, H_N = 0.35, g_new = 0.8, H_Z = 0.6, m = 0.02, m_ZN = 0.03, m_DN = 0.12, m_ZD = 0.08, m_PD = 0.3; and (3) µ_new = 0.75, H_PAR = 35, H_N = 0.1, g_new = 0.2, H_Z = 0.9, m = 0.1, m_ZN = 0.2, m_DN = 0.03, m_ZD = 0.3, m_PD = 0.2. Note that these initial guesses are randomized by the non-deterministic simulated annealing algorithm. After 10 000 iterations we switch, consecutively, to the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method in order to refine the parameter set retrieved by simulated annealing. BFGS is an approximation of Newton's method and belongs to the class of hill-climbing optimization techniques that seek a stationary point of a (preferably twice continuously differentiable) function. A necessary condition for a minimum is a gradient of zero. The algorithm does not necessarily converge unless the function has a quadratic Taylor expansion near the optimum. It uses first and second derivatives. Note that the Hessian matrix of second derivatives does not need to be evaluated explicitly but is usually approximated by rank-one updates specified by gradient evaluations (or approximate gradient evaluations).
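A compact sketch of the annealing stage (with a toy cost surface and a simplified proposal step; the actual implementation operates on the ten-dimensional model-data misfit and uses the temperatures and iteration counts given above) reads:

```python
import math
import random

def toy_cost(p):
    """A toy cost surface with local minima (stand-in for the model-data misfit)."""
    x, y = p
    return ((x - 1.0) ** 2 + (y + 0.5) ** 2
            + 0.3 * math.sin(5.0 * x) ** 2 + 0.3 * math.cos(4.0 * y) ** 2)

def simulated_annealing(cost, start, t0=100.0, cooling=0.95,
                        reanneal_every=200, iters=5000, seed=42):
    rng = random.Random(seed)
    p, c = list(start), cost(start)
    best_p, best_c = list(p), c
    t = t0
    for k in range(1, iters + 1):
        cand = [v + rng.gauss(0.0, 0.1 * (1.0 + t / t0)) for v in p]
        cc = cost(cand)
        # Always accept improvements; accept deteriorations with a
        # temperature-dependent (Metropolis) probability.
        if cc < c or rng.random() < math.exp(-(cc - c) / t):
            p, c = cand, cc
            if c < best_c:
                best_p, best_c = list(p), c
        t *= cooling                     # geometric cooling schedule
        if k % reanneal_every == 0:      # reannealing: reheat to escape local minima
            t = 0.5 * t0
    return best_p, best_c

best_p, best_c = simulated_annealing(toy_cost, [5.0, 5.0])
```

In the actual procedure, the best point found this way is handed to BFGS for local refinement.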

Appendix B: Monte Carlo simulations
In order to assess the effect of a ±10 % change in phytoplankton growth on the cost (see Sect. "The problem with NOISE"), we perform a set of Monte Carlo simulations. This approach is necessitated by the fact that the sensitivity towards changes of phytoplankton growth is determined by the combination of all parameters. We randomly chose 600 parameter sets within the bounds of Table 1 and use forcing data set OPTI. In a second step these parameter sets are modified such that they feature a ±10 % change in phytoplankton growth. Finally, we integrate the 3 × 600 configurations. The difference in cost between the +10 % and the −10 % configuration, relative to the unperturbed one, averaged over all 600 triples, yields the sensitivity we set out to determine.
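The procedure can be sketched as follows (with a deliberately simplistic stand-in for the model and the cost; the bounds and diagnostic are hypothetical, not those of Table 1):

```python
import random

def summer_p(mu, h_n, m_pd, n=0.3):
    """Toy diagnostic standing stock: positive net growth mapped to P.

    A stand-in for integrating the full model under forcing OPTI.
    """
    return max(0.0, mu * n / (n + h_n) - m_pd)

def cost(p_sim, p_obs=0.2):
    """Toy model-data misfit against a hypothetical observed value."""
    return abs(p_sim - p_obs)

rng = random.Random(7)
diffs = []
for _ in range(600):                      # 600 random parameter sets, as in Appendix B
    mu = rng.uniform(0.2, 1.5)            # hypothetical bounds (stand-ins for Table 1)
    h_n = rng.uniform(0.1, 1.0)
    m_pd = rng.uniform(0.01, 0.3)
    base = cost(summer_p(mu, h_n, m_pd))
    up = cost(summer_p(1.1 * mu, h_n, m_pd))      # +10 % phytoplankton growth
    down = cost(summer_p(0.9 * mu, h_n, m_pd))    # -10 % phytoplankton growth
    diffs.append(0.5 * (abs(up - base) + abs(down - base)))

sensitivity = sum(diffs) / len(diffs)     # average cost change over all triples
```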

Figure 4. The y axis limits match the associated parameter range explored (Table 1). The straight black dashed line refers to the original parameter values (that underlie the genuine truth). The black line refers to the ensemble mean retrieved by optimization. The gray shaded areas depict the ranges enveloped by all five ensemble members that compose a parameter retrieval experiment at each noise level (NOISE). In all cases shown here the external forcing is OPTI. The red lines and red shaded areas are similar to the solid black line and gray shaded area, except that they refer to MISSING-ZD (as described in Sect. 2.5).

Full Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 4) All prognostic variables are scaled to units mmol Nm −3 .g I = PAR PAR+H PAR and g N = N N+H N are the hyperbolic MM equations describing the limiting effect of light (here PAR refers to photosynthetically available radiation averaged over 24 h and the surface mixed layer) and nitrate on nitrate uptake (Eq. 1) by phytoplankton (Eq.2), respectively.The terms m PN • P, m ZN • Z and m DN • D in Eq. (1) represent linear mortality of phytoplankton, zooplankton and remineralisation of detritus, respectively.The zooplankton equation (Eq. 3) comprises two nonlinear terms: first, the "Holling III-type" term G(P) = g max Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 3 Results Discussion Paper | Discussion Paper | Discussion Paper | low as 0.083 mmol Nm −3 , Table Discussion Paper | Discussion Paper | Discussion Paper | illustrates this compensation.It shows various MM curves describing nutrient limitation for various half-saturation constants, H N .Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | parameter retrievals lead to different parameter estimates associated with almost equal Discussion Paper | 
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 3.2.The reason is that noise at this level can open up "spurious" minima which are associated with a cost lower than the cost associated with the genuine truth simulation.Such Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | PAR = 5, H n = 0.35, g new = 0.8, H z = 0.6, m = 0.02, m ZN = 0.03, m DN = 0.12, m ZD = 0.08, m PD = 0.3 and (3): µ new = 0.75, H PAR = 35, H n = 0.1, g new = 0.2, H z = 0.9, m = 0.1, m ZN = 0.2, m DN = 0.03, m ZD = 0.3, m PD = 0.2) randomly as initial guesses.Note, that these initial guesses are randomized by the non-deterministic algorithm simulated annealing.After 10 000 iterations we switch, consecutively, to the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method in order to refine the parameter set retrieved by simulated annealing.BFGS is an approximation of Newton's method and belongs to the class of hill-climbing optimization techniques that seeks a stationary point of a (preferably twice continuously differentiable) function.A necessary condition for a minimum is a gradient of zero.The algorithm does not necessarily converge unless the function has a quadratic Taylor expansion near the optimum.It uses the first and second derivatives.Note, that the Hessian matrix of second derivatives does not need to be evaluated Introduction Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Rückelt, J., Sauerland, V., Slawig, T., Srivastav, B., Ward, C., and Patvardhan, C.: Parameter optimization and validation of a marine biogeochemical model using a hybrid algorithm, Discussion Paper | Discussion Paper | Discussion Paper | Ward, B. A., Friedrichs, M. A. 
M., Anderson, T. A., and Oschlies, A.: Parameter optimisation techniques and the problem of underdetermination in marine biogeochemical models, J. Mar.

Figure 1. (a, b) Forcing OPTI: (a) shows the photosynthetically available radiation (PAR) averaged over 24 h and over the surface mixed layer, and (b) the total fixed nitrogen in the system. (c, d) As (a, b), but for SENSI. (e) PAR difference, SENSI − OPTI. (f) Difference in total nutrient content, SENSI − OPTI.
Figure 5. Temporal evolution of the model-data misfit (Eq. 9). The model-data misfit measures the difference between the genuine truth and a simulation in which the "truth" parameter set (Table 3) is modified by multiplying H_N, H_PAR and µ by 4, 3 and 3.6, respectively. The black (red) line refers to simulations driven by OPTI (SENSI).

Figure 6. Normalized Michaelis-Menten curves (α N/(N + H_N)) for various half-saturation constants H_N, as indicated in the figure legends. The normalization constant α is chosen such that all curves cross at the same nutrient concentration. Panels (a) and (b) feature nutrient crossover concentrations of 4 and 0.3 mmol N m−3, respectively. The shaded areas envelope the H_N = 0.5 curve and are obtained by adding a constant value of ±0.1 to this curve.

Table 1. Model parameters and associated ranges explored in this study.

Table 2. Parameter retrieval experiments. The experiments are based on different sets of data (Sects. 2.4, 2.5) and aim to retrieve all of the model parameters listed in Table 1 simultaneously, or a subset (µ, m_PD, m_DN, g_new).

Table 3. Genuine truth parameter set (column 2) and estimates (columns 3-8) obtained by the parameter retrieval experiments described in Table 2. The values in columns 3 to 8 correspond to the first ensemble member of each retrieval experiment. In brackets we list the range enveloped by all ensemble members.

Table 4. Costs [mmol N m−3] (range enveloped by all five ensemble members) for differing noise levels in the experiments NOISE. Two outliers were discarded.