Ocean data assimilation systems encompass a wide range of scales that are difficult to control simultaneously using partial observation networks. Not all scales are observable by every observation system, which is not easily taken into account in current operational ocean systems. The main reason for this difficulty is that the error covariance matrices are usually assumed to be local (e.g. using a localisation algorithm in ensemble data assimilation systems), so that the large-scale patterns are removed from the error statistics.

To better exploit the observational information available for all scales in the assimilation systems of the Copernicus Marine Environment Monitoring Service, we investigate a new method to introduce scale separation in the assimilation scheme.

The method is based on a spectral transformation of the assimilation problem and consists in carrying out the analysis with spectral localisation for the large scales and spatial localisation for the residual scales. The target is to improve the observational update of the large-scale components of the signal by an explicit observational constraint applied directly on the large scales and to restrict the use of spatial localisation to the small-scale components of the signal.

To evaluate our method, twin experiments are carried out with synthetic
altimetry observations (simulating the Jason tracks), assimilated in a

Results show that the transformation to the spectral domain and the spectral localisation provide consistent ensemble estimates of the state of the system (in the spectral domain or after backward transformation to the spatial domain). Combined with spatial localisation for the residual scales, the new scheme is able to provide a reliable ensemble update for all scales, with improved accuracy for the large scales, and the performance of the system can be checked explicitly and separately for all scales in the assimilation system.

Over the last decades, the spectral window of the oceanic processes observed
from space has steadily increased. At the same time, model resolution has
also improved to better understand and interpret the observed signals. This
progress in observations and models is a challenge for ensemble data
assimilation because the size of the ensemble is always very small compared
to the number of degrees of freedom to be monitored. The model is usually too
expensive to perform large-size ensemble simulations. This means that the
probability distribution of the possible states of the ocean is described by a
small sample as compared to the dimension of the subspace over which
uncertainties develop. In particular, the rank of the ensemble covariance
matrix is much smaller than the rank of the real error covariance matrix. A
traditional approximation to solve this problem is to localise this error
covariance matrix

The large-scale structures, although they are well observed (in the ocean by altimetry, ARGO floats, etc.), are therefore only indirectly controlled by the algorithm. Observations simultaneously contain information about small-scale structures (especially at the observation point) and about larger-scale structures, when the full observational network is taken into account. Spatial localisation therefore does not directly exploit every scale contained in the observation system.
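The effect of spatial localisation on the error statistics can be illustrated with a minimal sketch. It is not the operational algorithm: the one-dimensional grid, the Gaussian taper, the ensemble size and all names (`gaussian_taper`, `cov_loc`, etc.) are assumptions chosen for illustration. The sketch shows that the element-wise product with a taper raises the rank of the ensemble covariance, but also removes the covariance between distant points, i.e. the large-scale pattern.

```python
import numpy as np

def gaussian_taper(n_points, length_scale):
    """Localisation matrix decaying with grid distance (Gaussian shape assumed)."""
    idx = np.arange(n_points)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.exp(-0.5 * (dist / length_scale) ** 2)

rng = np.random.default_rng(0)
n_points, n_members = 200, 20

# Toy ensemble: small-scale noise plus one domain-wide (large-scale) mode.
large_scale = np.sin(np.pi * np.arange(n_points) / n_points)
ensemble = (rng.standard_normal((n_members, n_points))
            + 3.0 * rng.standard_normal((n_members, 1)) * large_scale)

anomalies = ensemble - ensemble.mean(axis=0)
cov = anomalies.T @ anomalies / (n_members - 1)   # rank <= n_members - 1
cov_loc = cov * gaussian_taper(n_points, 5.0)     # Schur (element-wise) product

rank_raw = np.linalg.matrix_rank(cov)
rank_loc = np.linalg.matrix_rank(cov_loc)
far_raw = abs(cov[10, 190])      # long-range covariance in the raw ensemble
far_loc = abs(cov_loc[10, 190])  # same entry after localisation (close to zero)
```

In this toy case, the long-range covariance carried by the domain-wide mode is present in `cov` and essentially absent from `cov_loc`, which is the loss of large-scale information discussed above.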

Because of the limited size of the ensemble, it is difficult to explicitly
control the full range of scales without separating the spectral components
of the signal. Separation of scales during the analysis step of data
assimilation algorithms allows us to adjust localisation according to the
considered spectral band of the signal. This is helpful to directly control
the large scales which are frequently and precisely observed (altimetry, ARGO
floats, etc.). To separate scales in data assimilation, two approaches have
been previously studied: the multiscale filter and the spectral
transformation. The multiscale filter consists in separating the signal into
two spectral bands, delimited by a cutting scale, in order to perform two
distinct ensemble analyses in the spatial domain

Following a similar idea, we propose in the present paper to combine the multiscale filter and spectral transformation approaches by applying a spectral analysis with spectral localisation (hereinafter called spectral localisation) to the large-scale components of the signal and a spatial analysis with spatial localisation (hereinafter called spatial localisation) to the residual scales. By separating the components, we avoid using an augmented covariance matrix, and we thus potentially neglect useful statistical relationships. However, this makes the multiscale system less expensive and easier to implement in an existing ensemble data assimilation system. It is indeed expected that the spectral transformation of the large scales is cheap enough to be applied to a large-size global ocean system, and that spectral localisation is more appropriate than spatial localisation to capture the large-scale components of the observed signal. On the other hand, for the small-scale components, the spectral transformation becomes too expensive, and the local correlation structure prevails. The target is thus to improve the observational update of the large-scale components of the signal by an explicit observational constraint applied directly on the large scales and to restrict the use of spatial localisation to the residual-scale components of the signal. These analyses are to be performed one after the other so that they can be included in an existing sequential algorithm as operated, for instance, by Mercator Océan.

The performance of this multiscale observational update is then studied with
an example application in the context of Copernicus Marine Environment Monitoring Service (CMEMS) systems. We performed a
70-member ensemble simulation using the oceanic model NEMO (Nucleus for
European Modelling of the Ocean;

The objective of this paper is to describe the multiscale observational
update algorithm that we have developed and to evaluate its performance using
the CREG4 ensemble system. The paper is organised as follows. In
Sect.

The purpose of this section is to introduce the example application that is
used in this paper to study the performance of the multiscale observational
update. This example application is chosen to serve the development of the
CMEMS systems and to display the multivariate character of the assimilation
problem. The model configuration and the prior ensemble simulation are
described in Sect.

Our example application is based on a

Uncertainties in the model are explicitly simulated using the standard NEMO
stochastic parameterisation module developed by

Simulation of model uncertainties to perform the 70-member ensemble. It follows the working configuration used at Mercator Océan to perform ensembles for research and development in the context of CMEMS systems.

The standard deviation of the equation of state is defined according to longitude and latitude.

With this stochastic modelling system, a 70-member ensemble simulation,
without assimilation, is performed for the 8-month period between mid-January
and mid-September 2011. It will be used to perform the analyses in the
present paper. This ensemble simulation yields a probability distribution for
the evolution of the system, in particular the ensemble mean, hereafter

SSH (in metres). Ensemble mean

The assimilation problem investigated in this study is based on twin
experiments with altimetry. In this kind of experiment, the true state is
known and synthetic observations are built from this true state. The true
state is generated by the same model to which data assimilation is applied. This
method has the advantage that the effectiveness of the different algorithms
can be directly evaluated thanks to the known true state. One member of the
ensemble simulation is set aside to be used as a reference (the simulated
truth) from which the observations will be simulated:

In this study, to illustrate the behaviour of the multiscale algorithm, we
will concentrate on studying the observational update of the prior ensemble
on 30 August 2011. Figure

The observational update of the prior ensemble will be performed with a
square root algorithm. The analysis scheme used at Mercator Océan is derived
from the singular evolutive extended Kalman filter (SEEK)

The 69-member ensemble correlation structure (without the true state, which
has been set aside) is illustrated in Fig.

Two examples of ensemble correlation for the prior ensemble (SSH),
according to two different reference points indicated by black crosses,
computed for the full spectrum

However, if we look at the same correlation structure (from the same
ensemble) for the large-scale component of the signal (characteristic scale
larger than

It therefore seems difficult to explicitly control all scales of the system without separating the different spectral components of the signal. In this study, the main idea is to apply a spectral transformation to all variables of the system, to perform the analysis in the spectral domain and then to transform back to the spatial domain for the next steps of the assimilation scheme.

The purpose of this section is to describe the linear transformation that
will be applied to the state vectors and to the observation vectors to
separate scales. The forward and backward transformations of the model data
are described in Sect.

The forward transformation step involves transforming
each input parameter used for the analysis into the spectral domain, namely each member of the prior
ensemble, but also observations and observational errors. A full
two-dimensional signal in spherical coordinates,

This spectral transformation provides a new point of view on the ensemble
because it separates scales. Each degree

Standard deviation

From the spectrum

Any spectral band can thus be extracted by choosing the range

Ensemble mean

The use of spherical harmonics is not the most natural way to separate scales for fields that do not extend over the whole sphere. In principle, it would, for instance, be better to use the eigenfunctions of the Laplacian operator defined for the model domain. They would account for the land barriers and would display a better relation to the system dynamics. However, they would also be much more expensive to compute than the spherical harmonics and would need to be stored and then loaded each time they are needed to separate scales. This is why we preferred using spherical harmonics in this study: they make the method numerically efficient and they are sufficient to obtain a relevant spectral decomposition of the input signal.
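The principle of the forward transformation, the extraction of a spectral band and the backward transformation can be sketched in a deliberately reduced setting. The sketch below assumes a zonally symmetric field, so that only the zonal harmonics (Legendre polynomials in mu = sin(latitude)) are needed; the full two-dimensional transformation used in this study is not reproduced. The function names, the Gauss-Legendre grid and the separation degree of 10 are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def forward_transform(field, mu, weights, l_max):
    """Project a zonally symmetric field f(mu) on the degrees 0..l_max."""
    basis = legendre.legvander(mu, l_max)            # shape (n_mu, l_max + 1)
    norm = (2.0 * np.arange(l_max + 1) + 1.0) / 2.0  # from the orthogonality of P_l
    return norm * (basis.T @ (weights * field))

def backward_transform(spectrum, mu, l_min=0, l_max=None):
    """Rebuild the field from the degrees l_min..l_max only (a spectral band)."""
    l_max = len(spectrum) - 1 if l_max is None else l_max
    band = np.zeros_like(spectrum)
    band[l_min:l_max + 1] = spectrum[l_min:l_max + 1]
    return legendre.legval(mu, band)

mu, weights = legendre.leggauss(64)  # Gauss-Legendre nodes and weights on [-1, 1]
# Test field containing degrees 0, 1 and 30 only.
field = 1.0 + 0.5 * mu + 0.1 * legendre.legval(mu, np.eye(31)[30])

spectrum = forward_transform(field, mu, weights, l_max=40)
large = backward_transform(spectrum, mu, l_min=0, l_max=10)  # large-scale band
residual = backward_transform(spectrum, mu, l_min=11)        # residual scales
```

Because the two bands are complementary, `large + residual` reproduces the input field, and each band can be inspected or updated separately.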

In theory, transformation of observations is not needed to separate scales in the assimilation system. It should be sufficient to introduce the scale separation operator in the observation operator of the existing algorithm. However, for practical reasons, the algorithm that we are proposing requires a preprocessing of the observations to separate scales. This is done to keep the algorithm easy to implement in an existing system (nothing new needs to be implemented except the scale separation operator) and to keep the resulting algorithm efficient enough to be applicable to a large-size assimilation system.
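Such a preprocessing of irregularly distributed observations can be pictured as a least-squares regression on a truncated basis. The sketch below is only an illustration under strong assumptions: a one-dimensional zonal basis (Legendre polynomials) stands in for the spherical harmonics, and the small ridge term, the noise level and the names (`regress_on_basis`, `obs_mu`, etc.) are hypothetical choices, not those of the study.

```python
import numpy as np
from numpy.polynomial import legendre

def regress_on_basis(obs_mu, obs_values, l_max, ridge=1e-6):
    """Least-squares spectral amplitudes for observations at scattered positions."""
    design = legendre.legvander(obs_mu, l_max)            # (n_obs, l_max + 1)
    gram = design.T @ design + ridge * np.eye(l_max + 1)  # ridge term is an assumption
    return np.linalg.solve(gram, design.T @ obs_values)

rng = np.random.default_rng(1)
true_spectrum = np.array([0.2, -0.1, 0.05, 0.0, 0.03])    # degrees 0..4
obs_mu = rng.uniform(-1.0, 1.0, size=300)                  # irregular sampling
obs_values = (legendre.legval(obs_mu, true_spectrum)
              + 0.01 * rng.standard_normal(obs_mu.size))   # arbitrary noise level

estimate = regress_on_basis(obs_mu, obs_values, l_max=4)
max_error = np.max(np.abs(estimate - true_spectrum))
```

The regression returns one amplitude per basis function, so the observations end up in the same spectral coordinates as the transformed ensemble members.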

In this section, we show how this transformation of observations can be
performed by regression of the observations on the spherical harmonics (see
Sect.

For all observations that are not available on a regular grid (for which
Eq.

The approach is to look for the spectral amplitudes

In practice, several additional modifications may need to be introduced in
the algorithm and have been implemented for our study. (i) For a non-global
model domain (such as CREG4), it may be better to reduce the basis of the
spherical harmonics (for each degree

The observational error results from both the initial Gaussian error with a
standard deviation of

This error has been quantified following these steps. In this twin
experiment, the true state is known. The chosen true member, from which the
observation has been created, initially belongs to an ensemble of

This method is directly applicable to twin experiments. In a realistic case, it can be transposed by simulating the observational error in model results and by transforming the difference between the perturbed and unperturbed data. The standard deviation of the result is then an estimate of the observation error standard deviation along each spherical harmonic.
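The procedure of perturbing the data, transforming the difference and taking its standard deviation can be sketched as a small Monte Carlo experiment. As before, this is a simplified illustration: a one-dimensional Legendre basis replaces the spherical harmonics, and the error level, the number of draws and the helper `to_spectral` are assumptions introduced for the example.

```python
import numpy as np
from numpy.polynomial import legendre

def to_spectral(obs_mu, values, l_max):
    """Least-squares projection of scattered values on the degrees 0..l_max."""
    design = legendre.legvander(obs_mu, l_max)
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    return coeffs

rng = np.random.default_rng(2)
l_max, n_draws, sigma_obs = 6, 500, 0.03   # illustrative values only
obs_mu = rng.uniform(-1.0, 1.0, size=400)
unperturbed = legendre.legval(obs_mu, rng.standard_normal(l_max + 1))
reference = to_spectral(obs_mu, unperturbed, l_max)

# Transform the difference between perturbed and unperturbed data, many times.
diffs = np.empty((n_draws, l_max + 1))
for k in range(n_draws):
    perturbed = unperturbed + sigma_obs * rng.standard_normal(obs_mu.size)
    diffs[k] = to_spectral(obs_mu, perturbed, l_max) - reference

spectral_obs_std = diffs.std(axis=0, ddof=1)   # one value per spectral coefficient
```

The resulting `spectral_obs_std` plays the role of the observation error standard deviation in the spectral domain; in this toy setting it is smaller than `sigma_obs` because each coefficient averages many observations.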

We need to study the main dependencies and correlations between the different
spectral components of the ocean fields in order to determine whether and how
the scale separation could be used in the data assimilation scheme.
Figure

Two examples of ensemble correlation for the prior ensemble (SSH) in
the spectral domain, according to the degrees

To exploit this property of weak correlations between very different scales, a spectral analysis thus also needs to be localised, at least for the large scales in our study. The method of spectral localisation is the same as that usually used in the spatial domain. For the same reasons, each localisation window contains a number of degrees of freedom low enough to be controlled with an ensemble of moderate size.
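A minimal sketch of a localised update in the spectral domain is given below. It is not the square root scheme used in this study: it only updates the ensemble mean with a Kalman gain, it assumes that every spectral coefficient is observed directly, and it uses a box-car window in coefficient index as the localisation. The window half-width, the red spectrum and all names are hypothetical.

```python
import numpy as np

def localised_mean_update(ensemble, obs, obs_var, half_width):
    """Kalman update of the ensemble mean, one spectral coefficient at a time.

    Each coefficient only uses the observed coefficients whose index lies
    within `half_width` of its own (box-car localisation window, assumed here).
    """
    n_members, n_coeffs = ensemble.shape
    mean = ensemble.mean(axis=0)
    anomalies = ensemble - mean
    cov = anomalies.T @ anomalies / (n_members - 1)
    analysis = mean.copy()
    for l in range(n_coeffs):
        window = np.arange(max(0, l - half_width), min(n_coeffs, l + half_width + 1))
        innovation_cov = cov[np.ix_(window, window)] + np.diag(obs_var[window])
        gain = np.linalg.solve(innovation_cov, cov[window, l])
        analysis[l] = mean[l] + gain @ (obs[window] - mean[window])
    return analysis

rng = np.random.default_rng(3)
n_members, n_coeffs = 30, 60
amplitude = 1.0 / (1.0 + np.arange(n_coeffs))   # red spectrum (illustrative)
truth = amplitude * rng.standard_normal(n_coeffs)
ensemble = amplitude * rng.standard_normal((n_members, n_coeffs))
obs_var = np.full(n_coeffs, 1e-4)
obs = truth + np.sqrt(obs_var) * rng.standard_normal(n_coeffs)

analysis = localised_mean_update(ensemble, obs, obs_var, half_width=3)
rmse_prior = np.sqrt(np.mean((ensemble.mean(axis=0) - truth) ** 2))
rmse_analysis = np.sqrt(np.mean((analysis - truth) ** 2))
```

With a window of seven coefficients and 30 members, each local problem has fewer degrees of freedom than the ensemble size, which is the condition stated above for a moderate-size ensemble.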

The objective of this section is to introduce and demonstrate the multiscale
observational update algorithm, combining spectral localisation for the large
scales and spatial localisation for the small scales. In
Sect.

We propose an algorithm for the multiscale analysis based on a combination of
a spectral analysis with spectral localisation for the large scales
(described by Eq.

The analysis step is usually
done in the spatial domain with a spatial localisation (observational update

Another approach is to apply the observational update in the spectral domain
with spectral localisation (

Multiscale analysis combines a
spectral localisation for the large scales and spatial localisation for the
residual scales.

Then, compute

Compute

Implementing a multiscale analysis rather than the usual spatial
localisation is only justified if spectral localisation retrieves the
large-scale patterns of the signal better than spatial localisation. To
verify this assumption, we perform two different analyses in
the context of the twin experiments described in
Sect.

Each scale of spatial and spectral analysis increments has to be as close as
possible to the corresponding scale of the true anomaly. The large-scale part
of this spatial analysis increment (

Ensemble mean of large-scale part of the analysis increments (SSH
in metres), with

Spectral localisation recovers large scales much better than spatial
localisation; see Fig.

On average, spectral localisation gives better results than spatial
localisation only for the large scales, but we need to check whether this statement
remains valid at each scale or whether there exists a critical scale,

Reduction of spatial RMSE for each degree for the SSH, computed
using Eq. (

This gives a new point of view to evaluate the results of an analysis, giving
the efficiency of the spatial or spectral analysis at each scale and no
longer only for the full field.
Spatial localisation deals with all scales at the same time. The score is
almost the same at each scale: around

Until around

The aim of this section is to evaluate the multiscale analysis and to compare
it with spatial analysis, for the full spectrum but also at each scale. For
that purpose, we did a multiscale analysis following the algorithm presented
in the previous Sect.

In Sect.

On average, the updated ensemble produced with the multiscale analysis should
be closer to the true state than the one obtained with spatial
localisation only. To evaluate the efficiency of the multiscale analysis, the
error has been computed in two ways: at each scale in the spectral domain,
following Eq. (

The previous score showing the evolution of the RMSE after/before the
analysis on average on the model domain, following
Eq. (

Multiscale analysis keeps the advantages of both localisations (spectral
localisation in green and spatial localisation in blue). As expected, for the
large scales

The analysis increments obtained with spatial localisation
(Fig.

Same as Fig.

These analysis increments can be evaluated at each scale. Figures

The updated ensemble should be reliable in the spatial domain but also in the
spectral domain. This involves checking the coherence between the assumed
probabilities and the observed statistics when the ensemble is compared to
the verification data (the true state in our twin experiment, or observation
in a real system). To check ensemble reliability, ranks are traditionally
computed in the spatial domain and summarised in a rank histogram. They show
the distribution of observations with respect to the ensemble

Rank histograms have been computed, with respect to the true state, from
spatial maps limited to the Jason domain for the prior ensemble, the
spatially updated ensemble and the multiscale updated ensemble.
Figure

Spatial rank histogram on the Jason domain for the SSH. “Prior” (in
red), “spat” (in blue), “spct” (in green) and “spct

Rank histograms show that all these updated ensembles can be considered as reliable as the prior ensemble, both for the full spectrum and for the large scales. Indeed, the prior ensemble looks somewhat underdispersed but can be considered reliable because the true member originates from the ensemble itself. The rank histograms of the updated ensembles are of the same order of magnitude as that of the prior ensemble. Thus, the small underdispersion of the prior ensemble (which can only result from the limited size of the sample) has not increased during the analysis step. These consistent rank histograms confirm that the observational error has been properly evaluated.
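For reference, the computation of a rank histogram can be sketched as follows. The synthetic Gaussian data, the number of points and the function name `rank_histogram` are assumptions for illustration; only the ensemble size of 69 members is taken from the experiments described here. The sketch contrasts a reliable ensemble with a deliberately underdispersed one.

```python
import numpy as np

def rank_histogram(ensemble, verification):
    """Count, for each rank 0..n_members, how often the verification falls there."""
    n_members = ensemble.shape[0]
    ranks = np.sum(ensemble < verification, axis=0)  # rank of the truth at each point
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(4)
n_members, n_points = 69, 20000

# Reliable case: truth and members are drawn from the same distribution.
reliable = rank_histogram(rng.standard_normal((n_members, n_points)),
                          rng.standard_normal(n_points))
# Underdispersed case: the ensemble spread is half of what it should be.
underdispersed = rank_histogram(0.5 * rng.standard_normal((n_members, n_points)),
                                rng.standard_normal(n_points))
```

A reliable ensemble gives an approximately flat histogram, whereas an underdispersed ensemble accumulates counts in the two extreme ranks, producing the characteristic U shape.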

Reliability of all updated ensembles (spatial localisation only and multiscale
analysis) is now tested for degrees

Maps of ranks in the spectral domain for the SSH, according to the
degrees

Rank maps in the spectral domain provide an additional indication that all
algorithms provide reliable updated ensembles. Observational error has been
consistently evaluated. The ranks are computed for each spectral coordinate

The spread, or variance, of the prior and the updated ensemble (with

Ensemble spread of the prior

The multiscale analysis reduces the ensemble spread more than the
spatial localisation. The spread is reduced much more along Jason tracks; see
Fig.

Same as Fig.

Improvement obtained by the multiscale analysis for
temperature

Multivariate analysis consists in extending the observational update to non-observed variables, like temperature and salinity, in the state vector during
the analysis. The experimental setup remains the same. The aim is to evaluate
the impact of the multiscale analysis on these non-observed variables and to
check that it does not introduce more error than the spatial localisation.
These errors could increase during the next forecast and cause some
unrealistic values. For this purpose, we compute the score defined by
Eq. (

On average, below and around the critical degree

We have formulated and evaluated a multiscale analysis approach for ensemble ocean data assimilation that recovers the large scales better than the current spatial analysis with spatial localisation. It has been developed for use in the existing data assimilation system of Mercator Océan used in the CMEMS project. This new scheme consists in performing a spectral analysis with spectral localisation for the large scales and a spatial analysis with spatial localisation for the residual scales.

The transformation to the spectral domain and the spectral localisation provide consistent ensemble estimates of the state of the system (in the spectral domain, or after backward transformation to the spatial domain). In terms of accuracy, spectral localisation recovers the large-scale structures better than spatial localisation: for the large scales, it yields lower errors while keeping a reliable ensemble. Conversely, spatial localisation is still preferable for the small scales.

This new spectral approach also gives a new point of view to diagnose the system. Traditional diagnostics such as the ensemble mean, spread, correlation structures, rank histograms, etc., give information at each scale and no longer only for the full field.

The multiscale analysis, which is a hybrid scheme combining spectral localisation for the large scales and spatial localisation for the residual scales, keeps the advantages of these two localisations. Consequently, it can significantly improve the current use of various ocean observing systems, particularly with regard to the large-scale information contained in sparsely distributed observations such as those from altimeters or ARGO floats.

The direct perspective of this study is to implement and test the method in the real CMEMS system developed at Mercator Océan. The target is (i) to check that the method can be applied without deep modification of the existing system, (ii) to evaluate the operational gain that is obtained by an improved control of the large-scale signal and (iii) to enhance the diagnostic of the system by evaluating the performance separately for each scale. Some data assimilation steps have already been successfully carried out in the same context as our study (not shown). In the longer term, the implementation of this multiscale approach for ensembles might improve the CMEMS products of Mercator Océan, such as the reanalysis, which is used by a large scientific community.

The basic code developed to introduce scale separation into the assimilation system is available on request from the authors.

The authors declare that they have no conflict of interest.

This article is part of the special issue “The Copernicus Marine Environment Monitoring Service (CMEMS): scientific advances”. It is not associated with a conference.

This work was conducted as a contribution to the GLO-HR-ASSIM project, funded by the Copernicus Marine Environment Monitoring Service (CMEMS). CMEMS is implemented by Mercator Océan International in the framework of a delegation agreement with the European Union. Additional support for this study was also provided by the CNES/OSTST/MOMOMS project. The calculations were performed using HPC resources from GENCI-IDRIS (grant 2017-011279).

This paper was edited by Marina Tonani and reviewed by Benedicte Lemieux-Dudon and two anonymous referees.