In many coastal areas there is an increasing number and variety
of observation data available, which are often very heterogeneous in their temporal and
spatial sampling characteristics. With the advent of new systems, like the radar
altimeter on board the Sentinel-3A satellite, a lot of questions arise concerning the
accuracy and added value of different instruments and numerical models. Quantification of
errors is a key factor for applications, like data assimilation and forecast improvement.
In the past, the

In the first step, the method is assessed using synthetic observations and Monte Carlo simulations. The technique is then applied to a data set of Sentinel-3A altimeter measurements, in situ wave observations, and numerical wave model data with a focus on the North Sea. Stochastic observation errors for the significant wave height, as well as bias and calibration errors, are derived for the model and the altimeter. The analysis indicates a slight overestimation of altimeter wave heights, which becomes more pronounced at higher sea states. The smallest stochastic errors are found for the in situ measurements.

Different observation geometries of in situ data and altimeter tracks are furthermore analysed, considering 1-D and 2-D interpolation approaches. For example, the geometry of an altimeter track passing between two in situ wave instruments is considered with model data being available at the in situ locations. It is shown that for a sufficiently large sample, the errors of all data sources, as well as the error correlations of the model, can be estimated with the new method.

Coastal areas like the German Bight are often characterised by strongly heterogeneous
ocean dynamics, typically associated with complicated bathymetry, small-scale coastline
features, and river runoff. A few instruments, like high-frequency
(HF) radar, are able to capture at least 2-D
surface currents with large coverage and high resolution. Such systems have
a typical range of about 100 km, spatial resolutions on the kilometre scale, and about
20 min sampling

In the following, this situation is studied in more detail with respect to ocean waves
and the significant wave height in particular. Wave height information is of paramount
importance for many applications, e.g. shipping, offshore operations, or coastal
protection. Although numerical wave forecast models have reached an impressive level of
accuracy, there is still room for improvement, in particular in coastal areas with
complicated dissipation processes associated with wave breaking and bed friction

Bathymetry of the North Sea with the locations of some in situ wave observation instruments considered in this study. The plot shows isobaths for 30, 60, 90, and 120 m water depth.

Traditionally, validations of new data sets are performed by comparing to data from
established standard in situ measurements, which are regarded as a reference. As a first
step this is acceptable; however, one has to take into account that these reference
instruments are affected by measurement errors as well, and the separation of the error
contributions from the new data set and the reference instrument is, in general, not
possible unless additional information is used. This is easy to see if two data sets,

In this study the

In the

So far, assumptions about correlation errors were made a priori

So far, no systematic approach has been presented to deal with more than three data sources.

The uncertainties in estimated systematic and stochastic data source errors have
so far only been quantified using bootstrap approaches

The work presented here addresses the issues mentioned above and makes the following main
contributions:

A generalisation of the

In certain configurations, i.e. definitions of truth vectors and spatial distributions of data sources, the approach allows an estimation of cross covariance components of the stochastic errors contained in the considered observations or numerical models.

The theory includes the definition of a general data source vector, which can contain an arbitrary number of observations and numerical model data.

Analytical expressions are derived for the estimation errors regarding both systematic calibration errors and stochastic errors of the different data sources.

As an example for the generalised parameterisation of the truth, one can imagine two wave buoys and a satellite altimeter track passing between them. Let us furthermore think about a situation where the wave buoys are too far away from the track to assume that all three instruments measure the same quantity. However, it may be an acceptable assumption that the wave height measured by the altimeter is a linear combination of the wave heights observed by the two buoys. If independent numerical model wave height estimates are available at the buoy locations, the method presented in the following provides a systematic approach to estimate not only the stochastic errors of all data sets, but also the error correlation of the model at the buoy locations.
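The buoy and altimeter configuration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the formulation derived in this paper: it assumes that the covariance matrix of the data sources is the sum of a projected truth covariance and an error covariance, that the altimeter truth is a weighted mean of the two buoy truths with a known weight `w`, that only the model errors at the two buoy locations are correlated, and that calibration errors can be ignored. The function names `basis_matrices` and `estimate_components` and all numerical values are hypothetical.

```python
import numpy as np

def basis_matrices(w):
    """Unit covariance contributions, one 5x5 matrix per unknown component."""
    # Rows: buoy 1, buoy 2, model at buoy 1, model at buoy 2, altimeter.
    # Columns: truth at buoy 1, truth at buoy 2.
    A = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [w, 1 - w]], dtype=float)
    basis = []
    # Three components of the symmetric 2x2 truth covariance: T11, T12, T22.
    for i, j in [(0, 0), (0, 1), (1, 1)]:
        T = np.zeros((2, 2))
        T[i, j] = T[j, i] = 1.0
        basis.append(A @ T @ A.T)
    # Error variances of the five data sources.
    for k in range(5):
        E = np.zeros((5, 5))
        E[k, k] = 1.0
        basis.append(E)
    # Error covariance of the model between the two buoy locations.
    E = np.zeros((5, 5))
    E[2, 3] = E[3, 2] = 1.0
    basis.append(E)
    return basis

def estimate_components(cov, w):
    """Least squares fit of the nine unknowns to the 15 covariance entries."""
    iu = np.triu_indices(5)
    G = np.column_stack([B[iu] for B in basis_matrices(w)])
    solution, *_ = np.linalg.lstsq(G, cov[iu], rcond=None)
    return solution

# Hypothetical values: T11, T12, T22, five error variances, model error covariance.
assumed = np.array([1.0, 0.8, 1.2, 0.01, 0.02, 0.04, 0.05, 0.03, 0.02])
cov_exact = sum(v * B for v, B in zip(assumed, basis_matrices(0.4)))
recovered = estimate_components(cov_exact, 0.4)
```

With an exact covariance matrix the fit reproduces the assumed components, because the 15 independent covariance entries overdetermine the nine unknowns. With a sample covariance matrix the same call returns noisy estimates whose spread depends on the number of samples.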

The present study is intended to contribute to the exploitation of measurements
separated by larger distances, where additional assumptions about the spatial variation in the
truth are required. As an illustration, Fig.

With regard to the estimation errors, expressions are derived which provide a quantification depending on the covariance matrices of the data sources and the number of available data samples. These results can give valuable information on the trustworthiness of estimated observation errors, in particular in situations with a small number of samples.

The paper is structured as follows. The multi-collocation method is introduced in
Sect.

In this section the multi-collocation method is explained, which includes the

The approach presented in this section to estimate stochastic errors does not require
bias-free reference instruments. Calibration errors are not considered in this first
step. Let us assume the truth is given by a vector

Let us now define a matrix

The number of data source error variances

Illustration of three considered
observation scenarios.

If there are more equations than unknowns, a standard linear least squares approach can be used
to find a reasonable estimate for the unknown variance and covariance components of

For the case of the

If the available number of samples

In this section a special, but also typical, situation is considered, where for a couple
of measurements systematic errors can be neglected. Typically, this assumption is made
for standard in situ observation systems, like waverider buoys

To obtain expressions for the scaling parameters contained in

Let us assume for a moment that the scaling parameters

There are now two basic approaches to estimate the scaling parameters.

Direct method: Those terms in Eq. (

Iterative method: Terms in Eq. (

In the following, the techniques presented in Sects.

Background statistics used for the Monte Carlo simulations.

Mean, variance (var), covariance (covar), and correlation
(corr) parameters used for the simulation of the background wave height
statistics at the locations of the Heligoland and Elbe buoys in the
German Bight. These numbers were derived from measurements taken during the
period June 2016–April 2017. The respective probability distributions with
a log-normal approximation are shown
in Fig.

In this section a brief analysis is presented concerning the impact of coastal gradients
on the standard

As explained before, the

We are now estimating these error contributions for the background statistics derived in
Sect.

The second issue to be discussed in this section is the role of the
spatial resolution of the different models and observations. The
main point to consider here is that sub-resolution variations in wave height
become part of the estimated data set error if the triple or multi-collocation
methods are applied. This has two main consequences:

The estimated data source errors are influenced by the background statistics.

For two data sources with a common unresolved band of
spatial scales, the data source errors are correlated

The above analysis has shown that both the collocation distance and the spatial data set resolutions are important factors for the quantification and interpretation of the respective data set errors. The separation of instrumental errors and sub-resolution-related errors is a challenge, because it requires knowledge about the truth background wave statistics on a sub-resolution scale. In general, such information can only be obtained if one of the data sources has a significantly higher spatial resolution than the other data sources.

As an example, we consider the case where we have data sources which are approximately
located along a straight line. This corresponds to the scenario depicted in
Fig.

The Monte Carlo experiments were then performed as follows:

120 observation vectors

The observation errors and their uncertainty were estimated using the
approach described in Sect.

These experiments were repeated 1000 times to obtain statistically robust results.
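The structure of such an experiment loop can be sketched as follows. This is a simplified illustration and not the configuration used in this study: it uses the standard covariance-based triple collocation estimator for three data sources that observe the same truth with independent errors, a log-normal truth distribution with hypothetical parameters, and hypothetical error standard deviations. Only the sample size of 120 and the 1000 repetitions are taken from the text.

```python
import numpy as np

def triple_collocation(x):
    """Error variances from the standard covariance-based triple collocation.

    x has shape (n_samples, 3); errors are assumed independent and unbiased.
    """
    c = np.cov(x, rowvar=False)
    return np.array([
        c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2],
        c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2],
        c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1],
    ])

def run_experiments(n_samples=120, n_experiments=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical error standard deviations of the three data sources (m).
    true_std = np.array([0.1, 0.2, 0.25])
    estimates = np.empty((n_experiments, 3))
    for k in range(n_experiments):
        # Hypothetical log-normal background wave height statistics.
        truth = rng.lognormal(mean=0.0, sigma=0.5, size=n_samples)
        noise = rng.normal(scale=true_std, size=(n_samples, 3))
        estimates[k] = triple_collocation(truth[:, None] + noise)
    # Mean and spread of the estimated error variances over all experiments.
    return true_std**2, estimates.mean(axis=0), estimates.std(axis=0)

true_var, mean_estimate, spread = run_experiments()
```

The spread over the repeated experiments gives an empirical measure of how uncertain a single estimate from 120 samples is, which is the quantity that analytical uncertainty expressions need to reproduce.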

The uncertainties were estimated directly by computing the variance of the estimated observation errors over all experiments. This is called the “averaged experiments” approach (AVEXP) in the following.

The uncertainties were estimated for each experiment from the
input data covariance matrices as explained in Sect.

Parameters used for the Monte Carlo simulations in Sect.

In a second step the same exercise was done for the estimation of the systematic errors.
The first column of Table

Parameters used for the Monte Carlo simulations to validate
the approach described in Sect.

In this section the observation and numerical model data used for the multi-collocation analysis are introduced. The data sets are from the period April 2016 to August 2017.

The satellite data used here were taken by the European satellite Sentinel-3A launched in
February 2016. The satellite flies
on a sun-synchronous orbit with an exact repeat cycle of 27 days.
The spatial accuracy of the revisit is

In this study Sentinel-3A data with 1 Hz sampling are analysed, which correspond to
measurements taken every 7 km along the track. The analysed data were acquired in the
so-called reduced synthetic aperture radar (RDSAR) mode, which
provides data comparable to measurements from a traditional satellite altimeter. A
comparison of different Sentinel-3A altimeter modes can be found in

Figure

In this study in situ wave height measurements distributed over the
Global Telecommunication System (GTS) were used, which are archived
at the European Centre for Medium-Range Weather Forecasts (ECMWF;

For this study, data generated with the spectral wave model WAM were used

Compared to previous studies

In this section the triple collocation method, as a special case of the multi-collocation approach, is applied to
the Sentinel-3A altimeter wave height measurements introduced in Sect.

Traditionally, validations of new data sets are performed by comparing to
data from established in situ measurements, which are regarded as a
reference. Here, the following assumptions are made:

Sentinel-3A and the WAM model may be affected by calibration problems represented by the calibration factors

Sentinel-3A and the WAM model may be affected by biases

Buoys are regarded as reference systems, i.e. they are assumed to be bias-free and free of calibration errors.
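Under these three assumptions, calibration factors and biases can be obtained from first and second moments of the collocated data. The sketch below uses the covariance ratios of standard triple collocation with one reference data source; it is an illustration only and not necessarily identical to the direct or iterative method of this paper. It assumes an observation model of the form bias + scale × truth + error with mutually independent errors. The function name `calibration_from_reference` and all numerical values are hypothetical.

```python
import numpy as np

def calibration_from_reference(buoy, altimeter, model):
    """Scale factors and biases of altimeter and model relative to the buoy.

    Assumes the buoy is bias-free with unit scaling and that the errors of
    the three data sources are mutually independent.
    """
    c = np.cov(np.vstack([buoy, altimeter, model]))
    scale_alt = c[1, 2] / c[0, 2]   # cov(alt, model) / cov(buoy, model)
    scale_mod = c[1, 2] / c[0, 1]   # cov(alt, model) / cov(buoy, alt)
    bias_alt = altimeter.mean() - scale_alt * buoy.mean()
    bias_mod = model.mean() - scale_mod * buoy.mean()
    return scale_alt, bias_alt, scale_mod, bias_mod

# Synthetic check with hypothetical parameters.
rng = np.random.default_rng(1)
truth = rng.lognormal(mean=0.0, sigma=0.5, size=20000)
buoy = truth + rng.normal(scale=0.1, size=truth.size)
altimeter = 0.05 + 1.1 * truth + rng.normal(scale=0.2, size=truth.size)
model = -0.1 + 0.9 * truth + rng.normal(scale=0.25, size=truth.size)
scale_alt, bias_alt, scale_mod, bias_mod = calibration_from_reference(
    buoy, altimeter, model)
```

The covariance ratios are insensitive to the stochastic error of the reference instrument, which a simple regression of the altimeter against the buoy would not be.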

Each of the Sentinel-3A tracks shown in Fig.

The model is interpolated to the buoy using linear interpolation.

The model is interpolated to the closest altimeter point using linear interpolation.

Both the buoy and the model are interpolated to the satellite overflight time.

The model value used for the location is taken as the average of the buoy and the satellite interpolation (see

As an example, Fig.

Figure

Colour-coded biases

The scaling parameter for the satellite altimeter shown
in Fig.

Results for the stochastic errors are summarised in Fig.

The stochastic errors of the WAM model (Fig.

The finding that, on average, the in situ stations have the smallest stochastic errors is
at first sight in disagreement with results presented in

It is evident that the observed heterogeneity of in situ measurements is a major complicating factor in the analysis. Wave model computations and satellite altimeter observations have reached a level of accuracy where further improvements require a very careful selection and treatment of validation data sets. This in particular requires more knowledge about the type of in situ instruments and applied data processing techniques (e.g. averaging intervals). This could also be an argument for investments in dedicated validation instruments with more transparent and better documented error characteristics and quality control. The deployment of such instruments should take into account both research aspects and requirements for operational use.

In this section different examples are presented where more than three observations are
combined, i.e. this is beyond the standard

The geometry of the first example is depicted in Fig.

The idea to relate both in situ measurements to the altimeter track is to use a linear
interpolation of the truth wave height between the two stations, which makes the use of
the instrument with the larger distance more acceptable in the collocation procedure. In
principle, this corresponds to the 1-D case depicted in Fig.

Using this geometry allows for the estimation of the errors of all data sources, as well
as the error correlations between the model wave heights (see
Table

The numbers obtained for the stochastic errors are as follows.

The covariances estimated for the WAM wave height errors at the two buoy locations
correspond to a correlation value of 0.58. If we assume that the error autocorrelation
function is Gaussian shaped, i.e.

Because of

The geometry of the second example is depicted in Fig.

The scaling values and their standard deviations obtained with the direct method are as
follows.

The respective values for the stochastic errors and their standard deviations
with the same naming convention and obtained with the direct method are as follows:

For the correlation, a value of 0.39 was found for the altimeter and a value close to 1 for the WAM model. This corresponds to a correlation length of about 30 km for the satellite data. It makes sense that the correlation length for the WAM model is longer in this case compared to the configuration discussed in the previous section, because the analysed area is in deeper water quite far offshore, and can therefore be assumed to be more homogeneous with respect to model errors.
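The conversion from an estimated error correlation to a correlation length can be sketched as follows. The functional form ρ = exp(−(d/L)²) used here is an assumption, since the exact normalisation of the Gaussian autocorrelation function is not stated in this passage, and the separation distance in the usage line is hypothetical.

```python
import math

def correlation_length(rho, distance_km):
    """Invert rho = exp(-(d / L)**2) for the correlation length L in km.

    The Gaussian form without a factor of 2 in the exponent is an assumption.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return distance_km / math.sqrt(-math.log(rho))

# Hypothetical separation of 29 km between the two collocated points.
length_km = correlation_length(0.39, 29.0)
```

A correlation close to 1 gives a correlation length that is very large and poorly constrained, because the logarithm in the denominator approaches zero.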

The examples show that the multi-collocation method is in fact applicable to real data
source configurations. In particular, the matrix

The presented study provides an extension of the known

An approach was proposed to estimate the uncertainties in estimated calibration and
stochastic errors, which is also useful in the context of the standard

The proposed techniques were validated using Monte Carlo simulations with realistic background statistics. It was shown that the obtained error estimates and their respective uncertainties are in good agreement with the expected values, although a couple of approximations had to be used in the derivation.

The method was applied to a data set of in situ wave measurements, Sentinel-3A altimeter observations, and numerical wave model data. The number of available samples was relatively small, and estimation errors therefore had to be taken into account. The usefulness of the derived error bars for the interpretation of the data could be demonstrated. For the 16-month data set analysed here, the estimation errors are significant, in particular if individual geographic locations are analysed. It would therefore be interesting to continue some parts of the analysis at a later stage of the Sentinel-3A mission, when a larger data set is available. More robust results are obtained if the systematic and stochastic data set errors estimated for different in situ instrument locations are averaged. The results obtained for the North Sea indicate the smallest stochastic errors for the in situ measurements, as expected. The stochastic errors of the model and the altimeter seem comparable if averaged over all in situ locations. The analysis indicates that on average the altimeter overestimates wave heights by about 10 % for above-mean wave conditions. Two examples of multi-collocations were analysed, which included groups of two and three in situ platforms, respectively. In both cases a Sentinel-3A track passed nearby, and model data were used in addition. The use of 1-D and 2-D parameterisations for the first and second example, respectively, resulted in estimates for the spatial decorrelation of model and altimeter errors.

The proposed method can be used for many other applications not discussed in this study.
For example, it is straightforward to extend the analysis of error correlations to the
time domain. The method can also be applied in situations where different instruments do
not measure exactly the same quantity, but different components of a truth vector, for
example HF radar providing 2-D current vectors and satellite SAR providing one current
component (e.g.

This study is intended to contribute to the optimal use of the growing number of observations, in particular in coastal areas. For applications such as data assimilation, knowledge about the errors of different data sources is essential. Analysis of observation errors is also a critical component in the design and extension of observatories used for various applications. This subject will be of growing concern, for example, in the context of the European marine core service (CMEMS), where in situ data are required to optimise forecasts for all European seas.

The WAM model code can be found at

The authors declare that they have no conflict of interest.

This article is part of the special issue “Coastal modelling and uncertainties based on CMEMS products”. It is not associated with a conference.

This publication has received funding from the European Union's H2020 Programme for Research, Technological Development and Demonstration under grant agreement no. H2020-EO-2016-730030-CEASELESS. We thank Jean Bidlot from ECMWF for providing GTS in situ data. BSH kindly gave access to waverider buoy measurements. We are grateful to Luciana Fenoglio-Marc from the University of Bonn for providing Sentinel-3A altimeter data. We thank Arno Behrens from HZG for assistance with the wave model. The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.

Edited by: Agustín Sánchez-Arcilla

Reviewed by: two anonymous referees