Self-Organizing Maps approaches to analyze extremes of multivariate wave climate

The assessment of wave conditions at sea is valuable for many research fields in marine and atmospheric sciences and for human activities in the marine environment. To this end, in the last decades the observational network, which mostly relies on buoys, satellites and other probes on fixed platforms, has been integrated with numerical model outputs, which allow the parameters of sea states to be computed over wider regions. Apart from the collection of wave parameters, the technique adopted to infer the wave climate at those sites is a crucial step in providing high-quality data and information to the community. In this context, several statistical techniques have been proposed to provide a reliable representation of the probability structure of wave parameters. While univariate and bivariate probability distribution functions (pdfs) are routinely derived, multivariate pdfs that represent the joint probability structure of more than two wave parameters are rarely obtained. For individual waves, for instance, the bivariate joint pdf for wave height and period was derived by Longuet-Higgins (1983) and the bivariate joint pdf for wave height and direction was obtained by Isobe (1988).
A trivariate joint pdf for wave height, wave period and direction is due to Kwon and Deguchi (1994). For sea states, attempts have been made to model the joint probability structure of the integral wave parameters. For instance, a joint pdf for the significant wave height and the average zero-crossing wave period was derived by Ochi (1978) and by Mathisen and Bitner-Gregersen (1990).
Recently, the Self-Organizing Maps (SOM) technique has been successfully applied to represent the multivariate wave climate around the Iberian Peninsula (Camus et al., 2011a, b) and the South American continent (Reguero et al., 2013). SOM (Kohonen, 2001) is an unsupervised neural network technique that classifies multivariate input data and projects them onto a uni- or bi-dimensional output space, called a map. The SOM technique was originally developed in the 1980s, and over three decades it has been widely applied in various fields, including oceanography (Falcieri et al., 2013; Solidoro et al., 2007). Typical applications of SOM are vector quantization, regression and clustering. SOM gained credit among other techniques with the same applications due to its visualization properties, which allow multidimensional information to be read from a 2-D lattice. SOM also has the advantages of unsupervised learning, therefore vector quantization is performed autonomously. However, the quantization is strongly driven by the input data density. Hence, the most frequent conditions force the map to place clusters there, while the rarest ones (i.e. the extreme events) are often missed. Consequently, it is highly unlikely to find on the map a cluster representing the extremes.
In the context of ocean waves, drawing upon the works of Camus et al. (2011a, b) and Reguero et al. (2013), the SOM input is typically constituted by a set of wave parameters measured or simulated at a given location and evolving over time t, e.g. the triplet significant wave height H_s(t), mean wave period T_m(t) and mean wave direction θ_m(t), even if other variables can be added (examples of five- or six-dimensional inputs can be found in Camus et al., 2011a). Several activities in the wave field could benefit from the SOM outcomes: the selection of typical deep-water sea states for propagation towards the coast to study the longshore current regime and coastal erosion, and the identification of typical sea states for wave energy resource assessment and wave farm optimization. In addition, the experimental joint and marginal pdfs can be derived from SOM. As shown in Camus et al. (2011b), besides interesting potentialities, especially in visualization, some drawbacks in using SOM for wave analysis have emerged with respect to other classification techniques. Indeed, the largest H_s are missed by SOM because such extreme events are both rare (few comparisons in the competitive stage of the SOM learning) and distant from the others in the multidimensional space of input data (poorly influenced during the cooperative stage).
Moving from this fact, we have asked: how can we employ SOM, with its visualization properties, to improve the representation of the edges of a multivariate wave climate at a location, i.e. the extremes? To answer this question we have followed three different strategies. Firstly, we have pre-processed the input data using the Maximum Dissimilarity Algorithm (MDA) in order to reduce the redundancy of low and moderate sea states, as done by Camus et al. (2011a). Indeed, MDA is a technique that reduces the density of inputs by preserving only the most representative ones (i.e. the most distant from each other in a Euclidean sense). Doing so, the most severe sea states are expected to gain weight in the learning process. We have called this strategy MDA-SOM. Then, we have focused on the post-processing of the outputs. In this context, the improved representation of extremes has been sought by running SOM on a dataset of events exceeding a prescribed threshold. Hence, we have applied a two-step SOM approach, first running a standard SOM to get a representation of the low/moderate (i.e. the most frequent) wave climate, and then running a second SOM on a reduced input sample. This new sample has been obtained in one case by taking the events that exceed a prescribed threshold (e.g. the 97th percentile of H_s) from the first-step SOM results (we call this strategy TSOM), and in another case by taking the peaks of the storms, identified by means of a Peak-Over-Threshold analysis (Boccotti, 2000; we call this strategy POT-TSOM). To present the results of the two-step SOMs, we propose a double-sided map, which shows on the left the map with the low/moderate sea states and on the right the map with the most severe sea states (i.e. the extremes).

Data
The dataset employed for the SOM analysis consists of wave time series gathered at the Acqua Alta oceanographic tower. Acqua Alta, owned and operated by the "Italian National Research Council - Institute of Marine Sciences" (CNR-ISMAR), is located in the northern Adriatic Sea (Italy, northern Mediterranean Sea), 8 miles off the Venice coast at 17 m depth (Fig. 1). The tower is a preferential site for marine observations (wind, wave, tide, and physical and biogeochemical water properties are routinely retrieved), with a multi-parameter measuring structure on board (Cavaleri, 2000). For this study, we have relied on a 30-year (1979-2008) dataset of 3-hourly significant wave height H_s, mean wave period T_m and mean wave direction of propagation θ_m, measured by pressure transducers. Preliminarily, data have been preprocessed in order to remove occasional spikes. To this end, the time series have first been treated with an ad hoc despiking algorithm (Goring and Nikora, 2002). The complete dataset is therefore constituted of three variables and 50 503 sea states. Basic statistics of the time series are given in Table 1. The directional distribution is represented in more detail in the experimental pdf of θ_m (Fig. 2, bottom panel), where the most frequent directions of propagation are indeed in the range 180 < θ_m < 360° N (western quadrants), with peaks at 247.5 and 315° N. Storms in the area (defined as sea states with H_s at least 1.5 times the mean H_s) are generated by the dominant winds, i.e. the so-called "Bora" and "Sirocco" winds (Signell et al., 2005; Benetazzo et al., 2012). "Bora" is a gusty, katabatic and fetch-limited wind that blows from the north-east; it generates intense storms along the Italian coast of the Adriatic Sea, characterized by relatively short and steep waves. "Sirocco" is a wet wind that blows from the south-east; it is not fetch-limited and it generates waves that are longer and less steep than those of "Bora" and that come from the southern part of the basin. Denoting as "Bora" the events with 180 ≤ θ_m ≤ 270° N, and as "Sirocco" the events with 270 < θ_m ≤ 360° N, "Bora" storms have an occurrence of 12 %, while "Sirocco" storms have an occurrence of 8 %. The experimental marginal pdfs of H_s and T_m are also shown in Fig. 3, confirming that the most probable H_s at Acqua Alta is 0.2 m, and the most probable T_m is 3.6 s.
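The storm and wind-sector definitions given above can be sketched in a few lines. This is only an illustration: the function name `classify_sea_states` and the "other" label for storms propagating outside the two western sectors are assumptions, as is treating every sea state below the storm threshold as a calm.

```python
import numpy as np

def classify_sea_states(hs, theta_m, storm_factor=1.5):
    """Label sea states as 'calm', 'bora', 'sirocco' or 'other' (hypothetical helper).

    hs: significant wave heights (m); theta_m: directions of propagation (deg N).
    A storm is a sea state with hs >= storm_factor * mean(hs).
    """
    hs = np.asarray(hs, dtype=float)
    theta_m = np.asarray(theta_m, dtype=float)
    threshold = storm_factor * hs.mean()
    storm = hs >= threshold
    labels = np.full(hs.shape, "calm", dtype=object)
    labels[storm] = "other"  # storms outside the two sectors (assumed label)
    labels[storm & (theta_m >= 180.0) & (theta_m <= 270.0)] = "bora"
    labels[storm & (theta_m > 270.0) & (theta_m <= 360.0)] = "sirocco"
    return labels, threshold

# Toy data, not observations from Acqua Alta
labels, threshold = classify_sea_states(
    [0.2, 0.2, 0.2, 2.0, 2.0, 2.0], [300, 200, 100, 200, 300, 90]
)
```

Frequencies of occurrence such as those quoted in the text would then follow from `np.mean(labels == "bora")` and similar expressions.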

Theoretical background
In this Section, we recall the SOM features that are functional to the study. For more comprehensive readings we refer to Kohonen (2001) and the other references cited in the following.
SOM is an unsupervised neural network technique that classifies multivariate input data and projects them onto a uni- or bi-dimensional output space, called a map. Typically, a bi-dimensional lattice is produced as the output map. The global structure of the lattice is defined by the map shape, which can be set as sheet, cylindrical or toroidal. The local structure of the lattice is defined by the shape of the elements, called units, which can be rectangular or hexagonal. The output map produced by a SOM on wave input data (e.g. as in Camus et al., 2011a) furnishes an immediate picture of the multivariate wave climate and allows one to identify, among other things, the most frequent sea states along with their significant wave height, mean direction of propagation and mean period.
The core of SOM is represented by the learning stage. Therefore, the choice of the learning functions and parameters is crucial to obtain stable and reliable maps. In SOM, the classification of input data is performed by means of competitive-cooperative learning: the units of the output map compete among themselves to be the winning or Best-Matching Units (BMUs), i.e. the closest to the input data according to a prescribed metric (competitive stage), and they organize themselves due to lateral inhibition connections (cooperative stage). Usually, the chosen metric is the Euclidean distance, and inputs have to be normalized before learning (e.g. by imposing unit variance or a [0, 1] range for all the input variables) and de-normalized once learning has finished. The lateral inhibition among the map units is based upon the map topology and upon a neighborhood function, which expresses how much a BMU affects the neighboring ones at each step of the learning process. During the learning process, the neighborhood function reduces its domain of influence according to the decrease of a radius, from an initial to a final user-defined value.
Learning can be performed sequentially, as done by the original incremental SOM algorithm, where input data are presented one at a time to the map, or batchwise, as done by the more recent batch algorithm, where the whole input dataset is presented all at once to the map (Kohonen et al., 2009). While the sequential algorithm requires the accurate choice of a learning rate function, which decreases during the process, the batch algorithm does not. At the beginning of the learning stage, the map has to be initialized: randomly, or preferably as an ordered 2-D sequence of vectors obtained from the eigenvalues and eigenvectors of the covariance matrix of the data. In both SOM algorithms the learning process is performed over a prescribed number of iterations that should lead to an asymptotic equilibrium. Even if Kohonen (2001) argued that convergence is not a problem in practice, the convergence of the learning process to an optimal solution is still an unsolved issue (convergence has been formally proved only for the univariate case, Yin, 2008). The reason is that, unlike other neural network techniques, SOM does not perform a gradient descent along a cost function that has to be minimized (Yin, 2008). Hence, the degree of optimality of the resulting map has to be assessed in other ways, e.g. by means of specific error metrics.
The most common ones used for SOM are the mean quantization error and the topographic error (Kohonen, 2001). The former is the average of the Euclidean distances between each input datum and its BMU, and is a measure of the goodness of the map in representing the input. The latter is the percentage of input data whose first (the BMU) and second best-matching units are not adjacent in the map, and is a measure of the topological preservation of the map. In this context, in order to achieve reliable and stable maps, it is crucial to follow practical advice and to keep the error metrics under control.
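The two error metrics just described can be illustrated with a minimal NumPy sketch. It is not the SOM toolbox implementation: the function name `som_errors` is hypothetical, the plain Euclidean distance is used, and two units are assumed to be adjacent when their lattice coordinates are at most one unit apart (which holds for suitably scaled rectangular or hexagonal grids).

```python
import numpy as np

def som_errors(data, codebook, grid_coords):
    """Return (mean quantization error, topographic error).

    data: (n, d) inputs; codebook: (m, d) unit vectors;
    grid_coords: (m, 2) positions of the units on the lattice.
    """
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    grid_coords = np.asarray(grid_coords, dtype=float)
    # Distance from every input to every unit, shape (n, m)
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    # Mean distance between each input and its best-matching unit
    qe = dists[np.arange(len(data)), bmu1].mean()
    # Fraction of inputs whose first and second BMUs are not lattice neighbors
    lattice_d = np.linalg.norm(grid_coords[bmu1] - grid_coords[bmu2], axis=1)
    te = float(np.mean(lattice_d > 1.0 + 1e-9))
    return float(qe), te

coords = [[0, 0], [1, 0], [2, 0]]        # three units on a line
data = [[0.1], [0.9]]
qe_ok, te_ok = som_errors(data, [[0.0], [1.0], [10.0]], coords)
qe_bad, te_bad = som_errors(data, [[0.0], [10.0], [1.0]], coords)
```

In the second call the two closest units sit at opposite ends of the lattice, so every input counts as a topographic error even though the quantization error is unchanged.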


SOM set-up
In this paper, the SOM technique has been applied by means of the SOM toolbox for MATLAB (Vesanto et al., 2000), which provides most of the standard SOM capabilities, including pre- and post-processing tools. Among the techniques available, we have chosen the batch algorithm because, together with a linear initialization, it permits repeatable analyses, i.e. several SOM runs with the same parameters produce the same result (Kohonen et al., 2009). This is not a general feature of SOM, as the non-deterministic character of both the random initialization and the selection of the data in the sequential algorithm always leads to different, though consistent, BMUs (Kohonen, 2001). The parameters controlling the SOM topology and the batch learning have been carefully examined, and their values have been chosen as the result of a sensitivity analysis aimed at attaining the lowest mean quantization and topographic errors. As a result, SOM has produced output maps as bi-dimensional square lattices, sheet-shaped and with hexagonal cells. This kind of topology has been preferred to others (e.g. rectangular lattice, toroidal shape, rectangular cells, etc.) because the maps so produced had the best topological preservation (low topographic error) and visual appearance. The map size is 13 × 13 (169 cells), hence each cell represents approximately 300 sea states on average if the complete dataset is considered. The lateral inhibition among the map units is provided by a cut-Gaussian neighborhood function, which ensures a certain stiffness of the map (Kohonen, 2001) during the batch learning process (1000 iterations). At the same time, to allow the map to widely span the dataset, the neighborhood radius has been set to 7 at the beginning, i.e. more than half the size of the map, and then linearly decreased to 1. The process has been performed once, without considering a rough phase followed by fine-tuning. Indeed, this is the condition that has produced the lowest mean quantization and topographic errors, hence the most reliable maps.
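The radius schedule and the neighborhood function mentioned above can be sketched as follows. The functional form of the cut-Gaussian kernel, a Gaussian of the lattice distance set to zero beyond the current radius, is an assumption about how the toolbox defines it; the function names are hypothetical.

```python
import numpy as np

def radius_schedule(r_start=7.0, r_end=1.0, n_iter=1000):
    """Neighborhood radius decreasing linearly over the training iterations."""
    return np.linspace(r_start, r_end, n_iter)

def cut_gaussian(lattice_dist, radius):
    """Assumed cut-Gaussian kernel: Gaussian weight, zero beyond the radius."""
    d = np.asarray(lattice_dist, dtype=float)
    return np.exp(-d**2 / (2.0 * radius**2)) * (d <= radius)

radii = radius_schedule()
weights_first_iter = cut_gaussian(np.arange(13), radii[0])
```

With a 13 × 13 map and an initial radius of 7, units up to seven lattice steps from the BMU are updated at the first iteration, while only the nearest ones are affected at the end.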
Input data have been normalized in order that the minimum and maximum distances between two realizations of a variable are 0 and 1, respectively. To this end, following Camus et al. (2011a), the variables have been normalized so that H and T range in [0, 1], while θ ranges in [0, 2]. To take into account the circular character of θ_m in the distance evaluation, again following Camus et al. (2011a), we have considered the Euclidean-circular distance as the metric for SOM learning. In this context, the distance d_ij between the input datum {H_i, T_i, θ_i} and the SOM unit {H_j, T_j, θ_j} combines the Euclidean differences in H and T with the circular difference in θ (Eq. 2). The Euclidean-circular distance has therefore been implemented in the scripts of the SOM toolbox for MATLAB where the distance is calculated.
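A possible form of the normalization and of the Euclidean-circular distance is sketched below. The formulas are assumptions that match the ranges stated in the text (min-max scaling for H and T, θ_m divided by 180 so that θ spans [0, 2], and the circular difference taken as the smaller of |Δθ| and 2 − |Δθ|); they are not copied from the original equations, and the function names are hypothetical.

```python
import numpy as np

def normalize(hs, tm, theta_m):
    """Scale H and T to [0, 1] and the direction (deg N) to [0, 2]."""
    hs, tm, theta_m = (np.asarray(a, dtype=float) for a in (hs, tm, theta_m))
    H = (hs - hs.min()) / (hs.max() - hs.min())
    T = (tm - tm.min()) / (tm.max() - tm.min())
    theta = theta_m / 180.0
    return H, T, theta

def ec_distance(a, b):
    """Euclidean-circular distance between two normalized triplets (H, T, theta)."""
    d_theta = abs(a[2] - b[2])
    d_theta = min(d_theta, 2.0 - d_theta)   # shortest way around the circle
    return float(np.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2 + d_theta**2))

H, T, theta = normalize([0.0, 1.0, 2.0], [2.0, 4.0, 6.0], [0.0, 180.0, 360.0])
d_wrap = ec_distance((0.0, 0.0, 0.05), (0.0, 0.0, 1.95))  # 9 deg vs 351 deg
```

With this choice two directions on opposite sides of north are close to each other, and the largest possible distance along each variable is 1.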

Results
In this Section, the results of the standard SOM approach and of the different strategies proposed to improve the representation of extremes are presented. The performances of standard SOM, MDA-SOM and TSOM are assessed by comparing the time series of the wave parameters and their experimental marginal pdfs with the time series reconstructed from the results of each strategy and the corresponding pdfs, respectively. POT-TSOM is treated separately because a direct comparison with the other strategies using the described methods is not possible.

Standard SOM
Standard SOM has been applied using the set-up illustrated in Sect. 3.2. The gradual and continuous change in wave parameters across the BMUs indicates that the topological preservation is quite good, as confirmed by the 22 % topographic error.
According to the map, the most frequent sea states are represented by the triplet [0.17 m, 3.5 s, 323° N], which resembles the information that one could gather from the bivariate (H_s-T_m) and (H_s-θ_m) diagrams, i.e. [0.2 m, 3.6 s, 315° N], though these two diagrams are not formally related to each other. Most of the BMUs show wave propagation directions that point towards the western quadrants, as displayed in the joint and marginal pdfs of θ_m. The BMUs denoting sea states forced by land winds (pointing toward east) are clustered in the top-left corner of the map and have very low frequencies of occurrence (individual and cumulated). The frequency of occurrence of calms is 80 %, while that of "Bora" storms is 12 % and that of "Sirocco" storms is 8 % (using the definitions of calms, "Bora" and "Sirocco" storm events given in Sect. 2). Hence, the integral distribution of the observed events over H_s and θ_m is retained by SOM. Quantitatively, for the most severe storm of the sequence shown in Fig. 5, standard SOM underestimates the peak by 32 % for H_s, 12 % for T_m and 2 % for θ_m. Although H_s appears to be the most affected (T_m and θ_m after SOM are in better agreement with the original data), all the variables processed by SOM experience a tightening of the original ranges of variation, as shown in Fig. 6, which displays the marginal experimental pdfs of H_s, T_m and θ_m after SOM compared to the original pdfs. Generally, the marginal experimental pdfs provided by SOM are in good agreement with the original ones. However, the range of variation of H_s is reduced from 0.05-5.23 to 0.17-2.75 m, the range of T_m from 0.5-10.1 to 2.4-7.4 s, and the range of θ_m from 0-360 to 41-323° N.
The maximum H_s value given by SOM (2.75 m) is very close to the 99th percentile value (2.68 m), pointing out that SOM provides a good representation of the wave climate up to approximately the 99th percentile. Nevertheless, the remaining 1 % of events not properly described (extending up to 5.23 m) is, for some applications, the most interesting part of the sample. This confirms that standard SOM provides an incomplete representation of the wave climate.
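The tightening of the ranges follows directly from how a time series is reconstructed from a map: every sea state takes the values of its BMU, so no reconstructed value can exceed the largest one stored in the map. The sketch below illustrates this with a hypothetical `reconstruct` helper and a plain Euclidean metric on a single variable; it is not the procedure used with the MATLAB toolbox.

```python
import numpy as np

def reconstruct(data, codebook):
    """Replace each input row with the codebook vector of its BMU."""
    data = np.asarray(data, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    bmu = dists.argmin(axis=1)
    return codebook[bmu], bmu

# Toy H_s series (m) with one extreme value, and a two-unit map
observed = np.array([[0.1], [0.2], [5.0]])
units = np.array([[0.15], [2.75]])
rebuilt, bmu = reconstruct(observed, units)
```

Here the 5.0 m sea state is assigned to the unit with the highest H_s in the map, which caps the reconstructed series at that unit's value.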

Maximum Dissimilarity Algorithm and SOM (MDA-SOM)
In order to reduce redundancy in the input data and to enable a wider variety of represented sea states, in previous studies (e.g. Camus et al., 2011a) the authors applied the "Maximum Dissimilarity Algorithm" (MDA) before the SOM process. In doing so, a new set of input data for SOM is constituted by sampling the original data in such a way that the chosen sea states have the maximum dissimilarity (herein assumed as the Euclidean-circular distance) from each other. As a result of MDA, a reduction of the number of sea states with low/moderate H_s, i.e. the most frequent at Acqua Alta, is observed. Hence, MDA-SOM is expected to provide a better description of the extreme sea states. Nevertheless, as pointed out by Camus et al. (2011a), the reduction of the sample size leads to lower errors in the 99th percentile of H_s (chosen to represent extremes) but also to higher errors in the average of H_s. Therefore, in terms of percentage reduction of the original input dataset, an optimum balance has to be found in order to get good descriptions of both the average and the extreme wave climate.
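A greedy max-min selection of the kind MDA performs can be sketched as below. This is an illustration under assumptions, not the algorithm of Camus et al. (2011a): the seed is taken as the point with the largest summed dissimilarity, the inputs are assumed already normalized as (H, T, θ) with θ in [0, 2], and the function names are hypothetical.

```python
import numpy as np

def ec_dist_to(points, p):
    """Euclidean-circular distance from every row of points to the triplet p."""
    d_theta = np.abs(points[:, 2] - p[2])
    d_theta = np.minimum(d_theta, 2.0 - d_theta)
    return np.sqrt((points[:, 0] - p[0])**2 + (points[:, 1] - p[1])**2 + d_theta**2)

def mda_select(points, n_keep):
    """Indices of n_keep mutually dissimilar rows (greedy max-min selection)."""
    points = np.asarray(points, dtype=float)
    # Seed: the point with the largest total dissimilarity (assumption)
    totals = np.array([ec_dist_to(points, p).sum() for p in points])
    selected = [int(totals.argmax())]
    # Distance of each point to its nearest already-selected point
    min_dist = ec_dist_to(points, points[selected[0]])
    while len(selected) < n_keep:
        nxt = int(min_dist.argmax())      # farthest from the current subset
        selected.append(nxt)
        min_dist = np.minimum(min_dist, ec_dist_to(points, points[nxt]))
    return np.array(selected)

# Three nearly identical calm states and one severe state (toy, normalized)
pts = [[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [0.02, 0.0, 0.0], [1.0, 1.0, 1.0]]
kept = mda_select(pts, 2)
```

The severe state is retained while two of the three redundant calm states are dropped, which is the effect the pre-processing aims for. The seed computation is quadratic in the number of points, so a dataset of tens of thousands of sea states would call for a cheaper seed choice.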

In the MDA-SOM application, we have pre-processed the input dataset by applying MDA, as described in detail in Camus et al. (2011a). Looking for the best reduction coefficient, the original dataset has been reduced by means of MDA from the initial 50 503 sea states (100 %) to 5050 (10 %), in steps of 10 %. The absolute errors on the mean H_s and on the 99th percentile of H_s after MDA-SOM, relative to the original dataset, are summarized in Table 2. The error on the mean H_s, initially 2 %, monotonically increases up to 57 %, while the error on the 99th percentile of H_s, initially 9 %, decreases down to 3 % at 50-60 % and then increases up to 27 %. With the widening of the variables' range as the principal target (hence a better description of extremes), but without losing quality in the description of the average climate, we chose to consider the reduction to 80 % of the original dataset (7 % error on the mean H_s, 40 % error on the 99th percentile of H_s). The corresponding MDA-SOM output map displayed in Fig. 7 is topologically equivalent to that produced by SOM (Fig. 4), except for minor differences in the location of some BMUs. However, the most frequent BMU has [H_s, T_m, θ_m] = [0.28 m, 2.8 s, 328° N], which is farther from what emerged from the diagrams of Sect. 2 than the standard SOM result, especially for T_m. Also, the BMU with the highest H_s has the triplet [2.8 m, 6.0 s, 275° N]; hence, even if the input dataset has been reduced, the representation of extremes is still unsatisfactory. This is confirmed by the comparison of the original and the reconstructed (after MDA-SOM) time series. In Fig. 8, for the sequence of events already shown, the comparison has been extended to the results of 60 % MDA-SOM (smaller error on the 99th percentile of H_s, see Table 2) and 10 % MDA-SOM (maximum input dataset reduction), in order to investigate whether MDA-SOM can enhance the representation of the extreme wave climate even accepting a worsening of the average one. Actually, 60 % MDA-SOM performs only slightly better than 80 % MDA-SOM in describing the chosen events; indeed, the triplet with the highest H_s, which represents the sea states at the peak of the most severe storm, is [2.93 m, 5.8 s, 258° N]. A better reproduction of H_s at this peak is provided by 10 % MDA-SOM, though the maximum is still missed and the original data are overestimated in its proximity. Indeed, 60 and 10 % MDA-SOMs locally overestimate H_s in the low/moderate sea states.

The marginal experimental pdfs after MDA-SOM are compared in Fig. 9 to the pdfs of the original dataset. The distributions are in good agreement and the representation is more complete with respect to standard SOM, especially concerning H_s. Nevertheless, the 10 % MDA-SOM distribution for H_s exhibits a larger departure from the original distribution at 1.7 m with respect to standard SOM. Also, the 10 % MDA-SOM distributions, which provide the widest ranges, locally depart from the reference distributions, in particular for T_m and θ_m. The frequency of occurrence of calms is 81 %, while that of "Bora" storms is 12 % and that of "Sirocco" storms is 7 %. Hence, except for a minor change in the frequency of calms and "Sirocco" events, the overall statistics resemble those observed at Acqua Alta.

Two-steps SOM (TSOM)
A two-step SOM (TSOM) has then been applied to provide a more complete description of the wave climate at Acqua Alta. To this end, the SOM algorithm has been run a first time on the original dataset, without reductions (first step). Then, the outputs have been post-processed: a threshold H_s* was fixed, and the sea states represented by the BMUs having H_s > H_s* have been taken to constitute a new input dataset. Hence, a second SOM has been run on the new dataset (second step). The SOM set-up is the same as in the first step. At the end, we have obtained a two-sided map that represents the wave climate (see, for instance, Fig. 10): the first map on the left side describes the climate below H_s*, while the second map on the right side focuses on the climate over H_s*. Three thresholds were tested, corresponding to the 95th, 97th and 99th percentiles of H_s: 1.80, 2.12 and 2.68 m, respectively. In the following, we have focused on the results with the 97th percentile threshold, since they have turned out to be more representative of the extreme wave climate than the others.
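The construction of the second-step dataset can be sketched as follows, assuming that the first-step SOM has already provided the BMU index of every sea state and the H_s value of every unit. The function name `second_step_inputs` and the toy numbers are hypothetical.

```python
import numpy as np

def second_step_inputs(data, bmu_index, unit_hs, hs_threshold):
    """Select the sea states whose first-step BMU has H_s above the threshold.

    data: (n, d) original inputs; bmu_index: (n,) BMU of each input;
    unit_hs: (m,) H_s value of each map unit (m).
    """
    severe_units = np.flatnonzero(np.asarray(unit_hs, dtype=float) > hs_threshold)
    mask = np.isin(np.asarray(bmu_index), severe_units)
    return np.asarray(data)[mask], severe_units

# Toy first-step result: three units, five sea states [H_s, T_m, theta_m]
unit_hs = [0.2, 1.0, 2.5]
bmu_index = [0, 0, 1, 2, 2]
data = np.array([[0.2, 3.5, 320.0], [0.3, 3.6, 310.0], [1.1, 4.0, 250.0],
                 [2.0, 5.5, 260.0], [4.5, 6.7, 275.0]])
subset, severe_units = second_step_inputs(data, bmu_index, unit_hs, 2.12)
```

Because the selection is made on the BMUs rather than on the sea states themselves, the subset can contain events slightly below the threshold, as the 2.0 m sea state in the toy example shows. A percentile threshold could be obtained beforehand with `np.percentile(hs, 97)`.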
Figure 10 depicts the TSOM results with H_s* = 2.12 m (97th percentile). The first map, on the left, is the map already shown in Fig. 4, representing the wave climate at Acqua Alta. On that map, the BMUs with H_s > 2.12 m have been enclosed by a black line.

Without such BMUs, the map on the left represents the low/moderate wave climate, i.e. the 97 % of the whole original dataset constituted by events with H_s below or equal to the 2.12 m threshold. The remaining 3 % of events, represented by the enclosed BMUs, are the most severe events at Acqua Alta. The first-step SOM associates to such events 2.12 < H_s < 2.75 m, 5.0 < T_m < 6.5 s and 249 < θ_m < 299° N. Hence, according to SOM, the most severe sea states pertain to a rather tight directional sector (50°), making it hard to discriminate between "Bora" and "Sirocco" events. A more detailed representation of such extreme events is provided by the second map in Fig. 10, on the right, where extreme "Bora" and "Sirocco" events are more widely described by BMUs. Indeed, a sort of diagonal (from the top-right corner to the bottom-left corner of the map) divides the BMUs. "Bora" events are clustered on the left of this diagonal (top-left part of the map), while "Sirocco" events are clustered on the right of it (bottom-right part of the map). On the diagonal, BMUs represent sea states that travel towards the west. This configuration somehow resembles the one observed in the left map, except for the land sea states in the top-left corner. The most severe sea states are clustered in the top-right corner of the map and also, though to a smaller extent, in the bottom-left part of it. The resulting ranges are 1.94 < H_s < 4.26 m, 4.4 < T_m < 8.3 s and 224 < θ_m < 316° N.
The widened ranges of wave parameters provided by TSOM allow a more complete description of the sea states at Acqua Alta, including the most severe ones, as shown in Fig. 11. There, for the sequence of events presented in the previous Sections, the reconstructed TSOM time series is compared to the original one. The results with the 95th and 99th percentile TSOM are also plotted, and it clearly appears that the differences among the three tests (i.e. TSOM with the H_s threshold at the 95th, 97th and 99th percentiles) are very small, in particular for θ_m. Nevertheless, the 95th percentile TSOM yields a smaller estimate of the highest H_s peak with respect to the others, and the 99th percentile TSOM deviates more than the others from the original T_m.
Such differences are also found in the marginal experimental pdfs of the wave parameters, shown in Fig. 12. Indeed, p(H_s) and p(T_m) locally differ among the three thresholds and also from the original pdf, in particular at the largest values of H_s and T_m. As expected, the higher the threshold, the wider the H_s range, which extends to higher values. Hence, the 99th percentile TSOM provides the most complete representation of the wave climate, at least concerning H_s. However, the widest T_m range is obtained with the 97th percentile and the tightest with the 99th percentile TSOM. Instead, p(θ_m) is equally represented by the three thresholds and is in excellent agreement with the original pdf, though the θ_m range is limited with respect to the complete circle. In addition, local departures from the original pdfs are still observed, especially for H_s and T_m. The frequency of occurrence of calms is 81 %, while that of "Bora" storms is 11 % and that of "Sirocco" storms is 8 %. Hence, except for a minor change in the frequency of calms and "Bora" events, the overall statistics resemble those observed at Acqua Alta.

Comparison among strategies
A summary of the performances of the different SOM strategies tested so far is given in Table 3. There, standard SOM, MDA-SOM with 80 % reduction and TSOM with the H_s threshold at the 97th percentile are compared in their capability of representing the wave climate at Acqua Alta by means of the BMUs. As done in the previous sections, the performances are assessed by comparing the reconstructed time series from each strategy with the original ones and the resulting marginal pdfs with the pdfs of the original data. However, here the performances are quantified by statistical parameters (see caption of Table 3 for nomenclature). Generally, the reconstructed time series are in agreement with the original ones, as shown by the high r_av (over 0.98) and r_SD (over 0.89), as well as the high CC (over 0.95) and low RMSE (below 0.19 m for H_s, 0.37 s for T_m and 23° for θ_m). However, the highest ratios and correlation coefficients, and the lowest RMSE, pertain to TSOM. With the only exception of θ_m, whose widest range is provided by MDA-SOM, TSOM turned out to be the most efficient in providing the most complete representation among the tested strategies.

Peak-Over-Threshold Two-steps SOM (POT-TSOM)
As an additional strategy to provide a more complete representation of the wave climate through SOM, we tested a different approach. It is based on a two-step SOM applied initially to the whole dataset, to classify the low/moderate wave climate, and then applied to the peaks of the storms defined by means of the Peak-Over-Threshold technique. Storms were identified according to the definition of Boccotti (2000): a storm is a sequence of H_s that remains for at least 12 h over a given threshold H_s*, which corresponds to 1.5 times the mean H_s. We considered the H_s at Acqua Alta (Table 1) and then, with H_s* = 0.93 m, we identified 710 storms. The peaks of the storms constitute a new dataset that has been analyzed by means of SOM. At the end, we have obtained a double-sided map that represents both the low/moderate (on the left) and the "stormy" (on the right) wave climate. POT-TSOM is not directly comparable to the other strategies, since the dataset used for the second step of the two-step SOM is composed of the storm peaks only; thus the global dataset represented by SOM is not continuous.
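The storm identification can be sketched as a scan for runs of consecutive sea states above the threshold. Two conventions here are assumptions rather than part of the original method: each sample is taken to represent one 3 h interval, so a run of four samples counts as 12 h, and gaps in the time series are ignored. The function name `storm_peaks` is hypothetical.

```python
import numpy as np

def storm_peaks(hs, dt_hours=3.0, factor=1.5, min_duration_hours=12.0):
    """Return the indices of the storm peaks and the threshold used.

    A storm is a run of consecutive values above factor * mean(hs)
    lasting at least min_duration_hours.
    """
    hs = np.asarray(hs, dtype=float)
    threshold = factor * hs.mean()
    above = hs > threshold
    peaks = []
    start = None
    # A trailing False closes a run that reaches the end of the series
    for i, flag in enumerate(np.append(above, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) * dt_hours >= min_duration_hours:
                peaks.append(start + int(hs[start:i].argmax()))
            start = None
    return np.array(peaks, dtype=int), threshold

# Toy series: one 12 h storm and one 6 h exceedance that is too short
series = [0.1] * 20 + [2.0, 3.0, 2.5, 2.2] + [0.1] * 5 + [2.0, 2.0] + [0.1] * 5
peak_idx, thr = storm_peaks(series)
```

The triplets [H_s, T_m, θ_m] at the returned indices would form the input of the second-step SOM.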
The POT-TSOM output map is shown in Fig. 13. As expected, stormy events are "Bora" and "Sirocco" events: the former are clustered in the upper and middle parts of the map, the latter in the lower part of it. The most severe storms, concentrated on the right of the map, are both "Bora" and "Sirocco" events. The triplet with the highest H_s is

Conclusions
In this paper, we have tested different strategies aimed at providing a complete description of the 1979-2008 trivariate wave climate at Acqua Alta tower using SOM. Indeed, we have verified that, besides a satisfactory description of the multivariate low/moderate wave climate (in agreement with the usual uni- and bivariate diagrams), the standard SOM approach misses the most severe sea states, which are hidden in BMUs with H_s even considerably smaller than the extreme ones. For instance, standard SOM classified as [2.75 m, 5.9 s, 270° N] most of the sea states with H_s > 2.75 m. Hence, the most interesting part of the wave climate was condensed within a few BMUs, which hardly allow discrimination between "Bora" and "Sirocco" events. To increase the weight of the most severe and rare events, we tested a strategy based on the pre-processing of the input dataset (i.e. MDA-SOM) and a strategy based on the post-processing of the standard SOM results (i.e. TSOM). The results presented showed that the post-processing technique is more effective than the pre-processing one. Indeed, a wider range of the wave parameters within the BMUs was obtained, in particular for H_s. This allowed a representation of the sea states with TSOM that is more accurate and complete than the one furnished by MDA-SOM, both for the time series and for the marginal experimental pdfs. Nevertheless, some deviations from the original pdfs were observed and the range of θ_m was not complete, such that sea states traveling towards the north were not properly described. This requires further studies to improve SOM applications to wave analysis, which are rather promising thanks to the well-recognized visualization properties.
In this context, we proposed a double-sided map representation for two-steps SOM strategies, which summarizes on the left the low/moderate wave climate and on the right the most severe one.This novel representation was employed also to show the results of the POT-TSOM, which classified the storms peaks at Acqua Alta tower.Introduction

OSD 12, 2015: SOM approaches for extremes of wave climate

F. Barbariol 1, F. M. Falcieri 1, C. Scotton 2, A. Benetazzo 1, S. Carniel 1, and M. Sclavo 1
1 Institute of Marine Sciences - Italian National Research Council, Venice, Italy
2 University of Padua, Padua, Italy

The bivariate wave diagram (H_s-T_m) illustrates the experimental joint pdf of H_s and T_m (Fig. 3), where it emerges that the most common sea states are represented by [H_s, T_m] = [0.2 m, 3.6 s]. This diagram also shows that the [H_s, T_m] pairs associated with extreme sea states are [4.8 m, 7.4 s], [5.0 m, 7.2 s] and [5.0 m, 5.8 s]. The experimental

3.2. The SOM output map in Fig. 4 merges all the information about the trivariate wave climate at Acqua Alta (H_s: inner hexagons color, T_m: vectors' length, θ_m: vectors' direction), including the frequency of occurrence (F: outer hexagons color) of each [H_s, T_m, θ_m] triplet. Hence, one can gain an immediate view of the wave climate features and of the experimental joint pdf thanks to the powerful visual properties of SOM's output.
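As a rough illustration of how the frequency of occurrence F shown on the map could be obtained, the sketch below assigns each sea state to its best-matching unit (BMU) and counts the share of sea states falling in each unit. The distance (range-normalized Euclidean, with a circular difference for direction), the function names, the toy codebook and the scaling values are assumptions made for this example; they are not the configuration used in the paper.

```python
import math
from collections import Counter

def circular_diff_deg(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def bmu_index(state, codebook, scales):
    """Index of the codebook triplet closest to `state`.

    `state` and codebook entries are (H_s [m], T_m [s], theta_m [deg N]).
    `scales` holds the normalization ranges (assumed values, see below).
    """
    def dist(unit):
        dh = (state[0] - unit[0]) / scales[0]
        dt = (state[1] - unit[1]) / scales[1]
        dd = circular_diff_deg(state[2], unit[2]) / scales[2]
        return math.sqrt(dh * dh + dt * dt + dd * dd)
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def occurrence_frequency(states, codebook, scales):
    """Fraction of sea states assigned to each codebook unit (F)."""
    counts = Counter(bmu_index(s, codebook, scales) for s in states)
    return [counts.get(i, 0) / len(states) for i in range(len(codebook))]

# Toy codebook of three units (illustrative values only).
codebook = [(0.2, 3.6, 250.0), (1.0, 4.5, 315.0), (2.75, 5.9, 270.0)]
# Assumed normalization ranges for H_s, T_m and direction.
scales = (5.0, 5.0, 180.0)
# Toy sea states (illustrative values only).
states = [(0.25, 3.5, 255.0), (0.3, 3.8, 240.0),
          (1.1, 4.4, 310.0), (4.46, 6.7, 275.0)]

freq = occurrence_frequency(states, codebook, scales)
print(freq)
```

In this toy case the sea state with H_s = 4.46 m is assigned to the unit with the largest H_s in the codebook, since no unit lies closer to it.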
s and θ_m is retained by SOM. Sea states with the longest wave periods are clustered in the top-right corner of the map. The most severe sea states of the map are clustered in the top-right part of the map, but are limited to H_s values smaller than 3.0 m. Indeed, the triplet with the highest H_s produced by SOM is [2.75 m, 5.9 s, 270° N]. However, the table and diagrams in Sect. 2 have shown that at Acqua Alta H_s can exceed 5.0 m. Therefore, sea states with H_s > 2.75 m are represented by BMUs with lower H_s. This is clear in Fig. 5, where a sequence of observed events, including one with H_s > 4.0 m, has been compared to the sequence reconstructed after SOM, i.e. for each sea state of the sequence the triplet assumes the values of the representative BMU. In Fig. 5, sea states with H_s > 2.75 m are represented by BMU #118 (the one with the highest H_s), hence H_s values are limited to 2.75 m, whereas the peak of the most severe storm in the figure has [4.46 m, 6.7 s, 275° N

cients, and lowest RMSE pertain to TSOM. Similar conclusions can be drawn for the pdfs, which are reproduced with very high CC (over 0.95) and low RMSE_pdf (below 0.04) by all the approaches, but to a greater extent by TSOM. As expected, the widest variability among the different strategies concerns H_s. With the only exception of

[4.27 m, 6.32 s, 265° N] and the maximum H_s value is very close to the 99th percentile of H_s of the new dataset, i.e. 4.28 m. Hence, 99 % of the stormy events are included within the represented range, resembling what was observed for the original dataset and standard SOM

Program 2011-2015. The authors gratefully acknowledge Luigi Cavaleri for providing wave data at Acqua Alta tower and for the fruitful discussions.

Figure 3. Bivariate wave climate at Acqua Alta tower representing the joint (H_s-T_m) experimental pdf (top-right panel); resolutions are ∆H_s = 0.1 m and ∆T_m = 0.2 s. Experimental marginal H_s and T_m pdfs at Acqua Alta tower (top-left and bottom-right panels, respectively). Solid lines in the top-right panel denote wave steepness 2πH_s/g/T

Table 1, points out that sea states at Acqua Alta have on average low intensity (mean H_s = 0.62 m), though occasionally they can reach severe levels (max(H_s) = 5.23 m). Such severe events

Table 2. MDA-SOM: absolute errors of the average and 99th percentile of H_s relative to the original dataset (%).
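The errors summarized in Table 2 could be computed along the lines of the sketch below, which compares the average and the 99th percentile of a reconstructed H_s series with those of the original one. The percentile definition (linear interpolation between order statistics), the function names and the sample values are assumptions for illustration only; the paper does not specify them here.

```python
import math

def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def abs_error_percent(reference, estimate):
    """Absolute error of `estimate` relative to `reference`, in percent."""
    return abs(estimate - reference) / abs(reference) * 100.0

def hs_errors(original, reconstructed):
    """Percent errors of the mean and 99th percentile of H_s."""
    mean_o = sum(original) / len(original)
    mean_r = sum(reconstructed) / len(reconstructed)
    return {
        "average": abs_error_percent(mean_o, mean_r),
        "p99": abs_error_percent(percentile(original, 99),
                                 percentile(reconstructed, 99)),
    }

# Illustrative H_s values in metres (not data from the paper).
original = [0.5, 1.0, 2.0, 4.5]
reconstructed = [0.5, 1.0, 2.0, 2.75]
print(hs_errors(original, reconstructed))
```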

Table 3. Performance summary of the different SOM approaches, through comparisons of reconstructed to original time series, and of resulting to original pdfs (r_av: time series ratio of averages, r_SD: time series ratio of standard deviations, CC: time series cross-correlation coefficient, RMSE: time series root mean square error, CC_pdf: pdfs cross-correlation coefficient, RMSE_pdf: pdfs root mean square error).
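A minimal sketch of how the time series indicators listed in the caption of Table 3 might be evaluated is given below. It assumes that the ratios are taken as reconstructed over original, that CC is the zero-lag Pearson correlation coefficient, and that population standard deviations are used; these conventions, the function names and the sample values are assumptions, not details confirmed by the paper.

```python
import math

def _mean(x):
    return sum(x) / len(x)

def _std(x):
    """Population standard deviation."""
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def performance(original, reconstructed):
    """Return r_av, r_SD, CC and RMSE for two equally long series."""
    n = len(original)
    mo, mr = _mean(original), _mean(reconstructed)
    so, sr = _std(original), _std(reconstructed)
    cov = sum((o - mo) * (r - mr)
              for o, r in zip(original, reconstructed)) / n
    rmse = math.sqrt(sum((o - r) ** 2
                         for o, r in zip(original, reconstructed)) / n)
    return {
        "r_av": mr / mo,          # ratio of averages (assumed: reconstructed / original)
        "r_SD": sr / so,          # ratio of standard deviations
        "CC": cov / (so * sr),    # zero-lag Pearson correlation (assumed definition)
        "RMSE": rmse,             # root mean square error
    }

# Illustrative H_s values in metres (not data from the paper):
# the reconstructed series is capped at 2.75 m.
original = [0.5, 1.0, 2.0, 4.5]
reconstructed = [0.5, 1.0, 2.0, 2.75]
metrics = performance(original, reconstructed)
print(metrics)
```

With the capped toy series, r_av and r_SD both fall below 1, which is the kind of underestimation such ratios are meant to expose. The same function could be applied to binned pdf values to obtain CC_pdf and RMSE_pdf.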