Wave extreme characterization using self-organizing maps

The self-organizing map (SOM) technique is considered and extended to assess the extremes of a multivariate sea wave climate at a site. The main purpose is to obtain a more complete representation of the sea states, including the most severe states that would otherwise be missed by a SOM. Indeed, it is commonly recognized, and herein confirmed, that a SOM is a good regressor of a sample if the frequency of events is high (e.g., for low/moderate sea states), while it fails if the frequency is low (e.g., for the most severe sea states). Therefore, we have considered a trivariate wave climate (composed of significant wave height, mean wave period and mean wave direction) collected continuously at the Acqua Alta oceanographic tower (northern Adriatic Sea, Italy) during the period 1979–2008. Three different strategies derived from the SOM have been tested in order to capture the most extreme events. The first involves a pre-processing of the input data set aimed at reducing redundancies; the second, based on the post-processing of SOM outputs, consists of a two-step SOM where the first step is applied to the original data set and the second step is applied to the events exceeding a given threshold. A complete graphical representation of the outcomes of a two-step SOM is proposed. Results suggest that the post-processing strategy is more effective than the pre-processing one in representing the wave climate extremes. An application of the proposed two-step approach is also provided, showing that a proper representation of the extreme wave climate leads to enhanced quantification of, for instance, the alongshore component of the wave energy flux in shallow water. Finally, the third strategy focuses on the peaks of the storms.


Introduction
The assessment of wave conditions at sea is valuable for many research fields in marine and atmospheric sciences and for human activities in the marine environment. In the past decades, the observational network (mostly relying on buoys, satellites and other probes) has been integrated with numerical model outputs, allowing one to obtain the parameters of sea states over wider regions. Apart from the collection of wave parameters, the technique adopted to infer the wave climate at those sites is a crucial step in order to provide high-quality data and information to the community. In this context, several statistical techniques have been proposed to provide a reliable representation of the probability structure of wave parameters. While univariate and bivariate probability distribution functions (PDFs) are routinely derived, multivariate PDFs that represent the joint probability structure of more than two wave parameters are not straightforward. For individual waves, for instance, the bivariate joint PDF of wave height and period was derived by Longuet-Higgins (1983) and the bivariate joint PDF of wave height and direction was obtained by Isobe (1988). A trivariate joint PDF of wave height, wave period and direction is due to Kwon and Deguchi (1994). For sea states, attempts have been made to model the joint probability structure of the integral wave parameters. For instance, a joint PDF of the significant wave height and the average zero-crossing wave period was derived by Ochi (1978) and Mathisen and Bitner-Gregersen (1990). De Michele et al. (2007) exploited the "copula" statistical operators to describe the dependence among several random variables, e.g., significant wave height, storm duration, storm direction and storm interarrival time, deriving their joint probability distributions. The same approach was applied by Masina et al. (2015) to the significant wave height and peak water level in the context of coastal flooding.
F. Barbariol et al.: Wave extreme characterization using self-organizing maps
Recently, the self-organizing map (SOM) technique has been successfully applied to represent the multivariate wave climate around the Iberian Peninsula (Camus et al., 2011a, b) and the South American continent (Reguero et al., 2013). SOM (Kohonen, 2001) is an unsupervised neural network technique that classifies multivariate input data and projects them onto a uni- or bi-dimensional output space, called map. The SOM technique was originally developed in the 1980s and has been widely applied in various fields, including oceanography (Liu et al., 2006; Solidoro et al., 2007; Morioka et al., 2010; Camus et al., 2011a; Falcieri et al., 2013). Typical applications of SOM are vector quantization, regression and clustering. SOMs gained credit among other techniques with the same applications due to their visualization capabilities, which allow one to get multi-dimensional information from a two-dimensional lattice. The SOM also has the advantage of unsupervised learning; therefore, vector quantization is performed autonomously. However, the quantization is strongly driven by the input data density. Indeed, the SOM is principally forced by the most frequent conditions, while the rarest ones (i.e., the extreme events) are often missed. Consequently, it is highly unlikely to find extremes properly represented on a SOM.
In the context of ocean waves, drawing upon the works of Camus et al. (2011a, b) and Reguero et al. (2013), the SOM input is generally constituted by a set of wave parameters measured or simulated at a given location and evolving over time $t$, e.g., the triplet composed of significant wave height $H_s(t)$, mean wave period $T_m(t)$ and mean wave direction $\theta_m(t)$, even if other variables can be added (examples of five- or six-dimensional inputs can be found in Camus et al., 2011a). Several activities in the wave field could benefit from the SOM outcomes, such as the selection of typical deep-water sea states for propagation towards the coast to study the longshore current regime and coastal erosion, and the identification of typical sea states for wave energy resource assessment and wave farm optimization. In addition, the empirical joint and marginal PDFs can be derived from SOMs. As shown in detail by Camus et al. (2011b), besides its interesting potential, especially in visualization, some drawbacks in using the SOM for wave analysis have emerged with respect to other classification techniques. Indeed, the largest $H_s$ are missed by SOMs because such extreme events are both rare (few comparisons in the "competitive" stage of the SOM learning) and distant from the others in the multi-dimensional space of input data (poorly influenced during the "cooperative" stage).
Starting from this evidence, the scientific question being asked is how we can employ the SOM, with its visualization capabilities, to improve the representation of the extremes of a multivariate wave climate at a location. To answer this question we have followed three different strategies. First, we have pre-processed the SOM input data using the maximum-dissimilarity algorithm (MDA) in order to reduce the redundancies of the frequent low and moderate sea states, as done by Camus et al. (2011a). Indeed, MDA is a technique that reduces the density of inputs by preserving only the most representative ones (i.e., the most distant from each other in a Euclidean sense). In doing so, the most severe sea states are expected to gain weight in the learning process. We have called this strategy MDA-SOM. Then, we have focused on the post-processing of the SOM outputs. In this context, we have applied a two-step SOM approach (herein called TSOM), by first running the SOM to get a reliable representation of the low/moderate (i.e., the most frequent) wave climate, and then by running a second SOM on a reduced input sample. This new sample has been obtained by taking from the first-step SOM results the events exceeding a prescribed threshold (e.g., the 97th percentile of $H_s$). To present the results of two-step SOMs, we have proposed a double-sided map, showing on the left the SOM with the reliable representation of the low/moderate sea states, and on the right the map with the most severe sea states (i.e., the extremes). Finally, we have applied a SOM to the peaks of the storms identified by means of a peak-over-threshold analysis (calling this strategy POT-SOM) and we have represented the results using the double-sided map. An application of the proposed TSOM approach is finally reported: we have exploited the TSOM results to compute the longshore component of the wave energy flux, showing that a more proper representation of the extreme wave climate leads to an enhanced quantification of the energy approaching the shore.

Data
The data set employed for the SOM analysis consists of wave time series gathered at the Acqua Alta oceanographic tower, owned and operated by the Italian National Research Council, Institute of Marine Sciences (CNR-ISMAR). Acqua Alta is located in the northern Adriatic Sea (Italy, northern Mediterranean Sea), approximately 15 km off the Venice coast at 17 m depth (Fig. 1), and is a preferential site for marine observations (wind, wave, tide, physical and biogeochemical water properties are routinely retrieved), with a multi-parameter measuring structure on board (Cavaleri, 2000) upgraded over the years. For this study, we have relied on a 30-year (1979–2008) data set of 3-hourly significant wave height $H_s$, mean wave period $T_m$ and mean wave direction of propagation $\theta_m$ (measured clockwise from the geographical north), observed using pressure transducers. Preliminarily, data have been pre-processed in order to remove occasional spikes. To this end, the time series have been treated with an ad hoc despiking algorithm (Goring and Nikora, 2002). The complete data set is therefore constituted of three variables and 50 503 sea states.
Basic statistics of the time series (Table 1) point out that sea states at Acqua Alta have on average low intensity ($\overline{H_s} = 0.62$ m, where the overbar denotes the mean), though occasionally they can reach severe levels: the most intense event ($H_s = 5.23$ m, $T_m = 5.36$ s, $\theta_m = 242$° N) occurred on 9 December 1992 during a storm forced by winds coming from north-east. Such severe events are not frequent, as confirmed by the 99th percentile of $H_s$, which is 2.68 m. Nevertheless, they populate the wave time series at Acqua Alta and constitute the most interesting part of the sample, for instance for extreme analysis. The mean wave period is on average 4.1 s, while the mean wave direction is 260° N; indeed, most of the waves propagate towards the western quadrants. This is represented in more detail by the histogram of the PDF of $\theta_m$ (Fig. 2, bottom panel), which shows that the most frequent directions of propagation are in the range $180 < \theta_m < 360$° N (western quadrants), with peaks at 247.5 and 315° N. Directions associated with the most intense sea states ($H_s > 4.5$ m) can be obtained from the bivariate histogram ($H_s$–$\theta_m$) representing the joint PDF of $H_s$ and $\theta_m$ (Fig. 2, top panel): 247.5, 270 and 315° N.
Mild sea states and calms ($H_s < 1.5\,\overline{H_s}$, following Boccotti, 2000) are the most frequent conditions at Acqua Alta, with 80 % of occurrence during the 30 years of observations. They mainly propagate towards the western quadrants too, though the principal propagation direction of such sea states is north-west. In this context, the most frequent sea states at Acqua Alta are represented by {$H_s$, $\theta_m$} = {0.25 m, 315° N}. Storms in the area (denoted as sea states with $H_s \ge 1.5\,\overline{H_s}$) are generated by the dominant winds, i.e., the so-called Bora and Sirocco winds (Signell et al., 2005; Benetazzo et al., 2012). Bora is a gusty, katabatic and fetch-limited wind that blows from north-east; it generates intense storms along the Italian coast of the Adriatic Sea, characterized by relatively short and steep waves. Sirocco is a wet wind that blows from south-east; it is not fetch limited and it generates longer and less steep waves than Bora, which come from the southern part of the basin. Conventionally denoting as Bora the events with $180 \le \theta_m \le 270$° N, and as Sirocco the events with $270 < \theta_m \le 360$° N, it follows that Bora storms have an occurrence of 12 % and Sirocco storms an occurrence of 8 %. The most frequent pairs {$H_s$, $T_m$} in the Bora and Sirocco quadrants, shown in the bivariate ($H_s$–$T_m$) histogram (Fig. 3), are {0.15 m, 3.6 s} and {0.35 m, 3.8 s}, respectively, Sirocco being the more frequent of the two. The associated marginal histograms (Fig. 3) point out that Sirocco winds are responsible for most of the calms, in particular for sea states with $H_s < 1$ m, while Bora winds are responsible for the most energetic sea states. Nevertheless, the histogram of $H_s$ shows that Sirocco events with $H_s$ in the range of 4–5 m can occur as well as Bora events. Bora is also associated with the shortest period waves observed: indeed, the histograms of $T_m$ almost coincide for waves shorter than 5.5 s, while for longer waves the probability level of Bora mean periods abruptly drops to values much smaller than those of Sirocco (which remain at non-negligible levels up to 9 s). The consequence of shorter and higher Bora waves, with respect to Sirocco, is steeper waves (3 % against 2 % on average, respectively).
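The conventional classification used in this section (calms below 1.5 times the mean significant wave height, Bora and Sirocco storms split by direction of propagation) can be illustrated with a minimal Python sketch. The function names `classify_sea_state` and `occurrence` are hypothetical and introduced only for illustration; the label "other" for storms propagating towards the eastern quadrants is likewise an assumption, since the text does not name that class.

```python
from collections import Counter


def classify_sea_state(hs, theta_m, hs_mean):
    """Label one sea state as 'calm', 'bora', 'sirocco' or 'other'.

    hs       significant wave height (m)
    theta_m  mean direction of propagation (deg N, clockwise from north)
    hs_mean  mean significant wave height of the whole record (m)
    """
    if hs < 1.5 * hs_mean:
        return "calm"      # mild sea states and calms
    if 180.0 <= theta_m <= 270.0:
        return "bora"      # storms propagating towards the south-west quadrant
    if 270.0 < theta_m <= 360.0:
        return "sirocco"   # storms propagating towards the north-west quadrant
    return "other"         # assumption: storms heading to the eastern quadrants


def occurrence(labels):
    """Fraction of occurrence of each label in a sequence of labels."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}
```

Applied to the full record, `occurrence` would return the kind of percentages quoted above for calms, Bora storms and Sirocco storms.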

Theoretical background
In this section, we recall the SOM features that are functional to the study. For more comprehensive readings we refer to Kohonen (2001) and other references cited in the following.
The SOM is an unsupervised neural network technique that classifies multivariate input data and projects them onto a uni- or bi-dimensional output space, called map. Typically a bi-dimensional lattice is produced as output map. The global structure of the lattice is defined by the map shape, which can be sheet, cylindrical or toroidal. The local structure of the lattice is defined by the shape of the elements, called units, which are typically either rectangular or hexagonal. The output map produced by a SOM on wave input data (e.g., as in Camus et al., 2011a) furnishes an immediate picture of the multivariate wave climate and allows one to identify, among others, the most frequent sea states along with their significant wave height, mean direction of propagation and mean period.
[Figure caption fragment: mean wave steepness lines (3 % for Bora, 2 % for Sirocco, g being gravitational acceleration); red solid lines denote the wave breaking limit (7 %). Resolutions are 0.2 m for $H_s$ and 0.2 s for $T_m$.]
The core of SOM is represented by the learning stage. Therefore, the choice of functions and parameters that control learning is crucial to obtain reliable maps. In SOM, the classification of input data is performed by means of competitive-cooperative learning: at each iteration, the elements of the output units compete among themselves to be the winning or best-matching units (BMUs), i.e., the closest to the input data according to a prescribed metric (competitive stage), and they organize themselves due to lateral inhibition connections (cooperative stage). Usually, given that the chosen metric is a Euclidean distance, inputs have to be normalized before learning (e.g., by imposing unit variance or [0, 1] range for all the input variables) and de-normalized once finished. The lateral inhibition among the map units is based upon the map topology and upon a neighboring function that expresses how much a BMU affects the neighboring ones at each step of the learning process. During the learning process, the neighboring function reduces its domain of influence according to the decrease of a radius, from an initial to a final user-defined value. Learning can be performed sequentially, i.e., presenting the input data one at a time to the map, as done by the original incremental SOM algorithm. A more recent algorithm performs batchwise learning, presenting the input data set all at once to the map (Kohonen et al., 2009). While the sequential algorithm requires the accurate choice of a learning rate function, which decreases during the process, the batch algorithm does not. At the beginning of the learning stage, the map has to be initialized: randomly or, preferably, as an ordered two-dimensional sequence of vectors obtained from the eigenvalues and eigenvectors of the covariance matrix of the data. In both SOM algorithms the learning process is performed over a prescribed number of iterations that should lead to an asymptotic equilibrium. Even if Kohonen (2001) argued that convergence is not a problem in practice, the convergence of the learning process to an optimal solution is still an unsolved issue (convergence has been formally proved only for the univariate case, Yin, 2008). The reason is that, unlike other neural network techniques, a SOM does not perform a gradient descent along a cost function that has to be minimized (Yin, 2008). Hence, in order to achieve reliable maps, the degree of optimality has to be assessed in other ways, e.g., by means of specific error metrics. The most common ones are the mean quantization error and the topographic error (Kohonen, 2001). The former is the average of the Euclidean distances between each input datum and its BMU, and is a measure of the goodness of the map in representing the input. The latter is the percentage of input data whose first and second best-matching units are not adjacent in the map, and is a measure of the topological preservation of the map.
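The batch learning scheme and the two error metrics recalled above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the SOM toolbox implementation: it uses a rectangular sheet lattice, a plain (not cut) Gaussian neighborhood, a plain Euclidean metric, a linearly decreasing radius, and a linear initialization spanning the two leading principal directions of the data. Two units are treated as adjacent when they are at most one step apart in each lattice direction. All function names are hypothetical.

```python
import numpy as np


def grid_coords(rows, cols):
    """Lattice coordinates of the units of a rectangular sheet map."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.column_stack([r.ravel(), c.ravel()]).astype(float)


def squared_distances(data, codebook):
    """Squared Euclidean distance between every input and every unit."""
    return ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)


def batch_som(data, rows, cols, n_iter=50, r0=None, r1=1.0):
    """Train a batch SOM; returns (codebook, lattice coordinates)."""
    coords = grid_coords(rows, cols)
    # Linear initialization: ordered grid on the plane of the first two
    # principal directions (assumed form of the "linear" initialization).
    mean = data.mean(axis=0)
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    scale = s[:2] / np.sqrt(len(data))
    u = coords / np.maximum(coords.max(axis=0), 1.0) * 2.0 - 1.0
    codebook = mean + (u * scale) @ vt[:2]
    r0 = max(rows, cols) / 2.0 if r0 is None else r0
    grid_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    for it in range(n_iter):
        # Neighborhood radius decreases linearly from r0 to r1.
        radius = r0 + (r1 - r0) * it / max(n_iter - 1, 1)
        # Competitive stage: find the BMU of every input.
        bmu = squared_distances(data, codebook).argmin(axis=1)
        # Cooperative stage: each unit becomes the neighborhood-weighted
        # mean of all inputs (batch update, no learning rate needed).
        h = np.exp(-grid_d2[bmu] / (2.0 * radius ** 2))
        codebook = (h.T @ data) / h.sum(axis=0)[:, None]
    return codebook, coords


def quantization_error(data, codebook):
    """Mean Euclidean distance between each input and its BMU."""
    bmu = squared_distances(data, codebook).argmin(axis=1)
    return float(np.linalg.norm(data - codebook[bmu], axis=1).mean())


def topographic_error(data, codebook, coords):
    """Fraction of inputs whose first and second BMUs are not adjacent."""
    order = np.argsort(squared_distances(data, codebook), axis=1)[:, :2]
    gap = np.abs(coords[order[:, 0]] - coords[order[:, 1]]).max(axis=1)
    return float((gap > 1).mean())
```

Because both the initialization and the batch update are deterministic, two runs with the same inputs and parameters give the same codebook, which is the repeatability property usually cited in favor of the batch algorithm.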

SOM setup
In this paper, the SOM technique has been applied by means of the SOM toolbox for MATLAB (Vesanto et al., 2000), which allows for most of the standard SOM capabilities, including pre- and post-processing tools. Among the techniques available, we have chosen the batch algorithm because together with a linear initialization it permits repeatable analyses; i.e., several SOM runs with the same parameters produce the same result (Kohonen et al., 2009). This is not a general feature of SOM, as the non-unique character of both random initialization and selection of the data in the sequential algorithm always leads to different, though consistent, SOMs (Kohonen, 2001).
Parameters controlling the SOM topology and batch learning have been carefully examined and their values have been chosen as the result of a sensitivity analysis aimed at attaining the lowest mean quantization and topographic errors. Therefore, we have chosen bi-dimensional square SOM outputs that are sheet shaped and with hexagonal cells. This kind of topology has been preferred to others (e.g., rectangular lattice, toroidal shape, rectangular cells) because the maps produced this way had the best topological preservation (low topographic error) and visual appearance. The map's size is 13 × 13 (169 cells); hence, each cell represents approximately 300 sea states on average, if the complete data set is considered. The lateral inhibition among the map units is provided by a cut-Gaussian neighborhood function that ensures a certain stiffness to the map (Kohonen, 2001) during the batch learning process (1000 iterations). At the same time, to allow the map to widely span the data set, the neighborhood radius has been set to 7 at the beginning, i.e., more than half the size of the map, and then linearly decreased to 1 during a single-phase learning process.
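Input data have been normalized so that the minimum and maximum distance between two realizations of a variable are 0 and 1, respectively. To this end, according to Camus et al. (2011a), the following normalizations have been used:

$$H = \frac{H_s - \min(H_s)}{\max(H_s) - \min(H_s)}, \quad T = \frac{T_m - \min(T_m)}{\max(T_m) - \min(T_m)}, \quad \theta = \frac{\theta_m}{180}. \qquad (1)$$

Therefore, $H$ and $T$ range in [0, 1], while $\theta$ ranges in [0, 2].
To take into account the circular character of $\theta_m$ in distance computations, the Euclidean-circular distance between an input datum $\{H_i, T_i, \theta_i\}$ and a SOM unit $\{H_j, T_j, \theta_j\}$ is defined as

$$d = \sqrt{(H_i - H_j)^2 + (T_i - T_j)^2 + \left(\min\{|\theta_i - \theta_j|,\; 2 - |\theta_i - \theta_j|\}\right)^2}. \qquad (2)$$

The Euclidean-circular distance has been therefore implemented in the scripts of the SOM toolbox for MATLAB where distance is calculated.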
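A minimal Python sketch of a normalization and a Euclidean-circular distance consistent with the ranges stated above ($H$ and $T$ in [0, 1], $\theta$ in [0, 2]) follows. It assumes min-max scaling for height and period, division of the direction by 180°, and the shorter-arc rule for the directional difference; the function names are hypothetical and the toolbox scripts themselves are not reproduced here.

```python
import math


def normalize(hs, tm, theta_m, hs_range, tm_range):
    """Scale H_s and T_m to [0, 1] and the direction (deg) to [0, 2].

    hs_range and tm_range are (min, max) pairs taken from the data set.
    """
    h = (hs - hs_range[0]) / (hs_range[1] - hs_range[0])
    t = (tm - tm_range[0]) / (tm_range[1] - tm_range[0])
    return h, t, theta_m / 180.0


def euclidean_circular_distance(a, b):
    """Distance between two normalized triplets (H, T, theta).

    The directional term uses the shorter arc, so it never exceeds 1
    (opposite directions), matching the other two variables.
    """
    dh = a[0] - b[0]
    dt = a[1] - b[1]
    dtheta = abs(a[2] - b[2])
    dtheta = min(dtheta, 2.0 - dtheta)  # wrap around the full circle
    return math.sqrt(dh * dh + dt * dt + dtheta * dtheta)
```

With this rule, two sea states with identical height and period and directions of 350° N and 10° N are 20° apart rather than 340°, which is the behavior a circular variable requires.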

SOM strategies to characterize wave extremes
In this section, the results of the standard SOM approach (applied one time, hence called single-step SOM) and the results of the different strategies proposed to improve the representation of extremes are presented. The performances of a single-step SOM, MDA-SOM and TSOM are assessed by comparing the wave parameter time series and their empirical marginal PDFs to the time series reconstructed from the results of the different strategies and the relative PDFs, respectively. POT-SOM is treated separately because a direct comparison with the other strategies using the described methods is not possible.

Single-step SOM
A single-step SOM has been applied using the setup illustrated in Sect. 3.2. The SOM output is shown in Fig. 4. According to the map, the most frequent sea states are represented by the triplet {0.17 m, 3.5 s, 323° N}, which substantially resembles the information that one could have gathered from the bivariate ($H_s$–$T_m$) and ($H_s$–$\theta_m$) histograms (Fig. 3), though these are not formally related to one another. Most cells show wave propagation directions pointing towards the western quadrants, as also displayed in the joint and marginal histograms of $\theta_m$ (Fig. 2). The cells denoting sea states forced by land winds (pointing towards the east) are clustered in the top-left corner of the map and have low frequencies of occurrence (individual and cumulated). The frequency of occurrence of calms is 80 %, while that of Bora storms is 12 % and that of Sirocco storms is 8 % (using the definitions of calms, Bora and Sirocco storm events given in Sect. 2). Hence, the integral distribution of the observed events over $H_s$ and $\theta_m$ is retained by SOMs. Sea states with the longest wave periods are clustered in the top-right corner of the map.
The most severe sea states of the map are clustered in the top-right part of the map, but are limited to $H_s$ values not exceeding 2.75 m. Indeed, the triplet with the highest $H_s$ produced by the SOM is {2.75 m, 5.9 s, 270° N}. However, tables and histograms in Sect. 2 have shown that $H_s$ can exceed 5.0 m at Acqua Alta. Therefore, sea states with $H_s > 2.75$ m are represented by cells with lower $H_s$. This is clear in Fig. 5, where a sequence of observed events, including one with $H_s > 4.0$ m, has been compared to the sequence reconstructed after SOM; i.e., for each sea state of the sequence the triplet assumes the values of the corresponding BMU. In Fig. 5 sea states with $H_s > 2.75$ m are represented by the cell with the highest $H_s$, i.e., cell no. 118 (first row, 10th column, assuming the cell numbering starts at the top-left cell and proceeds from top to bottom over map rows and then from left to right over map columns); hence, $H_s$ is limited to 2.75 m, whereas the peak of the most severe storm in Fig. 5 has {4.46 m, 6.7 s, 275° N}. Quantitatively, for this particular event, the single-step SOM underestimates the peak by 32 % in $H_s$, 12 % in $T_m$ and 2 % in $\theta_m$. Although $H_s$ appears to be the most affected ($T_m$ and $\theta_m$ after a SOM are in better agreement with the original data), all the variables processed by SOM experience a tightening of the original ranges of variation, as shown in Fig. 6, which displays the marginal empirical PDFs of $H_s$, $T_m$ and $\theta_m$ after SOM. Generally, PDFs provided by SOMs are in good agreement with the original ones. However, the range of variation of $H_s$ is reduced from [0.05, 5.23] to [0.17, 2.75] m, the range of $T_m$ from [0.5, 10.1] to [2.4, 7.4] s, and the range of $\theta_m$ from [0, 360] to [41, 323]° N.
The maximum $H_s$ value given by SOM (2.75 m) is very close to the 99th percentile value (2.68 m), pointing out that SOM provides a good representation of the wave climate up to approximately the 99th percentile. Nevertheless, the remaining 1 % of events not properly described (extending up to 5.23 m) is for some applications the most interesting part of the sample. This confirms that a single-step SOM provides an incomplete representation of the wave climate.
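The reconstruction used for this comparison (each observed sea state is replaced by the values of its BMU) and the relative underestimation of a peak can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the default plain Euclidean distance is an assumption made for brevity, whereas the study works with normalized variables and a Euclidean-circular distance.

```python
import math


def reconstruct(series, codebook, distance=math.dist):
    """Replace every observed vector by its best-matching unit (BMU)."""
    return [min(codebook, key=lambda unit: distance(obs, unit))
            for obs in series]


def relative_underestimation(observed, reconstructed):
    """Per-variable (observed - reconstructed) / observed for one event."""
    return tuple((o - r) / o for o, r in zip(observed, reconstructed))
```

Since no unit can exceed the largest value stored in the codebook, every observation above that value is mapped onto the same top cell, which is the saturation visible in the reconstructed series.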

Maximum-dissimilarity algorithm and SOM (MDA-SOM)
In order to reduce redundancy in the input data and to enable a wider variety of represented sea states, in previous studies (e.g., Camus et al., 2011a) authors applied the MDA before the SOM process. In doing so, a new set of input data for a SOM is constituted by sampling the original data in such a way that the chosen sea states have the maximum dissimilarity (herein assumed as the Euclidean-circular distance) from each other. As a result of MDA, a reduction of the number of sea states with low/moderate $H_s$, i.e., the most frequent at Acqua Alta, is observed. Hence, MDA-SOM is expected to provide a better description of the extreme sea states. Nevertheless, as pointed out by Camus et al. (2011a), the reduction of the sample size leads to lower errors in the 99th percentile of $H_s$ (chosen to represent extremes) but also to higher errors in the average of $H_s$. Therefore, in terms of percentage reduction of the original input data set, an optimum balance has to be found in order to get good descriptions of both the average and the extreme wave climate.
In the MDA-SOM application, we have pre-processed the input data set by applying MDA, as described in detail in Camus et al. (2011a). Looking for the best reduction coefficient, the original data set has been reduced by means of MDA from the initial 50 503 sea states (100 %) to 5050 (10 %), in steps of 10 %. The absolute errors on $\overline{H_s}$ and on the 99th percentile of $H_s$ after MDA-SOM, relative to the original data set, are summarized in Table 2. The error on $\overline{H_s}$, initially 2 %, monotonically increases up to 57 %, while the error on the 99th percentile of $H_s$, initially 9 %, decreases down to 3 % at 50–60 % and then increases up to 27 %. With the widening of the variables' range as principal target (hence a better description of extremes) but without losing quality in the description of the average climate, we chose to consider a reduction to 80 % of the original data set (7 % error on $\overline{H_s}$, 4 % error on the 99th percentile of $H_s$). The corresponding MDA-SOM output displayed in Fig. 7 is topologically equivalent to that produced by the single-step SOM (Fig. 4), except for minor differences in the location of some sea states. However, the most frequent sea state has {$H_s$, $T_m$, $\theta_m$} = {0.28 m, 2.8 s, 328° N}, which still resembles what has emerged from the histograms of Sect. 2, even if $T_m$ is in poorer agreement than with the single-step SOM. Also, the sea state with the highest $H_s$ has the triplet {2.8 m, 6.0 s, 275° N}; hence, even if the input data set has been reduced, the representation of extremes is still unsatisfactory. This is confirmed by the comparison of the original and the reconstructed (after MDA-SOM) time series. In Fig. 8, the comparison has been extended to the results of 60 % MDA-SOM (smallest error on the 99th percentile of $H_s$, see Table 2) and 10 % MDA-SOM (maximum input data set reduction), in order to investigate whether MDA-SOM can enhance the representation of the extreme wave climate even accepting a worsening of the average one. Actually, 60 % MDA-SOM performs only slightly better than 80 % MDA-SOM in describing the chosen events; indeed, the highest $H_s$ triplet, which represents the sea states at the peak of the most severe storm, is {2.93 m, 5.8 s, 258° N}.
The marginal empirical PDFs after MDA-SOM are compared in Fig. 9 to the PDFs of the original data set. The distributions are in good agreement and the representation is more complete with respect to the single-step SOM, especially concerning $H_s$. Nevertheless, the 10 % MDA-SOM distribution for $H_s$ exhibits a larger departure from the original distribution at 1.7 m with respect to the single-step SOM. Also, the 10 % MDA-SOM distributions, which provide the widest ranges, locally depart from the reference distributions, in particular for $T_m$ and $\theta_m$. The frequency of occurrence of calms is 81 %, while that of Bora storms is 12 % and that of Sirocco storms is 7 %. Hence, except for a minor change in the frequency of calms and Sirocco events, the overall statistics resemble those directly derived from the Acqua Alta data set.
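To make the pre-processing step concrete, a greedy max-min selection is sketched below as one common way of realizing a maximum-dissimilarity subset. It is an illustration under stated assumptions, not the exact procedure of Camus et al. (2011a): the seed (the point farthest from the sample centroid) is a hypothetical choice, the default distance is plain Euclidean rather than Euclidean-circular, and the name `mda_select` is introduced only here.

```python
import math


def mda_select(points, n_select, distance=math.dist):
    """Greedy max-min selection of mutually dissimilar points.

    Returns the indices of the selected points, in selection order.
    """
    n = len(points)
    if not 0 < n_select <= n:
        raise ValueError("n_select must be between 1 and len(points)")
    dim = len(points[0])
    centroid = [sum(p[k] for p in points) / n for k in range(dim)]
    # Assumed seed: the point farthest from the centroid of the sample.
    first = max(range(n), key=lambda i: distance(points[i], centroid))
    selected = [first]
    # Distance of every point to its nearest already-selected point.
    nearest = [distance(p, points[first]) for p in points]
    while len(selected) < n_select:
        nxt = max(range(n), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:
            break  # only duplicates of selected points are left
        selected.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], distance(p, points[nxt]))
    return selected
```

In a sample dominated by a dense cluster of mild sea states, such a selection keeps isolated severe events early and thins out the cluster, which is why the reduced input is expected to give more weight to extremes.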

Two-step SOM (TSOM)
A TSOM has then been applied to provide a more complete description of the wave climate at Acqua Alta. To this end, the SOM algorithm has been run a first time on the original data set, without reductions (first step). Then, outputs have been post-processed: a threshold $H_s^*$ has been fixed, and the cells having $H_s > H_s^*$ have been considered to constitute a new input data set that is composed of the sea states represented by the cells exceeding the threshold. Hence, a second SOM has been run on the new data set (second step). Using the same SOM setup as in the first step, we have obtained a two-sided map (Fig. 10): the first map (left panel) provides a good representation of the low/moderate wave climate but fails in the description of the most severe sea states, which are described in the second map (right panel), focusing on the climate over $H_s^*$. Three thresholds have been tested that correspond to the 95th, 97th and 99th percentile of $H_s$: 1.80, 2.12 and 2.68 m, respectively. In the following, we have focused on the results with the 97th percentile threshold, since they have turned out to be more representative of the extreme wave climate than the others.
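The post-processing between the two steps can be sketched as a simple selection on the first-step results. The sketch below assumes that each event carries the index of its first-step BMU and that the significant wave height of every first-step cell is available; the function and argument names are hypothetical.

```python
def second_step_input(events, bmu_index, unit_hs, hs_threshold):
    """Return the events whose first-step BMU has H_s above the threshold.

    events        list of (H_s, T_m, theta_m) triplets
    bmu_index     index of the first-step best-matching unit of each event
    unit_hs       H_s value of every first-step SOM cell
    hs_threshold  threshold H_s* (m), e.g. a percentile of the observed H_s
    """
    extreme_units = {j for j, h in enumerate(unit_hs) if h > hs_threshold}
    return [ev for ev, j in zip(events, bmu_index) if j in extreme_units]
```

Note that the selection acts on the cell value and not on the individual event, so an event slightly below the threshold can still enter the second-step input if its BMU lies above it.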
Figure 10 depicts TSOM results with $H_s^* = 2.12$ m (97th percentile). The first map, on the left, is the map shown in Fig. 4, representing the whole wave climate at Acqua Alta. On that map, the six cells with $H_s > 2.12$ m have been encompassed by a black line. Without such cells, the map on the left represents the low/moderate sea states, i.e., the 97 % of the whole original data set constituted by events with $H_s$ below or equal to the 2.12 m threshold. The remaining 3 % of events, represented by the encompassed cells, are the most severe events at Acqua Alta. The first-step SOM associates to such events $2.12 \le H_s \le 2.75$ m, $5.0 \le T_m \le 6.5$ s and $249 \le \theta_m \le 299$° N. Hence, according to SOMs, the most severe sea states pertain to a rather narrow directional sector (50°), hardly allowing one to discriminate between Bora and Sirocco conditions. A more detailed representation of such extremes is provided by the second map in Fig. 10, on the right, where extreme Bora and Sirocco events are more widely described by cells. Indeed, a sort of diagonal (from the top-right corner to the bottom-left corner of the map) divides the cells. Bora events are clustered on the left of this diagonal (top-left part of the map), while Sirocco ones are on the right of it (bottom-right part of the map). On the diagonal, cells represent sea states that travel towards the west. This configuration somehow resembles the one observed in the left map, except for the land sea states in the top-left corner. The most severe sea states are clustered in the top-right corner of the map and also, though to a smaller extent, in the bottom-left part of it. The resulting ranges of $H_s$, $T_m$ and $\theta_m$ are $1.94 \le H_s \le 4.26$ m, $4.4 \le T_m \le 8.3$ s and $224 \le \theta_m \le 316$° N, respectively.
The widened ranges of wave parameters provided by a TSOM allow for a more complete description of the sea states at Acqua Alta, including the most severe, as shown in Fig. 11. There, for the sequence of events presented in previous sections, the reconstructed TSOM time series is compared to the original one. Results with 95th and 99th percentile TSOMs are also plotted, and it clearly appears that the differences among the three tests (i.e., TSOMs with the $H_s$ threshold on the 95th, 97th and 99th percentiles) are very small, in particular concerning $\theta_m$. Nevertheless, the 95th percentile TSOM yields a smaller estimate of the highest $H_s$ peak with respect to the others, and the 99th percentile TSOM deviates more than the others from the original $T_m$.
Such differences are also found in the marginal empirical PDFs of the wave parameters, shown in Fig. 12. Indeed, $p(H_s)$ and $p(T_m)$ locally differ among the three thresholds and also from the original PDF, in particular at the largest values of $H_s$ and $T_m$. As expected, the higher the threshold, the wider the $H_s$ range, which extends to higher values. Hence, the 99th percentile TSOM provides the most complete representation of the wave climate, at least concerning $H_s$. In fact, the widest $T_m$ range is obtained with the 97th percentile TSOM and the narrowest with the 99th percentile TSOM. Instead, $p(\theta_m)$ is equally represented by the three thresholds and is in excellent agreement with the original PDF, though the $\theta_m$ range is limited with respect to the complete circle. In addition, local departures from the original PDFs are still observed, especially for $H_s$ and $T_m$. The frequency of occurrence of calms is 81 %, while that of Bora storms is 11 % and that of Sirocco storms is 8 %. Hence, except for a minor change in the frequency of calms and Bora events, the overall statistics resemble those observed at Acqua Alta.

Peak-over-threshold SOM (POT-SOM)
As an additional strategy to provide a more complete representation of the wave climate through SOMs, we tested a third approach. A SOM was applied initially on the whole data set, and then on the peaks of the storms defined by means of the peak-over-threshold technique. Storms were identified according to the definition of Boccotti (2000): a storm is a sequence of H s that remains for at least 12 h over a given threshold H * s corresponding to 1.5 times the mean H s. We computed the mean H s (see Table 1) and then, with H * s = 0.93 m, we identified 710 storms. The peaks of the storms constitute a new data set that has been analyzed by means of a SOM. In the end, we obtained a double-sided map that represents at the same time the whole wave climate (on the left) and the "stormy" part of it (on the right).
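The storm definition above can be illustrated with a minimal sketch. It is not the authors' code: the function name `find_storm_peaks`, the 3 h sampling interval and the convention that a run of n consecutive samples lasts n times the sampling interval are all assumptions made for illustration.

```python
from statistics import mean


def find_storm_peaks(hs, dt_hours=3.0, factor=1.5, min_duration_hours=12.0):
    """Return (threshold, peaks); peaks holds one (index, Hs) pair per storm.

    hs is a list of significant wave heights sampled every dt_hours
    (the 3 h default is an assumption, not a value taken from the text).
    """
    threshold = factor * mean(hs)  # H*_s = 1.5 times the mean H_s
    # Assumption: a run of n consecutive samples is taken to last n * dt_hours.
    min_samples = max(1, round(min_duration_hours / dt_hours))
    peaks = []
    start = None  # index where the current above-threshold run began
    # The trailing sentinel closes a run that reaches the end of the record.
    for i, h in enumerate(list(hs) + [float("-inf")]):
        if h > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_samples:
                # Keep the sample with the largest H_s within the storm.
                k = max(range(start, i), key=lambda j: hs[j])
                peaks.append((k, hs[k]))
            start = None
    return threshold, peaks
```

The list of peaks returned by such a routine would then play the role of the new data set passed to the second SOM.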
The POT-SOM output map is shown in Fig. 13. As expected, stormy events are Bora and Sirocco events: the former are clustered in the upper and middle part of the map, the latter in the lower part of it. The most severe storms, concentrated on the right side of the map, are both Bora and Sirocco events. The triplet with the highest H s is {4.27 m, 6.32 s, 265 °N}, and the maximum H s value is very close to the 99th percentile of H s of the new data set, i.e., 4.28 m. Hence, 99 % of the stormy events are included within the represented range, resembling what was observed for the original data set analyzed with a single-step SOM.

Discussion
A summary of the performances of the different SOM strategies is given in Table 3. There, the single-step SOM, the MDA-SOM with 80 % reduction and the TSOM with H s threshold at the 97th percentile are compared in their capability of representing the wave climate at Acqua Alta by means of the cells. The POT-SOM is not directly comparable to the other strategies, since the data set used for the second map is composed of the storm peaks only. As done in the previous sections, the performances are assessed by comparing the reconstructed time series from each strategy with the original ones, and the resulting marginal PDFs with the PDFs of the original data. However, here the performances are quantified by statistical parameters (see caption of Table 3 for nomenclature). Generally, the reconstructed time series are in agreement with the original ones, as shown by the high r av (over 0.98) and r SD (over 0.89), as well as the high CC (over 0.95) and low RMSE (below 0.19 m for H s, 0.37 s for T m and 23° for θ m). Nevertheless, the highest ratios and correlation coefficients and the lowest RMSE pertain to TSOMs. Similar conclusions can be drawn for the PDFs, which are reproduced with very high CC (over 0.95) and low RMSE PDF (below 0.04) by all the approaches, but to a greater extent by TSOMs. As expected, the widest variability among the different strategies concerns the range of H s. With the only exception of θ m, whose widest range is provided by MDA-SOM, TSOM turned out to provide the most complete representation among the tested strategies.
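For reference, the time series statistics named above (r av, r SD, CC and RMSE) can be computed as in the following sketch. The function name `summary_stats` is hypothetical, and taking the ratios as reconstructed over original is an assumption, since the orientation of the ratios is not stated here. The sketch also treats θ m as an ordinary variable and ignores the wrap-around at 360°.

```python
import math


def summary_stats(original, reconstructed):
    """Compare a reconstructed series with the original one.

    Returns r_av (ratio of averages), r_sd (ratio of standard deviations),
    cc (cross-correlation coefficient) and rmse (root mean square error).
    Assumption: ratios are reconstructed / original.
    """
    n = len(original)
    mean_o = sum(original) / n
    mean_r = sum(reconstructed) / n
    # Population standard deviations (divide by n).
    sd_o = math.sqrt(sum((x - mean_o) ** 2 for x in original) / n)
    sd_r = math.sqrt(sum((y - mean_r) ** 2 for y in reconstructed) / n)
    cov = sum((x - mean_o) * (y - mean_r)
              for x, y in zip(original, reconstructed)) / n
    rmse = math.sqrt(sum((x - y) ** 2
                         for x, y in zip(original, reconstructed)) / n)
    return {
        "r_av": mean_r / mean_o,
        "r_sd": sd_r / sd_o,
        "cc": cov / (sd_o * sd_r),
        "rmse": rmse,
    }
```

A reconstruction that flattens the peaks shows up mainly as r_sd below 1, which is why this ratio is informative when extremes are of interest.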
We verified that a larger single-step SOM (e.g., 25 × 25, not shown here) can produce a wider range of extremes than the one used in the study (i.e., 13 × 13): the units' maximum H s is 3.56 m instead of 2.75 m. In the same map configuration (i.e., 25 × 25), MDA preselection can further widen this range towards extremes: the units' maximum H s is 3.63 m with an 80 % reduction of the sample (using MDA) and 3.66 m with a 40 % reduction. This has the effect of reducing the absolute error on the 99th percentile of H s (1 % with 80 % reduction and 11 % with 40 % reduction). However, the most extreme sea states are still far from being properly represented (we recall that the most extreme sea state observed had H s = 5.23 m). In addition and most importantly, although a larger number of elements in the map can improve the SOM performance shown in the paper, it will certainly worsen the readability of the map and the possibility of extracting quantitative information from it. Indeed, considering, for instance, the 25 × 25 map, sea states at a site would be represented by 625 typical sea states: a huge number that is hardly manageable for a practical classification of the wave conditions.

Application of TSOM
An application of the TSOM is proposed to show that a more detailed representation of the extreme wave climate can enhance the quantification of the longshore component of the shallow-water wave energy flux P (per unit shore length), expressed as (Komar and Inman, 1970)

$$P = E\,c_g \sin\alpha \cos\alpha,$$

where $E = \rho g H_s^2/16$ is the wave energy per unit crest length (ρ being the water density), c g is the group celerity and α is the mean wave propagation direction measured counterclockwise from the normal to the shoreline. P is a driving factor for the potential longshore transport, and its dependence upon the wave energy E (which in turn depends on the square of H s) suggests that an accurate representation of H s is crucial. As the shoreline in front of Acqua Alta tower is almost parallel to the 20 °N direction (i.e., orthogonal to the 290 °N direction), the longshore transport is directed towards southwest when P is positive, and towards northeast when P is negative. Given the wave energy flux Ec g, P is maximized when α = ±45°, which corresponds to θ m = 245 °N and θ m = 335 °N, respectively.
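A minimal numerical sketch of the longshore flux component follows. It assumes the Komar and Inman form P = E c g sin α cos α, which is consistent with the maximum at α = ±45° stated above, and it assumes α = 290° − θ m, a relation inferred from the two θ m values quoted in the text rather than given explicitly. The function name `longshore_flux` and the density value are illustrative choices.

```python
import math

RHO = 1025.0              # sea-water density in kg m^-3 (assumed value)
G = 9.81                  # gravitational acceleration in m s^-2
SHORE_NORMAL_DEG = 290.0  # direction normal to the shoreline, from the text


def longshore_flux(hs, cg, theta_m_deg):
    """Longshore component of the wave energy flux in W per metre of shore.

    hs: significant wave height (m); cg: group celerity (m/s);
    theta_m_deg: mean wave direction in degrees N.
    """
    energy = RHO * G * hs ** 2 / 16.0
    # Assumption: alpha is counterclockwise from the shore normal,
    # so it decreases as the compass direction theta_m increases.
    alpha = math.radians(SHORE_NORMAL_DEG - theta_m_deg)
    return energy * cg * math.sin(alpha) * math.cos(alpha)
```

With this convention a direction of 245 °N gives the largest positive (south-westward) value and 335 °N the largest negative (north-eastward) value for a given energy flux.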
In order to obtain the shallow-water values of wave parameters, following Reguero et al. (2013), we propagated the Acqua Alta sea states resulting from the TSOM (see maps in Fig. 10) from 17 to 5 m depth (a typical closure depth in the region), approximately accounting for the wave transformations, i.e., shoaling, refraction and wave breaking. In doing so, we assumed straight and parallel bottom contour lines, neglected wave energy dissipation prior to wave breaking, and allowed H s to reach at most 80 % of the water depth (depth-induced wave breaking criterion). Roughly, shoaling mostly affects the Sirocco sea states, which are typically associated with longer wavelengths than Bora sea states. In shallow water, refraction tends to reduce the difference between Bora and Sirocco directions with respect to Acqua Alta, as the direction normal to the shoreline, which waves tend to align to, is very close to the boundary (i.e., 270 °N) that we assumed in order to discriminate between the two conditions. Sea states forced by land winds (20 °N < θ m < 200 °N) were excluded from the analysis.
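The transformation described above can be sketched with linear wave theory under the same stated assumptions (straight and parallel contours, no dissipation before breaking, H s capped at 80 % of the depth). This is an illustrative sketch, not the procedure of Reguero et al. (2013): the function names are hypothetical, T m is used as the wave period, and α is the angle from the shore normal, limited to |α| < 90°.

```python
import math

G = 9.81  # gravitational acceleration in m s^-2


def wavenumber(period, depth, iterations=200):
    """Solve the linear dispersion relation by fixed-point iteration."""
    omega = 2.0 * math.pi / period
    k = omega ** 2 / G  # deep-water first guess
    for _ in range(iterations):
        k = omega ** 2 / (G * math.tanh(k * depth))
    return k


def celerities(period, depth):
    """Return phase celerity c and group celerity cg at a given depth."""
    k = wavenumber(period, depth)
    c = (2.0 * math.pi / period) / k
    n = 0.5 * (1.0 + 2.0 * k * depth / math.sinh(2.0 * k * depth))
    return c, n * c


def propagate(hs, period, alpha_deg, depth_in=17.0, depth_out=5.0,
              breaker_index=0.8):
    """Shoal, refract and cap a sea state; returns (hs, alpha_deg, cg)."""
    c1, cg1 = celerities(period, depth_in)
    c2, cg2 = celerities(period, depth_out)
    a1 = math.radians(alpha_deg)
    a2 = math.asin(math.sin(a1) * c2 / c1)       # Snell's law
    ks = math.sqrt(cg1 / cg2)                    # shoaling coefficient
    kr = math.sqrt(math.cos(a1) / math.cos(a2))  # refraction coefficient
    hs_out = min(hs * ks * kr, breaker_index * depth_out)
    return hs_out, math.degrees(a2), cg2
```

Because the celerity decreases towards the shore, Snell's law always reduces |α|, which is the alignment with the shore normal mentioned in the text.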
The longshore component of the wave energy flux P at 5 m depth is shown in Fig. 14. It is worth noting that the left map represents the longshore component of the wave energy flux resulting from the single-step SOM technique (i.e., the left panel of Fig. 10). Here, P ranges between −2 and 8 kW m−1, and the highest values are mainly due to Bora events, which are responsible for potential longshore transport towards southwest (even if a few Sirocco events with θ m close to 270 °N have the same effect). According to the left map, the transport towards northeast is due to Sirocco events that, however, cause less intense potential transport. The highest P values are associated with the highest H s events, clustered in the cells at the top of the left map of Fig. 10. The right map of Fig. 14 describes the longshore flux component due to the Acqua Alta sea states represented by the SOM cells exceeding the 97th percentile H s threshold (i.e., the six cells bounded by the black line in the left map). The range of P variation widens considerably when the extreme sea states are considered, with values ranging from −20 to 20 kW m−1. As observed in the right map of Fig. 10, the sea states exceeding the 97th percentile threshold on H s are Bora and Sirocco events. The Bora events in the top-left part of the map (except for two cells in the bottom-right corner) contribute to positive, i.e., south-westward, transport, while Sirocco events in the bottom-right part contribute to negative, i.e., north-eastward, transport. The most intense transport is associated with the highest H s cells at the bottom-left, bottom-right and top-right corners of the right map of Fig. 10. The major difference with respect to the single-step SOM estimate concerns the Sirocco sea states, associated with negative P, whose most intense value extends from −2 to −20 kW m−1.
The mean longshore wave energy flux in shallow water, P̄, i.e., the average of P weighted by the frequencies of occurrence F over the 30 years of observations, was obtained by taking the absolute value of P from the maps of Fig. 14 and is 0.57 kW m−1 (Table 4). In order to support this estimate, we compared the 1.71 kW m−1 estimate of the mean wave energy flux Ec g at Acqua Alta against the 1.5 kW m−1 value obtained at the same site over 1996-2011 by Barbariol et al. (2013). The contributions to P̄ from Bora (P +) and Sirocco (P −) are 0.45 and −0.12 kW m−1, respectively, pointing out the predominant effect of Bora on the longshore transport over the western side of the Gulf of Venice. For comparison, P̄ was also computed using single-step SOM results (see Table 4): in this case, P̄ is 0.52 kW m−1, P + is 0.41 kW m−1 and P − is −0.11 kW m−1. Hence, with respect to the TSOM, the estimate of the mean longshore energy flux is 9.0 % lower for P̄, 7.5 % lower for P + and 16.5 % lower for P −.
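The frequency-weighted averages can be sketched as follows. The function name `mean_fluxes` and the input layout (one P value and one frequency F per map cell) are assumptions. Normalizing all three means by the same total frequency is also an assumption, although it agrees with the reported values, where the mean of the absolute flux equals the positive contribution minus the negative one (0.45 + 0.12 = 0.57 kW m−1).

```python
def mean_fluxes(p_cells, freq_cells):
    """Frequency-weighted means of the longshore flux over the map cells.

    p_cells: P value of each cell; freq_cells: frequency of occurrence F
    of each cell. Returns (mean of |P|, positive part, negative part),
    all normalized by the total frequency (assumed convention).
    """
    total = sum(freq_cells)
    pairs = list(zip(p_cells, freq_cells))
    p_abs = sum(abs(p) * f for p, f in pairs) / total
    p_pos = sum(p * f for p, f in pairs if p > 0) / total
    p_neg = sum(p * f for p, f in pairs if p < 0) / total
    return p_abs, p_pos, p_neg
```

For a TSOM, the cells of the second map would replace the above-threshold cells of the first map before the averages are taken.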

Conclusions
In this paper, we have tested different strategies aimed at improving the characterization of multivariate wave climate using SOM. Indeed, we have verified that besides a satisfactory description of the low/moderate wave climate (in agreement with usual uni- and bivariate histograms), the single-step SOM approach misses the most severe sea states, which are hidden in SOM cells with H s even considerably smaller than the extreme ones.
For our purpose, we used the 1979-2008 trivariate wave climate {H s, T m, θ m} recorded at Acqua Alta tower, and we showed that, for instance, the single-step SOM assigned most of the sea states with H s > 2.75 m to the {2.75 m, 5.9 s, 270 °N} class. Hence, the most interesting part of the wave climate was condensed within a few cells of the map, also hindering the distinction between Bora and Sirocco events, i.e., the prevailing meteorological conditions in the northern Adriatic Sea. To increase the weight of the most severe and rare events in SOM classification, we tested a strategy based on the pre-processing of the input data set (i.e., MDA-SOM) and a strategy based on the post-processing of the SOM outputs (i.e., TSOM). Results presented in the study showed that the post-processing technique is more effective than the pre-processing one. Indeed, a TSOM allowed a more accurate and complete representation of the sea states than the one furnished by MDA-SOM, because it provided a wider range of the wave parameters (particularly H s) and more reliable a posteriori reconstructions of time series and empirical marginal PDFs. Nevertheless, some deviations from the original PDFs were observed and the range of θ m was not complete, such that sea states traveling towards the north were not properly described. This requires further studies to improve SOM applications to wave analysis, which are rather promising thanks to the well-recognized visualization capabilities of SOMs. In this context, we proposed a double-sided map representation, which provides (on the left) a description of the whole wave climate that is particularly reliable for the low/moderate events and is completed (on the right) by the description of the extreme wave climate. This novel representation was also employed to provide a SOM classification of the storm peaks, based on the peak-over-threshold approach, on the right (POT-SOM).
Finally, a TSOM was applied for the assessment of the potential longshore wave energy flux to show how practical oceanographic and engineering applications can benefit from this novel SOM strategy. Indeed, the mean flux in front of the Venice coast was found to be 9 % higher if evaluated after a TSOM instead of a SOM.

Figure 1. Acqua Alta (AA) oceanographic tower location in the northern Adriatic Sea, Italy (left panel). The tower is depicted in the right panel.

Figure 2. Observed bivariate wave climate at Acqua Alta: histograms representing the joint PDF of H s and θ m (top panel) and the marginal PDF of θ m (bottom panel). Resolutions are 0.5 m for H s and 22.5° for θ m.

Figure 3. Observed bivariate wave climate at Acqua Alta: histograms representing the joint PDFs of H s and T m for Bora (top-left panel) and Sirocco (top-right panel) sea states and the corresponding marginal PDFs of H s (bottom-left panel; blue for Bora, red for Sirocco) and T m (bottom-right panel; blue for Bora, red for Sirocco). Black solid lines in the top panels denote the average wave steepness $2\pi H_s/(g T_m^2)$ (3 % for Bora, 2 % for Sirocco, g being gravitational acceleration); red solid lines denote the wave breaking limit (7 %). Resolutions are 0.2 m for H s and 0.2 s for T m.

Figure 5. Single-step SOM: BMU cells (top panel) and comparison between original (blue solid lines) and reconstructed (red dashed lines) time series of H s (central-top panel), T m (central-bottom panel) and θ m (bottom panel), for a chosen sequence of events.

Figure 9. MDA-SOM: comparison between original (black solid lines) and resulting histograms representing the PDFs of H s (top panel), T m (central panel) and θ m (bottom panel), for the whole period of observations. Data set reduction: 80 % (blue dashed-squares line), 60 % (red dashed-circles line) and 10 % (green dashed line).

Figure 10. TSOM output map with threshold H * s = 2.12 m (97th percentile of H s). H s: inner hexagons' color, T m: vectors' length, θ m: vectors' direction, F: outer hexagons' color. Wave climate after a single-step SOM (left panel) and TSOM extreme wave climate (i.e., over the threshold, right panel and cells within black solid line in the left panel). For the right panel map, mean quantization error: 0.04; topographic error: 6 %.

Figure 12. TSOM: comparison of original (black solid line) and resulting histograms representing the PDFs of H s (top panel), T m (central panel) and θ m (bottom panel), for the whole data set. Thresholds: 95th (blue dashed-squares line), 97th (red dashed-circles line) and 99th (green dashed line) percentile of H s.

Figure 14. Application of TSOM: assessment of the longshore flux of wave energy P in shallow water, after single-step SOM (left panel) and resulting from the TSOM extreme wave climate (right panel and cells within black solid line in the left panel). Mean wave directions at Acqua Alta tower (blue arrows) indicate contributions of different meteorological conditions: positive mainly due to Bora (180 ≤ θ m ≤ 270 °N), negative to Sirocco (270 < θ m ≤ 360 °N). Land wind events (white cells) have been excluded, and the direction of the shoreline (270 °N) is shown as gray dashed lines.

Table 4. Application of TSOM: assessment of the longshore flux of wave energy in shallow water. P̄ is the mean over the 1979-2008 period accounting for the absolute value of P, P + is the mean of the positive P, P − is the mean of the negative P, and TSOM−SOM is the relative difference of values computed after TSOM with respect to values computed after SOM.

        SOM (kW m−1)    TSOM (kW m−1)
P̄       0.52            0.57
P +     0.41            0.45
P −     −0.11           −0.12

www.ocean-sci.net/12/403/2016/ Ocean Sci., 12, 403-415, 2016

SOM merges all the information about the trivariate wave climate at Acqua Alta (H s: inner hexagons' color, T m: vectors' length, θ m: vectors' direction) including the frequency of occurrence (F: outer hexagons' color) of each {H s, T m, θ m} triplet. Hence, one can gain an immediate overview of the wave climate features and of the empirical joint PDF thanks to the visual capabilities of the SOM's output. Gradual and continuous change in wave parameters over the cells points out that the topological preservation is quite good, as confirmed by the 22 % topographic error.

Table 2. MDA-SOM: absolute errors of average and 99th percentile of H s after MDA-SOM relative to the original data set (%).

provided by 10 % MDA-SOM, though the maximum is nevertheless missed and, in its proximity, the original data are overestimated. Indeed, 60 % and 10 % MDA-SOMs locally overestimate H s in the low/moderate sea states.

Table 3. Performance summary of different SOM approaches, through the comparisons of reconstructed to original time series, and resulting to original PDFs. r av: ratio of time series averages, r SD: ratio of time series standard deviations, CC: time series cross-correlation coefficient, RMSE: time series root mean square error, CC PDF: PDFs cross-correlation coefficient, RMSE PDF: PDFs root mean square error.