Native and non-native aquatic plants of South America: comparing and integrating GBIF records with literature data

The Global Biodiversity Information Facility (GBIF) is currently one of the largest and most widely used biodiversity databases. Nevertheless, it still has some limitations, e.g. in terms of plant species status (native vs. non-native) and the geographic resolution of records. At the same time, it is well known that alien plant invasions in inland freshwaters can alter community structure, ecosystem functions and services, with significant negative impacts on biodiversity and human activities. We assessed whether the GBIF database holds geospatially homogeneous information on native and non-native aquatic plant species in South America, and whether literature resources not yet digitized (floras, checklists and other papers) could provide additional information. We selected a set of 40 native and 40 non-native aquatic species; all 40 alien species were evaluated with the USAqWRA scheme (US Aquatic Weed Risk Assessment), most of them in a previous study. Records with unreliable identification, duplicates of the same collection and poorly georeferenced records were removed from the dataset. New records were compiled manually through classical literature research. All georeferenced records (GBIF + literature) were used for mapping and comparative analysis. We conclude that the two datasets provide significantly different information and that combining them yields information that would not exist in a single data source. Nevertheless, the quality of the primary information, from both the literature and GBIF, should be carefully evaluated before the data are used for further analyses.


Introduction
The Global Biodiversity Information Facility (GBIF, http://www.gbif.org/) is one of the largest and most widely used biodiversity databases (Jetz et al. 2012; Beck et al. 2014), and it offers free and universal access to primary biodiversity data (Roberts and Moritz 2011). This kind of information, together with tools to analyze it (e.g. Geographic Information Systems software and statistical analysis packages), has facilitated large-scale analyses and interpretation of biodiversity and distribution data (García-Roselló et al. 2015; Maldonado et al. 2015) for both native and non-native plant species.
Most data on plant species distribution are stored in different sources, including checklists, herbaria, floras and field observations, and are based on point occurrence records, representing what is generally referred to as primary distribution data, i.e., the occurrence of a particular plant species at a particular location at a particular point in time (Soberón and Peterson 2004). Millions of these records from herbaria and other sources have been mobilised via international data-sharing networks and databases (Edwards et al. 2000), although the generalised use of these occurrence data may be constrained by their coverage (or thematic resolution) and level of accuracy.
Coverage has several components and subcomponents, but three main aspects are most commonly considered. The first is "taxonomic coverage", i.e., how many of the existing species and valid lower taxa are well documented (Funk et al. 1999; Hortal et al. 2007; Brummitt et al. 2015) and how frequently the taxonomic and nomenclatural resolution and precision of the database are updated and cross-checked. The second is "geographical coverage", i.e., how precisely and completely species locations and the resulting ranges are documented within records (Feeley and Silman 2011). The third is "temporal coverage", i.e., the time resolution of the database, based, e.g., on more or less continuous recording of species through time (Brummitt et al. 2015) and on verification of the persistence over time of a given species at a historical locality (Troia et al. 2016). However, gaps and biases usually exist in the available biodiversity information (Boakes et al. 2010; Feeley and Silman 2011; Sousa-Baena et al. 2014; García-Roselló et al. 2015), and data limitations may result from inadequate financial and institutional support (Vollmar et al. 2010; Amano and Sutherland 2013) or from uneven sampling effort that focuses on regions with particular appeal, such as areas of endemism, high species richness or protected areas (Petřík et al. 2010; Yang et al. 2014). Petřík et al. (2010) showed that the bias in grid mapping of flora seems to depend on spatial scale. In addition, the number of botanists involved and the duration of the study are associated with some level of bias in estimates of species richness.
GBIF records are widely used in ecology, evolution and conservation (Meyer et al. 2015) and have served many different purposes, e.g. identifying the native ranges of invasive alien species or species' climatic and environmental requirements (Peterson 2003; Suarez and Tsutsui 2004; Chapman 2005; García-Roselló et al. 2015).
In addition to these well-known problems, specific limitations of the GBIF database become evident when it is used as a tool for analysing plant invasions, e.g. concerning plant species status (alien vs. native; casual vs. naturalized vs. invasive; archaeophyte vs. neophyte), because the invasion status and residence time cannot be inferred for most records. GBIF created a dedicated working group to enhance the system for use in the field of biological invasions (McGeoch et al. 2016). Indeed, it has been remarked that GBIF information has in some cases been used to rapidly assess patterns of diversity and allodiversity without much attention to the quality and reliability of the data (García-Roselló et al. 2015; Maldonado et al. 2015).
However, after expert review, GBIF distribution data for alien plants may help to identify not only highly invaded areas but also the overall distribution of these species, to predict locations susceptible to further establishment (Duursma et al. 2013) and to identify the areas at greatest risk from future invasions. Knowledge of the spatial distribution of invasive species and invaded habitats is one of the pillars of an effective strategy for their management and control (Thuiller et al. 2005). In this regard, identifying invasive species risk hotspots is a useful tool for prioritizing the management of plant invasions at large scale (Liang et al. 2014; Adhikari et al. 2015).
Among invaded ecosystems, freshwater ecosystems and habitats, especially lakes and streams, are particularly vulnerable (Strayer 2010; Simberloff 2013; Boltovskoy and Correa 2015; Brundu 2015) and prone to dramatic biodiversity loss (Ricciardi and Rasmussen 1999) because of their high concentration of species per surface area (Thomaz et al. 2015). Nutrients in suspension and in sediments are also important determinants of aquatic plant invasion (Engelhardt 2011). While on some continents intracontinental propagule pressure can be assumed to have been larger because of shorter distances, South America has species with restricted ranges, which are consequently less likely to have been dispersed outside their native ranges (van Kleunen et al. 2015). In addition, freshwater ecosystems are often difficult to survey and monitor, so information about the distribution of invasive alien aquatic plants is generally scarce in many parts of the world, as is the case in South America (Lozano and Brundu 2016). In particular, Chile, Brazil (Brazil's Atlantic Forest), Ecuador and the Tropical Andes, where biodiversity hotspots are mainly represented, offer a unique opportunity to study biological invasions because they hold a unique native flora with high levels of endemism, extraordinary richness and diverse climatic gradients (Myers et al. 2000; Pauchard et al. 2004). In Brazil, for example, the Guiana Shield constitutes a geological, hydrographical and biogeographic region in the Amazonian Basin that is considered a very important biodiversity hotspot (Delnatte and Meyer 2012). However, the susceptibility of ecosystems to invasion in these regions remains poorly understood and investigated (Thomaz et al. 2015).
The main goal of this study was to assess the state of data availability for aquatic plant species in South America. Specifically, the present research aimed to: (1) evaluate the increase in reliability obtained by merging information extracted "manually" from the literature for a set of native and non-native aquatic species of South America with the information held in GBIF for the same species, and (2) evaluate the relationship between the density of non-native aquatic species and large-scale expected predictors, such as the Human Influence Index (HII) and the distribution of protected areas, in 16 regions of South America.

Species selection and study area
For the purposes of the present research, we selected a set of 80 species composed of 40 native and 40 non-native aquatic plant species thriving in South America (Supplementary material Table S1). The set included both helophytes (growing in anaerobic, saturated soils) and hydrophytes (free-floating, rooted floating and submerged freshwater vascular plants). The 80 species were selected on the basis of the availability of reliable information in the literature on taxonomic, geographical and biological traits, which is a fundamental prerequisite for any reliable modelling and risk assessment.

Collection of distribution records
We first created two distribution datasets for the 80 selected species, using two different methodologies. The first dataset (hereafter the literature dataset) included the geographical coordinates (Lat/Long WGS84) of species records collected through classical literature research (national or regional floras: http://www.floraargentina.edu.ar, http://floradobrasil.jbrj.gov.br, http://www.lib.udec.cl, http://www2.darwin.edu.ar; flora checklists and papers) for the 16 regions of South America. The second dataset (hereafter the GBIF dataset) was created using the ModestR software (freely available at http://www.ipez.es/ModestR, accessed in 2016) (Pelayo-Villamil et al. 2012; García-Roselló et al. 2013). We retrieved all available distribution data for the 80 selected aquatic species in South America from the GBIF portal (http://www.gbif.org, accessed in 2016). Acknowledgments for all the sources of the records downloaded from GBIF are given in the supplementary materials (Table S4). Finally, we merged the two datasets into a new "integrated dataset" that includes additional information (e.g. invasion status and, for non-native species, the risk level according to the USAqWRA scheme).
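The merging step can be pictured as pooling both record sets and annotating every record with its expert-assigned status and risk score. The sketch below is only an illustration under stated assumptions: the `Record` layout, the `integrate` helper, the placeholder species/region names and the score value are all hypothetical, not the authors' data structures or data.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    species: str
    region: str
    lat: float
    lon: float
    source: str                         # "GBIF" or "literature"
    status: Optional[str] = None        # NNV, INV, IND or ABS (expert opinion)
    risk_score: Optional[float] = None  # USAqWRA score, non-native species only


def integrate(
    gbif: Iterable[Record],
    literature: Iterable[Record],
    status: Dict[Tuple[str, str], str],
    risk: Dict[str, float],
) -> List[Record]:
    """Pool both datasets and annotate each record with status and risk score.

    `status` is keyed by (species, region); `risk` by species. Records with no
    matching entry keep None in the corresponding field.
    """
    merged: List[Record] = []
    for rec in list(gbif) + list(literature):
        merged.append(
            replace(
                rec,
                status=status.get((rec.species, rec.region)),
                risk_score=risk.get(rec.species),
            )
        )
    return merged


# Hypothetical example: placeholder names and an invented score, for illustration only.
gbif = [Record("species_A", "region_1", -25.3, -57.6, "GBIF")]
lit = [Record("species_A", "region_2", -33.4, -70.6, "literature")]
merged = integrate(gbif, lit, {("species_A", "region_1"): "INV"}, {"species_A": 42.0})
```

Keeping the `source` field on every record means the GBIF and literature contributions can still be separated after merging, which the comparisons in this study require.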

Taxonomical and geographical validation of distribution records
Synonyms used in the literature were handled in accordance with The Plant List portal (http://www.theplantlist.org/) and cross-checked using IPNI (International Plant Name Index, http://www.ipni.org/), to ensure that all records were assigned to an accepted valid name in agreement with the GBIF taxonomic treatment. For species reported in the literature without georeferenced localities but with an accurate description of the collection locality, geographical coordinates were assigned using Google Earth. The GBIF data, in turn, were checked and cleaned using the menu facilities of ModestR (García-Roselló et al. 2014). Records of species with unreliable identification, duplicates of the same collection (discriminating between real duplicates and records of the same specimen sent to different collections), records without georeferenced locations or with latitude/longitude equal to 0°, and records in the sea (i.e., coordinates that did not project onto land) were removed from the dataset. In addition, the software automatically classified records as valid or invalid depending on whether they fell within or outside inland freshwater areas. The software also allowed data for all species to be retrieved at the same time by supplying a file with the species names arranged in a simple taxonomic classification, and it corrected wrong or invalid synonyms. Once the taxonomic data had been entered, the distribution map of each species was stored in the ModestR database.
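The exclusion rules listed above can be sketched as a single filtering pass. This is a minimal illustration, not ModestR's implementation: the `Occurrence` layout, the `record_id` field used to spot verbatim duplicates and the `is_on_land` land-mask predicate are hypothetical stand-ins, and separating real duplicates from specimens sent to different collections would need collection metadata that this toy key does not capture.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class Occurrence(NamedTuple):
    species: str
    lat: Optional[float]
    lon: Optional[float]
    record_id: str  # hypothetical identifier used to spot repeated records


def clean_occurrences(
    records: Iterable[Occurrence],
    is_on_land: Callable[[float, float], bool],
    unreliable: frozenset = frozenset(),
) -> List[Occurrence]:
    """Apply the exclusion rules described in the text, in order."""
    kept: List[Occurrence] = []
    seen = set()
    for rec in records:
        if rec.species in unreliable:           # non-reliable identification
            continue
        if rec.lat is None or rec.lon is None:  # no georeferenced location
            continue
        if rec.lat == 0 or rec.lon == 0:        # latitude/longitude equal to 0 deg
            continue
        if not is_on_land(rec.lat, rec.lon):    # coordinates fall in the sea
            continue
        key = (rec.species, rec.lat, rec.lon, rec.record_id)
        if key in seen:                         # verbatim duplicate
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def toy_land(lat: float, lon: float) -> bool:
    # Crude stand-in for a real land mask: anything east of 82 deg W counts as land.
    return lon > -82.0


records = [
    Occurrence("species_A", -34.6, -58.4, "r1"),
    Occurrence("species_A", -34.6, -58.4, "r1"),  # duplicate
    Occurrence("species_A", None, -58.4, "r2"),   # not georeferenced
    Occurrence("species_B", 0.0, -60.0, "r3"),    # latitude equal to 0
    Occurrence("species_B", -10.0, -90.0, "r4"),  # at sea under the toy mask
    Occurrence("species_C", -15.8, -47.9, "r5"),  # flagged as unreliable
]
cleaned = clean_occurrences(records, toy_land, frozenset({"species_C"}))
```

The zero-coordinate rule follows the text literally (either coordinate equal to 0); a stricter "null island" rule would require both to be 0.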

Invasive status and distribution maps
Four a priori status categories were defined according to expert opinion, and each species was assigned to one of them for each of the 16 regions defined in the present study, or for part of a region: alien non-invasive (NNV), alien invasive (INV), native (IND) and absent (ABS). We used the scores of the 35 non-native species previously evaluated by Lozano and Brundu (2016) with the US Aquatic Weed Risk Assessment (USAqWRA). We assessed the five additional alien aquatic species with the USAqWRA scheme (originally developed by Gordon et al. 2012), i.e., Agrostis stolonifera L., Aponogeton distachyos L.f., Eleocharis acicularis (L.) Roem. & Schult., Nasturtium officinale R.Br., and Nymphaea micrantha Guill. & Perr. (Supplementary material Tables S1, S5 and S6). The USAqWRA scores and invasion status per species and region enriched the information downloaded from GBIF, which currently provides only limited features related to biological invasions. This information allowed us to correlate the invasive species risk in South America (i.e., the scores derived from the USAqWRA scheme) with large-scale expected predictors such as the Human Influence Index (HII) and the location of protected areas (PAs).
The whole set of cleaned records for alien invasive and alien non-invasive species obtained from the "integrated dataset" was used to map species distribution in each of the 16 regions, or parts of regions, in South America. Distribution maps were created with: 1) GBIF presence records with the addition of records retrieved from the literature, and 2) GBIF presence records with the invasion status according to expert opinion (see Tables S5 and S7). We also produced choropleth maps, based on the number of records, to highlight species density at regional level. Finally, we mapped the allodiversity of the 16 investigated regions.
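Both regional summaries behind these maps reduce to simple counts once each species has a status per region. The sketch below assumes a species-by-region status table stored as a dictionary and records reduced to (region, status) pairs; the helper names and the placeholder species/region names are hypothetical, and only the four status codes come from the text.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Tuple


class Status(Enum):
    NNV = "alien non-invasive"
    INV = "alien invasive"
    IND = "native"
    ABS = "absent"


ALIEN = (Status.NNV, Status.INV)


def allodiversity(status_table: Dict[Tuple[str, str], Status]) -> Dict[str, int]:
    """Number of distinct alien species (NNV or INV) per region.

    Regions without any alien species are simply missing from the result.
    """
    per_region: Dict[str, set] = {}
    for (species, region), status in status_table.items():
        if status in ALIEN:
            per_region.setdefault(region, set()).add(species)
    return {region: len(spp) for region, spp in per_region.items()}


def records_per_region(records: Iterable[Tuple[str, Status]],
                       statuses=ALIEN) -> Counter:
    """Count records per region for the chosen status classes (choropleth input)."""
    return Counter(region for region, status in records if status in statuses)


# Hypothetical example with placeholder names.
status_table = {
    ("species_A", "region_1"): Status.INV,
    ("species_B", "region_1"): Status.NNV,
    ("species_C", "region_1"): Status.IND,
    ("species_A", "region_2"): Status.ABS,
    ("species_B", "region_2"): Status.INV,
}
records = [("region_1", Status.INV), ("region_1", Status.INV),
           ("region_1", Status.IND), ("region_2", Status.NNV)]
```

The distinction matters for the maps: allodiversity counts different species, while the choropleth counts records, so a single heavily collected species can dominate the second measure but not the first.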

GIS & statistical data analysis
We tested the difference between the raw and cleaned records of the 80 species within each of the two datasets (GBIF and literature). We also tested the difference between the native and non-native records of the GBIF and literature datasets by applying a t-test (see Table 1). Additionally, we evaluated whether the information collected from the literature improved the information obtained through GBIF.
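The text does not state which t-test variant was used. As an assumption, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom, which suits comparisons of record counts with very different spreads; the per-species counts in the example are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-species record counts, for illustration only.
gbif_counts = [150, 90, 210, 64, 35]
lit_counts = [7, 3, 12, 2, 5]
t, df = welch_t(gbif_counts, lit_counts)
```

The p value then comes from a t distribution with `df` degrees of freedom; for example, R's `t.test` uses this Welch form by default (`var.equal = FALSE`), and SciPy offers `scipy.stats.ttest_ind(a, b, equal_var=False)`.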
We downloaded the continental-level Human Influence Index (HII) data set (grid format, 1×1 km cell size) produced by the Wildlife Conservation Society and CIESIN, Columbia University (2005). We then extracted the HII value for each pair of coordinates, corresponding to each species record, in the 16 regions of South America. After projecting the set of cleaned records from the GBIF and literature datasets onto the HII layer, we used Pearson's correlation to evaluate the relationship between the HII scores and the USAqWRA scores of the non-native species at each geographical location. A Wilcoxon test was performed to check for differences between the HII scores at points where native species were recorded and at points where non-native species were recorded. We addressed possible bias due to spatial autocorrelation by fitting Generalized Least Squares (GLS) models to the HII data, using the function gls of the "nlme" R package (Dormann et al. 2007; see Table S10). In addition, polygon layers of nature reserves and protected areas (PAs) for South America were obtained from the World Database on Protected Areas (WDPA, http://www.wdpa.org, accessed in December 2016). We assessed the relationship between the proportion of non-native species records inside and outside the protected areas and their USAqWRA scores using a generalized linear model (logistic regression) in R (R Core Team 2015); that is, invasion risk was evaluated according to whether non-native records fell within or outside the PAs and according to their USAqWRA scores. A chi-square test on the contingency table of native/non-native records versus inside/outside PAs was also performed. We addressed spatial autocorrelation in the PAs data with autocovariate regression; because the response is binary, this took the form of an autologistic regression, using the function autocov_dist of the "spdep" R package (Dormann et al. 2007; see Table S10).
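The point-level steps of this analysis can be sketched in plain Python: reading the HII value of the grid cell under each record, Pearson's r with its t statistic, and the Wilcoxon rank-sum statistic for native vs. non-native points. This is a minimal sketch, not the authors' R code. It assumes a north-up grid with a known top-left corner and square cells, and points already projected into the grid's coordinate system; the helper names and toy grid are hypothetical, and the rank-sum z score omits tie and continuity corrections.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def sample_grid(grid: Sequence[Sequence[float]], x_min: float, y_max: float,
                cell: float, points: Sequence[Point]) -> List[Optional[float]]:
    """Value of the grid cell containing each (x, y) point; None if outside."""
    values: List[Optional[float]] = []
    for x, y in points:
        col = math.floor((x - x_min) / cell)
        row = math.floor((y_max - y) / cell)  # rows run north to south
        inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
        values.append(grid[row][col] if inside else None)
    return values


def pearson_with_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int]:
    """Pearson's r, its t statistic and n - 2 degrees of freedom (needs |r| < 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t, n - 2


def rank_sum(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Wilcoxon rank-sum (Mann-Whitney U) statistic for `a` and a plain z score."""
    values = list(a) + list(b)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    n1, n2 = len(a), len(b)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z


hii = [[10, 20], [30, 40]]  # toy 2 x 2 grid, top-left corner at (0, 2), 1-unit cells
print(sample_grid(hii, 0.0, 2.0, 1.0, [(0.5, 1.5), (1.5, 0.5)]))  # [10, 40]
```

In R, the same tests correspond to `cor.test` and `wilcox.test`, which also report p values and apply corrections this sketch leaves out.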

Number of records and distribution in the 16 South American regions
The GBIF database held valid georeferenced distribution records for 79 of the 80 aquatic plant species investigated in the present study. Notably, there were no GBIF records in South America for the alien species Aponogeton distachyos L.f. (cape pondweed). Moreover, the distribution records downloaded from GBIF covered only 15 of the 16 regions, i.e., they excluded South Georgia and South Sandwich Islands. The data downloaded from GBIF contained 10,735 raw records (Table 1). Overall, cleaning and validation led to the exclusion of 1,825 records.
Table 1. Differences in the main descriptors of primary biodiversity information between the GBIF and literature datasets: total number of records, records of native, alien non-invasive and alien invasive species, and total records found in the 16 regions of South America, after cleaning with the software ModestR.
The GBIF data were supplemented with records collected from literature sources. This added 427 records, giving a final total of 9,337 records (integrated dataset). The difference between the total number of records provided by GBIF and by the literature dataset for the 80 species was highly significant (p value < 0.001) (Table 1, Figure 1). The average number of records per species was 111.3 in the GBIF dataset and 5.33 in the literature dataset. The literature dataset provided information lacking in the GBIF dataset, as it added records for regions not documented by GBIF, such as South Georgia and South Sandwich Islands (Table S2). Although the number of additional records provided by the literature dataset was relatively low (4.79%), it increased species coverage by 1.26% and regional coverage by 6.66%; the overall bias rate was considerably lower (5.53%). In addition, the literature search provided information such as life form, plant traits and invasion status according to expert opinion.
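The record bookkeeping reported in this section can be checked with a few lines of arithmetic, using only numbers stated in the text. Two points are assumptions on our part: the reported percentages and averages match values truncated (not rounded) to the printed precision, and the species and region gains appear to correspond to one extra unit relative to GBIF's 79 species and 15 regions.

```python
import math


def truncate(value: float, digits: int) -> float:
    """Cut a value to `digits` decimals without rounding."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


raw_gbif, excluded, from_literature, n_species = 10_735, 1_825, 427, 80

cleaned_gbif = raw_gbif - excluded           # valid GBIF records after cleaning
integrated = cleaned_gbif + from_literature  # records in the integrated dataset
assert cleaned_gbif == 4_536 + 4_374         # native + non-native GBIF records
assert from_literature == 126 + 301          # native + non-native literature records

print(cleaned_gbif, integrated)                            # 8910 9337
print(truncate(100 * from_literature / cleaned_gbif, 2))   # 4.79 (% added records)
print(truncate(cleaned_gbif / n_species, 1))               # 111.3 records per species
print(truncate(from_literature / n_species, 2))            # 5.33 records per species
# Assumption: one added species over 79 and one added region over 15.
print(truncate(100 / 79, 2), truncate(100 / 15, 2))        # 1.26 6.66
```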

Native and non-native status
The records available in the GBIF dataset for the 40 selected native and 40 non-native (alien non-invasive + alien invasive) aquatic plant species were in almost equal proportions: 4,536 for native and 4,374 for non-native species (Table 1). The literature dataset provided 126 records for native species and 301 for non-native species (Table 1). The choropleth maps highlight areas differing in the number of records for native, alien non-invasive and alien invasive species (Figure 2). Notably, the regions with the highest numbers of non-native aquatic species records were Brazil (2,182), Colombia (454) and Argentina (444) (Supplementary material Figure S1). In both datasets, records of alien invasive species outnumbered those of alien non-invasive species (Table 1, Figure 2B, C). The country with the highest density of native and non-native species was Brazil. Regions such as French Guiana, Guyana, Suriname and Venezuela tended to hold more native than non-native species (Figure 2), and the regions with the highest density of native and non-native species per square kilometre were Ecuador and Paraguay (Figure S2).
The data downloaded from GBIF showed the highest concentrations of native and non-native species in Brazil (53), Argentina (52) and Colombia (50). In the data collected manually from the literature, the highest concentrations were found in Brazil (38), Chile (30) and Argentina (28) (Table S3).
The reliability of the GBIF and literature datasets after the cleaning process with ModestR was 47% and 59%, respectively (Table S2). This means that the literature dataset considered in the present study contained a higher proportion of species with reliable information than GBIF.
The allodiversity of the 16 investigated regions (i.e., the number of alien species present in a specific area, sensu Barthlott et al. 1999) is shown in Figure 3A. According to our results, the regions holding the highest number of different alien aquatic plant species were Argentina and Brazil. The ordinary kriging map in Figure 3B shows, at a finer spatial resolution, the parts of the regions holding the highest numbers and densities of alien invasive and alien non-invasive species.
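Ordinary kriging estimates a value at an unsampled location as a weighted sum of observed values, with weights obtained from a variogram and constrained to sum to 1. The sketch below illustrates the technique only. The exponential variogram with unit sill and scale and no nugget is a placeholder assumption (the paper does not state its variogram model, and in practice it would be fitted to the data, e.g. with R's gstat or a GIS kriging tool), and the region centroids and species counts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def exponential_variogram(h: float, sill: float = 1.0, scale: float = 1.0) -> float:
    """Exponential semivariance model without nugget (placeholder parameters)."""
    return sill * (1.0 - math.exp(-h / scale))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points: Sequence[Point], values: Sequence[float], target: Point,
                     variogram: Callable[[float], float] = exponential_variogram
                     ) -> Tuple[float, List[float]]:
    """Ordinary kriging estimate at `target` and the weight of each data point.

    The extra row/column of ones (Lagrange multiplier) forces the weights to
    sum to 1, keeping the estimator unbiased under a constant unknown mean.
    """
    n = len(points)
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values)), weights


centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical region centroids
alien_counts = [4.0, 9.0, 6.0]                     # hypothetical species counts
estimate, w = ordinary_kriging(centroids, alien_counts, (0.4, 0.4))
```

Without a nugget, the estimate at a sampled location reproduces the observed value exactly, which is a convenient sanity check for any implementation.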
When the Human Influence Index was considered, the records of alien invasive and alien non-invasive species with a high level of risk according to the USAqWRA scheme were positively correlated (t = 3.5851, df = 4421, p value < 0.001) with locations with a higher level of anthropisation (Figures 4 and S3). In addition, we observed that in Brazil, Colombia, Ecuador, Uruguay and Venezuela the records were mostly found along the coast, probably close to the main ports, which could be related to the pathway of introduction (intentional, e.g. ornamental, Table S1) or to secondary release. HII was significantly higher at non-native than at native occurrences (p value < 0.001). We did not find spatial autocorrelation between the HII scores and the USAqWRA scores (p value = 0.0014).
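Assuming the reported t and df come from the standard test of Pearson's r described in the methods (df = n − 2, i.e., about 4,423 records), the correlation coefficient itself can be recovered from the two reported values:

```latex
t = \frac{r\sqrt{\nu}}{\sqrt{1 - r^{2}}}, \qquad \nu = n - 2
\quad\Longrightarrow\quad
r = \frac{t}{\sqrt{t^{2} + \nu}}
  = \frac{3.5851}{\sqrt{3.5851^{2} + 4421}}
  \approx \frac{3.5851}{66.59}
  \approx 0.054
```

With this many records even a small r is highly significant, so reporting r alongside the p value helps judge the strength of the association.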
The correlation between the distribution of the most invasive species according to the USAqWRA scheme and the PAs in South America was significant (p value = 0.034), with a negative coefficient, meaning that the riskiest alien species are more likely to be found outside the PAs (Figure 4). The chi-square test was significant (p value < 0.001), meaning that a higher proportion of non-native than native species records falls outside the PAs.
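Two ingredients of the protected-area analysis can be sketched with the standard library: the Pearson chi-square statistic for the 2 × 2 table of native/non-native records versus inside/outside PAs, and a distance-weighted autocovariate of the kind added as an extra predictor in an autologistic regression. This is a sketch under assumptions: the counts and coordinates are hypothetical, the autocovariate uses plain inverse-distance weights within a cut-off and does not reproduce every option of spdep's `autocov_dist`, and the logistic fit itself is left to a GLM routine.

```python
import math
from typing import List, Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and its 1-df p value."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def autocovariate(xy: Sequence[Tuple[float, float]], y: Sequence[int],
                  max_dist: float) -> List[float]:
    """Inverse-distance-weighted sum of neighbours' binary responses.

    Neighbours are the other points within `max_dist`; coincident points
    (distance 0) are skipped to avoid division by zero.
    """
    result = []
    for i, p in enumerate(xy):
        total = 0.0
        for j, q in enumerate(xy):
            d = math.dist(p, q)
            if j != i and 0 < d <= max_dist:
                total += y[j] / d
        result.append(total)
    return result


# Hypothetical counts: rows native / non-native, columns inside / outside PAs.
stat, p = chi_square_2x2([[120, 380], [60, 440]])
# Hypothetical records: 1 = inside a PA, 0 = outside.
ac = autocovariate([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)], [1, 0, 1], max_dist=2.5)
```

Note that R's `chisq.test` applies Yates' continuity correction to 2 × 2 tables by default (`correct = TRUE`), so its statistic will be slightly smaller than this uncorrected version.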

Discussion
Globalization facilitates the spread of aquatic invasive plants as international commerce develops (Perrings et al. 2005; Donaldson et al. 2014; Seebens et al. 2015, 2017), and most aquatic alien species have been deliberately introduced as ornamentals or for other commercial uses. Once introduced, they may escape into the environment, and South America is unfortunately also affected by this process (Table S1). For example, Arundo donax L. was introduced to the Galapagos as an ornamental (Guézou et al. 2014), and Hippuris vulgaris L. was introduced to Chile from Europe as an ornamental (Ramírez and San Martin 2006). In Chile, 21% of the aquatic and riparian flora has been introduced, and this percentage is likely to increase as the country develops and its water bodies are subjected to greater disturbance and become prone to accidental escapes or even intentional releases (Urrutia et al. 2016). In this regard, ports can be one of the main entry points for aquatic alien species arriving from other countries as stowaways (e.g. through ship hull fouling or in ballast water; Hulme 2009). Although GBIF and literature data may be biased and have limited coverage, especially in poorly investigated regions, we expect that part of the difference in allodiversity detected in South America using a sample of 80 aquatic species might be due to the distribution of ports acting as points of entry and to the intense trade in ornamental plants. The native species considered in the present research can be regarded as a "control group" whose distribution pattern reflects the sampling biases in the data when the effects of introduction dynamics and pathways are not important. Therefore, we can assume that differences between native and non-native records (e.g. non-natives at higher HII) could result from introduction and secondary release pathways.
In our study, as in other studies that use primary biodiversity information (Sousa-Baena et al. 2014; García-Roselló et al. 2015; Maldonado et al. 2015; Meyer et al. 2015), a critical point that decreased data reliability was inaccurate georeferencing (17.0% of locations wrong or missing). Hijmans et al. (1999) suggested that a relatively large proportion of all available records are not correctly georeferenced. Feeley and Silman (2011) reported an extreme lack of collection data for tropical plant species in GBIF (and in a similar database for Brazil, SpeciesLink; http://splink.cria.org.br/). They estimated that about 65% of tropical plants lack available georeferenced collections. This lack of reliable spatial information over vast extents shows that, for many regions with large conservation opportunities, there are not sufficient occurrence data to support even the most sophisticated modelling approaches (Meyer et al. 2015). Feeley and Silman (2011) termed this lack of knowledge the "data void". They pointed out the importance of investigating species responses to climate change through species distribution modelling to predict rates of habitat loss and the associated extinction risks. Nevertheless, Collen et al. (2008) found that, for tropical South America, the study of species distributions and their responses to climate change is potentially crippled by a lack of basic data. Using "presence-only" data, a minimum of 20-50 collections per taxon is generally required to produce accurate species distribution models, and because of the paucity of digitized collections very few tropical species meet this criterion (Feeley and Silman 2011). Sousa-Baena et al. (2014) pointed out that incompleteness is not only due to a lack of collection effort, but may also correspond to existing knowledge that is not digitized or not accessible.
In accordance with Maldonado et al. (2015), we would like to emphasize that sources of information should always be accompanied by good metadata, including specific details on how the coordinates were obtained and on whether the coordinate assignment was done manually (e.g. literature sources) or automatically (GBIF data). Researchers will need to re-evaluate coverage and completeness periodically, and these evaluations will need to incorporate additional coverage information. An advantage of enhancing the GBIF dataset with manually collected occurrence records is that they might increase information about local patterns of occurrence, species abundances or community composition. Feeley (2015) quantified the amount of occurrence data available through GBIF for plant species in tropical South America and examined how data availability had changed through time. He found that most of the increase was due to the inclusion of additional pre-existing records rather than new collections, driven in large part by the incorporation of SpeciesLink data into GBIF. The greatest density of collections comes from the Northern Andean Paramo and the Andean ecoregions, consistent with part of our occurrence data. In tropical South America, more than 10% of species are still represented by no collections; one reason is that the vast majority of specimens are sterile, so many collections are not identified to species or are identified incorrectly.
Increasing the number of digitized records is important in South America because many ecoregions are very poorly represented in the GBIF collections database. Our results suggest that literature records can improve the coverage of the GBIF dataset, e.g. in Argentina, Brazil and Chile. For example, the Cerrado is one of South America's largest, most diverse and most threatened ecoregions, but it is not well represented (Feeley and Silman 2009). In contrast, the Andean ecoregions are well represented, possibly because collection intensities there were higher and access was not hindered by physical or bureaucratic impediments (Feeley 2015). Major rivers, such as the Tocantins and Tapajós in Pará State and the Rio Negro and Rio Madeira in Amazonas State, are sometimes associated with higher information content. Nevertheless, in our dataset, information for these well-known areas is still lacking.
Importantly, in contrast with the general trend, the biodiversity hotspot regions in Brazil (Myers et al. 2000) have the highest concentration of invasive alien plants and include a large number of protected areas (World Conservation Union and UNEP-World Conservation Monitoring Centre 2007). Protected areas are usually characterised by high levels of biodiversity, unique habitats, pristine ecosystems or protected or endangered species (Yang et al. 2014; Kumschick et al. 2015). The Andes biodiversity hotspot is one of the most diverse regions and supports many endemic species of high conservation priority (Myers et al. 2000), yet the lack of usable data hampers conservation efforts. Immediate efforts are needed to increase the quality and quantity of data available from this and other underrepresented systems (Feeley and Silman 2010).
In accordance with our results, Feeley and Silman (2011) reported that in Ecuador a relatively large number of collections are available online, thanks to the efforts of local herbaria, including the Museo de Historia Natural (QCNE), the Pontificia Universidad Católica del Ecuador (QCA), and the Universidad Central del Ecuador (QAP).
There is a clear need for more frequent and intensive collection campaigns, not just for our set of aquatic species but in general, and for research on plant community structure and dynamics across the Amazon and tropical South America (Feeley 2015).
The low number of invasive species recorded in protected areas (Foxcroft et al. 2017) could be related to lower sampling effort (Figure S4). Nevertheless, in recent years there have been attempts to standardize inventory data (i.e., plot data) in the Amazon forest (e.g. the Amazon Tree Diversity Network, ATDN, http://web.science.uu.nl/Amazon/atdn/, and the RAINFOR Amazon Forest Inventory Network, http://www.rainfor.org/; Feeley 2015), and this information would be useful to reduce the artefact of sampling bias. Schulman et al. (2007) showed that much of the Amazonian Basin shows little or no evidence of botanical exploration. Therefore, geographical gaps and the small number of available herbarium collections impede accurate mapping of plant distributions and of biodiversity (Hopkins 2007).
Among the 59 alien plant species reported as invaders from 135 protected areas around the world (Foxcroft et al. 2013), there are many aquatic plants, such as Arundo donax, Hydrilla verticillata, Pistia stratiotes and Salvinia molesta. According to our results, combined with the USAqWRA scores, these species were the most invasive species in South America and are prone to cause biodiversity loss in PAs.
The Human Influence Index could be considered a useful proxy to detect areas where alien species could arrive and establish. Our results showed that alien species were favoured in locations with a high degree of anthropisation. According to Gallardo et al. (2015), transport networks are currently one of the most important drivers of the entry and spread of invasive plants (e.g. port proximity determined the presence of freshwater invaders) and are directly linked to the vectors and pathways of introduction and secondary release of invasive species. These findings confirm that the relationship between invasive species and human influence is important in explaining the highest risk values in areas where propagule pressure can be presumed to be high (i.e., close to transport networks and densely populated areas).
In accordance with García-Roselló et al. (2015), including species in localities from which they had not been recorded, through the use of predicted maps, generally increases species richness. Extrapolations of individual species ranges, however, do not appear to affect the geographical position of hotspots or patterns of global species richness.

Conclusion
The GBIF and literature datasets provided significantly different information, and combining the two offered new information and better coverage that would not exist in a single data source. Nevertheless, the quality of primary biodiversity information, from both the literature and GBIF, should be carefully evaluated before the data are used for further analyses in macroecological studies.
Identifying risk hotspots for aquatic invasive plant species could promote the development of prevention and control strategies. In particular, biodiversity hotspots and protected areas should be priorities for prevention and monitoring. Human influence amplifies the potential for invasion, and this could translate into the highest cumulative risk scores close to commercial ports, densely populated areas and intensely used landscapes. The methodology used in the present research, if applied to a larger dataset including all non-native species, could facilitate prevention and monitoring, at least for some regions of South America.
Finally, we would like to stress that GBIF data and tools are very valuable and important. However, constant efforts to increase sample sizes, through the generation of new data and the publication of existing datasets, are particularly needed for native and alien aquatic plants in South America.

Figure 1. Map showing the distribution of records downloaded from GBIF (grey circles) and collected from literature sources (red triangles) in South America.

Figure 3. Number of different alien invasive and alien non-invasive species present in each of the 16 regions of South America, based on the cleaned GBIF dataset supplemented with records from the literature. A) Choropleth map of the number of non-native species. B) Kriging of estimated non-native species density according to the number of different species in the 16 regions of South America. The 16 South American regions considered in the study were defined as follows: (1) Argentina, (2) Bolivia, (3) Brazil, (4) Chile, (5) Colombia, (6) Ecuador, (7) Falkland Islands, (8) French Guiana, (9) Galapagos, (10) Guyana, (11) Paraguay, (12) Peru, (13) South Georgia and South Sandwich Islands, (14) Suriname, (15) Uruguay and (16) Venezuela.

Figure 4. Human Influence Index (HII) map (left) and Protected Areas (PAs) map (right). The records of the alien invasive and alien non-invasive species downloaded from the GBIF dataset and integrated with the records compiled manually are shown as dots in both maps. The HII grid layer was downloaded from the Wildlife Conservation Society [WCS; Center for International Earth Science Information Network (CIESIN), Columbia University 2005] and the PAs map was downloaded from the World Database on Protected Areas (WDPA).