How were the extrapolated distribution ranges visible on some distribution maps produced?
The method we use most closely resembles the 'bioclimatic envelope' approach to niche modeling, which looks at a suite
of climatic parameters at the locations from which a species has been recorded, and then extrapolates the probable distribution to all
locations that share the same range of those parameters. We developed a home-grown implementation of this concept in order to have
maximum flexibility and future extensibility.
Because our area of interest is defined, we could limit ourselves to Namibia and offshore areas only. A standard NaBiD grid file
covers continental Namibia, a roughly equivalent area of the adjacent South Atlantic Ocean, and a narrow margin to allow for proximate
records from neighbouring countries. The resultant rectangular study area extends from 2° to 25.99° E and from 16° to 29.99° S.
We work with a grid resolution of 0.01°. One hundredth of a degree is not much different in size from e.g. 30 arc-seconds,
but makes for much simpler computation. It follows that each of our grids contains 3.36 million nodes (2400 columns by 1400 rows), with a resolution of approximately
1 km on the ground (or sea). This was found to be a suitable compromise between the need for accuracy, the lack of comparable accuracy
in many datasets that feed into this, and sane processing times.
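The node count follows directly from the extent and the resolution. A minimal sketch of that arithmetic in Python; the constant names are illustrative and are not taken from the NaBiD code.

```python
# Grid geometry as given in the text; all names here are illustrative only.
WEST, EAST = 2.00, 25.99      # degrees E of the first and last node column
NORTH, SOUTH = 16.00, 29.99   # degrees S of the first and last node row
STEP = 0.01                   # grid resolution in degrees

# round() absorbs floating-point noise in the division.
COLS = round((EAST - WEST) / STEP) + 1     # 2400 columns
ROWS = round((SOUTH - NORTH) / STEP) + 1   # 1400 rows
NODES = COLS * ROWS                        # 3 360 000 nodes
```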
We use a binary file format that consists of 3.36 million values (floating point or integer) in row and column sequence from
north to south and west to east. Individual values can be rapidly read from this by seeking to the appropriate node index.
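Reading a single node by seeking could look roughly like the sketch below. It assumes, purely for illustration, that values are stored row by row as 4-byte little-endian floats with no file header; the actual value width and byte order of a NaBiD grid file are not stated here, and the function names are hypothetical.

```python
import os
import struct
import tempfile

COLS = 2400                       # columns per row, west to east
VALUE_FMT = "<f"                  # ASSUMPTION: 4-byte little-endian float
VALUE_SIZE = struct.calcsize(VALUE_FMT)

def node_index(row, col, cols=COLS):
    """Row-major index: rows run north to south, columns west to east."""
    return row * cols + col

def read_node(path, row, col, cols=COLS):
    """Read one node value by seeking straight to its byte offset."""
    with open(path, "rb") as f:
        f.seek(node_index(row, col, cols) * VALUE_SIZE)
        return struct.unpack(VALUE_FMT, f.read(VALUE_SIZE))[0]

# Demo on a tiny 2 x 3 grid instead of the full 1400 x 2400 grid.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<6f", 1, 2, 3, 4, 5, 6))
value = read_node(tmp.name, row=1, col=2, cols=3)   # last node of the grid
os.remove(tmp.name)
```

Because every value has the same size, the cost of a lookup does not depend on where in the grid the node lies.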
The first step in niche modeling is then to snap the coordinates of each locality from which a species is known to the nearest
grid node. All further work is done using the nodes only. A species starts the process with a blank grid
file consisting of 3.36 million zeroes. As environmental parameters are evaluated, some of those zeroes are replaced by other values,
and in the end those values are used to produce a map.
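Snapping a locality to its nearest node is a matter of rounding. The sketch below assumes coordinates are supplied as positive degrees east and positive degrees south; the function name and the sample locality are illustrative only.

```python
WEST, NORTH = 2.0, 16.0    # degrees E and degrees S of the first node
STEP = 0.01                # grid resolution in degrees
COLS, ROWS = 2400, 1400

def nearest_node(lon_e, lat_s):
    """Return (row, col) of the grid node nearest to a locality.

    lon_e is degrees east and lat_s is degrees south (both positive).
    Returns None when the locality falls outside the study area.
    """
    col = round((lon_e - WEST) / STEP)
    row = round((lat_s - NORTH) / STEP)
    if 0 <= row < ROWS and 0 <= col < COLS:
        return row, col
    return None

# Several records that snap to the same node collapse into one entry.
records = [(17.083, 22.570), (17.0831, 22.5702), (30.0, 20.0)]
nodes = {nearest_node(lon, lat) for lon, lat in records} - {None}
```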
Which parameters to use for which species? Our database includes a table with simple boolean columns that toggle inclusion or exclusion
of each parameter for each species. Part of our Database Management Application allows these settings to be changed.
The same table also holds the resultant extrapolated range (a polygon), and a description of the parameters tested
for and those that were found to be significant. The niche modeling application simply tests each designated species for each parameter
flagged in the database. Parameters come in three flavours:
1. Surface files. These were generated (generally through Kriging) by interpolating from scattered data points to a regular surface.
Climatic parameters are typically converted to surfaces. We use the actual numeric values (amount of rainfall, temperature, etc.) at the
nodes from where the species has been recorded. Basic statistics are used to reduce the range of values to those that fall within 95%
confidence limits only. We then find all nodes in the parameter grid that fall within these limits, and add 1 to the value of each
corresponding node on the species grid. Repeat for the next surface.
2. Categorised grids. These generally start off as shapefiles. We reduce them to grids by assigning each distinct category a sequential
number and then producing a grid that has the appropriate category number at each node. A simple text file in ini-format allows us to
refer back and find which number refers to which category. We then find the category number associated with each node from which the species
has been recorded. Category numbers are treated as attributes, not numbers per se. We record the number of nodes per category and
reduce them to those with frequencies that fall within statistical 95% confidence limits only. Then we add 1 to every node of the
species grid whose counterpart in the category file carries one of the significant category numbers. Repeat for the next
categorised grid.
As an aside to the previous, and a possible future refinement, we also compare the frequencies of the different categories for
each species with the frequency of occurrence (= area covered) of each category in Namibia. A chi-squared test for significance is
done, and positive results are written to the database to become part of the parameter description that appears below each map.
These results are not currently used for mapping, but might be in future.
3. Blanking files. In contrast to the previous two, blanking files reduce the values of particular nodes in the species file
to zero instead of adding 1 to them. The reasoning is as follows: it is entirely possible that e.g. a climatic grid may predict the
occurrence of a terrestrial species in an adjacent offshore area with similar climate. Since we know from experience that terrestrial
species do not occur in the sea, we can use a sea blanking file to set all oceanic nodes in that species' grid to zero and prevent
potentially embarrassing extrapolations. For terrestrial taxa we currently also use a Namibian border blanking file to clip all ranges
to within the country's borders, because we do not yet have sufficient datasets to do confident extrapolations across the border.
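The three parameter flavours can be sketched on a toy six-node grid as follows. Two points are assumptions made for illustration, because the text only says '95% confidence limits': surface limits are taken as the mean plus or minus 1.96 standard deviations of the values at the recorded nodes, and categories are retained from most to least frequent until they account for 95% of the recorded nodes. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

def score_surface(species, surface, recorded, z=1.96):
    """Add 1 to every node whose surface value lies within the limits.

    ASSUMPTION: limits are mean +/- z standard deviations of the values
    at the recorded nodes. Needs at least two recorded nodes.
    """
    values = [surface[i] for i in recorded]
    m, s = mean(values), stdev(values)
    low, high = m - z * s, m + z * s
    for i, v in enumerate(surface):
        if low <= v <= high:
            species[i] += 1

def score_categories(species, categories, recorded, keep=0.95):
    """Add 1 to every node that belongs to a retained category.

    ASSUMPTION: categories are retained in order of frequency until
    they cover `keep` of the recorded nodes.
    """
    counts = Counter(categories[i] for i in recorded)
    total = sum(counts.values())
    retained, covered = set(), 0
    for cat, n in counts.most_common():
        if covered >= keep * total:
            break
        retained.add(cat)
        covered += n
    for i, cat in enumerate(categories):
        if cat in retained:
            species[i] += 1

def apply_blanking(species, blank_mask):
    """Reset to zero every node flagged in the blanking file."""
    for i, blanked in enumerate(blank_mask):
        if blanked:
            species[i] = 0

# Toy run: species recorded at nodes 0, 1 and 2 of a six-node grid.
species = [0] * 6
recorded = [0, 1, 2]
score_surface(species, [100, 110, 120, 300, 115, 105], recorded)
score_categories(species, [1, 1, 2, 3, 3, 2], recorded)
apply_blanking(species, [0, 0, 0, 0, 0, 1])   # node 5 is, say, offshore
# species is now [2, 2, 2, 0, 1, 0]
```

Node 3 matches neither parameter and stays at zero, node 4 matches the surface only, and node 5 matched both but is cleared by the blanking step.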
The end result of all the above is a species grid file in which many nodes are still zero, but at least some now have a positive
numeric value. The nodes with the highest values are those that shared the most parameters with those at actual recorded localities
for the species, and should represent the area of predicted most likely occurrence of the species. We can use a contouring algorithm
to create a polygon that encloses the area or areas with non-zero scores. In order to once again throw out the lower scores, we
currently contour at 25% of the maximum value for any single node for the species. The 25% level was arbitrarily chosen, and
may be revised depending on future experience.
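The 25% cut-off amounts to a simple threshold on the species grid. The sketch below only flags the nodes that would lie inside the contour; it does not trace the contour itself, and whether nodes exactly at the cut-off count as inside is an assumption.

```python
def threshold_mask(species, fraction=0.25):
    """Flag nodes whose score reaches `fraction` of the maximum score."""
    cutoff = fraction * max(species)
    # Zero-score nodes never qualify, even when the whole grid is zero.
    return [score > 0 and score >= cutoff for score in species]

mask = threshold_mask([0, 1, 2, 8, 5, 0])
# Maximum is 8, so the cut-off is 2:
# mask == [False, False, True, True, True, False]
```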
Because the contouring works at the grid resolution, it typically spits out 50 000 to 100 000 contour fragments, each less than 1 km
in ground length. Consolidating these into coherent polygons takes a while, and still leaves us with a polygon replete with redundant
vectors, so some polygon decimation is done to get the vector number down to hundreds instead of tens of thousands. This polygon is
stored in the database, and is used to generate the distribution maps seen on the site.
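The text does not say which decimation algorithm is used. As an illustration only, the sketch below shows Ramer-Douglas-Peucker simplification, a common choice for thinning redundant vertices; the tolerance is in the same units as the coordinates, and distances are treated as planar.

```python
import math

def _point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:            # degenerate segment (closed ring)
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))          # clamp to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker simplification, written without recursion."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        start, end = stack.pop()
        index, dmax = -1, 0.0
        for i in range(start + 1, end):
            d = _point_segment_distance(points[i], points[start], points[end])
            if d > dmax:
                index, dmax = i, d
        if index != -1 and dmax > tolerance:
            keep[index] = True         # farthest vertex is significant
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]

line = [(0, 0), (1, 0.001), (2, 0), (3, 1), (4, 0)]
thinned = simplify(line, tolerance=0.01)
# thinned == [(0, 0), (2, 0), (3, 1), (4, 0)]
```

The explicit stack avoids deep recursion, which matters when the input holds tens of thousands of vertices.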
All the above is fairly processor-intensive, and could not be done through PHP and the web server as all our other work is. Two
standalone C++ console applications were therefore written: one to convert other files to our grid format, and the second to do the actual
niche modeling. When any species' distribution is modified or added to in the database, a small text file with relevant information
is written to a working directory. The C++ application simply iterates through any files found in that directory, does the modeling,
and replaces them with the generated polygon data. Back in PHP, the database co-ordinator can run a script on a regular basis that
plots a distribution map with the new polygon data and displays it in the browser side-by-side with the old map. Options exist to
accept the new map (replace the old map), reject the new map (keep the old map), or retry modeling with different parameters. (This
last bit of manual intervention will hopefully become redundant in future, once we have tweaked the program to spit out perfect
maps every time.)
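The working-directory hand-off between the database and the modeling application can be pictured as a simple loop over job files. The sketch below is in Python rather than C++ and is not the actual application: the file name, the job file contents and the `model` callable are all hypothetical stand-ins.

```python
import tempfile
from pathlib import Path

def process_jobs(workdir, model):
    """Run `model` on every job file in `workdir`, replacing each in place.

    `model` is a hypothetical callable that takes the job text and
    returns the polygon data as text.
    """
    done = []
    for job in sorted(Path(workdir).iterdir()):
        if not job.is_file():
            continue
        polygon_text = model(job.read_text())
        job.write_text(polygon_text)   # job file now holds the result
        done.append(job.name)
    return done

# Demo with a throwaway directory and a placeholder model.
with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "species_0001.txt").write_text("job details")
    processed = process_jobs(workdir, lambda text: "POLYGON PLACEHOLDER")
    result = Path(workdir, "species_0001.txt").read_text()
```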