Namibian Biodiversity Database

NaBiD - Frequently asked questions

Niche modeling



  • How were the extrapolated distribution ranges visible on some distribution maps produced?

    The method we use comes closest to the 'bioclimatic envelope' approach to niche modeling, which looks at a suite of climatic parameters at the locations from which a species has been recorded, and then extrapolates the probable distribution to all locations that share the same range of those parameters. We developed a home-grown implementation of this concept in order to have maximum flexibility and future extensibility.

    Because our area of interest is defined, we could limit ourselves to Namibia and adjacent offshore areas only. A standard NaBiD grid file covers continental Namibia, a roughly equivalent area of the adjacent South Atlantic Ocean, and a narrow margin to allow for proximate records from neighbouring countries. The resultant rectangular study area extends from 2° to 25.99° E and from 16° to 29.99° S. We work with a grid resolution of 0.01°. One hundredth of a degree is not much different in size from e.g. 30 arc-seconds, but makes for much simpler computation. It follows that each of our grids contains 3.36 million nodes (2400 columns by 1400 rows), with a resolution of approximately 1 km on the ground (or sea). This was found to be a suitable compromise between the need for accuracy, the lack of comparable accuracy in many of the datasets that feed into the process, and sane processing times.

    We use a binary file format that consists of 3.36 million values (floating point or integer) in row and column sequence, from north to south and west to east. Individual values can be read rapidly by seeking to the appropriate node index.
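
    As a minimal sketch of that lookup (not the NaBiD code itself, which is written in C++), the following Python shows how a node index and a direct read might work. It assumes values are stored row by row and are 4-byte little-endian floats; neither detail is stated above, and all names are illustrative.

```python
import struct

# Grid geometry taken from the text; the names are illustrative.
WEST, NORTH = 2.0, -16.0   # top-left node (degrees east, degrees north)
STEP = 0.01                # node spacing in degrees
COLS, ROWS = 2400, 1400    # 2.00..25.99 E and 16.00..29.99 S
VALUE_SIZE = 4             # assumption: 4-byte little-endian floats


def node_index(row, col):
    """Index of a node, assuming rows run north to south and are stored one after another."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise IndexError("node outside the study area")
    return row * COLS + col


def read_node(path, row, col):
    """Read a single value by seeking straight to its byte offset."""
    with open(path, "rb") as f:
        f.seek(node_index(row, col) * VALUE_SIZE)
        return struct.unpack("<f", f.read(VALUE_SIZE))[0]
```

    With this layout the last node (row 1399, column 2399) has index 3 359 999, which matches the 3.36 million nodes per grid.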

    The first step in niche modeling is then to convert the coordinates of each locality from which a species is known to the nearest grid node. All further work is done using the nodes only. A species starts the process with a blank grid file consisting of 3.36 million zeroes. As environmental parameters are evaluated, some of those zeroes are replaced by other values, and in the end those values are used to produce a map.
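
    Snapping a locality to its nearest node is simple arithmetic on a 0.01° grid. The sketch below is illustrative Python, not the NaBiD implementation; it assumes southern latitudes are given as negative decimal degrees, and the function name is hypothetical.

```python
WEST, NORTH = 2.0, -16.0   # top-left node (degrees east, degrees north)
STEP = 0.01                # node spacing in degrees
COLS, ROWS = 2400, 1400


def nearest_node(lon, lat):
    """Return (row, col) of the node closest to a locality, or None if it lies outside the grid."""
    col = round((lon - WEST) / STEP)
    row = round((NORTH - lat) / STEP)   # rows count southwards from 16 S
    if 0 <= row < ROWS and 0 <= col < COLS:
        return row, col
    return None
```

    A record at 17.084° E, 22.557° S would, for example, map to row 656, column 1508.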

    Which parameters to use for which species? Our database includes a table with simple boolean columns that toggle inclusion or exclusion of each parameter for each species. Part of our Database Management Application allows these settings to be changed. The same table also holds the resultant extrapolated range (a polygon), and a description of the parameters that were tested and those that were found to be significant. The niche modeling application simply tests each designated species for each parameter flagged in the database. Parameters come in three flavours:

    1. Surface files. These were generated (generally through Kriging) by extrapolating from scattered data points to a regular surface. Climatic parameters are typically converted to surfaces. We use the actual numeric values (amount of rainfall, temperature, etc.) at the nodes from where the species has been recorded. Basic statistics are used to reduce the range of values to those that fall within 95% confidence limits only. We then find all nodes in the parameter grid that fall within these limits, and add 1 to the value of each corresponding node on the species grid. This is repeated for each further surface.

    2. Categorised grids. These generally start off as shapefiles. We reduce them to grids by assigning each distinct category a sequential number and then producing a grid that has the appropriate category number at each node. A simple text file in ini-format allows us to refer back and find which number refers to which category. We then find the category number associated with each node from which the species has been recorded. Category numbers are treated as attributes, not numbers per se. We record the number of nodes per category and reduce the categories to those with frequencies that fall within statistical 95% confidence limits only. We then add 1 to each node of the species grid whose corresponding node in the category grid holds one of the significant category numbers. This is repeated for each further categorised grid.

    As an aside to the previous, and a possible future refinement, we also compare the frequencies of the different categories for each species with the frequency of occurrence (= area covered) of each category in Namibia. A chi-squared test for significance is done, and positive results are written to the database to become part of the parameter description that appears below each map. These results are not currently used for mapping, but might be in future.

    3. Blanking files. In contrast to the previous two, blanking files reduce the values of particular nodes in the species file to zero instead of adding 1 to them. The reasoning is as follows: it is entirely possible that e.g. a climatic grid may predict the occurrence of a terrestrial species in an adjacent offshore area with similar climate. Since we know from experience that terrestrial species do not occur in the sea, we can use a sea blanking file to set all oceanic nodes in that species' grid to zero and prevent potentially embarrassing extrapolations. For terrestrial taxa we currently also use a Namibian border blanking file to clip all ranges to within the country's borders, because we do not yet have sufficient datasets to do confident extrapolations across the border.
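
    The three flavours can be sketched on tiny grids held as flat Python lists. This is an illustration only, and it rests on two assumptions that the description above does not spell out: the 95% limits for a surface are taken as the mean plus or minus 1.96 sample standard deviations of the values at the recorded nodes, and the significant categories are taken as the most frequent ones that together account for 95% of the records. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def add_surface(species, surface, recorded, z=1.96):
    """Add 1 to every node whose surface value lies within the limits seen at the records."""
    values = [surface[i] for i in recorded]
    centre, spread = mean(values), stdev(values)   # stdev needs at least two records
    low, high = centre - z * spread, centre + z * spread
    for i, value in enumerate(surface):
        if low <= value <= high:
            species[i] += 1


def add_categories(species, categories, recorded, coverage=0.95):
    """Add 1 to every node whose category is among those kept for the species."""
    counts = Counter(categories[i] for i in recorded)
    kept, running = set(), 0
    for category, n in counts.most_common():
        if running >= coverage * len(recorded):
            break                                   # remaining categories are too rare
        kept.add(category)
        running += n
    for i, category in enumerate(categories):
        if category in kept:                        # treated as a label, not a number
            species[i] += 1


def apply_blanking(species, blank):
    """Reset to zero every node flagged in the blanking grid (e.g. sea nodes)."""
    for i, flagged in enumerate(blank):
        if flagged:
            species[i] = 0
```

    In this sketch the blanking step is meant to run after all surfaces and categorised grids have been added, so that a blanked node cannot pick up a score again.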

    The end result of all the above is a species grid file in which many nodes are still zero, but at least some now have a positive numeric value. The nodes with the highest values are those that shared the most parameters with those at actual recorded localities for the species, and should represent the area of predicted most likely occurrence of the species. We can use a contouring algorithm to create a polygon that encloses the area or areas with non-zero scores. In order to once again throw out the lower scores, we currently contour at 25% of the maximum value for any single node for the species. The 25% level was arbitrarily chosen, and may be revised depending on future experience.
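
    The 25% cut-off can be illustrated as follows. This Python sketch only computes the contour level and marks which nodes fall at or above it; the contouring algorithm that turns those nodes into polygon outlines is not shown, and the function names are hypothetical.

```python
def contour_level(species, fraction=0.25):
    """Score at which the outline is drawn: a fraction of the highest node score."""
    return fraction * max(species)


def inside_range(species, fraction=0.25):
    """True for nodes at or above the contour level, False elsewhere."""
    level = contour_level(species, fraction)
    # A grid of all zeroes has level 0 and therefore no predicted range.
    return [level > 0 and score >= level for score in species]
```

    For a species whose best node scores 8, the level is 2, so nodes scoring 1 are discarded while nodes scoring 2 or more are enclosed.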

    Because the contouring works at the grid resolution, it typically spits out 50 000 to 100 000 contour fragments, each less than 1 km ground length. Consolidating these into coherent polygons takes a while, and still leaves us with a polygon replete with redundant vectors, so some polygon decimation is done to get the number of vectors down to hundreds instead of tens of thousands. This polygon is stored in the database, and is used to generate the distribution maps seen on the site.
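
    The text does not say which decimation method is used. One common choice is the Ramer-Douglas-Peucker algorithm, sketched below in Python purely as an illustration: it drops every vertex that lies closer than a given tolerance to the straight line between the vertices that are kept. The function names and the tolerance parameter are assumptions.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all given as (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                 # degenerate segment, e.g. a closed ring
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))               # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def decimate(points, tolerance):
    """Simplify a line of points, always keeping its first and last point."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    worst, worst_distance = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst_distance:
            worst, worst_distance = i, d
    if worst_distance <= tolerance:
        return [first, last]                # everything in between is redundant
    left = decimate(points[:worst + 1], tolerance)
    right = decimate(points[worst:], tolerance)
    return left[:-1] + right                # the split point appears once only
```

    The recursive form is kept for readability; an outline with tens of thousands of vertices may call for an iterative version with an explicit stack.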

    All the above is fairly processor intensive, and could not be done through PHP and the web server as all our other work is. Two standalone C++ console applications were therefore written: one to convert other files to our grid format, and the second to do the actual niche modeling. When any species' distribution is modified or added to in the database, a small text file with relevant information is written to a working directory. The C++ application simply iterates through any files found in that directory, does the modeling, and replaces them with the generated polygon data. Back in PHP, the database co-ordinator can run a script on a regular basis that plots a distribution map with the new polygon data and displays it in the browser side-by-side with the old map. Options exist to accept the new map (replace the old map), reject the new map (keep the old map), or retry modeling with different parameters. (This last bit of manual intervention will hopefully become redundant in future, once we have tweaked the program to spit out perfect maps every time.)
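
    The hand-over between the database and the modeling program amounts to a simple file queue. The Python sketch below illustrates the idea only: the real program is a C++ console application, and the `.txt` extension, the file contents and the `model` callback are assumptions made for the example.

```python
from pathlib import Path


def process_queue(work_dir, model):
    """Run `model` on every job file in work_dir and overwrite it with the result."""
    done = []
    for job in sorted(Path(work_dir).glob("*.txt")):
        request = job.read_text(encoding="utf-8")
        polygon_text = model(request)       # stand-in for the niche modeling run
        job.write_text(polygon_text, encoding="utf-8")
        done.append(job.name)
    return done
```

    Because each request is an independent file, the modeling run can be scheduled separately from the web server, and a failed run leaves the other requests untouched.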


Site active since 18 November 2003.    This page last modified on: 21 March 2008, at 13:46
Webmaster:  info@biodiversity.org.na.    Site design, layout and coding by John Irish.