HYDRO1k Documentation
Table of Contents
- 1.0. Introduction
- 2.0. Data Layers
- 3.0. Data Set Development
  - 3.1. Data Processing Procedures
    - 3.1.1. Project the DEM
    - 3.1.2. Identify Natural Sink Features
    - 3.1.3. Filling the DEM
    - 3.1.4. Verification of the DEM
  - 3.2. Generation of Derivative Raster Data Sets
    - 3.2.1. Aspect
    - 3.2.2. Flow Directions
    - 3.2.3. Flow Accumulations
    - 3.2.4. Slope
    - 3.2.5. Compound Topographic Index
  - 3.3. Generation of Derivative Vector Data Sets
    - 3.3.1. Drainage Basin Boundaries
    - 3.3.2. Stream Lines
- 4.0. Data Formats
  - 4.1. Vector Data Formats
  - 4.2. Raster Data Formats
    - 4.2.1. Image File (.bil)
    - 4.2.2. Header File (.hdr)
    - 4.2.3. World File (.blw)
    - 4.2.4. Statistics File (.stx)
- 5.0. Data Distribution
- 6.0. Notes and Hints for HYDRO1k Users
- 7.0. Summary
- 8.0. References
- 9.0. Disclaimers
1.0. Introduction
HYDRO1k, developed at the U.S. Geological Survey's (USGS) EROS Data Center, is a geographic database
providing comprehensive and consistent global coverage of topographically
derived data sets. Based on the USGS's recently released 30 arc-second
digital elevation model (DEM) of the world (GTOPO30), HYDRO1k provides a
standard suite of geo-referenced data sets (at a resolution of 1 km) that
will be of value to all users who need to organize, evaluate, or process
hydrologic information on a continental scale.
Constructive comments from users of the HYDRO1k data sets are welcomed.
Please send your comments to kverdin@edcmail.cr.usgs.gov or
sgreenlee@edcmail.cr.usgs.gov.
2.0. Data Layers
The HYDRO1k data sets are being developed on a continent-by-continent
basis for all landmasses of the globe except Antarctica and Greenland.
For each continent, the HYDRO1k package provides a suite of six raster
and two vector data sets that cover many of the derivative products
commonly used in hydrologic analysis. The raster data sets are the
hydrologically correct DEM, derived flow directions, flow accumulations,
slope, aspect, and a compound topographic (wetness) index. The derived
streamlines and basins are distributed as vector data sets.
3.0. Data Set Development
The HYDRO1k data sets are the result of a cooperative project at the
USGS EROS Data Center, whose goal is the development of a globally
consistent hydrologic derivative data set. The effort has been led by
USGS scientists in collaboration with the United Nations Environment
Programme/Global Resource Information Database (UNEP/GRID), located in
Sioux Falls, South Dakota.
Development of the HYDRO1k database was made possible by the completion,
in 1996, of GTOPO30, the 30 arc-second digital elevation model produced
at the EROS Data Center. With its nominal cell size of 1 km, this data
set has been, and will continue to be, applied by many scientists and
researchers to hydrologic and landform studies. Inevitably, these studies
require, at a minimum, the development of a standard suite of derivative
products. In the past, users would obtain the DEM data, process the data,
extract the derivative information, use the derived products in their
studies and, perhaps, share the derived information with others. To spare
every user of the data set from repeating these procedures, the HYDRO1k
database aims to provide these standard products, developed in a
consistent fashion for the entire globe, and to make them available to
the entire user community.
3.1. Data Processing Procedures
The basis of all of the data layers available in the HYDRO1k database
is the hydrologically correct DEM. This DEM is, of course, based
on the GTOPO30 data set. However, to ensure that the DEM is able
to reproduce the correct movement of water across its surface, the
DEM is processed to remove elevation anomalies that can interfere
with hydrologically correct flow. The procedures followed in development
of this DEM are iterative. Some of the techniques used in the DEM
development are documented in Danielson (1996).
3.1.1. Project the DEM
To perform area calculations on the DEM properly, the data are projected
into an equal-area projection; the Lambert Azimuthal Equal Area
projection was selected for this database (Steinwand et al., 1995). The
cell size for all continents is 1,000 meters, and the radius of the
reference sphere is 6,370,997 meters. Projection parameters that vary by
continent are given in the following table. Other geo-referencing
information is available in the projection file included with each
continental data set.
| Continent | Longitude of Origin | Latitude of Origin |
|---|---|---|
| Africa | 20° 00' 00"E | 5° 00' 00"N |
| Asia | 100° 00' 00"E | 45° 00' 00"N |
| Australasia | 135° 00' 00"E | 15° 00' 00"S |
| Europe | 20° 00' 00"E | 55° 00' 00"N |
| North America | 100° 00' 00"W | 45° 00' 00"N |
| South America | 60° 00' 00"W | 15° 00' 00"S |
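As a cross-check on coordinates, the forward projection can be sketched with the standard spherical form of the Lambert Azimuthal Equal Area equations, using the sphere radius and continental origins listed above. This is a minimal Python sketch, not the projection software used to produce HYDRO1k; the names `laea_forward`, `ORIGINS` and `RADIUS_M` are illustrative.

```python
import math

# Sphere radius and per-continent origins from the table above
# (degrees; east longitude and north latitude are positive).
RADIUS_M = 6_370_997.0
ORIGINS = {
    "Africa": (20.0, 5.0),
    "Asia": (100.0, 45.0),
    "Australasia": (135.0, -15.0),
    "Europe": (20.0, 55.0),
    "North America": (-100.0, 45.0),
    "South America": (-60.0, -15.0),
}

def laea_forward(lon_deg, lat_deg, lon0_deg, lat0_deg, radius=RADIUS_M):
    """Project a geographic point to Lambert Azimuthal Equal Area x, y (m)
    on a sphere centered at (lon0_deg, lat0_deg)."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    lam0, phi1 = math.radians(lon0_deg), math.radians(lat0_deg)
    dlam = lam - lam0
    # Cosine of the angular distance from the projection center.
    cos_c = (math.sin(phi1) * math.sin(phi)
             + math.cos(phi1) * math.cos(phi) * math.cos(dlam))
    if cos_c <= -1.0:
        raise ValueError("the antipode of the origin cannot be projected")
    k = math.sqrt(2.0 / (1.0 + cos_c))  # scale factor that preserves area
    x = radius * k * math.cos(phi) * math.sin(dlam)
    y = radius * k * (math.cos(phi1) * math.sin(phi)
                      - math.sin(phi1) * math.cos(phi) * math.cos(dlam))
    return x, y
```

For example, `laea_forward(-95.0, 40.0, *ORIGINS["North America"])` returns the projected meters of a point in the North American data set; the origin itself maps to (0, 0).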
3.1.2. Identify Natural Sink Features
All continents contain some closed basins: drainage basins with no
natural outlet to the sea. In processing the HYDRO1k DEM to replicate
natural flow patterns, techniques were developed to (1) identify which
sink features in the DEM are, indeed, natural features and (2) preserve
these sink features during processing. Identification of the natural
sinks began with the creation of a "sink layer" containing all sink
features in the projected GTOPO30 DEM. This sink layer was then
thresholded to extract only sinks with a surface area greater than a
specified minimum, which served as a first cut at identifying the
natural sink features.
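The sink-layer step can be sketched as follows. This is a hypothetical reconstruction, not the HYDRO1k production code: it finds sink cells as those raised by a priority-flood fill (a standard depression-filling algorithm swapped in here, since the text does not name one), groups them into 8-connected regions, and keeps regions whose area exceeds a minimum. The text does not give the threshold, so `min_area_km2` is a free parameter, and each cell is assumed to cover 1 km².

```python
import heapq
from collections import deque

# Row/column offsets of the eight neighbors of a cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def fill_depressions(dem):
    """Priority-flood fill: raise every cell to the lowest level at which
    water could spill to the grid edge."""
    nrows, ncols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    done = [[False] * ncols for _ in range(nrows)]
    heap = []
    for r in range(nrows):
        for c in range(ncols):
            if r in (0, nrows - 1) or c in (0, ncols - 1):
                heapq.heappush(heap, (dem[r][c], r, c))
                done[r][c] = True
    while heap:
        z, r, c = heapq.heappop(heap)  # lowest unprocessed spill level
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrows and 0 <= nc < ncols and not done[nr][nc]:
                done[nr][nc] = True
                filled[nr][nc] = max(dem[nr][nc], z)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled

def candidate_natural_sinks(dem, min_area_km2, cell_area_km2=1.0):
    """Group cells raised by filling into 8-connected sinks and keep those
    whose surface area is greater than min_area_km2."""
    filled = fill_depressions(dem)
    nrows, ncols = len(dem), len(dem[0])
    in_sink = [[filled[r][c] > dem[r][c] for c in range(ncols)]
               for r in range(nrows)]
    seen = [[False] * ncols for _ in range(nrows)]
    sinks = []
    for r in range(nrows):
        for c in range(ncols):
            if not in_sink[r][c] or seen[r][c]:
                continue
            cells, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first search over one sink region
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for dr, dc in NEIGHBORS:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < nrows and 0 <= nc < ncols
                            and in_sink[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(cells) * cell_area_km2 > min_area_km2:
                sinks.append(sorted(cells))
    return sinks
```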
3.1.3. Filling the DEM
To allow filling of the DEM using standard GIS techniques while still
maintaining the sinks identified in 3.1.2, the identified sinks are
"seeded" by placing a NODATA point at the bottom of each sink. Since the
standard GIS implementation of the hydrologic filling technique allows
flow only off the edge of the DEM or into NODATA points, this procedure
"tricks" the GIS into letting water flow into the sink. All spurious
sinks, those not identified as potential natural features in 3.1.2, are
removed.
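The seeding trick can be illustrated with a priority-flood depression fill modified so that NODATA cells act as outlets, just like the grid edge. In this hypothetical sketch `None` stands in for NODATA and `fill_with_seeds` is an illustrative name, not a GIS function; a seeded depression keeps its original elevations, while an unseeded pit is filled.

```python
import heapq

NODATA = None  # stands in for the GIS NODATA value in this sketch
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def fill_with_seeds(dem, seeds):
    """Fill depressions, treating the grid edge and NODATA cells as outlets.

    `seeds` lists (row, col) cells, such as the bottom of each sink kept as
    a natural feature, that are set to NODATA before filling.
    """
    nrows, ncols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    for r, c in seeds:
        filled[r][c] = NODATA
    done = [[filled[r][c] is NODATA for c in range(ncols)]
            for r in range(nrows)]
    heap = []
    # Outlets: data cells on the grid edge or next to a NODATA seed.
    for r in range(nrows):
        for c in range(ncols):
            if done[r][c]:
                continue
            on_edge = r in (0, nrows - 1) or c in (0, ncols - 1)
            touches_nodata = any(
                0 <= r + dr < nrows and 0 <= c + dc < ncols
                and filled[r + dr][c + dc] is NODATA
                for dr, dc in NEIGHBORS)
            if on_edge or touches_nodata:
                heapq.heappush(heap, (filled[r][c], r, c))
                done[r][c] = True
    # Grow inward from the outlets, raising cells only where water is trapped.
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrows and 0 <= nc < ncols and not done[nr][nc]:
                done[nr][nc] = True
                filled[nr][nc] = max(filled[nr][nc], z)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled
```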
3.1.4. Verification of the DEM
Following filling of the DEM, initial streamline and basin data sets are
generated for use in verifying the DEM: flow direction and flow
accumulation grids are generated, and the vector stream lines and basin
boundaries are produced from them. The streamlines and basins thus
derived are compared against existing digital data. In most cases, the
Digital Chart of the World (DCW) drainage cover was used for comparison
(Defense Mapping Agency, 1992; Danko, 1992); however, all available map
sources were used. Comparison of the generated streamlines with mapped
hydrography allows identification of essentially two types of errors in
the DEM:
(1) Errors of omission or inclusion of natural sink features.
Examination of mapped hydrography often shows whether or not the
first-pass identification of the natural sink features was adequate. In
the case of an error of omission, the newly identified sink feature is
"seeded" in the DEM; in the case of an error of inclusion, the "seeded"
sink is removed ("unseeded").
(2) Errors in the DEM that prevent proper flow across its surface.
These errors can be caused by the DEM generation or resampling
techniques, or simply by the 1-km horizontal or 1-m vertical resolution
of the DEM. Comparison with mapped hydrography identifies locations where
the generated streamlines or basin boundaries deviate from the mapped
features. If the discrepancy proves to be caused by the DEM, the DEM is
edited to guarantee that flow progresses in the required direction.
These types of DEM edits usually involve only small changes in the
elevation of one or two pixels.
The procedures in 3.1.3 and 3.1.4 are repeated until the DEM produces
streamlines and basins that adequately match mapped hydrography.
3.2. Generation of Derivative Raster Data Sets
Following generation of the hydrologically correct DEM, the final
versions of the additional derivative data layers are produced.
Along with the hydrologically correct DEM, the following five raster
data layers are developed using standard GIS techniques. All derivative
raster data layers were produced using ARC/INFO’s GRID module
(ESRI, 1992).
3.2.1. Aspect
The aspect data set describes the direction of the maximum rate of
change in elevation between each cell and its eight neighbors; it can
essentially be thought of as the slope direction. It is expressed in
positive integer degrees from 0 to 360, measured clockwise from north.
Cells of zero slope (flat areas) are assigned an aspect value of -1.
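A per-cell aspect calculation might look like the sketch below. The text does not specify the finite-difference scheme, so this assumes Horn's 3×3 weighted differences, which GIS packages commonly use; rounding to integer degrees is also an assumption. The window's first row is taken to be the northern row.

```python
import math

def aspect_degrees(window):
    """Aspect of the center cell of a 3x3 elevation window.

    window[0] is the northern row; columns run west to east.
    Returns integer degrees clockwise from north, or -1 for a flat cell.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    # Horn-style weighted differences; the cell size cancels for direction.
    rise_east = ((c + 2 * f + i) - (a + 2 * d + g)) / 8.0
    rise_north = ((a + 2 * b + c) - (g + 2 * h + i)) / 8.0
    if rise_east == 0 and rise_north == 0:
        return -1
    # Downslope points opposite the gradient; bearing = atan2(east, north).
    bearing = math.degrees(math.atan2(-rise_east, -rise_north))
    return int(round(bearing)) % 360
```

For example, a window that drops toward the east, `[[3, 2, 1], [3, 2, 1], [3, 2, 1]]`, has an aspect of 90.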
3.2.2. Flow Directions
The flow direction data layer defines the direction of flow from each
cell in the DEM to its steepest down-slope neighbor. Values of flow
direction vary from 1 to 255. Defined flow directions follow the
convention adopted by ARC/INFO's flow direction implementation, which
codes the eight neighbors as 1 (east), 2 (southeast), 4 (south),
8 (southwest), 16 (west), 32 (northwest), 64 (north), and 128
(northeast). Cells with an undefined direction of flow represent sinks;
their values are the sum of the codes of the candidate flow directions.
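The D8 coding can be sketched for a single 3×3 window as below. It assumes that drops to diagonal neighbors are divided by √2 times the cell size; ties for the steepest drop are summed, mirroring the combined values described for undefined cells, but ARC/INFO's additional handling of flat areas and sinks is not reproduced. `d8_flow_direction` is an illustrative name.

```python
import math

# ARC/INFO flow direction codes keyed by (row offset, column offset);
# row offset -1 is north and column offset +1 is east.
D8_CODES = {(0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
            (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128}

def d8_flow_direction(window, cellsize=1000.0):
    """Code of the steepest-descent neighbor of the center of a 3x3 window."""
    center = window[1][1]
    best_drop, codes = None, []
    for (dr, dc), code in D8_CODES.items():
        distance = cellsize * (math.sqrt(2.0) if dr and dc else 1.0)
        drop = (center - window[1 + dr][1 + dc]) / distance
        if best_drop is None or drop > best_drop:
            best_drop, codes = drop, [code]
        elif drop == best_drop:
            codes.append(code)  # tie: combine the codes by summing them
    return sum(codes)
```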
3.2.3. Flow Accumulations
The flow accumulation data layer defines the amount of upstream
area draining into each cell. It is essentially a measure of the
upstream catchment area. The flow direction layer is used to define
which cells flow into the target cell. Since the cell size of the
HYDRO1k data set is 1 km, the flow accumulation value translates
directly into drainage areas in square kilometers. Values range
from 0 at topographic highs to very large numbers (on the order
of millions of cells) at the mouths of large rivers.
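Given a grid of single-direction codes, flow accumulation can be computed by visiting cells in upstream-to-downstream order. The sketch below is an illustrative topological traversal, not GRID's implementation: it counts upstream cells, so topographic highs get 0, and with 1 km cells the count reads directly as km². Summed (undefined) codes and flow off the grid edge are treated as having no downstream cell.

```python
from collections import deque

# Row/column offset of the receiving cell for each single-direction code.
OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
           16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}

def flow_accumulation(fdir):
    """Number of upstream cells draining into each cell of a D8 grid."""
    nrows, ncols = len(fdir), len(fdir[0])

    def receiver(r, c):
        offset = OFFSETS.get(fdir[r][c])
        if offset is None:  # summed/undefined code: no single receiver
            return None
        nr, nc = r + offset[0], c + offset[1]
        if 0 <= nr < nrows and 0 <= nc < ncols:
            return nr, nc
        return None  # flows off the grid edge

    # Count how many cells drain directly into each cell.
    inflow = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            target = receiver(r, c)
            if target is not None:
                inflow[target[0]][target[1]] += 1

    # Start from cells nothing drains into and pass totals downstream.
    acc = [[0] * ncols for _ in range(nrows)]
    ready = deque((r, c) for r in range(nrows) for c in range(ncols)
                  if inflow[r][c] == 0)
    while ready:
        r, c = ready.popleft()
        target = receiver(r, c)
        if target is not None:
            tr, tc = target
            acc[tr][tc] += acc[r][c] + 1  # this cell plus everything above it
            inflow[tr][tc] -= 1
            if inflow[tr][tc] == 0:
                ready.append(target)
    return acc
```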
3.2.4. Slope
The slope data layer describes the maximum rate of change in elevation
between each cell and its eight neighbors. Slope is expressed in integer
degrees between 0 and 90.
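Assuming Horn's 3×3 weighted-difference scheme (the text does not state the method) and rounding to whole degrees, slope for one window of 1000 m cells could be computed as follows; `slope_degrees` is an illustrative name.

```python
import math

def slope_degrees(window, cellsize=1000.0):
    """Slope of the center cell of a 3x3 elevation window, in integer degrees."""
    (a, b, c), (d, _, f), (g, h, i) = window
    # Horn-style weighted elevation differences per meter in x and y.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cellsize)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cellsize)
    rise_over_run = math.hypot(dz_dx, dz_dy)
    return int(round(math.degrees(math.atan(rise_over_run))))
```

A plane rising 1000 m per 1000 m cell gives 45 degrees, while a plane rising 10 m per cell (about 0.57 degrees) rounds to 1.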
3.2.5. Compound Topographic Index
The Compound Topographic Index (CTI), commonly referred to as the
Wetness Index, is a function of the upstream contributing area and the
slope of the landscape. The implementation used in the HYDRO1k data set
is based on Moore et al. (1991). The CTI is calculated from the flow
accumulation (FA) layer and the slope as:
CTI = ln(FA / tan(slope))
Where the slope is zero, a CTI value is obtained by substituting a slope
of 0.001. This value is smaller than the smallest slope obtainable from
a 1000 m data set with a 1 m vertical resolution.
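The CTI formula translates directly to code. This sketch makes two assumptions the text leaves open: the 0.001 substitute is taken to be in degrees, like the slope layer, and cells with FA = 0 (where the logarithm is undefined) return `None`, because the text does not say how HYDRO1k handles them. `cti` is an illustrative name.

```python
import math

def cti(flow_acc, slope_deg, flat_slope=0.001):
    """CTI = ln(FA / tan(slope)) for one cell.

    Zero slopes are replaced by flat_slope (assumed to be in degrees).
    Returns None where FA is 0 and the logarithm is undefined.
    """
    if flow_acc <= 0:
        return None
    slope = slope_deg if slope_deg > 0 else flat_slope
    return math.log(flow_acc / math.tan(math.radians(slope)))
```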
3.3. Generation of Derivative Vector Data Sets
The stream line and basin data in the HYDRO1k data set are distributed
as vector layers.
3.3.1. Drainage Basin Boundaries
The drainage basins distributed with the HYDRO1k data set are derived
using the vector streamlines along with the flow direction layer.
The basins are seeded following procedures first articulated by
Otto Pfafstetter, a Brazilian engineer, and adapted for use in the
HYDRO1k data set (Verdin,
1997). Each polygon in the basin data set has been tagged with
a Pfafstetter code uniquely identifying each sub-basin. The six-digit
Pfafstetter code assigned to each basin carries basin linkage information.
This permits determination of basin interconnectedness through simple
examination of the Pfafstetter code.
The drainage basin polygons carry the following attributes:
Level1 to Level6 = Pfafstetter units of each polygon
Slope_mean = Mean value of the slopes within the subbasin (degrees)
Slope_stdev = Standard deviation of the slopes within the subbasin (degrees)
Aspect_mean = Mean value of the aspects within the subbasin (degrees from north)
Aspect_stdev = Standard deviation of the aspects within the subbasin (degrees from north)
Dem_mean = Mean elevation within the subbasin (m)
Dem_stdev = Standard deviation of the elevations within the subbasin (m)
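The linkage carried by the Pfafstetter code can be queried without any geometry. The text does not spell out the rule, so the sketch below uses the commonly described Pfafstetter test: basin B lies downstream of basin A if, at the first level where their codes differ, B's digit is smaller and B's digits from that level on are all odd (main-stem interbasins). Treat this as an assumption to check against Verdin (1997). Codes are assumed to be the Level1 to Level6 digits concatenated into a string.

```python
def is_downstream(candidate, reference):
    """True if basin `candidate` lies on the flow path below `reference`.

    Both arguments are equal-length Pfafstetter codes given as digit strings.
    """
    if len(candidate) != len(reference):
        raise ValueError("codes must have the same number of levels")
    for n, (cand_digit, ref_digit) in enumerate(zip(candidate, reference)):
        if cand_digit != ref_digit:
            # At the first differing level the candidate must be a lower unit,
            # and every remaining digit must be odd (a main-stem interbasin).
            return (int(cand_digit) < int(ref_digit)
                    and all(int(ch) % 2 == 1 for ch in candidate[n:]))
    return False  # identical codes
```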
3.3.2. Stream Lines
The stream line data layer distributed with the HYDRO1k data set is
derived from the flow accumulation and flow direction layers. Cells with
upstream drainage areas greater than 1000 km² are selected from the flow
accumulation layer and processed through the STREAMLINK function. The
resulting links are attributed with the maximum flow accumulation
occurring within each link, and the result is vectorized using the
STREAMLINE function. These procedures yield a vector data layer of
streamlines in which each stream segment is attributed with its upstream
contributing drainage area.
The vector streamlines are attributed with the following fields:
Flowacc = The maximum flow accumulation value of the stream segment. This value corresponds directly to the upstream contributing watershed area. (10⁻³ km²)
Pf_type = The Pfafstetter level at which the stream segment is considered "main stem".
Level1 to Level6 = The Pfafstetter units in which the stream segment lies.
Frmelevation = The elevation of the stream segment's from-node (m)
Toelevation = The elevation of the stream segment's to-node (m)
Strorder = Strahler stream order of the segment
Gradient = Gradient of the stream segment, calculated as the difference between the from-node and to-node elevations divided by the length of the segment
Frmup_flowlen = The upstream flow length from the from-node. Calculated using ARC/INFO's FLOWLENGTH function, it is the longest path from the from-node to the drainage basin divide. (m)
Toup_flowlen = The upstream flow length from the to-node. (m)
Frmdn_flowlen = The downstream flow length from the from-node. Also calculated with ARC/INFO's FLOWLENGTH function, it is the length of the path from the from-node to the ocean or a terminal sink. (m)
Todn_flowlen = The downstream flow length from the to-node. (m)
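Two of the stream attributes can be reproduced from simpler inputs. The sketch computes Strahler order over a segment network (headwater segments are order 1, and order increases only where two or more streams of the highest incoming order meet) and the Gradient attribute as defined in the list of fields. The input structure, a mapping from each segment to the segment it flows into, is a hypothetical representation, not the layout of the export file.

```python
from collections import deque

def strahler_orders(downstream):
    """Strahler order of every segment in a stream network.

    `downstream` maps each segment id to the id of the segment it flows
    into, or None at an outlet; every target must also be a key.
    """
    upstream_orders = {seg: [] for seg in downstream}
    pending = {seg: 0 for seg in downstream}
    for seg, target in downstream.items():
        if target is not None:
            pending[target] += 1
    queue = deque(seg for seg, count in pending.items() if count == 0)
    order = {}
    while queue:  # process a segment once all its tributaries are known
        seg = queue.popleft()
        ups = upstream_orders[seg]
        if not ups:
            order[seg] = 1  # headwater segment
        else:
            top = max(ups)
            order[seg] = top + 1 if ups.count(top) >= 2 else top
        target = downstream[seg]
        if target is not None:
            upstream_orders[target].append(order[seg])
            pending[target] -= 1
            if pending[target] == 0:
                queue.append(target)
    return order

def segment_gradient(frm_elevation, to_elevation, length_m):
    """Gradient: from-node minus to-node elevation over segment length."""
    return (frm_elevation - to_elevation) / length_m
```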
4.0. Data Formats
4.1. Vector Data Formats
The vector data sets (stream lines and basins) distributed with HYDRO1k
are made available in ARC/INFO Export format (.E00 extension).
4.2. Raster Data Formats
The six raster data layers for each continent are distributed as simple
binary raster data. Each raster data layer is provided as four files,
with the extension of each file defining the file type:
| File Extension | File Type |
|---|---|
| .bil | Raster Data File |
| .hdr | Header File |
| .blw | World File |
| .stx | Statistics File |
4.2.1. Image File (.bil)
The raster data for each layer are provided as signed integer data in a
simple binary raster format. All data layers are 16-bit, with the
exception of the flow accumulation layer, which, because of the range of
values needed, is 32-bit. There are no header or trailer bytes embedded
in the image. The data are stored in row-major order (all the data for
row 1, followed by all the data for row 2, etc.).
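Reading the image file needs only the row and column counts from the header. A minimal sketch with Python's `struct` module, following the layout described above (signed, big-endian, row-major, no header bytes); `read_bil` is an illustrative name.

```python
import struct

def read_bil(raw, nrows, ncols, nbits=16):
    """Decode a single-band .bil buffer into a list of rows.

    Values are signed big-endian ("Motorola") integers stored row by row:
    16-bit for most layers, 32-bit for flow accumulation.
    """
    code = {16: "h", 32: "i"}[nbits]
    count = nrows * ncols
    expected = count * nbits // 8
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    values = struct.unpack(f">{count}{code}", raw)
    return [list(values[r * ncols:(r + 1) * ncols]) for r in range(nrows)]
```

For full continental grids, NumPy is far faster, e.g. `numpy.fromfile(path, dtype=">i2").reshape(nrows, ncols)` (with `">i4"` for flow accumulation); the pure-Python version only makes the byte layout explicit.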
4.2.2. Header File (.hdr)
The raster data header file is an ASCII text file containing size
and coordinate information for the layer. Many standard software
packages require the .hdr file to provide important geo-referencing
information for the image. The following keywords are used in the
header file:
| Keyword | Description |
|---|---|
| BYTEORDER: | Byte order in which image pixel values are stored; M = Motorola byte order (most significant byte first) |
| LAYOUT: | Organization of the bands in the file; BIL = band interleaved by line (note: the raster layers are all single-band images) |
| NROWS: | Number of rows in the image |
| NCOLS: | Number of columns in the image |
| NBANDS: | Number of spectral bands in the image (1) |
| NBITS: | Number of bits per pixel (16 or 32) |
| BANDROWBYTES: | Number of bytes per band per row (twice the number of columns for a 16-bit image; four times for the 32-bit image) |
| TOTALROWBYTES: | Total number of bytes of data per row (twice the number of columns for a single-band 16-bit image; four times for the 32-bit image) |
| BANDGAPBYTES: | Number of bytes between bands in a BSQ format image (0) |
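A header parser can also check the row-size keywords for consistency. This sketch assumes each line holds a keyword followed by its value, separated by whitespace (the exact file syntax is not shown in this document), and tolerates a trailing colon on the keyword; `parse_hdr` and `check_hdr` are illustrative names.

```python
def parse_hdr(text):
    """Parse 'KEYWORD value' lines into a dict; integer values become ints."""
    header = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            key, value = parts[0].rstrip(":").upper(), parts[1]
            header[key] = int(value) if value.lstrip("-").isdigit() else value
    return header

def check_hdr(header):
    """Check BANDROWBYTES and TOTALROWBYTES against NCOLS, NBITS and NBANDS."""
    row_bytes = header["NCOLS"] * header["NBITS"] // 8
    return (header["BANDROWBYTES"] == row_bytes
            and header["TOTALROWBYTES"] == row_bytes * header["NBANDS"])
```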
4.2.3. World File (.blw)
The world file is an ASCII text file containing coordinate information.
It is used by some packages for geo-referencing of image data.
| Term | Description |
|---|---|
| XDIM: | X-dimension of a pixel (1000) |
| Rotation term: | Always zero |
| Rotation term: | Always zero |
| Negative YDIM: | Negative Y-dimension of a pixel (-1000) |
| XMIN: | X-location of the center of the upper-left pixel (projected meters) |
| YMAX: | Y-location of the center of the upper-left pixel (projected meters) |
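The six world-file terms define an affine mapping from pixel position to projected coordinates. A small sketch, assuming 0-based row and column indices counted from the upper-left pixel and the term order listed above; the function names are illustrative.

```python
def read_world_file(text):
    """Return the six terms in file order:
    XDIM, rotation, rotation, negative YDIM, XMIN, YMAX."""
    terms = [float(tok) for tok in text.split()]
    if len(terms) != 6:
        raise ValueError("a world file has exactly six terms")
    return terms

def pixel_center(row, col, terms):
    """Projected coordinates (m) of the center of pixel (row, col)."""
    xdim, rot_y, rot_x, neg_ydim, xmin, ymax = terms
    x = xmin + col * xdim + row * rot_x
    y = ymax + col * rot_y + row * neg_ydim  # neg_ydim < 0: rows run south
    return x, y
```

Because both rotation terms are always zero for HYDRO1k, this reduces to `x = XMIN + col * 1000` and `y = YMAX - row * 1000`.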
4.2.4. Statistics File (.stx)
The statistics file is an ASCII text file that lists the band number,
minimum value, maximum value, mean value, and standard deviation
of the values in the raster data file.
5.0. Data Distribution
HYDRO1k data for each continent are distributed electronically as tar
files. The data files are identified by a two-letter continental
identifier according to the following scheme:
| Two-letter Identifier | Continent |
|---|---|
| AF | Africa |
| AS | Asia |
| AU | Australasia |
| EU | Europe |
| NA | North America |
| SA | South America |
Users have the option of obtaining the entire HYDRO1k data set for
a continent (all eight data layers) or selectively choosing layers for
download. In either case, the data are distributed as tar files. In
the case of raster data sets, the .bil files have been compressed with
the gzip function before creation of the tar file. The vector data export
files have been compressed (gzipped) as well prior to creation of the
tar file. As an example of the naming convention used, the North American
data sets that are available are:
| File | Contents |
|---|---|
| Na.tar | A tar file containing all the North American data layers along with README |
| Na_asp.tar | Tar file containing the aspect data layer (compressed bil file, three ancillary files and README) |
| Na_bas.tar | Vector basin data layer in compressed ARC/INFO Export format along with README |
| Na_cti.tar | Tar file with CTI data layer (compressed bil file, three ancillary files and README) |
| Na_dem.tar | Tar file with DEM data layer (compressed bil file, three ancillary files and README) |
| Na_fd.tar | Tar file with flow direction data layer (compressed bil file, three ancillary files and README) |
| Na_fa.tar | Tar file with flow accumulation data layer (compressed bil file, three ancillary files and README) |
| Na_slope.tar | Tar file with slope data layer (compressed bil file, three ancillary files and README) |
| Na_str.tar | Vector streams data layer in compressed ARC/INFO Export format along with README |
As well as being available via a web page interface, the HYDRO1k data
sets are available electronically through an Internet anonymous File
Transfer Protocol (FTP) account at the EROS Data Center (at no cost).
To access this account:
- 1. FTP to edcftp.cr.usgs.gov
- 2. Enter anonymous at the Name prompt.
- 3. Enter your email address at the Password prompt.
- 4. Change to the /pub/data/gtopo30hydro subdirectory
- 5. Enter binary to set the transfer type.
- 6. Use get or mget to retrieve the desired files.
6.0. Notes and Hints for HYDRO1k Users
To use the HYDRO1k data files, the individual data files must first
be extracted from the tar files. Within the tar files, the image data
files (.bil) are compressed. These files, along with the compressed
vector export files, must be uncompressed. If you do not have the gzip
and tar utilities, they can be obtained from the following locations:
- Unix gzip:
  - ftp://prep.ai.mit.edu/pub/gnu
  - ftp://wuarchive.wustl.edu/systems/gnu
- Macintosh gzip and tar:
  - ftp://mirrors.aol.com/pub/mac/util/compression
    - macgzip0.3b2.sit.hqx
    - suntar2.03.cpt.hqx
- DOS gzip and tar:
  - ftp://prep.ai.mit.edu/pub/gnu
    - gzip-1.2.4.tar
  - ftp://ftp.uu.net/systems/ibmpc/msdos/pcroute
    - tar.exe
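On current systems the separate gzip and tar utilities are not strictly required: Python's standard `tarfile` and `gzip` modules can do both steps. This sketch assumes the compressed members carry a `.gz` suffix (for example `na_dem.bil.gz`); the function name and file names are hypothetical.

```python
import gzip
import shutil
import tarfile
from pathlib import Path

def unpack_hydro1k(tar_path, out_dir):
    """Extract a HYDRO1k tar file, then gunzip every .gz member in place.

    Returns the paths of the decompressed files.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        # Use the safe extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out, filter="data")
        else:
            tar.extractall(out)
    written = []
    for gz in sorted(out.rglob("*.gz")):
        target = gz.with_suffix("")  # "na_dem.bil.gz" -> "na_dem.bil"
        with gzip.open(gz, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz.unlink()
        written.append(target)
    return written
```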
Because the image (.bil) data are stored in a 16-bit (32-bit for flow
accumulation) binary format, users must be aware of how bytes are
addressed on their computers. The data are provided in Motorola byte
order, which stores the most significant byte first ("big endian").
Systems such as Sun SPARC and Silicon Graphics workstations use the
Motorola byte order. The Intel byte order, which stores the least
significant byte first ("little endian"), is used on DEC Alpha systems
and most PCs. Users with systems that address bytes in the Intel byte
order may have to "swap bytes" of the BIL data unless their application
software performs the conversion during ingest. The statistics file
(.stx) provided for each data set gives the range of values in the image
file, so that users can check whether the values stored on their system
are correct.
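The .stx check can be automated by decoding the image bytes in both byte orders and keeping whichever reproduces the recorded minimum and maximum. This sketch assumes the statistics file holds the five values described for it (band, minimum, maximum, mean, standard deviation) as whitespace-separated fields on one line; the function names are illustrative.

```python
import struct

def parse_stx(text):
    """Split a .stx line into band, minimum, maximum, mean, std deviation."""
    band, vmin, vmax, mean, std = text.split()[:5]
    return int(band), float(vmin), float(vmax), float(mean), float(std)

def pick_byte_order(raw, stx_text, nbits=16):
    """Return '>' (Motorola) or '<' (Intel), whichever decoding reproduces
    the .stx minimum and maximum, or None if neither does."""
    _, vmin, vmax, _, _ = parse_stx(stx_text)
    code = {16: "h", 32: "i"}[nbits]
    count = len(raw) // (nbits // 8)
    for order in (">", "<"):
        values = struct.unpack(f"{order}{count}{code}", raw)
        if min(values) == vmin and max(values) == vmax:
            return order
    return None
```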
Users of ARC/INFO or ArcView can display the image data directly.
However, if a user needs access to the actual pixel values for analysis
in ARC/INFO, the image must be converted to an ARC/INFO grid with the
command IMAGEGRID. IMAGEGRID does not support conversion of signed image
data; therefore, negative 16-bit image values will not be interpreted
correctly. After running IMAGEGRID, an easy fix can be accomplished using
the following formula in GRID:
out_grid = con(in_grid >= 32768, in_grid - 65536, in_grid)
The converted grid will then have the negative values properly
represented, and the statistics of the grid should match those listed in
the .stx file. If desired, the -9999 ocean mask values in the grid can
then be set to NODATA with the SETNULL function.
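The GRID expression reinterprets an unsigned 16-bit value as a two's-complement signed value. The same arithmetic in Python, with a cross-check against reading the bytes directly as signed; `to_signed16` is an illustrative name.

```python
import struct

def to_signed16(value):
    """Equivalent of con(in_grid >= 32768, in_grid - 65536, in_grid)
    for one cell that was read as an unsigned 16-bit integer."""
    return value - 65536 if value >= 32768 else value

raw = struct.pack(">H", 55537)  # two bytes that IMAGEGRID reads as 55537
print(to_signed16(55537), struct.unpack(">h", raw)[0])  # -9999 -9999
```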
7.0. Summary
The HYDRO1k data set provides many of the derivative products useful
in earth science applications. The hydrologically correct DEM and ancillary
data layers are useful in studies of earth systems including watershed
analysis, landform studies and global change scenarios. Development
of a standard set of data layers minimizes duplication of effort and
will provide consistent global coverage.
8.0. References
Danielson, J.J., 1996. Delineation of drainage basins from 1 km African digital elevation data. In: Pecora Thirteen, Human Interactions with the Environment: Perspectives from Space, Sioux Falls, South Dakota, August 20-22, 1996.
Danko, D.M., 1992. The digital chart of the world. GeoInfo Systems, 2:29-36.
Defense Mapping Agency, 1992. Development of the Digital Chart of the World. U.S. Government Printing Office, Washington, D.C.
ESRI, 1992. Cell Based Modeling with GRID. ESRI, Inc., Redlands, California.
Moore, I.D., Grayson, R.B., and Ladson, A.R., 1991. Digital terrain modelling: a review of hydrological, geomorphological and biological applications. Hydrological Processes: An International Journal, January-March 1991, pp. 3-30.
Steinwand, D.R., Hutchinson, J.A., and Snyder, J.P., 1995. Map projections for global and continental data sets and an analysis of pixel distortion caused by reprojection. Photogrammetric Engineering and Remote Sensing, v. 61, p. 1,487-1,497.
Verdin, K.L., and Greenlee, S.K., 1996. Development of continental scale digital elevation models and extraction of hydrographic features. In: Proceedings, Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, New Mexico, January 21-26, 1996. National Center for Geographic Information and Analysis, Santa Barbara, California.
Verdin, K.L., 1997. A system for topologically coding global drainage basins and stream networks. In: Proceedings, 17th Annual ESRI Users Conference, San Diego, California, July 1997.
9.0. Disclaimers
Any use of trade, product, or firm names is for descriptive purposes
only and does not imply endorsement by the U.S. Government. Please note
that some U.S. Geological Survey (USGS) information contained in this
data set and documentation may be preliminary in nature and presented
prior to final review and approval by the Director of the USGS. This
information is provided with the understanding that it is not guaranteed
to be correct or complete and conclusions drawn from such information
are the sole responsibility of the user.