Python interface to the netCDF version 4 library. netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5. This module can read and write files in both the new netCDF 4 and the old netCDF 3 format, and can create files that are readable by HDF5 clients. The API is modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.
Most new features of netCDF 4 are implemented, such as multiple unlimited dimensions, groups and zlib data compression. All the new numeric data types (such as 64 bit and unsigned integer types) are implemented. Compound and variable length (vlen) data types are supported, but the enum and opaque data types are not. Mixtures of compound and vlen data types (compound types containing vlens, and vlens containing compound types) are not supported.
To build and install from source:

1. Install the HDF5 C library, making sure it is configured with '--enable-hl --enable-shared'.

2. Install the netCDF version 4 C library, configured with '--enable-netcdf-4 --enable-shared', and set CPPFLAGS="-I $HDF5_DIR/include" and LDFLAGS="-L $HDF5_DIR/lib", where $HDF5_DIR is the directory where HDF5 was installed. If you want OPeNDAP support, add '--enable-dap'. If you want HDF4 SD support, add '--enable-hdf4' and add the location of the HDF4 headers and library to CPPFLAGS and LDFLAGS.

3. Set the HDF5_DIR environment variable to point to where HDF5 is installed (the libs in $HDF5_DIR/lib, the headers in $HDF5_DIR/include). If the headers and libs are installed in different places, you can use HDF5_INCDIR and HDF5_LIBDIR to define the locations of the headers and libraries independently.

4. Set the NETCDF4_DIR (or NETCDF4_INCDIR and NETCDF4_LIBDIR) environment variable(s) to point to where the netCDF version 4 library and headers are installed.

5. If necessary, set the SZIP_DIR (or SZIP_INCDIR and SZIP_LIBDIR) environment variable(s) to point to where szip is installed. Note that the netCDF library does not support creating szip compressed files, but can read szip compressed files if the HDF5 lib is configured to support szip.

6. Run python setup.py build, then python setup.py install (as root if necessary).

7. If you would rather not use environment variables to locate the libraries, you can edit setup.cfg to specify them. To use this method, copy the file setup.cfg.template to setup.cfg, then open setup.cfg in a text editor and follow the instructions in the comments for editing. If you use setup.cfg, environment variables will be ignored.

8. To run the unit tests, execute python run_all.py in the test directory.
To create a netCDF file from python, you simply call the Dataset
constructor. This is also the method used to open an existing netCDF
file. If the file is open for write access (w, r+
or
a
), you may write any type of data including new
dimensions, groups, variables and attributes. netCDF files come in
several flavors (NETCDF3_CLASSIC, NETCDF3_64BIT,
NETCDF4_CLASSIC
, and NETCDF4
). The first two
flavors are supported by version 3 of the netCDF library.
NETCDF4_CLASSIC
files use the version 4 disk format
(HDF5), but do not use any features not found in the version 3 API.
They can be read by netCDF 3 clients only if they have been relinked
against the netCDF 4 library. They can also be read by HDF5 clients.
NETCDF4
files use the version 4 disk format (HDF5) and
use the new features of the version 4 API. The netCDF4
module can read and write files in any of these formats. When
creating a new file, the format may be specified using the
format
keyword in the Dataset
constructor.
The default format is NETCDF4
. To see how a given file
is formatted, you can examine the file_format
Dataset attribute.
Closing the netCDF file is accomplished via the close method
of the Dataset
instance.
Here's an example:
>>> from netCDF4 import Dataset
>>> rootgrp = Dataset('test.nc', 'w', format='NETCDF4')
>>> print rootgrp.file_format
NETCDF4
>>> rootgrp.close()
Remote OPeNDAP-hosted datasets can be accessed for reading
over http if a URL is provided to the Dataset
constructor instead of a filename. However, this requires that the
netCDF library be built with OPeNDAP support, via the
--enable-dap
configure option (added in version
4.0.1).
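For example, a remote dataset can be opened for reading exactly like a local file (the URL below is only a placeholder; substitute the address of a real OPeNDAP dataset):
>>> dap = Dataset('http://example.com/opendap/some_dataset.nc')
>>> dap.close()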
netCDF version 4 added support for organizing data in hierarchical groups, which are analogous to directories in a filesystem. Groups serve as containers for variables, dimensions and attributes, as well as other groups. A netCDF4.Dataset creates a special group, called the 'root group', which is similar to the root directory in a unix filesystem. To create Group instances, use
the createGroup method of a Dataset or Group instance. createGroup takes a single argument, a python string
containing the name of the new group. The new Group instances
contained within the root group can be accessed by name using the
groups
dictionary attribute of the Dataset instance.
Only NETCDF4 formatted files support Groups; if you try to create a Group in a netCDF 3 file, you will get an error message.
>>> rootgrp = Dataset('test.nc', 'a')
>>> fcstgrp = rootgrp.createGroup('forecasts')
>>> analgrp = rootgrp.createGroup('analyses')
>>> print rootgrp.groups
OrderedDict([('forecasts', <netCDF4.Group object at 0x1b4b7b0>),
             ('analyses', <netCDF4.Group object at 0x1b4b970>)])
Groups can exist within groups in a Dataset, just as
directories exist within directories in a unix filesystem. Each Group instance has a
'groups'
attribute dictionary containing all of the
group instances contained within that group. Each Group instance also
has a 'path'
attribute that contains a simulated unix
directory path to that group.
Here's an example that shows how to navigate all the groups in a
Dataset. The
function walktree
is a Python generator that is used to
walk the directory tree. Note that printing the Dataset or Group object yields
summary information about its contents.
>>> fcstgrp1 = fcstgrp.createGroup('model1')
>>> fcstgrp2 = fcstgrp.createGroup('model2')
>>> def walktree(top):
>>>     values = top.groups.values()
>>>     yield values
>>>     for value in top.groups.values():
>>>         for children in walktree(value):
>>>             yield children
>>> print rootgrp
>>> for children in walktree(rootgrp):
>>>     for child in children:
>>>         print child
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions:
    variables:
    groups: forecasts, analyses
<type 'netCDF4.Group'>
group /forecasts:
    dimensions:
    variables:
    groups: model1, model2
<type 'netCDF4.Group'>
group /analyses:
    dimensions:
    variables:
    groups:
<type 'netCDF4.Group'>
group /forecasts/model1:
    dimensions:
    variables:
    groups:
<type 'netCDF4.Group'>
group /forecasts/model2:
    dimensions:
    variables:
    groups:
netCDF defines the sizes of all variables in terms of dimensions,
so before any variables can be created the dimensions they use must
be created first. A special case, not often used in practice, is that
of a scalar variable, which has no dimensions. A dimension is created
using the createDimension method of a Dataset or Group instance. A
Python string is used to set the name of the dimension, and an
integer value is used to set the size. To create an unlimited
dimension (a dimension that can be appended to), the size value is
set to None
or 0. In this example, both the time and level dimensions are unlimited. Having more than one unlimited dimension is a new netCDF 4 feature; in netCDF 3 files there may be only one, and it must be the first (leftmost) dimension of the variable.
>>> level = rootgrp.createDimension('level', None)
>>> time = rootgrp.createDimension('time', None)
>>> lat = rootgrp.createDimension('lat', 73)
>>> lon = rootgrp.createDimension('lon', 144)
All of the Dimension instances are stored in a python dictionary.
>>> print rootgrp.dimensions
OrderedDict([('level', <netCDF4.Dimension object at 0x1b48030>),
             ('time', <netCDF4.Dimension object at 0x1b481c0>),
             ('lat', <netCDF4.Dimension object at 0x1b480f8>),
             ('lon', <netCDF4.Dimension object at 0x1b48a08>)])
Calling the python len
function with a Dimension
instance returns the current size of that dimension. The isunlimited method of a Dimension
instance can be used to determine if the dimension is unlimited, or appendable.
>>> print len(lon)
144
>>> print lon.isunlimited()
False
>>> print time.isunlimited()
True
Printing the Dimension object provides useful summary info, including the name and length of the dimension, and whether it is unlimited.
>>> for dimobj in rootgrp.dimensions.values():
>>>     print dimobj
<type 'netCDF4.Dimension'> (unlimited): name = 'level', size = 0
<type 'netCDF4.Dimension'> (unlimited): name = 'time', size = 0
<type 'netCDF4.Dimension'>: name = 'lat', size = 73
<type 'netCDF4.Dimension'>: name = 'lon', size = 144
Dimension names can be changed using the renameDimension method of a Dataset or Group instance.
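For example, a hypothetical round-trip rename that leaves the file unchanged:
>>> rootgrp.renameDimension('lat', 'lat_new')
>>> rootgrp.renameDimension('lat_new', 'lat')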
netCDF variables behave much like python multidimensional array
objects supplied by the numpy module. However, unlike numpy arrays, netCDF4
variables can be appended to along one or more 'unlimited'
dimensions. To create a netCDF variable, use the createVariable method of a Dataset or Group instance. The
createVariable method has two mandatory arguments,
the variable name (a Python string), and the variable datatype. The
variable's dimensions are given by a tuple containing the dimension
names (defined previously with createDimension). To create a scalar variable,
simply leave out the dimensions keyword. The variable primitive
datatypes correspond to the dtype attribute of a numpy array. You can
specify the datatype as a numpy dtype object, or anything that can be
converted to a numpy dtype object. Valid datatype specifiers
include: 'f4' (32-bit floating point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer), 'i2' (16-bit signed integer), 'i8' (64-bit signed integer), 'i1' (8-bit signed integer), 'u1' (8-bit unsigned integer), 'u2' (16-bit unsigned integer), 'u4' (32-bit unsigned integer), 'u8' (64-bit unsigned integer), or 'S1' (single-character string). The old Numeric single-character typecodes ('f', 'd', 'h', 's', 'b', 'B', 'c', 'i', 'l'), corresponding to ('f4', 'f8', 'i2', 'i2', 'i1', 'i1', 'S1', 'i4', 'i4'), will also work. The unsigned integer types and the 64-bit integer type can only be used if the file format is NETCDF4.
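As a quick illustration (not part of the original example), a numpy dtype object and its string specifier are interchangeable:
>>> import numpy
>>> print numpy.dtype('f4') == numpy.float32
True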
The dimensions themselves are usually also defined as variables, called coordinate variables. The createVariable method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.
>>> times = rootgrp.createVariable('time','f8',('time',))
>>> levels = rootgrp.createVariable('level','i4',('level',))
>>> latitudes = rootgrp.createVariable('latitude','f4',('lat',))
>>> longitudes = rootgrp.createVariable('longitude','f4',('lon',))
>>> # two dimensions unlimited.
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))
All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the dimensions:
>>> print rootgrp.variables
OrderedDict([('time', <netCDF4.Variable object at 0x1b4ba70>),
             ('level', <netCDF4.Variable object at 0x1b4bab0>),
             ('latitude', <netCDF4.Variable object at 0x1b4baf0>),
             ('longitude', <netCDF4.Variable object at 0x1b4bb30>),
             ('temp', <netCDF4.Variable object at 0x1b4bb70>)])
To get summary info on a Variable instance in an interactive session, just print it.
>>> print rootgrp.variables['temp']
<type 'netCDF4.Variable'>
float32 temp(time, level, lat, lon)
    least_significant_digit: 3
    units: K
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
Variable names can be changed using the renameVariable method of a Dataset instance.
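For example, a hypothetical round-trip rename, so the rest of this tutorial can still refer to temp:
>>> rootgrp.renameVariable('temp', 'temperature')
>>> rootgrp.renameVariable('temperature', 'temp')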
There are two types of attributes in a netCDF file, global and variable. Global attributes provide information about a group, or the entire dataset, as a whole. Variable attributes provide information about one of the variables in a group. Global attributes are set by assigning values to Dataset or Group instance variables. Variable attributes are set by assigning values to Variable instance variables. Attributes can be strings, numbers or sequences. Returning to our example,
>>> import time
>>> rootgrp.description = 'bogus example script'
>>> rootgrp.history = 'Created ' + time.ctime(time.time())
>>> rootgrp.source = 'netCDF4 python module tutorial'
>>> latitudes.units = 'degrees north'
>>> longitudes.units = 'degrees east'
>>> levels.units = 'hPa'
>>> temp.units = 'K'
>>> times.units = 'hours since 0001-01-01 00:00:00.0'
>>> times.calendar = 'gregorian'
The ncattrs method of a Dataset, Group or Variable instance
can be used to retrieve the names of all the netCDF attributes. This
method is provided as a convenience, since using the built-in
dir
Python function will return a bunch of private
methods and attributes that cannot (or should not) be modified by the
user.
>>> for name in rootgrp.ncattrs():
>>>     print 'Global attr', name, '=', getattr(rootgrp,name)
Global attr description = bogus example script
Global attr history = Created Mon Nov  7 10:30:56 2005
Global attr source = netCDF4 python module tutorial
The __dict__
attribute of a Dataset, Group or Variable instance
provides all the netCDF attribute name/value pairs in a python
dictionary:
>>> print rootgrp.__dict__
OrderedDict([(u'description', u'bogus example script'),
             (u'history', u'Created Thu Mar  3 19:30:33 2011'),
             (u'source', u'netCDF4 python module tutorial')])
Attributes can be deleted from a netCDF Dataset, Group or Variable using the python del statement (i.e. del grp.foo removes the attribute foo from the group grp).
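A minimal sketch (the attribute name here is invented for illustration):
>>> rootgrp.scratch = 'to be deleted'
>>> del rootgrp.scratch
>>> print 'scratch' in rootgrp.ncattrs()
False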
Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.
>>> import numpy
>>> lats = numpy.arange(-90,91,2.5)
>>> lons = numpy.arange(-180,180,2.5)
>>> latitudes[:] = lats
>>> longitudes[:] = lons
>>> print 'latitudes =\n',latitudes[:]
latitudes =
[-90.  -87.5 -85.  -82.5 -80.  -77.5 -75.  -72.5 -70.  -67.5 -65.  -62.5
 -60.  -57.5 -55.  -52.5 -50.  -47.5 -45.  -42.5 -40.  -37.5 -35.  -32.5
 -30.  -27.5 -25.  -22.5 -20.  -17.5 -15.  -12.5 -10.   -7.5  -5.   -2.5
   0.    2.5   5.    7.5  10.   12.5  15.   17.5  20.   22.5  25.   27.5
  30.   32.5  35.   37.5  40.   42.5  45.   47.5  50.   52.5  55.   57.5
  60.   62.5  65.   67.5  70.   72.5  75.   77.5  80.   82.5  85.   87.5
  90. ]
Unlike NumPy's array objects, netCDF Variable objects with unlimited dimensions will grow along those dimensions if you assign data outside the currently defined range of indices.
>>> # append along two unlimited dimensions by assigning to slice.
>>> nlats = len(rootgrp.dimensions['lat'])
>>> nlons = len(rootgrp.dimensions['lon'])
>>> print 'temp shape before adding data = ',temp.shape
temp shape before adding data =  (0, 0, 73, 144)
>>> from numpy.random import uniform
>>> temp[0:5,0:10,:,:] = uniform(size=(5,10,nlats,nlons))
>>> print 'temp shape after adding data = ',temp.shape
temp shape after adding data =  (5, 10, 73, 144)
>>> # levels have grown, but no values yet assigned.
>>> print 'levels shape after adding pressure data = ',levels.shape
levels shape after adding pressure data =  (10,)
Note that the size of the levels variable grows when data is
appended along the level
dimension of the variable
temp
, even though no data has yet been assigned to
levels.
>>> # now, assign data to levels dimension variable.
>>> levels[:] = [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]
Note, however, that there are some differences between NumPy and netCDF variable slicing rules. Slices behave as usual, being specified as a
start:stop:step
triplet. Using a scalar integer index
i
takes the ith element and reduces the rank of the
output array by one. Boolean array and integer sequence indexing
behaves differently for netCDF variables than for numpy arrays. Only
1-d boolean arrays and integer sequences are allowed, and these
indices work independently along each dimension (similar to the way
vector subscripts work in fortran). This means that
>>> temp[0, 0, [0,1,2,3], [0,1,2,3]]
returns an array of shape (4,4) when slicing a netCDF variable,
but for a numpy array it returns an array of shape (4,). Similarly, a
netCDF variable of shape (2,3,4,5)
indexed with
[0, array([True, False, True]), array([False, True, True,
True]), :]
would return a (2, 3, 5)
array. In
NumPy, this would raise an error since it would be equivalent to
[0, [0,1], [1,2,3], :]
. While this behaviour can cause
some confusion for those used to NumPy's 'fancy indexing' rules, it
provides a very powerful way to extract data from multidimensional
netCDF variables by using logical operations on the dimension arrays
to create slices.
For example,
>>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
will extract time indices 0,2 and 4, pressure levels 850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern Hemisphere longitudes, resulting in a numpy array of shape (3, 3, 36, 71).
>>> print 'shape of fancy temp slice = ',tempdat.shape
shape of fancy temp slice =  (3, 3, 36, 71)
Time coordinate values pose a special challenge to netCDF users. Most metadata standards (such as CF and COARDS) specify that time should be measured relative to a fixed date using a certain calendar, with units specified like hours since YY:MM:DD hh-mm-ss. These units can be awkward to deal with without a utility to convert the values to and from calendar dates. The functions num2date and date2num are provided with this package to do just that. Here's an example of how they can be used:
>>> # fill in times.
>>> from datetime import datetime, timedelta
>>> from netCDF4 import num2date, date2num
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = date2num(dates,units=times.units,calendar=times.calendar)
>>> print 'time values (in units %s): ' % times.units+'\n',times[:]
time values (in units hours since 0001-01-01 00:00:00.0):
[ 17533056.  17533068.  17533080.  17533092.  17533104.]
>>> dates = num2date(times[:],units=times.units,calendar=times.calendar)
>>> print 'dates corresponding to time values:\n',dates
dates corresponding to time values:
[2001-03-01 00:00:00 2001-03-01 12:00:00 2001-03-02 00:00:00
 2001-03-02 12:00:00 2001-03-03 00:00:00]
num2date
converts numeric values of time in the specified units
and calendar
to datetime objects, and date2num does
the reverse. All the calendars currently defined in the CF metadata convention are supported. A function
called date2index is also provided which returns the
indices of a netCDF time variable corresponding to a sequence of
datetime instances.
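For instance, date2index can recover the position of one of the dates written above (an illustrative call using the times variable from this tutorial):
>>> from netCDF4 import date2index
>>> print date2index(datetime(2001,3,1,12), times, calendar=times.calendar)
1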
If you want to read data from a variable that spans multiple
netCDF files, you can use the MFDataset class to read the data as if it were
contained in a single file. Instead of using a single filename to
create a Dataset instance, create a MFDataset
instance with either a list of filenames, or a string with a wildcard
(which is then converted to a sorted list of files using the python
glob module). Variables in the list of files that share the same
unlimited dimension are aggregated together, and can be sliced across
multiple files. To illustrate this, let's first create a bunch of
netCDF files with the same variable (with the same unlimited
dimension). The files must be in NETCDF3_64BIT, NETCDF3_CLASSIC or NETCDF4_CLASSIC format (NETCDF4 formatted multi-file datasets are not supported).
>>> for nfile in range(10):
>>>     f = Dataset('mftest'+repr(nfile)+'.nc','w',format='NETCDF4_CLASSIC')
>>>     f.createDimension('x',None)
>>>     x = f.createVariable('x','i',('x',))
>>>     x[0:10] = numpy.arange(nfile*10,10*(nfile+1))
>>>     f.close()
Now read all the files back in at once with MFDataset
>>> from netCDF4 import MFDataset
>>> f = MFDataset('mftest*nc')
>>> print f.variables['x'][:]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
Note that MFDataset can only be used to read, not write, multi-file datasets.
Data stored in netCDF 4 Variable objects can be compressed and decompressed
on the fly. The parameters for the compression are determined by the
zlib
, complevel
and shuffle
keyword arguments to the createVariable method. To turn on compression, set
zlib=True
. The complevel
keyword regulates
the speed and efficiency of the compression (1 being fastest, but
lowest compression ratio, 9 being slowest but best compression
ratio). The default value of complevel
is 4. Setting
shuffle=False
will turn off the HDF5 shuffle filter,
which de-interlaces a block of data before compression by reordering
the bytes. The shuffle filter can significantly improve compression
ratios, and is on by default. Setting the fletcher32 keyword argument to createVariable to True (it's False by default) enables the Fletcher32 checksum algorithm for error detection. It's also possible to set the HDF5 chunking parameters and endian-ness of the binary data stored in the HDF5 file with the chunksizes and endian keyword arguments to createVariable. These keyword arguments are only relevant for NETCDF4 and NETCDF4_CLASSIC files (where the underlying file format is HDF5) and are silently ignored if the file format is NETCDF3_CLASSIC or NETCDF3_64BIT.
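For example, chunking and byte order might be requested at variable creation time like this (a sketch; the variable name and chunk shape here are illustrative only, not a recommendation):
>>> tempc = rootgrp.createVariable('temp_chunked','f4',('time','level','lat','lon',),zlib=True,chunksizes=(1,1,73,144),endian='little')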
If your data only has a certain number of digits of precision (say
for example, it is temperature data that was measured with a
precision of 0.1 degrees), you can dramatically improve zlib
compression by quantizing (or truncating) the data using the
least_significant_digit
keyword argument to createVariable. The least significant digit is the
power of ten of the smallest decimal place in the data that is a
reliable value. For example, if the data has a precision of 0.1, then setting least_significant_digit=1 will cause the data to be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Effectively, this makes the compression 'lossy' instead of 'lossless'; that is, some precision in the data is sacrificed for the sake of disk space.
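As a worked illustration of the quantization formula (the numbers here are made up for demonstration):
>>> import numpy
>>> scale = 2.**4  # bits=4: increments of 1/16 retain a precision of 0.1
>>> print numpy.around(scale*21.137)/scale
21.125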
In our example, try replacing the line
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',))
with
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',),zlib=True)
and then
>>> temp = rootgrp.createVariable('temp','f4',('time','level','lat','lon',),zlib=True,least_significant_digit=3)
and see how much smaller the resulting files are.
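One way to compare is to check the resulting file size on disk after each variant (a sketch, assuming the file is test.nc as in this tutorial):
>>> import os
>>> print os.path.getsize('test.nc')  # size in bytes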
Compound data types map directly to numpy structured (a.k.a. 'record') arrays. Structured arrays are akin to C structs, or derived types in Fortran. They allow for the construction of table-like structures composed of combinations of other data types, including other compound types. Compound types might be useful for representing multiple parameter values at each point on a grid, or at each time and space location for scattered (point) data. You can then access all the information for a point by reading one variable, instead of reading different parameters from different variables. Compound data types are created from the corresponding numpy data type using the createCompoundType method of a Dataset or Group instance. Since there is no native complex data type in netcdf, compound types are handy for storing numpy complex arrays. Here's an example:
>>> f = Dataset('complex.nc','w')
>>> size = 3 # length of 1-d complex array
>>> # create sample complex data.
>>> datac = numpy.exp(1j*(1.+numpy.linspace(0, numpy.pi, size)))
>>> # create complex128 compound data type.
>>> complex128 = numpy.dtype([('real',numpy.float64),('imag',numpy.float64)])
>>> complex128_t = f.createCompoundType(complex128,'complex128')
>>> # create a variable with this data type, write some data to it.
>>> f.createDimension('x_dim',None)
>>> v = f.createVariable('cmplx_var',complex128_t,'x_dim')
>>> data = numpy.empty(size,complex128) # numpy structured array
>>> data['real'] = datac.real; data['imag'] = datac.imag
>>> v[:] = data # write numpy structured array to netcdf compound var
>>> # close and reopen the file, check the contents.
>>> f.close(); f = Dataset('complex.nc')
>>> v = f.variables['cmplx_var']
>>> datain = v[:] # read in all the data into a numpy structured array
>>> # create an empty numpy complex array
>>> datac2 = numpy.empty(datain.shape,numpy.complex128)
>>> # .. fill it with contents of structured array.
>>> datac2.real = datain['real']; datac2.imag = datain['imag']
>>> print datac.dtype,datac # original data
complex128 [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>> print datac2.dtype,datac2 # data from file
complex128 [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
Compound types can be nested, but you must create the 'inner' ones first. All of the compound types defined for a Dataset or Group are stored in a Python dictionary, just like variables and dimensions. As always, printing objects gives useful summary information in an interactive session:
>>> print f
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions: x_dim
    variables: cmplx_var
    groups:
>>> print f.variables['cmplx_var']
<type 'netCDF4.Variable'>
compound cmplx_var(x_dim)
compound data type: [('real', '<f8'), ('imag', '<f8')]
unlimited dimensions: x_dim
current shape = (3,)
>>> print f.cmptypes
OrderedDict([('complex128', <netCDF4.CompoundType object at 0x1029eb7e8>)])
>>> print f.cmptypes['complex128']
<type 'netCDF4.CompoundType'>: name = 'complex128', numpy dtype = [(u'real','<f8'), (u'imag', '<f8')]
NetCDF 4 has support for variable-length or "ragged" arrays. These are arrays of variable-length sequences having the same type. To create a variable-length data type, use the createVLType method of a Dataset or Group instance.
>>> f = Dataset('tst_vlen.nc','w')
>>> vlen_t = f.createVLType(numpy.int32, 'phony_vlen')
The numpy datatype of the variable-length sequences and the name of the new datatype must be specified. Any of the primitive datatypes can be used (signed and unsigned integers, 32 and 64 bit floats, and characters), but compound data types cannot. A new variable can then be created using this datatype.
>>> x = f.createDimension('x',3)
>>> y = f.createDimension('y',4)
>>> vlvar = f.createVariable('phony_vlen_var', vlen_t, ('y','x'))
Since there is no native vlen datatype in numpy, vlen arrays are
represented in python as object arrays (arrays of dtype
object
). These are arrays whose elements are Python
object pointers, and can contain any type of python object. For this
application, they must contain 1-D numpy arrays all of the same type
but of varying length. In this case, they contain 1-D numpy
int32
arrays of random length between 1 and 10.
>>> import random
>>> data = numpy.empty(len(y)*len(x),object)
>>> for n in range(len(y)*len(x)):
>>>     data[n] = numpy.arange(random.randint(1,10),dtype='int32')+1
>>> data = numpy.reshape(data,(len(y),len(x)))
>>> vlvar[:] = data
>>> print 'vlen variable =\n',vlvar[:]
vlen variable =
[[[ 1  2  3  4  5  6  7  8  9 10] [1 2 3 4 5] [1 2 3 4 5 6 7 8]]
 [[1 2 3 4 5 6 7] [1 2 3 4 5 6] [1 2 3 4 5]]
 [[1 2 3 4 5] [1 2 3 4] [1]]
 [[ 1  2  3  4  5  6  7  8  9 10] [ 1  2  3  4  5  6  7  8  9 10]
  [1 2 3 4 5 6 7 8]]]
>>> print f
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions: x, y
    variables: phony_vlen_var
    groups:
>>> print f.variables['phony_vlen_var']
<type 'netCDF4.Variable'>
vlen phony_vlen_var(y, x)
vlen data type: int32
unlimited dimensions:
current shape = (4, 3)
>>> print f.VLtypes['phony_vlen']
<type 'netCDF4.VLType'>: name = 'phony_vlen', numpy dtype = int32
Numpy object arrays containing python strings can also be written as vlen variables. For vlen strings, you don't need to create a vlen data type. Instead, simply use the python str builtin instead of a numpy datatype when calling the createVariable method.
>>> z = f.createDimension('z',10)
>>> strvar = f.createVariable('strvar', str, 'z')
In this example, an object array is filled with random python strings with random lengths between 2 and 12 characters, and the data in the object array is assigned to the vlen string variable.
>>> chars = '1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> data = numpy.empty(10,'O')
>>> for n in range(10):
>>>     stringlen = random.randint(2,12)
>>>     data[n] = ''.join([random.choice(chars) for i in range(stringlen)])
>>> strvar[:] = data
>>> print 'variable-length string variable:\n',strvar[:]
variable-length string variable:
[aDy29jPt jd7aplD b8t4RM jHh8hq KtaPWF9cQj Q1hHN5WoXSiT MMxsVeq td LUzvVTzj
 5DS9X8S]
>>> print f
<type 'netCDF4.Dataset'>
root group (NETCDF4 file format):
    dimensions: x, y, z
    variables: phony_vlen_var, strvar
    groups:
>>> print f.variables['strvar']
<type 'netCDF4.Variable'>
vlen strvar(z)
vlen data type: <type 'str'>
unlimited dimensions:
current size = (10,)
All of the code in this tutorial is available in examples/tutorial.py. Unit tests are in the test directory.
Contact: Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>
Copyright: 2008 by Jeffrey Whitaker.
License: Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation. THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Version: 1.0.1
Classes

CompoundType: A CompoundType instance is used to describe a compound data type.
Dataset: Dataset(self, filename, mode="r", clobber=True, diskless=False, persist=False, format='NETCDF4')
Dimension: Dimension(self, group, name, size=None)
Group: Group(self, parent, name)
MFDataset: MFDataset(self, files, check=False, aggdim=None, exclude=[])
MFTime: MFTime(self, time, units=None)
VLType: A VLType instance is used to describe a variable length (VLEN) data type.
Variable: Variable(self, group, name, datatype, dimensions=(), zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, fill_value=None)
Variables

NC_DISKLESS = 8
__hdf5libversion__
__netcdf4libversion__
__package__ = None
__required_hdf5version__
__required_netcdf4version__
default_encoding
default_fillvals
python3 = False
unicode_error
Function Details

chartostring: convert a character array to a string array with one less dimension.

date2index: Return indices of a netCDF time variable corresponding to the given dates.

date2num: Return numeric time values given datetime objects. The units of the numeric time values are described by the units and calendar arguments. Like the matplotlib date2num function, but allows for different units and calendars.

num2date: Return datetime objects given numeric time values. The units of the numeric time values are described by the units and calendar arguments. Like the matplotlib num2date function, but allows for different units and calendars.

stringtoarr: convert a string to a character array of length NUMCHARS.

stringtochar: convert a string array to a character array with one extra dimension.