Module netCDF3

Introduction

Python interface to the netCDF version 3 library. The API is modelled after Scientific.IO.NetCDF and should be familiar to users of that module. Some new features not found in Scientific.IO.NetCDF:

Download

Requires

Install

Tutorial

1) Creating/Opening/Closing a netCDF file

To create a netCDF file from python, call the Dataset constructor. The same constructor is used to open an existing netCDF file. If the file is open for write access ('w', 'r+' or 'a'), you may write any type of data, including new dimensions, variables and attributes. netCDF files come in several flavors (NETCDF3_CLASSIC, NETCDF3_64BIT, NETCDF4_CLASSIC, and NETCDF4). The first two flavors are supported by version 3 of the netCDF library and by this module. To read or write NETCDF4 and NETCDF4_CLASSIC files, use the companion netCDF4 python module. The default format is NETCDF3_64BIT. To see how a given file is formatted, examine the file_format attribute of the Dataset instance. To close the netCDF file, call the close method of the Dataset instance.

Here's an example:

>>> import netCDF3
>>> ncfile = netCDF3.Dataset('test.nc', 'w')
>>> print ncfile.file_format
NETCDF3_64BIT
>>>
>>> ncfile.close()
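
The on-disk flavor can also be checked without this module, because netCDF files identify themselves in their first four bytes: classic files begin with CDF followed by the byte 1, 64-bit offset files with CDF followed by the byte 2, and netCDF-4 files begin with the HDF5 signature. The sketch below is an independent illustration and makes no claim about how file_format is computed internally. The helper name guess_format and the returned strings are hypothetical.

```python
import os
import tempfile

# Leading bytes of each flavor. The helper name and the returned strings are
# illustrative only; they are not part of the netCDF3 module.
MAGIC_TO_FORMAT = {
    b'CDF\x01': 'NETCDF3_CLASSIC',
    b'CDF\x02': 'NETCDF3_64BIT',
    b'\x89HDF': 'HDF5-based (NETCDF4 or NETCDF4_CLASSIC)',
}

def guess_format(path):
    """Return a format name based on the first four bytes of the file."""
    with open(path, 'rb') as handle:
        magic = handle.read(4)
    return MAGIC_TO_FORMAT.get(magic, 'unknown')

# Demonstrate on a stand-in file that holds only a 64-bit offset header prefix.
fd, demo_path = tempfile.mkstemp(suffix='.nc')
with os.fdopen(fd, 'wb') as handle:
    handle.write(b'CDF\x02' + b'\x00' * 4)
demo_format = guess_format(demo_path)
os.remove(demo_path)
```

The two HDF5-based flavors cannot be told apart from the signature alone, which is why the sketch reports them together.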

2) Dimensions in a netCDF file

netCDF defines the sizes of all variables in terms of dimensions, so before any variables can be created the dimensions they use must be created first. A special case, not often used in practice, is that of a scalar variable, which has no dimensions. A dimension is created using the createDimension method of a Dataset instance. A Python string is used to set the name of the dimension, and an integer value is used to set the size. To create an unlimited dimension (a dimension that can be appended to), the size value is set to None or 0. netCDF 3 files can only have one unlimited dimension, and it must be the first (leftmost) dimension of the variable.

>>> ncfile.createDimension('press', 10)
>>> ncfile.createDimension('time', None)
>>> ncfile.createDimension('lat', 73)
>>> ncfile.createDimension('lon', 144)

All of the Dimension instances are stored in a python dictionary.

>>> print ncfile.dimensions
{'lat': <netCDF3.Dimension object at 0x24a5f7b0>, 
 'time': <netCDF3.Dimension object at 0x24a5f788>, 
 'lon': <netCDF3.Dimension object at 0x24a5f7d8>, 
 'press': <netCDF3.Dimension object at 0x24a5f760>}
>>>

Calling the python len function with a Dimension instance returns the current size of that dimension. The isunlimited method of a Dimension instance can be used to determine if the dimension is unlimited, or appendable.

>>> for dimname, dimobj in ncfile.dimensions.iteritems():
...     print dimname, len(dimobj), dimobj.isunlimited()
lat 73 False
time 0 True
lon 144 False
press 10 False
>>>

Dimension names can be changed using the renameDimension method of a Dataset instance.

3) Variables in a netCDF file

netCDF variables behave much like python multidimensional array objects supplied by the numpy module. However, unlike numpy arrays, netCDF3 variables can be appended to along one 'unlimited' dimension. To create a netCDF variable, use the createVariable method of a Dataset instance. The createVariable method has two mandatory arguments: the variable name (a Python string) and the variable datatype. The variable's dimensions are given by a tuple containing the dimension names (defined previously with createDimension). To create a scalar variable, simply leave out the dimensions keyword. The variable primitive datatypes correspond to the dtype attribute of a numpy array. You can specify the datatype as a numpy dtype object, or anything that can be converted to a numpy dtype object. Valid datatype specifiers include: 'f4' (32-bit floating point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer), 'i2' (16-bit signed integer), 'i1' (8-bit signed integer), or 'S1' (single-character string). The old Numeric single-character typecodes ('f','d','h','s','b','B','c','i','l'), corresponding to ('f4','f8','i2','i2','i1','i1','S1','i4','i4'), will also work.
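
The correspondence between the old Numeric typecodes and the dtype specifiers can be written out as a lookup table. This is only a restatement of the two tuples listed above; the helper normalize_datatype is hypothetical and is not part of the module.

```python
# The two tuples are copied from the paragraph above, in the same order.
NUMERIC_TYPECODES = ('f', 'd', 'h', 's', 'b', 'B', 'c', 'i', 'l')
DTYPE_SPECIFIERS = ('f4', 'f8', 'i2', 'i2', 'i1', 'i1', 'S1', 'i4', 'i4')
TYPECODE_TO_DTYPE = dict(zip(NUMERIC_TYPECODES, DTYPE_SPECIFIERS))

def normalize_datatype(spec):
    """Translate an old Numeric typecode; pass other specifiers through unchanged."""
    return TYPECODE_TO_DTYPE.get(spec, spec)

examples = [normalize_datatype(code) for code in ('d', 'l', 'f4')]
```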

The dimensions themselves are usually also defined as variables, called coordinate variables. The createVariable method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.

>>> times = ncfile.createVariable('time','f8',('time',))
>>> pressure = ncfile.createVariable('press','i4',('press',))
>>> latitudes = ncfile.createVariable('latitude','f4',('lat',))
>>> longitudes = ncfile.createVariable('longitude','f4',('lon',))
>>> # temp has one unlimited dimension (time), which must come first.
>>> temp = ncfile.createVariable('temp','f4',('time','press','lat','lon',))

All of the variables in the Dataset are stored in a Python dictionary, in the same way as the dimensions:

>>> print ncfile.variables
{'temp': <netCDF3.Variable object at 0x24a61068>,
 'longitude': <netCDF3.Variable object at 0x24a61030>,
 'press': <netCDF3.Variable object at 0x24a610a0>,
 'time': <netCDF3.Variable object at 0x...>,
 'latitude': <netCDF3.Variable object at 0x...>}
>>>

Variable names can be changed using the renameVariable method of a Dataset instance.

4) Attributes in a netCDF file

There are two types of attributes in a netCDF file, global and variable. Global attributes provide information about a dataset as a whole. Variable attributes provide information about one of the variables in a dataset. Global attributes are set by assigning values to Dataset instance variables. Variable attributes are set by assigning values to Variable instance variables. Attributes can be strings, numbers or sequences. Returning to our example,

>>> import time
>>> ncfile.description = 'bogus example script'
>>> ncfile.history = 'Created ' + time.ctime(time.time())
>>> ncfile.source = 'netCDF3 python module tutorial'
>>> latitudes.units = 'degrees north'
>>> longitudes.units = 'degrees east'
>>> pressure.units = 'hPa'
>>> temp.units = 'K'
>>> times.units = 'hours since 0001-01-01 00:00:00.0'
>>> times.calendar = 'gregorian'

The ncattrs method of a Dataset or Variable instance can be used to retrieve the names of all the netCDF attributes. This method is provided as a convenience, since using the built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.

>>> for name in ncfile.ncattrs():
...     print 'Global attr', name, '=', getattr(ncfile,name)
Global attr description = bogus example script
Global attr history = Created Mon Nov  7 10:30:56 2005
Global attr source = netCDF3 python module tutorial

The __dict__ attribute of a Dataset or Variable instance provides all the netCDF attribute name/value pairs in a python dictionary:

>>> print ncfile.__dict__
{'source': 'netCDF3 python module tutorial',
'description': 'bogus example script',
'history': 'Created Mon Nov  7 10:30:56 2005'}

Attributes can be deleted from a netCDF Dataset or Variable using the python del statement (e.g. del var.foo removes the attribute foo from the variable var).

6) Writing data to and retrieving data from a netCDF variable

Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.

>>> import numpy 
>>> latitudes[:] = numpy.arange(-90,91,2.5)
>>> pressure[:] = numpy.arange(1000,90,-100)
>>> print 'latitudes =\n',latitudes[:]
latitudes =
[-90.  -87.5 -85.  -82.5 -80.  -77.5 -75.  -72.5 -70.  -67.5 -65.  -62.5
 -60.  -57.5 -55.  -52.5 -50.  -47.5 -45.  -42.5 -40.  -37.5 -35.  -32.5
 -30.  -27.5 -25.  -22.5 -20.  -17.5 -15.  -12.5 -10.   -7.5  -5.   -2.5
   0.    2.5   5.    7.5  10.   12.5  15.   17.5  20.   22.5  25.   27.5
  30.   32.5  35.   37.5  40.   42.5  45.   47.5  50.   52.5  55.   57.5
  60.   62.5  65.   67.5  70.   72.5  75.   77.5  80.   82.5  85.   87.5
  90. ]
>>>
>>> print 'pressure levels =\n',pressure[:]
pressure levels =
[1000  900  800  700  600  500  400  300  200  100]
>>>

Unlike numpy array objects, netCDF Variable objects with an unlimited dimension will grow along that dimension if you assign data outside the currently defined range of indices.

>>> # append along the unlimited (time) dimension by assigning to a slice.
>>> nlats = len(ncfile.dimensions['lat'])
>>> nlons = len(ncfile.dimensions['lon'])
>>> nlevs = len(ncfile.dimensions['press'])
>>> print 'temp shape before adding data = ',temp.shape
temp shape before adding data =  (0, 10, 73, 144)
>>>
>>> from numpy.random.mtrand import uniform
>>> temp[0:5,:,:,:] = uniform(size=(5,nlevs,nlats,nlons))
>>> print 'temp shape after adding data = ',temp.shape
temp shape after adding data =  (5, 10, 73, 144)
>>>
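
The growth behaviour can be pictured with a toy model in plain Python. This is a sketch only: RecordVariable is a hypothetical class, not part of the module, and padding skipped records with a fill value is an assumption of the toy model rather than a statement about the library. It shows that only the first (record) axis grows, while the fixed dimension keeps its size.

```python
class RecordVariable(object):
    """Toy stand-in for a variable whose first dimension is unlimited."""

    def __init__(self, record_length, fill_value=0.0):
        self.record_length = record_length  # size of the fixed dimension
        self.fill_value = fill_value
        self.records = []                   # one entry per unlimited index

    @property
    def shape(self):
        return (len(self.records), self.record_length)

    def assign(self, start, rows):
        """Write rows beginning at index start, growing the record axis if needed."""
        for offset, row in enumerate(rows):
            if len(row) != self.record_length:
                raise ValueError('the fixed dimension cannot change size')
            index = start + offset
            # Pad with fill values up to and including the record being written.
            while len(self.records) <= index:
                self.records.append([self.fill_value] * self.record_length)
            self.records[index] = list(row)

var = RecordVariable(record_length=3)
shape_before = var.shape
var.assign(0, [[1.0, 2.0, 3.0]] * 5)   # write five records past the current end
shape_after = var.shape
```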

Time coordinate values pose a special challenge to netCDF users. Most metadata standards (such as CF and COARDS) specify that time should be measured relative to a fixed date using a certain calendar, with units specified like hours since YYYY-MM-DD hh:mm:ss. These units can be awkward to deal with without a utility to convert the values to and from calendar dates. The functions num2date and date2num are provided with this package to do just that. Here's an example of how they can be used:

>>> # fill in times.
>>> from datetime import datetime, timedelta
>>> from netCDF3 import num2date, date2num
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = date2num(dates,units=times.units,calendar=times.calendar)
>>> print 'time values (in units %s): ' % times.units+'\n',times[:]
time values (in units hours since 0001-01-01 00:00:00.0): 
[ 17533056.  17533068.  17533080.  17533092.  17533104.]
>>>
>>> dates = num2date(times[:],units=times.units,calendar=times.calendar)
>>> print 'dates corresponding to time values:\n',dates
dates corresponding to time values:
[2001-03-01 00:00:00 2001-03-01 12:00:00 2001-03-02 00:00:00
 2001-03-02 12:00:00 2001-03-03 00:00:00]
>>>

num2date converts numeric values of time in the specified units and calendar to datetime objects, and date2num does the reverse. All the calendars currently defined in the CF metadata convention are supported. A function called date2index is also provided which returns the indices of a netCDF time variable corresponding to a sequence of datetime instances.
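
For the one calendar that the standard datetime module implements (proleptic Gregorian), the underlying arithmetic is a subtraction from the reference time. The sketch below illustrates that idea only. It does not parse a units string, handle time-zone offsets, or support other calendars, and the function names hours_since and date_from_hours are hypothetical.

```python
from datetime import datetime, timedelta

def hours_since(date, reference):
    """Hours elapsed from reference to date (proleptic Gregorian only)."""
    return (date - reference).total_seconds() / 3600.0

def date_from_hours(hours, reference):
    """Inverse of hours_since: the datetime that lies hours after reference."""
    return reference + timedelta(hours=hours)

# Equivalent in spirit to units = 'hours since 2001-01-01 00:00:00'.
reference = datetime(2001, 1, 1)
dates = [datetime(2001, 3, 1) + n * timedelta(hours=12) for n in range(5)]

values = [hours_since(d, reference) for d in dates]
roundtrip = [date_from_hours(v, reference) for v in values]
```

January and February 2001 span 59 days, so the first value is 59 * 24 = 1416 hours, and converting back recovers the original dates.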

All of the code in this tutorial is available in examples/tutorial-nc3.py. Unit tests are in the test3 directory.

7) Reading data from a multi-file netCDF dataset.

If you want to read data from a variable that spans multiple netCDF files, you can use the MFDataset class to read the data as if it were contained in a single file. Instead of using a single filename to create a Dataset instance, create a MFDataset instance with either a list of filenames, or a string with a wildcard (which is then converted to a sorted list of files using the python glob module). Variables in the list of files that share the same unlimited dimension are aggregated together, and can be sliced across multiple files. To illustrate this, let's first create a bunch of netCDF files with the same variable (with the same unlimited dimension).

>>> from netCDF3 import Dataset, MFDataset
>>> for nfile in range(10):
...     f = Dataset('mftest'+repr(nfile)+'.nc','w')
...     f.createDimension('x',None)
...     x = f.createVariable('x','i',('x',))
...     x[0:10] = numpy.arange(nfile*10,10*(nfile+1))
...     f.close()

Now read all the files back in at once with MFDataset:

>>> f = MFDataset('mftest*nc')
>>> print f.variables['x'][:]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
>>>

Note that MFDataset can only be used to read, not write, multi-file datasets.
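
Because a wildcard is expanded into a sorted list of files, the file names determine the order in which records are aggregated. Assuming a plain string sort, as the description above suggests, unpadded numbers stop sorting numerically once they reach two digits. The sketch below uses only the standard library on empty stand-in files. The helper names and the padded file names are hypothetical.

```python
import glob
import os
import shutil
import tempfile

def matching_files(pattern):
    """Expand a wildcard into a sorted list of paths (plain string sort)."""
    return sorted(glob.glob(pattern))

def make_empty_files(directory, names):
    """Create empty stand-in files; no netCDF content is written."""
    for name in names:
        open(os.path.join(directory, name), 'w').close()

workdir = tempfile.mkdtemp()
try:
    make_empty_files(workdir, ['mftest%d.nc' % n for n in (0, 1, 2, 10)])
    unpadded = [os.path.basename(p)
                for p in matching_files(os.path.join(workdir, 'mftest*nc'))]

    # Zero-padding the counter keeps string order and numeric order the same.
    make_empty_files(workdir, ['padded%02d.nc' % n for n in (0, 1, 2, 10)])
    padded = [os.path.basename(p)
              for p in matching_files(os.path.join(workdir, 'padded*nc'))]
finally:
    shutil.rmtree(workdir)
```

Here unpadded places mftest10.nc before mftest2.nc, while padded keeps the numeric order. The ten files in the tutorial are unaffected because their counters are single digits. An explicit list of filenames avoids the question entirely.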


Contact: Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>

Copyright: 2008 by Jeffrey Whitaker.

License: Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both the copyright notice and this permission notice appear in supporting documentation. THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Version: 0.9.2

Classes
  Dataset
Dataset(self, filename, mode="r", clobber=True, format='NETCDF3_64BIT')
  Dimension
Dimension(self, dset, name, size=None)
  MFDataset
MFDataset(self, files, check=False, exclude=[])
  Variable
Variable(self, dset, name, datatype, dimensions=(), fill_value=None)
Functions
 
chartostring(b)
convert a character array to a string array with one less dimension.
 
date2index(dates, nctime, calendar=None, select='exact')
Return indices of a netCDF time variable corresponding to the given dates.
 
date2num(dates, units, calendar='standard')
Return numeric time values given datetime objects.
 
getlibversion()
returns a string describing the version of the netcdf library used to build the module, and when it was built.
 
num2date(times, units, calendar='standard')
Return datetime objects given numeric time values.
 
stringtoarr(a, NUMCHARS)
convert a string to a character array of length NUMCHARS
 
stringtochar(a)
convert a string array to a character array with one extra dimension
Variables
  __package__ = None
Function Details

chartostring(b)

 

convert a character array to a string array with one less dimension.

Parameters:
  • b - Input character array (numpy datatype 'S1'). Will be converted to an array of strings, where each string has a fixed length of b.shape[-1] characters.
Returns:
A numpy string array with datatype 'SN' and shape b.shape[:-1], where N=b.shape[-1].

date2index(dates, nctime, calendar=None, select='exact')

 

Return indices of a netCDF time variable corresponding to the given dates.

Parameters:
  • dates - A datetime object or a sequence of datetime objects. The datetime objects should not include a time-zone offset.
  • nctime - A netCDF time variable object. The nctime object must have a units attribute.
  • calendar - Describes the calendar used in the time calculation. Valid calendars are 'standard', 'gregorian', 'proleptic_gregorian', 'noleap', '365_day', '360_day', 'julian', 'all_leap' and '366_day'. If calendar is None (the default), its value is taken from nctime.calendar, or 'standard' (a mixed Julian/Gregorian calendar) if no such attribute exists.
  • select - The index selection method: 'exact', 'before', 'after' or 'nearest'. 'exact' returns the indices perfectly matching the dates given. 'before' and 'after' return the indices corresponding to the dates just before or just after the given dates if an exact match cannot be found. 'nearest' returns the indices that correspond to the closest dates.
Returns:
an index (indices) of the netCDF time variable corresponding to the given datetime object(s).
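
The four select modes can be illustrated on a sorted list of datetime objects. This is a minimal sketch of the selection rules described above, not the module's implementation. The function find_index is hypothetical, it takes a plain list rather than a netCDF time variable, and its handling of ties and out-of-range dates is an arbitrary choice made for the example.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def find_index(times, date, select='exact'):
    """Return the index in the sorted list times chosen by the select rule."""
    pos = bisect_left(times, date)
    # An exact match is returned regardless of the selection method.
    if pos < len(times) and times[pos] == date:
        return pos
    if select == 'exact':
        raise ValueError('no exact match for %s' % date)
    if select == 'before':
        if pos == 0:
            raise ValueError('no earlier time available')
        return pos - 1
    if select == 'after':
        if pos == len(times):
            raise ValueError('no later time available')
        return pos
    if select == 'nearest':
        if pos == 0:
            return 0
        if pos == len(times):
            return len(times) - 1
        # Ties go to the earlier time (an arbitrary choice in this sketch).
        if date - times[pos - 1] <= times[pos] - date:
            return pos - 1
        return pos
    raise ValueError('unknown select value: %r' % select)

times = [datetime(2001, 3, 1) + n * timedelta(hours=12) for n in range(5)]
query = datetime(2001, 3, 1, 5)          # falls between index 0 and index 1
results = dict((mode, find_index(times, query, mode))
               for mode in ('before', 'after', 'nearest'))
```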

date2num(dates, units, calendar='standard')

 

Return numeric time values given datetime objects. The units of the numeric time values are described by the units argument and the calendar keyword. The datetime objects must be in UTC with no time-zone offset. If there is a time-zone offset in units, it will be applied to the returned numeric values.

Like the matplotlib date2num function, except that it allows for different units and calendars. Behaves the same if units = 'days since 0001-01-01 00:00:00' and calendar = 'proleptic_gregorian'.

Parameters:
  • dates - A datetime object or a sequence of datetime objects. The datetime objects should not include a time-zone offset.
  • units - a string of the form 'time units since reference time' describing the time units. time units can be days, hours, minutes or seconds. reference time is the time origin. A valid choice would be units='hours since 1800-01-01 00:00:00 -6:00'.
  • calendar - describes the calendar used in the time calculations. All the values currently defined in the CF metadata convention are supported. Valid calendars are 'standard', 'gregorian', 'proleptic_gregorian', 'noleap', '365_day', '360_day', 'julian', 'all_leap' and '366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.
Returns:
a numeric time value, or an array of numeric time values.

The maximum resolution of the numeric time values is 1 second.

num2date(times, units, calendar='standard')

 

Return datetime objects given numeric time values. The units of the numeric time values are described by the units argument and the calendar keyword. The returned datetime objects represent UTC with no time-zone offset, even if the specified units contain a time-zone offset.

Like the matplotlib num2date function, except that it allows for different units and calendars. Behaves the same if units = 'days since 0001-01-01 00:00:00' and calendar = 'proleptic_gregorian'.

Parameters:
  • times - numeric time values. Maximum resolution is 1 second.
  • units - a string of the form 'time units since reference time' describing the time units. time units can be days, hours, minutes or seconds. reference time is the time origin. A valid choice would be units='hours since 1800-01-01 00:00:00 -6:00'.
  • calendar - describes the calendar used in the time calculations. All the values currently defined in the CF metadata convention are supported. Valid calendars are 'standard', 'gregorian', 'proleptic_gregorian', 'noleap', '365_day', '360_day', 'julian', 'all_leap' and '366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.
Returns:
a datetime instance, or an array of datetime instances.

The datetime instances returned are 'real' python datetime objects if the date falls in the Gregorian calendar (i.e. calendar='proleptic_gregorian', or calendar = 'standard' or 'gregorian' and the date is after 1582-10-15). Otherwise, they are 'phony' datetime objects which support some but not all the methods of 'real' python datetime objects. This is because the python datetime module always uses the 'proleptic_gregorian' calendar, even before the switch occurred from the Julian calendar in 1582. The datetime instances do not contain a time-zone offset, even if the specified units contain one.

stringtoarr(a, NUMCHARS)

 

convert a string to a character array of length NUMCHARS

Parameters:
  • a - Input python string.
  • NUMCHARS - number of characters used to represent string (if len(a) < NUMCHARS, it will be padded on the right with blanks).
Returns:
A rank 1 numpy character array of length NUMCHARS with datatype 'S1'

stringtochar(a)

 

convert a string array to a character array with one extra dimension

Parameters:
  • a - Input numpy string array with numpy datatype 'SN', where N is the number of characters in each string. Will be converted to an array of characters (datatype 'S1') of shape a.shape + (N,).
Returns:
A numpy character array with datatype 'S1' and shape a.shape + (N,), where N is the length of each string in a.
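
The shape relationships between the three string helpers can be pictured with nested Python lists in place of numpy arrays. This is a pure-Python analogue for illustration only. The function names below are hypothetical, and raising an error for a string longer than NUMCHARS is an assumption of the sketch, not documented behaviour of stringtoarr.

```python
def string_to_chars(text, numchars):
    """Pad text on the right with blanks and split it into single characters."""
    if len(text) > numchars:
        # Assumption of this sketch: over-long input is treated as an error.
        raise ValueError('string is longer than NUMCHARS')
    return list(text.ljust(numchars))

def strings_to_chars(strings, width):
    """Split each string into width characters: one extra dimension."""
    return [string_to_chars(s, width) for s in strings]

def chars_to_strings(rows):
    """Join each row of single characters: one less dimension."""
    return [''.join(row) for row in rows]

chars = strings_to_chars(['foo', 'hi'], 3)   # 2 strings -> 2 x 3 characters
strings = chars_to_strings(chars)            # 2 x 3 characters -> 2 strings
```

Every string in the result has the same fixed length, so the shorter entry comes back padded with a trailing blank.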