HdfsCLI
API and command line interface for HDFS.
Installation
Using pip:
$ pip install hdfs
By default, none of the package requirements for extensions are installed. To install them, suffix the package name with the desired extensions:
$ pip install hdfs[avro,dataframe,kerberos]
User guide
Sample script
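The script below obtains its client from the default alias, which must first be defined in a configuration file (by default, ~/.hdfscli.cfg). As a minimal sketch, assuming a single development cluster (the alias name, namenode URL, port, and user are placeholders to adapt to your setup):

[global]
default.alias = dev

[dev.alias]
url = http://dev.namenode:50070
user = ann

With such a file in place, Config().get_client() returns a client bound to the dev alias.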
#!/usr/bin/env python
# encoding: utf-8
"""Sample HdfsCLI script.
This example shows how to write files to HDFS, read them back, and perform a
few other simple filesystem operations.
"""
from hdfs import Config
from json import dump, load
# Get the default alias' client. (See the quickstart section in the
# documentation to learn more about this.)
client = Config().get_client()
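# Note: rather than going through a configuration alias, a client can also
# be instantiated directly, e.g. (the URL and user below are placeholders):
#
#   from hdfs import InsecureClient
#   client = InsecureClient('http://namenode:50070', user='ann')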
# Some fake data that we are interested in uploading to HDFS.
model = {
    '(intercept)': 48.,
    'first_feature': 2.,
    'second_feature': 12.,
}
# First, we delete any existing `models/` folder on HDFS.
client.delete('models', recursive=True)
# We can now upload the data, first as CSV.
with client.write('models/1.csv', encoding='utf-8') as writer:
    for item in model.items():
        writer.write(u'%s,%s\n' % item)
# We can also serialize it to JSON and directly upload it.
with client.write('models/1.json', encoding='utf-8') as writer:
    dump(model, writer)
# We can check that the files exist and get their properties.
assert client.list('models') == ['1.csv', '1.json']
status = client.status('models/1.csv')
content = client.content('models/1.json')
# Later, we can download the files back. The `delimiter` option makes it
# convenient to read CSV files.
with client.read('models/1.csv', delimiter='\n', encoding='utf-8') as reader:
    items = (line.split(',') for line in reader if line)
    assert dict((name, float(value)) for name, value in items) == model
# Loading JSON directly from HDFS is even simpler.
with client.read('models/1.json', encoding='utf-8') as reader:
    assert load(reader) == model
More examples can be found in the examples/ folder on GitHub.
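HdfsCLI also ships with a command line entry point. As a brief sketch of a session (the alias, file names, and paths here are illustrative, and assume the configuration described above):

$ hdfscli --alias=dev                             # Start an interactive shell with a preconfigured client.
$ hdfscli upload --alias=dev weights.json models/
$ hdfscli download --alias=dev models/1.json .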