download.file {utils}                                    R Documentation
Description:

This function can be used to download a file from the Internet.
Usage:

     download.file(url, destfile, method, quiet = FALSE, mode = "w",
                   cacheOK = TRUE,
                   extra = getOption("download.file.extra"))
Arguments:

url
     A character string naming the URL of a resource to be downloaded.

destfile
     A character string with the name where the downloaded file is
     saved. Tilde-expansion is performed.

method
     Method to be used for downloading files. Current download methods
     are "internal", "wget", "curl" and "libcurl". The method can also
     be set through the option "download.file.method": see 'Details'.

quiet
     If TRUE, suppress status messages (if any), and the progress bar.

mode
     character. The mode with which to write the file. Useful values
     are "w", "wb" (binary), "a" (append) and "ab".

cacheOK
     logical. Is a server-side cached value acceptable?

extra
     character vector of additional command-line arguments for the
     "wget" and "curl" methods.
Details:

The function download.file can be used to download a single file as described by url from the internet and store it in destfile. The url must start with a scheme such as http://, ftp:// or file://.
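As a minimal sketch of the basic call, the snippet below copies a local file through a file:// URL, so it runs without network access. The temporary source file stands in for a remote resource; note that on Windows the file:// form of a path differs slightly (see the url help page).

```r
## Create a throwaway source file to stand in for a remote resource.
src <- tempfile(fileext = ".txt")
writeLines("hello", src)

## Download it via a file:// URL (src is an absolute path, so
## paste0("file://", src) yields file:///tmp/... on Unix-alikes).
dest <- tempfile(fileext = ".txt")
download.file(paste0("file://", src), destfile = dest, quiet = TRUE)
readLines(dest)
```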
If method = "auto" is chosen (the default), the internal method is chosen for file:// URLs, and for the others provided capabilities("http/ftp") is true (which it almost always is). Otherwise methods "wget" and "curl" are tried in turn.
Support for method "libcurl" is optional: use capabilities("libcurl") to see if it is supported on your build. It provides (non-blocking) access to https:// and ftps:// URLs. There is support for simultaneous downloads, so url and destfile can be character vectors of the same length greater than one. For a single URL and quiet = FALSE a progress bar is shown in interactive use.
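Assuming a build with libcurl support, a simultaneous download might be sketched as follows. The URLs are placeholders, not real resources, so the call is guarded and only illustrates the vectorised form.

```r
if (capabilities("libcurl")) {
  urls  <- c("https://example.org/a.csv",   # hypothetical URLs
             "https://example.org/b.csv")
  dests <- file.path(tempdir(), basename(urls))

  ## url and destfile are character vectors of the same length,
  ## so both files are fetched in one libcurl call.
  download.file(urls, dests, method = "libcurl")
}
```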
For methods "wget" and "curl" a system call is made to the tool given by method, and the respective program must be installed on your system and be in the search path for executables. They will block all other activity on the R process until they complete: this may make a GUI unresponsive.
cacheOK = FALSE is useful for http:// URLs, and will attempt to get a copy directly from the site rather than from an intermediate cache. It is used by available.packages.
The remaining details apply to method "internal" only. Note that https:// URLs are not supported by the internal method. See url for how file:// URLs are interpreted, especially on Windows. This function does decode encoded URLs.
The timeout for many parts of the transfer can be set by the option timeout, which defaults to 60 seconds.

The level of detail provided during transfer can be set by the quiet argument and the internet.info option. The details depend on the platform and scheme, but setting internet.info to 0 gives all available details, including all server responses. Using 2 (the default) gives only serious messages, and 3 or more suppresses all messages.
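For example, a script working against a slow server might raise the timeout and ask for more detail, restoring the old settings afterwards. The values below are illustrative only.

```r
old <- options(timeout = 300,      # allow up to 5 minutes per stage
               internet.info = 1)  # more detail than the default (2)

## ... calls to download.file() would go here ...

options(old)                       # restore the previous settings
```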
A progress bar tracks the transfer. If the file length is known, an equals sign represents 2% of the transfer completed: otherwise a dot represents 10Kb.
Code written to download binary files must use mode = "wb", but the problems incurred by a text transfer will only be seen on Windows.
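The difference matters for any non-text payload. A sketch using a local binary file (network-independent; the file:// path form would differ on Windows):

```r
## Write 256 raw bytes to stand in for a binary resource (zip, RDS, ...).
src <- tempfile(fileext = ".bin")
writeBin(as.raw(0:255), src)

## mode = "wb" prevents line-ending translation of the byte stream
## on Windows; with the default mode = "w" the file could be corrupted.
dest <- tempfile(fileext = ".bin")
download.file(paste0("file://", src), dest, mode = "wb", quiet = TRUE)
```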
Value:

An (invisible) integer code, 0 for success and non-zero for failure. For the "wget" and "curl" methods this is the status code returned by the external program. The "internal" method can return 1, but will in most cases throw an error.
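Because the internal method usually signals an error rather than returning a non-zero code, a caller that must not abort can combine the return value with tryCatch. The unreadable URL below is deliberate.

```r
## ok is TRUE only if the download both completed and returned code 0;
## an error from the internal method ("cannot open URL") lands in the
## handler instead of stopping the script.
ok <- tryCatch(
  download.file("file:///no/such/file", tempfile(), quiet = TRUE) == 0,
  error = function(e) FALSE
)
```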
Setting Proxies:

This applies to the internal code only.

Proxies can be specified via environment variables. Setting no_proxy to * stops any proxy being tried. Otherwise the setting of http_proxy or ftp_proxy (or failing that, the all upper-case version) is consulted and if non-empty used as a proxy site. For FTP transfers, the username and password on the proxy can be specified by ftp_proxy_user and ftp_proxy_password. The form of http_proxy should be http://proxy.dom.com/ or http://proxy.dom.com:8080/ where the port defaults to 80 and the trailing slash may be omitted. For ftp_proxy use the form ftp://proxy.dom.com:3128/ where the default port is 21. These environment variables must be set before the download code is first used: they cannot be altered later by calling Sys.setenv.
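Concretely, a session behind a proxy might set these variables at the very start, before the first download; the host names and ports are hypothetical. Since later changes are not picked up by the internal method, putting the equivalent lines in the '.Renviron' file is the more robust route.

```r
## Hypothetical proxy settings; must run before the first download of
## the session (or be set in .Renviron / the shell environment).
Sys.setenv(http_proxy = "http://proxy.dom.com:8080/",
           ftp_proxy  = "ftp://proxy.dom.com:3128/",
           no_proxy   = "localhost")
```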
Usernames and passwords can be set for HTTP proxy transfers via environment variable http_proxy_user in the form user:passwd. Alternatively, http_proxy can be of the form http://user:pass@proxy.dom.com:8080/ for compatibility with wget. Only the HTTP/1.0 basic authentication scheme is supported.
Much the same scheme is supported by method = "libcurl", including no_proxy, http_proxy and ftp_proxy, and for the last two, contents of the form [user:password@]machine[:port] where the parts in brackets are optional. See http://curl.haxx.se/libcurl/c/libcurl-tutorial.html for details.
Methods which access https:// and ftps:// URLs usually try to verify their certificates. This is usually done using the CA root certificates installed by the OS (although we have seen instances in which these got removed rather than updated).
This is an issue for method = "libcurl" on Windows, where the OS does not provide a suitable CA certificate bundle, so by default on Windows certificates are not verified. To turn verification on, set environment variable CURL_CA_BUNDLE to the path to a certificate bundle file, usually named ‘ca-bundle.crt’ or ‘curl-ca-bundle.crt’. (This is normally done for a binary installation of R, which installs ‘etc/curl-ca-bundle.crt’.)
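On Windows this might look as follows. The path is an assumption for illustration: it points at the bundle a binary installation of R would place under R.home("etc"), and should be adjusted to wherever a CA bundle actually lives.

```r
## Hypothetical: enable certificate verification for method = "libcurl"
## on Windows by pointing libcurl at R's bundled CA file.
Sys.setenv(CURL_CA_BUNDLE = file.path(R.home("etc"), "curl-ca-bundle.crt"))
```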
Files of more than 2GB are supported on 64-bit builds of R; they may be truncated on some 32-bit builds.
Method "wget" is mainly for historical compatibility, but it and "curl" can be used for URLs (e.g., https:// URLs or those that use cookies) which the internal method does not support. Method "wget" can be used with proxy firewalls which require user/password authentication if proper values are stored in the configuration file for wget.
wget (http://www.gnu.org/software/wget/) is commonly installed on Unix-alikes (but not OS X). Windows binaries are available from Cygwin, gnuwin32 and elsewhere.

curl (http://curl.haxx.se/) is installed on OS X and commonly on Unix-alikes. Windows binaries are available at that URL.
See Also:

options to set the HTTPUserAgent, timeout and internet.info options.

url for a finer-grained way to read data from URLs.

url.show, available.packages, download.packages for applications.
Contributed package RCurl provides more comprehensive facilities to download from URLs.