boost/filesystem/path.hpp

Introduction
Grammar for generic path strings
Canonical form
Header synopsis
Class path
Member functions
Non-member functions
Validity checking functions
Rationale
Path decomposition examples

Introduction

Filesystem Library functions traffic in objects of class path, provided by this header. The header also supplies non-member functions for error checking.

For actual operations on files and directories, see boost/filesystem/operations.hpp documentation.

For file I/O stream operations, see boost/filesystem/fstream.hpp documentation.

The Filesystem Library's Common Specifications apply to all member and non-member functions supplied by this header.

Class path

Class path provides a portable mechanism for representing paths in C++ programs, using a portable generic path string grammar. Class path is concerned with the lexical and syntactic aspects of a path. The path does not have to exist in the operating system's filesystem, and may contain names which are not even valid for the current operating system.

Rationale: If Filesystem functions trafficked in std::strings or C-style strings, the functions would provide only an illusion of portability since the function calls would be portable but the strings they operate on would not be portable.

Conceptual model of a path

An object of class path can be conceptualized as containing a sequence of strings, where each string contains the name of a directory, or, in the case of the string representing the element farthest from the root in the directory hierarchy, the name of a directory or file. Such a path representation is independent of any particular representation of the path as a single string.

There is no requirement that an implementation of class path actually contain a sequence of strings, but conceptualizing the contents as a sequence of strings provides a completely portable way to reason about paths.

So that programs can portably express paths as a single string, class path defines a grammar for a portable generic path string format, and supplies constructor and append operations taking such strings as arguments. Because user input or third-party library functions may supply path strings formatted according to operating system specific rules, an additional constructor is provided which takes a system-specific format as an argument.

Access functions are provided to retrieve the contents of an object of class path formatted as a portable path string, a directory path string using the operating system's format, and a file path string using the operating system's format. Additional access functions retrieve specific portions of the contained path.
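
Example: The conceptual model can be made concrete with a small sketch that splits a generic path string into a sequence of element strings. The helper name split_generic is hypothetical and is not supplied by this header. The sketch assumes that "/" separates elements and that a leading "/" is stored as its own element; it does not model root-names, and an actual implementation may work quite differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: split a generic path string
// into the conceptual sequence of element strings.
std::vector<std::string> split_generic(const std::string& src)
{
    // The path constructors require that src contain no embedded '\0'.
    assert(src.find('\0') == std::string::npos);

    std::vector<std::string> elements;
    std::string::size_type pos = 0;
    if (!src.empty() && src[0] == '/') {      // leading root-directory
        elements.push_back("/");
        pos = 1;
    }
    while (pos < src.size()) {
        std::string::size_type next = src.find('/', pos);
        if (next == std::string::npos)
            next = src.size();
        if (next > pos)                       // ignore empty segments
            elements.push_back(src.substr(pos, next - pos));
        pos = next + 1;
    }
    return elements;
}
```

Under these assumptions, split_generic("/foo/bar") yields the three elements "/", "foo", and "bar", and a trailing "/" contributes no element.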

Grammar for portable generic path strings

The grammar is specified in extended BNF, with terminal symbols in quotes:

path ::= [root] [relative-path]  // an empty path is valid
root ::= [root-name] [root-directory]
root-directory ::= "/"
relative-path ::= path-element { "/" path-element } ["/"]
path-element ::= name | parent-directory 
parent-directory ::= ".." 
name ::= char { char }

The following are not valid name char's: x01-x1F, <, >, :, ", /, \, |, *, ?. Although these characters are supported by some operating systems, they are disallowed by so many operating systems that they are banned altogether.
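
Example: The character rule above can be sketched as a simple predicate. The function name is_grammar_name is hypothetical; it is not one of the validity checking functions supplied by this header. The sketch checks only the banned characters and the non-empty requirement, and does not distinguish the parent-directory element ".." from an ordinary name.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate: true if `name` is non-empty and contains none of
// the characters the grammar bans from a name.
bool is_grammar_name(const std::string& name)
{
    if (name.empty())
        return false;                         // name ::= char { char }
    const std::string banned = "<>:\"/\\|*?";
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (c >= 0x01 && c <= 0x1F)
            return false;                     // control characters x01-x1F
        if (banned.find(name[i]) != std::string::npos)
            return false;
    }
    return true;
}

// Usage: these checks hold for the sketch above.
void grammar_name_examples()
{
    assert(is_grammar_name("data.txt"));
    assert(!is_grammar_name("a:b"));          // ':' is banned
    assert(!is_grammar_name("what?"));        // '?' is banned
}
```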

root-name grammar is implementation-defined. root-name must not be present in generic input (the undecorated path constructors); it may be part of the strings returned by path member functions, and may be present in the argument to path constructors with the native decorator.

Although implementation-defined, it is desirable that root-name have a grammar which is distinguishable from other grammar elements, and follow the conventions of the operating system.

The optional trailing "/" in a relative-path is allowed as a notational convenience. It has no semantic meaning and is discarded in conversions to canonical form.

Whether or not a generic path string is actually portable to a particular operating system will depend on the names used.  See the Portability Guide.

Canonical form

Adjacent name, parent-directory elements in m_name are recursively removed.

relative-path does not have a trailing "/".
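
Example: One way to picture the recursive removal is a single pass that keeps a stack of elements and pops a name whenever a ".." follows it. This is a sketch only; the function name to_canonical is hypothetical, and the sketch assumes that every element other than ".." and "/" is a name (root-names are not modeled).

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> names;

// Hypothetical helper: remove adjacent (name, "..") pairs until none remain.
names to_canonical(const names& elements)
{
    names out;
    for (names::size_type i = 0; i < elements.size(); ++i) {
        const std::string& e = elements[i];
        const bool follows_name =
            !out.empty() && out.back() != ".." && out.back() != "/";
        if (e == ".." && follows_name)
            out.pop_back();       // the name and the ".." cancel each other
        else
            out.push_back(e);
    }
    return out;
}

// Usage: foo/bar/../../baz reduces to baz.
void canonical_examples()
{
    const char* src[] = { "foo", "bar", "..", "..", "baz" };
    const names result = to_canonical(names(src, src + 5));
    assert(result.size() == 1 && result[0] == "baz");
}
```

Because the output vector acts as a stack, removing one pair can expose another adjacent pair, which is then removed in turn. A leading ".." has no preceding name and is kept.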

Header boost/filesystem/path.hpp synopsis

namespace boost
{
  namespace filesystem
  {
    enum path_format { native };

    class path
    {
    public:
      // compiler generates copy constructor,
      // copy assignment, and destructor

      // constructors:
      path();
      path( const std::string & src );
      path( const char * src );
      path( const std::string & src, path_format );
      path( const char * src, path_format );

      // append operations:
      path & operator /= ( const path & rhs );
      path   operator /  ( const path & rhs ) const;

      // conversion functions:
      const std::string & string() const;
      std::string native_file_string() const;
      std::string native_directory_string() const;
      
      // decomposition functions:
      path        root_path() const;
      std::string root_name() const;
      std::string root_directory() const;
      path        relative_path() const;
      std::string leaf() const;
      path        branch_path() const;
      
      // query functions: 
      bool empty() const;
      bool is_complete() const;
      bool has_root_path() const;
      bool has_root_name() const;
      bool has_root_directory() const;
      bool has_relative_path() const;
      bool has_leaf() const;
      bool has_branch_path() const;
      
      // iteration:
      typedef implementation-defined iterator;
      iterator begin() const;
      iterator end() const;

    private:
      std::vector<std::string> m_name;  // for exposition only
    };

    path operator / ( const char * lhs, const path & rhs );
    path operator / ( const std::string & lhs, const path & rhs );

    // Also see Undocumented non-member functions below

  }
}

For the sake of exposition, class path member functions are described as if the class contains a private member std::vector<std::string> m_name. Actual implementations may differ.

Class path member, or non-member operator/, functions may throw a filesystem_error exception if the path is not in the syntax specified for the grammar.

Note: There is no guarantee that a path object represents a path which is considered valid by the current operating system. A path might be invalid to the operating system because it contains invalid names (too long, invalid characters, and so on), or because it is a partial path still as yet unfinished by the program. An invalid path will normally be detected at time of use, such as by one of the Filesystem Library's operations or fstream functions.

Portability Warning: There is no guarantee that a path object represents a path which would be portable to another operating system. A path might be non-portable because it contains names which another operating system considers too long or which contain characters that are invalid there. Validity checking functions are supplied to ensure names in paths are as portable as desired, but they must be explicitly called by the user.

System-specific Representation

Several path member functions return representations of m_name in formats specific to the operating system. These formats are implementation defined. If an m_name element contains characters which are invalid under the operating system's rules, and there is an unambiguous translation between the invalid character and a valid character, the implementation is required to perform that translation. For example, if an operating system does not permit lowercase letters in file or directory names, these letters will be translated to uppercase if unambiguous. Such translation does not apply to generic path string format representations.
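
Example: The uppercase translation mentioned above might look like the following sketch for an imagined operating system that has no lowercase letters in names. The function name to_native_name is hypothetical, and real implementations decide for themselves which translations are unambiguous.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical translation for an imagined operating system that does not
// permit lowercase letters in file or directory names.
std::string to_native_name(const std::string& name)
{
    std::string result(name);
    for (std::string::size_type i = 0; i < result.size(); ++i)
        result[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(result[i])));
    return result;
}

// Usage: only the system-specific form is translated.
void native_name_examples()
{
    assert(to_native_name("data.txt") == "DATA.TXT");
}
```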

Representation example

The rule-of-thumb is to use string() when a generic string representation of the path is required, and use either native_directory_string() or native_file_string() when a string representation formatted for the particular operating system is required.

The differences between the representations returned by string(), native_directory_string(), and native_file_string() are illustrated by the following code:

path my_path( "foo/bar/data.txt" );
std::cout
  << "string------------------: " << my_path.string() << '\n'
  << "native_directory_string-: " << my_path.native_directory_string() << '\n'
  << "native_file_string------: " << my_path.native_file_string() << '\n';

On POSIX systems, the output would be:

string------------------: foo/bar/data.txt
native_directory_string-: foo/bar/data.txt
native_file_string------: foo/bar/data.txt

On Windows, the output would be:

string------------------: foo/bar/data.txt
native_directory_string-: foo\bar\data.txt
native_file_string------: foo\bar\data.txt

On classic Mac OS, the output would be:

string------------------: foo/bar/data.txt
native_directory_string-: foo:bar:data.txt
native_file_string------: foo:bar:data.txt

On a hypothetical operating system using OpenVMS format representations, it would be:

string------------------: foo/bar/data.txt
native_directory_string-: [foo.bar.data.txt]
native_file_string------: [foo.bar]data.txt

Note that because OpenVMS uses period as both a directory separator character and as a separator between filename and extension, native_directory_string() in the example produces a useless result. On this operating system, the programmer should only use this path as a file path. (There is a portability recommendation to not use periods in directory names.)
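
Example: The outputs shown above can be reproduced for this one relative path by joining the element names with a system-specific separator. The sketch below is an illustration only: the helper names are hypothetical, root-names and root-directories are not handled, and the actual native formats are implementation defined.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> names;

// Hypothetical helper: join relative-path elements with a separator.
std::string join_names(const names& n, const std::string& sep)
{
    std::string s;
    for (names::size_type i = 0; i < n.size(); ++i) {
        if (i != 0)
            s += sep;
        s += n[i];
    }
    return s;
}

// OpenVMS-style directory form: every element is a directory, [a.b.c]
std::string vms_directory_string(const names& n)
{
    return "[" + join_names(n, ".") + "]";
}

// OpenVMS-style file form: the last element is the file name, [a.b]c
std::string vms_file_string(const names& n)
{
    if (n.empty())
        return std::string();
    const names dirs(n.begin(), n.end() - 1);
    const std::string prefix =
        dirs.empty() ? std::string() : "[" + join_names(dirs, ".") + "]";
    return prefix + n.back();
}

// Usage: the element names of "foo/bar/data.txt".
void representation_examples()
{
    const char* src[] = { "foo", "bar", "data.txt" };
    const names n(src, src + 3);
    assert(join_names(n, "/")  == "foo/bar/data.txt");    // POSIX
    assert(join_names(n, "\\") == "foo\\bar\\data.txt");  // Windows
    assert(join_names(n, ":")  == "foo:bar:data.txt");    // classic Mac OS
    assert(vms_file_string(n)      == "[foo.bar]data.txt");
    assert(vms_directory_string(n) == "[foo.bar.data.txt]");
}
```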

Caution for POSIX and UNIX programmers

POSIX and other UNIX-like operating systems have a single root, while most other operating systems have multiple roots. Multi-root operating systems require a root-name such as a drive, device, disk, volume, or share name for a path to be resolved to an actual specific file or directory. Because of this, the root_path() and root_directory() functions return identical results on UNIX and other single-root operating systems, but different results on multi-root operating systems. Thus use of the wrong function will not be apparent on UNIX-like systems, but will result in non-portable code which will fail when used on multi-root systems. UNIX programmers are cautioned to use particular care in choosing between root_path() and root_directory(). If undecided, use root_path().

The same warning applies to has_root_path() and has_root_directory().

Member functions

constructors

path();

Effects: Default constructs an object of class path.

path( const std::string & src );
path( const char * src );

Precondition: src conforms to the generic path string grammar relative-path syntax with optional root-directory prefix, and contains no embedded '\0' characters.

Effects: For each src element, m_name.push_back( element ).

Postcondition: m_name is in canonical form.

Rationale: These constructors are not explicit because an intended use is automatic conversion of strings to paths.

path( const std::string & src, path_format );
path( const char * src, path_format );

Precondition: src conforms to the operating system's grammar for path strings, and contains no embedded '\0' characters.

Effects: For each src element, m_name.push_back( element ).

Postcondition: m_name is in canonical form.

operator /=

path & operator/=( const path & rhs );

Effects: If any of the following conditions are met, then m_name.push_back("/").

 Then append rhs.m_name to m_name.

(Footnote: Thus on Windows, (path("//share") /= "foo").string() is "//share/foo")

Returns: *this

Postcondition: m_name is in canonical form.

Rationale: It is not considered an error for rhs to include a root-name because it might be relative, and thus valid. For example, on Windows, the following must succeed:

path p( "c:", native );
p /= "/foo";
assert( p.string() == "c:/foo" );

operator /

path operator/ ( const path & rhs ) const;

Returns: path( *this ) /= rhs

Rationale: Operator / is supplied because together with operator /=, it provides a convenient way for users to supply paths with a variable number of elements. For example, initial_directory() / "src" / test_name. Operator+ and operator+= were considered as alternatives, but deemed too easy to confuse with those operators for std::string. Operator<< and operator<<= were used until, during public review, it was pointed out that / and /= match the generic path syntax.

Note: Also see non-member operator/ functions.

string

const std::string & string() const;

Returns: The contents of m_name, formatted according to the rules of the generic path string grammar.

Note: If any m_name elements originated from the system specific constructors, there is no guarantee that the returned string is unambiguous according to the grammar. A root-name indistinguishable from a relative-path name, a name containing "/", a name "..", and a root-name beyond the first element all could cause ambiguities. Such an ambiguous representation might still be useful for some purposes, such as display. If no m_name elements originated from the system specific constructors, the returned string is always unambiguous.

See: Representation example above.

native_file_string

std::string native_file_string() const;

Returns: The contents of m_name, formatted in the system-specific representation of a file path.

See: Representation example above.

Naming rationale: The name is deliberately ugly to warn users that this function yields non-portable results.

native_directory_string

std::string native_directory_string() const;

Returns: The contents of m_name, formatted in the system-specific representation of a directory path.

See: Representation example above.

Naming rationale: The name is deliberately ugly to warn users that this function yields non-portable results.

root_path

path root_path() const;

Returns: root_name() / root_directory()

Portably provides a copy of a path's full root path, if any. See Path decomposition examples.

root_name

std::string root_name() const;

Returns: If !m_name.empty() && m_name[0] is a root-name, returns m_name[0], else returns a null string.

Portably provides a copy of a path's root-name, if any. See Path decomposition examples.

root_directory

std::string root_directory() const;

Returns: If the path contains root-directory, then string("/"), else string().

Portably provides a copy of a path's root-directory, if any. The only possible results are "/" or "". See Path decomposition examples.

relative_path

path relative_path() const;

Returns: A new path containing only the relative-path portion of the source path.

Portably provides a copy of a path's relative portion, if any. See Path decomposition examples.
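
Example: The three root and relative decomposition functions can be pictured as operations on the exposition-only m_name sequence. The sketch below is not the library's implementation. In particular, the test used to recognize a root-name (an element ending in ":" or beginning with "//") is an assumption chosen to match the Windows rows of the Path decomposition examples; the real root-name grammar is implementation-defined. All function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> names;

// ASSUMPTION for illustration only: a root-name ends with ':' (drive or
// device) or begins with "//" (share).
bool looks_like_root_name(const std::string& e)
{
    return (!e.empty() && e[e.size() - 1] == ':')
        || (e.size() > 2 && e.compare(0, 2, "//") == 0);
}

std::string root_name_of(const names& m_name)
{
    return (!m_name.empty() && looks_like_root_name(m_name[0]))
        ? m_name[0] : std::string();
}

std::string root_directory_of(const names& m_name)
{
    // The root-directory, if present, directly follows the root-name.
    const names::size_type i = root_name_of(m_name).empty() ? 0 : 1;
    return (i < m_name.size() && m_name[i] == "/")
        ? std::string("/") : std::string();
}

names relative_elements_of(const names& m_name)
{
    names::size_type skip = 0;
    if (!root_name_of(m_name).empty())      ++skip;
    if (!root_directory_of(m_name).empty()) ++skip;
    return names(m_name.begin() + skip, m_name.end());
}

// Usage: the elements of "c:/foo" as listed for Windows.
void decomposition_examples()
{
    const char* src[] = { "c:", "/", "foo" };
    const names p(src, src + 3);
    assert(root_name_of(p) == "c:");
    assert(root_directory_of(p) == "/");
    assert(relative_elements_of(p).size() == 1);
    assert(relative_elements_of(p)[0] == "foo");
}
```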

leaf

std::string leaf() const;

Returns: empty() ? std::string() : m_name.back()

A typical use is to obtain the undecorated name of a directory entry from the path returned by a directory_iterator. See Path decomposition examples.

branch_path

path branch_path() const;

Returns: m_name.size() <= 1 ? path("") : x, where x is a path constructed from all the elements of m_name except the last.

A typical use is to obtain the parent path for a path supplied by the user. See Path decomposition examples.
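
Example: In terms of the exposition-only m_name sequence, the Returns clauses of leaf() and branch_path() can be sketched as follows. The function names are hypothetical, and the sketch works on a plain vector of strings rather than on class path.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> names;

// Sketch of leaf(): the last element, or an empty string.
std::string leaf_of(const names& m_name)
{
    return m_name.empty() ? std::string() : m_name.back();
}

// Sketch of branch_path(): every element except the last.
names branch_elements_of(const names& m_name)
{
    if (m_name.size() <= 1)
        return names();
    return names(m_name.begin(), m_name.end() - 1);
}

// Usage: the elements of "/foo/bar".
void leaf_branch_examples()
{
    const char* src[] = { "/", "foo", "bar" };
    const names p(src, src + 3);
    assert(leaf_of(p) == "bar");
    const names branch = branch_elements_of(p);
    assert(branch.size() == 2 && branch[0] == "/" && branch[1] == "foo");
}
```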

empty

bool empty() const;

Returns: m_name.empty().

Naming rationale: Because the equivalent function for standard library containers is named empty(), prior versions with other names caused numerous typos. The problem was exacerbated because tests for path emptiness are often used near, or in the same expression as, tests for string emptiness.

is_complete

bool is_complete() const;

Returns: For single-root operating systems, has_root_directory(). For multi-root operating systems, has_root_directory() && has_root_name().

Naming rationale: The alternate name, is_absolute(), causes confusion and controversy because on multi-root operating systems some people believe root_name() should participate in is_absolute(), and some don't.
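
Example: The Returns clause of is_complete() can be written out as a small function. This is a sketch only; the function name is hypothetical, and its three parameters stand in for has_root_name(), has_root_directory(), and a property of the operating system.

```cpp
#include <cassert>

// Sketch of the is_complete() Returns clause.
bool is_complete_sketch(bool has_root_name, bool has_root_directory,
                        bool multi_root_os)
{
    return multi_root_os ? (has_root_directory && has_root_name)
                         : has_root_directory;
}

// Usage: the same generic path can be complete on one system and not on
// another.
void is_complete_examples()
{
    assert(is_complete_sketch(false, true, false));   // "/foo", single-root
    assert(!is_complete_sketch(false, true, true));   // "/foo", multi-root
    assert(!is_complete_sketch(true, false, true));   // "c:foo", multi-root
    assert(is_complete_sketch(true, true, true));     // "c:/foo", multi-root
}
```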

has_root_path

bool has_root_path() const;

Returns: has_root_name() || has_root_directory()

has_root_name

bool has_root_name() const;

Returns: !root_name().empty()

has_root_directory

bool has_root_directory() const;

Returns: !root_directory().empty()

has_relative_path

bool has_relative_path() const;

Returns: !relative_path().empty()

has_leaf

bool has_leaf() const;

Returns: !leaf().empty()

has_branch_path

bool has_branch_path() const;

Returns: !branch_path().empty()

iterator

typedef implementation-defined iterator;

A const iterator meeting the C++ Standard Library requirements for bidirectional iterators (24.1). The iterator is a class type (so that operator++ and -- will work on temporaries). The value, reference, and pointer types are std::string, const std::string &, and const std::string *, respectively.

begin

iterator begin() const;

Returns: m_name.begin()

end

iterator end() const;

Returns: m_name.end()

Non-member functions

Non-member operator /

path operator / ( const char * lhs, const path & rhs );
path operator / ( const std::string & lhs, const path & rhs );

Returns: path( lhs ) /= rhs

Validity checking functions

Undocumented non-member functions

The header boost/filesystem/path.hpp also supplies several non-member functions which can be used to verify that a path meets certain requirements. These subsidiary functions are undocumented pending more research and discussion, and should not be relied upon as they are likely to change.

Rationale

Function naming: Class path member function names and operations.hpp non-member function names were chosen to be somewhat distinct from one another. The objective was to avoid cases like foo.empty() and empty( foo ) both being valid, but with completely different semantics. At one point path::empty() was renamed path::is_null(), but that caused many coding typos because std::string::empty() is often used nearby.

Decomposition functions: Decomposition functions are provided because without them it is impossible to write portable path manipulations. Convenience is also a factor.

Const vs non-const returns: In some earlier versions of the library, member functions returned values as const rather than non-const. See Scott Meyers, Effective C++, Item 21. The const qualifiers were eliminated (1) to conform with C++ Standard Library practice, (2) because non-const returns allow occasionally useful expressions, and (3) because the coding errors eliminated were deemed rare. A requirement that path::iterator not be a non-class type was added to eliminate non-const iterator errors.

Path decomposition examples

It is often useful to extract specific elements from a path object.  While any decomposition can be achieved by iterating over the elements of a path, convenience functions are provided which are easier to use, more efficient, and less error prone.

The first column of the table gives the example path, formatted by the string() function. The second column shows the values which would be returned by dereferencing each element iterator. The remaining columns show the results of various expressions.

p.string()  Elements     p.root_path()  p.root_name()  p.root_directory()  p.relative_path()  p.root_directory()   p.root_name()        p.branch_path()  p.leaf()
                                                                                              / p.relative_path()  / p.relative_path()

All systems
/           /            /              ""             /                   ""                 /                    ""                   ""               /
foo         foo          ""             ""             ""                  foo                foo                  foo                  ""               foo
/foo        /,foo        /              ""             /                   foo                /foo                 foo                  /                foo
foo/bar     foo,bar      ""             ""             ""                  foo/bar            foo/bar              foo/bar              foo              bar
/foo/bar    /,foo,bar    /              ""             /                   foo/bar            /foo/bar             foo/bar              /foo             bar

Windows
c:          c:           c:             c:             ""                  ""                 ""                   c:                   ""               c:
c:foo       c:,foo       c:             c:             ""                  foo                foo                  c:foo                c:               foo
c:/         c:,/         c:/            c:             /                   ""                 /                    c:                   c:               /
c:/foo      c:,/,foo     c:/            c:             /                   foo                /foo                 c:foo                c:/              foo
//shr       //shr        //shr          //shr          ""                  ""                 ""                   //shr                ""               //shr
//shr/      //shr,/      //shr/         //shr          /                   ""                 /                    //shr                //shr            /
//shr/foo   //shr,/,foo  //shr/         //shr          /                   foo                /foo                 //shr/foo            //shr/           foo
prn:        prn:         prn:           prn:           ""                  ""                 ""                   prn:                 ""               prn:

© Copyright Beman Dawes, 2002

Revised 11 March, 2003