This tutorial introduces the basics of the Document Object Model (DOM) API.
As shown in Usage at a glance, JSON can be parsed into a DOM, the DOM can then be queried and modified easily, and finally it can be converted back to JSON.
Each JSON value is stored in a type called Value. A Document, representing the DOM, contains the root Value of the DOM tree. All public types and functions of RapidJSON are defined in the rapidjson namespace.
In this section, we will use excerpts of example/tutorial/tutorial.cpp.
Assume we have a JSON stored in a C string (const char* json):
Parse it into a Document:
The JSON is now parsed into document as a DOM tree.
Under RFC 4627, the root of a conforming JSON must be either an object or an array (the later RFC 7159 allows any JSON value as the root). In this case, the root is an object.
Query whether a "hello" member exists in the root object. Since a Value can contain different types of value, we may need to verify its type and use a suitable API to obtain the value. In this example, the "hello" member is associated with a JSON string.
JSON true/false values are represented as bool.
JSON null can be queried by IsNull().
JSON number type represents all numeric values. However, C++ needs a more specific type for manipulation.
JSON array contains a number of elements.
Note that RapidJSON does not automatically convert values between JSON types. If a value is a string, it is invalid to call GetInt(), for example. In debug mode it will fail an assertion. In release mode, the behavior is undefined.
In the following, details about querying individual types are discussed.
By default, SizeType is a typedef of unsigned. On most systems, an array is therefore limited to storing up to 2^32-1 elements.
You may access the elements of an array by integer literal, for example, a[1], a[2]. However, a[0] will generate a compiler error. This is because two overloaded operators, operator[](SizeType) and operator[](const char*), are available, and C++ can treat 0 as a null pointer. Workarounds:
a[SizeType(0)]
a[0u]
Array is similar to std::vector: instead of using indices, you may also use iterators to access all the elements.
And other familiar query functions:
SizeType Capacity() const
bool Empty() const
Similar to array, we can iterate object members by iterator:
Note that when operator[](const char*) cannot find the member, it will fail an assertion.
If we are unsure whether a member exists, we need to call HasMember() before calling operator[](const char*). However, this incurs two lookups. A better way is to call FindMember(), which checks the existence of the member and obtains its value in a single lookup:
JSON provides a single numerical type called Number. A Number can be an integer or a real number. RFC 4627 says the range of Number is specified by the parser.
As C++ provides several integer and floating point number types, the DOM tries to handle these with widest possible range and good performance.
When the DOM parses a Number, it stores it as one of the following types:
Type | Description |
---|---|
unsigned | 32-bit unsigned integer |
int | 32-bit signed integer |
uint64_t | 64-bit unsigned integer |
int64_t | 64-bit signed integer |
double | 64-bit double precision floating point |
When querying a number, you can check whether the number can be obtained as the target type:
Checking | Obtaining |
---|---|
bool IsNumber() | N/A |
bool IsUint() | unsigned GetUint() |
bool IsInt() | int GetInt() |
bool IsUint64() | uint64_t GetUint64() |
bool IsInt64() | int64_t GetInt64() |
bool IsDouble() | double GetDouble() |
Note that an integer value may be obtained in various ways without conversion. For example, a value x containing 123 will make x.IsInt() == x.IsUint() == x.IsInt64() == x.IsUint64() == true. But a value y containing -3000000000 will only make y.IsInt64() == true.
When obtaining the numeric values, GetDouble() will convert the internal integer representation to a double. Note that int and unsigned can be safely converted to double, but int64_t and uint64_t may lose precision (since the mantissa of double has only 52 explicit bits, giving 53 bits of precision).
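The precision limit can be checked with plain C++, independent of RapidJSON. SurvivesDoubleRoundTrip is a hypothetical helper; it assumes IEEE 754 doubles and inputs well inside the int64_t range, so that the cast back from double is defined.

```cpp
#include <stdint.h>

// Hypothetical helper: converts `v` to double and back, and reports whether
// the original integer is recovered exactly.
// Assumes |v| is far below 2^63 so the double-to-integer cast cannot overflow.
bool SurvivesDoubleRoundTrip(int64_t v) {
    double d = static_cast<double>(v);
    return static_cast<int64_t>(d) == v;
}
```

Every 32-bit integer survives the round trip, and so does 2^53, but 2^53 + 1 does not, because it needs 54 significant bits.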
In addition to GetString(), the Value class also contains GetStringLength(). Here is why.
According to RFC 4627, JSON strings can contain the Unicode character U+0000, which must be escaped as "\u0000". The problem is that C/C++ often uses null-terminated strings, which treat '\0' as the terminator symbol.
To conform to RFC 4627, RapidJSON supports strings containing U+0000. If you need to handle this, you can use the GetStringLength() API to obtain the correct length of the string.
For example, suppose a JSON string with the value "a\u0000b" has been parsed into Document d. The correct length of the value is 3, but strlen() returns 1.
GetStringLength() can also improve performance, as the user may otherwise need to call strlen() when allocating a buffer.
Besides, std::string also supports a constructor that accepts the length of the string as a parameter, string(const char* s, size_t count). This constructor supports storing null characters within the string, and should also provide better performance.
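A small standard-library sketch of that constructor follows. CopyWithLength is a hypothetical wrapper; with a Value v holding a string, the arguments would typically be v.GetString() and v.GetStringLength().

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from a buffer and an explicit
// length, so embedded '\0' characters are kept instead of ending the string.
std::string CopyWithLength(const char* data, std::size_t length) {
    return std::string(data, length);
}
```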
There are several ways to create values. After a DOM tree is created and/or modified, it can be saved as JSON again using Writer.
When creating a Value or Document by the default constructor, its type is Null. To change its type, call SetXXX() or the assignment operator, for example:
There are also overloaded constructors for several types:
To create an empty object or array, you may use SetObject()/SetArray() after the default constructor, or use the Value(Type) constructor in one shot:
A distinctive design decision in RapidJSON is that assigning a value does not copy the source value to the destination value. Instead, the value is moved from the source to the destination. For example,
Why? What is the advantage of this semantics?
The simple answer is performance. For fixed-size JSON types (Number, True, False, Null), copying them is fast and easy. However, for variable-size JSON types (String, Array, Object), copying them incurs a lot of overhead, and this overhead often goes unnoticed, especially when we need to create a temporary object, copy it to another variable, and then destruct it.
For example, suppose normal copy semantics were used when adding an array contacts as a member of an object o. The object o would need to allocate a buffer of the same size as contacts and make a deep clone of it, and then contacts would finally be destructed. This would incur a lot of unnecessary allocations/deallocations and memory copying.
There are solutions that avoid actually copying the data, such as reference counting and garbage collection (GC).
To make RapidJSON simple and fast, we chose to use move semantics for assignment. It is similar to std::auto_ptr, which transfers ownership during assignment. Move is much faster and simpler: it just destructs the original value, uses memcpy() to copy the source to the destination, and finally sets the source to Null type.
So, with move semantics, the above example becomes:
This is called the move assignment operator in C++11. As RapidJSON supports C++03, it implements move semantics in the ordinary assignment operator and in all other modifying functions, such as AddMember() and PushBack().
Sometimes, it is convenient to construct a Value in place, before passing it to one of the "moving" functions, like PushBack() or AddMember(). As temporary objects can't be converted to proper Value references, the convenience function Move() is available:
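RapidJSON provides two strategies for storing strings: copy-string, which copies the source data into a buffer it allocates, and const-string, which simply stores a pointer to the string.
Copy-string is always safe because it owns a copy of the data. Const-string can be used for storing string literals, and for in-situ parsing, which is covered in the Document section.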
To make memory allocation customizable, RapidJSON requires the user to pass an allocator instance whenever an operation may require allocation. This design is more flexible than the STL's allocator type per class, as we can assign an allocator instance for each allocation.
Therefore, when we assign a copy-string, we call this overloaded SetString() with an allocator:
In this example, we get the allocator from a Document instance. This is a common idiom when using RapidJSON. But you may use other allocator instances.
Besides, the above SetString() requires the length. This can handle null characters within a string. There is another overload of SetString() without the length parameter; it assumes the input is null-terminated and calls a strlen()-like function to obtain the length.
Finally, a string literal or a string with a safe life cycle can use the const-string version of SetString(), which has no allocator parameter. For string literals (or constant character arrays), simply passing the literal as a parameter is safe and efficient:
For plain string pointers, RapidJSON requires marking a string as safe before using it without copying. This can be achieved by using the StringRef function:
A Value with array type provides APIs similar to those of std::vector.
Clear()
Reserve(SizeType, Allocator&)
Value& PushBack(Value&, Allocator&)
template <typename T> GenericValue& PushBack(T, Allocator&)
Value& PopBack()
Note that Reserve(...) and PushBack(...) may allocate memory for the array elements, and therefore require an allocator.
Here is an example of PushBack():
Unlike the STL, PushBack()/PopBack() return the array reference itself. This is called a fluent interface.
If you want to add a non-constant string or a string without sufficient lifetime (see Create String) to the array, you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a temporary value in place:
Object is a collection of key-value pairs. Each key must be a string value. An object is manipulated by adding and removing members:
Value& AddMember(Value&, Value&, Allocator& allocator)
Value& AddMember(StringRefType, Value&, Allocator&)
template <typename T> Value& AddMember(StringRefType, T value, Allocator&)
bool RemoveMember(const Ch*)
Here is an example.
The StringRefType used as the name parameter follows the same interface as the SetString function for string values. These overloads avoid the need to copy the name string, as constant key names are very common in JSON objects.
If you need to create a name from a non-constant string or a string without sufficient lifetime (see Create String), you need to create a string Value by using the copy-string API. To avoid the need for an intermediate variable, you can use a temporary value in place:
If we really need to copy a DOM tree, we can use two APIs for deep copy: the constructor with allocator, and CopyFrom().
Swap() is also provided.
Swapping two DOM trees is fast (constant time), regardless of the complexity of the trees.
This tutorial shows the basics of DOM tree query and manipulation. There are several important concepts in RapidJSON:
You can use Reader/Writer to implement even faster applications. Also try PrettyWriter to format the JSON.
You may also refer to the FAQ, API documentation, examples and unit tests.