Partitioning classes

Properties

gpointer                    partitioning        Write / Construct Only
gboolean                    infer-dictionary    Read / Write
GArrowSchema *              schema              Read / Write
GADatasetSegmentEncoding    segment-encoding    Read / Write

Object Hierarchy

    GEnum
    ╰── GADatasetSegmentEncoding
    GObject
    ├── GADatasetPartitioning
    │   ╰── GADatasetKeyValuePartitioning
    │       ╰── GADatasetDirectoryPartitioning
    ╰── GADatasetPartitioningOptions

Includes

#include <arrow-dataset-glib/arrow-dataset-glib.h>

Description

GADatasetPartitioningOptions is a class for partitioning options.

GADatasetPartitioning is a base class for partitioning classes such as GADatasetDirectoryPartitioning.

GADatasetKeyValuePartitioning is a base class for key-value style partitioning classes such as GADatasetDirectoryPartitioning.

GADatasetDirectoryPartitioning is a class for partitioning based on the directory structure of file paths.
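
To make the directory-based scheme concrete, the following is a minimal sketch in plain C that does not use Arrow at all. It assumes that each path segment maps positionally to one partition field, so with the fields year and month the path /2021/12 yields year=2021 and month=12. The names sketch_parse_directory_path and SketchPair are hypothetical and exist only for this illustration; the real parsing happens inside Arrow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_VALUE 64

/* Hypothetical pair type: one partition field name and the value taken
 * from the path segment at the same position. */
typedef struct {
  const char *key;
  char value[SKETCH_MAX_VALUE];
} SketchPair;

/* Splits `path` on '/' and pairs the i-th non-empty segment with the
 * i-th field name. Returns the number of pairs written to `pairs`,
 * which must have room for `n_fields` entries. Segments beyond
 * `n_fields` are ignored and over-long segments are truncated to fit
 * the fixed-size value buffer. */
static size_t
sketch_parse_directory_path(const char *path,
                            const char *const *field_names,
                            size_t n_fields,
                            SketchPair *pairs)
{
  assert(path != NULL);
  assert(field_names != NULL);
  assert(pairs != NULL);

  size_t n_pairs = 0;
  const char *cursor = path;

  while (*cursor != '\0' && n_pairs < n_fields) {
    /* Skip separators, including a leading '/'. */
    while (*cursor == '/') {
      cursor++;
    }
    if (*cursor == '\0') {
      break;
    }
    /* Length of the current segment, up to the next '/' or the end. */
    size_t length = strcspn(cursor, "/");
    size_t copy_length = length;
    if (copy_length > SKETCH_MAX_VALUE - 1) {
      copy_length = SKETCH_MAX_VALUE - 1;
    }
    pairs[n_pairs].key = field_names[n_pairs];
    memcpy(pairs[n_pairs].value, cursor, copy_length);
    pairs[n_pairs].value[copy_length] = '\0';
    n_pairs++;
    cursor += length;
  }
  return n_pairs;
}
```

Calling sketch_parse_directory_path("/2021/12", names, 2, pairs) with names set to {"year", "month"} fills pairs with the two key-value pairs described above. Values stay as strings here; converting them to the field's data type is left out of the sketch.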

Functions

gadataset_partitioning_options_new ()

GADatasetPartitioningOptions *
gadataset_partitioning_options_new (void);

Returns

The newly created GADatasetPartitioningOptions.

Since: 6.0.0


gadataset_partitioning_new ()

GADatasetPartitioning *
gadataset_partitioning_new (void);

Returns

The newly created GADatasetPartitioning, which performs no partitioning.

Since: 6.0.0


gadataset_partitioning_get_type_name ()

gchar *
gadataset_partitioning_get_type_name (GADatasetPartitioning *partitioning);

Parameters

partitioning

A GADatasetPartitioning.

Returns

The type name of partitioning.

It should be freed with g_free() when no longer needed.

Since: 6.0.0


gadataset_directory_partitioning_new ()

GADatasetDirectoryPartitioning *
gadataset_directory_partitioning_new (GArrowSchema *schema,
                                      GList *dictionaries,
                                      GADatasetPartitioningOptions *options,
                                      GError **error);

Parameters

schema

A GArrowSchema that describes all partitioned segments.

dictionaries

A list of GArrowArray for dictionary data types in schema.

[nullable][element-type GArrowArray]

options

A GADatasetPartitioningOptions.

[nullable]

error

Return location for a GError or NULL.

[nullable]

Returns

The newly created GADatasetDirectoryPartitioning on success, NULL on error.

Since: 6.0.0

Types and Values

enum GADatasetSegmentEncoding

These values correspond to the arrow::dataset::SegmentEncoding values.

Members

GADATASET_SEGMENT_ENCODING_NONE

No encoding.

GADATASET_SEGMENT_ENCODING_URI

Segment values are URL-encoded.

Since: 6.0.0
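
To show what URL-encoded means for a path segment, here is a minimal percent-decoding sketch in plain C. It is not Arrow's implementation. The function names sketch_hex_value and sketch_decode_segment are hypothetical, and the sketch assumes that only %XX escapes need handling. With GADATASET_SEGMENT_ENCODING_NONE no such step would be applied and the segment would be used as is.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the numeric value of a hexadecimal digit, or -1 if `c` is
 * not a hexadecimal digit. */
static int
sketch_hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Percent-decodes `segment` into `out`, whose capacity `out_size`
 * includes the terminating NUL. Returns 0 on success, or -1 on a
 * malformed escape or when the buffer is too small. */
static int
sketch_decode_segment(const char *segment, char *out, size_t out_size)
{
  assert(segment != NULL);
  assert(out != NULL);

  size_t written = 0;
  while (*segment != '\0') {
    char decoded = *segment;
    if (*segment == '%') {
      /* The second digit is only read when the first one is valid, so
       * the terminating NUL is never skipped. */
      int high = sketch_hex_value(segment[1]);
      int low = high < 0 ? -1 : sketch_hex_value(segment[2]);
      if (high < 0 || low < 0) return -1;
      decoded = (char)(high * 16 + low);
      segment += 2;
    }
    /* Keep one byte free for the terminating NUL. */
    if (written + 1 >= out_size) return -1;
    out[written++] = decoded;
    segment++;
  }
  if (written >= out_size) return -1;
  out[written] = '\0';
  return 0;
}
```

For example, the segment New%20York decodes to New York, while a truncated escape such as %2 is rejected.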


GADATASET_TYPE_PARTITIONING_OPTIONS

#define GADATASET_TYPE_PARTITIONING_OPTIONS (gadataset_partitioning_options_get_type())

struct GADatasetPartitioningOptionsClass

struct GADatasetPartitioningOptionsClass {
  GObjectClass parent_class;
};

GADATASET_TYPE_PARTITIONING

#define GADATASET_TYPE_PARTITIONING (gadataset_partitioning_get_type())

struct GADatasetPartitioningClass

struct GADatasetPartitioningClass {
  GObjectClass parent_class;
};

GADATASET_TYPE_KEY_VALUE_PARTITIONING

#define GADATASET_TYPE_KEY_VALUE_PARTITIONING (gadataset_key_value_partitioning_get_type())

struct GADatasetKeyValuePartitioningClass

struct GADatasetKeyValuePartitioningClass {
  GADatasetPartitioningClass parent_class;
};

GADATASET_TYPE_DIRECTORY_PARTITIONING

#define GADATASET_TYPE_DIRECTORY_PARTITIONING (gadataset_directory_partitioning_get_type())

struct GADatasetDirectoryPartitioningClass

struct GADatasetDirectoryPartitioningClass {
  GADatasetKeyValuePartitioningClass parent_class;
};

GADatasetDirectoryPartitioning

typedef struct _GADatasetDirectoryPartitioning GADatasetDirectoryPartitioning;

GADatasetKeyValuePartitioning

typedef struct _GADatasetKeyValuePartitioning GADatasetKeyValuePartitioning;

GADatasetPartitioning

typedef struct _GADatasetPartitioning GADatasetPartitioning;

GADatasetPartitioningOptions

typedef struct _GADatasetPartitioningOptions GADatasetPartitioningOptions;

Property Details

The “partitioning” property

  “partitioning”             gpointer

The raw std::shared_ptr<arrow::dataset::Partitioning> *.

Owner: GADatasetPartitioning

Flags: Write / Construct Only


The “infer-dictionary” property

  “infer-dictionary”         gboolean

When inferring a schema for partition fields, yield dictionary-encoded types instead of plain types. This can be more efficient when materializing virtual columns, and Expressions parsed by the finished Partitioning will include dictionaries of all unique inspected values for each field.

Owner: GADatasetPartitioningOptions

Flags: Read / Write

Default value: FALSE

Since: 6.0.0


The “schema” property

  “schema”                   GArrowSchema *

Optionally, an expected schema can be provided, in which case inference will only check discovered fields against the schema and update internal state (such as dictionaries).

Owner: GADatasetPartitioningOptions

Flags: Read / Write

Since: 6.0.0


The “segment-encoding” property

  “segment-encoding”         GADatasetSegmentEncoding

After a path is split into components, each component is decoded according to this scheme before it is parsed.

Owner: GADatasetPartitioningOptions

Flags: Read / Write

Default value: GADATASET_SEGMENT_ENCODING_URI

Since: 6.0.0