Extending Airtunnel

Airtunnel was built to be easily extensible. To give some inspiration of what can be done and how, this section walks through some common use cases.

Extra Declaration Properties

As noted, the data asset declaration options still cover only a (too) small set of sensible defaults.

To still let you store any properties you like in the declaration store, we created the (schemaless) extra section.

Example

An example use case would be to additionally declare a maintainer alongside the data asset declaration.

We can easily define this in my_asset.yaml using extra declarations:

type: derived

extra:
    maintainer: Tom

Using this value is then as simple as:

from airtunnel.declaration_store import DataAssetDeclaration
d = DataAssetDeclaration("my_asset")
# access the declared extra section:
d.extra_declarations

…where the value of d.extra_declarations will simply be: {'maintainer': 'Tom'}
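Since extra_declarations is just a plain dictionary, you can use it anywhere in your pipeline code. As a small sketch, the hypothetical helper below looks up the declared maintainer to tag an alert message (notify_maintainer and the print-based alerting are illustrative assumptions, not part of Airtunnel):

from airtunnel.declaration_store import DataAssetDeclaration

def notify_maintainer(asset_name: str, message: str) -> None:
    # hypothetical helper: look up the declared maintainer and alert them
    declaration = DataAssetDeclaration(asset_name)
    maintainer = declaration.extra_declarations.get("maintainer", "unknown")
    # the plain print is a placeholder for your own mail/chat integration
    print(f"[{asset_name}] attention {maintainer}: {message}")

# e.g. called from an Airflow on_failure_callback:
notify_maintainer("my_asset", "rebuild failed")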

Custom MetaAdapter

As introduced in the architecture section, Airtunnel’s metadata is stored using so-called metadata adapters – the default being SQLMetaAdapter, which uses SQLAlchemy to interact with an Airflow connection. To use your own custom adapter, implement it by subclassing airtunnel.metadata.adapter.BaseMetaAdapter and specify the new class in the [airtunnel] section of your airflow.cfg using its full reference, like this:

metadata_store_adapter_class = yourpackage.yoursubpackage.RedisMetaAdapter

You can verify it is being properly loaded by running:

from airtunnel.metadata.adapter import get_configured_meta_adapter
get_configured_meta_adapter()

…which should then return: <class 'yourpackage.yoursubpackage.RedisMetaAdapter'>
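A minimal skeleton for such an adapter could look like the sketch below. The class name RedisMetaAdapter is just the example from above; the exact set of abstract methods to implement is defined on BaseMetaAdapter, so consult the base class for the actual interface:

from airtunnel.metadata.adapter import BaseMetaAdapter

class RedisMetaAdapter(BaseMetaAdapter):
    """Sketch of a custom metadata adapter backed by Redis (hypothetical).

    Implement each abstract method declared on BaseMetaAdapter here,
    translating Airtunnel's metadata entities into reads and writes
    against your own backend.
    """

    ...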

Custom DataStoreAdapter

Airtunnel’s physical data store resides on the local filesystem by default, for which LocalDataStoreAdapter acts as the bridge carrying out operations such as moving, copying, or listing files.

Since we expect many users will want their data store located on, for example, cloud storage, Airtunnel can be easily extended to support this.

To use a custom DataStoreAdapter, implement it by subclassing airtunnel.data_store.BaseDataStoreAdapter and specify the new class in the [airtunnel] section of your airflow.cfg using its full reference, like this:

data_store_adapter_class = yourpackage.yoursubpackage.S3DataStoreAdapter

You can verify it is being properly loaded by running:

from airtunnel.data_store import get_configured_data_store_adapter
get_configured_data_store_adapter()

…which should then return: <class 'yourpackage.yoursubpackage.S3DataStoreAdapter'>
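As with the custom metadata adapter, a skeleton for a custom data store adapter could look like the sketch below. S3DataStoreAdapter is just the example name from above; the abstract methods covering operations like moving, copying, and listing files are defined on BaseDataStoreAdapter, so consult the base class for the exact signatures:

from airtunnel.data_store import BaseDataStoreAdapter

class S3DataStoreAdapter(BaseDataStoreAdapter):
    """Sketch of a custom data store adapter backed by S3 (hypothetical).

    Implement each abstract method declared on BaseDataStoreAdapter
    (moving, copying, listing files and the like) using your storage
    backend's client, for example boto3 in the case of S3.
    """

    ...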