Physical data store setup
=========================

Folder setup
~~~~~~~~~~~~

In plain terms of a folder hierarchy, please lay out your physical data store as follows (the section below explains why):

.. code:: shell

    ├── archive
    ├── ingest
    │   ├── archive
    │   └── landing
    ├── ready
    └── staging
        ├── intermediate
        ├── pickedup
        └── ready

Folder meanings and motivation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each data layer must store data persistently and efficiently. This is why the physical data store - be it a SQL system, an S3 bucket or a cloud-enabled data lake service - is a key building block, worthy of a rigorous structure. In a scalable, filesystem-like storage medium, for example, a common structure includes the following folders:

**ingest**: Raw files as received from sources.

- **landing**: Incoming data files are loaded 1:1 into this folder and suffixed with a timestamp of arrival.
- **archive**: Files from landing that have been consumed by a pipeline are moved into this folder.

**ready**: Processed data assets that are ready to be read from, each one in its own folder. Ready assets may be further partitioned using a Hive/Spark-style folder convention, in which each subfolder is named after its partition predicate. This makes it easy to read data assets seamlessly with Hive, Spark or PyArrow (see the read sketch at the end of this page).

**staging**: Files in staging are not for general consumption because they are incomplete or currently being worked on. It is further broken down into **pickedup** (data that was moved there from *ingest/landing*), **intermediate** (any kind of temporary data for intermediate processing steps) and **ready** (where the next version for the ready layer is produced). Airtunnel's load operator uses an atomic move operation (or SQL transaction) to push finished data from *staging/ready* to *ready*, so consumers never run into access issues or read half-finished files (an illustrative atomic move is sketched at the end of this page).

**archive**: Whenever a new version of an asset in ready has been computed and it is valuable to keep a copy of the previous run, move it here under ``[asset-name]/[load-time]/``.

**export** (optional): Files to be exported to other consumers, which will never be re-introduced into the data assets contained in the folders above. Examples include final CSV exports, front-end specific data and reports.

Physical data store adapters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Airtunnel currently uses the local filesystem as the physical data store. It can however be extended to use, for example, cloud storage providers; see :doc:`Extending Airtunnel`.
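
For illustration, the sketches below assume the default local filesystem store with a hypothetical root folder of ``./data_store``; the folder root, asset and column names used in them are made-up examples and not part of Airtunnel itself. This first sketch simply creates the folder layout shown at the top of this page:

.. code:: python

    from pathlib import Path

    # Hypothetical root of the physical data store -- adjust to your environment.
    DATA_STORE_ROOT = Path("./data_store")

    # The folder layout described above, relative to the data store root.
    FOLDERS = [
        "archive",
        "ingest/archive",
        "ingest/landing",
        "ready",
        "staging/intermediate",
        "staging/pickedup",
        "staging/ready",
    ]

    for folder in FOLDERS:
        # parents=True creates intermediate folders, exist_ok=True makes the
        # script safe to re-run on an already initialized store.
        (DATA_STORE_ROOT / folder).mkdir(parents=True, exist_ok=True)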
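
To illustrate why the atomic move from *staging/ready* to *ready* matters, here is a minimal local-filesystem sketch; this is not Airtunnel's actual load operator, and the asset name ``my_asset`` as well as the root ``./data_store`` are assumptions for the example:

.. code:: python

    import os
    from pathlib import Path

    DATA_STORE_ROOT = Path("./data_store")   # hypothetical store root
    ASSET_NAME = "my_asset"                  # hypothetical data asset

    staged = DATA_STORE_ROOT / "staging" / "ready" / ASSET_NAME / "data.parquet"
    final = DATA_STORE_ROOT / "ready" / ASSET_NAME / "data.parquet"
    final.parent.mkdir(parents=True, exist_ok=True)

    # os.replace() renames atomically on the same filesystem: readers of ready/
    # either see the previous version or the new one, never a half-written file.
    os.replace(staged, final)

On a SQL-based data store, the same guarantee is provided by wrapping the load in a transaction, as mentioned above.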
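
Finally, to illustrate the Hive/Spark-style partitioning mentioned for the *ready* layer, a partitioned asset can be read with PyArrow's dataset API; the asset ``my_asset`` and the partition column ``region`` are again made up for the example:

.. code:: python

    import pyarrow.dataset as ds

    # Hypothetical ready asset, partitioned Hive-style, e.g.:
    #   data_store/ready/my_asset/region=eu/part-0.parquet
    #   data_store/ready/my_asset/region=us/part-0.parquet
    dataset = ds.dataset(
        "data_store/ready/my_asset", format="parquet", partitioning="hive"
    )

    # The partition column "region" is discovered from the folder names and can
    # be used to prune partitions when reading.
    table = dataset.to_table(filter=ds.field("region") == "eu")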