Backup and Restore

Updated by Charudatta Mundale

Backup and Restore (Disaster Recovery)

When the Tellius platform runs on-premises or in customer-controlled public cloud instances, the Customer IT Team (CIT) needs to plan for backup and restore.

Taking regular volume snapshots is the recommended way to back up Tellius. If a disaster occurs, the snapshots can be used to restore the data on a different instance or cluster.

Tellius DevOps (TDO) is available to help identify the components that need to be backed up and to help with deployment on new instances as needed.

The Backup feature lets you back up your data to an AWS S3 bucket or to your local machine. You must be logged in as an admin.

To create a backup of your data:

  1.   Click the BACKUP button.
  2.   Select the location for the backup, for example, S3 or local.

Note: Once the backup is complete, Tellius provides the S3 link to the backup or an option to download the backup file.

To restore your data:

The Restore feature lets you restore an instance from a previously taken backup, either through the S3 link of the backup file or by uploading the backup file from local storage. You can also restore the data of one instance onto another instance. A restore brings back all your Search, Vizpads, Insights, AutoML Models, Metadata, and user-uploaded data.

Types of Deployments

Standalone Deployment

In a standalone environment, all Tellius services write data into multiple directories on the same volume. The TDO team will identify the specific volume that needs to be snapshotted and share the details with the CIT team.

The CIT team is responsible for taking regular snapshots of this volume. In case of a disaster, the CIT team should restore the latest snapshot onto a new volume or a new instance and hand it over to the TDO team.

The TDO team is available to assist with reinstalling the required services on the new instance and restoring all services from the snapshot data.
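As an illustration only, the regular snapshot of the standalone volume could be scripted roughly as follows on AWS. This is a minimal sketch, not a Tellius-provided tool: the function name `snapshot_volume`, the `tellius-backup` description prefix, and the `purpose` tag are hypothetical, and `ec2` is assumed to be a boto3 EC2 client (or any object exposing the same `create_snapshot` method).

```python
from datetime import datetime, timezone


def snapshot_volume(ec2, volume_id, now=None):
    """Request one EBS snapshot of volume_id and return the new snapshot ID.

    ec2 is assumed to be a boto3 EC2 client. The description prefix and the
    tag below are illustrative conventions, not anything Tellius requires.
    """
    now = now or datetime.now(timezone.utc)
    description = f"tellius-backup {volume_id} {now:%Y-%m-%dT%H:%MZ}"
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=description,
        TagSpecifications=[
            {
                "ResourceType": "snapshot",
                "Tags": [{"Key": "purpose", "Value": "tellius-backup"}],
            }
        ],
    )
    return response["SnapshotId"]
```

With boto3 installed and AWS credentials configured, this would be called as `snapshot_volume(boto3.client("ec2"), volume_id)`, where `volume_id` is the volume identified by the TDO team. Running it on a schedule (for example from cron) is left to the CIT team's existing tooling.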

Multi Node Deployment

In a multi-node environment, the Tellius services write data into different volumes mounted onto the instances: EBS volumes on AWS and Azure Disks on Azure.

Two types of volumes are attached to the services within Tellius:

  1. Temporary Data Volumes: used to store temporary, intermediate output. Data loss from these volumes will not cause the user to lose any resources created within Tellius. For instance, Spark worker volumes.
  2. Persistent Data Volumes: used to write persistent data. Data loss from these volumes might cause the user to lose some or all of the resources created within Tellius. For instance, Postgres, MongoDB, Spark, Azkaban, etc. There can be around 13 volumes of this type.
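The distinction between the two volume types can be captured in a small inventory that marks which volumes need snapshots. The sketch below is hypothetical: the `Volume` class, the `volumes_to_snapshot` helper, and the volume IDs are made up for illustration, and the real list of volumes comes from the TDO team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    service: str      # Tellius service that writes to the volume
    volume_id: str    # placeholder ID, not a real cloud volume ID
    persistent: bool  # True if losing the volume can lose user resources


def volumes_to_snapshot(volumes):
    """Return the IDs of the persistent volumes, sorted for stable output."""
    return sorted(v.volume_id for v in volumes if v.persistent)


# Example inventory using the services named above; IDs are placeholders.
inventory = [
    Volume("postgres", "vol-postgres", persistent=True),
    Volume("mongodb", "vol-mongodb", persistent=True),
    Volume("azkaban", "vol-azkaban", persistent=True),
    Volume("spark-worker", "vol-spark-worker", persistent=False),
]

print(volumes_to_snapshot(inventory))
# ['vol-azkaban', 'vol-mongodb', 'vol-postgres']
```

Only the persistent volumes appear in the result; the temporary Spark worker volume is skipped because losing it does not affect resources created within Tellius.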

Types of Backups

Full Backup

In a full backup, all the identified persistent data volumes are snapshotted, and the restore process involves connecting the new cluster to the restored volumes.

The TDO team will identify all the volumes that need to be snapshotted and share the details with the CIT team.

The CIT team is responsible for taking regular snapshots of all the persistent data volumes. In case of a disaster, the CIT team should restore the latest snapshots onto new volumes, which the TDO team will use for the recovery.

The TDO team is available to assist with reinstalling the required services on a new Kubernetes cluster and restoring all services from the snapshot data.
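For an AWS deployment, restoring the latest snapshot of each persistent volume onto a new volume might look like the sketch below. It is an assumption-laden illustration rather than the Tellius procedure: the helper names are hypothetical, the snapshot records are assumed to have the shape returned by the boto3 EC2 `describe_snapshots` call (`SnapshotId`, `VolumeId`, `StartTime`, `State`), and `ec2` is assumed to be a boto3 EC2 client.

```python
def latest_completed_snapshots(snapshots):
    """Map each source VolumeId to the ID of its newest completed snapshot."""
    latest = {}
    for snap in snapshots:
        if snap["State"] != "completed":
            continue  # ignore snapshots that are still pending or that failed
        current = latest.get(snap["VolumeId"])
        if current is None or snap["StartTime"] > current["StartTime"]:
            latest[snap["VolumeId"]] = snap
    return {vol: snap["SnapshotId"] for vol, snap in latest.items()}


def restore_volumes(ec2, snapshot_ids, availability_zone):
    """Create one new volume per snapshot.

    Returns a mapping of source volume ID to newly created volume ID, which
    can be handed over for the recovery.
    """
    restored = {}
    for source_volume, snapshot_id in snapshot_ids.items():
        response = ec2.create_volume(
            SnapshotId=snapshot_id,
            AvailabilityZone=availability_zone,
        )
        restored[source_volume] = response["VolumeId"]
    return restored
```

Keeping the selection step separate from the `create_volume` calls makes it possible to review which snapshots will be used before any new volumes are created.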

Pros

  • Simple Setup
  • Faster Restore

Cons

  • Multiple volume snapshots are needed
  • Snapshot size can be large

Metadata Backup

In a metadata backup, Tellius uses its Backup & Restore service to back up only the necessary metadata from all the services into a single volume.

The restore process uses the Tellius Backup & Restore service to restore the metadata into the different services within Tellius and then recreates the data by pulling it from the configured data sources.

The TDO team will identify the volume used by the Backup & Restore service, which needs to be snapshotted, and share the details with the CIT team.

The CIT team is responsible for taking regular snapshots of this specific volume. In case of a disaster, the CIT team should restore the latest snapshot onto a new volume and hand it over to the TDO team for the recovery.

The TDO team is available to assist with reinstalling the required services on a new Kubernetes cluster and restoring all services and resources from the snapshot.

Pros

  • Single volume snapshot
  • Backs up only a single copy of the relevant data

Cons

  • Both regular backups within Tellius and regular snapshots of the backup volume in the cloud console need to be configured.
  • The restore process can be time-consuming (it can take from a few hours to 1 or 2 days).
