You are either looking to establish a new data warehouse, data lake or data mart in the cloud, or to migrate or expand an existing on-premises solution. As a first step, you need to determine whether Amazon Redshift, one of the most popular cloud data warehouse platform-as-a-service offerings, is the right fit for your needs. Most people consider only the obvious evaluation factors: performance, cost, security, support and compatibility (with data integration, BI and analytical tools). I am not going to delve into those, as you will find numerous posts that already discuss them. Instead, I am going to take up one fundamental criterion and evaluate how well Redshift fits it.
The foremost criterion to consider is the set of immediate and mid-term (next 3 years) use cases for your data warehouse. Are they primarily analytical workloads, or primarily Business Intelligence with some analytical workloads mixed in? The answer to this question determines the governing architecture:
Enterprise Data Warehouse (EDW): An integrated, subject-area-driven warehouse that is optimized for large query processing and supports mixed workloads. It is better suited to use cases that are primarily Business Intelligence with some analytical workloads mixed in.
Logical Data Warehouse (LDW): A warehouse that combines repositories, virtualization and distributed processing. It is well suited to large analytical workloads with some Business Intelligence mixed in.
In recent years, Redshift has significantly improved its performance as an EDW platform and added critical features such as continuous data loading through Amazon Kinesis Firehose and better management of very large data volumes, making it well suited as an EDW solution. Teradata, still considered the industry leader, is also available on the AWS Marketplace, but it is the better-suited option only for its existing customers.
As a data management solution for an LDW, Redshift used to be considered infeasible because it lacked support for managing the variety and volume of data arriving from many sources. Products such as Teradata’s Virtual Query or Microsoft’s PolyBase have helped position competitors as better alternatives for establishing what some call Unified Data. The introduction of Redshift Spectrum in early 2017 changed this position. With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond the data stored on local disks in your data warehouse and query vast amounts of unstructured data in your Amazon S3 “data lake”, without having to load or transform any data. This gives you the freedom to store the data where you want, in the format you want, and have it available for processing when you need it.
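To make the Spectrum idea concrete, here is a minimal sketch in Python that only builds the kind of SQL statements involved: registering an external schema, declaring an external table over files in S3, and joining it with a local Redshift table. Every name in it (the `lake` schema, the `click_events` and `customers` tables, the bucket path, the IAM role ARN and the helper functions themselves) is a hypothetical placeholder and not something from a real environment, and the helpers do no quoting or validation of their inputs. Treat it as an illustration under those assumptions rather than a recommended setup.

```python
def external_schema_ddl(schema, catalog_db, iam_role_arn):
    """Build the statement that maps a Redshift schema to a data catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema, table, columns, s3_location, file_format="PARQUET"):
    """Build the statement that declares a table over files that stay in S3."""
    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical placeholders: replace with values from your own account.
schema_sql = external_schema_ddl(
    "lake", "lake_db", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
)
table_sql = external_table_ddl(
    "lake",
    "click_events",
    [("customer_id", "BIGINT"), ("amount", "DECIMAL(12,2)"), ("event_time", "TIMESTAMP")],
    "s3://example-bucket/click-events/",
)

# One query can combine the S3-resident table with a table loaded into Redshift.
query_sql = (
    "SELECT c.region, SUM(e.amount) AS total_amount\n"
    "FROM lake.click_events e\n"
    "JOIN public.customers c ON c.customer_id = e.customer_id\n"
    "GROUP BY c.region;"
)

statements = [schema_sql, table_sql, query_sql]
for statement in statements:
    print(statement)
    print()
```

The point of the sketch is the last statement: the external table is never loaded into the cluster, yet it can be joined with warehouse tables in ordinary SQL, which is what makes Redshift a candidate for the LDW architecture described above.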
In short, thanks to its regular rollout of features, Redshift is a viable data management solution that supports either architecture of choice. It also scores well on criteria such as performance, cost, security, support and compatibility. Because new features keep arriving, it is prudent to arm yourself with a specialized consulting partner who can establish the right logical and physical architecture, not only for scalability and performance but also to accelerate the implementation of your business use cases.