Redshift ra3 vs. dc2 Performance
Hunter Fernandes
Software Engineer
I have been trying to figure out whether switching from dc2.large nodes to ra3.xlplus nodes is worth it. And nobody has posted their benchmarks online! So I decided to do it myself.
The crux of the issue is that ra3.xlplus nodes have twice the CPU and twice the memory of dc2.large nodes, but about the same IO performance. And ra3.xlplus nodes cost about 4 times as much as dc2.large nodes!
That means ra3.xlplus nodes are roughly twice as expensive per unit of CPU and memory and four times as expensive per unit of IO. The IO gap is particularly concerning because Redshift is an IO-bound system. So why would you ever use ra3.xlplus nodes? Because they effectively let you decouple storage from compute.
All other Redshift nodes are bound to their storage capacity. ra3 nodes, however, have their main storage backing on S3. The hot data is cached on local SSDs, but the cold data is stored on S3. This means you can scale your storage and compute independently, which is a huge advantage for lots-of-data but not-so-much-compute workloads.
The largest IO-bound operation in Redshift is the COPY command for loading data, typically seen during ETL processes. If your ETL process is incremental and you only load a sliver of your data each time, then there is an argument to be made for using ra3 nodes.
But it’s really going to come down to 4x the cost for 1x the IO. Let’s benchmark the COPY command to see how the IO performance of ra3.xlplus nodes compares to dc2.large nodes. Since we’re already comparing across families, let’s also compare within families to see how much of a difference node count makes.
Methodology
To compare COPY performance, which tests network and disk IO, I randomly selected a few ETL jobs across a few orders of magnitude in data size. I selected these because they are representative of the types of ETL jobs we run in production. I’ve included what kind of data they are and how many partitions (files in S3) they have.
Dataset Name | Compressed Size | Number of Partitions |
---|---|---|
Timeseries 1 | 40 GB | 64 |
Timeseries 2 | 18 GB | 23 |
Log 1 | 9.5 GB | 5 |
Log 2 | 3.5 GB | 3 |
OLTP 1 | 0.4 GB | 1 |
For these datasets, the test is defined as the time it takes to run the following query, which loads the data (gzip compressed) from S3 into a table. This is a very standard ETL operation and query.
COPY xxx
FROM 's3://xxx/datasetname'
GZIP DELIMITER '|' ESCAPE TIMEFORMAT 'auto' TRUNCATECOLUMNS;
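As an aside, the per-run timings can be read back from the STL_QUERY system table, which records each statement with its start and end time. A minimal sketch, assuming the S3 path in the COPY text is distinctive enough to pick out the right runs:

-- Duration of recent COPY runs for one dataset, in seconds.
-- Assumes '%datasetname%' only matches the loads we care about.
SELECT query,
       DATEDIFF(seconds, starttime, endtime) AS duration_seconds
FROM stl_query
WHERE querytxt ILIKE 'copy %datasetname%'
ORDER BY starttime DESC
LIMIT 5;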
I tested loading these datasets into Redshift clusters of varying configurations.
Cluster Name | Node Type | Node Count | Concurrency Scaling | Cost Per Month |
---|---|---|---|---|
test-dc2l-2-s | dc2.large | 2 | No | $365 |
test-dc2l-4-s | dc2.large | 4 | No | $730 |
test-dc2l-7-s | dc2.large | 7 | No | $1277.5 |
test-ra3-1-s | ra3.xlplus | 1 | No | $792.78 |
test-ra3-2-s | ra3.xlplus | 2 | No | $1585.56 |
test-ra3-4-s | ra3.xlplus | 4 | No | $3171.12 |
test-ra3-cs-1-s | ra3.xlplus | 1 | Yes | $792.78 |
test-ra3-cs-2-s | ra3.xlplus | 2 | Yes | $1585.56 |
I threw in two concurrency scaling tests to see how that affects performance. Some of these are nightly ETL jobs, which would benefit massively from the “temporary capacity” boost that concurrency scaling can provide. That would let us pay for extra capacity only during the ETL peak instead of paying for capacity that sits idle the rest of the day.
When you compare pricing, the key comparison will be between test-dc2l-4-s and test-ra3-1-s, which both cost in the neighborhood of $730-$800 per month. One ra3.xlplus node costs about as much as four dc2.large nodes ($792.78 vs. $730 per month), so for cost-to-performance to maintain parity, we expect a single ra3.xlplus node to be about 4x “better” than a single dc2.large node.
Results
I ran each of these COPY commands 5 times against each cluster configuration and took the average. The results are below (values are query duration in seconds).
Cluster | Timeseries 1 | Timeseries 2 | Log 1 | Log 2 | OLTP 1 |
---|---|---|---|---|---|
test-dc2l-2-s | 5921 | 2208 | 423 | 170 | 41 |
test-dc2l-4-s | 2752 | 1133 | 235 | 134 | 34 |
test-dc2l-7-s | 1611 | 700 | 150 | 134 | 33 |
test-ra3-1-s | 3735 | 1404 | 269 | 98 | 20 |
test-ra3-2-s | 2393 | 955 | 196 | 97 | 29 |
test-ra3-4-s | 1151 | 507 | 111 | 87 | 29 |
test-ra3-cs-1-s | 3799 | 1422 | 270 | 96 | 21 |
test-ra3-cs-2-s | 2480 | 970 | 198 | 101 | 28 |
Let’s normalize this to the test-dc2l-2-s cluster, as it’s clearly the worst performer. I normalized the times as X = t(test-dc2l-2-s) / t(cluster), meaning that if your query takes a fourth of the time of test-dc2l-2-s, it gets a value of 400%. For example, test-dc2l-4-s loads Timeseries 1 in 2752 seconds versus 5921 seconds for the baseline, which works out to 5921 / 2752 ≈ 215%. This makes price comparisons easier later.
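If you keep the raw timings in a table, this normalization is a simple self-join against the baseline cluster. A minimal sketch, where copy_benchmark_results (cluster, dataset, seconds) is a hypothetical table holding the averaged timings, not something Redshift gives you:

-- Normalize each cluster against the test-dc2l-2-s baseline, per dataset.
-- copy_benchmark_results is a hypothetical table: one row per (cluster, dataset).
SELECT r.cluster,
       r.dataset,
       ROUND(100.0 * b.seconds / r.seconds, 0) AS normalized_pct
FROM copy_benchmark_results r
JOIN copy_benchmark_results b
  ON b.dataset = r.dataset
 AND b.cluster = 'test-dc2l-2-s'
ORDER BY r.dataset, r.cluster;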
Cluster | Timeseries 1 | Timeseries 2 | Log 1 | Log 2 | OLTP 1 |
---|---|---|---|---|---|
test-dc2l-2-s | 100% | 100% | 100% | 100% | 100% |
test-dc2l-4-s | 215% | 195% | 180% | 127% | 121% |
test-dc2l-7-s | 368% | 315% | 282% | 127% | 124% |
test-ra3-1-s | 159% | 157% | 157% | 173% | 205% |
test-ra3-2-s | 247% | 231% | 216% | 175% | 141% |
test-ra3-4-s | 514% | 436% | 381% | 195% | 141% |
test-ra3-cs-1-s | 156% | 155% | 157% | 177% | 195% |
test-ra3-cs-2-s | 239% | 228% | 214% | 168% | 146% |
I am most interested in the closest head-to-head comparison, test-dc2l-4-s and test-ra3-1-s.
Right out of the gate, we see that the ra3.xlplus cluster is slower than the dc2.large cluster on the larger datasets by roughly 15-25%, and the larger the dataset, the worse the ra3.xlplus nodes do compared to the dc2 nodes.
On the other hand, the smaller datasets load faster on ra3: almost 40% faster for Log 2 and about 70% faster for OLTP 1.
The other thing to note is that concurrency scaling was a wash. I tweaked the CS configuration multiple times and it never made a difference; it never kicked in during these tests, no matter how hard I tried to force it.
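If you want to verify that on your own cluster, STL_QUERY exposes a concurrency_scaling_status column that, as I understand it, flags queries routed to a concurrency scaling cluster. A minimal sketch:

-- Count recent COPY statements by concurrency scaling status.
-- A status of 1 should mean the query ran on a concurrency scaling cluster.
SELECT concurrency_scaling_status,
       COUNT(*) AS num_queries
FROM stl_query
WHERE querytxt ILIKE 'copy %'
GROUP BY concurrency_scaling_status;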
OK, we know the price of each cluster, so let’s divide the normalized performance by the relative cost of the cluster to get a cost-to-performance ratio. Once again, I will use test-dc2l-2-s as the baseline ($365/month = 1 cost “unit”). For example, test-dc2l-4-s scores 215% on Timeseries 1 at 2 cost units, giving 2.15 / 2 ≈ 1.08.
Dataset values are given in units of performance per unit of cost; higher is better.
Cluster | Relative Cost | Timeseries 1 | Timeseries 2 | Log 1 | Log 2 | OLTP 1 |
---|---|---|---|---|---|---|
test-dc2l-2-s | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
test-dc2l-4-s | 2 | 1.08 | 0.97 | 0.90 | 0.63 | 0.60 |
test-dc2l-7-s | 3.5 | 1.05 | 0.90 | 0.81 | 0.36 | 0.35 |
test-ra3-1-s | 2.172 | 0.73 | 0.72 | 0.72 | 0.80 | 0.94 |
test-ra3-2-s | 4.344 | 0.57 | 0.53 | 0.50 | 0.40 | 0.33 |
test-ra3-4-s | 8.688 | 0.59 | 0.50 | 0.44 | 0.22 | 0.16 |
test-ra3-cs-1-s | 2.172 | 0.72 | 0.71 | 0.72 | 0.82 | 0.90 |
test-ra3-cs-2-s | 4.344 | 0.55 | 0.52 | 0.49 | 0.39 | 0.34 |
Finally, just our head-to-head comparison normalized against test-dc2l-4-s. Performance per unit of cost, higher is better.
Cluster | Relative Cost | Timeseries 1 | Timeseries 2 | Log 1 | Log 2 | OLTP 1 |
---|---|---|---|---|---|---|
test-dc2l-4-s | 1 | 1 | 1 | 1 | 1 | 1 |
test-ra3-1-s | 1.09 | 0.68 | 0.74 | 0.80 | 1.26 | 1.57 |
Again, for the larger datasets, the ra3.xlplus nodes are between 20% and 30% less cost-effective than the dc2.large nodes. But for the smaller datasets, the ra3.xlplus nodes are between 20% and 60% more cost-effective than the dc2.large nodes.
This is a mixed bag, but it kind of makes sense: the ra3 family’s much weaker IO performance per dollar really hurts it when loading larger datasets.
Another interesting thing to note is that the cost-to-performance ratio roughly levels out as we go from one to two to four ra3.xlplus nodes. This signals acceptable scaling in the ra3 family for larger datasets: you can throw more nodes at the problem and get a roughly linear increase in performance.
Summary
On (1) an ETL-loading workload, (2) a cost-to-performance basis, and (3) compared to dc2, ra3 nodes are about 25% worse for large datasets and 20-60% better for small datasets. My guess is that significantly lower IO performance per dollar is the main reason ra3 nodes fare poorly on the large loads.
One of the ways to work around this is to cut your large nightly ETL jobs into smaller chunks that run throughout the day.
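For example, if the data lands in S3 partitioned by date, each incremental run can COPY just the newest prefix instead of the whole dataset. A minimal sketch; the date-partitioned layout below is an assumption for illustration, not how the datasets above are actually organized:

-- Load only the newest partition rather than the entire dataset.
-- The date-partitioned S3 prefix is hypothetical.
COPY xxx
FROM 's3://xxx/datasetname/date=2024-06-01/'
GZIP DELIMITER '|' ESCAPE TIMEFORMAT 'auto' TRUNCATECOLUMNS;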
Do You Even Have a Choice?
It’s clear that AWS wants to push ra3 nodes as the future of Redshift. They have a lot of features that are only available on ra3 nodes, such as Aurora Zero ETL. Additionally, they removed the ability to buy Reserved Instances for dc2 nodes in February 2024 (and did not announce it…). Regardless of the performance, you might be forced to switch to ra3 nodes in the future.
If you were previously sitting on a lot of data on dc2 nodes, you have two options:
- Stay on dc2 nodes and pay the on-demand price, which is about 30% more expensive than the no-longer-available Reserved Instance price.
- Switch to ra3 nodes and get about 30% less performance for the same price. Add 30% more compute cost to compensate.
Either way, you’re going to be paying more for the same performance. This is a hidden tax.