- Type of search needed by the application (different rows, colored blue)
- Amount of scale needed by the application (colored green)
- Amount of consistency required by the application (colored brown)
Here we present 3D data in a 2D table: the same set of consistency columns is repeated under each scale level. For example, the "DB" in the fourth row, fourth column (the Where row, under Small scale and ACID Transactions) says "if you need WHERE-clause-like search, with small scale and transactions, use a DB". Other cells follow the same idea.
The notation I use is KV: key-value systems, CF: column families, Doc: document-based systems. Question marks in the table mean it might work, but you should verify. The table only gives the recommendations and does not explain exactly how I arrived at them. More details are in the slides, and I will get a writeup out soon. Some of the key ideas are:
- Transactions and joins do not scale well.
- KV scales the most, then the CF and Doc models, then DB. So if KV is good enough, go for that.
- The offline case has time to run MapReduce and walk through the data.
- If you need transactions or joins at scale, your option is a partitioned DB. You have to try it and see, and it might not work; if it does not, you are out of luck.

| | Small (1-3 nodes) | Small (1-3 nodes) | Small (1-3 nodes) | Scalable (10 nodes) | Scalable (10 nodes) | Scalable (10 nodes) | Highly Scalable (1000s of nodes) | Highly Scalable (1000s of nodes) | Highly Scalable (1000s of nodes) |
|---|---|---|---|---|---|---|---|---|---|
| | Loose Consistency | Operation Consistency | ACID Transactions | Loose Consistency | Operation Consistency | ACID Transactions | Loose Consistency | Operation Consistency | ACID Transactions |
| Primary Key | DB/KV/CF | DB/KV/CF | DB | KV/CF | KV/CF | DB? | KV/CF | KV/CF | No |
| Where | DB/CF/Doc | DB/CF/Doc | DB | CF/Doc (?) | CF/Doc (?) | DB? | CF/Doc | CF/Doc | No |
| JOIN | DB | DB | DB | ?? | ?? | ?? | No | No | No |
| Offline | DB/CF/Doc | DB/CF/Doc | DB/CF/Doc | CF/Doc | CF/Doc | No | CF/Doc | CF/Doc | No |
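
To make the way the table is read concrete, here is a minimal sketch that encodes it as a lookup. The cell values are transcribed from the table; the names `SCALES`, `CONSISTENCY`, `TABLE`, and `recommend`, as well as the lowercase keys, are my own illustrative choices and not part of any library. It only mirrors the recommendations and does not capture the reasoning behind them.

```python
# Column layout of the table: three scale levels, each with three consistency levels.
SCALES = ("small", "scalable", "highly_scalable")
CONSISTENCY = ("loose", "operation", "acid")

# One list per search type, in the same left-to-right column order as the table.
# "?" marks cells the table flags as "might work, but verify".
TABLE = {
    "primary_key": ["DB/KV/CF", "DB/KV/CF", "DB",
                    "KV/CF", "KV/CF", "DB?",
                    "KV/CF", "KV/CF", "No"],
    "where":       ["DB/CF/Doc", "DB/CF/Doc", "DB",
                    "CF/Doc?", "CF/Doc?", "DB?",
                    "CF/Doc", "CF/Doc", "No"],
    "join":        ["DB", "DB", "DB",
                    "??", "??", "??",
                    "No", "No", "No"],
    "offline":     ["DB/CF/Doc", "DB/CF/Doc", "DB/CF/Doc",
                    "CF/Doc", "CF/Doc", "No",
                    "CF/Doc", "CF/Doc", "No"],
}


def recommend(search: str, scale: str, consistency: str) -> str:
    """Return the table cell for a (search, scale, consistency) combination."""
    # The consistency columns repeat under each scale level, so the flat
    # column index is scale_index * 3 + consistency_index.
    column = SCALES.index(scale) * len(CONSISTENCY) + CONSISTENCY.index(consistency)
    return TABLE[search][column]


# The example cell from the text: WHERE-like search, small scale, transactions.
print(recommend("where", "small", "acid"))  # DB
```

An answer containing "?" should be read as "try it and verify", and "No" means the table offers no recommendation for that combination.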