Scale Hacking: Cloud Computing, Software and System Performance: Case Study: Handle 1 Billion Events Per Day Using a Memory Grid

Feb 21, 2009

Case Study: Handle 1 Billion Events Per Day Using a Memory Grid

Hi,

We just published our case study on boosting the performance of an affiliate marketing billing system, and Todd Hoff of HighScalability.com wrote a great post about it.

The main highlights of the case study are:
1. How to grow from a system handling 1 million events per day to one handling 1 billion events per day
2. How to keep costs low and avoid millions of USD in equity investment
3. How to grow fast while keeping the road map aligned with the business objectives

Please feel free to read our performance boosting case study and comment on it on this blog.

Best,
Moshe Kaplan
RockeTier. The Performance Experts.

Update #1: Answers to questions I received by email:

How do we provide HA?
We usually deploy the systems in an active/active configuration.

What about crash recovery? If counters are kept in memory only, there is a time window in which a crash will lose the updated counters. Is the client OK with losing some updates, or do you address it in some way?
First of all, many of our clients prefer to avoid the cost of full replication and accept the risk of losing the data held in memory. This is based on simple arithmetic. If you lose 1 minute of business operation on a single server that handles about 400 events/second, you lose about 24,000 events. If every 1,000 events generate a few dozen cents of revenue, you would rather lose those 30 bucks or so than replicate every part of the system.
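
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 400 events/second and the 1 minute outage come from the answer above, while the revenue of 30 cents per 1,000 events is an assumed value standing in for "a few dozen cents", and the function names are hypothetical.

```python
def lost_events(events_per_second: int, outage_seconds: int) -> int:
    """Events that are never counted if a server is down for the whole outage."""
    return events_per_second * outage_seconds


def lost_revenue_cents(events_per_second: int,
                       outage_seconds: int,
                       cents_per_1000_events: float) -> float:
    """Revenue (in cents) attached to the lost events."""
    events = lost_events(events_per_second, outage_seconds)
    return events * cents_per_1000_events / 1000


if __name__ == "__main__":
    # 400 events/second and 60 seconds are taken from the text;
    # 30 cents per 1,000 events is an assumption for illustration only.
    events = lost_events(400, 60)
    cents = lost_revenue_cents(400, 60, 30)
    print(f"lost events: {events}, lost revenue: ${cents / 100:.2f}")
```

Changing `cents_per_1000_events` shows how the loss scales: it stays in the range of a few dollars to a few tens of dollars per server-minute, which is the comparison the client weighs against the cost of replicating every component.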

How do you make sure that other requests to the failed server still get an answer?
The load balancer is smart enough to detect the server failure, change the rotation algorithm, and make sure an alternative server takes care of the processing.
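
As a rough illustration of that behavior, here is a minimal round-robin rotation that skips servers marked as failed. It is a sketch under assumptions, not the load balancer used in the case study: the class name, the method names, and the idea that a separate health check calls `mark_down` are all hypothetical.

```python
class RoundRobinBalancer:
    """Round-robin rotation that skips servers marked as down (illustrative only)."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # Assumed to be called by a health check that detected a failure.
        self._healthy.discard(server)

    def mark_up(self, server):
        # Assumed to be called once the server passes health checks again.
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting from the rotation point.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app1", "app2", "app3"])
    lb.mark_down("app2")
    print([lb.pick() for _ in range(4)])
```

Once a server is marked down, its share of requests is spread over the remaining servers, so in an active/active deployment the surviving nodes need enough spare capacity to absorb it.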

How do we support multi-datacenter HA?
Multi-datacenter HA can be achieved using geo-clustering.

What about customers that require zero loss of data?
Customers that require zero loss of data get an answer using 1) Gigaspaces XAP, which supports on-the-fly data synchronization between two servers, keeping the two servers synchronized to the last operation, and 2) RDMA.
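
To make "synchronized to the last operation" concrete, here is a minimal in-process sketch of synchronous replication for counters: an update is applied to the backup before the primary accepts it, so an acknowledged update always exists on both copies. This is a generic illustration only. It does not use or imitate the Gigaspaces XAP API or RDMA, and every class and method name in it is hypothetical.

```python
class CounterNode:
    """One in-memory copy of the counters (stands in for a server)."""

    def __init__(self):
        self.counters = {}

    def apply(self, key, delta):
        self.counters[key] = self.counters.get(key, 0) + delta


class ReplicatedCounters:
    """Applies every update to the backup before the primary accepts it."""

    def __init__(self, primary: CounterNode, backup: CounterNode):
        self.primary = primary
        self.backup = backup

    def increment(self, key, delta=1):
        # Replicate first: if this step raises, the primary is left untouched
        # and the caller never receives an acknowledgement for the update.
        self.backup.apply(key, delta)
        self.primary.apply(key, delta)
        return self.primary.counters[key]


if __name__ == "__main__":
    primary, backup = CounterNode(), CounterNode()
    counters = ReplicatedCounters(primary, backup)
    counters.increment("clicks:affiliate-42")
    counters.increment("clicks:affiliate-42", 2)
    print(primary.counters == backup.counters)
```

The trade-off is the one discussed in the crash recovery answer: every update now waits for a second copy, which costs latency and hardware in exchange for not losing acknowledged events.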
