Dec 27, 2008

Online Games Industry Software Performance Boosting Case Study

Hi,

My colleague, Romi Kuntsman, referred me to a Big Fish Games software performance boosting case study that was published not long ago by MySQL (now part of SUN).

MySQL is a significant player in the internet-based systems niche, which includes online advertising, SaaS, games and gaming, and many others. Not long ago, internet-based systems were considered less complex than enterprise systems.
This assumption is no longer true. Internet-based systems, and especially the top players in the market, are facing challenges that require unconventional solutions. Players such as Facebook, Google, Big Fish and CJ need to handle millions of events per second, significantly more than most enterprise software and n-tier architecture based systems were designed to handle.

Therefore, MySQL-based systems, once considered low- to mid-range, are now facing the most challenging traffic and load requirements.

MySQL just released a good example of this (I will present several other cases we have dealt with, and are dealing with, in future posts): BigFishGames.com is a fast-growing website with over 25 million unique customer accounts and over 2.5 million visitors per month, and it is well financed (over $80M in the last financing round).

Due to their high growth rate (100%-220% per year in the last few years, according to TechCrunch comments), Big Fish faced performance issues, even though they were already running over 40 different MySQL servers...

What did Big Fish do?

1. Database Profiling and Tuning:
Big Fish's DBA team used the MySQL Query Analyzer. This product provides a consolidated view of query activities and execution details, and it enables quick identification of poorly running queries and tackling their root causes directly in the SQL code. The DBA team caught a "bad" query running 400K times overnight which had never shown up in the query logs. Since the Query Analyzer uses a Service Agent that listens to application queries and performance metrics, the MySQL servers can stay live and operational while being analyzed. There is no need to switch the servers back and forth between online and offline, which eliminates unnecessary risks to server availability and reliability. Moreover, it enables the DBA team to gain real-time statistics, rather than base the analysis on test environment simulations.
After deploying the MySQL Query Analyzer, Big Fish Games tripled (X3) its database performance within three days.

2. In Memory Database and Distributed Cache Layer
Big Fish Games used Memcached as an in-memory database solution and distributed caching layer. With Memcached, online queries are served from the application servers' memory rather than being processed by the database, and the cache is synchronized with the database periodically. This solution improved end user response time, system scalability and database performance (a minimal cache-aside sketch appears after this list).

3. High Performance Servers
Big Fish Games chose Sun Fire x64 servers (not so surprising, since MySQL is owned by SUN) in three different configurations:
  • Sun Fire X2100: for applications which require lots of local disk space but less I/O or CPU speed
  • Sun Fire X4100: for applications which demand fast processors but don't need speedy local disk I/O
  • Sun Fire X4140: for applications where faster local disk I/O via RAID 10 and battery backed up write cache is essential (Memcached servers)
The right hardware selection enabled a 20x performance gain by merely replacing an X4100 server with an X4140 machine (and they probably paid a premium for this selection).
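
Back to the caching layer in item 2: the case study does not include Big Fish's actual code, but a minimal cache-aside sketch in Python (using the python-memcached and MySQLdb client libraries; the table, key scheme and TTL below are invented for illustration) would look roughly like this:

```python
import memcache
import MySQLdb

mc = memcache.Client(['127.0.0.1:11211'])  # Memcached node(s)
db = MySQLdb.connect(host='dbhost', user='app', passwd='secret', db='games')

CACHE_TTL = 300  # seconds; hypothetical value

def get_player_profile(player_id):
    """Cache-aside read: hit Memcached first, fall back to MySQL on a miss."""
    key = 'player:%d' % player_id          # hypothetical key scheme
    profile = mc.get(key)
    if profile is None:                    # cache miss
        cur = db.cursor()
        cur.execute("SELECT name, level FROM players WHERE id = %s", (player_id,))
        profile = cur.fetchone()
        mc.set(key, profile, CACHE_TTL)    # populate the cache for later reads
    return profile
```

Write paths would update MySQL and then delete or refresh the cached key; the periodic synchronization described above keeps the two stores consistent enough for online reads.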

I think this case study is a great example of how a poorly performing system can be boosted by factors of 100x and more. Moreover, such cases are not rare. Many successful players in the market require better performing systems at a fraction of the cost of legacy enterprise systems, without major redesign of their existing servers and while keeping their current code base.

Best Regards,
Moshe,
RockeTier, the performance experts


Dec 24, 2008

The World Summit of Cloud Computing - Presentations and Impressions

Hi,

The IGT just released the presentations and videos of all the lectures from the two-day IGT2008 - World Summit of Cloud Computing. It was a great event with great lecturers and presentations.

You can find my lecture there, both slides and video. In the lecture I presented several aspects of our cloud computing field experience and our strategic view of this field, including: the market's current status, market niches (IaaS, PaaS and SaaS), how to avoid vendor lock-in and how to establish an SLA.
We also discussed why enterprises should get into the market (who will be the next financial giant after the Citi age? Maybe PayPal or Google?).
We presented how cloud computing enables enterprises to reduce both equity (Cap-Ex) and Op-Ex, and why software performance boosting is a major requirement for organizations that move into the cloud and want to keep their Op-Ex low.

I would also like to recommend several other fascinating lectures, such as:
* Stevie Clifton from Animoto with their amazing cloud case study: scaling from 50 CPUs to 3,500 CPUs to meet the Digg effect (we have a client who is going to be the next story in this field, but that is a story for another time)
* Dr Owen O'Malley from Yahoo! and Apache Hadoop with a great lecture on Hadoop and test cases (it is really a great tool, and we are already using it)
* Paul Strong with a not-so-short presentation, but a great discussion about eBay (did you know that eBay reaches 150 billion events per day?) and the requirements of large enterprises from the cloud.

It was a great event and we hope to see you next year in the IGT 2009,

Best,
Moshe,
RockeTier. The performance experts.


Dec 21, 2008

GPU Cloud - VMware moves ahead?

Hi,

Since I wrote about the GPU cloud a week ago, I have met two more software companies that are interested in this topic.

Is there any change in the foreseeable future? The following post by Michael Larabel reveals a possible answer: "Tungsten Graphics has been acquired by VMware. Open source graphics technology development will continue as part of VMware's engineering team". VMware is the leading virtualization player in the market (but weaker in the cloud market), and this move will probably lead to "improving the 3D support within virtualized environments".

Such a move by a key technology player will probably enable cloud service providers to offer better GPU support in the future.

Moshe,
RockeTier


Dec 16, 2008

RockeTier Annual Brunch

PLEASE NOTICE THAT DUE TO WEATHER CONDITIONS, THE EVENT WAS POSTPONED

Hi,
Performance and Cloud Computing posts will be updated soon, in the meanwhile:
RockeTier is having its annual brunch next week on Friday (26/12), and you are welcome to join us. The event will take place in Tel Aviv, Israel on Friday, 26/12 at 11:00. Look forward to good food, good booze and great company. More details on Facebook. The favor of a reply is requested by 21/12/2008.
P.S. It's also Moshe's birthday :-)

Yours Truly.
Moshe and the RockeTiers

Dec 10, 2008

GPU Cloud

Hi,

We are conducting a survey for one of our clients, who is interested in migrating its highly intensive rich media generation system to the cloud.
Since common cloud computing providers lack good GPU support, we looked for a niche player in this field.
However, we did not find one. To be more precise, it seems we are not the only ones looking for such a solution.
It seems like a great opportunity for a hosting provider that wants to find its niche in this growing business.

Ciao,
Moshe,
RockeTier

Dec 8, 2008

Top Performance JVM and Cloud Computing

Hi,
Over the last three months, we at RockeTier have been helping a Java-based SaaS company boost their software system performance and achieve their next success story.
Java? Can it be boosted?
Java is a byte-code compiled language. Therefore, it usually carries several performance limitations compared to fully compiled code (like unmanaged C++).
One of the first recommendations in this project was a JVM migration to JRockit. BEA JRockit is a JVM optimized for the Intel platform, enabling Java applications to run with increased reliability and performance on lower-cost, standards-based platforms. According to industry benchmarks (here and here), BEA JRockit leads in performance over other published RISC-based benchmark results. BEA mentions a boost ratio of 50% to 100%, and these are definitely the numbers we are familiar with.
How does JRockit achieve these results?
Well, they use several smart techniques:
  • Optimized Code Generation - The JVM continuously monitors the running code and adapts the JIT compilation to its actual behavior, rather than relying on general statistics. A background bottleneck detector also collects runtime statistics to detect hot spots caused by frequently executed methods. These statistics are used even at runtime to deliver on-the-spot improvements.
  • Seamless Garbage Collection - It reduces the pauses and operational disruption that garbage collection often causes, using several strategies to match different types of applications and environments (more details can be found here).
Why is it relevant to Cloud Computing?
x86 CPUs are the de facto standard in Cloud Computing. A JVM that better utilizes the cloud machine's CPU requires fewer CPU hours to complete a given task, and therefore saves you money every hour. In other words, using an optimized JVM (and optimized software) helps you reduce your Op-Ex when you choose cloud computing. A quick back-of-the-envelope calculation appears below.
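The workload size and hourly price in this small sketch are made-up assumptions, not figures from BEA or any cloud provider; it only illustrates how a throughput boost translates directly into a smaller CPU-hour bill:

```python
# Hypothetical numbers for illustration only.
cpu_hours_per_month = 1000      # CPU hours a batch workload needs today
price_per_cpu_hour = 0.10       # assumed USD per CPU hour on a cloud provider
boost = 1.5                     # a 50% throughput boost from a faster JVM

current_cost = cpu_hours_per_month * price_per_cpu_hour
boosted_cost = (cpu_hours_per_month / boost) * price_per_cpu_hour

print("current: $%.2f/month, boosted: $%.2f/month, saved: $%.2f/month"
      % (current_cost, boosted_cost, current_cost - boosted_cost))
# current: $100.00/month, boosted: $66.67/month, saved: $33.33/month
```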
And the bottom line?
Surprise: you can download JRockit here, absolutely free of charge...
I think there is no doubt here; it is a call to action.
Start up your engines :-)
Moshe,

Dec 6, 2008

Gigaspaces XAP vs Oracle Coherence

Hi,

We are analyzing several Data Grid solutions these days, including Oracle Coherence and Gigaspaces XAP. During the literature survey we noticed this interesting debate at The Server Side. Take a look at it if you want to know more, while we prepare our analysis.

Best,
Moshe, RockeTier, the performance experts

Dec 4, 2008

The World Summit of Cloud Computing - First Impressions


Hi,

It was a very interesting week in Israel, fully loaded with cloud computing conferences, sessions and seminars. The central event was the World Summit of Cloud Computing, a two-day event with all the top players in the field, including Amazon, Animoto, SUN, eBay, Google, IBM, Cloudera, Yahoo!, Hadoop, and RockeTier. We had a booth at the conference as well. Our performance boosting and cloud computing consulting offerings drew significant attention from market leaders, and it seems we have quite a lot of work to catch up on now.

In the next post I will summarize several notes from the conference.

Best Regards,
Moshe,
RockeTier, the performance experts

Nov 28, 2008

Garage Cloud Computing Event - Short Summary

It was a great evening last night at the GarageGeeks event, along with industry fellows and mates: Avner Algom (IGT), Guy Nirpaz and Nati Shalom (Gigaspaces) and Shlomo Swidler. Each of us gave a presentation related to Cloud Computing. I presented how a startup can take advantage of Cloud Computing. Please find the video attached.

Moshe,
RockeTier


Nov 25, 2008

How to start up using 10 bucks NRE and Cloud Computing

I'm very excited to announce that RockeTier, the performance experts, are going to present at the upcoming GarageGeeks event. We will present how startups can take advantage of Cloud Computing for development, beta sites, presentations and simulations, and yes, even for production.

We found that this model enables startups to cut their costs, reduce the equity they need to raise, and shorten their time to market (TTM).

Don't forget to register

See you there
Moshe
RockeTier

Nov 24, 2008

Online advertisement: how do you handle a billion events per day?

Hi,

It was an interesting day today. I'm taking part in a two-day conference named Affilicon, the first affiliate networks and affiliates conference in Israel.

Why do I care?
Well, we have several clients in this field.

Why are these clients interested in our services?
Well, it seems that online advertising is a major source of high-end systems that must handle thousands of events per second, or billions of events per day!

Are there so many?
Yes, these systems count every banner, text ad or video impression. Think for a second about Google AdWords. Have you taken a look at your conversion rates? Have you thought about how many text ads they serve per day? Have you thought about how they count all these impressions?

How do they handle these rates?
Well, these players have very large server farms (Google, for example, has around 1 million servers in its data centers). They also implement complex solutions to handle this stress, doing it all using commodity servers.

So how does RockeTier help them?
Well, in various ways: 1) boosting their performance, reducing their number of servers by a factor of up to 20; 2) implementing sharding, load balancers and grid solutions in order to split the processing between several servers and reduce the stress on each one (a minimal sharding sketch appears below); and 3) using better algorithms to improve their system.
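
To make point 2 concrete, here is a minimal, hypothetical sketch of hash-based sharding in Python; the shard list and event fields are invented for the example, and this shows the general technique rather than any specific client's implementation:

```python
# Hypothetical list of processing servers (shards).
SHARDS = ['srv-01', 'srv-02', 'srv-03', 'srv-04']

def shard_for(event):
    """Route an ad impression event to a shard by hashing a stable key,
    so each server handles only its share of the billions of daily events."""
    key = event['campaign_id']                 # stable routing key (assumed field)
    return SHARDS[hash(key) % len(SHARDS)]

event = {'campaign_id': 1234, 'type': 'impression'}
print(shard_for(event))   # e.g. 'srv-03' - the same campaign always lands on the same server
```

Because the routing is deterministic, each server can count its own slice of the traffic independently, and the per-shard totals can be summed later.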

Best,
Moshe
RockeTier

Nov 21, 2008

How to minimize testing costs using cloud computing

Hi All,

Development of high quality software requires testing. A lot of testing. It usually requires verifying the new software version on a large number of operating systems. Moreover, if your software system has some miles on the road, you usually need to support several previous versions of it.

How do you handle this large number of servers, each required to cover a specific version, test scenario and operating system? If you are using test driven development, it is probably even harder... You have to get back to the base version every time...

A few years ago, the only option to support such a scenario was a large server farm, where servers were restored from backup at the end of each test. The bottom line: this old-fashioned solution required significant equity and a lot of manual work.

Virtualization changed the market. On a single physical server you can load several logical machines and run your tests. Moreover, if you don't need to verify some of the tested versions, you can avoid turning on those logical machines. You thought it could not get better? Since virtualization platforms can roll back to defined points in time, a.k.a. checkpoints, you can restore the base version with a single click! The bottom line: virtualization enables software developers and ISVs to cut their costs and manual work, and to get much more out of a few machines.

These days you can do even better. Using a CLI you can automate this process: bring up your logical machines just before starting the build and tests, and restore the relevant checkpoint automatically. More information can be found in Jani Järvinen's article.

What about running all these tests automatically without a single penny of NRE? Yes we can! Using cloud computing you can do all of this by paying only for the CPU hours you use, plus a negligible payment for the ongoing storage. Cloud computing providers such as Amazon EC2, Flexiscale and AppNexus enable you to allocate servers on demand, attach a snapshot to them and start running your tests in a short time. Did you know that for just 10 USD you can perform a one hour test cycle on 30 different machines? (A minimal automation sketch appears below.)
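
As a rough illustration of how such a cycle can be automated against EC2, here is a sketch using the boto Python library; the AMI ID, instance count and type are placeholders, and the actual cost depends on the provider's current pricing:

```python
import boto.ec2

# Connect to EC2 (credentials are read from the environment / boto config).
conn = boto.ec2.connect_to_region('us-east-1')

# Launch 30 identical test machines from a prepared snapshot/AMI (placeholder ID).
reservation = conn.run_instances('ami-12345678',
                                 min_count=30, max_count=30,
                                 instance_type='m1.small')
instances = reservation.instances

# ... wait for the instances to boot, push the build, run the test suite ...

# Tear everything down after the one-hour cycle, so you pay only for what you used.
conn.terminate_instances(instance_ids=[i.id for i in instances])
```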

During our own development we need to simulate software systems that handle hundreds of millions of events per day, and since several software engineers work on the same project, and sometimes on the same file, we need to make several builds every day to make sure that the end-of-day version is stable. We found these methods useful; I hope they help you as well.

Best Regards,
Moshe
RockeTier

Nov 11, 2008

The world summit of cloud computing and the cloud computing directory

Hi again,

As you may know, I'm a board member of the IGT (the Israeli Association of Grid Technologies). The IGT annual conference is taking place in a few weeks. This year IGT2008 is focused on cloud computing, as you may conclude from its formal name, "The World Summit of Cloud Computing". You will find at this conference every major player in the field, such as Amazon EC2, eBay, Google, Microsoft, Sun, HP, Gigaspaces, Intel and many more. You will even find us, RockeTier, speaking about our field experience and our strategic view of cloud computing. We will have a booth as well, so we'll be glad if you drop by and visit us.

While arranging the conference we noticed that there is no formal directory mapping the players in this emerging field. Therefore, the IGT created an interactive cloud computing players directory that you can find here. This directory includes an interactive map so you can easily dive in and find the relevant players. We hope you'll find this tool useful, and we'll be glad if you update us if we missed any player in the field.

Waiting for your feedback,

Best
Moshe
RockeTier

Nov 7, 2008

Green IT (Green Computing) - IT as an environmental issue

The environment is a major global issue of the 21st century, due to growing air and water pollution, toxic waste and the consumption of non-renewable natural resources causing global warming.

The IT industry's role in the global effort to protect the environment is becoming prominent, as IT systems are the fastest growing segment of electrical power consumption worldwide.

Building IT systems infrastructure requires expensive non-renewable natural resources for electrical power distribution systems, backup power systems, cooling systems and fire suppression systems. In addition, there is the large electricity consumption itself to consider.

As technology evolves, server costs drop, and many find it easier to buy another server for a new application instead of using their computing resources efficiently. However, the industry is changing, and preferring "green" products over non-"green" products is a growing trend.

Green Computing requires new thinking in order to reduce the use of hazardous materials, to maximize energy efficiency and to promote recycling.

The IT industry has made numerous changes to improve systems' overall energy efficiency:

1st Generation Green Computing

The first generation of Green IT is consolidation. Using this methodology, major enterprises gathered servers from different departments into the enterprise data center. Consolidation enabled these organizations to merge several clients from various departments, using the same application or software architecture, onto a single server, reducing the number of servers and the environmental costs.

2nd Generation Green Computing

These days we are in the middle of the second generation of Green Computing, better known as virtualization. Server virtualization enables enterprises to gather several different servers, which do not share the same architecture or vendor, and place them on a single physical server. Virtualization reduces the number of physical servers needed and usually raises CPU and memory usage from a few percent to high utilization. This technology enables you to do more with every dollar you spend on server hardware and data center floor space. Virtualization has a proven five-month ROI, and it leads to an average 30% data center cost reduction.

3rd Generation Green Computing

Virtualization made a great change in enterprises, turning low utilization production servers into high utilization servers.

Today, every new virtual machine you place on an existing physical server leads to immediate savings, but the savings are limited by the server's resource utilization.

The limiting factors for economic and environmental savings are the CPU, memory and network usage efficiency of every logical machine.

This efficiency is directly connected to software performance. It is common to see that an efficient implementation of a key business process can lead to a 50% reduction in resource utilization. Software performance boosting is considered the 3rd Generation of Green Computing, and it will enable enterprises to reduce both economic and environmental costs.


We at RockeTier are leading the 3rd generation of Green Computing by providing novel methods and methodology to boost software performance. Our methodology improves hardware utilization and reduces the number of servers, floor space usage and electricity consumption.

I highly recommend going over Gartner's review, which identified the top 10 strategic technologies for 2009. Performance is a key component of Green Computing, one of the top 10 strategic technologies in both 2008 and 2009.

The latest Green Computing news is available at CNET and ZDNet.

Save Earth,

Moshe

RockeTier


.NET Web Application Boosting

Hi,

These days we are involved in a large ASP.NET project for a new company in the online advertising field (a great company with a great, innovative product, which I will present when the time comes). The product includes both a large back office system and an impressive real-time/black-box server system which is designed to support 200 million events per day (and counting).

I wanted to share several concepts we are using in order to boost their system and reach these numbers:
1. Using a grid infrastructure - we are using Gigaspaces XAP. This is a great product, which enables us to reach 20 million events per day on a single commodity server and to achieve linear growth. Since our customer is a startup, it is registered to the Gigaspaces startup program, meaning it gets the product free of charge.
2. Using custom HTTP handlers - this technique enables creating a class library assembly that can be called directly, without going through a standard .aspx page. This method achieves a 5-10% boost over the standard ASPX approach.
3. Using async pages - this method turns regular synchronous ASPX web pages into asynchronous ones. Why is it so important? Even interactive web pages may include calls to relatively time-consuming methods such as reading/storing files on disk or consuming web services. In these cases the request occupies a thread from the thread pool until the time-consuming method finishes its job, which prevents other web requests from being handled. Async pages release the bottleneck by processing these time-consuming methods in another thread pool, enabling us to serve more requests on a single server (a language-agnostic sketch of the idea appears after this list). Read some more at: Pluralsight, ASP.NET Resources blog and this one.
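
The third point is ASP.NET-specific, but the underlying idea is language agnostic: keep the request-serving threads free and push slow I/O onto a separate pool. Here is a rough Python sketch of that principle only (not ASP.NET code; the function names and the 2-second delay are invented for illustration):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

io_pool = ThreadPoolExecutor(max_workers=10)   # pool dedicated to slow, blocking work

def slow_backend_call():
    """Stands in for reading a file or calling a web service (blocking)."""
    time.sleep(2)
    return 'payload'

async def handle_request(request_id):
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the serving loop keeps accepting other requests.
    result = await loop.run_in_executor(io_pool, slow_backend_call)
    return '%s -> %s' % (request_id, result)

async def main():
    # Ten "requests" complete in roughly 2 seconds total instead of ~20 sequentially.
    results = await asyncio.gather(*(handle_request(i) for i in range(10)))
    print(results)

asyncio.run(main())
```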

Have a great weekend,
Moshe
RockeTier

Nov 3, 2008

The Cloud Computing Triple Play

I just read a great article by Ted Dziuba at The Register. In this article Ted covers the three different approaches of Amazon, Google and Microsoft to Cloud Computing. Each of these three large players takes its own approach to the topic: EC2 with cheap books... oops... servers, App Engine with lightweight web applications, and Azure with heavyweight Windows/SQL Server/.NET based applications.
Ted also describes the three companies' different marketing approaches.
It is clear that we are getting into a battle between cloud providers who offer open, root-access servers (Amazon, Flexiscale and AppNexus) and providers who give you a closed environment (a walled garden?) that heavily depends on using the provider's web services and APIs (yep, Microsoft and Google definitely seem to share the same perspective...)

Have a bright and shiny day,
Moshe
RockeTier

Oct 28, 2008

My Performance Tips

Here you can find some great tips, based on our vast experience, to help you boost your software system's performance while avoiding a full rewrite of the system or spending a fortune on new hardware.

These tips will help you utilize your system's current potential by following 5 simple steps and remembering a few very important facts.

  1. What is the main bottleneck?
    Did you know that only a small percentage of the code is responsible for the major performance bottlenecks? Detect the source and half your problem is solved (see the profiling sketch at the end of this post).
  2. You are probably asking yourself: "how should I solve the problem?"
    There are many solutions, some require full rewrite of the system, and some are more elegant… First you have to understand your options, and evaluate which solution will return the greatest performance boost with the lowest investment.
  3. Now we can finally get to work! After we have rated the solutions we'll implement the most effective solutions to reach great improvement in a short time frame. Implementation will give us time to plan a long-term solution that will support future business requirements.
  4. Now it's time to think forward! We'll release each bottleneck until we reach the business' requirements.
  5. Scale up and Scale out, using next generation technologies: Grid, Cloud Computing and In-Memory Databases. These solutions can help you increase your servers' performance by 20 times, and enable linear scalability while using low-cost servers.

    Remember - it is not complicated: if your code is ready and working – it is possible to double your system's performance in a short time.

    Your Value:

    Boosting your system's performance will not only keep your clients happy, but will also allow you to meet your business requirements and reduce hardware and 3rd party software costs.

    Help protect the environment - reduce the number of servers and instantly cut power consumption for computing and air conditioning.
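
On step 1 above, a minimal profiling sketch in Python (cProfile ships with the standard library; main() here is a stand-in for your own entry point) that surfaces the small percentage of code eating most of the time:

```python
import cProfile
import pstats

def main():
    # Stand-in for your application's hot path.
    total = 0
    for i in range(1, 50000):
        total += sum(range(i % 500))
    return total

# Profile the run and print the 10 most expensive call sites.
cProfile.run('main()', 'profile.stats')
stats = pstats.Stats('profile.stats')
stats.sort_stats('cumulative').print_stats(10)
```

The few functions at the top of that report are where the tuning effort should go first.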

Oct 27, 2008

RIP Good Times: Open Letter to VCs

Dear Sir/Ma'am
Tough times are ahead and it's the right time to consider solutions to cut expenses.
SaaS and Web 2.0 startups rely on large server farms which cost a lot of money due to hardware costs, software licensing and operation.
More often than not, these companies could run on the same number of servers while serving 4 times the number of users... Moreover, this change can be achieved without changing the architecture, without getting into a project spanning several years, and without spending major NRE.
We at RockeTier deliver results in short time frames (based on a risk/reward mechanism when applicable).
We provide a service based on a unique methodology: analyzing the existing code base, detecting the critical business processes, optimizing the code and achieving the goal: doing much more on the same hardware.

I would be glad if you forwarded this proposition to your portfolio companies' management.

Best Regards
Moshe Kaplan
RockeTier
