Jan 27, 2009

Hibernate Composite Keys Bug

Hi,

While working over the last few weeks with the latest GigaSpaces XAP version, we were upset to find that it does not support composite keys (two or more fields that together serve as the table's primary key) when using the default Hibernate connector.

When we investigated the issue, we found that since GigaSpaces XAP relies heavily on Hibernate, and since Hibernate does not support this, GigaSpaces does not support it either. UPDATE: GigaSpaces does not rely heavily on Hibernate; rather, it provides Hibernate as its default out-of-the-box ORM mapping solution.

It seems that we are not the only ones suffering from this issue in Hibernate (see the error: Expected positional parameter count: 1, actual parameters: [Parent@bec357b] [from Child this where this.id.parent = ?])
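For reference, this is roughly what a composite key mapping looks like with Hibernate/JPA annotations. A minimal sketch, with hypothetical OrderLine/OrderLineId names rather than the actual entities from our system:

// OrderLineId.java - the composite key: two fields that together
// serve as the table primary key
import java.io.Serializable;
import javax.persistence.Embeddable;

@Embeddable
public class OrderLineId implements Serializable {
    private long orderId;
    private int lineNumber;

    // Hibernate requires equals() and hashCode() on composite key classes
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof OrderLineId)) return false;
        OrderLineId other = (OrderLineId) o;
        return orderId == other.orderId && lineNumber == other.lineNumber;
    }

    @Override
    public int hashCode() {
        return (int) (31 * orderId + lineNumber);
    }
}

// OrderLine.java - the entity mapped by the composite key
import javax.persistence.EmbeddedId;
import javax.persistence.Entity;

@Entity
public class OrderLine {
    @EmbeddedId
    private OrderLineId id;

    private String product;
}

Queries then address the key parts through the id component (e.g. from OrderLine ol where ol.id.orderId = ?), which is exactly the style of query that fails above.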

We are working on an application-level workaround for this issue, but due to time pressure we have not yet found the time to dig into the Hibernate source code and fix the issue itself.

Hope somebody will solve it before we get to it again :-)

UPDATE: Please see the comment thread on this post, which discusses GigaSpaces' other external data sources, virtual fields, how to overcome this Hibernate composite keys limitation, and the drawbacks of each approach.

Best Regards,
Moshe. RockeTier. The Performance Experts.

Jan 26, 2009

How much can you get out of your MySQL

Hi,

As you probably know, our team, as always, is taking MySQL to its limits these days. It is our pleasure to share the insights with you.

First, let's make a small assumption for the process: we were dealing with an InnoDB configuration (which MySQL engine should I choose?)

MySQL Performance Benchmarks:
I would like to refer you to several case studies and benchmarks of MySQL:
1. In our tests, while boosting an OLAP mechanism, we reached 12 GROUP BY queries/second on 1 million records.
2. 17 transactions/second were reached on a basic machine (PIII, 256MB RAM, on a 100K records table). However, the benchmark performer never revealed his exact benchmark.
3. A MySQL performance benchmark paper from 2005 reached 500 reads/second on a 10 million records table on an 8-CPU machine (8 MySQL instances), meaning MySQL reached about 60 reads/second per instance in a well-optimized benchmark.
4. Sun, on a Solaris-based machine with 4 AMD Opteron Dual Core Model 875 processors and 16GB RAM (8 MySQL instances), reached 1800 (RW) and 2900 (RO) transactions/second in total, or 220 (RW) and 350 (RO) transactions/second per instance, on a 1 million records table.

Bottom Line:
As a rule of thumb, we recommend not using tables with more than a few dozen million records, and not expecting more than a few dozen reads per second per MySQL instance.

Other useful points for building your scalable software system:
1. MySQL supports triggers. However, we recommend avoiding this feature due to performance issues. If you feel any need for triggers, please implement the logic in the BLL layer instead.
2. MySQL supports identity via @@identity. However, please notice that it does not work well together with triggers (notice our warning above).
3. MySQL supports XML, and both SQL Server "FOR XML" and "OPEN XML" can be implemented using various methods (critics). However, we do NOT recommend using these methods in either MySQL or SQL Server. Usually the database is the bottleneck of any system; therefore, you should avoid any unnecessary operation in the database.
4. INSERT DELAYED: very useful when you want to insert into a table (e.g. log tables and queue-like tables) without waiting until the INSERT is processed. Delayed inserts are performed in the server's free windows and not immediately (see the sketch after this list).
5. Multiple inserts: a multi-valued INSERT performs best in MySQL.
6. Transactions are supported in MySQL; use them when needed.
7. Try/catch-style error handling is supported as well.
8. MySQL row sizes:
- Regular fields take up to 8KB per row.
- You can use VARBINARY, VARCHAR, BLOB and TEXT columns to store more.
- 1000 columns is the limit per table.
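To make items 2, 4, 5 and 6 concrete, here is a minimal JDBC sketch (the table and column names are hypothetical; note that INSERT DELAYED applies only to MyISAM-family tables, not to the InnoDB configuration we assumed above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MySqlTips {
    public static void main(String[] args) throws SQLException {
        Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "password");
        Statement stat = c.createStatement();

        // Item 5: a multi-valued INSERT writes several rows in one statement
        stat.executeUpdate(
                "INSERT INTO event_log (event_type, created) VALUES " +
                "('login', NOW()), ('page_view', NOW()), ('logout', NOW())");

        // Item 4: INSERT DELAYED returns immediately; the server performs
        // the write in a free window (MyISAM/MEMORY/ARCHIVE tables only)
        stat.executeUpdate(
                "INSERT DELAYED INTO event_log (event_type, created) " +
                "VALUES ('heartbeat', NOW())");

        // Item 2: @@identity is a synonym of LAST_INSERT_ID(); it assumes
        // event_log has an AUTO_INCREMENT key, and misbehaves with triggers
        ResultSet rs = stat.executeQuery("SELECT @@identity");
        if (rs.next()) {
            System.out.println("Last generated id: " + rs.getLong(1));
        }

        // Item 6: transactions group related statements atomically
        c.setAutoCommit(false);
        try {
            stat.executeUpdate(
                    "UPDATE accounts SET balance = balance - 100 WHERE id = 1");
            stat.executeUpdate(
                    "UPDATE accounts SET balance = balance + 100 WHERE id = 2");
            c.commit();
        } catch (SQLException e) {
            c.rollback();
            throw e;
        }
        c.close();
    }
}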

Best,
Moshe. RockeTier. The Performance Experts.

Jan 25, 2009

SQL+MapReduce

Hi,

It was only a matter of time, after the MapReduce concept was presented by Google and Hadoop (yes, I still owe you a post about it) was opened to the public by Yahoo! and Apache, until a new solution that fits online queries, and not only batch processing, would be presented to the market.

Greenplum, a Silicon Valley based startup, provides a data warehouse and analytical database that can scale to petabytes while processing a single query in parallel (or, in other words, doing SQL the MapReduce way).

Greenplum already has several interesting clients, including NYSE, Reliance Communications (1 billion CDRs per day) and FOX.

Take a look at this company,

Best,
Moshe. RockeTier, The Performance Experts.

Jan 24, 2009

Java, MySQL and Large Datasets Retrieval

Hi,

As told before, it has been a MySQL week.

We did major work this week solving a performance issue in a reporting component for one of our clients. Since the existing component worked directly against the raw database, it was facing degraded performance as the database and the business grew larger.

Therefore, we designed an OLAP solution that extracts information from the raw tables, groups and summarizes the data, and then creates a compact table from which the data can be easily read.
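The summarization step itself is plain SQL, run periodically from the Java component. A minimal sketch, assuming hypothetical raw_sales and sales_summary tables and a java.sql.Connection named c:

// aggregate the raw table into the compact summary table
Statement stat = c.createStatement();
stat.executeUpdate(
        "INSERT INTO sales_summary (day, product_id, total_amount, num_orders) " +
        "SELECT DATE(created), product_id, SUM(amount), COUNT(*) " +
        "FROM raw_sales " +
        "GROUP BY DATE(created), product_id");
stat.close();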

However, the database is MySQL, and we used Java to implement this mechanism. Unfortunately, it seems that Java and MySQL don't really like each other, or at least don't like large tables: when you try to extract records out of a large MySQL table, you receive an out of memory error in the execute and executeQuery methods.

How to overcome this?
1. As suggested by databases&life, set the fetchSize to Integer.MIN_VALUE. Yes, I know it's a bug, not a feature, but it does solve the issue:

The reason for this behavior is this code in StatementImpl.java of the MySQL JDBC driver:
protected boolean createStreamingResultSet() {
    return ((resultSetType == ResultSet.TYPE_FORWARD_ONLY)
            && (resultSetConcurrency == ResultSet.CONCUR_READ_ONLY)
            && (fetchSize == Integer.MIN_VALUE));
}

And the solution is:

public void processBigTable() throws SQLException {
    // TYPE_FORWARD_ONLY + CONCUR_READ_ONLY + fetchSize == Integer.MIN_VALUE
    // is exactly the combination that makes the driver stream row by row
    PreparedStatement stat = c.prepareStatement(
            "SELECT * FROM big_table",
            ResultSet.TYPE_FORWARD_ONLY,
            ResultSet.CONCUR_READ_ONLY);
    stat.setFetchSize(Integer.MIN_VALUE);

    ResultSet results = stat.executeQuery();

    while (results.next()) {
        // process the current row
        ...
    }
}


2. The other option is doing the paging at the application level, meaning that each time setMaxRows is set to N, and records are extracted only if their id is larger than the largest id extracted before (this assumes big_table_id is an indexed, ever-increasing key):

public void processBigTable() throws SQLException {
    long nRowsNumber = 1;
    long nId = 0;
    while (nRowsNumber > 0) {
        nRowsNumber = 0;
        // fetch the next chunk: only rows whose id is larger than
        // the largest id extracted so far, in id order
        PreparedStatement stat = c.prepareStatement(
                "SELECT * FROM big_table WHERE big_table_id > ? " +
                "ORDER BY big_table_id",
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        stat.setLong(1, nId);
        stat.setMaxRows(10000);
        ResultSet results = stat.executeQuery();

        while (results.next()) {
            ++nRowsNumber;
            nId = results.getLong(1); // assumes big_table_id is the first column
            // process the current row
            ...
        }
        stat.close();
    }
}

Hope you find it as useful as we did, and thanks again to databases&life.

Best
Moshe. RockeTier. The Performance Experts.

Jan 20, 2009

Does MySQL 5.0 work with multi-core processors? yes???


Hi,

I think this week can truly be named: "The MySQL week". So many issues regarding this product. You know, sometimes it is the cost of getting things for free...

Well, first of all, MySQL is a great product. Many startups began with this product, and many giants are using it for their billions-of-events-per-day systems.

However, MySQL has several limitations. One of them is that it does not really support multi-core processors. Yes, I know, MySQL definitely states that it supports multi-threading. However, our analysis found that this is not the case. As you can see in the attached graph, the MySQL machine really works hard; however, since it is a quad-core machine, it reaches only 25% CPU utilization (one core's worth out of four).

To be more accurate, many people around the globe, like Jeremy Kusnetz, Peter (with whom Sun reached performance on 256 cores) and starnixhacks, have come to the same conclusion: MySQL does use multi-threading in its peripheral components, but when it gets to the real work, it does not really exploit multi-core processors.

This can really be defined as a major issue, since modern CPUs are based on slower clocks and more cores... So what can we do?

Well, the answer is sharding...
In a few words, sharding is the internet companies' way to install many weak databases, each dealing with a vertical or horizontal part of the data. This way you can install, on a machine with two quad-core CPUs, 8 MySQL instances, each dealing with a single partition of your application (for example, the first stores customers whose names start with 'A', while the second stores those that start with 'B', 'C' and 'D', and so on); a sketch of such routing appears below. A broader review of this method will be provided in the coming weeks.
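As an illustration only (the class, URLs and credentials here are hypothetical, not an actual client setup), application-level routing by the first letter of the customer name could look like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ShardRouter {
    // one JDBC URL per MySQL instance, e.g. 8 instances on one machine
    private final String[] shardUrls;

    public ShardRouter(String[] shardUrls) {
        this.shardUrls = shardUrls;
    }

    // spread 'A'..'Z' evenly over the available instances
    public Connection connectionFor(String customerName) throws SQLException {
        char first = Character.toUpperCase(customerName.charAt(0));
        int shard = (first - 'A') * shardUrls.length / 26;
        shard = Math.min(Math.max(shard, 0), shardUrls.length - 1);
        return DriverManager.getConnection(shardUrls[shard], "user", "password");
    }
}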

Anyhow, I recommend the MySQL Optimization guide as a must-read for any would-be MySQL tuning professional.

Moshe.
RockeTier, The Performance Experts

Jan 19, 2009

RockeTier Agile Development Methodology

Hi,

We at RockeTier believe that almost any software system can do better by changing and modifying only a small portion of its existing code base.
We have proven in several cases that using this methodology you can gain major business value in a short time; or, in other words, Agile.

Therefore, it was only a matter of time before we migrated our development team to the Agile methodology (you guessed right, our software team develops high-load software systems which process hundreds of millions of events per day for our clients). This change enabled us to provide business value in shorter time frames.

Our development methodology is based on the following basics:

1. System-wide design - the product owner is responsible for providing a long-term road map for the system, including software architecture, database architecture, middleware, non-functional requirements, timeline and so on, based on inputs from the client or an internal product manager. This is an important component which Agile evangelists sometimes tend to neglect.

2. Product backlog - the product owner is responsible for breaking the long-term architecture into business processes. Each business process is analyzed, confirmed by the client and used as an input for the sprint backlog. Many times business processes are not complete, but describe current needs in order to gain business value, knowing that these requirements will change in the future.

3. Sprint backlog - each business process is broken into a list of tasks (usually by the programmer). This list is documented in a central repository and written on the team whiteboard.

4. Daily cycle status review - each day a peer meeting is held at 10AM, synchronizing open issues with closed ones and adding high-priority tasks (usually bugs).

5. Daily version - each day at 4PM we upload a new "for review" version which includes all the committed new features added to the software in the last 24 hours.

6. Weekly version - based on the last week's "for review" versions, we upload a new version on Wednesday at 12:00, enabling our team and the client to examine the system before the weekend.

7. Quality - we believe that people achieve the best results when they are responsible for their products. Therefore, most QA is done by the programmers themselves and the team leader:
- Peer review and pair programming - we often use these methods when developing sensitive components and when time is short, to achieve the best quality quickly and to reduce risk.
- Code review - done by the team leader before a new feature is committed to the SVN repository.
- Code management - we use SVN as the company standard, but also consider using Git, in order to commit (locally, on your own computer) every change you make. That way you can return to any point in time without breaking everybody's code while your changes are incomplete; only when the code is ready do you commit your changes to SVN.
- TDD - we use test-driven development to make sure that code is not broken between cycles, and that new changes will not harm current business processes.

8. Knowledge management - knowing that sharing knowledge helps our people achieve the best results, a few weeks ago we opened a new code snippets and best methodologies blog. This is an open blog by nature, so the community can enjoy our insights and experience.

We have seen a great improvement in our development products since implementing this method, including: reducing the time needed to bring new people to productivity, reducing errors and misunderstandings, reducing time to market and reducing bugs.

Keep agile,
Moshe, RockeTier. The performance experts.

Jan 15, 2009

Amazon Web Services Meetup Tel Aviv

Hi,

There is still room at the next Amazon Web Services Meetup Tel Aviv. It will be held at Tel Aviv University.

The schedule is:
5:00 - 5:30 Networking
5:30 - 5:45 Opening remarks, Prof. Mel Rosenberg
5:45 - 6:15 Amazon Web Services Overview, Martin Buhr
6:15 - 7:00 Customer Panel
7:00 - 7:15 Open question and answer session
7:15 - 7:30 Closing Remarks, Martin Buhr
7:30 - 8:00 Networking


See you there
Moshe,
RockeTier

Jan 6, 2009

Secure your Amazon EC2 Database

Hi,

An important issue that must be addressed in every system, and certainly in an internet-based one, is security. When we build highly scalable systems, we face this issue time after time.

Therefore, when you install a new system on Amazon EC2, with a MySQL database in it, you should consider how to secure and restrict access to the system and the database.

Amazon provides a set of solutions to support these needs:
1. Firewall: Amazon EC2 provides a firewall solution. This inbound firewall is configured in a default-deny mode, and you must explicitly open ports to allow inbound traffic. The traffic may be restricted by protocol, by service port, as well as by source IP address (individual IP or CIDR block). More can be found here.
2. Security groups: a security group is the easiest way to aggregate several EC2 servers based on role, access needs and functionality: database, application, web, etc. (if you are familiar with VLAN segments and FW legs, it is very similar indeed).
Every server is associated at start-up with a security group, so access to the server is restricted based on the group definitions. This is done by defining FW rules between Amazon EC2 security groups, and between a security group and the world (for example, HTTP may be enabled from the world to the web servers security group, while the database will allow access on 3306 only to the application servers security group and your company's static IP).
3. Anti-DDoS, IP spoofing and other issues are addressed here.

So how should I start?
1. Define your server roles and the security groups. Tip: usually they should match your AMIs.
2. Define the FW rules between the groups, and the logical rules between each group, the world, your secured office connection (a static IP is recommended) and every other relevant 3rd party.
3. Define the security groups in Amazon (see the ec2-...-group RTFM).
4. Define your FW rules; see more here to learn the exact syntax, and in the ec2-authorize RTFM. A sketch of steps 3 and 4 appears below.
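A minimal sketch of steps 3 and 4, assuming the classic EC2 API command-line tools and hypothetical group names, account id and office IP (check the relevant RTFM pages for the exact syntax of your tools version):

# step 3: define the security groups (hypothetical names)
ec2-add-group websrv -d "Web servers"
ec2-add-group appsrv -d "Application servers"
ec2-add-group dbsrv -d "MySQL servers"

# step 4: HTTP from the world to the web tier only
ec2-authorize websrv -P tcp -p 80 -s 0.0.0.0/0

# MySQL (3306) only from the application tier (group-to-group rule;
# 111122223333 is a placeholder account id) and from the office static IP
ec2-authorize dbsrv -P tcp -p 3306 -o appsrv -u 111122223333
ec2-authorize dbsrv -P tcp -p 3306 -s 203.0.113.17/32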

Best,
Moshe. RockeTier, the performance experts
