Monday, January 21, 2013

MySQL Utilities - a New Feature of MySQL Workbench


Besides the MySQL database itself, dev.mysql.com/downloads offers plenty of other good material, including various connectors, proxies, and more. Among them, MySQL Workbench is a less-noticed piece of software, yet for DBAs and database developers it is a real productivity booster. Anyone who has used MySQL Workbench probably knows its three major functions - SQL Development, Data Modeling, and Server Administration. They respectively support querying and modifying data through a graphical interface; designing databases graphically with forward or reverse engineering, either building the data schema from the design or turning an existing data dictionary into a graphical model; and helping DBAs manage MySQL servers graphically, covering tasks such as starting, stopping, configuring, granting privileges, and tuning. What I want to share here, however, is a feature introduced with MySQL Workbench 5.2 - MySQL Utilities.

MySQL Utilities is a set of Python scripts bundled with MySQL Workbench. It saves us from writing scripts by hand, letting a single command carry out a whole series of operations, such as copying data between database instances, comparing data, failing over, or setting up master-slave replication. Besides being bundled with MySQL Workbench, it can also be downloaded from Launchpad (https://launchpad.net/mysql-utilities).

When using MySQL Utilities through MySQL Workbench, you can bring it up with the rightmost tool on the Workbench toolbar (as shown in the figure below), or choose 'Plugins' from the Workbench menu and then 'Start Shell for MySQL Utilities' to open a command-mode terminal.
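From that shell (or any OS shell with the utilities on the PATH) you can list the bundled scripts and read their options; a minimal sketch, assuming the mysqluc console that ships with MySQL Utilities:

$ mysqluc
mysqluc> help utilities
mysqluc> help mysqldbcompare
mysqluc> quit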



The scripts provided by MySQL Utilities fall into the following categories:
1. Database operations, including:
  • mysqldbcompare – compares two databases, covering both the table definitions and the data in the tables; the tables are locked while their data is compared. You can also add the '--changes-for' option to take one database as the baseline and generate the changes that would make the other consistent with it. This tool is very handy in a replication setup for checking whether any data has not yet been replicated from the master to a slave.

Example:

$ mysqldbcompare --server1=root:welcome1@localhost:3308 --server2=root:welcome1@localhost:3306 world:world --run-all-tests --show-reverse
# server1 on localhost: ... connected.
# server2 on localhost: ... connected.
# Checking databases world on server1 and world on server2
#
# WARNING: Objects in server1.world but not in server2.world:
#        TABLE: tt
#
#                                                   Defn    Row     Data  
# Type      Object Name                             Diff    Count   Check 
# -------------------------------------------------------------------------
# TABLE     City                                    pass    pass    pass   
# TABLE     Country                                 pass    pass    pass   
# TABLE     CountryLanguage                         pass    FAIL    FAIL   
#
# Row counts are not the same among world.CountryLanguage and world.CountryLanguage.
#

#
# Rows in world.CountryLanguage not in world.CountryLanguage
# +--------------+-----------+-------------+-------------+
# | CountryCode  | Language  | IsOfficial  | Percentage  |
# +--------------+-----------+-------------+-------------+
# | TWN          | Taiya     | F           | 0.2         |
# | TWN          | Lukai     | F           | 0.2         |
# +--------------+-----------+-------------+-------------+
#
# Database consistency check failed.
#
# ...done

The example above compares two database instances on the same server, listening on different ports (3306 and 3308). The instance on 3308 has an extra table, tt, and its CountryLanguage table also contains two extra rows.
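To actually reconcile the two instances, the same comparison can emit SQL transformations; a sketch, assuming the instance on 3306 is taken as the baseline:

$ mysqldbcompare --server1=root:welcome1@localhost:3308 --server2=root:welcome1@localhost:3306 world:world --run-all-tests --changes-for=server1 --difftype=sql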

  • mysqldbcopy – copies the specified database objects from a source instance to a destination instance. It is a logical-level copy: it logs into the source, issues SELECT statements, and INSERTs the results into the destination. The current problem (MySQL 5.6.9 with MySQL Workbench 5.2.45), however, is that running mysqldbcopy fails with 'ERROR: Query failed. 1840: GTID_PURGED can only be set when GTID_EXECUTED is empty.' According to the bug report this will be fixed in 5.6.11; see http://bugs.mysql.com/bug.php?id=68038.
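A sketch of a basic invocation, subject to the GTID bug noted above on the affected versions (the world_copy target name is just an illustration):

$ mysqldbcopy --source=root:welcome1@localhost:3306 --destination=root:welcome1@localhost:3308 world:world_copy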

  • mysqldbexport – exports data and data dictionary information. It is similar to mysqldump but offers more options and more output formats; '--format' selects the output format, which can be sql, grid, csv, tab, or vertical (columns listed vertically). Without '--export=data' it only outputs the SQL DDL for creating the tables.

Example:
$ mysqldbexport --server=root:welcome1@127.0.0.1:3306 --export=data -v world
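To export both the definitions and the rows in CSV form, something like the following sketch should work, assuming the --export=both and --format=csv options:

$ mysqldbexport --server=root:welcome1@127.0.0.1:3306 --export=both --format=csv world > world.csv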

  • mysqldbimport – imports data and data dictionary information.
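A sketch of re-importing a file produced by mysqldbexport, assuming it was exported in sql format:

$ mysqldbimport --server=root:welcome1@127.0.0.1:3308 --import=both --format=sql world.sql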

  • mysqldiff – compares the definitions of database objects. Unlike mysqldbcompare, it does not compare the row data in the tables; it only checks whether the table or stored procedure definitions differ.

Example:
$ mysqldiff --server1=root:welcome1@127.0.0.1:3306 --server2=root:welcome1@127.0.0.1:3308 --difftype=context world.CountryLanguage:world.CountryLanguage
2. General operations, including:
  • mysqldiskusage – shows the disk space used by the databases, roughly equivalent to the OS 'du' command.
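A sketch, assuming the --all option to include binary logs, relay logs, and InnoDB files in the report:

$ mysqldiskusage --server=root:welcome1@127.0.0.1:3306 --all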

  • mysqlindexcheck – checks for duplicate or redundant indexes.

Example:
$ mysqlindexcheck --server=root:welcome1@127.0.0.1:3306 world
# Source on 127.0.0.1: ... connected.
# The following indexes are duplicates or redundant for table world.CountryLanguage:
#
CREATE INDEX CountryCode ON world.CountryLanguage (CountryCode) USING BTREE
#     may be redundant or duplicate of:
ALTER TABLE world.CountryLanguage ADD PRIMARY KEY (CountryCode, Language)


  • mysqlmetagrep – searches the definitions of database objects, including stored programs, for the specified pattern (wildcards may be used in the search string).

Example:

$ mysqlmetagrep --pattern="t%" --server=root:welcome1@127.0.0.1 --format=vertical world
*************************      81. row *************************
  Connection: root:*@127.0.0.1:3306
 Object Type: TABLE
 Object Name: table_io_waits_summary_by_table
    Database: performance_schema
  Field Type: TABLE
     Matches: table_io_waits_summary_by_table
*************************      82. row *************************
  Connection: root:*@127.0.0.1:3306
 Object Type: TABLE
 Object Name: table_lock_waits_summary_by_table
    Database: performance_schema
  Field Type: TABLE
     Matches: table_lock_waits_summary_by_table


  • mysqlprocgrep – searches the SHOW PROCESSLIST output by the specified criteria.
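A sketch, assuming the --match-command and --age filters (here, listing connections that have been sleeping for more than an hour):

$ mysqlprocgrep --server=root:welcome1@127.0.0.1:3306 --match-command=Sleep --age=1h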


3. High availability

  • mysqlfailover – performs automatic failover for MySQL databases from 5.6.5 onward. Personally I find this the most interesting part of MySQL Utilities. It monitors the health of every server in a replication topology (especially the master) and fails over automatically when the master becomes unavailable. For the program to work, GTIDs (Global Transaction Identifiers) must be enabled, i.e. GTID_MODE=ON must be set in my.cnf (a minimal sketch of the related settings follows the list below). You can also set the --interval option to specify, in seconds, how often the master is checked, and --ping to specify after how many unanswered checks failover proceeds, using the failover mode given by --failover-mode. The modes that --failover-mode accepts are:
  1. auto: fail over automatically, following the candidate-slave order given to mysqlfailover; if no usable server is found in the candidate list, search all the slaves for a failover target. Once a slave has been promoted to master, the other slaves are re-pointed at this new (just promoted) master. If no slave at all can act as the failover target, an error is raised and the command exits.
  2. elect: like auto, except that when the master dies and no suitable server is found in the candidate list, the other slaves are not considered; an error is raised and the command exits.
  3. fail: when the master dies, no failover is performed; an error is raised and the command exits.
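A minimal my.cnf sketch of the GTID-related settings needed on every server in a MySQL 5.6 topology; option names follow the 5.6 reference manual, so treat this as an assumption to verify against your exact release:

gtid_mode=ON
log-bin
log-slave-updates
enforce-gtid-consistency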

An example of using mysqlfailover:

$ mysqlfailover --master=root:welcome1@127.0.0.1:3306 --discover-slaves-login=root:welcome1
# Discovering slaves for master at 127.0.0.1:3306
# Checking privileges.
# WARNING: You may be mixing host names and IP addresses. This may result in negative status reporting if your DNS services do not support reverse name lookup.
#
# Failover console will start in 10 seconds.


MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Mon Jan 21 11:34:06 2013

Master Information
------------------
Binary Log File       Position  Binlog_Do_DB  Binlog_Ignore_DB  
ivan-ubuntu-bin.0000  231                                       


Once mysqlfailover starts successfully, the console shows the following messages; mysqlfailover pings the master and the slaves periodically (every 10 seconds here).

GTID Executed Set
CE495FF3-5965-11E2-8479-0023184A0390:1-10 [...]

Replication Health Status
+------------+-------+---------+--------+------------+---------+
| host       | port  | role    | state  | gtid_mode  | health  |
+------------+-------+---------+--------+------------+---------+
| 127.0.0.1  | 3306  | MASTER  | UP     | ON         | OK      |
| localhost  | 3308  | SLAVE   | UP     | ON         | OK      |
| localhost  | 3310  | SLAVE   | UP     | ON         | OK      |
+------------+-------+---------+--------+------------+---------+


The command above automatically discovers the slaves connected to the specified master and treats them as the failover candidates. On the mysqlfailover console you can observe one master at IP 127.0.0.1 on port 3306, and two slaves on localhost, on ports 3308 and 3310. Note, however, that before the command can take effect, the following parameters must be added to my.cnf:

report-host=<host name>
report-port=<port #>
report-user=<user>
report-password=<password>
relay-log-info-repository=table
master-info-repository=table

For an explanation of the parameters above, refer to http://dev.mysql.com/doc/refman/5.6/en/slave-logs.html.

When a failover occurs, for example after the master is shut down, the following messages appear on the mysqlfailover console; they show that mysqlfailover starts the failover and picks the slave on localhost port 3308 for promotion to master.
Failover starting in 'auto' mode...
# Candidate slave localhost:3308 will become the new master.
# Preparing candidate for failover.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Mon Jan 21 11:47:40 2013

Master Information
------------------
Binary Log File       Position  Binlog_Do_DB  Binlog_Ignore_DB  
ivan-ubuntu-bin.0000  271              
                         

When the failover completes, the console shows the following; the new master is on localhost, port 3308:

GTID Executed Set
68E65F0B-5A40-11E2-8A0B-0023184A0390:1-4 [...]

Replication Health Status
+------------+-------+---------+--------+------------+---------------------------------------------------------------------+
| host       | port  | role    | state  | gtid_mode  | health                                                              |
+------------+-------+---------+--------+------------+---------------------------------------------------------------------+
| localhost  | 3308  | MASTER  | UP     | ON         | OK                                                                  |
| localhost  | 3310  | SLAVE   | UP     | ON         | SQL thread is not running., Slave has 1 transactions behind master. |
+------------+-------+---------+--------+------------+---------------------------------------------------------------------+


  • mysqlreplicate – configures replication. This command lets us set up a master-slave topology without logging into the MySQL client directly. Example:

$ mysqlreplicate --master=root:welcome1@127.0.0.1:3306 --slave=root:welcome1@127.0.0.1:3308 --master-log-file=ivan-ubuntu-bin.000014 --master-log-pos=231 --rpl-user=root

# master on 127.0.0.1: ... connected.
# slave on 127.0.0.1: ... connected.
# Checking for binary logging on master...
# Setting up replication...

# ...done.

  • mysqlrpladmin – performs replication administration, switchover, and failover for MySQL 5.6.5 and later; this is another very useful tool besides mysqlfailover. It simplifies the administration of a master-slave topology: it can switch the application's transactions from the master to a designated slave, locking the databases during the switch so that no transaction is lost; when switching transactions back to the original master, it waits until the original master has caught up on all data before switching back.
    The tool's commands are as follows:
    • elect: based on GTIDs, elect one server from a list of slaves as the switchover or failover target (the one whose GTIDs follow the master most closely)
    • failover: perform failover to the elected or specified slave
Example:
$ mysqlrpladmin --master=root:welcome1@127.0.0.1:3306 --discover-slaves-login=root:welcome1 failover

# Discovering slaves for master at 127.0.0.1:3306
# Checking privileges.
# Performing failover.
# Candidate slave localhost:3308 will become the new master.
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.
#
# Replication Topology Health:
+------------+-------+---------+--------+------------+---------+
| host       | port  | role    | state  | gtid_mode  | health  |
+------------+-------+---------+--------+------------+---------+
| localhost  | 3308  | MASTER  | UP     | ON         | OK      |
+------------+-------+---------+--------+------------+---------+

    • gtid: show the current GTIDs of the specified master and its slaves
Example:
$ mysqlrpladmin --master=root:welcome1@127.0.0.1:3306 --discover-slaves-login=root:welcome1 gtid
# Discovering slaves for master at 127.0.0.1:3306
# Checking privileges.
#
# UUIDS for all servers:
+------------+-------+---------+---------------------------------------+
| host       | port  | role    | uuid                                  |
+------------+-------+---------+---------------------------------------+
| 127.0.0.1  | 3306  | MASTER  | ce495ff3-5965-11e2-8479-0023184a0390  |
| localhost  | 3308  | SLAVE   | df6e1330-5965-11e2-847a-0023184a0390  |
| localhost  | 3310  | SLAVE   | 68e65f0b-5a40-11e2-8a0b-0023184a0390  |
+------------+-------+---------+---------------------------------------+
#
# Transactions executed on the server:
+------------+-------+---------+----------------------------------------------+
| host       | port  | role    | gtid                                         |
+------------+-------+---------+----------------------------------------------+
| 127.0.0.1  | 3306  | MASTER  | CE495FF3-5965-11E2-8479-0023184A0390:1-11    |
| 127.0.0.1  | 3306  | MASTER  | DF6E1330-5965-11E2-847A-0023184A0390:1-2     |
| localhost  | 3308  | SLAVE   | 68E65F0B-5A40-11E2-8A0B-0023184A0390:1-4     |
| localhost  | 3308  | SLAVE   | CE495FF3-5965-11E2-8479-0023184A0390:1-11    |
| localhost  | 3308  | SLAVE   | DF6E1330-5965-11E2-847A-0023184A0390:1-17    |
| localhost  | 3310  | SLAVE   | 68E65F0B-5A40-11E2-8A0B-0023184A0390:1-5313  |
| localhost  | 3310  | SLAVE   | CE495FF3-5965-11E2-8479-0023184A0390:1-11    |
| localhost  | 3310  | SLAVE   | DF6E1330-5965-11E2-847A-0023184A0390:1-4     |
+------------+-------+---------+----------------------------------------------+

    • health: list the status of the specified master and its slaves
Example:
$ mysqlrpladmin --master=root:welcome1@127.0.0.1:3306 --discover-slaves-login=root:welcome1 health
# Discovering slaves for master at 127.0.0.1:3306
# Checking privileges.
#
# Replication Topology Health:
+------------+-------+---------+--------+------------+---------+
| host       | port  | role    | state  | gtid_mode  | health  |
+------------+-------+---------+--------+------------+---------+
| 127.0.0.1  | 3306  | MASTER  | UP     | ON         | OK      |
| localhost  | 3308  | SLAVE   | UP     | ON         | OK      |
| localhost  | 3310  | SLAVE   | UP     | ON         | OK      |
+------------+-------+---------+--------+------------+---------+

    • reset: clear all replication settings; equivalent to issuing 'stop slave; reset slave all;' on every slave
Example:
$ mysqlrpladmin --master=root:welcome1@127.0.0.1:3306 --discover-slaves-login=root:welcome1 reset
# Discovering slaves for master at 127.0.0.1:3306
# Checking privileges.
# Performing STOP on all slaves.
#   Executing stop on slave localhost:3308 Ok
#   Executing stop on slave localhost:3310 Ok
# Performing RESET on all slaves.
#   Executing reset on slave localhost:3308 Ok
#   Executing reset on slave localhost:3310 Ok
# ...done.
    • start: start replication; equivalent to issuing 'start slave;' on every slave
    • stop: stop replication; equivalent to issuing 'stop slave;' on every slave (a sketch of both is shown below)
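A sketch of the start and stop commands, assuming the same topology as the examples above:

$ mysqlrpladmin --master=root:welcome1@127.0.0.1:3306 --discover-slaves-login=root:welcome1 stop
$ mysqlrpladmin --master=root:welcome1@127.0.0.1:3306 --discover-slaves-login=root:welcome1 start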
    • switchover: swap the master and slave roles in the replication topology

Example:

$ mysqlrpladmin --master=root:welcome1@127.0.0.1:3306 --new-master=root:welcome1@127.0.0.1:3308 switchover
# Checking privileges.
# Performing switchover from master at 127.0.0.1:3306 to slave at 127.0.0.1:3308.
# Checking candidate slave prerequisites.
# Waiting for slaves to catch up to old master.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Starting all slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Switchover complete.
#
# Replication Topology Health:
+------------+-------+---------+--------+------------+---------+
| host       | port  | role    | state  | gtid_mode  | health  |
+------------+-------+---------+--------+------------+---------+
| 127.0.0.1  | 3308  | MASTER  | UP     | ON         | OK      |
+------------+-------+---------+--------+------------+---------+

Before running the command, make sure all the replication settings are correct; a mis-step can, for example, leave the topology configured with two masters. To fully clear a server's old slave settings, use the 'reset slave all;' command.

  • mysqlrplcheck – checks the replication configuration. Example:

$ mysqlrplcheck --master=root:welcome1@127.0.0.1:3308 --slave=root:welcome1@127.0.0.1:3310 --show-slave-status

# master on 127.0.0.1: ... connected.
# slave on 127.0.0.1: ... connected.
Test Description                                                     Status
---------------------------------------------------------------------------
Checking for binary logging on master                                [pass]
Are there binlog exceptions?                                         [pass]
Replication user exists?                                             [pass]
Checking server_id values                                            [pass]
Checking server_uuid values                                          [pass]
Is slave connected to master?                                        [pass]
Check master information file                                        [pass]
Checking InnoDB compatibility                                        [pass]
Checking storage engines compatibility                               [pass]
Checking lower_case_table_names settings                             [pass]
Checking slave delay (seconds behind master)                         [pass]

#
# Slave status: 
#
                Slave_IO_State : Waiting for master to send event
                   Master_Host : localhost
                   Master_User : root
                   Master_Port : 3308
                 Connect_Retry : 60
               Master_Log_File : ivan-ubuntu-bin.000010
           Read_Master_Log_Pos : 271
                Relay_Log_File : ivan-ubuntu-relay-bin.000001
                 Relay_Log_Pos : 4
         Relay_Master_Log_File : 
              Slave_IO_Running : Yes
             Slave_SQL_Running : No
               Replicate_Do_DB : 
           Replicate_Ignore_DB : 
            Replicate_Do_Table : 
        Replicate_Ignore_Table : 
       Replicate_Wild_Do_Table : 
   Replicate_Wild_Ignore_Table : 
                    Last_Errno : 1593
                    Last_Error : Failed during slave workers initialization
                  Skip_Counter : 0
           Exec_Master_Log_Pos : 0
               Relay_Log_Space : 7705
               Until_Condition : None
                Until_Log_File : 
                 Until_Log_Pos : 0
            Master_SSL_Allowed : No
            Master_SSL_CA_File : 
            Master_SSL_CA_Path : 
               Master_SSL_Cert : 
             Master_SSL_Cipher : 
                Master_SSL_Key : 
         Seconds_Behind_Master : None
 Master_SSL_Verify_Server_Cert : No
                 Last_IO_Errno : 0
                 Last_IO_Error : 
                Last_SQL_Errno : 1593
                Last_SQL_Error : Failed during slave workers initialization
   Replicate_Ignore_Server_Ids : 
              Master_Server_Id : 2
                   Master_UUID : df6e1330-5965-11e2-847a-0023184a0390
              Master_Info_File : mysql.slave_master_info
                     SQL_Delay : 0
           SQL_Remaining_Delay : None
       Slave_SQL_Running_State : 
            Master_Retry_Count : 86400
                   Master_Bind : 
       Last_IO_Error_Timestamp : 
      Last_SQL_Error_Timestamp : 130121 11:42:31
                Master_SSL_Crl : 
            Master_SSL_Crlpath : 
            Retrieved_Gtid_Set : DF6E1330-5965-11E2-847A-0023184A0390:5-17
             Executed_Gtid_Set : 68E65F0B-5A40-11E2-8A0B-0023184A0390:1-5313,
CE495FF3-5965-11E2-8479-0023184A0390:1-10,
DF6E1330-5965-11E2-847A-0023184A0390:1-4
# ...done.

  • mysqlrplshow – displays the replication topology graph. Example:

$ mysqlrplshow --master=root:welcome1@127.0.0.1:3308 --discover-slaves-login=root:welcome1
# master on 127.0.0.1: ... connected.
# Finding slaves for master: 127.0.0.1:3308

# Replication Topology Graph
127.0.0.1:3308 (MASTER)
   |
   +--- 127.0.0.1:3310 - (SLAVE)

  • mysqlserverclone – creates and starts a new server instance, using an existing instance as the template. Example:

$ mysqlserverclone --server=root:welcome1@localhost:3306 --new-data=./data4 --new-port=3312 --new-id=4 --root-password=welcome1
# Cloning the MySQL server running on localhost.
# Creating new data directory...
# Configuring new instance...
# Locating mysql tools...
# Setting up empty database and mysql tables...
# Starting new instance of the server...
# Testing connection to new instance...
# Success!
# Setting the root password...
# Connection Information:
#  -uroot -pwelcome1 --socket=/home/ivan/data4/mysql.sock
#...done.

I originally expected it to take the specified instance as a template (the one on localhost port 3306) and replicate its entire contents into a new instance under ./data4, listening on port 3312. To my surprise, all I got was an empty instance containing only the default databases such as mysql and information_schema; the databases I had created manually on the first instance were nowhere to be seen in the new one.

  • mysqlserverinfo – lists information about a database server. Example:
$ mysqlserverinfo --server=root:welcome1@127.0.0.1:3308 --format=vertical
# Source on 127.0.0.1: ... connected.

*************************       1. row *************************
         server: 127.0.0.1:3308
        version: 5.6.9-rc-log
        datadir: /mysql56/data2/
        basedir: /home/ivan/mysql-5.6.9-rc-linux-glibc2.5-i686
     plugin_dir: /home/ivan/mysql-5.6.9-rc-linux-glibc2.5-i686/lib/plugin/
    config_file: /etc/mysql/my.cnf, /home/ivan/mysql-5.6.9-rc-linux-glibc2.5-i686/my.cnf
     binary_log: ivan-ubuntu-bin.000010
 binary_log_pos: 271
      relay_log: ivan-ubuntu-relay-bin.000001
  relay_log_pos: 4
1 rows.
#...done.
 

Tuesday, September 13, 2011

Oracle MySQL/Linux/Virtual Machine (LVM) Integrated Demo Script

Objective
To show that running MySQL on top of Oracle Virtual Machine and Oracle Enterprise Linux maximizes value for customers by offering the lowest total cost of ownership together with high performance, scalability, and productivity. This integrated Oracle VM demo includes several pieces of software and techniques from Oracle: MySQL Enterprise Monitor, MySQL Database, MySQL replication, MySQL Proxy, Oracle Virtual Machine, Oracle Virtual Machine Templates, and Oracle Linux. These products bring users the following benefits:
MySQL Enterprise Monitor helps MySQL DBAs manage the database, improve performance, and grow their skills. It maximizes the SLA of the database and helps DBAs increase their productivity.
MySQL Proxy enables a scale-out solution by offering read-write splitting in the most cost-effective manner.
MySQL replication helps users increase the availability and capacity of the database at the lowest cost.
Users enjoy the best performance by adopting MySQL on top of Oracle Enterprise Linux.
The Oracle Virtual Machine template for MySQL increases DBA productivity through a pre-configured and optimized MySQL database, and decreases the resources needed to deploy MySQL.
Oracle Virtual Machine helps users maximize HA at the virtual machine layer; it lets MySQL applications achieve zero downtime when migrating virtual machines between servers.

Demo Environment

The Servers of the Demo Environment
Install and Configure the Demo Environment
As the graph above shows, we use 3 physical servers: ovmm, ovs1, and ovs2. ovmm hosts OVM Manager, the PHP pages that act as the MySQL client application, and MySQL Enterprise Monitor. ovs1 and ovs2 are registered with Oracle Virtual Machine Manager as virtual machine servers. ovs1 contains one virtual machine for the master database and one for the first slave; ovs2 contains one virtual machine for the 2nd slave database and also acts as the backup server for the master database. We will demo the failover feature via live migration at the virtual machine layer. For the details of installing OVM with HA, please refer to http://www.rachelp.nl/userfiles/file/Create_oraclevm_haserverpool.pdf. There are several points that need care while installing the environment; I list them below:
For high availability at the virtual machine layer, we need to configure the HA options of the OVM server pool in OVM Manager, and the servers that support HA need to attach to shared storage; in this case NFS, exported with the rw, sync, and no_root_squash options (see the sketch after this list). Refer to http://download.oracle.com/docs/cd/E15458_01/doc.22/e15440/toc.htm#BABDEFDG for the details.
The MySQL virtual machine template starts a firewall whenever it boots; we need to turn it off manually.
To enable replication and the MySQL Enterprise Monitor agent, the time lag between the virtual machine servers cannot be larger than a certain period, say 15 seconds.
The available space on the shared storage needs to be larger than 70 GB.
In some cases, restarting a virtual machine resets the option file /etc/my.cnf to its default values.
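A minimal /etc/exports sketch for the NFS export mentioned above (the /OVS path and the client subnet are assumptions):

/OVS 10.186.122.0/24(rw,sync,no_root_squash)

After editing the file, run 'exportfs -ra' on the NFS server to re-export.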
Demo Flow
1 Show the download addresses of MySQL, Oracle Linux, Oracle Virtual Machine, and the Oracle Virtual Machine Template for MySQL on Oracle eDelivery.
2 OVM/OEL Demo
  • Navigate OVM interface.
  • Create/Initiate virtual machine with Oracle Virtual Machine Template for MySQL (the template was imported on the “Resource” tab).
The OVM template for MySQL offers users a configured and tuned MySQL environment; we do not need to tune MySQL options, and it can serve as the base for customized templates to foster deployment speed. For example, we can create a template for slave servers from the standard OVM template for MySQL, so we can easily extend capacity by adding more slave servers from the customized OVM MySQL slave template.
3 MySQL Demo
  • Show slide 17 of MySQL technical presentation, explain the demo scenario to the audience
  • Show MySQL master on 10.186.122.69
ssh root@10.186.122.69, password: “oracle”
log in to the MySQL master server, with username "root" and password "oracle", by
mysql -h 10.186.122.69 -P 3306 -u root -poracle
use world;
show tables;
select code, name, continent, population, gnp from Country;
  • Show MySQL slave 1 on 10.186.122.72 and slave 2 on 10.186.122.174
log in to MySQL slave 1 with username "root" and password "oracle":
mysql -h 10.186.122.72 -P3306 -u root -poracle
show slave status\G
use world;
show tables;
select code, name, continent, population, gnp from Country;
log in to MySQL slave 2 with username "root" and password "oracle":
mysql -h 10.186.122.174 -P3306 -u root -poracle
show slave status\G
use world;
show tables;
select code, name, continent, population, gnp from Country;
  • Show MySQL Proxy on ovmm.sg.oracle.com
ssh root@ovmm.sg.oracle.com, password “Pass0rd”
ps -ef | grep proxy (show MySQL Proxy command)
  • Show PHP demo application source code on ovmm.sg.oracle.com
vi /var/www/html/worldApp/application/phpQueryPage.php and /var/www/html/worldApp/application/phpUpdatePage.php
highlight the code:
$con = mysql_connect("127.0.0.1:4040", "root", "oracle");
show the audience that the database connection goes through MySQL Proxy, and that read-write splitting is accomplished by this service.
  • Display phpQueryPage page from browser, the URL is below:
http://ovmm.sg.oracle.com/worldApp/application/phpQueryPage.php
show the audience the host name on the page; it will change to the other slave when the current slave server goes down (it shows slave 1 at this moment)
shut down slave 1 (shutting down the virtual machine of slave 1 is an approach closer to the real case, but it takes more time; shutting down the database is faster and makes it easier to keep the pace of the demo)
show the audience the host name on the page; it has changed to slave 2 at this moment
  • Demo the master slave replication feature by:
Show phpUpdatePage.php
Query the data by typing "HKG" in the "Code" field and pressing Enter.
Insert one record by changing the "Code" field from "HKG" to "ABX" and pressing the "Insert" button, then query the new record by typing "ABX" in the "Code" field and pressing the "Query" button.
Back on phpQueryPage.php, the country code "ABX" shows up on the query page.
  • Demo the live migration feature, and MySQL HA supported by the feature
Ping 10.186.122.69 from ovmm.sg.oracle.com; show the response and the message sequence numbers displayed on the terminal.
Perform a live migration from the OVM interface, by picking "Live Migration" from the "More Actions" list in OVM Manager.

Show the ping output: there is a gap in the message sequence numbers after the live migration, but the interruption is very small (less than 2 seconds).
Show the message sequence number leap, but phpUpdatePage.php works fine after the live migration; the user interface suffers no interruption from the migration at the virtual machine layer.

Sunday, May 8, 2011

Benchmark MySQL 5.5.9 on Oracle Sun X4170 M2

As I posted the testing results from MySQL on an HP server (refer to the previous benchmark), my colleagues from the Oracle System Practice were very interested in MySQL performance on Sun's x86 servers (of course we hoped to prove that the servers from Oracle are superior to HP's). With the help of our VAD partner, Bestcom, we got a Sun X4170 M2 installed in our data center; thanks to the endeavor of my colleague Cano Lai, the server was set up and ready for testing very soon.

Technical Specification

The configuration of the system as below:

Server


  • CPU: 2 x 6-core Intel E5640, 2 threads per core, 12 cores/24 threads in total
  • Memory: 16 GB
  • Hard Disk: 2 x 146 GB

Operating System

Red Hat Enterprise Linux 5.6, with the base kernel (Linux kernel 2.6.18-238.el5) for the 1st run and the Oracle Enterprise Kernel (Linux kernel 2.6.32-100.26.2.el5) for the 2nd run.

Database

MySQL 5.5.9 Community Edition with InnoDB storage engine

Tasks


  • Setup Sun X4170 M2
  • Install Red Hat Linux 5.6
  • Install MySQL 5.5.9
We downloaded MySQL for Red Hat & Oracle Enterprise Linux 5 (x86, 64-bit) from the MySQL download site; apart from the MySQL server, the shared component is needed for compiling Sysbench.
  • Install Sysbench 0.4.12
Following the installation steps on the Sysbench web site, http://sysbench.sourceforge.net/docs/#install, gets things done easily; the only things to take care of are installing the MySQL shared component in advance and making sure the software and libraries are accessible. A sketch of the build follows.
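A sketch of the Sysbench build against the MySQL shared component; the include and library paths are assumptions that depend on where the MySQL packages land:

$ ./configure --with-mysql-includes=/usr/include/mysql --with-mysql-libs=/usr/lib64/mysql
$ make
$ make install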
  • Tune MySQL
Some of the tuned options in the my.cnf file are listed below:

# The number of threads might be more than 1000; set to 1500 to accommodate it
max_connections = 1500
# The server has 16 GB RAM; allowing for overhead from the OS and Sysbench,
# a 12 GB InnoDB buffer pool fully utilizes the available memory
innodb_buffer_pool_size = 12G
# Multiple InnoDB buffer pools reduce contention for resources such as
# free lists, flush lists, LRUs and the buffer pool mutex
innodb_buffer_pool_instances=3
# New file format from MySQL 5.5
innodb_file_format=Barracuda
innodb_file_per_table
# Increase the number of threads for writing dirty pages
innodb_write_io_threads = 8
# Increase the number of I/O threads for read operations
innodb_read_io_threads = 8
# Increase innodb_thread_concurrency to fully utilize the multiple
# cores/threads environment
innodb_thread_concurrency = 24
# With innodb_flush_log_at_trx_commit set to 1, InnoDB flushes (fsyncs) the
# transaction logs to disk at each commit, which offers full ACID behavior. This is
# a conservative approach; setting it to 0 or 2 might reduce disk I/O and increase
# the measurements of the test.
innodb_flush_log_at_trx_commit = 1

  • Test MySQL on the Red Hat 5.6 base kernel with Sysbench
The following command generates the data for the test; we had 100 million rows in the largest table.

nohup sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=100000000 --mysql-user= --mysql-password= prepare

Execute the test with the following command, and collect the test results from the output of nohup:

nohup sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=100000000 --num-threads= --mysql-user= --mysql-password= run
  • Substitute the Red Hat 5.6 base kernel with the Oracle Enterprise Linux kernel
Apart from MySQL performance on an Oracle server, we were curious whether there is any difference between the base kernel and the Oracle Unbreakable Kernel. There was time available after the 1st run, so we decided to run the same benchmark a 2nd time with the same configuration as the previous run, but with the OS kernel replaced by Oracle UEK. Upgrading the kernel is not a difficult job; the required commands are below:

# cd /etc/yum.repos.d/
# wget http://public-yum.oracle.com/public-yum-ol6.repo
# wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
# yum -y install kernel-*

After rebooting the server, you can see that the kernel has been upgraded using the following commands:

# uname -a
# yum info kernel-uek
  • Test against MySQL on Oracle Enterprise Kernel
The steps are the same as task 6.
  • Consolidate the testing results
Collect the number of reads/writes and transactions for each run, and generate line charts.
Testing Results

The benchmark results are summarized in Chart 1 and Chart 2; as the scale of transactions per second was too small, I created a 3rd chart to depict it.

  • Chart 1 shows the measurements for read-only transactions: MySQL on OEL got more than 54,000 operations at 900 threads, while with everything else the same but on Red Hat, we got around 20,000 read-only operations, dropping to 11,000 at 900 threads.
  • In Chart 2, the red line is the number of read/write operations with the Red Hat base kernel. The average of 15,426 reads/writes, compared with the average of 4,033 read/write operations from the previous benchmark, is a 382% improvement, a remarkable one (although this server was also more powerful than the previous one).
  • In Chart 2, the purple line shows the number of read/write operations with the Oracle Unbreakable Kernel; it performed much better than the original kernel. Oracle UEK started from 32,110 and reached a peak of 39,941 at 700 threads. By contrast, the Red Hat base kernel got 20,493 at the smallest number of threads and slipped downward from there. Oracle UEK got 36,200 operations, a 234% improvement! And if we compare with the measurements from the previous benchmark, the technical stack from Oracle (hardware and OS kernel) gives us nearly 9 times the throughput.
  • The number of transactions, shown in Chart 3, is consistent with Chart 1: UEK started from 1,690 transactions per second and crept up to 2,102 at 700 threads, while the base kernel started from 1,078 and slipped down thereafter; the gap extended to 1,343 at 700 threads.


Chart 1. Read-only benchmark


Chart 2. Read-Write Benchmark

Chart 3. Read-write benchmark with transaction per second measurement

Conclusion
In the past year MySQL has seen lots of improvements; they enable MySQL to be a database that supports mission-critical applications. We hope the testing helps us figure out the right technical combination (from hardware and operating system to the database) for our customers. The result of this benchmark shows that the Oracle technical stack offers superior performance for the user. With Oracle Linux we got nearly a 300% performance improvement when the number of concurrent connections is 900, and compared with the result from the previous benchmark we got almost 9 times the throughput; although the hardware configuration was better, it is still a significant improvement.

However, I did not apply any performance analysis tool (for example sar) to collect performance-related information from the operating system layer, so I have no idea what makes this huge difference between the Red Hat base kernel and Oracle UEK; I hope I get the opportunity to run the same benchmark again with a performance analysis tool.

Wednesday, January 12, 2011

Comparing the Performance of MySQL 5.5.8 and 5.0.77

Preface
A while ago I discussed with Scott Chen, the sales lead of Oracle MySQL for Greater China, the feasibility of building a MySQL DB appliance: an optimized, plug-and-play database appliance which, under validated conditions (for example, a given number of concurrent connections, data volume, and mix of SQL statements), can be guaranteed to reach a certain response time and throughput. Such an appliance would need:
1. Multi-core x86 hardware
2. A tuned Linux or Unix operating system
3. A tuned MySQL database
To verify the feasibility of the idea, I started with what was easiest for me: proving that a tuned, recent MySQL outperforms the MySQL Community Server bundled with a typical Linux distribution.

Test Environment
1. Basic equipment: I asked Hans Qing of the Oracle Linux group to help find an HP x86 server and install Oracle Enterprise Linux on it. The machine is configured as follows:
CPU: 4x Intel(R) Xeon(R) CPU E5520 @ 2.27GHz, 16 Cores
Memory: 12GB
Hard Disk:300GB
OS: Oracle Enterprise Linux 5.5.8 x86_64

Test Method
Basically this is a stress test; my test scenario runs a mixed read/write OLTP workload against a table with 100 million rows.
I used Sysbench as the test tool. Sysbench and MySQL ran on the same x86 PC server, so the effect of network latency was minimized.
The Sysbench commands are as follows:
1.nohup sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=100000000 --mysql-user=root --mysql-password=welcome1 --mysql-host=127.0.0.1 --mysql-port=3306 prepare
With the command above, Sysbench creates a table named "sbtest" in the MySQL database and inserts 100 million rows into it (see the Sysbench manual, http://sysbench.sourceforge.net/docs/, for details). The leading nohup tells Linux to run the subsequent sysbench command in the background, so the terminal can accept the next command while sysbench is running.
2.nohup sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=100000000 --num-threads= --mysql-user=root --mysql-password=welcome1 --mysql-host=127.0.0.1 --mysql-port=3306 run
The command above uses Sysbench as a MySQL client program to issue SQL statements against MySQL; the resulting report is stored in nohup's log file, nohup.out. We only need to vary the value of --num-threads to simulate how MySQL's load and performance change as the number of concurrent connections grows.
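A sketch of sweeping the thread counts in one go; the list of thread counts is an illustration, and the credentials match the commands above:

$ for t in 10 100 200 400 600 800 1020; do
    nohup sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=100000000 \
      --num-threads=$t --mysql-user=root --mysql-password=welcome1 \
      --mysql-host=127.0.0.1 --mysql-port=3306 run > sysbench-$t.out
  done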

MySQL Tuning and Settings
Since InnoDB is the main storage engine used in this test, my tuning focused on the InnoDB-related parameters. The MySQL 5.0.77 option file (my.cnf) contains:

max-connections=4000
innodb_buffer_pool_size=6G
key_buffer_size=512M
innodb_flush_method=O_DIRECT
innodb_file_per_table
innodb_log_buffer_size=4M

The main adjustments are:
innodb_buffer_pool_size is set to 6G. This x86 PC server has 12 GB of RAM, but since Sysbench runs on the same machine, the setting is conservative; if the machine were dedicated to MySQL, it could be set to 75% of total memory (8G on this machine).
innodb_flush_method is set to O_DIRECT; removing the duplicated buffering helps DML to some degree.

The MySQL 5.5.8 option file (my.cnf) is tuned as follows:
max-connections=4000
innodb_buffer_pool_size=6G
key_buffer_size=512M
innodb_flush_method=O_DIRECT
innodb_file_per_table
innodb_log_buffer_size=4M
innodb_file_format=Barracuda
innodb_change_buffering=all
innodb_thread_concurrency=33
innodb_io_capacity=400
innodb_buffer_pool_instances=3
innodb_fast_shutdown=0
innodb_purge_threads=1

innodb_file_format is set to the new format, Barracuda, to take full advantage of InnoDB 1.1.
innodb_change_buffering is set to all, which speeds up updates and deletes when the table has secondary (non-primary-key) indexes.
innodb_thread_concurrency is set to 33: twice the number of cores plus one.
innodb_io_capacity is set to 400, to improve the flushing of dirty buffers to disk during heavy updates.
innodb_buffer_pool_instances is set to 3, to increase concurrent DML capacity.
innodb_fast_shutdown is set to 0, to enable multiple rollback segments, increasing DML capacity and performance.
innodb_purge_threads is set to 1, starting a purge thread separate from the main thread, relieving the main-thread load seen in 5.1 and earlier.

Test Results
The test scaled from 10 threads up to 1020 threads (simulating 1020 concurrent connections). The results are shown in the two charts below: the blue line is the tuned MySQL 5.5.8, the orange line is the tuned MySQL 5.0.77, and the yellow line is the untuned (all defaults) MySQL 5.0.77. Chart 1 shows transactions per second; Chart 2 shows reads/writes per second. The two throughput measures are basically consistent: the tuned 5.5.8 outperforms the tuned 5.0.77 by 7% to 60%, and outperforms the untuned 5.0.77 by 113% down to 11%. 5.5.8 peaked at 208 threads, 5.0.77 at 96 threads.
Chart 1 - Transactions per second

In addition, the untuned MySQL 5.0.77 outperforms the tuned MySQL 5.0.77 once the thread count exceeds 368. A possible reason: the InnoDB buffer pool of the tuned MySQL is set to 6 GB, while one row of the Sysbench table (sbtest) is 250 bytes, so 100 million rows amount to 25 GB. When the SQL statements touch the whole table almost uniformly (a characteristic of the Sysbench test) and the number of concurrent connections grows large enough, MySQL loads a large amount of data from disk, and this added load outweighs the memory-caching benefit of the InnoDB buffer pool.



Chart 2 - Reads/writes per second

Possibilities for Further Improvement
Time was limited: the x86 PC server was only on loan for a month, and during the test period I was also tied up with Oracle OpenWorld in Beijing, the Greater China MySQL road show, and other chores, so there was little usable time. I had no chance to test how operating-system-level tuning affects MySQL performance, nor the effect of table size on performance.
Some other MySQL parameters that affect performance also went unused in this test, including:
Innodb_commit_concurrency
InnoDB_log_file_size
InnoDB_flush_logs_at_trx_commit
InnoDB_file_io_threads
Innodb_max_dirty_pages_pct
row_format set to COMPRESSED
comparison of transaction isolation levels
All of these are worth a try; I hope to have more time to test them next time.

Conclusion
The newly released MySQL 5.5 has made great strides over earlier versions in performance, scalability, and availability. In particular, the most important storage engine has received substantial improvements, letting MySQL better exploit the advantages of modern hardware and operating systems with many processors and large memory. DBAs also have more tools at their disposal, which correspondingly adds a little to MySQL's complexity. This test validates MySQL's progress, and Oracle's commitment to MySQL users.


Thursday, September 23, 2010

MySQL Benchmark - A Telecom Case Study

Preface
This benchmarking exercise was part of a feasibility study for a telecom user. It was one of my most important tasks during my last 6 months at Sun Software Practice, and through it I gained a lot of useful knowledge about MySQL. I wish here to show my special appreciation to my colleagues. Eric Li was the technical account manager of the system team; he took care of platform issues and brought a lot of creativity to the project. Kajiyama-san is the MySQL evangelist for the Asia-Pacific area, and devoted ample time to addressing our special needs. He introduced some critical new features to the project and helped us solve many tough issues in a professional and timely manner.

Initiatives
The User is facing a vigorous challenge from competitors—and subsequently pressure from stockholders—expressing a need to squeeze more margin from a market which is becoming increasingly barren. The User hoped MySQL would bring them such benefits as:
  • Cost Savings
The high costs of the current proprietary database regimen have been a sore point since day one: they have continuously eroded operating margins. The User sharply felt a need to find some way out from this constraint, in order to free up cash for promising new IT development projects.
  • Open Source Model Adoption
The Open Source community has proven to be a source of truly mature innovations in recent years. The User expressed their hope that by using MySQL they could take advantage of some of this innovation and break the lock-in from the legacy database. In short, Open Source has for some time signaled the promise of increased real-world competitiveness.
  • The Special Case of CDR (Call Detail Records) Cost Scaling
CDR databases are huge. Under our current legacy database pricing regimen, only 3 months of data can be stored. This is a limiting factor, and is about to meet the challenge of upcoming new government regulations which will require telecoms to keep 6 months of call details online. MySQL imposes no scaled costs on database size. Seeing also that the preponderance of CDR operations are READ-only (relatively risk-free) and that those DML operations are performed in batch mode (not real-time critical) the case for a migration to MySQL is further strengthened. However CDR functionality, itself, is mission critical, and therefore the User needs to consider both the cost savings and risks carefully before committing to migration.

Objectives

The purposes of these benchmarks are as follows:

  • To evaluate the traits and performance characteristics of MySQL

Before deciding to use MySQL, the User needs to gain a clear understanding of MySQL, and know the pros and cons of any proposed migration paths. This information will help ensure that MySQL is used effectively.

  • To ascertain specific applications to which MySQL may be especially pertinent

The user has a large applications profile, ranging from small task force scheduling to a mission critical BOSS system. We need to know which applications can reasonably use MySQL. Ideally, MySQL would not be limited to small and simple applications. Perhaps certain large systems with specific traits could also be good candidates for migration. The further up the chain we can safely and effectively go, the greater the cost savings.

  • To collect sizing information for capacity plan and architecture design

Every database system has its own peculiar overhead costs and performance characteristics. Typically, the larger the data collections, the sooner one encounters idiosyncrasies. And it is these system traits that ultimately impact operating costs; for example in the area of backup and recovery. So we need to find out the traits of MySQL—coupled with varied storage engines—that could impact capacity plans and architecture designs for the target applications, some of which manage huge bases of information.

  • To evaluate possible cost savings

In setting up a test database that is designed to parallel a production one, we can take pains to measure the total costs associated with using MySQL (servers, software license, storage, network etc.) with some precision, and compare these figures against those for the existing platform. In short, we can know with some certitude the possible cost savings from MySQL.


Testing Tasks

  • Install, configure the environment
We configured two servers running Redhat ES 5 Linux and one storage server running Sun Open Storage, with 10 terabytes of capacity. The databases were configured using a master-slave replication architecture, detailed in the System Architecture section, below. To migrate data from the current CDR database, we created a virtual environment and installed Windows Server 2003 and SQLWays on the virtual machine. We used SQLWays to extract data from the current CDR database, and store it as plain, “flat” text files in a directory of the storage system. This is described in greater detail, below. This directory was network-shared with the Windows virtual environment and its host environment.
  • Setup a LVM snapshot for backing up the database
We set up LVM on the Redhat servers and created a logical volume for the MySQL data directory. We used LVM to create a snapshot of the data directory and dumped the data from the snapshot into the backup-set directories. With this technique, the impact of backup operations on MySQL database availability is decreased; a sketch of the technique follows.
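A minimal sketch of the snapshot technique, assuming a volume group vg0 and logical volume mysql_data (both names are assumptions); the read lock is held only while the snapshot is created:

$ mysql -uroot -p <<'EOF'
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql_data
UNLOCK TABLES;
EOF
$ mount -o ro /dev/vg0/mysql_snap /mnt/snap
$ tar czf /backup/mysql-$(date +%Y%m%d).tar.gz -C /mnt/snap .
$ umount /mnt/snap && lvremove -f /dev/vg0/mysql_snap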
  • Install/Configure a monitoring system for collecting performance related data
In the beginning we used nmon to collect and show performance-related information from the servers. Later, we used sar and ksar to simplify data collection and decrease the overhead caused by nmon's rather over-intensive data-collection daemon.
  • Port data from the original database to plain text CSV files
We used SQLWays to extract data from the proprietary database into CSV files. These were kept in shared storage. We dumped 650 gigabytes; data only, no indexes. This is one month’s worth of PBX-generated records.
  • Load data into MySQL and create indexes for each table
We used the "load data infile..." command to load data from the CSV files into MySQL tables, and then regenerated indexes identical to those of the current production database; a sketch of such a load follows.
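A sketch of such a load, assuming a comma-delimited CSV and a hypothetical staging table cdr_tmp:

$ mysql cdr -uroot -p -e "LOAD DATA INFILE '/share/csv/cdr_20100901.csv' INTO TABLE cdr_tmp FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';"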
  • Stress test with client applications to emulate DML from daily operations
Most of the DML operations performed on the CDR database are in batch mode. In the current production system, the procedure for stocking tables from the PBX with data that is then available for query is:
  1. Load plain text data from the PBX into temporary tables. This step is basically identical to the task described above, "Load data into MySQL…".
  2. Use stored procedures to dispatch call data from the temporary tables into permanent tables, one table for each day, by the date the call was placed. After the data has been loaded, the stored procedure regenerates indexes for the permanent tables.
In implementing the above procedure, most of the effort involved rewriting the stored programs that perform the second step. An easier task was writing, testing, and introducing a processing step that inserts delimiters into the raw PBX data: the MySQL load data command needs field delimiters, and data from the PBX currently lacks them. I wrote this program. It uses a property file that describes the delimiter character and the field column widths of the source records; a sketch of the idea follows.
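A sketch of the idea in awk, with hypothetical fixed column widths (the real program reads the delimiter and the widths from the property file):

$ awk -v OFS=',' '{ print substr($0,1,10), substr($0,11,14), substr($0,25,8), substr($0,33,6) }' pbx_raw.txt > pbx_delimited.csv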
  • Stress test with a multi-threaded Java program that issues SQL statements to emulate concurrent database accesses
This was one of the most challenging parts of benchmark development: generating SQL command executions that closely match those of a real production system. I looked for an off-the-shelf testing tool to do this (one where SQL commands roll up data generated from several tables dynamically, based on the time period the queries cover) but couldn't find one. I applied the following strategy to solve the issue:
  1. With help from the User’s CDR operation team, we logged the SQL commands hitting the production database over a period of time. We got a file containing more than 5112 timed SQL statements, reflecting database activity over a 10-hour period.
  2. I wrote a Java program to perform the stress test. It reads the file mentioned above, and uses that information to emulate a query regimen against the MySQL database undergoing testing. This Java program logs both the queries and system responses in a CSV file for later performance evaluation. The program runs in multiple threads to simulate multiple concurrent connections.
  • Failover test with the MySQL master-slave architecture
We implemented a slave server as a failover database. The User needs the assurance that there will be no data lag between the master and slave servers in the event of failover. This needs to be tested.
There are 3 options for triggering failover with MySQL: a virtual IP, a load balancer, and multiple hosts in the JDBC connection string. Due to resource constraints, and given that the last option was the cheapest, I tested only the last one. All I had to do was fill in multiple host names in the database host-name parameter of the stress-testing program while running the test; a sketch of such a connection string follows.
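A sketch of such a Connector/J connection string, with hypothetical host names; failOverReadOnly=false keeps the connection writable after a failover:

jdbc:mysql://master-host:3306,slave-host:3306/cdr?failOverReadOnly=false&autoReconnect=true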
  • Backup the database with an LVM snapshot and then recover the database from the backup sets—including support for incremental backup/recovery
I configured a logical volume for MySQL's data directory, and used Zmanda Recovery Manager as a user interface for creating a backup from the snapshot configured in the 2nd task mentioned above. I also tested incremental backup with ZRM; a sketch of the ZRM commands follows.
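A sketch of the ZRM commands, assuming a backup set named dailyrun (the name is an assumption); level 0 is a full backup and level 1 an incremental one:

$ mysql-zrm --action backup --backup-set dailyrun --backup-level 0
$ mysql-zrm --action backup --backup-set dailyrun --backup-level 1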
  • Test the feasibility of point-of-failure recovery
For point-of-failure recovery, we need to recover the data from the full backup set stored in the backup target directory, and then apply the binary logs from after that full backup, up to the last records before the database crashed; a sketch of the replay step follows.
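A sketch of the replay step, with hypothetical backup file, binary log names, and start position; the full backup is restored first, then the binary logs are piped through mysqlbinlog:

$ mysql -uroot -p < /backup/full/cdr_full.sql
$ mysqlbinlog --start-position=107 mysql-bin.000042 mysql-bin.000043 | mysql -uroot -p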

System Architecture

As the graph on the left shows, we had:

  1. One Sun X4600 server as the master server; this server also acted as the host server for the virtual machine running Windows Server 2003. SQLWays ran in the virtual environment for unloading tables from the existing database into CSV files. The server is equipped with 8 CPUs, each with 4 cores (32 cores in total), and 32 gigabytes of RAM.
  2. One Sun X4450, with two 4-core CPUs (8 cores in total), and 8 gigabytes of RAM. The server acted as the slave server.
  3. One Sun 7410 storage server, equipped with 64 gigabytes of RAM. We configured the 20 terabytes of disk capacity as a mirror, making the effective available space 10 terabytes. It has 4 data channels, each with 1 gigabit/second of bandwidth. In the beginning only one channel was used; in later tests this was increased to 2 channels to tune the performance of data loading. The storage is shared between the two abovementioned servers, each with its own partition.

The results

  • Exporting data from original database to csv text files
SQLWays worked well, normally taking 1 hour and 45 minutes to export 22 GB of data (the data generated by the PBX in one day). I found that the job was I/O bound when exporting data with a single task thread, so there should not be much room for improvement.
  • Loading data into MySQL and creating table indexes
We started with MySQL 5.1.41, with the built-in InnoDB storage engine. We tested both InnoDB and MyISAM tables, and found that loading and indexing the 22 GB of data generated by the PBX in a single day into InnoDB tables took more than 24 hours: a totally unacceptable arrangement! We tuned the my.cnf option file, increasing innodb_buffer_pool_size and changing the flush method to O_DIRECT. This shortened the data-processing task to 12 hours 32 minutes; still significantly slower than the current legacy database, which only needs around 8 hours to load data into temporary tables and then dispatch it to the permanent tables. Most daunting of all was the thought that if the database crashed while loading such a large amount of data, the time needed to recover on restart would be very long; much more than the time needed to load the data.

Thanks to the new InnoDB plugin storage engine, we found a way out. We used InnoDB 1.0.6 and set the file format to "Barracuda". The following table shows the results. The time for loading the 22 gigabytes of data shrank to 42 minutes, with an additional 1 hour and 47 minutes to create the indexes; a tremendous improvement! Furthermore, loading and creating indexes with MyISAM tables provided even more stability: it took 51 minutes to load the data, and index generation needed less than 278 minutes. Further details of our database tuning exercise will be narrated in the next post of this blog.
Although the InnoDB plugin performed much better than the legacy database, it is still quite annoying to wait such a long time in the recovery stage when restarting MySQL from a crashed state while loading large amounts of data. And as most of the DML in the CDR system is performed by batch jobs, there is no need to incur the data-integrity overhead of transactions, so we decided to use MyISAM in the later tests.

  • Stress testing with client application to emulate daily DML operation tasks
Because the raw data from the PBX has no separators, and the MySQL load data command needs delimiters to delineate the fields, I wrote a program to add separators based on a properties file that can be edited by the users. This file contains two types of information: 1) a specification of the delimiter character, and 2) the column positions for each delimiter.
Apart from loading data, almost all the DML is performed by a set of stored procedures which dispatch data in the temporary tables to permanent tables, with a new set of tables generated for each day. I rewrote the current stored procedures from the existing proprietary database language to MySQL's ANSI-SQL-compliant stored procedures. The syntax-conversion job was not as hard as I thought it would be; after the learning experience of the first rewrite, the job became routine and easy. However, there was dynamic SQL that creates tables and indexes on the fly; it took a while to figure out how to implement it via local variables and calling procedures with the variables as parameters. Details regarding this conversion experience will be described in a subsequent blog entry.
Most of the DML in the dispatching stored procedure consists of INSERT commands. The insert speed of MySQL stored procedures is amazing: it took only 57 minutes to load the 66 gigabytes/210 million rows of data (3 days' worth of PBX data) and 3 hours 20 minutes to dispatch them to the permanent tables, compared with more than 5 hours to dispatch a single day's data in the existing system: a significant leap!
Summing up the time required for the whole procedure (adding separators, loading the CSVs into temporary tables, dispatching the temporary tables to permanent tables, and creating indexes), 2 points stand out:
  1. In terms of time required, there is not much difference between creating indexes prior to or after the tables have been loaded with data.
  2. It took 15 hours and 20 minutes to go through the entire process for 3 days of PBX data: an average of 5 hours and 7 minutes per day. This is superior to the present regime. Furthermore, the User has indicated that they will be able to add field separators on the PBX side at record-generation time: this means it will be possible to shorten our process by 3 hours 20 minutes, thereby needing only 1 hour and 20 minutes. This would be 6 times faster than the existing database!

  • Stress testing with a multi-threaded Java program that issues SQL statements to emulate concurrent database access
As the following table shows, we ran the stress test program with 1, 4, and 8 threads to simulate 1, 4, and 8 concurrent connections, respectively. The times needed to run the test were 3 hours and 5 minutes (11082 seconds), 11 hours and 14 minutes, and 11 hours and 21 minutes, respectively. The 1- and 4-thread trials had only one channel of 1 Gbit/s connectivity. I observed I/O contention between sorting in the temp directory and table access, so I added one more channel (two channels in total) for the 8-thread test and assigned "tempdir" to the new partition added by the new channel. This should be why the 4-thread and 8-thread results are about the same, in contrast to the large jump between 1 thread and 4 threads.

The following graphs show that the maximum response times for each test ranged from 420 to 546 seconds. Apart from the roughly 6-fold increase going from 1 thread to 4 threads, there is not much variation.

As the number of threads rises further to 12, a few exceptions appear in the response log, so it is safe to say the request-load ceiling of the database is between 8 and 12 threads. I used several PCs to run the stress test program, and the SQL requests return only a few rows, so I consider any performance effects owing to the test tools and network bandwidth negligible.
The SQL statements used were taken from real-world database logs, covering database traffic over a 10-hour period. When I ran the test with 8 threads over 17 hours, the total number of SQL requests the database processed was 40896 (5112 * 8). In point of fact, these test cases were much more rigorous and demanding than a real situation would ever be. Checking the response times for the 5112 SQL statements, I found that statements returning 0 to 1000 rows completed within 3 seconds. But there were 63 cases where the request had to return between 64415 and 497848 rows; their response times fell between 25 and 408 seconds. As most of these large-result-set requests occur consecutively, they were a notable bottleneck in the test. Out of curiosity, I tried distributing these larger statements to temporally disparate points in the SQL test-bed command file, but the congestion on them was not much different; the performance stats were pretty much the same under the 4- and 8-thread test regimens.
In response to this information, the User expressed concern that with a limited number of concurrent connections, these larger transactions would negatively impact access by the flurries of smaller transactions. So I decided to do some special testing to determine MySQL's behavior under a regimen in which both large and small transactions vie for attention. I derived two command files from the original real-world file: one containing only the 63 large-transaction commands, and another containing only 100 small-transaction commands, and ran both command files simultaneously. The results are shown in the following table and graph. The number of threads does not appear to be much of a factor for the small transactions: that set finishes quickly regardless. The response times for the large-result requests, however, increased linearly with the thread count, as the following table and graph show. When the thread count increased from 12 to 28, the maximum response time grew from 640 seconds to 1138 seconds, and the run duration grew from 1 hour and 42 minutes to 2 hours and 41 minutes.
I noticed that MySQL uses the temp directory to hold the temporary tables it creates for sorting the rows requested by transactions. It occurred to me that this whole enterprise may have been haunted by a simple disk-I/O bottleneck, rooted in the limited number of available storage I/O channels. For the runs where the number of threads exceeded 20, I configured an additional I/O channel and mapped "tempdir" to the mount point of the new channel, as sketched below. It fixed the problem! I was able to raise the number of concurrent connections to 28 with no exceptions in the log file.
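The change itself is a single option; the path below is illustrative, pointing at the partition served by the second channel:

[mysqld]
# temporary tables and sort files now hit the second IO channel
tmpdir = /mnt/channel2/mysqltmp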
  • Failover testing with the MySQL master-slave architecture
I tested failover by assigning multiple host names in the connection string of the MySQL JDBC driver. I ran the stress test program with two host names (one for the master server, one for the slave) in the host-name parameter. When I shut down the master server while the test program was running, the currently connected threads cleanly threw database exceptions on the console, while threads connecting after the shutdown failed over and queried the slave server properly. Further improvements in connection exception handling should be readily obtainable by simply adding code to the exception-handler sections of the client programs to fail the connection over to the second database.
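For reference, Connector/J accepts a comma-separated host list in the JDBC URL and tries the hosts in order; a sketch with hypothetical host and database names (failOverReadOnly controls whether the failed-over connection is read-only):

import java.sql.Connection;
import java.sql.DriverManager;

public class FailoverDemo {
    public static void main(String[] args) throws Exception {
        // master listed first; Connector/J fails over to the slave if it is unreachable
        String url = "jdbc:mysql://master-host:3306,slave-host:3306/cdr"
                   + "?failOverReadOnly=true&autoReconnect=true";
        Connection conn = DriverManager.getConnection(url, "user", "password");
        System.out.println("connected, readOnly=" + conn.isReadOnly());
        conn.close();
    }
}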
  • Backing up the database with an LVM snapshot and recovering it from the backup sets, including incremental backup/recovery
Because MySQL's built-in backup solution is very limited, and backing up large volumes of data carries significant overhead, augmenting the backup with a snapshot is a good idea. There are OS-level snapshot solutions such as ZFS on Solaris and LVM on Red Hat; at the storage level there are snapshots of thin-provisioned, iSCSI, and NFS volumes, as supported by Sun Open Storage. OS-level snapshot solutions are easier to integrate with backup tools such as Zmanda.
I used LVM for the snapshot and ZRM to manage the backup, but there is a pitfall for novice Users: if you want a dedicated disk for backup-set storage, you must create another logical volume for it, separate from the logical volume holding the MySQL data directory. Otherwise the backup sets land on the same logical volume that is being snapshotted, each backup recursively includes the earlier ones, and space is quickly exhausted.
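Under the hood, an LVM-based backup follows a pattern like the following (the volume names and snapshot size are illustrative); ZRM automates these steps:

mysql> FLUSH TABLES WITH READ LOCK;   -- quiesce writes while the snapshot is created
$ lvcreate --snapshot --size 2G --name mysql-snap /dev/vg01/mysql-data
mysql> UNLOCK TABLES;                 -- writes resume; the lock is held only seconds
$ mount /dev/vg01/mysql-snap /mnt/mysql-snap
$ # ... copy the backup set from /mnt/mysql-snap to the dedicated backup volume ...
$ umount /mnt/mysql-snap
$ lvremove -f /dev/vg01/mysql-snap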
ZRM performed well for both full and incremental backups, but when I tried to recover the database across the span from before the full backup started to the time an incremental backup had been performed, the ZRM console reported a duplicate-data insertion error. I think this was caused by binlog entries from before the full backup being applied again. I have not discussed it with Zmanda yet, but I think it could be solved by 1) forcing a log switch just before the full backup, and then 2) removing the binlog files that contain data from before the full backup once the full backup is finished; see the sketch below.
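Both steps map to standard statements, roughly as follows (the binlog file name is illustrative; SHOW MASTER STATUS reports the file in use after the switch):

mysql> FLUSH LOGS;                               -- rotate to a fresh binlog just before the full backup
mysql> SHOW MASTER STATUS;                       -- note the binlog file now in use
mysql> PURGE BINARY LOGS TO 'mysql-bin.000042';  -- after the backup, drop the older binlogs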
  • Testing the feasibility of point-of-failure recovery
With ZRM's nice user interface, it is easy to do a point-of-failure recovery using a full backup plus the binlog. In a production environment, storing the binlog files on a volume separate from the data directory would be safer.
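Outside of ZRM, the same replay can be done by hand with mysqlbinlog; a sketch with illustrative file names and timestamps, piping the events recorded after the full backup up to the point of failure back into the server:

$ mysqlbinlog --start-datetime="2010-01-15 00:00:00" \
              --stop-datetime="2010-01-15 09:41:00" \
              mysql-bin.000042 | mysql -u root -p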

Conclusion
MySQL is unique in its pluggable storage engines, which give Users the freedom to tune the database to the characteristics of their applications. For applications with few online DMLs and lots of queries (e.g., CDR), we can use MyISAM as the table storage engine to take advantage of its low overhead and easy maintenance and operation. On the other hand, for tables requiring online transaction performance with carefully maintained data integrity, we can use InnoDB as the storage engine; its higher overhead is warranted by the application.
As InnoDB evolves, we can expect further performance benefits from faster data loading. Benchmarks indicate a 5-fold speed improvement, which could cut our total operating time from 12.5 hours to 2.5 hours. I believe performance should improve yet further with MySQL 5.5, and I am looking forward to trying it out in another round of benchmark tests.