星期二, 五月 08, 2012

Bug 9290526 - Poor plan for recursive SQL used for DML involving a UNIQUE constraint [ID 9290526.8] …


A suboptimal plan may be seen for the specific recursive SQL:
select c.name, u.name
from con$ c, cdef$ cd, user$ u where c.con# = cd.con# and cd.enabled =:1 and
c.owner# = u.user#;

This can cause inserts involving a unique constraint to show poor performance.

Workaround
Set optimizer_mode=rule for the affected DMLs.

https://supporthtml.oracle.com/epmos/faces/ui/km/SearchDocDisplay.jspx?_afrLoop=2093761045238000&type=DOCUMENT&id=9290526.8&displayIndex=2&_afrWindowMode=0&_adf.ctrl-state=8wo4jycn3_44

sys@test>explain plan for
2 select c.name, u.name from con$ c, cdef$ cd, user$ u where c.con# =
3 cd.con# and cd.enabled = :v1 and c.owner# = u.user#;

Explained.

sys@test>@plan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2222027377

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 336 | 10 (10)| 00:00:01 |
|* 1 | HASH JOIN | | 7 | 336 | 10 (10)| 00:00:01 |
|* 2 | HASH JOIN | | 7 | 203 | 8 (13)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| CDEF$ | 7 | 56 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | I_CDEF4 | 1 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | CON$ | 3584 | 75264 | 5 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | USER$ | 33 | 627 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("C"."OWNER#"="U"."USER#")
2 - access("C"."CON#"="CD"."CON#")
4 - access("CD"."ENABLED"=TO_NUMBER(:V1))

20 rows selected.

sys@test>explain plan for
2 select /*+ rule */ c.name, u.name from con$ c, cdef$ cd, user$ u where c.con# = cd.con# and cd.enabled = :v1 and c.owner# = u.user#;

Explained.

sys@test>@plan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1027684349

-------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | NESTED LOOPS | |
| 2 | NESTED LOOPS | |
| 3 | TABLE ACCESS BY INDEX ROWID| CDEF$ |
|* 4 | INDEX RANGE SCAN | I_CDEF4 |
| 5 | TABLE ACCESS BY INDEX ROWID| CON$ |
|* 6 | INDEX UNIQUE SCAN | I_CON2 |
| 7 | TABLE ACCESS CLUSTER | USER$ |
|* 8 | INDEX UNIQUE SCAN | I_USER# |
-------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("CD"."ENABLED"=TO_NUMBER(:V1))
6 - access("C"."CON#"="CD"."CON#")
8 - access("C"."OWNER#"="U"."USER#")

Note
-----
- rule based optimizer used (consider using cbo)

26 rows selected.

星期三, 四月 11, 2012

星期六, 六月 18, 2011

NUMA架构下oracle的oom异常

numa oracle oom

3.NUMA+oracle的一些问题

1.rh54-hugepage oraclesga并非完全预分配,hugepage应该是预分配出page的,虽然没有划给SGA,所以内存分配上应该不会导致问题。另外,hugepage实际分配的大多数位于node1pagecache位于node0,虽然整体较为平均,不确定是否会有影响,不过个人偏向于影响不大。

2.numa 进程倾向总是从同一node分配内存,对于oracle,由于使用filecache的进程以arch进程为主,且大部分时候oracle只有一个arch进程工作,可能导致filecache过分集中到一个node里面。

3.numa vm.zone_reclaim_mode = 0,该模式下,node内存回收会很懒惰。

4.oracle oracle可能使用了bind方式在node中申请内存,结合上面所述,该模式并不是很合适。

oracle DG VCS切换异常问题

oracle11.2.0.2 dg vcs ha 切换异常

现象:oracle11.2.0.2+vcs,当有主库的public进程连接到备库时,ha无法正常完成备库实例的切换。

问题关键点如下:

1oracle11.2.0.2 shutdown immediate操作中,会等待active call状态的进程,(不确定是会等待所有处于active call状态的进程还是单单active call状态的RFS进程);并且这个等待可能是没有超时或者超时设置相当长(从测试中看超过1h)。shutdown操作会一直卡在ipcs看到的第一个信号量(释放?)处,对应系统调用一直在报超时错误。

2)当监听器所侦听的网络异常(这里是IPdown掉)时,主库与备库RFS进程的网络连接会持续相当长的时间(超过1h,不确定是否有超时设置,16010, 00000, "disable stale RFS process extermination"?),主库连接是处于ESTABLISHED,备库处于SYN-SENTtcpdump显示备库一直在向主库发包,但是没有收到回应。

3VCS配置中,当监听offline后,即可offline VIP,从VCS日志可以看到,oraclevipoffline是并发操作的。

解决方案:修改oracle instance依赖于vip,删除listener对vip的依赖即可。

星期六, 二月 19, 2011

slave_exec_mode

http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#sysvar_slave_exec_mode

一直都习惯使用slave_skip_errors来忽略类似键值冲突和找不到键错误,偶然发现原来mysql里面竟然有这么一个参数,而且可以动态修改,当然由于未曾在生产环境使用过,还是要谨慎对待之。

http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#sysvar_slave_exec_mode

星期二, 一月 11, 2011

记录:Bug 8575528 Missing entries in V$MUTEX_SLEEP.location

Bug 8575528 Missing entries in V$MUTEX_SLEEP.location

Affects:

Product (Component)Oracle Server (Rdbms)
Range of versions believed to be affectedVersions >= 10.2.0.4 but BELOW 10.2.0.5
Versions confirmed as being affected
  • (None Specified)
Platforms affectedGeneric (all / most platforms affected)

Description

This problem is caused by the fix for bug 6795880.
 
Null mutex locations may be seen in V$MUTEX_SLEEP or a core dump 
encountered in a call from kqlfMutexClnFetch() to strcpy(). 
 
Note: Fixes for this bug also fix the issue described in bug 6795880
 
Note:
  This fix is disabled by default in 10g.
  To enable this fix you must explicitly set the following
  parameter for instance startup:
    "_cursor_features_enabled" = 10
 

星期二, 十二月 21, 2010

try mysql5.5.8

install mysql558

首先阅读下官方文档说明http://dev.mysql.com/doc/refman/5.5/en/source-installation.html

按照cmake吧,就不说了,下载源码,改bug

http://bugs.mysql.com/bug.php?id=58350
http://lists.mysql.com/commits/126782

编译

CC='/usr/bin/gcc'
CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIC -Wall -O3 -g -static-libgcc -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF'
CXX='/usr/bin/gcc'
CXXFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -felide-constructors -fno-exceptions -fno-rtti -fPIC -Wall -Wno-unused-parameter -fno-implicit-templates -fno-exceptions -fno-rtti -O3 -g -static-libgcc -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF'
LDFLAGS=''
ASFLAGS=''
cmake -DCMAKE_INSTALL_PREFIX=$MYSQL_HOME
-DDEFAULT_CHARSET=gbk
-DMYSQL_UNIX_ADDR=$MYSQL_HOME/run/mysql.sock
-DWITH_INNOBASE_STORAGE_ENGINE=1
-DWITH_PERFSCHEMA_STORAGE_ENGINE=1
-DWITHOUT_ARCHIVE_STORAGE_ENGINE=1
-DWITHOUT_BLACKHOLE_STORAGE_ENGINE=1
-DWITHOUT_EXAMPLE_STORAGE_ENGINE=1
-DWITHOUT_FEDERATED_STORAGE_ENGINE=1
-DWITHOUT_PARTITION_STORAGE_ENGINE=1
-DWITHOUT_ARCHIVE_STORAGE_ENGINE=1 .

make -j 16

make install

InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
101221 13:51:32 InnoDB: Using Linux native AIO
101221 13:51:32 InnoDB: Initializing buffer pool, size = 32.0G
101221 13:51:33 InnoDB: Completed initialization of buffer pool
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
101221 13:51:33 InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
101221 13:51:33 InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 500 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Progress in MB: 100 200 300 400 500
101221 13:51:34 InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 500 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Progress in MB: 100 200 300 400 500
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: 127 rollback segment(s) active.
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
101221 13:51:37 InnoDB: 1.1.4 started; log sequence number 0
101221 13:51:37 [Warning] 'user' entry 'root@userext4-ztb' ignored in --skip-name-resolve mode.
101221 13:51:37 [Warning] 'user' entry '@userext4-ztb' ignored in --skip-name-resolve mode.
101221 13:51:37 [Warning] 'proxies_priv' entry '@ root@userext4-ztb' ignored in --skip-name-resolve mode.
101221 13:51:37 [Note] Event Scheduler: Loaded 0 events
101221 13:51:37 [Note] /home/oracle/mysql/bin/mysqld: ready for connections.
Version: '5.5.8-log' socket: '/home/oracle/mysql/run/mysql.sock' port: 3306 Source distribution

星期二, 十一月 30, 2010

mysql bug 55981

blob bug


http://www.mysqlperformanceblog.com/2010/11/29/data-corruption-drbd-and-story-of-bug/
http://bugs.mysql.com/bug.php?id=55981

root@test 06:52:04>CREATE TABLE t4(c1 int,c2 int,c3 MEDIUMBLOB,PRIMARY KEY(c1)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

root@test 06:52:05>insert into t4 values(1, 2, REPEAT('b', 54903));
Query OK, 1 row affected (0.04 sec)

root@test 06:52:12>start transaction;
Query OK, 0 rows affected (0.00 sec)

root@test 06:52:18>update t4 set c1 = 3;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0

root@test 06:52:24>rollback;
Query OK, 0 rows affected (0.01 sec)

root@test 06:52:30>start transaction;
Query OK, 0 rows affected (0.00 sec)

root@test 06:52:37>update t4 set c1 = 3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

root@test 06:52:41>rollback;
ERROR 2013 (HY000): Lost connection to MySQL server during query
悲剧了

星期五, 十月 15, 2010

service_name or sid

service_name or sid

问题起源于itpub一个帖子http://www.itpub.net/thread-1358298-1-1.html,答案呢,通过service_name实现failover和lb自然是一方面,顺便查了下oracle文档相关介绍,还是有些特别的发现的。

http://download.oracle.com/docs/cd/B19306_01/network.102/b14213/tnsnames.htm#sthref754
.6.4.8 SERVICE_NAME
Purpose

Use the SERVICE_NAME parameter to identify the Oracle9i or Oracle8 database service to access. Set the value to a value specified by the SERVICE_NAMES parameter in the initialization parameter file.

6.6.4.9 SID
Purpose

Use the SID parameter to identify the Oracle8 database instance by its Oracle System Identifier (SID). If the database is Oracle9i or Oracle8, use the SERVICE_NAME parameter rather than the SID parameter.

http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams188.htm#REFRN10194
Real Application Clusters Do not set the SERVER_NAMES parameter for Real Application Clusters (RAC). Instead, define services using Database Configuration Assistant (DBCA) and manage services using Server Control (SRVCTL) utility.

http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/concepts.htm#i1041507
http://download.oracle.com/docs/cd/B19306_01/network.102/b14212/architecture.htm

这个里面有更为详细的介绍

星期三, 八月 18, 2010

星期四, 七月 29, 2010

Surge Scalability 2010

Surge Scalability 2010

Surge Scalability 2010,值得关注下,转一位speaker的演讲概要在这里,希望有所启发。

http://ronaldbradford.com/blog/speaking-at-surge-scalability-2010-2010-07-28/

Overview:

Some details of the presentation would include:

  • The different types of accessible data (e.g. R/W, R, none)
  • What limits MySQL availability (e.g software upgrades, blocking statements, locking etc)
  • The three components of scalability – Read Scalability/Write Scalability/Caching
  • Design practices for increasing scalability and not physical resources
  • Disaster is inevitable. Having a tested and functional failover strategy
  • When other products are better (e.g. Static files, Session management via Key/Value store)
  • What a lack of accurate monitoring causes
  • What a lack of breakability testing causes
  • What does “No Downtime” mean to your organization.
  • Implementing a successful “failed whale” approach with preemptive analysis
  • Identifying when MySQL is not your bottleneck

星期四, 七月 01, 2010

On MySQL replication, again…

http://mituzas.lt/2010/06/30/replication/

http://mituzas.lt/2010/06/30/replication/

很是认同,转载下

There are few things one is supposed to know about MySQL replication in production, as manual doesn’t always discuss things openly.

This is small set of rules and advices I compiled (some apply to statement based replication only, but row based replication may benefit from one or two too):

  • Don’t use MyISAM. Or in fact, don’t use any non-transactional engine, if you care about your data. On either side, master or slave, or both – using non-transactional engines will cause data drift, as partially executed statements on master would be fully executed on slave, or simply stop replication. Of course, every crash has the capacity of getting your tables desynced from each other and there are absolutely no guarantees.
    This “don’t” can be easily transformed into “do” – just use InnoDB. Bonus point – one doesn’t need to take down the server, to clone a slave from a master :)
  • Don’t use temporary tables. MySQL manual is very funny about temporary tables in replication, it says “do not shut down the slave while it has temporary tables open.” That of course means that you’re not supposed to crash either – and more slaves there are, more of them will crash because of various reasons (e.g. solar flares).
    The operational overhead temporary tables add is huge – even though it may not show up in the benchmark.
  • Prefer simple, idempotent statements. If one can replay same statements multiple times without having database drift it doesn’t matter much if replication position is somewhat outdated. Updating rows by PK to fixed values, avoiding multiple table updates/deletes can allow to recover after crash much faster.
  • Set sync_binlog=1. This will introduce biggest bottleneck for transactions, but losing 30s of data may be worse (as this will force to do full slave resync in most cases). On really busy servers one can go for higher values (e.g. sync every 20 transactions), but 0 is asking for disaster.
  • Avoid long running updates. Though all long statement would cause on a master is slightly longer locking window and some performance pressure, once it gets replicated to the slave, all the updates will have to wait for the giant one to finish, in many cases rendering the slave useless.
    If something big has to be replicated, either split it into smaller chunks or run it directly against slaves (with binary logging on the master disabled for it).
    Splitting into smaller chunks can allow wait-for-slave logic to be implemented, thus not having any major impact on production environments.
  • Don’t use replicate-do-db. Or replicate-ignore-db. They both rely on database context, and statements like ‘INSERT INTO database.table’ will fail.
    If you need it, use replicate-wild-do-table=db.% – but even then, be careful with cross-database statements, that involve tables from multiple databases – as they may be filtered out…
  • Note the multiversioning. Some statements may become replication performance hogs because of long-running transactions (backups? reporting? ETL?) running on slaves – it may not need to rescan all the row versions on master, but they’d be still there on a slave. Such statements may need to be rewritten to avoid scanning gaps with too many invisible rows, or long transactions have to be split.

Though probably the best advice I can give now is “call your mysql vendor and ask for transactional replication“. Server, rack, datacenter crashes will not cause excessive work on fixing replication – it will be always consistent. One can even disable log syncing to disk then o/

星期一, 六月 21, 2010

innodb急着奔1.1?

作者: 弦乐之花 | 可以转载, 但必须以超链接形式标明文章原始出处和作者信息及版权声明

链接http://shiri512003.itpub.net/post/37713/501173


5148的发布把innodb plugin带入了1.0.9时代,5147innodb plugin更新到了108版本还是马马虎虎能接受,5148的109版本真是乱来了,大家可以看下109的版本变化,innodb本身没有任何变化嘛。。。

http://dev.mysql.com/doc/refman/5.1/en/news-5-1-48.html

InnoDB Plugin has been upgraded to version 1.0.9. This version is considered of General Availability (GA) quality. InnoDB Plugin Change History, may contain information in addition to those changes reported here.

In this release, the InnoDB Plugin is included in source and binary distributions, except RHEL3, RHEL4, SuSE 9 (x86, x86_64, ia64), and generic Linux RPM packages. It also does not work for FreeBSD 6 and HP-UX or for Linux on generic ia64.

不禁在想109的下个版本是什么?1011 or 1.1?oracle这么着急推高plugin的版本,难道是为了55X铺路?

不管怎么样,个人对于plugin的这个版本也是很失望的。

星期五, 六月 18, 2010

Percona release 5147

作者: 弦乐之花 | 可以转载, 但必须以超链接形式标明文章原始出处和作者信息及版权声明

链接http://shiri512003.itpub.net/post/37713/501132


最近没太关注percona分支的mysql5145以上版本了,(其实也是一直很期待基于5147的percona版出现的)今天偶然发现percona已经是发布了5.1.47-11.0,(oracle已经release了5148版,细细看了release note,没有升级的欲望也是)于是下来测试一把,发现还是有点意思的
首先看下configue命令的帮助,比较有趣的percona把builtin版本的编译选项修改成只支持动态编译,plugin则支持动静态编译,刚开始被耍了一把,不过还是大快人心的,一直觉得oracle也有必要改下了,也许考虑很多用户还是在用builtin的缘故吧
=== InnoDB Storage Engine ===
Plugin Name: innobase
Description: Transactional Tables using InnoDB
Supports build: dynamic
Configurations: max, max-no-ndb

=== InnoDB Storage Engine ===
Plugin Name: innodb_plugin
Description: Transactional Tables using InnoDB
Supports build: static and dynamic
Configurations: max, max-no-ndb

启动下刚编译的mysql,日志文件记录到下面信息
100618 17:17:46 Percona XtraDB (http://www.percona.com) 1.0.8-11.0 started; log sequence number 124289766

进去看看版本变量值
root@(none) 05:17:58>select @@innodb_version;
+------------------+
| @@innodb_version |
+------------------+
| 1.0.8-11.0 |
+------------------+
1 row in set (0.00 sec)

另外,percona版新增34个参数,还有一些参数初始值不同,经常关注这个分支的话对这些额外参数也应该是比较熟悉的,这里不啰嗦了

btw,现在percona的blog还没有该版本的release声明,官方网站已经是打出来了,有兴趣的同学可以下来玩玩
http://www.percona.com/software/percona-server/
http://www.percona.com/docs/wiki/percona-server:release_notes_51

星期五, 六月 11, 2010

ibdata1

ibdata1

记得之前曾在Pub上看到innodb共享表空间暴涨的案例,自己在实际测试中也遇到多(高并发情况),其中缘由也是知晓一二,今天看到peter的两篇博文进行了详细的描述,简单记录下来备查。

http://www.mysqlperformanceblog.com/2010/06/10/purge-thread-spira-of-death/

I just wrote a large post on reasons for innodb main tablespace excessive growth and I thought it would make sense to explain briefly of why it is so frequently you have purge not being the problem at all and when out of no where you can see purge thread being unable to keep up and undo table space explodes and performance drops down. Here is what happens.

When you have typical OLTP system with small transactions your UNDO space is small and it fits in buffer pool. In fact most of the changes do not need to go to the disk at all – the space for undo space is allocated, used and freed without ever needing to go to the disk.

Now when you have spike in writes or long running transactions which increases your undo space size it may be evicted from buffer pool and stored on disk. This is when problems often starts to happen. Now instead of purge thread simply operating in memory it has to perform IO which slows it down dramatically and makes it unable to handle amount of changes coming in.

The solution to this problem may range from pacing the load (which is helpful for batch job operations as it can be controlled), using innodb_max_purge_lag or enable separate purge thread (or threads) via innodb_use_purge_thread if you’re running Percona Server.

http://www.mysqlperformanceblog.com/2010/06/10/reasons-for-run-away-main-innodb-tablespace/

So you're running MySQL With innodb_file_per_table option but your ibdata1 file which holds main (or system) tablespace have grown dramatically from its starting 10MB size.
What could be the reason of this growth and what you can do about it ?

There are few things which are always stored in main tablespace - these are system tables, also known as data dictionary - table and index definitions and related stuff, double write buffer, insert buffer which temporary stores records to be inserted in the index tree and undo segment holding previous versions of changed rows.

The system tables size depends on the number and complexity of the tables you have in the system. If you have hundreds of thousands of tables it may consume gigabytes of space but for most installations we're speaking about tens of megabytes or less. Double Write Buffer is fixed size (2MB = 128 of 16KB pages) and so will not affect growth of your main Innodb Tablespace. Insert Buffer size is also restricted to half of the buffer pool size (can be changed via innodb_ibuf_max_size option) which can be significant size for systems with large amount of memory. Finally undo space can grow unbound depending on your transaction size.

The challenge is both Insert buffer and undo space will grow and shrink during the database operation and unless you can them on being large your would not know what caused your system tablespace size to explode - they will shrink to the normal size and all what you have will be free space, which you unfortunately can't reclaim without reloading tour database. This is where Trending can help, for example MySQL CACTI Templates will have the graphs you need.

If you're looking at SHOW INNODB STATUS this is how you can see Insert Buffer Size:

SQL:
  1. Ibuf: size 108931, free list len 64619, seg size 173551,

In this case we can see the segment is allocated to 173551 pages which is about 2.7GB only about 2/3 of it is in use right now but for sake of monitoring tablespace size you need a full allocated number.

Lets now look at the Undo Space which is a lot more interesting (and which is the most likely cause of getting extremely large system tablespace)

The records can be stored in the undo tablespace for 2 reasons. First it is re

SQL:
  1. History list length 4000567

In this case the History Length is about 4 million which means there are 4 million of transactions which are not yet purged. This is not very helpful as single transaction may modify single row and so be responsible for one undo entry or it may modify millions of rows. Row size can also vary a lot. For many OLTP applications though which have a lot of tiny transactions this is a very good indicator.

If you're running Percona Server the following can be helpful:

SQL:
  1. mysql> SELECT * FROM innodb_rseg;
  2. +---------+----------+----------+---------+------------+-----------+
  3. | rseg_id | space_id | zip_size | page_no | max_size | curr_size |
  4. +---------+----------+----------+---------+------------+-----------+
  5. | 0 | 0 | 0 | 6 | 4294967294 | 20993 |
  6. +---------+----------+----------+---------+------------+-----------+
  7. 1 row IN SET (0.00 sec)

curr_size will display the current size of RSEG which defines undo segment size, though there is a complex data structure and you can't easily convert this number to number of data pages for undo slots or number of rows stored.

From the practical standpoint there are 3 reasons a lot of undo space may be required:

Running Transaction which does a lot of changes - If transaction modifies a lot of rows it has to use a lot of space in undo segment and there is no way around it. Be careful in particular with update or delete transactions which go over a lot of rows. In many cases doing such processes in chunks, updating/deleting may be thousands of rows per transaction may be better if your application can handle it. Note ALTER TABLE will not require excessive amount of undo space even for very large tables as it internally commits every 10000 rows.

Running Very Long Transaction If you're running very long transaction, be it even SELECT, Innodb will be unable to purge records for changes which are done after this transaction has started, in default REPEATABLE-READ isolation mode. This means very long transactions are very bad causing a lot of garbage to be accommodated in the database. It is not limited to undo slots. When we're speaking about Long Transactions the time is a bad measure. Having transaction in read only database open for weeks does no harm, however if database has very high update rate, say 10K+ rows are modified every second even 5 minute transaction may be considered long as it will be enough to accumulate about 3 million of row changes.

Purge Thread Falling Behind This is the most dangerous reason. It is possible for database updates happen faster than purge thread can purge records when they are no more needed which means undo space can just grow until it consumes all free space (or specified max size for ibdata1 file). "Good" thing is performance typically starts to suffer terribly well before that and it gets noticed. There are to things you can do about this problem first you can use innodb_max_purge_lag to make a threads doing modifications slow down if purge thread can't keep up. This however does not work in all cases. If you're running XtraDB you can also use innodb_use_purge_thread to use dedicated purge thread, which works a bit faster as it does not need to compete with other activities of main theread. You can also use more than one purge thread by setting this variable to higher values though this functionality is a bit experimental.

One related question I get asked often is why Master and Slave may get so much different table space size in the end. The answer is of course their workload is very different. On one hand slave has all updates going from one thread so it has less chance for purge thread to fall behind, on other a lot of people use slave for very long queries which may make long transactions reasons a lot more likely. So it can be both - either master or slave can have it main tablespace growing larger than its counterpart.

Writing this blog post I also discovered even in XtraDB we do not have as much transparency in regards to undo space and purging as I would like. I filed number of feature requests and I hope we'll have a chance to fix this soon.

http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_max_purge_lag

innodb_max_purge_lag

Command-Line Format--innodb_max_purge_lag=#
Config-File Formatinnodb_max_purge_lag
Option Sets VariableYes, innodb_max_purge_lag
Variable Nameinnodb_max_purge_lag
Variable ScopeGlobal
Dynamic VariableYes
Permitted Values
Typenumeric
Default0
Range0-4294967295

This variable controls how to delay INSERT, UPDATE, and DELETE operations when purge operations are lagging (see Section 13.6.9, “InnoDB Multi-Versioning”). The default value 0 (no delays).

The InnoDB transaction system maintains a list of transactions that have delete-marked index records by UPDATE or DELETE operations. Let the length of this list be purge_lag. When purge_lag exceeds innodb_max_purge_lag, each INSERT, UPDATE, and DELETE operation is delayed by ((purge_lag/innodb_max_purge_lag)×10)–5 milliseconds. The delay is computed in the beginning of a purge batch, every ten seconds. The operations are not delayed if purge cannot run because of an old consistent read view that could see the rows to be purged.

A typical setting for a problematic workload might be 1 million, assuming that transactions are small, only 100 bytes in size, and it is allowable to have 100MB of unpurged InnoDB table rows.

The lag value is displayed as the history list length in the TRANSACTIONS section of InnoDB Monitor output. For example, if the output includes the following lines, the lag value is 20:

------------
TRANSACTIONS
------------
Trx id counter 0 290328385
Purge done for trx's n:o < 0 290315608 undo n:o < 0 17
History list length 20