Oracle-Developer.com

 

Navigation: Home  | Discussion Forums (Get expert advice)  |  Scripts  |  About Us  | Links  | Job Openings  

 

 

Oracle 11g Multi Column CBO Statistics

By V.J. Jain, January 2008 (varun.jain@varunjaininc.com)
Varun Jain, Inc. (an Oracle Partner)


 

Oracle 11g has introduced extended statistics that help the cost based optimizer (CBO) to make better decisions using statistics on groups of columns when multiple columns for a single table are specified in the where clause of the query.  In addition to the standard statistics, histograms can also be created on these column groups for improved cost estimates when a skew exists in the data distribution of the column group. 

 

In order for the CBO to make its execution decisions for a query, it uses the available statistics to calculate cost estimates for possible access paths.  If column histograms exist, the optimizer uses these histograms to compute the selectivity of predicate values.  This selectivity is a factor in estimating costs and consequently choosing the best access path.  Prior to 11g, statistics could only be created for individual columns.  The inability of the CBO to realize the relationships between multiple columns on a single table is a significant limitation on the accuracy of cost estimates. 

                    

Consider a table that lists employee ID’s, the years they have been employed, and their annual bonus in thousands.  In this company, the annual bonus of an employee is a calculated variable based on their years employed (years * $1000).  Traditional column histograms can provide the optimizer with information about the distribution of values in each of these columns.  However, querying the table by specifying both years employed and annual bonus, the CBO doesn’t have any means of realizing the correlation between these two columns.  If we wanted to find employees who have been employed for 1 year but have a 20,000 bonus, the CBO would not realize that these two predicates yield an empty result set.  Instead, the CBO would use statistics on each individual column possibly leading to an inefficient execution plan.  For example, if the majority of employees at this company had been employed for only one year, the optimizer would likely favor a full table scan over an index access.  If the optimizer knew that combining the two predicate values would return no records, the CBO could use that information to make better cost calculations. 

 

The new MultiColumn statistics feature addresses this limitation by allowing for the creation of column groups within a single table.  These column groups behave similar to a new column on the table.  Their statistics are gathered using the DBMS_STATS package.  Additionally, dictionary information for column groups is available from the same dictionary tables as regular columns.  Included in the column group statistics is the number of distinct values and histograms (also known as hybrid histograms).  As a result of these features, the optimizer can use the column group statistics to generate better cost estimates than previous Oracle releases by correlating statistics for multiple columns. 

 

 

To demonstrate this new feature, look at the following test case using ALL_OBJECTS.  Under an Oracle 11g base installation, certain queries on this table can result in inefficient execution plans when using multiple columns in the where clause.  With the base installation, the majority of objects are owned by PUBLIC and the second most common object type is JAVA CLASS.

 

 

SQL> select owner, count(*) from all_objects group by owner

  2  order by count(*) desc

  3  /

 

OWNER                            COUNT(*)

------------------------------ ----------

PUBLIC                              26418

SYS                                 23166

ORDSYS                               2099

XMLTEST                              1656

MDSYS                                 856

OLAPSYS                               239

XDB                                   215

FLOWS_020200                          147

WMSYS                                 117

CTXSYS                                 87

EXFSYS                                 80

 

SQL> select object_type, count(*) from all_objects group by object_type

  2  order by count(*) desc

  3  /

 

OBJECT_TYPE           COUNT(*)

------------------- ----------

SYNONYM                  26418

JAVA CLASS               21956

VIEW                      1793

TYPE                      1658

JAVA RESOURCE              818

TABLE                      612

INDEX                      487

PACKAGE                    428

LOB                        289

FUNCTION                   228

TRIGGER                    145

 

 

While the predicates for OWNER= ‘PUBLIC’ and OBJECT_TYPE= ‘JAVA CLASS’ each represent a large portion of the total rows (over 40%), there are no java classes owned by PUBLIC.  This provides a good situation for creating a demonstration.  Using a copy of this dictionary table shows how the optimizer uses multiple column statistics.  Start by creating a test table using the data in ALL_OBJECTS.  Create an index on owner and object_type and finally gather stats on the table.

 

SQL> create table allobjects_test1

  2  as

  3  select * from all_objects

  4  /

 

Table created.

 

SQL> create index TEST1_IDX1 on allobjects_test1 (owner, object_type);

 

Index created.

 

SQL> begin

  2   DBMS_STATS.GATHER_TABLE_STATS(NULL,'ALLOBJECTS_TEST1', method_opt => 'for all columns size skewonly');

  3  end;

  4  /

 

PL/SQL procedure successfully completed.

 

 

View the column statistics on the new test table…

 

SQL> select column_name, num_distinct, histogram from user_tab_col_statistics

  2  where

  3  table_name = 'ALLOBJECTS_TEST1'

  4  /

 

COLUMN_NAME                    NUM_DISTINCT HISTOGRAM

------------------------------ ------------ ---------------

OWNER                                    13 FREQUENCY

OBJECT_NAME                           31096 HEIGHT BALANCED

SUBOBJECT_NAME                            0 NONE

OBJECT_ID                             55095 NONE

DATA_OBJECT_ID                         1181 HEIGHT BALANCED

OBJECT_TYPE                              27 FREQUENCY

CREATED                                2983 HEIGHT BALANCED

LAST_DDL_TIME                          2619 HEIGHT BALANCED

TIMESTAMP                              3020 NONE

STATUS                                    1 FREQUENCY

TEMPORARY                                 2 FREQUENCY

GENERATED                                 2 FREQUENCY

SECONDARY                                 1 FREQUENCY

NAMESPACE                                11 FREQUENCY

EDITION_NAME                              1 FREQUENCY

 

15 rows selected.

 

Set autotrace on and run a query to find all java classes owned by public…

 

SQL> set autotrace on

SQL> select * from allobjects_test1 where owner = 'PUBLIC' and object_type = 'JAVA CLASS'

  2  /

 

no rows selected

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 1539201261

 

--------------------------------------------------------------------------------------

| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |                  | 10291 |  1025K|   233   (1)| 00:00:03 |

|*  1 |  TABLE ACCESS FULL| ALLOBJECTS_TEST1 | 10291 |  1025K|   233   (1)| 00:00:03 |

--------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   1 - filter("OBJECT_TYPE"='JAVA CLASS' AND "OWNER"='PUBLIC')

 

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

        832  consistent gets

          0  physical reads

          0  redo size

       1116  bytes sent via SQL*Net to client

        370  bytes received via SQL*Net from client

          1  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          0  rows processed

 

In this example, the CBO is choosing not to use the index TEST_IDX1 (OWNER, OBJECT_TYPE).  If you think about data distribution of the individual columns, this makes sense.  Since approximately 48% of the rows have OWNER=’PUBLIC’ and 40% of the rows have TYPE= ‘JAVA CLASS’, it is logical for the optimizer to prefer a full table scan when considering these columns independently.  To analyze the decisions made by the CBO that led to this execution plan, generate a 10053 trace file for this query which shows the following…

 

***************************************

SINGLE TABLE ACCESS PATH

  Single Table Cardinality Estimation for ALLOBJECTS_TEST1[ALLOBJECTS_TEST1]

  Column (#1):

    NewDensity:0.000092, OldDensity:0.000009 BktCnt:5410, PopBktCnt:5408, PopValCnt:11, NDV:13

  Column (#6):

    NewDensity:0.000092, OldDensity:0.000009 BktCnt:5410, PopBktCnt:5408, PopValCnt:18, NDV:27

  ColGroup (#0, Index) TEST1_IDX1

    Col#: 1 6    CorStregth: -1.00

  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:

  Table: ALLOBJECTS_TEST1  Alias: ALLOBJECTS_TEST1

    Card: Original: 55095.000000  Rounded: 10291  Computed: 10290.84  Non Adjusted: 10290.84

  Access Path: TableScan

    Cost:  233.14  Resp: 233.14  Degree: 0

      Cost_io: 232.00  Cost_cpu: 25496803

      Resp_io: 232.00  Resp_cpu: 25496803

  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:

  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:

  Access Path: index (AllEqRange)

    Index: TEST1_IDX1

    resc_io: 386.00  resc_cpu: 9233056

    ix_sel: 0.186784  ix_sel_with_filters: 0.186784

    Cost: 386.41  Resp: 386.41  Degree: 1

  Best:: AccessPath: TableScan

         Cost: 233.14  Degree: 1  Resp: 233.14  Card: 10290.84  Bytes: 0

***************************************

 

The optimizer trace file shows that the cost for using the TEST1_IDX1 has been calculated as higher than using a full table scan.  The trace file also shows that the statistics for the columns OWNER and OBJECT_TYPE have been included in the cost comparisons.  The index selectivity for TEST1_IDX1 is based on the ratio of distinct values in the index to total rows in the table (10291:55095).  This is a poor value for index selectivity and therefore results in a higher cost than using a full table scan. 

 

However, if the optimizer had statistics on these columns as a group, it would know that the specified predicates have a very low cardinality.  In order to collect these statistics, create a column group using DBMS_STATS.CREATE_EXTENDED_STATS

 

SQL> declare

  2    cg_name varchar2(30);

  3  begin

  4   cg_name := dbms_stats.create_extended_stats(NULL,'ALLOBJECTS_TEST1', '(owner,object_type)');

  5  end;

  6  /

 

PL/SQL procedure successfully completed.

 

Confirm the column group by using:

 

SQL> Select extension_name, extension

  2  from user_stat_extensions

  3  where table_name='ALLOBJECTS_TEST1';

 

EXTENSION_NAME                 EXTENSION

------------------------------ ---------------------------

SYS_STUXJ8K0YTS_5QD1O0PEA514IY ("OWNER","OBJECT_TYPE")

 

In order to allow the optimizer to begin using this new column group, gather the statistics for these columns using DBMS_STATS. 

 

SQL> begin

  2  dbms_stats.gather_table_stats(null,'ALLOBJECTS_TEST1',

  3  method_opt =>'for all columns size skewonly for columns (owner,object_type) skewonly');

  4  end;

  5  /

 

PL/SQL procedure successfully completed.

 

Now re-query the column statistics for the table to show that the newly created column group now appears with statistics and a histogram (also referred to as a hybrid histogram). 

 

SQL> select column_name, num_distinct, histogram from user_tab_col_statistics

  2  where

  3  table_name = 'ALLOBJECTS_TEST1'

  4  /

 

COLUMN_NAME                    NUM_DISTINCT HISTOGRAM

------------------------------ ------------ ---------------

OWNER                                    13 FREQUENCY

OBJECT_NAME                           31096 HEIGHT BALANCED

SUBOBJECT_NAME                            0 NONE

OBJECT_ID                             55095 NONE

DATA_OBJECT_ID                         1181 HEIGHT BALANCED

OBJECT_TYPE                              27 FREQUENCY

CREATED                                2983 HEIGHT BALANCED

LAST_DDL_TIME                          2619 HEIGHT BALANCED

TIMESTAMP                              3020 NONE

STATUS                                    1 FREQUENCY

TEMPORARY                                 2 FREQUENCY

GENERATED                                 2 FREQUENCY

SECONDARY                                 1 FREQUENCY

NAMESPACE                                11 FREQUENCY

EDITION_NAME                              1 FREQUENCY

SYS_STUXJ8K0YTS_5QD1O0PEA514IY           94 HEIGHT BALANCED

 

16 rows selected.

 

 

This new column group represents the combined values of OWNER and OBJECT_TYPE.  This query shows that a height balanced histogram has been created on this column group.  Also, the number of distinct values for the column group is listed as 94.  In other words, there are 94 distinct values when combining both columns in the table.  This can be verified by running…

 

SQL> select count(*)

  2  from

  3  (select distinct owner, object_type from allobjects_test1)

  4  /

 

  COUNT(*)

----------

        94

 

Now, executing the same query as before, the execution plan and I/O is greatly improved due to the additional statistics provided by the new column group.

 

SQL> set autotrace on

SQL> select * from allobjects_test1 where owner = 'PUBLIC' and object_type = 'JAVA CLASS'

  2  /

 

no rows selected

 

 

Execution Plan

----------------------------------------------------------

Plan hash value: 1510138874

 

------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |                  |    57 |  6498 |     3   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| ALLOBJECTS_TEST1 |    57 |  6498 |     3   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | TEST1_IDX1       |    57 |       |     1   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("OWNER"='PUBLIC' AND "OBJECT_TYPE"='JAVA CLASS')

 

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

          2  consistent gets

          0  physical reads

          0  redo size

       1116  bytes sent via SQL*Net to client

        370  bytes received via SQL*Net from client

          1  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          0  rows processed

 

 

 

 

Analysis of the new 10053 optimizer trace file shows that…

 

***************************************

SINGLE TABLE ACCESS PATH

  Single Table Cardinality Estimation for ALLOBJECTS_TEST1[ALLOBJECTS_TEST1]

  Column (#1):

    NewDensity:0.000271, OldDensity:0.000009 BktCnt:5537, PopBktCnt:5537, PopValCnt:12, NDV:13

  Column (#6):

    NewDensity:0.000090, OldDensity:0.000009 BktCnt:5537, PopBktCnt:5535, PopValCnt:17, NDV:27

  Column (#16):

    NewDensity:0.001037, OldDensity:0.006993 BktCnt:75, PopBktCnt:68, PopValCnt:4, NDV:94

  ColGroup (#0, VC) SYS_STUXJ8K0YTS_5QD1O0PEA514IY

    Col#: 1 6    CorStregth: -1.00

  ColGroup Usage:: PredCnt: 2  Matches Full:   Using density: 0.001037 of col #16 as selectivity of unpopular value pred

#0  Partial:  Sel: 0.0010

  Table: ALLOBJECTS_TEST1  Alias: ALLOBJECTS_TEST1

    Card: Original: 55095.000000  Rounded: 57  Computed: 57.14  Non Adjusted: 57.14

  Access Path: TableScan

    Cost:  233.06  Resp: 233.06  Degree: 0

      Cost_io: 232.00  Cost_cpu: 23708255

      Resp_io: 232.00  Resp_cpu: 23708255

  ColGroup Usage:: PredCnt: 2  Matches Full:   Using density: 0.001037 of col #16 as selectivity of unpopular value pred

#0  Partial:  Sel: 0.0010

  ColGroup Usage:: PredCnt: 2  Matches Full:   Using density: 0.001037 of col #16 as selectivity of unpopular value pred

#0  Partial:  Sel: 0.0010

  Access Path: index (AllEqRange)

    Index: TEST1_IDX1

    resc_io: 3.00  resc_cpu: 58754

    ix_sel: 0.001037  ix_sel_with_filters: 0.001037

    Cost: 3.00  Resp: 3.00  Degree: 1

  Best:: AccessPath: IndexRange

  Index: TEST1_IDX1

         Cost: 3.00  Degree: 1  Resp: 3.00  Card: 57.14  Bytes: 0

 

***************************************

 

The density is related to the probability of finding particular value in the rows of a table.  When the statistics are collected on the table, column density is calculated and stored in the column statistics.  If a column histogram is available when the optimizer creates estimates for different access paths, it uses the specified predicate values to recalculate density based on the information in the histogram. 

 

Since the predicates used in the query return no rows, the histogram on the column group has revealed that the predicates contain unpopular predicate values and therefore have a low density.  The optimizer dump file above shows that the values in the predicates have been calculated to produce a density of 0.001037 for the column group. 

 

This dump file also shows that the column group is automatically being used to adjust cost calculation for this query.  It does this by using the density of the column group as a new selectivity value for the index.  The selectivity represents a fraction of rows from a row set.  Therefore, index selectivity influences the number of blocks that will need to be read when scanning an index; the lower the selectivity, the fewer blocks that need to be read, resulting in a lower cost.  In the example above, the estimated cost of using the index has been greatly lowered since the index selectivity has been reduced from ix_sel: 0.186784 to ix_sel: 0.001037.  In other words, the old selectivity value of 0.186784 meant that the optimizer estimated 18.6784% of the total rows would be returned.  Likewise, the new selectivity value of 0.001037 means that 0.1037% of the rows will be returned.  This new selectivity value results in a much lower cost estimation for the access path using the index TEST1_IDX1.  Based on these cost estimates, the optimizer chose the plan using the index access path.  The use of column groups in the example above improved the I/O for this query by several orders of magnitude. 

 

 

By supplementing the statistics that are available to the CBO with column group statistics, the optimizer is able to make more intelligent decisions through understanding the correlation between multiple columns.  These column groups are used to determine more accurate selectivity estimates when multiple columns for a single table are specified in the where clause of the query.  Even when the distribution of data in the column group is skewed, the column group histograms allow for accurate cost estimates.  This enhancement addresses one of the primary limitations of the CBO in previous Oracle releases and greatly improves the capability of the CBO to accurately compute the costs and determine the best execution plans. 


About the author

V.J. Jain is an Oracle Database and Applications Consultant and owner of Varun Jain, Inc. - Oracle Consulting. With over 12 years of experience with database systems, he specializes in database performance, development, interfaces, and high performance solutions. Based in Orange County, California, he actively explores Oracle's newest technologies and is a member of the Oracle Partner Network and Beta program.

Read more about Oracle 11g

To read more about Oracle 11g, make sure you get Oracle 11g New Features Guide by John Garmany, Steve Karam, Lutz Hartmann, V. J. Jain, and Brian Carr http://www.oracle-developer.com/oracle_11g_new_features.html (March 2008)

 


 

 

 

Owned and Operated by Varun Jain, Inc, www.varunjaininc.com

Copyright ©2007 Oracle-Developer.com