ioguix' README

24 June 2014 6 comments

More work and thoughts on index bloat estimation query

A few weeks ago, I published a query to estimate index bloat. Since then, I went back on this query a few times to fix some drawbacks:

This last one is actually far from perfect. Very bad estimation could arise if the query is wrong about this size of pointers.

New version

Here is the gist of the new version of this query if you want to comment/patch:

And the code itself:

-- WARNING: executed with a non-superuser role, the query inspect only index on tables you are granted to read.
SELECT current_database(), nspname AS schemaname, c.relname AS tablename, indexname, bs*(sub.relpages)::bigint AS real_size,
  bs*otta::bigint as estimated_size,
  bs*(sub.relpages-otta)::bigint                                     AS bloat_size,
  bs*(sub.relpages-otta)::bigint * 100 / (bs*(sub.relpages)::bigint) AS bloat_ratio
  -- , index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, datawidth, sub.reltuples, sub.relpages -- (DEBUG INFO)
  SELECT bs, nspname, table_oid, indexname, relpages, coalesce(
      ceil((reltuples*(4+nulldatahdrwidth))/(bs-pagehdr::float)) + 1, 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
    ) AS otta
    -- , index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, datawidth, reltuples -- (DEBUG INFO)
  FROM (
    SELECT maxalign, bs, nspname, relname AS indexname, reltuples, relpages, relam, table_oid,
      ( index_tuple_hdr_bm +
          maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
            WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
            ELSE index_tuple_hdr_bm%maxalign
        + nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
            WHEN nulldatawidth = 0 THEN 0
            WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
            ELSE nulldatawidth::integer%maxalign
      )::numeric AS nulldatahdrwidth, pagehdr
      -- , index_tuple_hdr_bm, nulldatawidth, datawidth -- (DEBUG INFO)
    FROM (
        i.nspname, i.relname, i.reltuples, i.relpages, i.relam, a.attrelid AS table_oid,
        CASE cluster_version.v > 7
            WHEN true THEN current_setting('block_size')::numeric
            ELSE 8192::numeric
        END AS bs,
        CASE  -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
          WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
          ELSE 4
        END AS maxalign,
        /* per page header, fixed size: 20 for 7.X, 24 for others */
        CASE WHEN cluster_version.v > 7
          THEN 24
          ELSE 20
        END AS pagehdr,
        /* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
        CASE WHEN max(coalesce(s.null_frac,0)) = 0
          THEN 2 -- IndexTupleData size
          ELSE  2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
        END AS index_tuple_hdr_bm,
        /* data len: we remove null values save space using it fractionnal part from stats */
        sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024) ) AS nulldatawidth
        -- , sum( s.stawidth ) AS datawidth -- (DEBUG INFO)
      FROM pg_attribute AS a
        JOIN pg_stats AS s ON (quote_ident(s.schemaname) || '.' || quote_ident(s.tablename))::regclass=a.attrelid AND s.attname = a.attname
        JOIN (
          SELECT nspname, relname, reltuples, relpages, indrelid, relam,
            string_to_array(pg_catalog.textin(pg_catalog.int2vectorout(indkey)), ' ')::smallint[] AS attnum
          FROM pg_index
            JOIN pg_class ON pg_class.oid=pg_index.indexrelid
            JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
        ) AS i ON i.indrelid = a.attrelid AND a.attnum = ANY (i.attnum),
        ( SELECT substring(current_setting('server_version') FROM '#"[0-9]+#"%' FOR '#')::integer ) AS cluster_version(v)
      WHERE a.attnum > 0
      GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, cluster_version.v
    ) AS s1
  ) AS s2
    JOIN pg_am am ON s2.relam = am.oid WHERE am.amname = 'btree'
) as sub
JOIN pg_class c ON c.oid=sub.table_oid
WHERE sub.relpages > 2
ORDER BY 2,3,4;

I left commented some debug code to help diagnosing bad estimations.

**Update** Following the comment of Michael Banck, I updated the query and added a warning on top of the query. The query does not require to be superuser anymore, but you have to make sure the role you are using is able to access all your precious tables and indexes!

Known bug

While testing the query, I found a weird bug where negative bloats show up on some system indexes. As instance:

postgres@pagila=# \i btree_bloat.sql
-[ RECORD 4 ]----+----------------------------------------------------
current_database | pagila
schemaname       | pg_catalog
tablename        | pg_attribute
indexname        | pg_attribute_relid_attnam_index
real_size        | 122880
estimated_size   | 262144
bloat_size       | -139264
bloat_ratio      | -113.3333333333333333

After hunting for some time on this, I found that this was related to the name type. This type has a fixed size of 64 bytes in stats, but they become a simple cstring in index with a variable length depending on real string values!

postgres@pagila=# \d pg_attribute_relid_attnam_index
Index "pg_catalog.pg_attribute_relid_attnam_index"
  Column  |  Type   | Definition 
 attrelid | oid     | attrelid
 attname  | cstring | attname
unique, btree, for table "pg_catalog.pg_attribute"

postgres@pagila=# \d pg_catalog.pg_attribute
    Table "pg_catalog.pg_attribute"
    Column     |   Type    | Modifiers 
 attrelid      | oid       | not null
 attname       | name      | not null
    "pg_attribute_relid_attnam_index" UNIQUE, btree (attrelid, attname)
    "pg_attribute_relid_attnum_index" UNIQUE, btree (attrelid, attnum)

postgres@pagila=# SELECT pg_column_size(attname), avg(length(attname)) FROM pg_catalog.pg_attribute GROUP BY 1;
 pg_column_size |        avg         
             64 | 9.5131713992473486

As the query rely on theses stats to compute the estimated size of the index depending on indexed fields and number of lines in the table, this difference with the real value size in indexes make the stats completely wrong.

I’m not sure how I should handle this. Maybe by defining some incompatible types with this query? Moreover, I am curious about why name is a fixed length type…

As usual, any feedback, help or answers on these two last posts is appreciated :)

Comments Leave a comment by email or Disqus

Note: I only publish your name/pseudo, mail subject and content. I will NOT publish your email address.

Clemens Schwaighofer reply
Wed, 25 Jun 2014 04:52:08 +0200

Interesting query, but there are several things that should be updated.

  • the bytes output should be converted into human readable format (mb,gb,etc)
  • internal postgresql tables shouldn’t be shown
  • it should be sorted by most bloat first

See this index bloat query: (original here:

Jehan-Guillaume (ioguix) de Rorthais reply
Wed, 25 Jun 2014 18:35:00 +0200

Hi Clemens,

the bytes output should be converted into human readable format (mb,gb,etc)

The main purpose of this query is supervision, so I need raw values, but adding some more fields with human values is clearly useful.

internal postgresql tables shouldn’t be shown

I decided to keep all the objects found in the database, even system ones. They can be removed easily by filtering out on the schemaname column if needed

it should be sorted by most bloat first

Agree with that, easy to add.

See this index bloat query: (original here:

I know this query, it has been written by Josh Berkus based on my previous post. As I commented in his blog, I don’t want to break compatibility with older release of PostgreSQL by using CTEs, just for human readability. I need it to monitor old releases (sadly, like 8.2, but even a 7.4!).

I’ll update the query in my gist and this blog post with your comments.


Joe Abbate reply
Wed, 25 Jun 2014 14:04:55 +0200

Re: name being a fixed length type

See the bottom of this page:

I think ‘name’ being fixed length is a tradeoff of space vs. time in the system catalogs. It gives predictable row layouts in the critical catalogs, for initialization, recovery, etc.

Jehan-Guillaume (ioguix) de Rorthais reply
Wed, 25 Jun 2014 18:46:00 +0200

Hi Joe,

Thank you for this answer and doc URL! I would have save a bit of time by reading this part of the doc first ;)


Michael Banck reply
Tue, 01 Jul 2014 21:34:03 +0200

Your query (still) requires superuser privileges due to using pg_statistic. This is not really great for monitoring queries, and can be easily avoided, as shown here (based on Josh Berkus’ version of your earlier query):

Jehan-Guillaume (ioguix) de Rorthais reply
Wed, 02 Jul 2014 13:08:00 +0200

Hi Michael,

That’s true. However, note that using @pg_stats@, you must make sure you are using a supervision role that is always granted to read all tables. If supervision role can not read some of the tables, the query just ignore them.

I updated my query (here and on github) based on your diff and just added a comment about that. See: