More work and thoughts on index bloat estimation query
A few weeks ago, I published a query to estimate index bloat. Since then, I went back on this query a few times to fix some drawbacks:
- making it compatible from PostgreSQL 7.4 to the latest releases
- restricting it to B-tree indexes only
- removing psql variables (sorry for code readability and documentation)
- improving 64- vs. 32-bit detection
This last one is actually far from perfect: a very bad estimation can arise if the query guesses the pointer size wrong.
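As a quick sanity check before trusting the results, you can run the detection heuristic on its own and compare it with what you know about your build; this is just the relevant CASE expression extracted from the query below:

-- Sanity check: which MAXALIGN (4 or 8) would the heuristic guess on this server?
SELECT version(),
       CASE
         WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
         ELSE 4
       END AS guessed_maxalign;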
New version
Here is the gist of the new version of this query if you want to comment/patch: https://gist.github.com/ioguix/c29d5790b8b93bf81c27
And the code itself:
-- WARNING: executed with a non-superuser role, the query inspects only indexes on tables you are granted to read.
SELECT current_database(), nspname AS schemaname, c.relname AS tablename, indexname, bs*(sub.relpages)::bigint AS real_size,
bs*otta::bigint as estimated_size,
bs*(sub.relpages-otta)::bigint AS bloat_size,
bs*(sub.relpages-otta)::bigint * 100 / (bs*(sub.relpages)::bigint) AS bloat_ratio
-- , index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, datawidth, sub.reltuples, sub.relpages -- (DEBUG INFO)
FROM (
SELECT bs, nspname, table_oid, indexname, relpages, coalesce(
ceil((reltuples*(4+nulldatahdrwidth))/(bs-pagehdr::float)) + 1, 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
) AS otta
-- , index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, datawidth, reltuples -- (DEBUG INFO)
FROM (
SELECT maxalign, bs, nspname, relname AS indexname, reltuples, relpages, relam, table_oid,
( index_tuple_hdr_bm +
maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
ELSE index_tuple_hdr_bm%maxalign
END
+ nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
WHEN nulldatawidth = 0 THEN 0
WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
ELSE nulldatawidth::integer%maxalign
END
)::numeric AS nulldatahdrwidth, pagehdr
-- , index_tuple_hdr_bm, nulldatawidth, datawidth -- (DEBUG INFO)
FROM (
SELECT
i.nspname, i.relname, i.reltuples, i.relpages, i.relam, a.attrelid AS table_oid,
CASE cluster_version.v > 7
WHEN true THEN current_setting('block_size')::numeric
ELSE 8192::numeric
END AS bs,
CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
ELSE 4
END AS maxalign,
/* per page header, fixed size: 20 for 7.X, 24 for others */
CASE WHEN cluster_version.v > 7
THEN 24
ELSE 20
END AS pagehdr,
/* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
CASE WHEN max(coalesce(s.null_frac,0)) = 0
THEN 2 -- IndexTupleData size
ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( (max num of index columns + 8 - 1) / 8 )
END AS index_tuple_hdr_bm,
/* data len: we account for the space saved by null values, using their fraction from the stats */
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024) ) AS nulldatawidth
-- , sum( s.stawidth ) AS datawidth -- (DEBUG INFO)
FROM pg_attribute AS a
JOIN pg_stats AS s ON (quote_ident(s.schemaname) || '.' || quote_ident(s.tablename))::regclass=a.attrelid AND s.attname = a.attname
JOIN (
SELECT nspname, relname, reltuples, relpages, indrelid, relam,
string_to_array(pg_catalog.textin(pg_catalog.int2vectorout(indkey)), ' ')::smallint[] AS attnum
FROM pg_index
JOIN pg_class ON pg_class.oid=pg_index.indexrelid
JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
) AS i ON i.indrelid = a.attrelid AND a.attnum = ANY (i.attnum),
( SELECT substring(current_setting('server_version') FROM '#"[0-9]+#"%' FOR '#')::integer ) AS cluster_version(v)
WHERE a.attnum > 0
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, cluster_version.v
) AS s1
) AS s2
JOIN pg_am am ON s2.relam = am.oid WHERE am.amname = 'btree'
) as sub
JOIN pg_class c ON c.oid=sub.table_oid
WHERE sub.relpages > 2
ORDER BY 2,3,4;
I left some debug code commented out to help diagnose bad estimations.
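For an interactive look at the results, it is convenient to wrap the query and keep only the worst offenders. A minimal sketch, assuming the query above has been saved as a view named btree_bloat (the name is just an illustration):

-- Assuming the full query above was wrapped in a view called "btree_bloat":
-- show the ten most bloated indexes, in a human-readable form.
SELECT schemaname, tablename, indexname,
       pg_size_pretty(bloat_size::bigint) AS bloat,
       round(bloat_ratio, 1) AS bloat_pct
FROM btree_bloat
WHERE bloat_size > 1024 * 1024  -- ignore anything under 1MB of bloat
ORDER BY bloat_size DESC
LIMIT 10;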
**Update**: Following Michael Banck's comment, I updated the query and added a warning at the top of it. The query does not require superuser privileges anymore, but you have to make sure the role you are using is able to access all your precious tables and indexes!
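If you use a dedicated role for monitoring, granting it read access is enough. A minimal sketch, assuming a role named monitoring and PostgreSQL 9.0 or later for the GRANT ... ON ALL TABLES syntax:

-- "monitoring" is a hypothetical role name; adapt the schema list to your setup.
GRANT USAGE ON SCHEMA public TO monitoring;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO monitoring;
-- also cover tables created later:
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO monitoring;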
Known bug
While testing the query, I found a weird bug where negative bloat shows up on some system indexes. For instance:
postgres@pagila=# \i btree_bloat.sql
...
-[ RECORD 4 ]----+----------------------------------------------------
current_database | pagila
schemaname | pg_catalog
tablename | pg_attribute
indexname | pg_attribute_relid_attnam_index
real_size | 122880
estimated_size | 262144
bloat_size | -139264
bloat_ratio | -113.3333333333333333
...
After hunting this for some time, I found that it was related to the name type. This type has a fixed size of 64 bytes in the stats, but it becomes a simple cstring in the index, with a variable length depending on the actual string values!
postgres@pagila=# \d pg_attribute_relid_attnam_index
Index "pg_catalog.pg_attribute_relid_attnam_index"
  Column  |  Type   | Definition
----------+---------+------------
 attrelid | oid     | attrelid
 attname  | cstring | attname
unique, btree, for table "pg_catalog.pg_attribute"
postgres@pagila=# \d pg_catalog.pg_attribute
Table "pg_catalog.pg_attribute"
    Column     |   Type    | Modifiers
---------------+-----------+-----------
 attrelid      | oid       | not null
 attname       | name      | not null
[...]
Indexes:
"pg_attribute_relid_attnam_index" UNIQUE, btree (attrelid, attname)
"pg_attribute_relid_attnum_index" UNIQUE, btree (attrelid, attnum)
postgres@pagila=# SELECT pg_column_size(attname), avg(length(attname)) FROM pg_catalog.pg_attribute GROUP BY 1;
 pg_column_size |        avg
----------------+--------------------
             64 | 9.5131713992473486
As the query relies on these stats to compute the estimated size of the index from the indexed fields and the number of rows in the table, this difference between the stats and the real size of the values in the index makes the estimation completely wrong.
I’m not sure how I should handle this. Maybe by declaring some types incompatible with this query? Moreover, I am curious about why name is a fixed-length type…
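As a first step in that direction, here is a sketch (not part of the published query) to spot the indexes whose keys include a name column, and whose estimation is therefore suspect. Note it uses the ANY(indkey) shortcut, so unlike the main query it will not run on very old releases:

-- Sketch only: list indexes with a "name" key column, where the
-- cstring-vs-name size mismatch described above skews the estimation.
SELECT n.nspname, ic.relname AS indexname
FROM pg_index i
JOIN pg_class ic ON ic.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = ic.relnamespace
JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY (i.indkey)
WHERE a.atttypid = 'name'::regtype;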
As usual, any feedback, help or answers on these last two posts are appreciated :)
Comments
Interesting query, but there are several things that should be updated.
See this index bloat query: https://gist.github.com/gullevek/32881d6b4c5b1ed0135c (original here: https://gist.github.com/mbanck/9976015)
Hi Clemens,
The main purpose of this query is monitoring, so I need raw values, but adding some more fields with human-readable values is clearly useful.
I decided to keep all the objects found in the database, even system ones. They can be removed easily by filtering on the schemaname column if needed.
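For instance, reusing the hypothetical btree_bloat view from the usage sketch above:

-- Hide system objects from the report:
SELECT *
FROM btree_bloat
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');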
Agree with that, easy to add.
I know this query; it has been written by Josh Berkus based on my previous post. As I commented on his blog, I don’t want to break compatibility with older releases of PostgreSQL by using CTEs just for human readability. I need it to monitor old releases (sadly, like 8.2, and even a 7.4!).
I’ll update the query in my gist and this blog post with your comments.
Cheers,
Re: name being a fixed length type
See the bottom of this page: http://www.postgresql.org/docs/9.3/static/datatype-character.html
I think ‘name’ being fixed length is a tradeoff of space vs. time in the system catalogs. It gives predictable row layouts in the critical catalogs, for initialization, recovery, etc.
Hi Joe,
Thank you for this answer and the doc URL! I would have saved a bit of time by reading this part of the doc first ;)
Cheers,
Your query (still) requires superuser privileges due to using pg_statistic. This is not really great for monitoring queries, and can be easily avoided, as shown here (based on Josh Berkus’ version of your earlier query): https://gist.github.com/mbanck/9976015/revisions.
Hi Michael,
That’s true. However, note that when using pg_stats, you must make sure the monitoring role you are using is granted read access to all your tables. If the role cannot read some of the tables, the query just ignores them.
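A quick way to spot what would be silently ignored is to list the ordinary tables the monitoring role cannot read; a minimal sketch using has_table_privilege():

-- Run as the monitoring role: tables it cannot read, whose indexes the
-- bloat query will therefore skip.
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND NOT has_table_privilege(c.oid, 'SELECT');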
I updated my query (here and on github) based on your diff and just added a comment about that. See: https://gist.github.com/ioguix/c29d5790b8b93bf81c27/revisions
Thanks,