I left commented some debug code to help diagnosing bad estimations.
**Update** Following the comment of Michael Banck, I
updated the query and added a warning on top of the query. The query does not
require to be superuser anymore, but you have to make sure the role you are
using is able to access all your precious tables and indexes!
Known bug
While testing the query, I found a weird bug where negative bloats show up on
some system indexes. As instance:
After hunting for some time on this, I found that this was related to the
name type. This type has a fixed size of 64 bytes in stats, but
they become a simple cstring in index with a variable length depending on real
string values!
As the query rely on theses stats to compute the estimated size of the index
depending on indexed fields and number of lines in the table, this difference
with the real value size in indexes make the stats completely wrong.
I’m not sure how I should handle this. Maybe by defining some incompatible
types with this query? Moreover, I am curious about why name is a
fixed length type…
As usual, any feedback, help or answers on these two last posts is appreciated :)
the bytes output should be converted into human readable format (mb,gb,etc)
The main purpose of this query is supervision, so I need raw values, but adding some more fields with human values is clearly useful.
internal postgresql tables shouldn’t be shown
I decided to keep all the objects found in the database, even system ones. They can be removed easily by filtering out on the schemaname column if needed
I know this query, it has been written by Josh Berkus based on my previous post. As I commented in his blog, I don’t want to break compatibility with older release of PostgreSQL by using CTEs, just for human readability. I need it to monitor old releases (sadly, like 8.2, but even a 7.4!).
I’ll update the query in my gist and this blog post with your comments.
I think ‘name’ being fixed length is a tradeoff of space vs. time in the system catalogs. It gives predictable row layouts in the critical catalogs, for initialization, recovery, etc.
Your query (still) requires superuser privileges due to using pg_statistic. This is not really great for monitoring queries, and can be easily avoided, as shown here (based on Josh Berkus’ version of your earlier query): https://gist.github.com/mbanck/9976015/revisions.
That’s true. However, note that using @pg_stats@, you must make sure you are using a supervision role that is always granted to read all tables. If supervision role can not read some of the tables, the query just ignore them.
Comments Leave a comment by email or Disqus
Note: I only publish your name/pseudo, mail subject and content. I will NOT publish your email address.
Interesting query, but there are several things that should be updated.
See this index bloat query: https://gist.github.com/gullevek/32881d6b4c5b1ed0135c (original here: https://gist.github.com/mbanck/9976015)
Hi Clemens,
The main purpose of this query is supervision, so I need raw values, but adding some more fields with human values is clearly useful.
I decided to keep all the objects found in the database, even system ones. They can be removed easily by filtering out on the schemaname column if needed
Agree with that, easy to add.
I know this query, it has been written by Josh Berkus based on my previous post. As I commented in his blog, I don’t want to break compatibility with older release of PostgreSQL by using CTEs, just for human readability. I need it to monitor old releases (sadly, like 8.2, but even a 7.4!).
I’ll update the query in my gist and this blog post with your comments.
Cheers,
Re: name being a fixed length type
See the bottom of this page: http://www.postgresql.org/docs/9.3/static/datatype-character.html
I think ‘name’ being fixed length is a tradeoff of space vs. time in the system catalogs. It gives predictable row layouts in the critical catalogs, for initialization, recovery, etc.
Hi Joe,
Thank you for this answer and doc URL! I would have save a bit of time by reading this part of the doc first ;)
Cheers,
Your query (still) requires superuser privileges due to using pg_statistic. This is not really great for monitoring queries, and can be easily avoided, as shown here (based on Josh Berkus’ version of your earlier query): https://gist.github.com/mbanck/9976015/revisions.
Hi Michael,
That’s true. However, note that using @pg_stats@, you must make sure you are using a supervision role that is always granted to read all tables. If supervision role can not read some of the tables, the query just ignore them.
I updated my query (here and on github) based on your diff and just added a comment about that. See: https://gist.github.com/ioguix/c29d5790b8b93bf81c27/revisions
Thanks,