Let's play with PostgreSQL network dumps!
A bit more than a year ago, I start messing with tshark pdml output to be able to extract queries from a network dump. As I was writing a PoC, I did it in PHP, before moving it quickly to a more appropriate language: Perl.
As it was relying on tshark, I called it « pgShark ». pgShark worked and was useful a couple of time, but had some drawbacks:
- really, really slow
- PHP is not the best language to parse and mess with data
- one useless step: pcap -> XML/PDML -> parsing
- I have no fun coding in PHP
- support only a limited part of the PostgreSQL protocol
So I decided to rewrite it from scratch using Perl and the Net::Pcap
module.
Why Perl? First, probably the most frequent reasons people are using Perl are: portable and fast about parsing data. Then, I discovered the useful Net::Pcap module, directly binded to the libpcap. And finally, I wanted to learn a bit more about Perl than just quick scripting and I needed a challenging project :)
At first I wanted a very simple program, dealing with frontend messages only and using a plugin architecture to process them and output useful and various informations. But then, I realized it could go way further, crazy ideas about plugins popped up and I realized that I had to support the whole PostgreSQL’s protocol, not just a small part of it.
The project started in early February 2011 and eat all my open source related personal and professional time until now. I finally managed to add the last missing PostgreSQL messages on Monday 28th march. I guess the pgShark::Core module is now stable enough to focus on plugins. I’ll have to split this project in two pieces at some point and release this core on its own.
One of the most exciting plugin is “Fouine”. Yeah, if you know pgFouine, you understand this plugin’s purpose (and if you don’t, here is the pgfouine project). Having Fouine as a plugin of pgShark has many advantages. One of the most important one is the ability to extract way much more statistics from network. Think about errors, notices, session time, busy ratio, amount of data per sessions, connections statistics, you name it…
Another really important point is that you don’t need to tweak your PostgreSQL
log behaviour. No need to log every query, no need to change the
log_line_header
parameter or using syslog if you don’t want to. You just
leave your PostgreSQL configuration the way you love it. Even better: working
on network dumps means you can snif anywhere in between the frontend and the
backend, having 0% performance penalty on your PostgreSQL box.
Check the existing plugin from the pgshark repository: https://github.com/dalibo/pgshark/tree/master/bin
The plugins are all in alpha development stage and need some more work for accuracy or bug fix, but they already spit some useful datas!
Here is my TODO list for plugins:
- fix SQL plugin in regards with named portals (prepared statements with binded datas)
- work on the TODO list for the Fouine plugin (includes accuracy of statistics)
- make graphical reports (HTML) from the Fouine plugin
- graph various stats from the Fouine plugin
I have another really useful plugin idea, but really challenging. Hopefully I’ll be able to start it and blog about it soon! However, I suspect I’ll have to take some of my time on phpPgAdmin very soon, as I was pretty much idle on it for a while now and stuff are staking on this TODO list as well.
Don’t forget the pgShark’s home: https://github.com/dalibo/pgshark. Stay tuned!
Comments Leave a comment by email or Disqus
Note: I only publish your name/pseudo, mail subject and content. I will NOT publish your email address.
Have you seen OmniTI’s “pgsniff”? It uses Perl + pcap to sniff the network of active db connections. I’ve played with it a couple of times recently and it looks pretty interesting.
https://labs.omniti.com/labs/pgtreats/wiki/pgsniff
@bma: yes, I heard about this project, but unfortunately after I started mine…
It look like an interesting project as well. However, as far as I can see, it doesn’t go as far as I want in its output.
However, I probably have some things to learn from this project about code and Perl-isms :)
I’ve played around with your tool and it works really well. One option that pgsniff has is that is allows you to specify a format where it prepends the log_line_prefix to the output, which is useful when you are trying to get timestamps or to feed the output file into pgfouine itself.
Nice work on this!
@bma: Thank you !
This is a nice idea to add a line prefix option. I’ll put this on the TODO list.
@bma: Check out the git repository and README, I just added the
--line-prefix
option to the SQL plugin.