[clug] Tuning Postgres for large amounts of incoming data
stephen.hocking at gmail.com
Wed Feb 1 02:03:12 UTC 2017
As some of you know, I've created a setup with a few raspberry pis around
the state that record plane positions, this is then logged in a Postgres DB
on a small VM, with about 1.5-2 million records being logged each day.
I've written some code that pulls down the ADS-B Exchange's daily archive
and logs them into my own DB. This is about 120M records per day, being
positions of aircraft all over the world. I'm doing this on a stand-alone
machine, a 6 core AMD box with 32GB of memory. Doing it with a single-threaded
stream takes about 2.5 days to log a full day's data, keeping one
core busy (roughly 60% Postgres and 40% my code); wait on I/O is about 2%.
Splitting the data up into 6 separate streams (one for each core) takes
about 14.5 hours to log a day's data. A day's data seems to take up about
30GB in the DB. There's about 20% wait I/O and 54% idle time, which leads
me to suspect I could tune various Postgres parameters.
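For anyone curious how the 6-way split works, it's essentially this shape
(a minimal sketch, with illustrative names; the real loader gives each
worker its own Postgres connection and COPYs its bucket in):

```python
# Sketch of partitioning a day's position records across N loader
# streams, one per core. Names here are hypothetical, not my real code.
from multiprocessing import Pool

N_STREAMS = 6

def partition(records, n):
    """Round-robin the records into n buckets, one per loader process."""
    buckets = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        buckets[i % n].append(rec)
    return buckets

def load_stream(bucket):
    # In the real loader this would open a connection and bulk-insert
    # the bucket; here it just reports how many rows it was handed.
    return len(bucket)

if __name__ == "__main__":
    records = range(1000)  # stand-in for a day's position reports
    buckets = partition(records, N_STREAMS)
    with Pool(N_STREAMS) as pool:
        counts = pool.map(load_stream, buckets)
    print(counts)
```

Round-robin keeps the buckets close to equal in size, so no one core
finishes long before the others.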
Has anyone else had experience with this? Googling turns up a bunch of
interesting pages, but some of the tweaks mentioned therein seem more
appropriate for workloads with intensive read queries, which are, as yet,
not on my horizon.
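For reference, the write-side knobs those pages keep mentioning are along
these lines (values here are untested starting points I'd guess at for a
32GB box, not recommendations):

```
# postgresql.conf sketch for a write-heavy bulk load
shared_buffers = 8GB
wal_buffers = 16MB
max_wal_size = 8GB                   # fewer, larger checkpoints
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9   # spread checkpoint writes out
synchronous_commit = off             # risks losing the last commits on
                                     # a crash, but not data corruption
maintenance_work_mem = 1GB           # helps if indexes are rebuilt after the load
```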
"I and the public know
what all schoolchildren learn
Those to whom evil is done
Do evil in return" W.H. Auden, "September 1, 1939"