Plotting a quick histogram in gnuplot using the raw data

You can use the script below to plot a quick and dirty histogram of your data. It contains three methods; only method 1 is uncommented. The only variables you need to set are bin_width and the name of the file containing the raw data in a single column (output.dat in the script below). Copy the script into a file called histo.plt in the same directory as your data, launch gnuplot, and run:

load 'histo.plt'
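
For reference, the data file is just plain text with one value per line. The numbers below are made-up placeholders, only to show the expected format of output.dat:

0.37
1.82
0.95
1.41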

Here is the script itself:

# Making histograms within gnuplot - 
# cunning trick using gnuplot's "smooth frequency"
#
# method 1:
#
# to make a histogram with vertical axis equal to the count
# in a bin:
#
bin_width = 1.0; ## edit this 
bin_number(x) = floor(x/bin_width)
rounded(x) = bin_width * ( bin_number(x) + 0.5 )
UNITY = 1
## column number of data to be histogrammed is here assumed to be 1
## - change $1 to another column if desired
plot 'output.dat' u (rounded($1)):(UNITY) t 'data' smooth frequency w histeps
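# ("smooth frequency" sums the y-values of all points that fall on the same x,
#  so plotting the value 1 at each bin centre gives the count of points per bin)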

pause -1 'ready'
#
# method 2:
#
# to make a histogram with *area* of a bin equal to the count
# in a bin, so the area under the curve is the number of data:
#
#bin_width = 0.3
#bin_number(x) = floor(x/bin_width)
#rounded(x) = bin_width * ( bin_number(x) + 0.5 )
#UNITY = 1
#plot 'output.dat' u (rounded($1)):(UNITY/bin_width) t 'data' smooth frequency w histeps

#pause -1 'ready'
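#
# method 2, density variant (a sketch; assumes gnuplot >= 4.6 for the 'stats'
# command and reuses bin_width, rounded() and UNITY from method 2 above):
# also divide by the total number of data points, so the area under the
# curve is 1 and the histogram estimates a probability density:
#
#stats 'output.dat' u 1 nooutput
#plot 'output.dat' u (rounded($1)):(UNITY/(bin_width*STATS_records)) t 'density' smooth frequency w histeps

#pause -1 'ready'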
#
# method 2, second example, which illustrates why you might sometimes
# prefer method 2 to method 1 (namely, to superpose two histograms of the
# same data made with different bin widths and have them line up):
#
#bin_width = 0.3
#bin_number(x) = floor(x/bin_width)
#rounded(x) = bin_width * ( bin_number(x) + 0.5 )
#bin_width2 = 1.0
#bin_number2(x) = floor(x/bin_width2)
#rounded2(x) = bin_width2 * ( bin_number2(x) + 0.5 )
#UNITY = 1
#plot 'output.dat' u (rounded($1)):(UNITY/bin_width) t 'width 0.3' smooth frequency w histeps,\
#  'output.dat' u (rounded2($1)):(UNITY/bin_width2) t 'width 1' smooth frequency w histeps lt 3

WARNING: this method is limited - gnuplot reads every raw data point into memory before binning, so you can potentially fill the memory of the machine you're working on very quickly; make sure you keep an eye on top while running it. I have tested it with 10,000,000 data points, and it was fine on clust. It also takes a bit of time to process and bin the data, but not too long :)
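
If your data set is too large to bin inside gnuplot, one workaround is to pre-bin it outside gnuplot so that only one line per bin is read. Here is a minimal sketch, assuming the values are non-negative (awk's int() truncates rather than floors) and that awk and sort are available; adjust w to your bin width:

plot "< awk -v w=1.0 '{h[int($1/w)]++} END{for(b in h) print (b+0.5)*w, h[b]}' output.dat | sort -n" u 1:2 t 'data' w histeps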