Fityk 0.8.8 - User's Manual


Table of Contents

1. Introduction
What is the program for?
How to read this manual
GUI vs CLI
2. Getting started
The minimal example
Invoking fityk
Graphical interface
Plots and other windows
Mouse usage
3. Reference
General syntax
Data from experiment
Loading data
Active and inactive points
Standard deviation (or weight)
Data transformations
Functions and variables in data transformation
Working with multiple datasets
Exporting data
Model
Model - Introduction
Variables
Function types and functions
User-defined functions (UDF)
Speed of computations
Model, F and Z
Guessing peak location
Displaying information
Fitting
Nonlinear optimization
Fitting related commands
Levenberg-Marquardt
Nelder-Mead downhill simplex method
Genetic Algorithms
Settings
Other commands
plot: viewing data
info: show information
commands, dump, sleep, reset, quit, !
4. Using and extending
Use cases
Extensions
How to add your own built-in function
A. List of functions
B. Command shortenings
C. License
D. About this manual
Bibliography

List of Equations

A.1. Gaussian
A.2. SplitGaussian
A.3. GaussianA
A.4. Lorentzian
A.5. LorentzianA
A.6. Pearson VII (Pearson7)
A.7. Split-Pearson-VII (SplitPearson7)
A.8. Pearson-VII-Area (Pearson7A)
A.9. Pseudo-Voigt (PseudoVoigt)
A.10. Pseudo-Voigt-Area (PseudoVoigtA)
A.11. Voigt
A.12. VoigtA
A.13. Exponentially Modified Gaussian (EMG)
A.14. Doniach-Sunjic (DoniachSunjic)
A.15. Polynomial5

Chapter 1. Introduction

What is the program for?

Fityk is a program for nonlinear fitting of analytical functions (especially peak-shaped) to data (usually experimental data). The most concise description: peak fitting software. There are also people using it to remove the baseline from data, or to display data only.

It is reportedly used in crystallography, chromatography, photoluminescence and photoelectron spectroscopy, infrared and Raman spectroscopy, to name but a few. Although the author has a general understanding only of experimental methods other than powder diffraction, he would like to make it useful to as many people as possible.

Fityk offers various nonlinear fitting methods, simple background subtraction and other manipulations to the dataset, easy placement of peaks and changing of peak parameters, support for analysis of series of datasets, automation of common tasks with scripts, and much more. The main advantage of the program is flexibility - parameters of peaks can be arbitrarily bound to each other, e.g. the width of a peak can be an independent variable, the same as the width of another peak, or can be given by complex (and general for all peaks) formula.

Fityk is free software; you can redistribute and modify it under the terms of the GPL, version 2 or (at your option) any later version. See Appendix C, License for details. You can download the latest version of fityk from http://www.unipress.waw.pl/fityk or http://fityk.sf.net. To contact the author, visit the same page.

How to read this manual

After this introduction, you may read Chapter 2, Getting started. If you are using the GUI version, you can look at the screenshot-based tutorial (in preparation) and postpone reading Chapter 3, Reference until you need to write a script, put constraints on variables, add a user-defined function or understand better how the program works.

In case you are not familiar with the term weighted sum of squared residuals, or you are not sure how it is weighted, have a look at the section called “Nonlinear optimization”. Remember that you must set the standard deviations of the y values of points correctly; otherwise you will get wrong results.

GUI vs CLI

The program comes in two versions: the GUI (Graphical User Interface) version - more comfortable for most users, and the CLI (Command Line Interface) version (named cfityk to differentiate, Unix only).

If the CLI version was compiled with the GNU Readline Library, command-line editing and command history are available, as in bash. TAB completion is especially useful. Data and curves fitted to the data are visualized with gnuplot (if it is installed).

The GUI version is written using the wxWidgets library and can be run on Unix systems with GTK+ and on MS Windows. There are also people using it on Mac OS X (have a look at the fityk-users mailing list archives for details).

Chapter 2. Getting started

The minimal example

Let us analyze a diffraction pattern of NaCl. Our goal is to determine the position of the center of the highest peak. It is needed for calculating the pressure under which the sample was measured, but this later step in the analysis is irrelevant for the time being.

The data file used in this example is distributed with the program and can be found in the samples directory.

First, load the data from the file nacl01.dat. You can do this by typing @0 < nacl01.dat in the CLI version (or in the GUI version in the input box - at the bottom, just above the status bar). In the GUI, you can also select Data → Load File from the menu and choose the appropriate file.

If you use the GUI, you can zoom in to the biggest peak using the left mouse button on the auxiliary plot (the plot below the main plot). To zoom out, press the View whole toolbar button. Other ways of zooming are described in the section called “Mouse usage”. If you want the data to be drawn with larger points or a line, or if you want to change the color of the line or background, press the right mouse button on the main plot and use the Data point size or Color submenu of the pop-up menu. To change the color of data points, use the right-hand panel.

Now all data points are active. Because only the biggest peak is of interest for the sake of this example, the remaining points can be deactivated. Type: A = (23.0 < x < 26.0) or change to range mode (press the Data-Range Mode button on the toolbar) and select the range to be deactivated with the right mouse button.

As our example data has no background to worry about, our next step is to define a peak with reasonable initial values and fit it to the data. We will use a Gaussian. To see its formula, type: info Gaussian or look it up in the documentation (in Appendix A, List of functions). Incidentally, most of the commands can be abbreviated, e.g. you can type: i Gaussian.

To define a peak, type: %p = Gaussian(~60000, ~24.6, ~0.2); F = %p or %p = guess Gaussian or select Gaussian from the list of functions on the toolbar and press the auto-add toolbar button. There are also other ways to add a peak in the GUI, such as the add-peak mode. These mouse-driven methods give the function a name like %_1, %_2, etc.

Now let us fit the function. Type: fit or select Fit → Run from the menu (or press the toolbar button).

When fitting, the weighted sum of squared residuals (see the section called “Nonlinear optimization”) is minimized.

Note

The default weights of points are not equal.

To see the peak parameters, type: info+ %p or (in the GUI) move the cursor to the top of the peak and try out the context menu (right button), or use the right-hand panel.

That's it! To do the same a second time (for example with a similar dataset) you can write all the commands to a file (you can do it now using the command commands > filename), and use it as a script: commands < nacl01.fit or select Session → Execute script from the menu, or run the program with the name of the script: bash$ fityk nacl01.fit

Invoking fityk

On startup, the program executes a script from the $HOME/.fityk/init file (on MS Windows XP: C:\Documents and Settings\USERNAME\.fityk\init). Then the program executes the command passed with the --cmd option, if given, and processes the command line arguments:

  • if the argument starts with "=->", the string following "=->" is regarded as a command and executed (otherwise, it is regarded as a filename).

  • if the filename has the extension ".fit" or the file begins with a "# Fityk" string, it is assumed to be a script and is executed.

  • otherwise, it is assumed to be a data file and is loaded. It is possible to specify columns in the data file in this way: file.xy:1:4::. Multiple y columns can be specified (file.xy:1:3,4,5:: or file.xy:1:3..5::) - each y column will be loaded as a separate dataset, with the same values of x.

There are also other options for the CLI and GUI versions of the program. The option "-h" ("/h" on MS Windows) gives the full listing:

     wojdyr@ubu:~/fityk/src$ ./fityk -h
     Usage: fityk [-h] [-V] [-c <str>] [-I] [-r] [script or data file...]
      -h, --help            show this help message
      -V, --version         output version information and exit
      -c, --cmd=<str>       script passed in as string
      -g, --config=<str>    choose GUI configuration
      -I, --no-init         don't process $HOME/.fityk/init file
      -r, --reorder         reorder data (50.xy before 100.xy)
    

An example of non-interactive use of the CLI version on Linux:

    wojdyr@ubu:~/foo$ cfityk -h
    Usage: cfityk [-h] [-V] [-c <str>] [script or data file...]
      -h, --help            show this help message
      -V, --version         output version information and exit
      -c, --cmd=<str>       script passed in as string
      -I, --no-init         don't process $HOME/.fityk/init file
      -q, --quit            don't enter interactive shell
    wojdyr@ubu:~/foo$ ls *.rdf
    dat_a.rdf  dat_r.rdf  out.rdf
    wojdyr@ubu:~/foo$ cfityk -q -I "=-> set verbosity=quiet, autoplot=never" \
    > *.rdf "=-> i+ min(x if y > 0) in @*"
    in @0 dat_a: 1.8875
    in @1 dat_r: 1.5105
    in @2 out: 1.8305
    

Graphical interface

Plots and other windows

The GUI window of fityk consists of (from the top): the menu bar, toolbar, main plot, auxiliary plot, output window, input field, status bar, and a sidebar on the right-hand side. The input field allows you to type and execute commands in the same way as in the CLI version. The output window (which is configurable through a pop-up menu) shows the results. Incidentally, all GUI commands are converted into text and are visible in the output window, providing a simple way to learn the syntax.

The main plot can display data points, the model that is fitted to the data and the component functions of the model. Use the pop-up menu (click the right button on the plot) to configure it. Some properties of the plot (e.g. colors of data points) can be changed using the sidebar.

One of the most useful things which can be displayed by the auxiliary plot is the difference between the data and the model (also controlled by a pop-up menu). Hopefully, a quick look at this menu and a minute or two's worth of experiments will show the potential of this auxiliary plot.

The configuration of the GUI (visible windows, colors, etc.) can be saved using GUI → Save current config. Two different configurations can be saved, which allows easy switching of colors for printing. On Unix platforms, these configurations are stored in a file in the user's home directory. On Windows, they are stored in the registry (perhaps in the future they will also be stored in a file).

Mouse usage

The usage of the mouse on menu, dialog windows, input field and output window is (hopefully) intuitive, so the only remaining topic to be discussed here is how to effectively use the mouse on plots.

Let us start with the auxiliary plot. The right button displays a pop-up menu with a range of options, while the left allows you to select the range to be displayed on the x-axis. Clicking with the middle button (or with the left button while Shift is pressed) will zoom out to display all data.

On the main plot, the meaning of the left and right mouse buttons depends on the current mode (selected using either the toolbar or the menu). There are hints on the status bar. In normal mode, the left button is used for zooming and the right invokes the pop-up menu. The same behaviour can be obtained in any mode by pressing Ctrl (or Alt). The middle button can be used to select a rectangle to zoom in to. If an operation has two steps, such as rectangle zooming (i.e. first you press a button to select the first corner, then move the mouse and release the button to select the second corner of the rectangle), it can be cancelled by pressing any other button while the first one is still pressed.

Chapter 3. Reference

General syntax

Basically, there is one command per line. If for some reason it is more comfortable to place more than one command on one line, they can be separated with a semicolon (;).

Most of the commands can have arguments separated by a comma (,), e.g. delete %a, %b, %c.

Most of the commands can be shortened: e.g. you can type inf or in or i instead of info. See Appendix B, Command shortenings for details.

The symbol '#' starts a comment - everything from the hash (#) to the end of the line is ignored.

Data from experiment

Loading data

The basic file format is an ASCII text file with every line corresponding to one data point. If there are more than two columns of numbers, you can specify which columns correspond to x and y, and, optionally, sigma. Numbers in a line can be separated by whitespace, commas or semicolons. Lines that can't be read as numbers are ignored.

A modified version of the xylib library is used to read data files. New formats can be added easily.

Points are loaded from files using the command

dataslot < filename [:xcol:ycol:scol:block ] [filetype options...]

where dataslot should be replaced with @0, unless many datasets are used simultaneously (for details see the section called “Working with multiple datasets”). filetype and options can usually be omitted (in most cases the filetype can be detected automatically; all supported filetypes are listed at the end of this section). xcol, ycol and scol (the latter supported only for text files) are the columns corresponding to x, y and the std. dev. of y. A column number of 0 generates a number increasing (from zero) with each point. block is supported by formats with multiple blocks of data.

If the filename contains blank characters, a semicolon or comma, it should be put inside single quotation marks (together with colon-separated indices, if any).

Multiple y columns and/or blocks can be specified, see the examples below.

      @0 < foo.vms
      @0 < foo.fii text first-line-header
      @0 < foo.dat:1:4:: # x,y - 1st and 4th columns
      @0 < foo.dat:1:3,4:: # load two datasets (with y in columns 3,4)
      @0 < foo.dat:1:3..5:: # load three datasets (with y in columns 3,4,5)
      @0 < foo.dat:1:4..6,2:: # load four datasets (y: 4,5,6,2)
      @0 < foo.dat:1:2..:: # load the 2nd and all subsequent columns as y
      @0 < foo.dat:1:2:3: # read std. dev. of y from the 3rd column
      @0 < foo.dat:0:1:: # x - 0,1,2,..., y - 1st column
      @0 < foo.raw::::0,1 # load the first two blocks of data (as one dataset)
     

Supported filetypes

text

ASCII format. If the option first-line-header is given, the first line is read as the title.

dbws

Format used by DBWS (a program for Rietveld analysis) and DMPLOT.

cpi

Sietronics Sieray CPI format

uxd

Siemens/Bruker UXD format (powder diffraction data)

bruker_raw

Siemens/Bruker RAW format (versions 1, 2 and 3)

canberra_mca

Spectral data stored by Canberra MCA systems

rigaku_dat

Rigaku dat format (powder diffraction data)

vamas

VAMAS ISO-14976 (only the experiment modes "SEM", "MAPSV" and "MAPSVDP", and only the "REGULAR" scan mode, are supported)

philips_udf

Philips UDF (powder diffraction data)

philips_rd

Philips RD raw scan format V3 (powder diffraction data)

spe

Princeton Instruments WinSpec SPE format (only 1-D data is supported)

pdcif

CIF for powder diffraction

...

what else would you like to have here?

Information about loaded data can be obtained with: info data in dataslot

Active and inactive points

Often only a part of the data from a file is of interest, so it should be possible to exclude selected points from fitting and all computations. Every point is either active or inactive. This can be changed with the command A=... (see the section called “Data transformations” for details) or with a mouse click in the GUI. The idea of active and inactive points is simple: only the active ones are subject to fitting and peak-finding; the inactive ones are ignored.

Standard deviation (or weight)

When fitting data, we assume that only the y coordinate is subject to statistical errors in measurement. This is a common assumption. To see how the y standard deviation sigma influences fitting (optimization), look at the weighted sum of squared residuals formula in the section called “Nonlinear optimization”. We can also think in terms of weights of points - every point has an assigned weight, equal to w_i = 1/sigma_i^2.

The standard deviation of points can be read from the file together with the x and y coordinates. Otherwise, it is set either to max(sqrt(y), 1.0) or to 1, depending on the value of the data-default-sigma option. Setting the std. dev. to the square root of the value is common and has a theoretical basis when y is the number of counted independent events. You can always change the standard deviation, e.g. make it equal for every point with the command: S=1. See the section called “Data transformations” for details.
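
For example, the standard deviations can be set explicitly with data transformations (sqrt and max2 are described in the section called “Data transformations”):

      S = 1                 # equal weight for every point
      S = sqrt(max2(y, 1))  # Poisson-like errors, but never smaller than 1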

Note

You can not set data errors (standard deviations) as unknown.

Data transformations

Every data point has four properties: the x coordinate, the y coordinate, the standard deviation of y and the active/inactive flag. The lower case letters x, y, s, a stand for these properties before a transformation, and the upper case X, Y, S, A for the same properties after the transformation. M stands for the number of points. Data can be transformed using assignments. The command Y=-y will change the sign of the y coordinate of every point. You can also apply a transformation to selected points: Y[3]=1.2 will change the point with index 3 (which is the 4th point, because the first one has index 0), and Y[3..6]=1.2 will do the same for points with indices 3, 4, 5, but not 6. Y[2...]=1.2 will apply the transformation to points with index 2 and above. You can guess what Y[..6]=1.2 does. Most operations are executed sequentially for points from the first to the last one. n stands for the index of the currently transformed point. The sequence of commands: M=500; x=n/100; y=sin(x) will generate a sinusoid dataset with 500 points.

If you have more than one dataset, you have to specify explicitly which dataset transformation applies to. See the section called “Working with multiple datasets” for details.

Note

Points are kept sorted according to their x coordinate, so changing the x coordinate of points will also change their order and indices.

Expressions can contain real numbers in normal or scientific format (e.g. 1.23e5), the constant pi, binary operators: +, -, *, /, ^, one-argument functions: sqrt, exp, log10, ln, sin, cos, tan, sinh, cosh, tanh, atan, asin, acos, erf, erfc, gamma, lgamma (=ln(|gamma|)), abs, round (rounds to the nearest integer), two-argument functions: min2, max2 (e.g. max2(3,5) will give 5), randuniform(a, b) (random number from the interval (a, b)), randnormal(mu, sigma) (random number from a normal distribution), voigt(a, b) (see below) and the ternary ?: operator: condition ? expression1 : expression2, which evaluates expression1 if the condition is true and expression2 otherwise. Conditions can be built using boolean operators and comparisons: AND, OR, NOT, >, >=, <, <=, ==, != (or <>), TRUE, FALSE.

The voigt function above has the formula: K(x,y) = y/pi * integral from -infinity to +infinity of exp(-t^2) / (y^2 + (x-t)^2) dt.

The value of a data expression can be shown using the command info, see examples at the end of this section.

t[x=expression], where t is one of x, y, s, a, X, Y, S, A, gives a linear interpolation of t between two points (or the value of the first/last point if the given x is outside the current data range).

Note

All operations are performed on real numbers.

Two numbers that differ by less than epsilon, i.e. abs(a-b)<epsilon, are considered equal. Indices are also computed in the real number domain and then rounded to the nearest integer.

Transformations can be joined with a comma (,), e.g. X=y, Y=x swaps the axes.

Before and after executing transformations, points are always sorted according to their x coordinate. You can change the order of points using order=t, where t is one of x, y, s, a, -x, -y, -s, -a. Clearly, this only makes sense within a sequence of transformations (joined with commas), because the points are re-sorted by x when the transformation finishes.

Points can be deleted using the following syntax: delete[index-or-range] or delete(condition), and created simply by increasing the value of M.

There are two parametrized functions: spline and interpolate. The general syntax is: parametrizedfunc [param1, param2, ...](expression), e.g. spline[22.1, 37.9, 48.1, 17.2, 93.0, 20.7](x) gives the value at x of the cubic spline interpolation through the points (22.1, 37.9), (48.1, 17.2), (93.0, 20.7). The function interpolate is similar, but gives a polyline interpolation. The spline function is used for manual background subtraction in the GUI.

There are also aggregate functions: min, max, sum, avg, stddev, darea. They have two forms. In the simpler one: aggregatefunc(expression), the value of the expression in brackets is calculated for all points. min gives the smallest value, max the largest; sum, avg and stddev give the sum of all values, the arithmetic mean and the standard deviation, respectively. A true value in a data expression is represented numerically by 1, and false by 0, so sum can also be used to count points that fulfil a given criterion.

darea gives the sum of expressions calculated using the formula: t*(x[n+1]-x[n-1])/2, where t is the value of the expression in brackets. darea(y) gives the area under interpolated data points, and can be used to normalize the area.

The second form: aggregatefunc(expression if condition) takes into account only the points for which the condition is true.

A few examples:

     Y[1...] = Y[n-1] + y[n] # integrate

     x[...-1] = (x[n]+x[n+1])/2;  # reduces the number
     y[...-1] = y[n]+y[n+1];      # of points
     delete(n%2==1)               # by half

     delete(not a) # delete inactive points

     X = 4*pi * sin(x/2*pi/180) / 1.54051 # changes x scale (2theta -> Q)

     # make equal step, keep the number of points the same
     X = x[0] + n * (x[M-1]-x[0]) / (M-1),  Y = y[x=X], S = s[x=X], A = a[x=X]

     # take the first 2000 points, average them and subtract as background
     Y = y - avg(y if n<2000)

     # fityk can also be used as a simple calculator
     i 2+2 #4
     i sin(pi/4)+cos(pi/4) #1.41421
     i gamma(10) #362880

     # examples of aggregate functions
     i max(y) # the largest y value
     i sum(y>avg(y)) # the number of points which have y value greater than arithmetic mean
     Y = y / darea(y) # normalize data area
     i darea(y-F(x) if 20<x<25)
     

There is also another kind of transformation: dataset transformations, which operate on a whole dataset, not on single points. The syntax (for one dataset) is: @0 = dstransformation @0, where dstransformation can be one of:

sum_same_x

Merges points whose distance in x is smaller than epsilon. The x of the merged point is the average, and its y and sigma are the sums of the component values.

avg_same_x

The same as sum_same_x, but the y and sigma of the merged point are set to the average of the components.

shirley_bg

Calculates Shirley background (useful in X-ray photoelectron spectroscopy).

rm_shirley_bg

Calculates data with removed Shirley background.

Functions and variables in data transformation

The information in this section is not often needed in practice. Read it after the section called “Model”.

Variables ($foo) and functions (%bar) can be used in data transformations, and the current value of a data expression can be assigned to a variable. Values of function parameters (e.g. %fun.a0) and pseudo-parameters Center, Height, FWHM and Area (e.g. %fun.Area) can also be used. Pseudo-parameters are supported only by functions that know how to calculate these properties.

Some properties of functions can be calculated using functions numarea, findx and extremum.

numarea(%f, x1, x2, n) gives the area integrated numerically from x1 to x2 using the trapezoidal rule with n equal steps.

findx(%f, x1, x2, y) finds an x in the interval (x1, x2) such that %f(x)=y, using the bisection method combined with the Newton-Raphson method. It is required that %f(x1) < y < %f(x2).

extremum(%f, x1, x2) finds an x in the interval (x1, x2) such that %f'(x)=0, using the bisection method. It is required that %f'(x1) and %f'(x2) have different signs.

A few examples:

      $foo = {y[0]} # data expression can be used in variable assignment
      $foo2 = {y[0] in @0}  # dataset can be given if necessary
      Y = y / $foo  # and variables can be used in data transformation

      Y = y - %f(x) # subtracts function %f from data

      Y = y - @0.F(x) # subtracts all functions in F

      Z += Constant(~0)  # add a constant x-correction to the model (this can
      fit                # ...be caused by a shift in the scale of the
      X = x + @0.Z(x)    # ...instrument collecting data), fit it, remove it
      Z = 0              # ...from the dataset, and clear the x-correction

      info numarea(%fun, 0, 100, 10000) # shows area of function %fun
      info %fun.Area  # it is not always supported

      info %_1(extremum(%_1, 40, 50)) # shows extremum value

      # calculate FWHM numerically, value 50 can be tuned
      $c = {%f.Center}
      i findx(%f, $c, $c+50, %f.Height/2) - findx(%f, $c, $c-50, %f.Height/2)
      i %f.FWHM # should give almost the same.
     

Working with multiple datasets

Let us call a set of data that usually comes from one file a dataset. All operations described above assume only one dataset. If more datasets are created, it must be stated explicitly which dataset a command applies to, e.g. M=500 in @0. Datasets have numbers and are referenced by '@' followed by the number, e.g. @3. @* means all datasets (e.g. Y=y/10 in @*).

To load a dataset from a file, use one of the commands:

@n < filename [:xcol:ycol:scol:block ] [filetype options...]

@+ < filename [:xcol:ycol:scol:block ] [filetype options...]

The first one uses an existing data slot and the second one creates a new slot. Using @+ increases the number of datasets, and the command delete @n decreases it.

The syntax

@n = [dataset_transformation] @m [ + @k [ + ...]]

@+ = [dataset_transformation] @m [ + @k [ + ...]]

can be used to duplicate a dataset (@+ = @n), to create a new dataset as a sum of two or more existing ones (@+ = @n + @m + ...), to perform dataset transformations (@n = dataset_transformation @n), etc. A sum of datasets contains all points from all component datasets. If you want to merge points with the same x value, use one of the dataset transformations: @+ = sum_same_x @n + @m + ....
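
A few examples (the dataset numbers here are arbitrary):

      @+ = @0                  # duplicate @0
      @+ = @0 + @1             # new dataset with all points of @0 and @1
      @+ = sum_same_x @0 + @1  # as above, but merging points with equal x
      @0 = rm_shirley_bg @0    # replace @0 with Shirley-background-removed data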

Each dataset has a separate model that can be fitted to the data. This is explained in the section called “Model”.

Each dataset also has a title (which does not have to be unique). When a file is loaded, a title is created automatically, either from the filename or read from the file (depending on the file format). Titles can be changed using the command set @n.title = new-title. To see the current title of a dataset, use info title in @n.
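
For example (the title used here is arbitrary):

      set @0.title = nacl01  # rename the dataset
      info title in @0       # show the current title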

It is possible to show values of a data expression calculated for each dataset. Example: i+ avg(y) in @*.

Exporting data

Command

info dataslot (expression, ...) > filename

can export data to an ASCII TSV (tab-separated values) file. To export data in a 3-column format (x, y and standard deviation), use info @n (x, y, s) > file. If a is not listed in the list of columns, as in this example, only the active points are exported.

All expressions that can be used on the right-hand side of data transformations can also be used in the column list. Additionally, F and Z can be used with a dataset prefix, e.g. info @0 (n+1, x, y, F(x), y-F(x), Z(x), %foo(x), a, sin(pi*x)+y^2) > bar.tsv.

Model

Model - Introduction

The model S (the function that is fitted to the data) is computed as a sum of component functions, such as Gaussians or polynomials. To avoid confusion we will always use the name model when referring to the total function fitted to data; the name function will be used only when referring to a component function. So S = sum_i f_i, where each f_i is a function of x that depends on a vector of parameters a. This vector contains all fitted parameters. Because the error in the x coordinate of data points can often be modeled with a function z(x; a), we introduce this term into the model:

S(x; a) = sum_i f_i(x + z(x; a); a)

where z(x; a) = sum_j z_j(x; a). Note that the same x-correction z(x) is used in all functions f_i.

Now let us take a closer look at the functions f_i. Every function f_i has a type chosen from the function types available in the program. The same is true of the functions z_j. One of these types is the Gaussian, which has the following formula:

height * exp(-ln(2) * ((x-center)/hwhm)^2)

A Gaussian has three parameters: height, center and hwhm. These parameters do not depend on x. One variable must be bound to each parameter.

Variables

Variables in Fityk have names prefixed with the dollar symbol ($). A variable is created by assigning a value to it, e.g. $foo=~5.3 or $c=3.1 or $bar=5*sin($foo). Here $foo is a so-called simple variable; it is created by assigning to it a real number prefixed with ~. The '~' means that the value assigned to the variable can be changed when fitting the model to the data. For people familiar with optimization techniques: the number of defined simple variables is the number of dimensions of the space in which we are looking for the optimum. In the above example, the variable $c is actually a constant. $bar depends on the value of $foo: when $foo changes, the value of $bar also changes. Compound variables can be built using the operators +, -, *, /, ^ and the functions sqrt, exp, log10, ln, sin, cos, tan, sinh, cosh, tanh, atan, asin, acos, erf, erfc, lgamma, abs, voigt. This is a subset of the functions used in data transformations.

Every simple variable has a value and, optionally, a domain. The domain is used only by the fitting algorithms which need to randomly initialize or change variables; Genetic Algorithms are a good example.

Variables can be used in data transformations, e.g. Y=y/$a.

The value of the data expression can be used in the variable definition, but it must be inside braces, e.g. $bleh={M} or, to create a simple variable: $bleh=~{M}.

Sometimes it is useful to freeze a variable, i.e. to prevent it from changing while fitting. There is no special syntax for it, but it can be done using data expressions in this way:

      $a = ~12.3 # $a is fittable
      $a = {$a}  # $a is not fittable
      $a = ~{$a}  # $a is fittable again
     

It is also possible to define a variable as e.g. $bleh=~9.1*exp(~2). In this case two simple variables (with values 9.1 and 2) will be created automatically. Automatically created variables are named $_1, $_2, $_3, and so on.

Variables can be deleted using the command delete $variable.

Some fitting algorithms need to randomize the parameters of the fitted function (i.e. simple variables). For this purpose, the simple variable can have a specified domain. Note that the domain does not imply any constraints on the value the variable can have -- it is only a hint for fitting methods such as the Nelder-Mead simplex or Genetic Algorithms. Further information on how the domain is used in these methods is contained in the appropriate fitting description. The syntax is as follows:

      $a = ~12.3 [11 +- 5] # the center and width of the domain are given

      $b = ~12.3 [ +- 5] # if the center of the domain is not specified,
                         # current value of the variable is used
     

If the domain is not specified, the value of the variable-domain-percent option is used (the domain is then the value of the variable +/- variable-value * option-value / 100).

Function types and functions

Let us go back to functions. Function types have names that start with an upper-case letter, e.g. Linear or Voigt. Functions (i.e. function instances) have names prefixed with a percent symbol, e.g. %func. Every function has a type and variables bound to its parameters.

To see a list of available function types, use the command info types. To see the names of the parameters, their default values and the formula of a given type, use info typename, e.g. info Pearson7.

Functions can be created by giving the type and the correct number of comma-separated variables in brackets, e.g. %f = Gaussian(~66254., ~24.7, ~0.264) or %f = Gaussian(~6e4, $ctr, $b+$c). Every expression that is valid on the right-hand side of a variable assignment can also be used as an argument; if it is not simply a name of a variable, an automatic variable is created. In the last example two variables are created (one with the value 60000 and one equal to $b+$c).

The second way is to give named parameters of the function, in any order, e.g. %f = Gaussian(height=~66254., hwhm=~0.264, center=~24.7). Function types can have default values specified for some parameters, so this assignment is also valid: %f = Pearson7(height=~66254., center=~24.7, fwhm=~0.264), although the shape parameter of Pearson7 is not given.

A deep copy of a function (i.e. all variables that it depends on are also copied) can be made using the command %function = copy(%anotherfunction).

Functions can be also created with the command guess, as described in the section called “Guessing peak location ”.

You can change a variable bound to any of the function parameters in this manner:

      =-> %f = Pearson7(height=~66254., center=~24.7, fwhm=~0.264)
      New function %f was created.
      =-> %f.center=~24.8
      =-> $h = ~66254
      =-> %f.height=$h
      =-> info %f
      %f = Pearson7($h, $_5, $_3, $_4)
      =-> $h = ~60000 # variables are kept by name, so this also changes %f
      =-> %p1.center = %p2.center + 3 # keep fixed distance between %p1 and %p2
     

Functions can be deleted using the command delete %function.

User-defined functions (UDF)

User-defined function types can be created using the command define, and then used in the same way as built-in functions. The name of the new type must start with an upper-case letter, contain only letters and digits, have at least two characters and must not be the same as the name of a built-in function. Defined functions can be undefined using the command undefine.

The name of a UDF should be followed by parameters in brackets (see the examples). Names of parameters should contain only lower-case alphanumeric characters and the underscore (_), and start with a lowercase letter. The name "x" is reserved; do not put it into the parameter list, just use it on the right-hand side of the definition.

Each parameter can have a specified default value. To allow adding a peak with the command guess, the default value is given as an expression that can be calculated from the guessed "height", "center", "fwhm" and "area". If the parameter name itself is one of "height", "center", "fwhm", "area" or "hwhm", the default value is deduced (in the case of "hwhm" it is "fwhm/2").

UDFs can be defined either by giving a full formula, or as a sum of already defined functions, with possible re-parametrization (see GaussianArea and GLSum below for examples of the latter). When a full formula is given, the right-hand side of the equality sign is similar to the definition of a variable, except that the formula can also depend on x. Hopefully the examples below will make the syntax clear.

How it works (you can skip this paragraph): the formula is parsed, derivatives of the formula are calculated symbolically, all expressions are simplified (but there is a lot of room for optimization here) and bytecode is created for a kind of virtual machine; when fitting, the VM calculates the value of the function and its derivatives for every point. Common Subexpression Elimination is not implemented yet; I suppose it would noticeably speed up UDFs.

Hint: use the init file for often-used definitions. See the section called “Invoking fityk ” for details.

Examples:

  # first how some built-in functions could be defined
  define MyGaussian(height, center, hwhm) = height*exp(-ln(2)*((x-center)/hwhm)^2)
  define MyLorentzian(height, center, hwhm) = height/(1+((x-center)/hwhm)^2)
  define MyCubic(a0=height,a1=0, a2=0, a3=0) = a0 + a1*x + a2*x^2 + a3*x^3

  # supersonic beam arrival time distribution
  define SuBeArTiDi(c, s, v0, dv) = c*(s/x)^3*exp(-(((s/x)-v0)/dv)^2)/x


  # area-based Gaussian can be defined as modification of built-in Gaussian
  # (it is the same as built-in GaussianA function)
  define GaussianArea(area, center, hwhm) = Gaussian(area/hwhm/sqrt(pi/ln(2)), center, hwhm)

  # sum of Gaussian and Lorentzian, a.k.a PseudoVoigt (should be in one line)
  define GLSum(height, center, hwhm, shape) = Gaussian(height*(1-shape), center, hwhm)
  + Lorentzian(height*shape, center, hwhm)

  # to change definition of UDF, first undefine previous definition
  undefine GaussianArea
     

Speed of computations

With the default settings, the value of every function is calculated at every point. Functions such as Gaussian often have non-negligible values only in a small fraction of all points. To speed up the calculation, set the option cut-function-level to a non-zero value. For each function, the range with values greater than cut-function-level will be estimated, and all values outside of this range are considered to be equal to zero. Note that not all functions support this optimization.
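
For example (the level used here is arbitrary):

      set cut-function-level = 0.001  # neglect function tails below this level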

If you have a number of datasets loaded, and the functions in different datasets do not share parameters, it is faster to fit the datasets sequentially (fit in @0; fit in @1; ...) than in parallel (fit in @*).

Each defined simple variable slows down the fitting, although the effect is often negligible.

Model, F and Z

As already discussed, each dataset has a separate model that can be fitted to the data. As can be seen from the formula above, the model is defined as a set of functions f_i and a set of functions z_j. These sets are named F and Z respectively. The model is constructed by specifying the names of the functions in these two sets.

In many cases the x-correction Z can safely be ignored; the fitted curve is then simply the sum of all functions in F.

Command F += %function adds %function to F, command Z += %function adds %function to Z. To remove %function from F (or Z) either do F -= %function or delete %function (del %function). If there is more than one dataset, F and Z must be prefixed with the dataset number (e.g. @1.F += %function ). The following syntax is also valid:

  # create a function and add it to F
  %g = Gaussian(height=~66254., hwhm=~0.264, center=~24.7)
  @0.F += %g
  # create an automatically named function and add it to F
  @0.F += Gaussian(height=~66254., hwhm=~0.264, center=~24.7)
  # clear F
  @0.F = 0
  # clear F and put three functions in it
  @0.F = %a + %b + %c
  # show info about the first and the last function in @0.F
  info @0.F[0], @0.F[-1]
  # the same as %bcp = copy(%b)
  %bcp = copy(@0.F[1])
  # make @1.F the exact (shallow) copy of @0.F
  @1.F = @0.F
  # make @1.F a deep copy of @0.F (all functions and variables
  # are duplicated).
  @1.F = copy(@0.F) 

The model can be exported as data points, using the syntax described in the section called “Exporting data”, or as mathematical formulae, using the command info formula in @n > filename. Some primitive simplifications are applied to the formula; to prevent them, put a plus sign (+) after "info". The style of the formula output, governed by the formula-export-style option, can be either "normal" (exp(-x^2)) or "gnuplot" (exp(-x**2)).
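
For example (the filename is arbitrary; the with prefix is described in the section called “Settings”):

      info formula in @0 > formula.txt  # simplified formula of the model in @0
      info+ formula in @0               # the same, without simplifications
      with formula-export-style = gnuplot info formula in @0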

Peak parameters can be exported using the command info peaks in @n > filename. Put a plus sign (+) after "info" to also export symmetric errors of the parameters. "@*" will export the formulae or parameters of all datasets to the same file.

It is often required to keep the width or shape of peaks constant for all peaks in the dataset. To change the variables bound to parameters with a given name for all functions in F, use the command: F.param=variable . Examples:

  F.hwhm=$foo # hwhm's of all functions in F that have parameter hwhm will be
              # equal to $foo. (hwhm here means half-width-at-half-maximum)
  F.shape=%_1.shape  # variable bound to shape of peak %_1 is bound
                     # also to shapes of all functions in F
  F.hwhm=~0.2  # For every function in F a new variable is created and bound
               # to parameter hwhm. All parameters are independent. 

Guessing peak location

It is possible to guess a peak location and add it to F with the command: %name = guess PeakType [x1:x2] in @n, e.g. guess Gaussian [22.1:30.5] in @0. If the range is omitted, the whole dataset will be searched. The name of the function is optional. Some parameters can be specified using the syntax parameter=variable, e.g. guess PseudoVoigt [22.1:30.5] center=$ctr, shape=~0.3 in @0.

As an exception, if the range is omitted and the parameter center is given, the peak is searched around the center, +/- value of the option guess-at-center-pm.

Fityk offers only a primitive algorithm for peak detection: it looks for the highest point in the given range, and then tries to find the width of the peak.

If the highest point is found near the boundary of the given range, it is very likely that it is not the top of the peak, and, if the option can-cancel-guess is set to true, the guess is cancelled.

There are two real-number options related to guess: height-correction and width-correction. Both default to 1. The guessed height and width are multiplied by the values of these options, respectively.
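
For example, if the automatic guesses are consistently too wide:

      set width-correction = 0.5  # multiply guessed widths by 0.5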

Displaying information

If you are using the GUI, most of the available information can be displayed with mouse clicks. Alternatively, you can use the info command. Using info+ instead of info sometimes displays more verbose information.

Below is a list of info arguments related to this chapter. The full list is in the section called “info: show information”.

info guess range

shows where the guess command would find a peak.

info functions

lists all defined functions

info variables

lists all defined variables

info @n.F

shows information about F

info @n.Z

shows information about Z

info formula in @n

shows the mathematical formulae of the fitted functions.

info @n.dF(x)

compares the symbolic and numerical derivatives at x (useful for debugging).

Fitting

Nonlinear optimization

This is the core. We have a set of observations (data points), to which we want to fit a model that depends on adjustable parameters. Let me quote Numerical Recipes, chapter 15.0, page 656 (if you do not know the book, visit http://www.nr.com):

The basic approach in all cases is usually the same: You choose or design a figure-of-merit function (merit function, for short) that measures the agreement between the data and the model with a particular choice of parameters. The merit function is conventionally arranged so that small values represent close agreement. The parameters of the model are then adjusted to achieve a minimum in the merit function, yielding best-fit parameters. The adjustment process is thus a problem in minimization in many dimensions. [...] however, there exist special, more efficient, methods that are specific to modeling, and we will discuss these in this chapter. There are important issues that go beyond the mere finding of best-fit parameters. Data are generally not exact. They are subject to measurement errors (called noise in the context of signal-processing). Thus, typical data never exactly fit the model that is being used, even when that model is correct. We need the means to assess whether or not the model is appropriate, that is, we need to test the goodness-of-fit against some useful statistical standard. We usually also need to know the accuracy with which parameters are determined by the data set. In other words, we need to know the likely errors of the best-fit parameters. Finally, it is not uncommon in fitting data to discover that the merit function is not unimodal, with a single minimum. In some cases, we may be interested in global rather than local questions. Not, "how good is this fit?" but rather, "how sure am I that there is not a very much better fit in some corner of parameter space?"

Our function of merit is WSSR - the weighted sum of squared residuals, also called chi-square:

chi^2 = sum(i=1..N) [(y_i - y(x_i; a)) / sigma_i]^2 = sum(i=1..N) w_i [y_i - y(x_i; a)]^2

Weights are based on standard deviations: w_i = 1/sigma_i^2. You can learn why squares of residuals are minimized e.g. from chapter 15.1 of Numerical Recipes. So we are looking for the global minimum of chi^2. This field of numerical research (looking for a minimum or maximum) is usually called optimization; here it is non-linear and global optimization. Fityk implements three very different optimization methods. All are well-known and described in many standard textbooks.

The standard deviations of the best-fit parameters are given by the square roots of the corresponding diagonal elements of the covariance matrix. The covariance matrix is based on the standard deviations of the data points. Formulae can be found e.g. in the GSL Manual, chapter "Linear regression. Overview" (weighted data version).

Note

Some programs scale errors with the square root of the reduced chi^2, i.e. with sqrt(WSSR/DoF), where DoF is the number of degrees of freedom (the number of active data points minus the number of parameters). Fityk does not do this.

Fitting related commands

To fit model to data, use command

fit [+] [number-of-iterations] [in @n, ...]

The plus sign (+) prevents initialization of the fitting method. It is used to continue the previous fitting where it left off. All non-linear fitting methods are iterative. number-of-iterations is the maximum number of iterations. There are also other stopping criteria, so that the number of executed iterations can be smaller.

fit [...] in @* fits all datasets simultaneously.

Fitting methods can be set using the set command: set fitting-method = method, where method is one of: Levenberg-Marquardt, Nelder-Mead-simplex, Genetic-Algorithms.

All non-linear fitting methods are iterative, and there are two common stopping criteria. The first is the number of iterations, which can be specified after the fit command. The second is the number of evaluations of the objective function (WSSR), specified by the value of the option max-wssr-evaluations (0 = unlimited). The number of evaluations is approximately proportional to the computation time, because most of the time in the fitting process is spent evaluating WSSR. There are also other criteria, different for each method.

If the number of iterations n given to the fit command is too small, so that fitting stops before convergence, it makes sense to use the fit+ command to process further iterations. [TODO: how to stop fit interactively]
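
A few examples:

      fit         # fit with the default settings
      fit 100     # run at most 100 iterations
      fit+ 100    # continue the previous fitting where it left off
      fit in @*   # fit all datasets simultaneously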

The setting set autoplot = on-fit-iteration will draw a plot after every iteration, to visualize the progress (see autoplot).

Information about the goodness-of-fit can be displayed using info fit. To see symmetric errors, use info errors; info+ errors additionally shows the variance-covariance matrix.

The available methods can be mixed together: e.g. it is sensible to obtain initial parameter estimates using the simplex method, and then refine the fit using Levenberg-Marquardt.

The values of all parameters are stored before and after fitting (if they have changed). This enables simple undo/redo functionality. If in the meantime some functions or variables were added or removed, the program can still load the old parameters, but the result can be unexpected. The following history-related commands are provided:

fit undo

move back to the previous parameters (undo fitting).

fit redo

move forward in the parameter history

info fit-history

show number of items in the history

fit history n

load the n-th set of parameters from history

fit history clear

clear the history

Levenberg-Marquardt

This is a standard nonlinear least-squares routine, and involves computing the first derivatives of the functions. For a description of the L-M method see Numerical Recipes, chapter 15.5 or Siegmund Brandt, Data Analysis, chapter 10.15. Essentially, it combines an inverse-Hessian method with a steepest-descent method by introducing a lambda factor. When lambda is equal to 0, the method is equivalent to the inverse-Hessian method. When lambda increases, the shift vector (the vector that is added to the parameter vector) is rotated toward the direction of steepest descent and its length decreases. If a better fit is found in an iteration, lambda is decreased - it is divided by the value of the lm-lambda-down-factor option (default: 10). Otherwise, lambda is multiplied by the value of lm-lambda-up-factor (default: 10). The initial lambda value is equal to lm-lambda-start (default: 0.0001).

The Marquardt method has two stopping criteria in addition to the common ones. If the relative change of the objective function (WSSR) is, twice in a row, smaller than the value of the lm-stop-rel-change option, the fit is considered converged and is stopped. Additionally, if lambda becomes greater than the value of the lm-max-lambda option (default: 10^15) - which usually happens when, due to limited numerical precision, WSSR is no longer changing - the fitting is also stopped.
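
For example, the method can be tuned like this (the values here are arbitrary):

      set lm-lambda-start = 0.01      # initial lambda
      set lm-lambda-up-factor = 20    # multiply lambda by 20 after a failed step
      set lm-stop-rel-change = 1e-7   # stop on a small relative change of WSSR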

Nelder-Mead downhill simplex method

To quote chapter 4.8.3, p. 86 of Peter Gans, Data Fitting in the Chemical Sciences by the Method of Least Squares:

A simplex is a geometrical entity that has n+1 vertices corresponding to variations in n parameters. For two parameters the simplex is a triangle, for three parameters the simplex is a tetrahedron and so forth. The value of the objective function is calculated at each of the vertices. An iteration consists of the following process. Locate the vertex with the highest value of the objective function and replace this vertex by one lying on the line between it and the centroid of the other vertices. Four possible replacements can be considered, which I call contraction, short reflection, reflection and expansion.[...]

It starts with an arbitrary simplex. Neither the shape nor position of this are critically important, except insofar as it may determine which one of a set of multiple minima will be reached. The simplex then expands and contracts as required in order to locate a valley if one exists. Then the size and shape of the simplex is adjusted so that progress may be made towards the minimum. Note particularly that if a pair of parameters are highly correlated, both will be simultaneously adjusted in about the correct proportion, as the shape of the simplex is adapted to the local contours.[...]

Unfortunately it does not provide estimates of the parameter errors, etc. It is therefore to be recommended as a method for obtaining initial parameter estimates that can be used in the standard least squares method.

This method is also described in previously mentioned Numerical Recipes (chapter 10.4) and Data Analysis (chapter 10.8).

There are a few options for tuning this method. One of these is the stopping criterion nm-convergence. If the value of the expression 2(M-m)/(M+m), where M and m are the values of the worst and best vertices respectively (the values of the objective function at the vertices, to be precise!), is smaller than the value of the nm-convergence option, fitting is stopped. In other words, fitting is stopped when all vertices are at almost the same level.

The remaining options are related to the initialization of the simplex. Before starting iterations, we have to choose a set of points in the space of the parameters, called vertices. Unless the option nm-move-all is set, one of these points will be the current point, i.e. the values that the parameters have at this moment. All the others are drawn as follows: each parameter of each vertex is drawn separately, from a distribution that has its center at the center of the domain of the parameter and a width proportional to both the width of the domain and the value of the nm-move-factor parameter. The shape of the distribution can be set using the option nm-distribution as one of: uniform, gaussian, lorentzian and bound. The last one causes the value of the parameter to be either the greatest or the smallest value in the domain of the parameter - one of the two bounds of the domain (assuming that nm-move-factor equals 1).
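
For example (the values here are arbitrary):

      set nm-convergence = 0.0001     # stop when all vertices are nearly level
      set nm-distribution = gaussian  # draw initial vertices from a Gaussian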

Genetic Algorithms

[TODO]

Settings

Note

This chapter is not about GUI settings (things like colors, fonts, etc.), but about settings that are common to both the CLI and GUI versions.

The command info set shows the syntax of the set command and lists all possible options. set option shows the current value of an option, and set option = value changes it. It is also possible to change the value of an option for a single command by prepending with option = value to the command. The examples at the end of this chapter should clarify this.

autoplot

See the section called “plot: viewing data”.

can-cancel-guess

See the section called “Guessing peak location ”.

cut-function-level

See the section called “Speed of computations”.

data-default-sigma

See the section called “Standard deviation (or weight) ”.

epsilon

It is used for floating-point comparison: a and b are considered equal when |a-b|<epsilon. You may want to decrease it when you work with very small values, like 10^-10.

exit-on-warning

If the option exit-on-warning is set, any warning will also close the program. This ensures that no warnings can be overlooked.

fitting-method

See the section called “Fitting related commands ”.

formula-export-style

See the section called “Model, F and Z”.

guess-at-center-pm

See the section called “Guessing peak location ”.

height-correction

See the section called “Guessing peak location ”.

lm-*

Settings for tuning the Levenberg-Marquardt fitting method.

max-wssr-evaluations

See the section called “Fitting related commands ”.

nm-*

Settings for tuning the Nelder-Mead downhill simplex fitting method.

pseudo-random-seed

Some fitting methods and functions (such as randnormal in data expressions) use a pseudo-random number generator. In some situations one may want to have repeatable and predictable results of the fitting, e.g. to make a presentation. The seed for a new sequence of pseudo-random numbers can be set using the option pseudo-random-seed. If it is set to 0, the seed is based on the current time and the sequence of pseudo-random numbers is different each time.
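
For example:

      set pseudo-random-seed = 123456  # repeatable sequence of random numbers
      set pseudo-random-seed = 0       # seed based on the current time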

variable-domain-percent

See the section called “Variables”.

verbosity

Possible values: quiet, normal, verbose, debug.

width-correction

See the section called “Guessing peak location ”.

Examples:

     set fitting-method  # show info
     set fitting-method = Nelder-Mead-simplex # change default method
     set verbosity = verbose
     with fitting-method = Levenberg-Marquardt fit 10
     with fitting-method=Levenberg-Marquardt, verbosity=quiet fit 10
    

Other commands

plot: viewing data

In the GUI version there is hardly ever a need to use this command directly.

The command plot controls the visualization of data and the model. It is used to plot a given area - in the GUI it is plotted in the program's main window, in the CLI the popular program gnuplot is used, if available.

plot [xrange [yrange] ] [in @n]

xrange and yrange have one of the two following syntaxes:

[min:max] (either min or max can be omitted)

.

The second is just a dot (.), and it means that the corresponding range is not to be changed.

Examples:

   plot [20.4:50] [10:20] # show x from 20.4 to 50 and y from 10 to 20

   plot [20.4:] # x from 20.4 to the end,
                # y range will be adjusted to encompass all data

   plot . [:10] # x range will not be changed, y from the lowest point to 10
   plot [:] [:] # all data will be shown
   plot         # all data will be shown
   plot . .     # nothing changes
     

The value of the option autoplot changes the automatic plotting behaviour. By default, the plot is refreshed automatically after changing the data or the model. It is also possible to visualize each iteration of the fitting method by replotting the peaks after every iteration.

info: show information

First, there is an option verbosity (not related to the info command) which sets the amount of messages displayed when executing commands.

If you are using the GUI, most information can be displayed with mouse clicks. Alternatively, you can use the info command. Using info+ instead of info sometimes displays more detailed information.

The output of info can be redirected to a file using the info args > filename syntax to truncate the file, or info args >> filename to append to it.
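
For example:

      info peaks in @0 > peaks.txt   # write, truncating the file if it exists
      info peaks in @0 >> peaks.txt  # append to the file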

The following arguments are recognized:

variables
$variable_name
types
TypeName
functions
%function_name
datasets
data [in @n]
title [in @n]
filename [in @n]
commands
commands [n:m]
view
set
fit [in @n]
fit-history
errors [in @n]
formula [in @n]
peaks [in @n]
guess [x-range] [in @n]
data-expression [in @n]
[@n.]F
[@n.]Z
[@n.]dF(data-expression)
der mathematic-function
version

info der shows the symbolic derivatives of a given function:

      =-> info der sin(a) + 3*exp(b/a)
      f(a, b) = sin(a)+3*exp(b/a)
      df / d a = cos(a)-3*exp(b/a)*b/a^2
      df / d b = 3*exp(b/a)/a
     

commands, dump, sleep, reset, quit, !

All commands given during program execution are stored in memory. They can be listed using the command info commands [n:m] or written to a file: info commands [n:m] > filename. To put all commands executed so far in the session into the file foo.fit, type info commands[:] > foo.fit. With the plus sign (i.e. info+ commands [n:m]), information about the exit status of each command is also included.

To log commands to a file as they are executed, use commands > filename or, to log the output as well, commands+ > filename. To stop logging, use commands > /dev/null.

Scripts can be executed using the command commands < filename. It is possible to execute only selected lines from a script: commands < filename[n:m].
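
For example:

     commands > session.log      # log all subsequent commands to this file
     commands < foo.fit          # execute the script foo.fit
     commands < foo.fit[20:30]   # execute only lines 20-30 of foo.fit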

There is also a command dump > filename, which writes the current state of the program together with all datasets to a single .fit file.

The command sleep sec makes the program wait sec seconds before continuing.

The command reset restores the initial state of the program: all datasets, functions and variables are discarded.

The command quit works as expected. If it appears in a script, it quits the program, not only the script.

Commands that start with ! are passed (without the '!') to the system() call.
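
For example, on a Unix-like system:

     ! ls *.dat    # runs "ls *.dat" in the system shell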

Chapter 4. Using and extending

Use cases

[TODO]

Extensions

How to add your own built-in function

Note

Add a built-in function only if a user-defined function (UDF) would be too slow or too limited.

To add a built-in function, you have to change the source of the program and recompile it. Users who want to do this should be able to compile the program from source and should know the basics of C++.

The description that follows is not complete. If something is not clear, you can always ask the author by e-mail.

"fp" you can see in fityk source means a real (floating point) number (typedef double fp).

The name of your function should start with an uppercase letter and contain only letters and digits. Let us add the function Foo with the formula Foo(height, center, hwhm) = height/(1+((x-center)/hwhm)^2). The C++ class representing Foo will be named FuncFoo.

In src/func.cpp you will find a list of functions:

       ...
       FACTORY_FUNC(Polynomial6)
       FACTORY_FUNC(Gaussian)
       ...
      

Now, add:

       FACTORY_FUNC(Foo)
      

Then find another list:

       ...
       FuncPolynomial6::formula,
       FuncGaussian::formula,
       ...
      

and add the line

      FuncFoo::formula,
     

Note that in the second list all items but the last are followed by a comma.

In the file src/bfunc.h you can now begin writing the definition of your class:

      class FuncFoo : public Function
      {
          DECLARE_FUNC_OBLIGATORY_METHODS(Foo)
     

If you want to perform some calculations every time the parameters of the function change, you can do it in the method do_precomputations. This facility is provided for precalculating expressions that do not depend on x. Write the declaration here:

     void do_precomputations(std::vector<Variable*> const &variables);
     

and provide a proper definition of this method in src/bfunc.cpp.
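
For the Foo example there is little that does not depend on x, but a hypothetical definition caching 1/hwhm could look like the sketch below. It assumes that a member variable fp inv_hwhm; was added to the class and that the base-class method fills vv from the variables; both assumptions should be checked against the sources.

      void FuncFoo::do_precomputations(std::vector<Variable*> const &variables)
      {
          // assumption: the base-class method recalculates vv[] from variables
          Function::do_precomputations(variables);
          inv_hwhm = 1. / vv[2];   // cached value; it does not depend on x
      }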

If you want to speed up the calculation of your function by neglecting its value outside of a given range (see the option cut-function-level in the program), implement the method:

      bool get_nonzero_range (fp level, fp &left, fp &right) const;
     

This method takes the level below which the value of the function can be approximated by zero, and should set the left and right variables to values of x such that |f(x)| < level whenever x < left or x > right. If the function sets left and right, it should return true.
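
For the Foo function above, a minimal sketch of this method could look as follows. This is an illustration, not code from the fityk sources; it assumes vv[0], vv[1] and vv[2] hold height, center and hwhm, and that fabs and sqrt (from <cmath>) are available in src/bfunc.cpp:

      bool FuncFoo::get_nonzero_range (fp level, fp &left, fp &right) const
      {
          if (level <= 0)
              return false;              // nothing can be neglected
          if (fabs(vv[0]) <= level) {    // the whole peak is below the level
              left = right = vv[1];
              return true;
          }
          // |f(x)| < level  <=>  |x - center| > |hwhm| * sqrt(height/level - 1)
          fp dist = fabs(vv[2]) * sqrt(fabs(vv[0]) / level - 1);
          left = vv[1] - dist;
          right = vv[1] + dist;
          return true;
      }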

If your function does not have a "center" parameter, and there is a center-like point where you want the peak top to be drawn, write:

      bool has_center() const { return true; }
      fp center() const { return vv[1]; }
     

In the second line, between return and the semicolon, there is an expression for the x coordinate of the peak top; vv[0] is the first parameter of the function, vv[1] is the second, and so on.

Finally, close the definition of the class with:

      };
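
Putting the pieces together, a complete declaration in src/bfunc.h for a Foo that also declares the optional get_nonzero_range method would look like this:

      class FuncFoo : public Function
      {
          DECLARE_FUNC_OBLIGATORY_METHODS(Foo)
          bool get_nonzero_range (fp level, fp &left, fp &right) const;
      };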
     

Now go to file src/bfunc.cpp.

Write the function formula in this way:

      const char *FuncFoo::formula
      = "Foo(height, center, hwhm) = height/(1+((x-center)/hwhm)^2)";
     

The syntax of the formula is similar to that of a UDF, but for built-in functions only the left-hand side of the formula is parsed. The right-hand side is for documentation only.

Write how to calculate the value of the function:

      FUNC_CALCULATE_VALUE_BEGIN(Foo)
          fp xa1a2 = (x - vv[1]) / vv[2];
          fp inv_denomin = 1. / (1 + xa1a2 * xa1a2);
      FUNC_CALCULATE_VALUE_END(vv[0] * inv_denomin)
     

The expression at the end (i.e. vv[0] * inv_denomin) is the calculated value. xa1a2 and inv_denomin are variables introduced to simplify the expression. Note the "fp" (you can also use "double") at the beginning and the semicolon at the end of both lines. The meaning of vv has already been explained. Usually it is more difficult to calculate the derivatives:

      FUNC_CALCULATE_VALUE_DERIV_BEGIN(Foo)
          fp xa1a2 = (x - vv[1]) / vv[2];
          fp inv_denomin = 1. / (1 + xa1a2 * xa1a2);
          dy_dv[0] = inv_denomin;
          fp dcenter = 2 * vv[0] * xa1a2 / vv[2] * inv_denomin * inv_denomin;
          dy_dv[1] = dcenter;
          dy_dv[2] = dcenter * xa1a2;
          dy_dx = -dcenter;
      FUNC_CALCULATE_VALUE_DERIV_END(vv[0] * inv_denomin)
     

You must set the derivatives dy_dv[n] for n = 0, 1, ..., (number of parameters of your function - 1), as well as dy_dx. The expression in the final parentheses is, again, the value of the function.
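
For the Foo example these expressions can be checked by hand. Writing t = (x-center)/hwhm, so that f = height/(1+t^2), the chain rule gives:

      df/d(height) = 1/(1+t^2)
      df/d(center) = 2*height*t / (hwhm*(1+t^2)^2)
      df/d(hwhm)   = 2*height*t^2 / (hwhm*(1+t^2)^2)
      df/dx        = -df/d(center)

which correspond to dy_dv[0], dy_dv[1], dy_dv[2] and dy_dx in the code above (dcenter is df/d(center)).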

If you declared do_precomputations or get_nonzero_range methods, do not forget to write definitions for them.

After compiling the program, check whether the derivatives are calculated correctly using the command "info dF(x)", e.g. i dF(30.1). You can also use numarea, findx and extremum (see the section called “Functions and variables in data transformation” for details) to verify the center, area, height and FWHM properties.

Hope this helps. Do not hesitate to improve this description or to ask questions if anything is unclear. Consider sharing your function with other users (via the FitykWiki or the mailing list).

Appendix A. List of functions

The list of all functions can be obtained using i+ types. In some formulae below the long parameter names (like "height", "center" and "hwhm") are replaced with a0, a1, a2, and so on.

Equation A.1. Gaussian

Type "info Gaussian" in the program to see the formula.

Equation A.2. SplitGaussian

Type "info SplitGaussian" in the program to see the formula.

Equation A.3. GaussianA

Type "info GaussianA" in the program to see the formula.

Equation A.4. Lorentzian

Type "info Lorentzian" in the program to see the formula.

Equation A.5. LorentzianA

Type "info LorentzianA" in the program to see the formula.

Equation A.6. Pearson VII (Pearson7)

Type "info Pearson7" in the program to see the formula.

Equation A.7. Split-Pearson-VII (SplitPearson7)

Type "info SplitPearson7" in the program to see the formula.

Equation A.8. Pearson-VII-Area (Pearson7A)

Type "info Pearson7A" in the program to see the formula.

Equation A.9. Pseudo-Voigt (PseudoVoigt)

Type "info PseudoVoigt" in the program to see the formula.

Pseudo-Voigt is a name given to the sum of a Gaussian and a Lorentzian. The a3 parameters in Pearson VII and Pseudo-Voigt are not related.

Equation A.10. Pseudo-Voigt-Area (PseudoVoigtA)

Type "info PseudoVoigtA" in the program to see the formula.

Equation A.11. Voigt

Type "info Voigt" in the program to see the formula.

The Voigt function is a convolution of Gaussian and Lorentzian functions. a0 = height, a1 = center, a2 is proportional to the Gaussian width, and a3 is proportional to the ratio of the Lorentzian and Gaussian widths. Voigt is computed according to R.J. Wells, “Rapid approximation to the Voigt/Faddeeva function and its derivatives”, Journal of Quantitative Spectroscopy & Radiative Transfer 62 (1999) 29-48 (see also http://www.atm.ox.ac.uk/user/wells/voigt.html). Whether this approximation is accurate enough for all possible uses of fityk is an open question.

Equation A.12. VoigtA

Type "info VoigtA" in the program to see the formula.

Equation A.13. Exponentially Modified Gaussian (EMG)

Type "info EMG" in the program to see the formula.

Equation A.14. Doniach-Sunjic (DoniachSunjic)

Type "info DoniachSunjic" in the program to see the formula.

Equation A.15. Polynomial5

Type "info Polynomial5" in the program to see the formula.


Appendix B. Command shortenings

The pipe symbol (|) shows the minimum length of the command. "def|ine" means that the shortest version is "def", but "defi", "defin" and "define" are also valid and mean exactly the same. Arguments of the info command cannot be shortened, i.e. you must write "i fit", not "i f". Commands that cannot be shortened are not listed here.

c|ommands
def|ine
f|it
g|uess
i|nfo
p|lot
s|et
undef|ine
w|ith

Appendix C. License

Fityk is free software; you can redistribute and modify it under the terms of the GNU General Public License, version 2 or (at your option) any later version. There is no warranty. The text of the license is distributed with the program in the file COPYING.

Appendix D. About this manual

This manual is written in DocBook (XML) and converted to other formats. The fitykhelp.xml file is distributed with the program sources, and can be modified with any text editor. All changes, improvements, corrections, etc. are welcome.

The following people have contributed to this manual (in chronological order): Marcin Wojdyr (maintainer), Stan Gierlotka, Jaap Folmer, Michael Richardson.

This version of the manual is produced from fitykhelp.xml $Revision: 512 $, last modification: $Date: 2009-06-15 21:27:53 +0200 (Mon, 15 Jun 2009) $.

Bibliography

[1] William Press, Saul Teukolsky, William Vetterling, and Brian Flannery. Numerical Recipes in C. http://www.nr.com.

[2] Peter Gans. Data Fitting in the Chemical Sciences by the Method of Least Squares. John Wiley & Sons. 1992.

[3] Siegmund Brandt. Data Analysis. Springer Verlag. 1999.

[4] PeakFit 4.0 for Windows User's Manual. AISN Software. 1997.

[5] Zbigniew Michalewicz. Algorytmy genetyczne + struktury danych = programy ewolucyjne (Genetic Algorithms + Data Structures = Evolution Programs). WNT. 1996.