1 GETTING STARTED {#developer}
4 Shogun is
split up into libshogun which contains all the machine learning
5 algorithms and
'static interfaces' helpers,
6 the
static interfaces python_static, octave_static, matlab_static, r_static and
7 the modular interfaces python_modular, octave_modular and r_modular (all found
8 in the src/interfaces/ subdirectory with corresponding name). See src/INSTALL on
11 In
case one wants to extend shogun the best way is to start
using its library.
12 This can be easily done as a number of examples in examples/libshogun document.
14 The simplest libshogun based program would be
18 using namespace shogun;
20 int main(
int argc,
char** argv)
27 which could be compiled with g++ -lshogun minimal.cpp -o minimal and obviously
28 does nothing (apart form initializing and destroying a couple of global shogun
31 In
case one wants to redirect shoguns output functions
SG_DEBUG,
SG_INFO,
35 void print_message(FILE* target, const
char* str)
37 fprintf(target,
"%s", str);
40 void print_warning(FILE* target,
const char* str)
42 fprintf(target,
"%s", str);
45 void print_error(FILE* target,
const char* str)
47 fprintf(target,
"%s", str);
53 To
finally see some action one has to include the appropriate header files,
54 e.g. we create some features and a gaussian kernel
64 using namespace shogun;
66 void print_message(FILE* target,
const char* str)
68 fprintf(target,
"%s", str);
71 int main(
int argc,
char** argv)
77 for (int32_t i=0; i<6; i++)
93 kernel->
init(features, features);
100 for (int32_t i=0; i<3; i++)
111 Now you probably wonder why
this example does not leak memory. First of all,
112 supplying pointers to arrays allocated with
new[] will make shogun objects own
113 these objects and will make them take care of cleaning them up on
object
114 destruction. Then, when creating shogun objects they keep a reference counter
115 internally. Whenever a shogun
object is returned or supplied as an argument to
116 some
function its reference counter is increased,
for example in the example
121 increases the reference count of kernel and labels. On destruction the
122 reference counter is decreased and the
object is freed
if the counter is <= 0.
124 It is therefore your duty to prevent objects from destruction
if you keep a
125 handle to them globally *which you still intend to use later*. In the example
126 above accessing labels after the call to
SG_UNREF(svm) will cause a
127 segmentation fault as the Label
object was already destroyed in the SVM
128 destructor. You can do this by
SG_REF(obj). To decrement the reference count of
129 an
object, call
SG_UNREF(obj) which will also automagically destroy it if the
130 counter is <= 0 and set obj=NULL only in this case.
133 Generally, all shogun C++ Objects are prefixed with C, e.g.
CSVM and derived from
134 CSGObject. Since variables in the upper class hierarchy, need to be initialized
135 upon construction of the
object, the constructor of
base class needs to be
137 CClassifier which finally calls
CSGObject.
139 For example if you implement your own SVM called MySVM you would in the
142 class MySVM : public
CSVM
150 In
case you got your
object working we will happily integrate it into shogun
151 provided you follow a number of basic coding conventions detailed below (see
152 FORMATTING
for formatting instructions, MACROS on how to use and name macros,
153 TYPES on which types to use, FUNCTIONS on how functions should look like and
154 NAMING CONVENTIONS
for the naming scheme.
161 - indenting uses stroustrup style with tabsize 4, i.e.
for emacs use in your
164 (add-hook
'c-mode-common-hook
167 (setq indent-tabs-mode t)
168 (c-set-style "stroustrup")
173 set cindent " C style indenting
175 set sw=4 " shiftwidth
177 - for newlines use LF only; avoid CRLF and CR. Git can be configured to convert
178 all newlines to LF as source files are commited to the repo by:
180 git config --global core.autocrlf input
182 (for more information consult http://help.github.com/line-endings/)
184 - avoid trailing whitespace (spaces & tabs) at end of lines and never use spaces
185 for indentation; only ever use tabs for indentations.
189 (add-hook 'before-save-hook
'delete-trailing-whitespace)
191 for vim in ~/.vimrc (implemented as an autocmd, use wisely):
193 autocmd BufWritePre * :%s/\s\+$//e
195 - semicolons and commas ;, should be placed directly after a variable/statement
200 for (uint32_t i=0; i<10; i++)
203 - brackets () and (greater/lower) equal sign ><= should should not contain
204 unecessary spaces, e.g:
207 int32_t b=kernel->compute();
213 exceptions are logical subunits
215 if ( (a==1) && (b==1) )
219 - avoid the use of inline functions where possible (little to zero performance
220 impact). nowadays compilers automagically inline code when beneficial
221 and within the same linking process
223 - breaking long lines and strings
224 limit yourselves to 80 columns
226 for (int32_t vec=params->start; vec<params->end &&
227 !CSignal::cancel_computations(); vec++)
232 however exceptions are OK if readability is increased (as in function
235 - don't put multiple assignments on a single line
237 - functions look like
239 int32_t* fun(int32_t* foo)
244 and are separated by a newline, e.g:
246 int32_t* fun1(int32_t* foo1)
251 int32_t* fun2(int32_t* foo2)
256 - same
for if ()
else clauses,
while/
for loops
267 - one empty line between { } block, e.g.
269 for (int32_t i=0; i<17; i++)
278 - use macros sparingly
279 - avoid defining constants
using macros (bye bye typechecking), use
283 or enums (when defining several realted constants) instead
285 - use ifdefs sparingly (really limit yourself to the ones necessary) as their
286 extreme usage makes the code completely unreadable. to achieve that it may be
287 necessary to wrap a
function of (e.g.
for
288 pthread_create()/CreateThread()/thread_create() a wrapper
function to create a
289 thread and inside of it the ifdefs to
do it the solaris/win32/posix way)
290 -
if you need to use ifdefs always comment the corresponding #
else / #endif
291 in the following way:
302 - types (use only these!):
304 char (8bit char(maybe signed or unsigned))
305 uint8_t (8bit unsigned char)
306 uint16_t (16bit unsigned short)
307 uint32_t (32bit unsinged int)
312 floatmax_t (96bit or 128bit float depending on arch)
316 - classes must be (directly or indirectly) derived from
CSGObject
318 - don
't use fprintf/printf, but SG_DEBUG/SG_INFO/SG_WARNING/SG_ERROR/SG_PRINT
319 (if in a from CSGObject derived object) or the static SG_SDEBUG/... functions
324 - Functions should be short and sweet, and do just one thing. They should fit
325 on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all
326 know), and do one thing and do that well.
327 - Another measure of the function is the number of local variables. They
328 shouldn't exceed 5-10, or you
're doing something wrong. Re-think the
329 function, and split it into smaller pieces. A human brain can
330 generally easily keep track of about 7 different things, anything more
331 and it gets confused. You know you're brilliant, but maybe you
'd like
332 to understand what you did 2 weeks from now.
334 *GETTING / SETTING OBJECTS*
336 If a class stores a pointer to an object it should call SG_REF(obj) to increase
337 the objects reference count and SG_UNREF(obj) on class desctruction (which will
338 decrease the objects reference count and call the objects destructor if
339 ref_count()==0. Note that the caller (from within C++) of any get_* function
340 returning an object should also call SG_UNREF(obj) when done with the object.
341 This makes the swig wrapped interfaces automagically take care of object
344 If a class function returns a new object this has to be stated in the
345 corresponding swig .i file for cleanup to work, e.g. if apply() returns a new
346 CLabels then the .i file should contain `%newobject CClassifier::apply();`
348 *NAMING CONVENTIONS:*
351 - in classes are member variables are named like m_feature_vector (to avoid
352 shadowing and the often hard to find bugs shadowing causes)
353 - parameters (in functions) shall be named e.g. feature_vector
354 - don't use meaningless variable names, it is however fine to use short names
355 like i,j,k etc in loops
356 - class names start with
'C', each syllable/subword starts with a capital letter,
359 - constants/defined objects are UPPERCASE, i.e. REALVALUED
361 - function are named like get_feature_vector() and should be limited to as few arguments as
362 possible (no monster functions with > 5 arguments please)
364 - objects which can deal with features of type DREAL and class SIMPLE don
't need
365 to contain Real/Dense in class name
367 - others are required to clarify class/type they can handle, e.g.
368 CSparseByteLinearKernel, CSparseGaussianKernel
371 - variable and function names are all lowercase (except for class Con/Destructors)
372 syllables/subwords are separated by '_
', e.g. compute_kernel_value(), my_local_variable
374 - class member variables all start with m_, e.g. m_member (this is to avoid shadowing)
376 - features and preprocessors are prefixed with featureclass (e.g. Dense/Sparse) followed by featuretype (Real/Byte/...)
380 The git repo for the project is hosted on GitHub at
381 https://github.com/shogun-toolbox/shogun. To get started, create your own fork
382 and clone it ([howto](https://help.github.com/articles/fork-a-repo "GitHub help - Fork a repo")).
383 Remember to set the upstream remote to the main repo by:
385 git remote add upstream git://github.com/shogun-toolbox/shogun.git
387 Its recommended to create local branches, which are linked to branches from
388 your remote repository. This will make "push" and "pull" work as expected:
390 git checkout --track origin/master
391 git checkout --track origin/develop
393 Each time you want to develop new feature / fix a bug / etc consider creating
396 git checkout -b new_feature_name
398 While being on new_feature_name branch, develop your code, commit things and do
401 Once your feature is ready (please consider larger commits that keep shogun in
402 compileable state), rebase your new_feature_name branch on upstream/develop
407 git rebase upstream/develop
408 git checkout new_feature_name
411 Now you can push it to your origin repository:
415 And finally send a pull request (PR) to the develop branch of the shogun
416 repository in github.
421 What rebasing does is, in short, "Forward-port local commits to the updated
422 upstream head". A longer and more detailed illustration with nice figures
423 can be found at http://book.git-scm.com/4_rebasing.html. So rebasing (instead
424 of merging) makes the main "commit-thread" of the repo a simple series.
426 Rebasing before issuing a pull request also enable us to find and fix any
427 potential conflicts early at the developer side (instead of at the one who
428 merges your pull request).
430 - Multiple pull requests
432 You can have multiple pull requests by creating multiple branches. Github
433 only tracks the branch names you used for identify the pull request. So when
434 you push new commits to your remote branch at github, the pull request will
435 "update" accordingly.
437 - Non-fast-forward error
439 This error happens when:
441 1. `git checkout -b my-branch`
442 2. ... do something ...
444 4. `git push origin my-branch`
445 5. ... do more thing ...
447 7. `git push origin my-branch`
449 then git will complain about non-fast-forward error and not pushing into the remote
450 my-branch branch. This is because the first push has already created the my-branch
451 branch in origin. Later when you run rebasing, which is a destructive operation for
452 the local history. Since the local history is no longer the same as those in the remote
453 branch, pushing is not allowed.
455 Solution for this situation is to delete your remote branch by
457 git push origin :my-branch
461 git push origin my-branch
463 note deleting your remote branch will not delete your pull request associated with that
464 branch. And as long as you push your branch there again, your pull request will be OK.
466 - Unit testing/Pre-commit hook
467 As shogun-toolbox is getting bigger and bigger code-reviews of pull requests are getting
468 harder and harder. In order to avoid breaking the functionality of the existing code, we
469 highly encourage contributors of shogun to use the supplied unit testing, that is based
470 on Google C++ Mock Framework.
472 In order to be able to use the unit testing framework one will need to have
473 Google C++ Mock Framework installed on your machine. The gmock version is
474 1.7.0 and the gtest version is 1.6.0 (or it will have some errors).
476 - [Google Mock](https://code.google.com/p/googlemock/)
477 - [Google Test](https://code.google.com/p/googletest/)
479 Then use cmake/ccmake with the ENABLE_TESTING switching on.
483 cmake -DENABLE_TESTING=on ..
485 Once it's detected if you add new classes to the code please define some basic
486 unit tests for them under ./tests/unit (see some of the examples under that directory).
487 As one can see the naming convention for files that contains the unit tests are:
488 \<classname\>_unittest.cc
490 Before committing or sending a pull request please run
'make unit-tests' under root
491 directory in order to check that nothing has been broken by the modifications and
492 the library is still acting as it
's intended.
494 One possible way to do this automatically is to add into your pre-commit hook the
495 following code snippet (.git/hook/pre-commit):
499 # run unit testing for basic checks
500 # and only let commiting if the unit testing runs successfully
503 This way before each commit the unit testing will run automatically and if it
504 fails it won't let you commit until you don
't fix the problem (or remove the
507 Note that the script should be executable, i.e.
509 chmod +x .git/hook/pre-commit
511 You can also test all the examples in shogun/exapmles to check whether your configuration and environment is totally okay. Please note that some of the examples are dependent on data sets, which should be downloaded beforehand, and so that you can pass all the tests of those examples. Downloading data can be easily done by calling a git command (please refer to [README_data.md](https://github.com/shogun-toolbox/shogun/blob/develop/doc/md/README_data.md)). Afterwards, you can test the examples by:
515 To make a release, adjust the [NEWS](NEWS) file properly, i.e. date, release version (like 3.0.0), adjust the soname if required (cf. [README_soname](README_soname.md)) and if a new data version is required add that too. If parameters have been seen changes increase the parameter version too.