SHOGUN 3.2.1
README_developer.md
GETTING STARTED {#developer}
===============

Shogun is split up into libshogun, which contains all the machine learning
algorithms and 'static interfaces' helpers; the static interfaces
python_static, octave_static, matlab_static and r_static; and the modular
interfaces python_modular, octave_modular and r_modular (each found in the
src/interfaces/ subdirectory of the corresponding name). See src/INSTALL for
how to install shogun.

If you want to extend shogun, the best way is to start by using its library.
This is easy, as a number of examples in examples/libshogun demonstrate.

The simplest libshogun-based program would be

    #include <shogun/base/init.h>

    using namespace shogun;

    int main(int argc, char** argv)
    {
        init_shogun();
        exit_shogun();
        return 0;
    }

which could be compiled with `g++ minimal.cpp -o minimal -lshogun` (the library
has to come after the source file, otherwise the linker may fail to resolve the
shogun symbols) and obviously does nothing (apart from initializing and
destroying a couple of global shogun objects internally).

If you want to redirect the output of shogun's print functions SG_DEBUG,
SG_INFO, SG_WARNING, SG_ERROR, SG_PRINT etc., you have to pass your own
handlers to init_shogun() as parameters, like this:

    void print_message(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    void print_warning(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    void print_error(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    init_shogun(&print_message, &print_warning,
            &print_error);

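The handlers above are plain C function pointers with the signature `void (*)(FILE*, const char*)`. The following stand-alone sketch, which does not link against shogun, shows how such a handler can be registered and later called. `init_logging`, `log_warning` and `capture_warning` are hypothetical names standing in for init_shogun() and the SG_* macros; shogun's real implementation may differ.

```cpp
#include <cassert> // for the checks in the usage example
#include <cstdio>
#include <string>

// Same signature as the print handlers passed to init_shogun()
typedef void (*print_func)(FILE* target, const char* str);

// Hypothetical stand-in for the handler init_shogun() stores internally
static print_func g_print_warning=NULL;

// Hypothetical stand-in for init_shogun(): just remember the handler
void init_logging(print_func print_warning)
{
	g_print_warning=print_warning;
}

// Hypothetical stand-in for what a warning macro eventually calls:
// use the registered handler, or fall back to stderr
void log_warning(const char* str)
{
	if (g_print_warning)
		g_print_warning(stderr, str);
	else
		fprintf(stderr, "%s", str);
}

// A user handler that collects warnings instead of printing them,
// e.g. to forward them to a GUI or inspect them in a test
static std::string g_captured;

void capture_warning(FILE* target, const char* str)
{
	(void) target; // this handler ignores the suggested output stream
	g_captured+=str;
}
```

After `init_logging(&capture_warning); log_warning("low memory\n");`, the string `"low memory\n"` ends up in `g_captured` instead of on the terminal. Passing your handlers to init_shogun() follows the same pattern.
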
To finally see some action, include the appropriate header files. For example,
the following program creates some features and a Gaussian kernel and trains
an SVM on them:

    #include <shogun/labels/BinaryLabels.h>
    #include <shogun/features/DenseFeatures.h>
    #include <shogun/kernel/GaussianKernel.h>
    #include <shogun/classifier/svm/LibSVM.h>
    #include <shogun/base/init.h>
    #include <shogun/lib/common.h>
    #include <shogun/io/SGIO.h>

    using namespace shogun;

    void print_message(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    int main(int argc, char** argv)
    {
        init_shogun(&print_message);

        // create some data
        SGMatrix<float64_t> matrix(2,3);
        for (int32_t i=0; i<6; i++)
            matrix.matrix[i]=i;

        // create three 2-dimensional vectors
        // shogun will now own the matrix created
        CDenseFeatures<float64_t>* features=new CDenseFeatures<float64_t>();
        features->set_feature_matrix(matrix);

        // create three labels
        CBinaryLabels* labels=new CBinaryLabels(3);
        labels->set_label(0, -1);
        labels->set_label(1, +1);
        labels->set_label(2, -1);

        // create gaussian kernel with cache 10MB, width 0.5
        CGaussianKernel* kernel = new CGaussianKernel(10, 0.5);
        kernel->init(features, features);

        // create libsvm with C=10 and train
        CLibSVM* svm = new CLibSVM(10, kernel, labels);
        svm->train();

        // classify on training examples
        for (int32_t i=0; i<3; i++)
            SG_SPRINT("output[%d]=%f\n", i, svm->apply_one(i));

        // free up memory
        SG_UNREF(svm);

        exit_shogun();
        return 0;
    }

Now you probably wonder why this example does not leak memory. First of all,
data handed to shogun objects (such as the feature matrix above) is owned by
them, and they take care of cleaning it up on object destruction. Second,
every shogun object keeps an internal reference counter. Whenever a shogun
object is returned from, or supplied as an argument to, some function, its
reference counter is increased. For example, in the code above

    CLibSVM* svm = new CLibSVM(10, kernel, labels);

increases the reference counts of kernel and labels. When the SVM is
destroyed, these counters are decreased again and each object is freed if its
counter is <= 0.

It is therefore your duty to prevent objects from being destroyed if you keep
a handle to them (e.g. globally) *which you still intend to use later*. In
the example above, accessing labels after the call to SG_UNREF(svm) will cause
a segmentation fault, as the labels object was already destroyed in the SVM
destructor. You can prevent this by calling SG_REF(obj) to increase the
reference count. To decrement the reference count of an object, call
SG_UNREF(obj), which will also automagically destroy it if the counter is <= 0
and set obj=NULL only in this case.

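The reference counting rules can be modelled in a few lines of standard C++. This is a simplified sketch, not shogun code: `RefCounted`, `MY_REF` and `MY_UNREF` are hypothetical stand-ins for CSGObject, SG_REF and SG_UNREF, and `Labels`/`SVM` only mimic CBinaryLabels and CLibSVM. A static counter of living objects makes destruction observable.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Simplified, hypothetical stand-in for CSGObject's reference counter
class RefCounted
{
public:
	RefCounted() : m_refcount(0) { s_alive++; }
	virtual ~RefCounted() { s_alive--; }

	int32_t ref() { return ++m_refcount; }

	// destroy the object once the counter drops to <= 0
	int32_t unref()
	{
		int32_t count=--m_refcount;
		if (count<=0)
		{
			delete this;
			return 0;
		}
		return count;
	}

	// number of objects currently alive, used to observe destruction
	static int32_t s_alive;

private:
	int32_t m_refcount;
};

int32_t RefCounted::s_alive=0;

// Hypothetical equivalents of SG_REF / SG_UNREF: x is set to NULL
// only if the object was destroyed
#define MY_REF(x) { if (x) (x)->ref(); }
#define MY_UNREF(x) { if (x) { if ((x)->unref()==0) (x)=NULL; } }

class Labels : public RefCounted
{
};

// Mimics CLibSVM: takes a reference on the labels it is given and
// releases it again in its destructor
class SVM : public RefCounted
{
public:
	SVM(Labels* labels) : m_labels(labels) { MY_REF(m_labels); }
	virtual ~SVM() { MY_UNREF(m_labels); }

private:
	Labels* m_labels;
};
```

Creating `labels`, passing them to `new SVM(labels)` and calling `MY_UNREF(svm)` destroys both objects, which is exactly the dangling-pointer situation described above. Calling `MY_REF(labels)` before handing them over keeps them alive until your own matching `MY_UNREF(labels)`.
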
Generally, all shogun C++ objects are prefixed with C, e.g. CSVM, and derived
from CSGObject. Since variables in the upper class hierarchy need to be
initialized upon construction of the object, the constructor of the base class
needs to be called in the constructor, e.g. CSVM calls CKernelMachine,
CKernelMachine calls CClassifier, which finally calls CSGObject.

For example, if you implement your own SVM called MySVM, you would do this in
the constructor:

    class MySVM : public CSVM
    {
    public:
        MySVM() : CSVM()
        {
            ...
        }
    };

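To see the order in which the constructors run, here is a small self-contained sketch. `SGObject`, `Classifier`, `KernelMachine` and `SVM` are simplified stand-ins for the shogun hierarchy named above (they are not the real classes), and `m_epsilon` is a hypothetical member. Each constructor appends its name to a trace string.

```cpp
#include <cassert>
#include <string>

// Records the order in which constructors run
static std::string g_trace;

// Stand-ins for CSGObject -> CClassifier -> CKernelMachine -> CSVM
class SGObject
{
public:
	SGObject() { g_trace+="SGObject "; }
	virtual ~SGObject() {}
};

class Classifier : public SGObject
{
public:
	Classifier() : SGObject() { g_trace+="Classifier "; }
};

class KernelMachine : public Classifier
{
public:
	KernelMachine() : Classifier() { g_trace+="KernelMachine "; }
};

class SVM : public KernelMachine
{
public:
	SVM() : KernelMachine() { g_trace+="SVM "; }
};

// Your own SVM: name the base class constructor explicitly, then
// initialize your own members
class MySVM : public SVM
{
public:
	MySVM() : SVM(), m_epsilon(1e-5) { g_trace+="MySVM"; }

	double get_epsilon() const { return m_epsilon; }

private:
	double m_epsilon;
};
```

Constructing a `MySVM` produces the trace `SGObject Classifier KernelMachine SVM MySVM`. C++ always runs base class constructors first, so naming `SVM()` in the initializer list mainly documents which base constructor (and with which arguments) is used.
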
Once you have got your object working, we will happily integrate it into
shogun, provided you follow a number of basic coding conventions detailed below
(see FORMATTING for formatting instructions, MACROS on how to use and name
macros, TYPES on which types to use, FUNCTIONS on what functions should look
like and NAMING CONVENTIONS for the naming scheme).

CODING CONVENTIONS:
-------------------

*FORMATTING:*

- indenting uses stroustrup style with tabsize 4, i.e. for emacs use in your
  ~/.emacs

        (add-hook 'c-mode-common-hook
         (lambda ()
          (show-paren-mode 1)
          (setq indent-tabs-mode t)
          (c-set-style "stroustrup")
          (setq tab-width 4)))

  for vim in ~/.vimrc

        set cindent         " C style indenting
        set ts=4            " tabstop
        set sw=4            " shiftwidth

- for newlines use LF only; avoid CRLF and CR. Git can be configured to convert
  all newlines to LF when source files are committed to the repo by:

        git config --global core.autocrlf input

  (for more information consult http://help.github.com/line-endings/)

- avoid trailing whitespace (spaces & tabs) at the end of lines and never use
  spaces for indentation; only ever use tabs for indentation.

  for emacs:

        (add-hook 'before-save-hook 'delete-trailing-whitespace)

  for vim in ~/.vimrc (implemented as an autocmd, use wisely):

        autocmd BufWritePre * :%s/\s\+$//e

- semicolons and commas (;,) should be placed directly after a
  variable/statement

        x+=1;
        set_cache_size(0);

        for (uint32_t i=0; i<10; i++)
            ...

- brackets () and (greater/lower) equal signs ><= should not contain
  unnecessary spaces, e.g:

        int32_t a=1;
        int32_t b=kernel->compute();

        if (a==1)
        {
        }

  exceptions are logical subunits

        if ( (a==1) && (b==1) )
        {
        }

- avoid the use of inline functions where possible (they have little to zero
  performance impact); nowadays compilers automagically inline code when
  beneficial and within the same linking process

- breaking long lines and strings: limit yourselves to 80 columns

        for (int32_t vec=params->start; vec<params->end &&
                !CSignal::cancel_computations(); vec++)
        {
            //foo
        }

  however, exceptions are OK if readability is increased (as in function
  definitions)

- don't put multiple assignments on a single line

- functions look like

        int32_t* fun(int32_t* foo)
        {
            return foo;
        }

  and are separated by a newline, e.g:

        int32_t* fun1(int32_t* foo1)
        {
            return foo1;
        }

        int32_t* fun2(int32_t* foo2)
        {
            return foo2;
        }

- same for if () else clauses, while/for loops

        if (foo)
            do_stuff();

        if (foo)
        {
            do_stuff();
            do_more();
        }

- one empty line between a { } block and the code that follows it, e.g.

        for (int32_t i=0; i<17; i++)
        {
            // sth
        }

        x=1;

*MACROS & IFDEFS:*

- use macros sparingly
- avoid defining constants using macros (bye bye typechecking), use

        const int32_t FOO=5;

  or enums (when defining several related constants) instead

- use ifdefs sparingly (really limit yourself to the ones necessary) as their
  extreme usage makes the code completely unreadable. To achieve that, it may
  be necessary to write a wrapper function (e.g. for
  pthread_create()/CreateThread()/thread_create(), a wrapper function that
  creates a thread and contains the ifdefs to do it the posix/win32/solaris
  way)
- if you need to use ifdefs, always comment the corresponding #else / #endif
  in the following way:

        #ifdef HAVE_LAPACK
        ...
        #else //HAVE_LAPACK
        ...
        #endif //HAVE_LAPACK

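The const/enum rule and the wrapper-function rule can be combined in one small sketch. `MAX_ITERATIONS`, `EKernelType` and `dot_product()` are hypothetical names. The HAVE_LAPACK branch assumes a CBLAS header `cblas.h` that provides `cblas_ddot`; without HAVE_LAPACK the plain loop is compiled instead.

```cpp
#include <cassert>
#include <stdint.h>
#ifdef HAVE_LAPACK
#include <cblas.h>
#endif //HAVE_LAPACK

// a typed constant instead of #define MAX_ITERATIONS 100
const int32_t MAX_ITERATIONS=100;

// several related constants: an enum instead of a group of macros
enum EKernelType
{
	K_LINEAR,
	K_GAUSSIAN,
	K_POLY
};

// wrapper that hides the optional dependency behind one function,
// so callers need no #ifdefs of their own
double dot_product(const double* a, const double* b, int32_t len)
{
#ifdef HAVE_LAPACK
	return cblas_ddot(len, a, 1, b, 1);
#else //HAVE_LAPACK
	double sum=0.0;
	for (int32_t i=0; i<len; i++)
		sum+=a[i]*b[i];

	return sum;
#endif //HAVE_LAPACK
}
```

Callers simply write `dot_product(a, b, len)` and never see the preprocessor switch. For {1,2,3} and {4,5,6} both branches return 32.
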
*TYPES:*

- types (use only these!):

        char        (8bit char (maybe signed or unsigned))
        uint8_t     (8bit unsigned char)
        uint16_t    (16bit unsigned short)
        uint32_t    (32bit unsigned int)
        int32_t     (32bit int)
        int64_t     (64bit int)
        float32_t   (32bit float)
        float64_t   (64bit float)
        floatmax_t  (96bit or 128bit float depending on arch)

  exceptions: file IO / matlab interface

- classes must be (directly or indirectly) derived from CSGObject

- don't use fprintf/printf, but SG_DEBUG/SG_INFO/SG_WARNING/SG_ERROR/SG_PRINT
  (if in an object derived from CSGObject) or the static SG_SDEBUG/...
  functions otherwise

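The widths listed above can be checked at compile time. shogun declares its float aliases in its own headers (shogun/lib/common.h); the three typedefs below are an assumption about what they look like on a common Linux build and are repeated here only to keep the sketch self-contained. `mean()` is a hypothetical helper showing the types in use.

```cpp
#include <cassert>
#include <climits>
#include <stdint.h>

// Assumed equivalents of shogun's float aliases; the integer types
// come straight from <stdint.h>
typedef float float32_t;
typedef double float64_t;
typedef long double floatmax_t;

// Compile-time checks of the widths listed in the TYPES section
static_assert(CHAR_BIT==8, "char must be 8 bit");
static_assert(sizeof(uint16_t)==2, "uint16_t must be 16 bit");
static_assert(sizeof(int32_t)==4, "int32_t must be 32 bit");
static_assert(sizeof(int64_t)==8, "int64_t must be 64 bit");
static_assert(sizeof(float32_t)==4, "float32_t must be 32 bit");
static_assert(sizeof(float64_t)==8, "float64_t must be 64 bit");
// 12 bytes (96 bit) on 32-bit x86 Linux, 16 bytes (128 bit) on x86-64
// Linux; MSVC makes long double the same size as double
static_assert(sizeof(floatmax_t)>=sizeof(float64_t), "floatmax_t too small");

// mean of a vector, accumulated in the widest float type
float64_t mean(const float64_t* vec, int32_t len)
{
	floatmax_t sum=0;
	for (int32_t i=0; i<len; i++)
		sum+=vec[i];

	return len>0 ? (float64_t) (sum/len) : 0.0;
}
```

If one of these assumptions is violated on some platform, compilation stops with the corresponding message instead of silently producing wrong results at run time.
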
*FUNCTIONS:*

- Functions should be short and sweet, and do just one thing. They should fit
  on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all
  know), and do one thing and do that well.
- Another measure of a function's complexity is the number of local variables.
  They shouldn't exceed 5-10, or you're doing something wrong. Re-think the
  function, and split it into smaller pieces. A human brain can
  generally easily keep track of about 7 different things; anything more
  and it gets confused. You know you're brilliant, but maybe you'd like
  to understand what you did 2 weeks from now.

*GETTING / SETTING OBJECTS*

If a class stores a pointer to an object, it should call SG_REF(obj) to
increase the object's reference count and SG_UNREF(obj) on class destruction
(which will decrease the object's reference count and call the object's
destructor if ref_count()==0). Note that the caller (from within C++) of any
get_* function returning an object should also call SG_UNREF(obj) when done
with the object. This makes the swig wrapped interfaces automagically take care
of object destruction.

If a class function returns a new object, this has to be stated in the
corresponding swig .i file for cleanup to work, e.g. if apply() returns a new
CLabels, then the .i file should contain `%newobject CClassifier::apply();`

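The get/set rules can be sketched without shogun. `Object`, `Kernel`, `Machine`, `MY_REF` and `MY_UNREF` are hypothetical, simplified stand-ins for CSGObject, a kernel, a machine and the SG_REF/SG_UNREF macros; the point of the sketch is where a reference is taken and where it is released.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Simplified, hypothetical stand-in for CSGObject's reference counting
class Object
{
public:
	Object() : m_refcount(0) {}
	virtual ~Object() {}

	int32_t ref() { return ++m_refcount; }

	int32_t unref()
	{
		int32_t count=--m_refcount;
		if (count<=0)
		{
			delete this;
			return 0;
		}
		return count;
	}

	int32_t ref_count() const { return m_refcount; }

private:
	int32_t m_refcount;
};

// Hypothetical equivalents of SG_REF / SG_UNREF
#define MY_REF(x) { if (x) (x)->ref(); }
#define MY_UNREF(x) { if (x) { if ((x)->unref()==0) (x)=NULL; } }

class Kernel : public Object
{
};

class Machine : public Object
{
public:
	Machine() : m_kernel(NULL) {}

	// release the stored kernel on destruction
	virtual ~Machine() { MY_UNREF(m_kernel); }

	// reference the new kernel before releasing the old one, so storing
	// the kernel that is already set cannot destroy it in between
	void set_kernel(Kernel* kernel)
	{
		MY_REF(kernel);
		MY_UNREF(m_kernel);
		m_kernel=kernel;
	}

	// the returned object carries an extra reference: the caller
	// must MY_UNREF() it when done
	Kernel* get_kernel()
	{
		MY_REF(m_kernel);
		return m_kernel;
	}

private:
	Kernel* m_kernel;
};
```

A caller that does `Kernel* k=machine->get_kernel();` owns one reference and releases it with `MY_UNREF(k)`; the machine keeps its own reference until it is destroyed.
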
*NAMING CONVENTIONS:*

- naming variables:
    - in classes, member variables are named like m_feature_vector (to avoid
      shadowing and the often hard to find bugs shadowing causes)
    - parameters (in functions) shall be named e.g. feature_vector
    - don't use meaningless variable names; it is however fine to use short
      names like i,j,k etc. in loops
    - class names start with 'C', each syllable/subword starts with a capital
      letter, e.g. CStringFeatures

- constants/defined objects are UPPERCASE, e.g. REALVALUED

- functions are named like get_feature_vector() and should be limited to as
  few arguments as possible (no monster functions with > 5 arguments please)

- objects which can deal with features of type DREAL and class SIMPLE don't
  need to contain Real/Dense in their class name

- others are required to clarify the class/type they can handle, e.g.
  CSparseByteLinearKernel, CSparseGaussianKernel

- variable and function names are all lowercase (except for class
  Con/Destructors); syllables/subwords are separated by '_', e.g.
  compute_kernel_value(), my_local_variable

- class member variables all start with m_, e.g. m_member (this is to avoid
  shadowing)

- features and preprocessors are prefixed with featureclass (e.g.
  Dense/Sparse) followed by featuretype (Real/Byte/...)

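A compact illustration of these naming rules on a hypothetical class. `CSparseVectorStats`, `DEFAULT_CACHE_SIZE` and the `float64_t` typedef are made up for this sketch; a real shogun class would additionally derive from CSGObject.

```cpp
#include <cassert>
#include <stdint.h>

// assumed to match shogun's alias for a 64bit float
typedef double float64_t;

// constants are UPPERCASE
const int32_t DEFAULT_CACHE_SIZE=10;

// class names start with 'C' and capitalize each subword
class CSparseVectorStats
{
public:
	// plain parameter names, m_ prefix for members: no shadowing
	CSparseVectorStats(int32_t cache_size)
		: m_cache_size(cache_size), m_num_nonzero(0)
	{
	}

	// function names are lowercase, subwords separated by '_'
	void add_entry(float64_t value)
	{
		if (value!=0.0)
			m_num_nonzero++;
	}

	int32_t get_num_nonzero() const
	{
		return m_num_nonzero;
	}

	int32_t get_cache_size() const
	{
		return m_cache_size;
	}

private:
	int32_t m_cache_size;
	int32_t m_num_nonzero;
};
```

Because the parameter is `cache_size` and the member is `m_cache_size`, the constructor cannot accidentally assign a variable to itself, which is the shadowing bug the m_ prefix guards against.
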
*VERSIONING SCHEME:*

The git repo for the project is hosted on GitHub at
https://github.com/shogun-toolbox/shogun. To get started, create your own fork
and clone it ([howto](https://help.github.com/articles/fork-a-repo "GitHub help - Fork a repo")).
Remember to set the upstream remote to the main repo by:

    git remote add upstream https://github.com/shogun-toolbox/shogun.git

It's recommended to create local branches which are linked to branches in
your remote repository. This will make "push" and "pull" work as expected:

    git checkout --track origin/master
    git checkout --track origin/develop

Each time you want to develop a new feature / fix a bug / etc., consider
creating a new branch using:

    git checkout -b new_feature_name

While on the new_feature_name branch, develop your code, commit things and do
everything you want.

Once your feature is ready (please prefer larger commits that keep shogun in a
compilable state), rebase your new_feature_name branch on upstream/develop
with:

    git fetch upstream
    git checkout develop
    git rebase upstream/develop
    git checkout new_feature_name
    git rebase develop

Now you can push it to your origin repository:

    git push origin new_feature_name

And finally, send a pull request (PR) to the develop branch of the shogun
repository on GitHub.

- Why rebasing?

  What rebasing does is, in short, "Forward-port local commits to the updated
  upstream head". A longer and more detailed illustration with nice figures
  can be found at http://book.git-scm.com/4_rebasing.html. So rebasing (instead
  of merging) makes the main "commit-thread" of the repo a simple series.

  Rebasing before issuing a pull request also enables us to find and fix any
  potential conflicts early on the developer's side (instead of on the side of
  whoever merges your pull request).

- Multiple pull requests

  You can have multiple pull requests by creating multiple branches. GitHub
  only tracks the branch name you used to identify the pull request. So when
  you push new commits to your remote branch on GitHub, the pull request will
  "update" accordingly.

- Non-fast-forward error

  This error happens when you:

  1. `git checkout -b my-branch`
  2. ... do something ...
  3. ... rebase ...
  4. `git push origin my-branch`
  5. ... do more things ...
  6. ... rebase ...
  7. `git push origin my-branch`

  Git will then complain about a non-fast-forward error and refuse to push into
  the remote my-branch branch. This is because the first push already created
  the my-branch branch in origin. Rebasing later rewrites the local history,
  and since the local history is no longer the same as the one in the remote
  branch, pushing is not allowed.

  The solution in this situation is to delete your remote branch by

        git push origin :my-branch

  and push again by

        git push origin my-branch

  Note that deleting your remote branch will not delete the pull request
  associated with that branch. As long as you push your branch there again,
  your pull request will be OK.

- Unit testing/Pre-commit hook

  As shogun-toolbox is getting bigger and bigger, code reviews of pull requests
  are getting harder and harder. In order to avoid breaking the functionality
  of the existing code, we highly encourage contributors to use the supplied
  unit testing, which is based on the Google C++ Mock Framework.

  In order to use the unit testing framework, you will need to have the
  Google C++ Mock Framework installed on your machine. Use gmock version 1.7.0
  and gtest version 1.6.0 (other versions may cause errors).

  - [Google Mock](https://code.google.com/p/googlemock/)
  - [Google Test](https://code.google.com/p/googletest/)

  Then run cmake/ccmake with the ENABLE_TESTING switch turned on.

  For example:

        cmake -DENABLE_TESTING=on ..

  Once the framework is detected, if you add new classes to the code, please
  define some basic unit tests for them under ./tests/unit (see some of the
  examples in that directory). As one can see, the naming convention for files
  that contain unit tests is: \<classname\>_unittest.cc

  Before committing or sending a pull request, please run 'make unit-tests' in
  the root directory in order to check that nothing has been broken by the
  modifications and that the library still behaves as intended.

  One possible way to do this automatically is to add the following code
  snippet to your pre-commit hook (.git/hooks/pre-commit):

        #!/bin/sh

        # run unit testing for basic checks
        # and only allow committing if the unit tests run successfully
        make unit-tests

  This way, the unit tests run automatically before each commit, and if they
  fail you won't be able to commit until you fix the problem (or remove the
  pre-commit script :P).

  Note that the script must be executable, i.e.

        chmod +x .git/hooks/pre-commit

  You can also test all the examples in shogun/examples to check whether your
  configuration and environment are totally okay. Please note that some of the
  examples depend on data sets, which have to be downloaded beforehand in
  order to pass all the tests of those examples. Downloading the data can
  easily be done with a git command (please refer to
  [README_data.md](https://github.com/shogun-toolbox/shogun/blob/develop/doc/md/README_data.md)).
  Afterwards, you can test the examples by:

        make test

To make a release, update the [NEWS](NEWS) file (date, release version like 3.0.0), adjust the soname if required (cf. [README_soname](README_soname.md)), and add a new data version if one is required. If parameters have changed, increase the parameter version too.

SHOGUN Machine Learning Toolbox - Project Documentation