Execution speed
- This topic has 9 replies, 2 voices, and was last updated 7 years, 8 months ago by
Juan Rada-Vilela (admin).
December 30, 2015 at 05:55 #2018
Unknown
Member
Hi,
Is there a way to improve the execution speed of the fuzzylite library? I can't see the bottlenecks in the library because the symbols are not loaded from the PDB file in debug mode.
A simple solution is to decrease the resolution of the defuzzifier.
I have commented out the FL_DBG instructions in the fuzzylite source code. Is that the right way to disable debugging output?
Is there a way to avoid the dynamic allocation in the modify function of the Consequent class to increase execution speed? The line in question is:
    Activated* term = new Activated(_conclusions.at(i)->term, activationDegree, activation);
December 30, 2015 at 06:39 #2019
Juan Rada-Vilela (admin)
Keymaster
Hi,
thank you for your post. I am not familiar with loading symbols from a PDB file in debug mode, but you can always recompile fuzzylite. If you do, please let me know if I can add something to the CMake configuration to allow this. Also, let me know of any bottlenecks you may find.
I think removing some `dynamic_cast<>` (or replacing them with `static_cast<>`) can improve performance, though I am not sure by how much. For example, if you are not going to use rule chaining, you could remove the `dynamic_cast<OutputVariable*>` in `Antecedent::activationDegree()`. Alternatively, you could add a method to `Variable` to determine whether it is `Input` or `Output`, hence replacing the `dynamic_cast<>`s with a check such as `type() == OutputVariable` followed by a `static_cast`. In addition, you could get rid of the other `dynamic_cast<>` in `Antecedent::activationDegree()` by adding a boolean method to `Proposition`. I will review these cases in the version currently in progress, but it would be very helpful to know if you find significant performance improvements in doing so.
Also, could you please post an example of your controller? Maybe I could suggest something on its design.
Cheers.
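The type-tag idea suggested above can be sketched as follows. This is only an illustration under stated assumptions: the classes are simplified stand-ins that borrow the names `Variable`, `InputVariable`, and `OutputVariable` from the thread, while the `Type` enum, `type()`, `inputValue()`, `outputValue()`, and `crispValue()` are hypothetical and not taken from the fuzzylite API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for the fuzzylite variable classes.
class Variable {
public:
    enum Type { Input, Output };
    explicit Variable(const std::string& name) : _name(name) {}
    virtual ~Variable() {}
    // One virtual call replaces the RTTI lookup performed by dynamic_cast.
    virtual Type type() const = 0;
    const std::string& name() const { return _name; }
private:
    std::string _name;
};

class InputVariable : public Variable {
public:
    InputVariable(const std::string& name, double value)
        : Variable(name), _value(value) {}
    Type type() const override { return Input; }
    double inputValue() const { return _value; }
private:
    double _value;
};

class OutputVariable : public Variable {
public:
    OutputVariable(const std::string& name, double value)
        : Variable(name), _value(value) {}
    Type type() const override { return Output; }
    double outputValue() const { return _value; }
private:
    double _value;
};

// Reads the crisp value of a variable without dynamic_cast.
double crispValue(const Variable* variable) {
    assert(variable != nullptr);
    if (variable->type() == Variable::Output) {
        // Safe only because type() reports the actual dynamic type.
        return static_cast<const OutputVariable*>(variable)->outputValue();
    }
    return static_cast<const InputVariable*>(variable)->inputValue();
}
```

The `static_cast` is only as safe as the tag: every subclass must report its own type correctly, which is why such a change should be made carefully.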
December 31, 2015 at 10:17 #2020
Unknown
Member
Hi,
I loaded the symbols from the PDB files in Visual Studio 2013 to do a performance analysis with CPU sampling.
See below for the most time-consuming functions:

| Function name | Inclusive samples | Exclusive samples | Inclusive samples (%) | Exclusive samples (%) |
|---|---|---|---|---|
| std::_Lockit::_Lockit | 7,216 | 7,216 | 3.07 | 3.07 |
| operator delete | 7,145 | 7,145 | 3.04 | 3.04 |
| std::_Lockit::~_Lockit | 7,121 | 7,121 | 3.03 | 3.03 |
| operator new | 5,456 | 5,456 | 2.32 | 2.32 |
| fl::Antecedent::activationDegree | 78,861 | 5,094 | 33.56 | 2.17 |
| _RTC_CheckStackVars | 3,684 | 3,684 | 1.57 | 1.57 |
| _RTC_CheckEsp | 3,654 | 3,654 | 1.56 | 1.56 |
| std::_Iterator_base12::_Orphan_me | 3,216 | 3,216 | 1.37 | 1.37 |
| fl::Operation::isEq | 7,520 | 3,131 | 3.20 | 1.33 |
| __RTDynamicCast | 3,105 | 3,105 | 1.32 | 1.32 |
| fl::Accumulated::membership | 19,031 | 2,979 | 8.10 | 1.27 |
| free | 2,970 | 2,970 | 1.26 | 1.26 |
| fl::Ramp::membership | 13,160 | 2,931 | 5.60 | 1.25 |
| std::_Iterator_base12::_Adopt | 11,232 | 2,875 | 4.78 | 1.22 |
| _BitBlt@36 | 2,814 | 2,814 | 1.20 | 1.20 |
| fl::Operation::isNaN<double> | 2,700 | 2,700 | 1.15 | 1.15 |
| fl::Activated::membership | 12,797 | 2,531 | 5.45 | 1.08 |
| std::_Iterator_base12::operator= | 11,659 | 1,616 | 4.96 | 0.69 |
| std::_Iterator_base12::~_Iterator_base12 | 11,895 | 1,460 | 5.06 | 0.62 |
| _VEC_memset | 1,226 | 1,226 | 0.52 | 0.52 |
| std::_Iterator_base12::_Iterator_base12 | 12,937 | 1,199 | 5.51 | 0.51 |
| std::_Container_base12::_Orphan_all | 1,127 | 1,127 | 0.48 | 0.48 |
| fl::Centroid::defuzzify | 20,968 | 954 | 8.92 | 0.41 |
| std::vector<fl::Activated *,std::allocator<fl::Activated *> >::size | 827 | 827 | 0.35 | 0.35 |
| std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > > | 14,616 | 730 | 6.22 | 0.31 |
| std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >::~_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > > | 14,033 | 669 | 5.97 | 0.28 |
| std::reverse_iterator<std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > > >::reverse_iterator<std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > > > | 16,290 | 596 | 6.93 | 0.25 |
| std::_Iterator012<std::random_access_iterator_tag,fl::Hedge *,int,fl::Hedge * const *,fl::Hedge * const &,std::_Iterator_base12>::_Iterator012<std::random_access_iterator_tag,fl::Hedge *,int,fl::Hedge * const *,fl::Hedge * const &,std::_Iterator_base12> | 13,700 | 586 | 5.83 | 0.25 |
| fl::Operation::isLt | 2,929 | 574 | 1.25 | 0.24 |

I have deleted the lines related to other libraries.
Dynamic allocation is very time-consuming.
I have replaced dynamic_cast with static_cast for Variable, but there is no significant improvement.
In the Antecedent::activationDegree function, the bottlenecks are at these lines:

    28.9%: return conjunction->compute( this->activationDegree(conjunction, disjunction, fuzzyOperator->left), this->activationDegree(conjunction, disjunction, fuzzyOperator->right));
    10.4%: if (fuzzyOperator->name == Rule::andKeyword()) {
    6.9%: for (std::vector<Hedge*>::const_reverse_iterator rit = proposition->hedges.rbegin();
    8.9%: rit != proposition->hedges.rend(); ++rit) {
    2.1%: result = proposition->term->membership(inputVariable->getInputValue());

The FFLL library is basic but fast; there are some explanations here:
I need a fast library because I am optimizing a parameter, so there are a lot of calls to the engine::process function.
See my controller below:

engine = new fl::Engine("fuzzy_engine");

osc = new fl::InputVariable("OSC", -300, 300);
osc->setInputValue(0);
osc->setEnabled(true);
osc->addTerm(new fl::Ramp("UP", -134, -300));
osc->addTerm(new fl::Ramp("DOWN", 134, 300));
osc->addTerm(new fl::Ramp("EXIT_UP", -100, -300));
osc->addTerm(new fl::Ramp("EXIT_DOWN", 100, 300));
engine->addInputVariable(osc);

dosc = new fl::InputVariable();
dosc->setName("DOSC");
dosc->setRange(-300, 300);
dosc->setEnabled(true);
dosc->addTerm(new fl::Ramp("DOWN", 0, -300));
dosc->addTerm(new fl::Ramp("UP", 0, 300));
engine->addInputVariable(dosc);

dvar2 = new fl::InputVariable("DVAR2", -100, 100);
dvar2->setInputValue(0);
dvar2->addTerm(new fl::Ramp("DOWN", -4.3, -100));
dvar2->addTerm(new fl::Ramp("UP", 4.3, 100));
engine->addInputVariable(dvar2);

dvar3 = new fl::InputVariable();
dvar3->setName("DVAR3");
dvar3->setRange(-5, 5);
dvar3->setEnabled(true);
dvar3->addTerm(new fl::Ramp("DOWN", -0.8, -5));
dvar3->addTerm(new fl::Ramp("UP", 0.8, 5));
engine->addInputVariable(dvar3);

dvar12 = new fl::InputVariable();
dvar12->setName("DVAR12");
dvar12->setRange(-10, 10);
dvar12->setEnabled(true);
dvar12->addTerm(new fl::Ramp("DOWN", 0, -10));
dvar12->addTerm(new fl::Ramp("UP", 0, 10));
engine->addInputVariable(dvar12);

dvar1a = new fl::InputVariable();
dvar1a->setName("DVAR1A");
dvar1a->setRange(-30, 30);
dvar1a->setEnabled(true);
dvar1a->addTerm(new fl::Ramp("DOWN", -2, -10));
dvar1a->addTerm(new fl::Ramp("UP", 2, 10));
dvar1a->addTerm(new fl::Ramp("EXIT_DOWN", -17, -30));
dvar1a->addTerm(new fl::Ramp("EXIT_UP", 17, 30));
engine->addInputVariable(dvar1a);

dvarb1 = new fl::InputVariable();
dvarb1->setName("DVARB1");
dvarb1->setRange(-30, 30);
dvarb1->setEnabled(true);
dvarb1->addTerm(new fl::Ramp("UP", -2, -10));
dvarb1->addTerm(new fl::Ramp("DOWN", 2, 10));
dvarb1->addTerm(new fl::Ramp("EXIT_UP", -17, -30));
dvarb1->addTerm(new fl::Ramp("EXIT_DOWN", 17, 30));
engine->addInputVariable(dvarb1);

entry = new fl::OutputVariable("ENTRY", -1, 1);
entry->setDefaultValue(0.0);
entry->setEnabled(true);
entry->fuzzyOutput()->setAccumulation(new fl::Maximum);
entry->setDefuzzifier(new fl::Centroid(200));
entry->addTerm(new fl::Ramp("ENTRY1", 0, -1));
entry->addTerm(new fl::Ramp("ENTRY2", 0, 1));
engine->addOutputVariable(entry);

exit = new fl::OutputVariable("EXIT", -1, 1);
exit->setDefaultValue(0.0);
exit->setEnabled(true);
exit->fuzzyOutput()->setAccumulation(new fl::Maximum);
exit->setDefuzzifier(new fl::Centroid(200));
exit->addTerm(new fl::Ramp("EXIT2", 0, -1));
exit->addTerm(new fl::Ramp("EXIT1", 0, 1));
engine->addOutputVariable(exit);

ruleblock = new fl::RuleBlock("rules");
ruleblock->setEnabled(true);
ruleblock->setConjunction(new fl::Minimum());
ruleblock->setDisjunction(new fl::Maximum);
ruleblock->setActivation(new fl::Minimum);
ruleblock->addRule(fl::Rule::parse("if OSC is DOWN and DOSC is DOWN and DVAR2 is DOWN and DVAR3 is DOWN and DVARB1 is DOWN then ENTRY is ENTRY2", engine));
ruleblock->addRule(fl::Rule::parse("if OSC is UP and DOSC is UP and DVAR2 is UP and DVAR3 is UP and DVAR1A is UP then ENTRY is ENTRY1", engine));
ruleblock->addRule(fl::Rule::parse("if OSC is EXIT_DOWN and DOSC is DOWN and DVARB1 is EXIT_DOWN then EXIT is EXIT1", engine));
ruleblock->addRule(fl::Rule::parse("if OSC is EXIT_UP and DOSC is UP and DVAR1A is EXIT_UP then EXIT is EXIT2", engine));
engine->addRuleBlock(ruleblock);

Thanks a lot for your help!
Cheers
December 31, 2015 at 11:08 #2021
Juan Rada-Vilela (admin)
Keymaster
Hi,
thank you for your performance check. I will check this in detail for the next version. However, I can see the problem is in `Antecedent::activationDegree()`, where there are a few `dynamic_cast<>` that could be removed by adding the necessary methods to check the type of `Expression`. I am not sure which `dynamic_cast` you changed to `static_cast`, and this should be done very carefully. If you are interested in making some changes that I have in mind and then measuring the performance, I could provide more details. Let me know. The changes I am thinking of involve removing the `dynamic_cast` from `Antecedent::activationDegree()`. They should improve performance.
Cheers.
December 31, 2015 at 15:09 #2022
Juan Rada-Vilela (admin)
Keymaster
Hi,
I have removed the `dynamic_cast` I suggested and got an average performance improvement of 7%. Following your suggestion, I also changed the `Accumulated::vector<Activated*>` to `Accumulated::vector<Activated>`, achieving a further 10% improvement. Overall, the changes I am testing have improved the average performance by 15%. I will continue reviewing the performance and I expect to push the changes later today to the `master` branch (v6.0).
Thanks for your help.
Cheers.
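The change from a vector of pointers to a vector of values can be sketched as below. This is a minimal illustration, not the fuzzylite implementation: `Activated` here is a simplified stand-in, and `AccumulatedSketch` with its `add()`, `clear()`, and `membership()` members is hypothetical. The point it shows is that storing elements by value removes the `new`/`delete` pair per activated term, and that a cleared vector keeps its capacity, so refilling it on every call to the engine need not reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for fl::Activated.
struct Activated {
    double degree;  // activation degree of the rule
    double height;  // height of the underlying term
    Activated(double d, double h) : degree(d), height(h) {}
    // Minimum activation: the term is clipped at the activation degree.
    double membership() const { return degree < height ? degree : height; }
};

// Hypothetical container that stores activated terms by value.
class AccumulatedSketch {
public:
    explicit AccumulatedSketch(std::size_t expectedTerms) {
        _terms.reserve(expectedTerms);  // single up-front allocation
    }
    // No operator new here: the element is constructed inside the vector's buffer.
    void add(double degree, double height) {
        _terms.push_back(Activated(degree, height));
    }
    // clear() destroys the elements but leaves the capacity unchanged.
    void clear() { _terms.clear(); }
    std::size_t size() const { return _terms.size(); }
    std::size_t capacity() const { return _terms.capacity(); }
    // Maximum accumulation over all stored terms.
    double membership() const {
        double result = 0.0;
        for (std::size_t i = 0; i < _terms.size(); ++i) {
            double mu = _terms[i].membership();
            if (mu > result) result = mu;
        }
        return result;
    }
private:
    std::vector<Activated> _terms;
};
```

With pointer storage, each `add` would need a matching `delete` in `clear()`, which is where the `operator new`, `operator delete`, and `free` samples in the profile come from.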
January 1, 2016 at 02:53 #2024
Unknown
Member
Hi,
I have only replaced dynamic_cast with static_cast for the Variable class. Yes, I can make some changes and run the performance analysis.
Thanks
Cheers
January 1, 2016 at 04:46 #2025
Juan Rada-Vilela (admin)
Keymaster
Hi,
please check the following performance improvements: https://github.com/fuzzylite/fuzzylite/commit/2661878364685e17a2fc286e41b6d647066722b2.
However, keep in mind that I need to undo many of the changes made in commit https://github.com/fuzzylite/fuzzylite/commit/33119032d12e3337cbe8efa984086ce7379f1081, where I chose methods over direct access to properties, which has had a significant impact on performance. I am still working on this.
Also, I am making these changes available for version 6.0, not for 5.x.
January 1, 2016 at 05:46 #2026
Juan Rada-Vilela (admin)
Keymaster
Oh, and I forgot to mention: a quick way to significantly improve performance would be to recompile fuzzylite with the definition `-DFL_USE_FLOAT=ON`, as it will convert every `scalar` value from `double` to `float`. I will measure its performance later.
January 4, 2016 at 00:00 #2028
Unknown
Member
Hi
Thanks for your help. The changes have improved performance by 6%, and by 11% with -DFL_USE_FLOAT=ON.
See below for the new performance report:

| Function name | Inclusive samples | Exclusive samples | Inclusive samples (%) | Exclusive samples (%) |
|---|---|---|---|---|
| std::_Lockit::_Lockit | 164,400 | 164,400 | 3.86 | 3.86 |
| std::_Lockit::~_Lockit | 154,973 | 154,973 | 3.64 | 3.64 |
| fl::Antecedent::activationDegree | 1,827,003 | 142,519 | 42.92 | 3.35 |
| std::_Iterator_base12::_Adopt | 258,770 | 71,718 | 6.08 | 1.68 |
| std::_Iterator_base12::_Orphan_me | 69,654 | 69,654 | 1.64 | 1.64 |
| fl::Aggregated::membership | 370,004 | 57,360 | 8.69 | 1.35 |
| fl::Operation::isEq | 199,016 | 51,960 | 4.68 | 1.22 |
| fl::Operation::isNaN<float> | 47,240 | 47,240 | 1.11 | 1.11 |
| fl::Ramp::membership | 292,940 | 43,361 | 6.88 | 1.02 |
| fl::Activated::membership | 237,920 | 42,495 | 5.59 | 1.00 |
| std::_Iterator_base12::operator= | 251,390 | 36,560 | 5.91 | 0.86 |
| std::vector<fl::Activated,std::allocator<fl::Activated> >::size | 30,772 | 30,772 | 0.72 | 0.72 |
| std::_Container_base12::_Orphan_all | 29,263 | 29,263 | 0.69 | 0.69 |
| std::_Iterator_base12::~_Iterator_base12 | 260,649 | 29,018 | 6.12 | 0.68 |
| std::_Iterator_base12::_Iterator_base12 | 280,520 | 26,367 | 6.59 | 0.62 |
| std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > > | 318,347 | 16,826 | 7.48 | 0.40 |

In the Antecedent::activationDegree function, the bottlenecks are at these lines:

    37.6%: return conjunction->compute(
               this->activationDegree(conjunction, disjunction, fuzzyOperator->left),
               this->activationDegree(conjunction, disjunction, fuzzyOperator->right));
    14%: if (fuzzyOperator->name == Rule::andKeyword()) {
    8.6%: for (std::vector<Hedge*>::const_reverse_iterator rit = proposition->hedges.rbegin();
    11.3%: rit != proposition->hedges.rend(); ++rit) {
    3.3%: result = proposition->term->membership(inputVariable->getInputValue());

Cheers
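The effect of the `FL_USE_FLOAT` definition discussed above can be sketched with a compile-time typedef. This is an illustration under stated assumptions, not a copy of the library source: the `sketch` namespace and the `rampMembership()` helper are hypothetical, and the ramp formula is a simplified reading of how a ramp term from the controller above might be evaluated.

```cpp
#include <cassert>

namespace sketch {

// The scalar type is chosen once, at compile time.
#ifdef FL_USE_FLOAT
typedef float scalar;   // selected when the code is built with -DFL_USE_FLOAT
#else
typedef double scalar;  // default precision
#endif

// Hypothetical ramp membership: 0 at `start`, 1 at `end`, linear in between.
// Works for ascending (start < end) and descending (start > end) ramps.
inline scalar rampMembership(scalar x, scalar start, scalar end) {
    if (start == end) return scalar(0);  // degenerate ramp, treated as 0 in this sketch
    if (start < end) {
        if (x <= start) return scalar(0);
        if (x >= end) return scalar(1);
        return (x - start) / (end - start);
    }
    if (x >= start) return scalar(0);
    if (x <= end) return scalar(1);
    return (start - x) / (start - end);
}

}  // namespace sketch
```

Because every computation is written against `scalar`, switching precision needs no source changes, only a rebuild. Any gain from `float` depends on the compiler and hardware, so it is worth measuring, as done in the post above.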
January 4, 2016 at 09:27 #2029
Juan Rada-Vilela (admin)
Keymaster
Hi,
thanks for your feedback. The only way to further improve performance in `activationDegree` is as follows.
(1) Change `if (fuzzyOperator->name == Rule::andKeyword()) {` to `if (fuzzyOperator->name == "and") {`. In the performance studies I have carried out, I was surprised to see a performance impact when using methods instead of class properties. For example, see this commit, where I replaced the method `getHeight()` with access to `Term::_height` and improved performance. However, I will not change the `Rule::andKeyword()` method for the sake of performance, given the flexibility it provides to rename the `and` keyword.
(2) If you are not using hedges, you could enclose the for-loop statement `for (std::vector<Hedge*>::const_reverse_iterator rit = proposition->hedges.rbegin(); rit != proposition->hedges.rend(); ++rit) {` in an `if` statement. This could improve performance a bit more. See latest commit 1759d9159d9a046f3eff9855399f5da2ca5d0ff2.
(3) If you check the commit I mentioned earlier (namely 053052a850a81971e207d84ef88573a1c9543aea), you could improve performance by declaring `FL_IS_NAN(x)` instead of calling the method `Op::isNaN(x)`. This will slightly improve performance, too.
Lastly, I am not sure how you are measuring the performance of your controller, but you could also check the code in `Console::benchmarkExamples`. Basically, measure the average of ten runs exporting your controller with `FldExporter` for a resolution specified based on the number of input variables and the amount of time you are willing to wait. Compiling with `FL_CPP11=ON` in v5.1 (from branch `release`), from the console I just run `fuzzylite benchmarks path/to/examples 10` and it will measure the performance over ten runs of almost every example included in the `examples/original` folder.
Thanks again for your feedback, and let me know if you find other ways to further improve performance.
Cheers.
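The "average of ten runs" measurement described above can be approximated with a small timing harness. This is a generic sketch and not the code in `Console::benchmarkExamples`: `timeRuns()`, `mean()`, and the `sweepRamp()` workload are hypothetical names, and the workload is only a stand-in for evaluating a controller over a grid of input values.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in workload: sums an ascending ramp over a grid of
// `resolution` evenly spaced inputs in [-300, 300].
double sweepRamp(std::size_t resolution) {
    assert(resolution > 1);
    const double start = -300.0;
    const double end = 300.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < resolution; ++i) {
        double x = start + (end - start) * static_cast<double>(i)
                               / static_cast<double>(resolution - 1);
        sum += (x - start) / (end - start);
    }
    return sum;
}

// Calls `work` `runs` times and returns the wall-clock duration of each run in seconds.
template <typename Work>
std::vector<double> timeRuns(Work work, std::size_t runs) {
    std::vector<double> seconds;
    seconds.reserve(runs);
    for (std::size_t r = 0; r < runs; ++r) {
        std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
        work();
        std::chrono::steady_clock::time_point finish = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(finish - begin).count());
    }
    return seconds;
}

// Arithmetic mean of the measured durations.
double mean(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) sum += values[i];
    return sum / static_cast<double>(values.size());
}
```

To time a real controller, the callable passed to `timeRuns` would set the input values and call `engine->process()` for each point of the grid instead of `sweepRamp`. Measurements are only meaningful on an optimized build, and averaging several runs smooths out one-off noise from the system.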