More Blocks >More CPU +XR Gate finally increases CPU hungriness

Discussion in 'REAKTOR' started by domomo, Feb 22, 2019.

  1. domomo

    domomo NI Product Owner

    Messages:
    417
    it's sometimes hard to be an enthusiast ..

    what i did:
    i built a patch out of blocks with a bypass option for all of them. (i am obliged to do such things because my cpu supports avx but not avx2)

    what i discovered:
    the XR Gate bypass option is a trade-off:
    because the cpu usage of several modest blocks in one patch grows exponentially, i noticed that at busy cpu levels XR Gate (just the simple macro) increases cpu usage even more while it is active.

    is there any chance that either one of those two behaviours won't continue with future reaktor updates?
    Philipp D @ NI

    i bypassed everything and i am literally at 90% with a busy but really easy patch (one and a half screens big - let's say laptop screens, though i am not on a laptop). and the fun of working with reaktor is really floating away

    and as much as i am an enthusiast, i can't do with it what i want!
    i am limited by this strange more-blocks-do-increase-exponentially-your-cpu "bug"
    and i don't feel creative, because all the coding is now about it. and even the coding / XR Gate implementation somehow gives me much worse results

    thanks so much
     
  2. KoaN

    KoaN NI Product Owner

    Messages:
    145
    I have noticed a few times with very big structures that using routers to switch things off saves cpu when things are off "obviously", but if you turn everything on it takes a lot more cpu than having everything on with no routers or switches.
    I always had the impression corecell switches, "even when properly done", are not as effective as primary switches.

    I did so many tests at times to try and save cpu... some things helped and i have no idea why. For example, i always thought putting everything in the same corecell was better... and then i tried separating the filter and some FX from the Oscillators into a new corecell and suddenly gained 5-7%!
    I feel when the structure is very complex it becomes a bit unpredictable.

    I feel you.... "my cpu is about the same as yours".... and i feel my last project is going to take too much cpu.... i am preparing mentally to maybe have to upgrade my motherboard and cpu...
     
    • Like x 1
  3. Laureano Lopez

    Laureano Lopez NI Product Owner

    Messages:
    133
    I avoid core routers like the plague. I mean you can't _avoid_ them but I do my very best to minimize them. I have to want a new function really hard if it implies a new router in the middle of a complex structure, especially if I'll run many instances at once. Routers are more or less inoffensive when just a few things happen before the branches merge without getting involved with anything else, but as soon as there's a bit of criss-crossing, you're fucked. It seems to me there's no particular optimization for things like an XR router bypassing a whole cell - that would be nice, but I don't expect it.
     
  4. Vadim @ NI

    Vadim @ NI NI Team

    Messages:
    246
    The routers indeed can make things inefficient if the branch merging is done in a "spaghetti" fashion, because in machine code each such merge creates another "hidden router". This also makes it heavier for the compiler to look for possible optimizations, sometimes the compiler even getting stuck (a known problem discussed elsewhere on this forum). Another potential issue has to do with simply using "too many routers", even if properly merged. Thus, if there is a way to express the same algorithm without a router and have a comparable CPU cost, it's preferable to do so. Note that the very same problem (including "exponential increase") exists in hand-written C++ code and is not specific to Reaktor Core, although hand-written code has more options of mitigating it.

    A single XR router bypassing the entire core cell shouldn't be a problem though and should be a very efficient optimization of the structure (not as efficient as a primary switch, but close). There are neither many routers nor "spaghetti routing" involved (provided the structure is clean otherwise). So if just a global XR router is causing trouble, there might be yet some other factor at play.
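
As a rough illustration of the "exponential increase" mentioned in the post above, here is a toy counting model in Python. It is only a sketch under stated assumptions: the function names are hypothetical, and it says nothing about how the Reaktor Core compiler actually represents routers. It only contrasts branches that rejoin cleanly after each router with branches that stay entangled, where downstream code depends on every earlier routing decision.

```python
from itertools import product


def branch_bodies_clean_merge(n_routers: int) -> int:
    """Each router's two branches rejoin before the next router starts,
    so there are only two branch bodies per router to handle."""
    return 2 * n_routers


def paths_interleaved(n_routers: int) -> int:
    """Branches are never cleanly merged, so downstream code depends on the
    side taken at every upstream router: enumerate all combinations."""
    return sum(1 for _ in product((False, True), repeat=n_routers))


# Compare linear growth (clean merges) with exponential growth (entangled).
for n in (1, 4, 8, 12):
    print(n, branch_bodies_clean_merge(n), paths_interleaved(n))
```

In this toy model the cleanly merged case grows linearly with the number of routers, while the entangled case doubles with every router added.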
     
    • Like x 1
  5. Laureano Lopez

    Laureano Lopez NI Product Owner

    Messages:
    133
    I don't have an example right now, but I remember situations where I had a more or less complex structure involving some spaghetti, where all SR signals were driven by a single audio input, and placing a router right after the input as a whole-cell switch increased CPU usage not a lot but noticeably, say 2-3%. In going back and forth with this stuff I've got the impression that a simple rerouting before a complex part can make it be calculated in a different way, and something that's better in one place is worse in another. For example, I've cut down good amounts of usage avoiding duplicate routers between channels by routing in one and latching to the other, or routing SR.C and latching both, but sometimes the opposite happens, even if the rerouted part has no spaghetti per se and connects to the actually complex part by parameters (I had one such case yesterday).
     
  6. Vadim @ NI

    Vadim @ NI NI Team

    Messages:
    246
    A likely reason for a CPU increase could be that control signals are not fully latched everywhere (usually that'd be done via modulation macros) upon entering the audio path (which is then clocked by the gated SR). In this case rather than disabling the entire audio path we might be disabling only parts of it (as some other parts still receive triggering from control signals) and that could create extra hidden routing. I'm not saying this is necessarily happening in your case, but that could be the simplest explanation.
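
The latching point above can be sketched with a small event-counting model in Python. This is a simplified illustration, not Reaktor's actual event semantics: `Stage`, `run`, and the tick counts are hypothetical. The assumption is that a stage does work whenever any event reaches it, so an unlatched control signal keeps triggering the audio path even while the gated clock is off, whereas a latched control value is only read on clock ticks.

```python
class Stage:
    """Toy event-driven stage: does one unit of work per incoming event."""

    def __init__(self):
        self.work = 0
        self.param = 0.0

    def on_event(self):
        self.work += 1


def run(latched: bool, gate_open: bool, n_ticks: int = 100, control_every: int = 10) -> int:
    """Return how many times the stage was triggered."""
    stage = Stage()
    held = 0.0  # value stored by the (hypothetical) latch
    for tick in range(n_ticks):
        if tick % control_every == 0:      # a control event arrives
            held = tick / n_ticks
            if not latched:
                stage.param = held
                stage.on_event()           # control event leaks into the audio path
        if gate_open:                      # gated sample-rate clock tick
            if latched:
                stage.param = held         # latch is read only on the clock
            stage.on_event()
    return stage.work


print("gate off, unlatched:", run(latched=False, gate_open=False))
print("gate off, latched:  ", run(latched=True, gate_open=False))
```

With the gate closed, the latched variant does no work at all in this model, while the unlatched one is still woken up by every control event.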
     
  7. Laureano Lopez

    Laureano Lopez NI Product Owner

    Messages:
    133
    Point, that could be the case for the whole-cell-switch cases (I wasn't so clocking-conscious back then) :D
     
  8. colB

    colB NI Product Owner

    Messages:
    2,843
    Working with computers, coding etc., has always been about working within the limits of the platform, and always will be. If that's going to stop you being creative, then that's a major problem for you... try to see limits as a creative opportunity.

    Remember, there will always be limits!
     
    • Like x 1
    • Funny x 1
  9. Vadim @ NI

    Vadim @ NI NI Team

    Messages:
    246
    "The true creativity lies at the balance point between your self-expression and the properties and the limits of the media and the tools you're working with".
    [Guru mode off]
    :D :D :D
     
    • Like x 2
  10. Paule

    Paule NI Product Owner

    Messages:
    4,761
    With older sys and ism also
     
    • Like x 1
  11. domomo

    domomo NI Product Owner

    Messages:
    417
    alright so..
    glad to see Vadim here, too. So, I won't miss that chance ;)

    step by step:
    ty for pointing this out! I figured out exactly the same. Is there any official way reported by NI, how to do it right?
    And am I well advised to entirely translate a patch I like (patch out of blocks) into a single ENS?

    ty for feeling with me, and good to know someone "feels" the same. makes me feel less alone.
    however, if you discover cpu limits with a 44.1 rate as I do, you probably would use Reaktor in 2x or 4x of that SR with your new motherboard and cpu and might find yourself right back at the beginning :)

    :thumbsup:
    why exactly don't you expect it?

    I don't know much about C++, but I am glad you are mentioning the "exponential increase": Can we say for now that more blocks in a patch exponentially increase CPU hungriness? And if so, does this have to do with the core cell itself only and not the primary gui design? or do multiple core cells behave with an exponential increase? or, as a summary of my questions so far: is there no increase at all with a single core cell/single ENS?

    :thumbsup: why is this actually happening?
    can we make use out of a simple practical example:
    let's take any factory block. No, to make it corporate, let's take bento box osc:
    If I add XR Gate to bento box osc (which on its own, with no other exponentially increasing blocks, runs on my system at 0.6% cpu usage), i get a cpu usage of 0.8% with XR Gate ON and 0.4% with XR Gate OFF.

    This is what I call the trade-off. First I actually make it worse. But then, by bypassing it, I gain back more than I lost: that is my "trade-off".
    What I discovered while analyzing how the latch reacts to XR Gate: it wasn't always the case that the latch decreased cpu usage. (!)
    So I decided not to do it everywhere, but especially on audio inputs.
    I don't understand what you mean by "usually that'd be done via modulation macros".

    As for the example of bento box osc, how would you do this correctly? is there a practical example possible?
    Besides the fact that XR Gate increases CPU just by adding it, is it actually possible to entirely (or really quite well) shut down the calculation of the block? and how?

    Did you try it out? I'd be glad to know how it worked out for you

    Well, now I am really curious! Because indeed the primary switch made it (cpu calculation) pretty much "dead".
    What I don't get either is: wasn't the whole Blocks thing in 2016 about making users more aware of core than of primary, and now the primary switch works better than anything available in core?

    I totally agree with your philosophy, but only up to a certain point: now the amount of time I spend making sound is far less than the time I spend trying to figure out how to make my patch cpu efficient on a physical 6 core @ 4.5Ghz. And this is no longer in balance, since i am not a coder first
    I know and have discovered that limitations can be a great source of creativity. But not if it is about spending your time on how to load your last and pretty cool snapshot stably and without overload. I don't think tool limitations are that inspiring in the end unless you really are a coder.
     
  12. colB

    colB NI Product Owner

    Messages:
    2,843
    You have to learn when to say "OK, that's not going to work, leave it and try something different!"

    That's life!

    I have various things over the years that have been left 'on the shelf' because there was no way within the limits of Reaktor, and I really couldn't be bothered to put in the massive extra effort required to implement it in c++ where it still might not have worked. C'est la vie. You win some you lose some.

    Shelve it. In a few years, tech will have moved on and you can revisit the idea. Assuming you're still interested - I've found that that is usually not the case for me ;)
     
    • Like x 2
  13. mosaic_

    mosaic_ NI Product Owner

    Messages:
    472
    You could also turn to the single-instrument, "monolithic" approach. You can make optimizations you can't with Blocks, like cutting smoothers or deciding that some modulations don't need to be audio rate. On top of that, polyphony is easier to realize, and you can manage the extra CPU hit by using the gate and/or amplitude envelope to XR Gate silent voices. And you can meet Blocks halfway in terms of flexibility if you go with a semimodular architecture.
     
    • Like x 2
  14. domomo

    domomo NI Product Owner

    Messages:
    417
    do we know each other ? i dont remember having met yet
     
    • Funny x 1
  15. domomo

    domomo NI Product Owner

    Messages:
    417
    that's SAD!
     
  16. domomo

    domomo NI Product Owner

    Messages:
    417
    i'll ALWAYS agree with that!
     
  17. domomo

    domomo NI Product Owner

    Messages:
    417
    Wow
     
  18. domomo

    domomo NI Product Owner

    Messages:
    417
    I love dreaming about everything you say! I just really don't understand a bit of it. Can you make it a step by step please? each of your phrases deserves a step by step for me. but all the phrases are promising!!
     
  19. mosaic_

    mosaic_ NI Product Owner

    Messages:
    472
    Sure.

    You could also turn to the single-instrument, "monolithic" approach.
    By this I'm just referring to what you mentioned earlier in the thread, the idea of consolidating the functionality of several Blocks into an Instrument with (give or take) one main Core cell.

    You can make optimizations you can't with Blocks, like cutting smoothers or deciding that some modulations don't need to be audio rate.
    The requirements of the Blocks standard seem to be aimed at ensuring consistent sound quality no matter how you choose to connect the Blocks. However, this means that if you stick with the standard, you can't make certain compromises that can massively improve performance in big projects. The CPU cost of all those control smoothers, for example, is not negligible. So, in the consolidation process, you might decide it's worth it to remove some of those smoothers if the impact on sound quality is not too great. The same goes for modulation signals, which in the Blocks paradigm are consistently generated and processed at audio rate. You might decide it's more practical to run modulators like envelopes and LFOs at a lower control rate and interpolate them - only where necessary - with smoothers. (I keep bringing up smoothers... in my experience, they really are CPU drainers. So if you have multiple modulators running at a lower control rate, think about how you can minimize the number of smoothers you need...)
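
A minimal sketch of the "lower control rate plus interpolation" idea, in Python. Everything here is an assumption chosen for illustration (the function names, a sine LFO, a 64-sample control block, linear interpolation as the smoother); it is not how Blocks or any particular Reaktor macro does it. The LFO is evaluated once per block and the values in between are interpolated.

```python
import math


def lfo_audio_rate(n_samples: int, freq_hz: float, sample_rate: float) -> list:
    """Reference: evaluate the LFO on every sample."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


def lfo_control_rate(n_samples: int, freq_hz: float, sample_rate: float, block: int):
    """Evaluate the LFO once per `block` samples and linearly interpolate.

    Returns the interpolated signal and the number of LFO evaluations."""
    n_points = n_samples // block + 2
    points = [math.sin(2 * math.pi * freq_hz * k * block / sample_rate) for k in range(n_points)]
    out = []
    for i in range(n_samples):
        k, frac = divmod(i, block)
        a, b = points[k], points[k + 1]
        out.append(a + (b - a) * frac / block)
    return out, n_points


reference = lfo_audio_rate(4410, 2.0, 44100.0)
approx, evaluations = lfo_control_rate(4410, 2.0, 44100.0, block=64)
max_error = max(abs(r - a) for r, a in zip(reference, approx))
print(f"LFO evaluations: {evaluations} instead of {len(reference)}, max error {max_error:.2e}")
```

For a slow modulator the interpolation error stays very small while the modulator itself is computed far less often. The interpolation step still costs something per sample, which is the reason to keep the number of smoothers low.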

    On top of that, polyphony is easier to realize, and you can manage the extra CPU hit by using the gate and/or amplitude envelope to XR Gate silent voices.
    If you want polyphony, it's much easier to set up in a single Instrument than with Blocks. No need to duplicate the structure. Plus, inside the Core cell, if the amplitude envelope falls beneath a certain threshold, you can use that to stop the vast majority of audio and control processing (since you can't hear it :p). You can compare the amplitude to some small number and use the result to drive your XR Gates. Of course, this assumes a fairly standard synth architecture where the idea of a per-voice amplitude envelope is relevant, which may or may not align with your goals.
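
The threshold comparison described above can be sketched in Python as follows. This is only an illustration: `SILENCE_THRESHOLD`, `render_voice`, and the sine "voice" are hypothetical stand-ins, and the threshold value is an arbitrary assumption rather than a recommended setting.

```python
import math

SILENCE_THRESHOLD = 1e-4  # hypothetical "inaudible" level, chosen for illustration


def expensive_voice_sample(phase: float) -> float:
    """Stand-in for the oscillator/filter/FX chain of one voice."""
    return math.sin(phase)


def render_voice(amplitudes, freq_hz=220.0, sample_rate=44100.0, threshold=SILENCE_THRESHOLD):
    """Render one voice, skipping the DSP whenever the envelope is below threshold.

    Returns the output samples and how many samples were actually processed."""
    out = []
    processed = 0
    phase = 0.0
    step = 2 * math.pi * freq_hz / sample_rate
    for amp in amplitudes:
        if amp < threshold:
            out.append(0.0)  # gate closed: no voice processing for this sample
            continue
        out.append(amp * expensive_voice_sample(phase))
        phase += step        # phase only advances while the gate is open (toy choice)
        processed += 1
    return out, processed


# An exponentially decaying envelope: the tail falls below the threshold.
envelope = [math.exp(-i / 500.0) for i in range(10000)]
samples, processed = render_voice(envelope)
print(f"processed {processed} of {len(envelope)} samples")
```

Once the envelope has decayed below the threshold, the voice costs nothing but the comparison itself.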

    And you can meet Blocks halfway in terms of flexibility if you go with a semimodular architecture.
    If you have a synth like Razor, where you have a limited number of "slots" where you can swap out pseudo-modules with different characteristics, you can keep a lot of the sonic range of a modular synth while never processing more than a few modules.
     
    • Like x 1
    • Informative x 1
  20. colB

    colB NI Product Owner

    Messages:
    2,843
    There does seem to be some Reaktor specific stuff going on as well though - The fact that multiple instances of identical Blocks use surprisingly large amounts of cpu, and that this can be partly resolved by adding dummy outputs/inputs to the internal core cell to create difference between the instances?

    In c++, only one binary instance would be needed, and the code would just be called multiple times with different input/state data...
    Also with c++, code can be so much more compact with normal use of functions that cache misses, thrashing, etc. will require much more advanced/larger code before the same problems arise (I assume?)... unless the core compiler uses some sort of pattern matching to find multiple copies of identical modules and generates a single function that all those instances refer to? seems unlikely?
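
The "one piece of code, many sets of state" pattern described above can be sketched as follows. It is written in Python for brevity; in C++ the same structure would be one function taking a state struct by reference. `OscState` and `osc_tick` are hypothetical names, and the sketch makes no claim about what the Reaktor Core compiler does with identical Blocks.

```python
from dataclasses import dataclass


@dataclass
class OscState:
    """Per-instance data: the only thing that differs between instances."""
    phase: float = 0.0


def osc_tick(state: OscState, freq_hz: float, sample_rate: float = 44100.0) -> float:
    """Single shared routine: a naive saw oscillator advancing one sample."""
    out = state.phase
    state.phase = (state.phase + freq_hz / sample_rate) % 1.0
    return out


# Four "instances" share the same code and differ only in their state and inputs.
voices = [OscState() for _ in range(4)]
freqs = [110.0, 220.0, 330.0, 440.0]
for _ in range(100):
    samples = [osc_tick(state, freq) for state, freq in zip(voices, freqs)]

print([round(v.phase, 4) for v in voices])
```

Only the small state objects are duplicated per instance, while the routine itself exists once.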