
CPU - primary or core

Discussion in 'REAKTOR' started by mpl, Nov 1, 2016.

  1. mpl

    mpl NI Product Owner

    Messages:
    496
    Hi
    Here's the tale. As I was wanting to expand the usage of my Launchpad Mini, I built a monitor/controller in core (as much as possible; I'm learning to use core, so I've been trying to use it at all times now).
    So I was well chuffed when I had succeeded, then I looked at the CPU meter... 12%, eek. I had used RGB objects for the graphics, which also entailed setting up 3 sets of selectors etc. So I redid the graphics with just a meter and a graphic file of 5 colours, and stripped out all the RGB stuff... CPU... 11%.
    I then loaded up an ensemble from the user library that does similar stuff, and its CPU was 0.5%; it was done using primary.
    So I rebuilt my ensemble using only primary, and now the CPU is... 0.6%.
    Now, I'll be the first to admit my core skills are very basic, but I had tried to use integer settings where possible, and + and multiply instead of - and divide.
    That is some difference: 0.6% against 11%.
    I've included the 2 ensembles in case anyone wants to look. Have I done the core so inefficiently, or is that just the way it is?
    I don't want to start a core vs. primary debate, but the only conclusion I can draw is that in certain situations one is better for the job than the other, and in this case primary wins hands down.
    Looking forward to your thoughts on the matter.
    Mike
     

    Attached Files:

  2. colB

    colB NI Product Owner

    Messages:
    3,969
    It's not always clear-cut which is going to be more efficient, core or primary, but usually primary wins where either there is a ready-made object that does the job (i.e. it's C++ vs. core), or where there is polyphony - core doesn't make use of SSE, whereas I think some of the primary modules do for optimising multiple polyphonic voices.

    An initial look at your example shows lots of individual core cells - this is not ideal, because the core-to-primary and primary-to-core transitions are not free. Much better to have one core cell...
     
    Last edited: Nov 1, 2016
  3. colB

    colB NI Product Owner

    Messages:
    3,969
    A close look shows 72 core cells, each with three identical inputs from the same primary sources, all with outputs that are eventually merged to one... this is expensive and totally unnecessary: 216 primary-to-core transitions where there could be 3, and 72 core-to-primary transitions where there could be 1. Ouch.

    There are some other details, but they're not so important. Switching over to a single monolithic core cell is your main objective :)

    This is a hurdle for folks trying to migrate from primary to core - you only start getting advantages when you do large sections of code in core. But that brings a bunch of other things to get to grips with, like multiplexing control events and handling initialisation over the core/primary boundary...
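    The transition counts above can be sketched as a quick back-of-envelope model. This is my own illustration in Python, not anything REAKTOR-specific, and it assumes each boundary crossing has a roughly constant cost:

```python
# Back-of-envelope model of the primary<->core boundary cost described above.
# Assumption (mine, not from the thread): every input and output connection
# on a core cell is one primary<->core crossing with a fixed cost.

def transition_count(cells, inputs_per_cell, outputs_per_cell):
    """Total boundary crossings for a bank of separate core cells."""
    return cells * (inputs_per_cell + outputs_per_cell)

# 72 separate core cells, each with 3 primary inputs and 1 output:
separate = transition_count(72, 3, 1)    # 216 inputs + 72 outputs = 288

# One monolithic core cell with the same 3 inputs and 1 output:
monolithic = transition_count(1, 3, 1)   # 4

print(separate, monolithic)  # 288 4
```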
     
  4. mpl

    mpl NI Product Owner

    Messages:
    496
    @colB
    Many thanks for having a look. It looks as though my core is very inefficient - it did work, though. I've nearly finished the full primary version; I'll finish it and then look at converting the core version to a single cell. It'll be good for my learning, and I'm curious as to the outcome.
    Thanks again
    Mike
     
  5. mpl

    mpl NI Product Owner

    Messages:
    496
    It's now got me thinking about some of my other ensembles that have "bits" of core in them... not really enjoying the conclusion.
     
  6. colB

    colB NI Product Owner

    Messages:
    3,969
    Don't worry, it's not always a problem.

    I think the primary-to-core connection costs about the same as a multiplication. However, if you have an event in primary connected to 100 core inputs, that's 100 multiplies, whereas if you send the event to a single core input and fan it out inside core, there is no extra overhead: each target in core will use the same core source with no extra processing, so you'd save 99 multiplies' worth of CPU.
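    A minimal sketch of that saving, assuming (per the post, not any measured figure) that one primary-to-core connection costs about one multiply per event:

```python
# Cost model for the fan-out point at the primary/core boundary.
# Assumption from the post: one boundary crossing ~= one multiply.
COST_PER_BOUNDARY_EVENT = 1

def fanout_cost(core_inputs):
    """Cost of delivering one primary event to N separate core inputs."""
    return core_inputs * COST_PER_BOUNDARY_EVENT

wide = fanout_cost(100)   # event wired to 100 core inputs: 100 multiplies
narrow = fanout_cost(1)   # one core input, fanned out inside core: 1 multiply

print(wide - narrow)  # 99 multiplies saved per event
```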

    There were a few other things I spotted:

    You were using a z-1, but not explicitly clocking it, so it was using its internal audio-rate clock. However, the signal it was applied to was only changing as the result of a gate or a clear, so most of the work that z-1 is doing (memory reads and writes) is redundant. If you clocked its lower input with an event merge fed by the gate and clear inputs, it would be more efficient... there were 72 of those as well. Also, a redundant compare: you have a flipflop whose output can only be 1 or 0, but there are two output routes, one testing for 0 and the other testing for 1... you can just test for 1 and assume false means 0... again, 72 times.
    I didn't spend time analysing the actual process, so there may be simpler or faster ways to achieve the same result - most efficiency savings are achieved by using a better algorithm rather than optimising the existing one...
     
  7. salamanderanagram

    salamanderanagram NI Product Owner

    Messages:
    3,454
    i believe the z^-1 always uses the audio clock now, whether you tell it to or not. the bottom input is now the initial value in R6.

    i think for event-rate uses it has been replaced by the latch [-1] or something like that.

    unless your core cell is meant to run at audio rate, it probably shouldn't use the z^-1.
     
  8. colB

    colB NI Product Owner

    Messages:
    3,969
    Oh, here's another one that I'm not completely sure about, but I suspect it will make a difference when multiplied by 72:
    core efficiency2.PNG
    Look inside a latch and you will see a read and a write. These can be fast, but if there are enough of them, maybe not so fast. Your version at the top fires the latch four times for every event that arrives at select. My version only fires the latch for the valid selection. It's possible that the core compiler can optimise this so that they both cost the same (I know there are optimisations that involve the latch pattern), but I wouldn't bet on it, and it's good practice to try and position routers so that redundant processing is minimised.
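    The router-placement point can be sketched with a toy fire count. This is my own illustration of the reasoning, not the screenshot's actual structure; 4 selector positions is an assumed example:

```python
# Toy model: each latch fire implies a memory read and a write in core.
# Routing AFTER the latches fires every latch on every event; routing
# BEFORE them fires only the latch for the selected position.

def latch_fires(events, positions, router_before_latch):
    """Total latch fires for a bank of `positions` latches fed by a selector."""
    return events * (1 if router_before_latch else positions)

events, positions = 1000, 4
print(latch_fires(events, positions, False))  # 4000 fires (latch, then router)
print(latch_fires(events, positions, True))   # 1000 fires (router, then latch)
```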
     
  9. colB

    colB NI Product Owner

    Messages:
    3,969
    yes, you are correct. Need to use an 'original and best' z-1 then ;) or 'latch-1'
     
  10. mpl

    mpl NI Product Owner

    Messages:
    496
    Thanks for the analysis, lots for me to think about and try out. I can see the logic in the new selector structure; I just need to get that "efficient" style fixed in my mind.
    There used to be 2 more of those selectors for the RGB outputs, x 72, and when I stripped those out and replaced RGB with a meter + graphic file, it reduced CPU by 1.5%.
    I'm just integrating the primary version into my seq ensemble; I've still got the return communication from the seq to do, so all is in sync.
    Then it's on to all this
    I find learning with a purpose a lot better/easier than just learning,if that makes sense.
    Thanks again, I do appreciate it.
    Mike
     
  11. salamanderanagram

    salamanderanagram NI Product Owner

    Messages:
    3,454
    i didn't look at the ensemble, but if it's supposed to run at event rate, i'll bet the z^-1 is the only thing taking up any meaningful amount of CPU - just replace them. one big core cell would be nice, but if it's only responding to events, it probably doesn't matter that much.
     
  12. mpl

    mpl NI Product Owner

    Messages:
    496
    You've just won that bet! I replaced all the z^-1s with latch [-1] and colB's more efficient selector, and the CPU has dropped to 0.7%. Hurrah!
    thank you
    mike