Welcome to the #dominoforever Product Ideas Lab! This is the place where you can submit product ideas and enhancement requests. We encourage you to participate by voting on, commenting on, and creating new ideas. All new ideas will be evaluated jointly by the IBM & HCL Product Management & Engineering teams, and the next steps will be communicated. While not all submitted ideas will be implemented, community feedback will play a key role in influencing which ideas are implemented and when.

For more information and upcoming events around #dominoforever, please visit our Destination Domino page.


Add possibility to run agent on cluster server(s)

A scheduled agent has to be set to run on cluster server S1 or S2. But if S1 is not available, the agent won't run. Automatically let cluster server S2 run agents set to run on S1 if S1 is not available.

Note: after a cluster server restarts, it has to revalidate with all its cluster mates before running any agents set to run on itself.

  • Guest
  • Jul 18 2018
  • Investigating
  • Guest commented
    July 18, 2018 16:32

    Yes, this definitely needs to be addressed.

  • Guest commented
    July 20, 2018 11:41

    This is a nice suggestion, and a feature that would make clustered servers a real cluster even for agents.

  • Guest commented
    August 29, 2018 07:34

    This would solve many problems I encounter regularly. It needs some thought about whether to make it optional or the default, maybe by introducing the possibility to run on <ClusterName> instead of run on <ServerName>?

  • Admin
    Thomas Hampel commented
    22 Jan 08:30

    In case of a network hiccup, both agents would run. Or how would the servers detect that it's not a server outage but a connectivity issue?

  • Guest commented
    22 Jan 08:46

    @Thomas, good point. Maybe we should not make it automatic, but rather give the admin an option to fail over all agents to the cluster mate, using one command.

    So:

    • Agents can be marked to run on one particular server, but with the possibility to activate failover, for example through a check-mark in the agent properties. Optionally, also add a second server field in the agent properties to choose one particular failover server.
    • When server A is down, the administrator can run a console command on the failover server so that it takes over all agents which have the failover property set (and, where used, name this failover server in the second server field).

    With this, you should be able to avoid the potential network issue, where 2 servers run the same agent, modify the same documents, and create replication conflicts after the network is restored.

    It still is manual, but only 1 command for all (failover activated) agents.

    I'm not sure about the second server field in the agent properties, but I think it might be handy in cases where you have clusters of 3 or more servers and you want 1 particular server to be the failover for a particular agent, but potentially another server for another agent...

     

    And this doesn't necessarily need to be limited to clusters. Let's say you have servers in different regions, with some applications replicating on schedule; it might be handy to use this feature also in case one of these servers is down for a longer period...

     

    Thibaud
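
A minimal sketch of the manual failover proposed in the comment above, under stated assumptions: the `Agent` fields `failover_enabled` and `failover_server` and the `take_over` function are hypothetical stand-ins for the suggested check-mark, second server field, and console command. None of them exist in Domino today.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """Hypothetical model of a scheduled agent (not a real Domino API)."""
    name: str
    run_on: str                            # server the agent is scheduled on
    failover_enabled: bool = False         # the proposed check-mark
    failover_server: Optional[str] = None  # the proposed optional second server field


def take_over(agents: List[Agent], failed_server: str, this_server: str) -> List[str]:
    """Simulate the proposed console command, issued by the admin on this_server.

    Reassigns every agent that runs on failed_server, has failover enabled,
    and either names this_server as its failover server or names none.
    Returns the names of the agents that were taken over.
    """
    taken = []
    for agent in agents:
        if agent.run_on != failed_server or not agent.failover_enabled:
            continue
        if agent.failover_server not in (None, this_server):
            continue  # a different server is the designated failover target
        agent.run_on = this_server
        taken.append(agent.name)
    return taken


agents = [
    Agent("Archive", run_on="S1", failover_enabled=True),
    Agent("Billing", run_on="S1", failover_enabled=True, failover_server="S3"),
    Agent("Cleanup", run_on="S1"),  # failover not enabled, stays on S1
]
print(take_over(agents, failed_server="S1", this_server="S2"))  # ['Archive']
```

Because the takeover only happens when an administrator issues the command, a short network interruption alone never causes two servers to run the same agent.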

  • Guest commented
    22 Jan 08:50

    To prevent a split-brain condition you could:

    #1 Define a master server to run the agent if there is no connection to the other hosts.

    #2 Use at least 3 nodes (servers or dedicated arbiter software), so that a quorum is possible. (See: https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/#client-quorum )
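
The two options above can be sketched as a single decision rule. This is only an illustration: the function name `may_take_over` and its parameters are assumptions, and how a node counts its reachable peers is left out.

```python
def may_take_over(reachable_peers: int, cluster_size: int,
                  master_visible: bool = False) -> bool:
    """Decide whether this node may run agents for an unreachable peer.

    reachable_peers: number of other cluster nodes this node can still contact.
    master_visible:  True if this node is the designated master or can reach it
                     (option #1, used only to break an exact tie).
    """
    visible = reachable_peers + 1      # count this node itself
    if 2 * visible > cluster_size:     # strict majority: quorum (option #2)
        return True
    if 2 * visible == cluster_size:    # even split, e.g. an isolated node of 2
        return master_visible
    return False                       # minority partition must stay passive


print(may_take_over(1, 3))                       # True: 2 of 3 nodes agree
print(may_take_over(0, 3))                       # False: isolated minority
print(may_take_over(0, 2, master_visible=True))  # True: master breaks the tie
```

With only two nodes, a strict majority is impossible once the link is lost, which is why either a master rule or a third (arbiter) node is needed.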

  • Guest commented
    22 Jan 08:51

    @Thomas Hampel
    The server could detect whether the other server is still reachable via ICMP. Of course, this does not give 100% certainty, but it would be sufficient from my point of view. And: network outages are very rare; Domino crashes are not (unfortunately).
    This failover should of course be configurable per agent!

  • Admin
    Thomas Hampel commented
    22 Jan 11:15

    If ping (or name resolution) doesn't work, it doesn't mean the other server is down. It also doesn't mean that the other server is not running agents or has not already executed these agents. So it's rather tricky to solve this request.

  • Guest commented
    22 Jan 11:33

    That's not what I meant. When ping (or ARP) DOES work, but NRPC does NOT -> Domino server down. If ping (ARP) does NOT work -> network issue, state of server unclear. Yes, this would prevent failover in cases where the server has a hardware fault or power failure. But these issues almost never happen. It would work well in case of a Domino crash, and that's by far the most frequent cause of server unavailability.
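
The rule in the comment above can be written down as a small decision table. This is a sketch only: `PeerState`, `classify_peer`, and `should_fail_over` are hypothetical names, and the two probe results are taken as inputs rather than measured here.

```python
from enum import Enum


class PeerState(Enum):
    UP = "up"                    # NRPC answers: Domino is running
    DOMINO_DOWN = "domino down"  # host answers ping/ARP, but NRPC does not
    UNKNOWN = "unknown"          # host unreachable: network issue or host down


def classify_peer(ping_ok: bool, nrpc_ok: bool) -> PeerState:
    """Classify the cluster mate from two probe results."""
    if nrpc_ok:
        return PeerState.UP
    if ping_ok:
        return PeerState.DOMINO_DOWN
    return PeerState.UNKNOWN


def should_fail_over(ping_ok: bool, nrpc_ok: bool) -> bool:
    """Fail over only when the host is reachable but Domino is not."""
    return classify_peer(ping_ok, nrpc_ok) is PeerState.DOMINO_DOWN


print(should_fail_over(ping_ok=True, nrpc_ok=False))   # True: Domino crash
print(should_fail_over(ping_ok=False, nrpc_ok=False))  # False: state unclear
```

The `UNKNOWN` case deliberately never triggers failover, which is the trade-off described above: a hardware fault or power failure is not covered, but a plain connectivity loss cannot start the same agent on two servers.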

  • Guest commented
    28 Jan 16:44

    I thought of this as an enhancement to existing mechanisms: if you mark an agent as run on <cluster name> in the agent design, the AMgr could look up the cldbdir entry for this NSF to find out the failover rules.

    The cldbdir could provide an easy-to-use admin interface to manage this feature, similar to enabling/disabling cluster replication. Of course there are issues to address, like timing on server startup between cldbdir replication and AMgr initialization for run-on-cluster agents.
    Still far from simple, but it would let Domino Clustering shine even brighter.

  • Guest commented
    08 Feb 23:22

    Agents can record their successful runs in special logs on the administration server. But I am in favor of manual switching.

  • Guest commented
    08 Feb 23:58

    How it would ideally work:

    We have several clusters. Each cluster includes 4 servers.
    Cluster-1: hub-1, hub-2, app1, app2.

    In the Domino Directory there are 2 dedicated scheduled-agent configuration documents, which for Cluster-1 contain:

    Fields of doc-1:
    Name: MAIN_AGENTS;
    Server: hub-1.

    Fields of doc-2:
    Name: OTHER_AGENTS;
    Server: hub-2.

    In every scheduled agent, it is not a server that is selected but a specific configuration for launching the agent, i.e. MAIN_AGENTS or OTHER_AGENTS.

    Server hub-1 is down.
    We are convinced that it will not come back up quickly.
    In the MAIN_AGENTS document we change the server from hub-1 to app2 (the load balancer gives app1 priority for users).

    Benefits:
    1. Easy to manage - just one change.
    2. There is no need to change and re-sign design elements (agents).
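
A minimal sketch of this indirection, with plain dictionaries standing in for the two configuration documents. The agent names (`Archive`, `Billing`, `Reports`) and the helper `agents_for` are made up for illustration; only the configuration names and server names come from the example above.

```python
# Stand-in for the two configuration documents in the Domino Directory.
agent_configs = {"MAIN_AGENTS": "hub-1", "OTHER_AGENTS": "hub-2"}

# Each agent references a configuration name, never a server name.
# The agent names here are hypothetical.
agents = {
    "Archive": "MAIN_AGENTS",
    "Billing": "MAIN_AGENTS",
    "Reports": "OTHER_AGENTS",
}


def agents_for(server: str) -> list:
    """Return the agents whose configuration currently points at server."""
    return sorted(name for name, cfg in agents.items()
                  if agent_configs[cfg] == server)


print(agents_for("hub-1"))             # ['Archive', 'Billing']

# hub-1 is down: one change in one document moves all MAIN_AGENTS agents.
agent_configs["MAIN_AGENTS"] = "app2"

print(agents_for("app2"))              # ['Archive', 'Billing']
```

The agents themselves are never touched, so no design element has to be changed or re-signed.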