Auto-generated content — pending SME review

This content was auto-generated from Fusion SMB documentation and is pending SME review. Please verify accuracy before using in partner-facing contexts.

Architecture & Integration

This page covers Fusion SMB's clustering capabilities, high-availability topologies, and integration with infrastructure components. Use this material when designing production deployments or answering prospect questions about resilience and scalability.

Clustering Overview

Fusion SMB supports three complementary capabilities that work together for robust, reliable, and scalable deployments:

| Capability | What It Does |
|---|---|
| Continuous Availability (CA) | Persistent file handle database allows clients to resume work seamlessly after a node failure |
| Scale-Out | Multiple servers operate as a single SMB installation, sharing state via Corosync |
| High Availability (HA) | Clustering software monitors node health and initiates failover |

Deployment Topologies

Active-Passive

In an active-passive configuration, one node actively serves clients while a standby node takes over if the primary fails.

┌──────────┐    Floating IP     ┌──────────┐
│  Active  │◄──────────────────►│ Standby  │
│   Node   │                    │   Node   │
└────┬─────┘                    └────┬─────┘
     │                               │
     └───────────────┬───────────────┘
                     │
            ┌────────▼────────┐
            │ Shared Storage  │
            │ (ext4, XFS,     │
            │ ZFS, NTFS...)   │
            └─────────────────┘

Key characteristics:

  • Floating IP provides a single point of access for clients
  • Pacemaker manages health monitoring and failover
  • Shared storage holds data, config, and persistent state
  • All shares should use vfs = libc:force_sync for data integrity
  • Supports non-clustered file systems (ext4, XFS, ZFS)

Required components:

  • Persistent file handle database (ca = true, ca_path on shared storage)
  • Connection recovery database (tcp_tickle = true)
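
The failover behaviour described above can be sketched as a small two-node model. This is an illustration only: the `Node` class and `fail_over` function are hypothetical names, and in a real deployment Pacemaker performs the health monitoring and moves the floating IP.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True
    holds_floating_ip: bool = False


def fail_over(active: Node, standby: Node) -> Node:
    """Return the node that should serve clients after a health check."""
    if active.healthy:
        return active
    if not standby.healthy:
        raise RuntimeError("no healthy node available")
    # Move the floating IP so clients keep using the same address.
    active.holds_floating_ip = False
    standby.holds_floating_ip = True
    return standby


primary = Node("node-a", holds_floating_ip=True)
secondary = Node("node-b")
primary.healthy = False  # simulate a failure of the active node
serving = fail_over(primary, secondary)
print(serving.name)  # node-b
```

Because clients only ever address the floating IP, the change of serving node is invisible to them at the network level; the persistent file handle and connection recovery databases cover the SMB session state.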

Active-Active

In an active-active configuration, multiple nodes serve clients simultaneously, distributing load and providing both performance scaling and fault tolerance.

┌──────────┐  ┌──────────┐  ┌──────────┐
│  Node 1  │  │  Node 2  │  │  Node 3  │
│ (active) │  │ (active) │  │ (active) │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
     │       Corosync Mesh       │
     │    (FSA state sharing)    │
     │             │             │
     └─────────────┼─────────────┘
                   │
          ┌────────▼────────┐
          │ Shared Storage  │
          │ (GlusterFS,     │
          │ WekaFS, CephFS, │
          │ GPFS, GFS2...)  │
          └─────────────────┘

Key characteristics:

  • Each node has its own IP; DNS round-robin distributes clients
  • Corosync maintains shared FSA state across nodes
  • Requires a clustered file system that supports concurrent read-write mounts
  • Scale-out adds capacity by adding nodes

Required components:

  • Scale-out enabled (scale_out = true)
  • Persistent file handle database (ca = true, ca_path on shared storage)
  • Connection recovery database (tcp_tickle = true)
  • Corosync installed and configured on all nodes

Autonomous Mode

For specific workloads (read-heavy, sharded, or append-only), autonomous mode allows multiple servers in a cluster without shared FSA state. This eliminates coordination overhead but does not provide safe concurrent write access to the same files from different nodes.

Caution: Autonomous mode can cause data corruption if multiple clients on different nodes write to the same files. Use it only for workloads where this cannot occur.
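
One way to reason about whether a workload is safe for autonomous mode is to check a planned or observed write pattern for files written from more than one node. The sketch below is a hypothetical planning aid, not a Fusion SMB feature; `find_write_conflicts` and the sample paths are assumptions made for illustration.

```python
from collections import defaultdict


def find_write_conflicts(writes):
    """Return paths written from more than one node.

    `writes` is an iterable of (node, path) pairs.
    """
    nodes_by_path = defaultdict(set)
    for node, path in writes:
        nodes_by_path[path].add(node)
    return {p: sorted(n) for p, n in nodes_by_path.items() if len(n) > 1}


observed = [
    ("node1", "/mnt/shared/data/a.log"),
    ("node2", "/mnt/shared/data/b.log"),
    ("node2", "/mnt/shared/data/a.log"),
]
conflicts = find_write_conflicts(observed)
print(conflicts)
```

An empty result means no file in the sample was written from two nodes. It does not prove that the workload can never produce such writes.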

Infrastructure Requirements

Shared Storage

| Topology | File System Requirement | Examples |
|---|---|---|
| Active-Passive | Any Linux file system | ext4, XFS, ZFS, NTFS by Tuxera |
| Active-Active | Must support concurrent RW mounts | GlusterFS, WekaFS, CephFS, GPFS, GFS2, OCFS2, Lustre |
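
The table can be expressed as a simple lookup, which may be handy in a pre-deployment checklist script. This is a sketch under stated assumptions: `fs_suits_topology` is a hypothetical helper, and the set below contains only the examples named in the table, so a clustered file system missing from it is reported as unsuitable even though it may work.

```python
# File systems named in the table above; not an exhaustive support list.
CLUSTERED = {"glusterfs", "wekafs", "cephfs", "gpfs", "gfs2", "ocfs2", "lustre"}


def fs_suits_topology(fs_name: str, topology: str) -> bool:
    """Active-active needs concurrent RW mounts; active-passive accepts any Linux FS."""
    if topology == "active-passive":
        return True
    if topology == "active-active":
        return fs_name.lower() in CLUSTERED
    raise ValueError(f"unknown topology: {topology}")


print(fs_suits_topology("ext4", "active-active"))    # False
print(fs_suits_topology("CephFS", "active-active"))  # True
```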

Shared storage holds:

  1. File data (the shares themselves)
  2. Fusion SMB configuration file (/etc/tsmb.conf)
  3. User database (if file-backed)
  4. Persistent file handle database
  5. Connection recovery database
  6. Privilege database

Best practices:

  • Mount points must be identical across all nodes
  • RAID and multipathing recommended for storage redundancy
  • Separate mount points for data vs. state are optional but supported
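
The requirement that mount points be identical on every node can be checked mechanically. The sketch below assumes the mount points of each node have already been collected (for example from `/proc/mounts` on each host) into a dictionary; `mismatched_mounts` and the sample data are hypothetical.

```python
def mismatched_mounts(mounts_by_node):
    """Return mount points that are missing on at least one node.

    `mounts_by_node` maps node name -> set of mount point paths.
    The result maps each inconsistent mount point to the nodes lacking it.
    """
    all_mounts = set().union(*mounts_by_node.values())
    return {
        mount: sorted(n for n, m in mounts_by_node.items() if mount not in m)
        for mount in sorted(all_mounts)
        if any(mount not in m for m in mounts_by_node.values())
    }


report = mismatched_mounts({
    "node1": {"/mnt/shared"},
    "node2": {"/mnt/shared", "/mnt/state"},
})
print(report)
```

Here `/mnt/state` is reported because it exists on `node2` only. An empty dictionary means every listed mount point is present on all nodes.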

Networking

| Topology | Client Access | Inter-Node |
|---|---|---|
| Active-Passive | Floating IP (managed by Pacemaker) | Dedicated network recommended |
| Active-Active | DNS round-robin or load balancer | Dedicated network recommended |

Best practices:

  • Separate client traffic from inter-node cluster traffic (VLANs or dedicated NICs)
  • Redundant network connections, switches, and routers
  • TCP port 445 open between all clients and all nodes
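
A quick way to verify the port 445 requirement from a client machine is to attempt a TCP connection to each node. The following is a minimal sketch using Python's standard `socket` module; the helper names are hypothetical, and a successful connection shows only that the port is reachable, not that the SMB service is healthy.

```python
import socket


def port_open(host: str, port: int = 445, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_nodes(hosts, port=445):
    """List the hosts that do not accept connections on the given port."""
    return [h for h in hosts if not port_open(h, port)]
```

Calling `unreachable_nodes(["fusion-node1", "fusion-node2"])` (placeholder hostnames) from a client subnet returns the nodes that a firewall or routing problem is hiding from that subnet.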

Clustering Software

Fusion SMB integrates with:

  • Corosync — reliable messaging and quorum between nodes
  • Pacemaker — resource management, health monitoring, failover orchestration

Fusion SMB does not bundle or mandate specific clustering software. It provides the SMB protocol layer and integrates with your chosen infrastructure.

Configuration Reference

Active-Passive Minimal Config

[global]
server_name = fusion-srv
userdb_type = ad
domain = yourdomain.com
runstate_dir = /mnt/shared/runstate
listen = ANY,0.0.0.0,IPv4,445,DIRECT_TCP

# Continuous availability
ca = true
ca_path = /mnt/shared/_ca

# Connection recovery
tcp_tickle = true
tcp_tickle_params = path=/mnt/shared/_recovery

# No scale-out for active-passive
scale_out = false
[/global]

[share]
netname = Data
path = /mnt/shared/data
vfs = libc:force_sync
permissions = everyone:full
[/share]
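
The active-passive requirements can be checked against a configuration file before deployment. The sketch below is a hypothetical linter, not a Fusion SMB tool: it assumes the `[section]` ... `[/section]` layout, `key = value` lines, and `#` comments seen in the examples on this page, and it assumes that scale-out is off when `scale_out` is absent.

```python
def parse_tsmb(text):
    """Parse the [section] ... [/section] layout used in the examples above."""
    sections, current = [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[/"):
            current = None
        elif line.startswith("["):
            current = (line.strip("[]"), {})
            sections.append(current)
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[1][key.strip()] = value.strip()
    return sections


def check_active_passive(text):
    """Return a list of problems found in an active-passive config."""
    problems = []
    sections = parse_tsmb(text)
    glob = next((kv for name, kv in sections if name == "global"), {})
    if glob.get("ca") != "true":
        problems.append("ca must be true")
    if not glob.get("ca_path"):
        problems.append("ca_path must be set")
    if glob.get("tcp_tickle") != "true":
        problems.append("tcp_tickle must be true")
    # Assumption: a missing scale_out key means scale-out is disabled.
    if glob.get("scale_out", "false") != "false":
        problems.append("scale_out must be false")
    for name, kv in sections:
        if name == "share" and kv.get("vfs") != "libc:force_sync":
            problems.append(f"share {kv.get('netname')} lacks vfs = libc:force_sync")
    return problems


sample = """
[global]
ca = true
ca_path = /mnt/shared/_ca
tcp_tickle = true
scale_out = false
[/global]
[share]
netname = Data
path = /mnt/shared/data
[/share]
"""
print(check_active_passive(sample))
```

The sample omits `vfs = libc:force_sync` from the share on purpose, so the check reports that one problem.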

Active-Active Minimal Config

[global]
server_name = fusion-cluster
userdb_type = ad
domain = yourdomain.com
runstate_dir = /mnt/shared/runstate
listen = ANY,0.0.0.0,IPv4,445,DIRECT_TCP

# Scale-out with Corosync
scale_out = true

# Continuous availability
ca = true
ca_path = /mnt/shared/_ca

# Connection recovery
tcp_tickle = true
tcp_tickle_params = path=/mnt/shared/_recovery
[/global]

[share]
netname = Data
path = /mnt/shared/data
permissions = everyone:full
[/share]

Service Watchdog

Fusion SMB includes a built-in watchdog that monitors critical services (authentication, SMB server, etc.) using keep-alive requests. If a service fails to respond, the watchdog terminates all services, triggering the clustering software to initiate failover.

[global]
watchdog_interval = 10 # seconds between health checks
watchdog_timeout = 200 # milliseconds before declaring failure
[/global]
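
To make the roles of the two settings concrete, here is a simplified model of one watchdog pass. It is not the built-in watchdog's implementation: `watchdog_pass`, `probes`, and `stop_all` are hypothetical names, and the sketch measures each probe after it returns rather than interrupting a probe that hangs.

```python
import time


def watchdog_pass(probes, timeout_ms, stop_all):
    """Run one health-check pass over all services.

    `probes` maps service name -> callable that returns once the service
    answers a keep-alive. If a probe raises or takes longer than
    `timeout_ms`, `stop_all` is called and the failing service is returned.
    """
    for name, probe in probes.items():
        started = time.monotonic()
        try:
            probe()
            ok = (time.monotonic() - started) * 1000 <= timeout_ms
        except Exception:
            ok = False
        if not ok:
            stop_all()  # stopping everything lets the cluster manager fail over
            return name
    return None


stopped = []
probes = {
    "auth": lambda: None,            # responds immediately
    "smb": lambda: time.sleep(0.3),  # responds after about 300 ms
}
failed = watchdog_pass(probes, timeout_ms=200, stop_all=lambda: stopped.append(True))
print(failed)  # smb
```

In this model, `watchdog_timeout` corresponds to `timeout_ms`, and a surrounding loop would wait `watchdog_interval` seconds between passes.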

Rolling Upgrade

Fusion SMB supports zero-downtime rolling upgrades across the cluster.

Upgrade Procedure

  1. Stop Fusion service on a single node — clients automatically reconnect to remaining active nodes
  2. Update the Fusion package on that node
  3. Restart Fusion on the updated node — it operates in "old mode" compatible with the other nodes
  4. Repeat for each remaining node in the cluster
  5. Activation — upon the last node's upgrade, all nodes automatically enter new functionality mode

Key Properties

  • Zero client disruption — transparent failover during each node update
  • n-2 compatibility — nodes running different versions (within n-2 range) coexist during the rolling process
  • No node eviction — nodes are stopped and restarted in-place, no cluster membership changes needed
  • Automatic feature activation — new features become available cluster-wide after the final node is upgraded; clients pick up new capabilities dynamically or on reconnection
  • Simplest change control — lowest operational overhead for maintaining current versions
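
The procedure and properties above can be modelled in a few lines. This is a sketch under stated assumptions: releases are represented as plain integers, "n-2" is interpreted as a spread of at most two releases between nodes, and `compatible` and `rolling_upgrade` are hypothetical names rather than Fusion SMB tooling.

```python
def compatible(versions, window=2):
    """True if all node versions are within `window` releases of each other."""
    return max(versions) - min(versions) <= window


def rolling_upgrade(nodes, target):
    """Upgrade nodes one at a time; yield (node, active_feature_level).

    `nodes` maps node name -> integer release number. The cluster's active
    feature level stays at the oldest installed release until every node
    has been upgraded.
    """
    for name in list(nodes):
        nodes[name] = target  # stop, update, and restart this node in place
        if not compatible(nodes.values()):
            raise RuntimeError("version spread exceeds n-2")
        yield name, min(nodes.values())


cluster = {"node1": 7, "node2": 7, "node3": 7}
steps = list(rolling_upgrade(cluster, target=8))
print(steps)  # [('node1', 7), ('node2', 7), ('node3', 8)]
```

The feature level stays at 7 while any node still runs release 7 and moves to 8 only after the last node is upgraded, mirroring the activation step.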

Integration Points

| System | Integration Method |
|---|---|
| Active Directory | SSSD + Kerberos (domain join via adcli) |
| NFS | Runs alongside: Fusion SMB handles SMB, NFS handles NFS |
| Windows MMC | Remote management via IPC$ share + administrative shares |
| Systemd | Service file for boot-time startup and management |
| Monitoring | Log output, watchdog health, Corosync status |

Prospect Questions

Q: Does Fusion SMB require specific clustering software?
A: No. It integrates with Corosync and Pacemaker but does not mandate them. Any clustering solution that can manage floating IPs, storage mounts, and service lifecycle can work.

Q: Can we mix active-active and active-passive?
A: Not in a single cluster. Choose one topology per cluster. You can run separate clusters with different topologies.

Q: What happens to open files during failover?
A: With CA enabled, the persistent file handle database allows clients to resume open files on the new node. With TCP Tickle, connection recovery is near-instantaneous, so clients don't need to wait for a TCP timeout.

Q: What file systems are supported?
A: Any Linux file system for single-node or active-passive. For active-active, a clustered file system with concurrent RW support is required (GlusterFS, WekaFS, CephFS, GPFS, etc.).

Knowledge Check
1. What type of file system is required for active-active clustering?
2. What does the service watchdog do?
3. Why is autonomous clustering mode potentially dangerous?