Step-by-Step Proxmox and Ceph High Availability Setup Guide | Free High Availability Storage

Step 1: Prepare Proxmox Nodes

  1. Update and Upgrade Proxmox VE on all nodes:

apt update && apt full-upgrade -y

2. Ensure that all nodes have the same version of Proxmox VE:

pveversion

Step 2: Set Up the Proxmox Cluster

  1. Create a new cluster on the first node:
    • pvecm create my-cluster
  2. Add the other nodes to the cluster:
    • pvecm add <IP_of_first_node>
  3. Verify the cluster status:
    • pvecm status

Step 3: Install Ceph on Proxmox Nodes

  1. Install Ceph packages on all nodes:

install ceph ceph-mgr -y

Step 4: Create the Ceph Cluster

  1. Initialize the Ceph cluster on the first node:
    • pveceph init --network <cluster_network>
  2. Create the manager daemon on the first node:
    • pveceph createmgr

Step 5: Add OSDs (Object Storage Daemons)

  1. Prepare disks on each node for Ceph OSDs:
    • pveceph createosd /dev/sdX
  2. Repeat the process for each node and disk.

Step 6: Create Ceph Pools

  1. Create a Ceph pool for VM storage:
    • pveceph pool create mypool 128

Step 7: Configure Proxmox to Use Ceph Storage

  1. Add the Ceph storage to Proxmox:
    • Navigate to Datacenter > Storage > Add > RBD.
    • Enter the required details like ID, Pool, and Monitor hosts.
    • Save the configuration.

Step 8: Enable HA (High Availability)

  1. Configure HA on Proxmox:
    • Navigate to Datacenter > HA.
    • Add resources (VMs or containers) to the HA manager.
    • Configure the HA policy and set desired node priorities.

Step 9: Testing High Availability

  1. Simulate node failure: Power off one of the nodes and observe how the VMs or containers are automatically migrated to other nodes.

Step 10: Monitoring and Maintenance

  1. Use the Proxmox and Ceph dashboards to monitor the health of your cluster.
  2. Regularly update all nodes to ensure stability and security.

Optional: Additional Ceph Configuration

  1. Add Ceph Monitors for redundancy:bashKodu kopyalapveceph createmon
  2. Add more Ceph MDS (Metadata Servers) if using CephFS:bashKodu kopyalapveceph createmds
  3. Tune Ceph settings for performance and reliability based on your specific needs.

By following these steps, you will have a robust Proxmox VE and Ceph high availability setup, ensuring that your VMs and containers remain highly available even in the event of hardware failures.

Proxmox VM Live Migration | Migrate VM to another host without Downtime

  1. Cluster Setup: Ensure that your Proxmox hosts are part of the same cluster. A Proxmox cluster consists of multiple Proxmox VE servers (nodes) combined to offer high availability and load balancing to virtual machines. Nodes in a cluster share resources such as storage and can migrate VMs between each other.
  2. Shared Storage: Live migration requires shared storage accessible by both the source and target hosts. This shared storage can be implemented using technologies like NFS, iSCSI, or Ceph. Shared storage allows the VM’s disk images and configuration files to be accessed by any node in the cluster.
  3. Migration Prerequisites: Before initiating a live migration, ensure that the target host has enough resources (CPU, memory, storage) to accommodate the migrating VM. Proxmox will check these prerequisites before allowing the migration to proceed.
  4. Initiating Migration: In the Proxmox web interface (or using the Proxmox command-line interface), select the VM you want to migrate and choose the “Migrate” option. Proxmox will guide you through the migration process.
  5. Migration Process:
    • Pre-Copy Phase: Proxmox starts by copying the memory pages of the VM from the source host to the target host. This is done iteratively, with the majority of memory pages copied in the initial phase.
    • Stopping Point: At a certain point during the migration, Proxmox determines a stopping point. This is the point at which the VM will be paused briefly to perform a final synchronization of memory pages and state information.
    • Pause and Synchronization: The VM is paused on the source host, and any remaining memory pages and state information are transferred to the target host. This pause is usually very brief, minimizing downtime.
    • Completion: Once the final synchronization is complete, the VM is resumed on the target host. From the perspective of the VM and its users, the migration is seamless, and the VM continues to run without interruption on the target host.
  6. Post-Migration: After the migration is complete, the VM is running on the target host. You can verify this in the Proxmox web interface or using the command-line tools. The source host frees up resources previously used by the migrated VM.
  7. High Availability (HA): In a Proxmox cluster with HA enabled, if a host fails, VMs running on that host can be automatically migrated to other hosts in the cluster, ensuring minimal downtime.

Overall, Proxmox VM live migration is a powerful feature that enables you to move virtual machines between hosts in a Proxmox cluster with minimal downtime, providing flexibility and high availability for your virtualized environment.

Proxmox Cluster | Free Virtualization with HA Feature | Step by Step

    1. Cluster Configuration:
      • Nodes: A Proxmox cluster consists of multiple nodes, which are physical servers running Proxmox VE.
      • Networking: Nodes in a Proxmox cluster should be connected to a common network. A private network for internal communication and a public network for client access are typically configured.
      • Shared Storage: Shared storage is crucial for a Proxmox cluster to enable features like live migration and high availability. This can be achieved through technologies like NFS, iSCSI, or Ceph.
    2. High Availability (HA):
      • Proxmox VE includes a feature called HA, which ensures that critical VMs are automatically restarted on another node in the event of a node failure.
      • HA relies on fencing mechanisms to isolate a failed node from the cluster and prevent split-brain scenarios. This can be achieved through power fencing (e.g., IPMI, iLO, iDRAC) or network fencing (e.g., switch port blocking).
      • When a node fails, the HA manager on the remaining nodes detects the failure and initiates the restart of the affected VMs on healthy nodes.
    3. Corosync and Pacemaker:
      • Proxmox VE uses Corosync as the messaging layer and Pacemaker as the cluster resource manager. These components ensure that cluster nodes can communicate effectively and coordinate resource management.
      • Corosync provides a reliable communication channel between nodes, while Pacemaker manages the resources (VMs, containers, services) in the cluster and ensures they are highly available.
    4. Resource Management:
      • Proxmox clusters allow for dynamic resource allocation, allowing VMs and containers to use resources based on demand.
      • Memory and CPU resources can be allocated and adjusted for each VM or container, and live migration allows these resources to be moved between nodes without downtime.
    5. Backup and Restore:
      • Proxmox includes backup and restore functionality, allowing administrators to create scheduled backups of VMs and containers.
      • Backups can be stored locally or on remote storage, providing flexibility in backup storage options.
    6. Monitoring and Logging:
      • Proxmox provides monitoring and logging capabilities to help administrators track the performance and health of the cluster.
      • The web interface includes dashboards and graphs for monitoring resource usage, as well as logs for tracking cluster events.
    7. Updates and Maintenance:
      • Proxmox clusters can be updated and maintained using the web interface or command-line tools. Updates can be applied to individual nodes or the entire cluster.

    Install And Configure DHCP Server Cluster

    1. Preparing the Environment:

    • Ensure that both servers meet the hardware and software requirements for Windows Server and DHCP.
    • Assign static IP addresses to each server.
    • Ensure that DNS is properly configured and that both servers can resolve each other’s names.

    2. Installing the DHCP Server Role:

    • Open Server Manager on both servers.
    • Select “Add roles and features” and proceed with the installation wizard.
    • Select “DHCP Server” as the role to install.
    • Complete the DHCP Server installation wizard.

    3. Configuring DHCP Failover:

    • Open DHCP Manager on one of the servers.
    • Right-click on the DHCP server name and select “Configure Failover.”
    • Follow the wizard to configure DHCP failover.
    • Choose the partner server, configure the shared secret, and set the mode (Load Balance or Hot Standby) and relationship (Primary or Secondary).

    4. Installing the Failover Clustering Feature:

    • Open Server Manager on both servers.
    • Select “Add roles and features” and proceed with the installation wizard.
    • Select “Failover Clustering” as the feature to install.

    5. Creating the Cluster:

    • Open Failover Cluster Manager on one of the servers.
    • Click on “Create Cluster” and follow the wizard.
    • Add both servers to the cluster.
    • Configure cluster settings such as the cluster name and IP address.

    6. Configuring DHCP Server Role in the Cluster:

    • In Failover Cluster Manager, right-click on “Services and Applications” and select “Configure a Service or Application.”
    • Select “DHCP Server” as the service to configure.
    • Follow the wizard to add the DHCP server role to the cluster.

    7. Testing Failover:

    • Perform a failover test to ensure that the DHCP server cluster functions correctly.
    • Use the Failover Cluster Manager to initiate a failover and verify that DHCP services remain available during the failover process.

    8. Monitoring and Maintenance:

    • Regularly monitor the DHCP server cluster using Failover Cluster Manager to ensure it remains healthy.
    • Perform regular maintenance tasks, such as applying updates and patches, to keep the cluster secure and up-to-date.

    Note: Ensure that you have sufficient IP address ranges and leases configured to handle the increased demand that comes with clustering. Additionally, testing failover in a controlled environment is crucial to ensure proper functioning in a production environment.