Changes between Initial Version and Version 1 of OMFServices Deployment


Timestamp:
Mar 13, 2026, 1:13:50 PM (2 days ago)
Author:
editor
Comment:

Comprehensive OMF services deployment topology documentation

  • OMFServices Deployment

    v1 v1  
     1= OMF Services Deployment Topology =
     2
     3'''Last updated: 2026-03-13'''
     4
     5[[PageOutline(2-3)]]
     6
     7== 1. Overview ==
     8
     9The ORBIT/COSMOS testbed infrastructure spans two sites connected via an IP tunnel:
     10
     11 * '''North Brunswick, NJ (Rutgers)''' -- Primary ORBIT site
     12 * '''New York City, NY (Columbia)''' -- Primary COSMOS site
     13
     14The platform runs '''19 Ruby/Sinatra microservices''', '''1 Python/Flask service''' (omf-array-mgmt), and '''1 CLI tool''' (omf-expctl) across '''23+ hosts''' with '''35+ service instances'''.
     15
     16All Ruby services share the {{{omf-common}}} git submodule (Sinatra base class with DSL for route definition, XML/JSON response formatting, Prometheus metrics, and configuration loading). Services are packaged as Debian packages via FPM and managed by systemd.
     17
     18The canonical entry point for all service API calls is the '''AM proxy''' at {{{am1:5054}}} ({{{omf-agg-mgr-proxy}}}), which provides service discovery and request routing. The '''cosmos-portal''' (React SPA on web1) provides the web UI, with Apache reverse-proxying API calls to backend services.
     19
     20== 2. Network Architecture ==
     21
     22=== IP Address Ranges ===
     23
     24||= Site =||= Range =||= Notes =||
     25|| North Brunswick (Rutgers) || {{{10.0.0.0 -- 10.63.255.255}}} || Primary ORBIT infrastructure ||
     26|| New York City (Columbia) || {{{10.64.0.0 -- 10.127.255.255}}} || Primary COSMOS infrastructure ||
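
The two ranges above each correspond to a single /10 block, so a host address can be mapped to its site with a short lookup. The following is a minimal sketch using only the Python standard library; the names {{{SITE_RANGES}}} and {{{site_for}}} are illustrative and not part of any OMF service.

```python
import ipaddress

# Site ranges copied from the table above, expressed as CIDR blocks:
# 10.0.0.0 -- 10.63.255.255 is 10.0.0.0/10, and
# 10.64.0.0 -- 10.127.255.255 is 10.64.0.0/10.
SITE_RANGES = {
    "North Brunswick (Rutgers)": ipaddress.ip_network("10.0.0.0/10"),
    "New York City (Columbia)": ipaddress.ip_network("10.64.0.0/10"),
}


def site_for(address):
    """Return the site whose range contains `address`, or None if none matches."""
    ip = ipaddress.ip_address(address)
    for site, network in SITE_RANGES.items():
        if ip in network:
            return site
    return None
```

For example, {{{site_for("10.50.0.41")}}} returns the North Brunswick entry, while an address outside both ranges, such as {{{10.250.0.8}}}, returns {{{None}}}.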
     27
     28=== VLAN Structure ===
     29
     30 * '''Management VLAN''' -- Infrastructure servers, out-of-band management
     31 * '''Control VLAN''' (per domain) -- Service-to-node communication, PXE boot
     32 * '''Data VLAN''' (per domain) -- Experiment data traffic between testbed nodes
     33 * '''IP tunnel''' -- Connects the North Brunswick and NYC sites
     34
     35=== DNS Domains ===
     36
     37 * {{{orbit-lab.org}}} -- ORBIT nodes and infrastructure
     38 * {{{cosmos-lab.org}}} -- COSMOS nodes and infrastructure
     39
     40DNS is served by BIND9 on mgmt1/mgmt2 with 2,563 forward A records across 25 zones (11 ORBIT + 14 COSMOS) and 53 reverse zones.
     41
     42== 3. North Brunswick Hosts ==
     43
     44=== am1 (10.50.0.41) -- Aggregate Manager / RF Services ===
     45
     46||= Service =||= Port =||= Version =||= Description =||
     47|| omf-agg-mgr-proxy || 5054 || -- || Service discovery proxy (routes API calls to backend services) ||
     48|| omf-rf-control || 5001 || v0-2 || RF signal generator control ||
     49|| omf-rf-switch || 5002 || v0-2 || RF switch matrix control ||
     50|| omf-xy-table || 5003 || -- || XY table positioning service ||
     51|| omf-array-mgmt || 5004 || -- || Antenna array management ('''Python/Flask''') ||
     52
     53'''Platform:''' Ubuntu 18.04, RVM Ruby 3.2.3, Python 3.6.9[[BR]]
     54'''Note:''' Requires {{{libruby-3.2}}} symlink + ldconfig for native extensions
     55
     56=== am4 -- Development Server ===
     57
     58Development machine only. All source repos at {{{/home/seskar/omf-*}}}. Not a production service host.
     59
     60=== am5 (10.50.0.45) -- Core Services ===
     61
     62||= Service =||= Port =||= Version =||= Description =||
     63|| omf-cmc || 5013 || v1-8 || Chassis management controller (power control via IPMI, HTTP CM, SNMP PDU) ||
     64|| omf-scheduler || 5016 || -- || Reservation scheduler and auto-approver (ActiveRecord + MySQL) ||
     65|| omf-rfmatrix || 5020 || v1-1 || RF matrix switch control ||
     66|| omf-status || 5021 || -- || Testbed status aggregation ||
     67
     68=== repository2 (10.50.0.22) -- Image & Account Services ===
     69
     70||= Service =||= Port =||= Version =||= Description =||
     71|| omf-account-mgmt || 5017 || v1-2 || User/group registration, approval, LDAP lifecycle (ActiveRecord + MySQL) ||
     72|| omf-frisbee || 5011 || v1-3 || Frisbee daemon management for disk image multicasting ||
     73|| omf-pxe || 5010 || v1-5 || PXE boot configuration (aggregate manager) ||
     74|| omf-saveimage || 5012 || v1-3 || Disk image save via netcat receiver ||
     75|| omf-user-stats || 5015 || -- || User disk usage and scheduler usage statistics ||
     76
     77=== Infrastructure Servers ===
     78
     79||= Host =||= IP =||= Role =||
     80|| mgmt1 || 10.250.0.8 || Primary DHCP (ISC DHCP4, 2,145 static hosts) + Primary DNS (BIND9, 2,563 A records) ||
     81|| mgmt2 || 10.250.0.9 || DHCP failover peer + DNS slave ||
     82|| db1 || 10.0.0.51 || LibreNMS monitoring (190 devices, SNMP polling every 5 min) ||
     83|| mysql1 || -- || Shared MySQL server for scheduler, account-mgmt, user-stats ||
     84|| amqp.orbit-lab.org || -- || RabbitMQ MQTT broker (MQTT 1883, WebSocket 15675) ||
     85|| web1 || -- || cosmos-portal (React SPA), Apache reverse proxy ||
     86|| gitlab.orbit-lab.org || 10.50.0.20 || GitLab (24 repos under {{{orbit/}}}) ||
     87|| ldap1.orbit-lab.org || -- || Primary OpenLDAP server (port 389) ||
     88|| ldap2.orbit-lab.org || -- || Secondary OpenLDAP server (port 389) ||
     89
     90=== ORBIT Console Servers (9 hosts) ===
     91
     92All consoles run '''omf-cmonitor''' on port 5000; grid and sb1 through sb3 also run '''omf-expctl'''.[[BR]]
     93'''Platform:''' Ubuntu 16.04, RVM Ruby 3.2.3, omf-cmonitor v1-1
     94
     95||= Console =||= omf-cmonitor =||= omf-expctl =||= Notes =||
     96|| grid.orbit-lab.org || Yes || Yes (v1-19) || Main 20x20 grid ||
     97|| sb1.orbit-lab.org || Yes || Yes (v1-19) || Sandbox 1 ||
     98|| sb2.orbit-lab.org || Yes || Yes (v1-19) || Sandbox 2 ||
     99|| sb3.orbit-lab.org || Yes || Yes (v1-19) || Sandbox 3 ||
     100|| sb4.orbit-lab.org || Yes || -- || Sandbox 4 ||
     101|| sb7.orbit-lab.org || Yes || -- || Sandbox 7 ||
     102|| sb9.orbit-lab.org || Yes || -- || Sandbox 9 ||
     103|| outdoor.orbit-lab.org || Yes || -- || Outdoor testbed ||
     104|| instrument.orbit-lab.org || Yes || -- || Instrument cluster ||
     105
     106=== Unreachable Consoles ===
     107
     108The following consoles are currently unreachable: {{{vgrid1-4.orbit-lab.org}}}, {{{instrument.cosmos-lab.org}}}
     109
     110== 4. New York City Hosts ==
     111
     112=== COSMOS Console Servers (9 hosts) ===
     113
     114All consoles run '''omf-cmonitor''' on port 5000; {{{sb1.cosmos-lab.org}}} also runs '''omf-expctl'''.[[BR]]
     115'''Platform:''' Ubuntu 16.04, RVM Ruby 3.2.3, omf-cmonitor v1-1
     116
     117||= Console =||= omf-cmonitor =||= omf-expctl =||= Notes =||
     118|| osc.cosmos-lab.org || Yes || -- || Open-access sandbox ||
     119|| indigo.cosmos-lab.org || Yes || -- || ||
     120|| accord.cosmos-lab.org || Yes || -- || ||
     121|| sb1.cosmos-lab.org || Yes || Yes (v1-19) || Sandbox 1 ||
     122|| sb2.cosmos-lab.org || Yes || -- || Sandbox 2 ||
     123|| weeks.cosmos-lab.org || Yes || -- || ||
     124|| rrail.cosmos-lab.org || Yes || -- || ||
     125|| bed.cosmos-lab.org || Yes || -- || ||
     126|| nebula.cosmos-lab.org || Yes || -- || ||
     127
     128=== COSMOS Raspberry Pis ===
     129
     130||= Host =||= IP =||= Services =||
     131|| pi1-auden.sb1.cosmos-lab.org || 10.37.25.15 || omf-cosmos-cm (5018, v1-1), omf-auden (5019, v1-1) ||
     132|| pi2-auden.sb1.cosmos-lab.org || 10.37.25.16 || omf-cosmos-cm (5018, v1-1), omf-auden (5019, v1-1) ||
     133
     134=== XY Table Controllers ===
     135
     136||= Host =||= IP =||= Service =||= Notes =||
     137|| xytable1 || 10.1.37.221 || omf-xytable-ctrl (port 80) || Raspberry Pi, sb1.cosmos-lab.org ||
     138|| xytable2 || 10.1.37.222 || omf-xytable-ctrl (port 80) || Raspberry Pi, sb1.cosmos-lab.org ||
     139
     140MQTT telemetry is published to {{{xy/<fqdn>/position}}} every 200 ms via {{{amqp.orbit-lab.org:1883}}}.
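
The topic layout {{{xy/<fqdn>/position}}} can be built and parsed with plain string handling, as sketched below. The helper names are illustrative, and the FQDN used in the usage note is a hypothetical example, since this page does not give the controllers' full hostnames. The wildcard constant relies on the standard MQTT single-level wildcard {{{+}}}.

```python
TOPIC_PREFIX = "xy"
TOPIC_SUFFIX = "position"

# Standard MQTT single-level wildcard: matches the position topic of any host.
ALL_POSITIONS = f"{TOPIC_PREFIX}/+/{TOPIC_SUFFIX}"


def position_topic(fqdn):
    """Build the telemetry topic for one XY table controller."""
    return f"{TOPIC_PREFIX}/{fqdn}/{TOPIC_SUFFIX}"


def fqdn_from_topic(topic):
    """Extract the host name from a position topic, or return None if malformed."""
    parts = topic.split("/")
    if len(parts) != 3:
        return None
    prefix, fqdn, suffix = parts
    if prefix != TOPIC_PREFIX or suffix != TOPIC_SUFFIX or not fqdn:
        return None
    return fqdn
```

With a hypothetical host name, {{{position_topic("xytable1.sb1.cosmos-lab.org")}}} yields {{{xy/xytable1.sb1.cosmos-lab.org/position}}}, and {{{fqdn_from_topic}}} reverses it. A subscriber interested in every table could subscribe to {{{ALL_POSITIONS}}}.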
     141
     142== 5. Shared Infrastructure ==
     143
     144||= Component =||= Host =||= Details =||
     145|| cosmos-portal || web1 || React 18 + Vite SPA, Tailwind CSS, static files served by Apache ||
     146|| GitLab || gitlab.orbit-lab.org (10.50.0.20) || 24 repos under {{{orbit/}}} namespace ||
     147|| NetBox || 10.50.0.93 || v2.9.10 (needs upgrade to 4.x), 290 devices, data from 2020-2021 ||
     148|| Proxmox (ORBIT) || mgmt-vmhost1..5 || 5x Dell R740 (48 cores, 187GB each), 98 VMs (57 running), Ceph RBD + NFS ||
     149|| Proxmox (COSMOS) || mgmt-vmhost1..5-co1 || 3x R430 + 2x R740, 13 VMs (8 running) ||
     150
     151== 6. Service Dependency Map ==
     152
     153This section lists the calls each service makes to other services.
     154
     155=== omf-expctl (Experiment Controller CLI) ===
     156
     157All calls are routed via the AM proxy at {{{am1:5054}}}.
     158
     159 * {{{omf-expctl}}} -> {{{omf-cmc}}} -- Power control (on/off/reset nodes)
     160 * {{{omf-expctl}}} -> {{{omf-pxe}}} -- PXE boot setup (set boot image)
     161 * {{{omf-expctl}}} -> {{{omf-frisbee}}} -- Disk imaging (load images onto nodes)
     162 * {{{omf-expctl}}} -> {{{omf-saveimage}}} -- Save disk images from nodes
     163 * {{{omf-expctl}}} -> {{{omf-scheduler}}} -- Permission/reservation check
     164
     165=== Inter-Service Dependencies ===
     166
     167||= Caller =||= Callee =||= Purpose =||= Via =||
     168|| omf-cmc || omf-cmonitor || Wake-on-LAN packet generation || Direct HTTP (CM_wolurl) ||
     169|| omf-auden || omf-rf-control || RF signal generator setup || Direct (should use AM proxy) ||
     170|| omf-status || omf-cmc || Node power state || Direct ||
     171|| omf-status || omf-scheduler || Reservation info || Direct ||
     172|| omf-status || omf-frisbee || Imaging status || Direct ||
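
The call relationships listed above form a small directed graph, which makes it straightforward to ask which callers are affected when one service is down. The sketch below encodes only the edges documented in this section; the function name {{{affected_by}}} is illustrative.

```python
from collections import deque

# Edges copied from the omf-expctl list and the inter-service table above:
# each key calls every service in its list.
CALLS = {
    "omf-expctl": ["omf-cmc", "omf-pxe", "omf-frisbee", "omf-saveimage", "omf-scheduler"],
    "omf-cmc": ["omf-cmonitor"],
    "omf-auden": ["omf-rf-control"],
    "omf-status": ["omf-cmc", "omf-scheduler", "omf-frisbee"],
}


def affected_by(service):
    """Return every service that calls `service` directly or transitively."""
    # Invert the graph so each callee maps to its direct callers.
    callers = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first walk up the caller chain.
    seen = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

For instance, {{{affected_by("omf-cmonitor")}}} returns {{{omf-cmc}}} plus the two services that call it, {{{omf-expctl}}} and {{{omf-status}}}.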
     173
     174=== External Dependencies ===
     175
     176||= Service =||= External System =||= Purpose =||
     177|| omf-scheduler || LDAP (ldap1/ldap2) || Host attribute management (LdapHostManager) ||
     178|| omf-scheduler || MySQL || Reservation persistence ||
     179|| omf-scheduler || SMTP (mail.orbit-lab.org:25) || Reservation notifications ||
     180|| omf-account-mgmt || LDAP (ldap1/ldap2) || User/group lifecycle management ||
     181|| omf-account-mgmt || MySQL || Account persistence ||
     182|| omf-account-mgmt || SMTP (mail.orbit-lab.org:25) || Account notifications ||
     183|| omf-user-stats || MySQL (multiple databases) || Usage data aggregation ||
     184|| omf-user-stats || LDAP || User lookups ||
     185
     186=== cosmos-portal (Web UI) ===
     187
     188All API calls are proxied via Apache on web1:
     189
     190||= Portal Route =||= Backend =||= Service =||
     191|| {{{/account/*}}} || repository2:5017 || omf-account-mgmt ||
     192|| {{{/scheduler/*}}} || am5:5016 || omf-scheduler ||
     193|| {{{/rfmatrix/*}}} || am5:5020 || omf-rfmatrix ||
     194|| {{{/status/*}}} || am5:5021 || omf-status ||
     195|| {{{/inventory/*}}} || am5:5012 || omf-newinventory (legacy) ||
     196|| {{{/user-stats/*}}} || repository2:5015 || omf-user-stats ||
     197|| {{{/mqtt/ws}}} || amqp.orbit-lab.org:15675/ws || RabbitMQ WebSocket ||
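
The routing table above can be modelled as a prefix lookup. The sketch below is not the Apache configuration itself, which this page does not show; it assumes that routes ending in {{{/*}}} match by path prefix and that {{{/mqtt/ws}}} matches exactly. The request path in the usage note is hypothetical.

```python
# Route table copied from the portal table above: (route, backend, service).
# Routes ending in "/" are treated as prefixes; others must match exactly.
ROUTES = [
    ("/account/", "repository2:5017", "omf-account-mgmt"),
    ("/scheduler/", "am5:5016", "omf-scheduler"),
    ("/rfmatrix/", "am5:5020", "omf-rfmatrix"),
    ("/status/", "am5:5021", "omf-status"),
    ("/inventory/", "am5:5012", "omf-newinventory (legacy)"),
    ("/user-stats/", "repository2:5015", "omf-user-stats"),
    ("/mqtt/ws", "amqp.orbit-lab.org:15675/ws", "RabbitMQ WebSocket"),
]


def backend_for(path):
    """Return (backend, service) for a portal request path, or None if unrouted."""
    for route, backend, service in ROUTES:
        if route.endswith("/"):
            if path.startswith(route):
                return backend, service
        elif path == route:
            return backend, service
    return None
```

Under these assumptions, a hypothetical request for {{{/scheduler/reservations}}} resolves to {{{am5:5016}}}, and any path not in the table returns {{{None}}}.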
     198
     199== 7. Port Registry ==
     200
     201||= Port =||= Service =||= Deployment =||
     202|| 5000 || omf-cmonitor || Console servers (18 hosts, per domain) ||
     203|| 5001 || omf-rf-control || am1 ||
     204|| 5002 || omf-rf-switch || am1 ||
     205|| 5003 || omf-xy-table || am1 ||
     206|| 5004 || omf-array-mgmt || am1 ||
     207|| 5010 || omf-pxe || repository2 ||
     208|| 5011 || omf-frisbee || repository2 ||
     209|| 5012 || omf-saveimage || repository2 ||
     210|| 5013 || omf-cmc || am5 ||
     211|| 5015 || omf-user-stats || repository2 ||
     212|| 5016 || omf-scheduler || am5 ||
     213|| 5017 || omf-account-mgmt || repository2 ||
     214|| 5018 || omf-cosmos-cm || COSMOS Pis ||
     215|| 5019 || omf-auden || COSMOS Pis ||
     216|| 5020 || omf-rfmatrix || am5 ||
     217|| 5021 || omf-status || am5 ||
     218|| 5054 || omf-agg-mgr-proxy || am1 ||
     219
     220'''Next available port: 5022'''
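
The "next available" figure can be recomputed from the registry whenever a service is added. The sketch below assumes that service ports are allocated sequentially below the proxy port 5054, so the proxy is excluded via a {{{ceiling}}} parameter; that allocation policy and the helper names are assumptions, not something this page states.

```python
# Port assignments copied from the registry table above.
PORTS = {
    5000: "omf-cmonitor", 5001: "omf-rf-control", 5002: "omf-rf-switch",
    5003: "omf-xy-table", 5004: "omf-array-mgmt", 5010: "omf-pxe",
    5011: "omf-frisbee", 5012: "omf-saveimage", 5013: "omf-cmc",
    5015: "omf-user-stats", 5016: "omf-scheduler", 5017: "omf-account-mgmt",
    5018: "omf-cosmos-cm", 5019: "omf-auden", 5020: "omf-rfmatrix",
    5021: "omf-status", 5054: "omf-agg-mgr-proxy",
}


def next_available(ports, floor=5000, ceiling=5053):
    """Return the port just above the highest assigned port in [floor, ceiling]."""
    assigned = [p for p in ports if floor <= p <= ceiling]
    return max(assigned) + 1


def unlisted_ports(ports, floor=5000, ceiling=5053):
    """Return ports below the highest assigned one that the registry does not list."""
    assigned = sorted(p for p in ports if floor <= p <= ceiling)
    return [p for p in range(assigned[0], assigned[-1]) if p not in ports]
```

With the table as given, {{{next_available(PORTS)}}} returns 5022, and {{{unlisted_ports(PORTS)}}} returns 5005 through 5009 and 5014. Whether those unlisted ports are free or deliberately reserved is not stated here.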
     221
     222== 8. Technology Stack ==
     223
     224=== Backend ===
     225
     226 * '''Ruby 3.2.x''' -- Primary language for all microservices except omf-array-mgmt
     227 * '''Sinatra''' (v4.x) -- Web framework (all services inherit from {{{OMFService}}} base class)
     228 * '''Puma''' -- Application server
     229 * '''ActiveRecord''' -- ORM for MySQL-backed services (scheduler, account-mgmt, user-stats)
     230 * '''Ox''' -- Fast XML parser/generator for OMF XML responses
     231 * '''Python 3.x / Flask''' -- omf-array-mgmt only
     232 * '''sinatra-param''' -- Request parameter validation via DSL
     233
     234=== Data Stores ===
     235
     236 * '''MySQL / MariaDB''' -- Persistence for scheduler, account-mgmt, user-stats, rfmatrix
     237 * '''OpenLDAP''' -- User/group directory (ldap1/ldap2, port 389)
     238 * '''RabbitMQ''' -- MQTT broker for node communication and XY table telemetry
     239
     240=== Frontend ===
     241
     242 * '''React 18''' -- cosmos-portal SPA
     243 * '''Vite''' -- Build tool
     244 * '''Tailwind CSS''' -- Styling
     245 * '''Apache''' -- Static file serving + reverse proxy on web1
     246
     247=== Operations ===
     248
     249 * '''Debian packaging''' via FPM ({{{make deb}}})
     250 * '''systemd''' service units (auto-enabled on package install)
     251 * '''Prometheus''' metrics at {{{/metrics}}} on all services
     252 * '''LibreNMS''' -- Network device monitoring (190 devices via SNMP)
     253 * '''Git submodules''' -- {{{omf-common}}} (shared framework), {{{omf-logging-db}}} (database helpers), {{{omf-ldap}}} (LDAP helpers)
     254
     255=== Configuration ===
     256
     257Configuration is merged in order (later files override earlier):
     258
     259 1. {{{default/config.yml}}} -- Built-in defaults (shipped with package)
     260 2. {{{/etc/omf-services/config.yml}}} -- Global settings
     261 3. {{{/etc/omf-services/<service>.yml}}} -- Service-specific (e.g., {{{cmonitor.yml}}})
     262 4. {{{./config.yml}}} -- Development override (not installed in production)
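
The layering above can be illustrated with a small merge routine. This sketch works on already-parsed dictionaries rather than YAML files, and it assumes nested sections are merged key by key; whether the actual loader in {{{omf-common}}} merges deeply or replaces whole top-level sections is not stated on this page. All configuration keys shown in the usage note are hypothetical.

```python
def deep_merge(base, override):
    """Return a new dict where `override` wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_layers(layers):
    """Merge config layers in order; later layers override earlier ones."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

As a usage example with hypothetical keys, merging defaults {{{{"port": 5000, "log": {"level": "info"}}}}} with a service file {{{{"log": {"level": "debug"}}}}} keeps the default port and takes the log level from the service file.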
     263
     264=== PXE Boot Images ===
     265
     266 * '''omf-5.8''' (current) -- Alpine 3.23, kernel 6.18 LTS, Ruby 3.4.8, MQTT-based RC, 97MB initfs
     267 * '''omf-5.7''' (legacy) -- Alpine, kernel 5.15 LTS, Ruby 3.1, XMPP-based RC, 41MB initfs
     268 * PXE images served from {{{root@repository2:/tftpboot/}}}