| | 1 | = OMF Services Deployment Topology = |
| | 2 | |
| | 3 | '''Last updated: 2026-03-13''' |
| | 4 | |
| | 5 | [[PageOutline(2-3)]] |
| | 6 | |
| | 7 | == 1. Overview == |
| | 8 | |
| | 9 | The ORBIT/COSMOS testbed infrastructure spans two sites connected by an IP tunnel: |
| | 10 | |
| | 11 | * '''North Brunswick, NJ (Rutgers)''' -- Primary ORBIT site |
| | 12 | * '''New York City, NY (Columbia)''' -- Primary COSMOS site |
| | 13 | |
| | 14 | The platform runs '''19 Ruby/Sinatra microservices''', '''1 Python/Flask service''' (omf-array-mgmt), and '''1 CLI tool''' (omf-expctl) across '''23+ hosts''' with '''35+ service instances'''. |
| | 15 | |
| | 16 | All Ruby services share the {{{omf-common}}} git submodule (a Sinatra base class with a DSL for route definitions, XML/JSON response formatting, Prometheus metrics, and configuration loading). Services are built as Debian packages with FPM and run as systemd units. |
| | 17 | |
| | 18 | The canonical entry point for all service API calls is the '''AM proxy''' at {{{am1:5054}}} ({{{omf-agg-mgr-proxy}}}), which provides service discovery and request routing. The '''cosmos-portal''' (React SPA on web1) provides the web UI, with Apache reverse-proxying API calls to backend services. |
| | 19 | |
| | 20 | == 2. Network Architecture == |
| | 21 | |
| | 22 | === IP Address Ranges === |
| | 23 | |
| | 24 | ||= Site =||= Range =||= Notes =|| |
| | 25 | || North Brunswick (Rutgers) || {{{10.0.0.0 -- 10.63.255.255}}} || Primary ORBIT infrastructure || |
| | 26 | || New York City (Columbia) || {{{10.64.0.0 -- 10.127.255.255}}} || Primary COSMOS infrastructure || |
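Each range in the table is exactly one /10 block, so you can check which site an address belongs to with Ruby's standard {{{ipaddr}}} library. This is a minimal sketch. The site labels and the {{{site_for}}} helper are illustrative and are not part of any OMF service.

```ruby
require "ipaddr"

# Site address blocks from the table above. Each listed range is exactly one /10:
# 10.0.0.0/10 covers 10.0.0.0-10.63.255.255 and 10.64.0.0/10 covers 10.64.0.0-10.127.255.255.
SITE_BLOCKS = {
  north_brunswick: IPAddr.new("10.0.0.0/10"),
  nyc:             IPAddr.new("10.64.0.0/10")
}.freeze

# Returns the site whose block contains +address+, or nil if neither block does.
def site_for(address)
  ip = IPAddr.new(address)
  site, _block = SITE_BLOCKS.find { |_site, block| block.include?(ip) }
  site
end

p site_for("10.50.0.41")  # am1
p site_for("10.64.12.3")
p site_for("10.250.0.8")  # mgmt1
```

The mgmt1/mgmt2 addresses listed later (10.250.0.x) fall outside both blocks, so {{{site_for}}} returns nil for them.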
| | 27 | |
| | 28 | === VLAN Structure === |
| | 29 | |
| | 30 | * '''Management VLAN''' -- Infrastructure servers, out-of-band management |
| | 31 | * '''Control VLAN''' (per domain) -- Service-to-node communication, PXE boot |
| | 32 | * '''Data VLAN''' (per domain) -- Experiment data traffic between testbed nodes |
| | 33 | * '''IP tunnel''' connecting the North Brunswick and NYC sites |
| | 34 | |
| | 35 | === DNS Domains === |
| | 36 | |
| | 37 | * {{{orbit-lab.org}}} -- ORBIT nodes and infrastructure |
| | 38 | * {{{cosmos-lab.org}}} -- COSMOS nodes and infrastructure |
| | 39 | |
| | 40 | DNS is served by BIND9 on mgmt1/mgmt2 with 2,563 forward A records across 25 zones (11 ORBIT + 14 COSMOS) and 53 reverse zones. |
| | 41 | |
| | 42 | == 3. North Brunswick Hosts == |
| | 43 | |
| | 44 | === am1 (10.50.0.41) -- Aggregate Manager / RF Services === |
| | 45 | |
| | 46 | ||= Service =||= Port =||= Version =||= Description =|| |
| | 47 | || omf-agg-mgr-proxy || 5054 || -- || Service discovery proxy (routes API calls to backend services) || |
| | 48 | || omf-rf-control || 5001 || v0-2 || RF signal generator control || |
| | 49 | || omf-rf-switch || 5002 || v0-2 || RF switch matrix control || |
| | 50 | || omf-xy-table || 5003 || -- || XY table positioning service || |
| | 51 | || omf-array-mgmt || 5004 || -- || Antenna array management ('''Python/Flask''') || |
| | 52 | |
| | 53 | '''Platform:''' Ubuntu 18.04, RVM Ruby 3.2.3, Python 3.6.9[[BR]] |
| | 54 | '''Note:''' Native extensions require a {{{libruby-3.2}}} symlink followed by an {{{ldconfig}}} run |
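The note does not record which path the symlink points to on am1. The following diagnostic is not an OMF tool. It reports what the running RVM Ruby's {{{RbConfig}}} says about its shared library, so you can compare that against the symlink before and after running {{{ldconfig}}}. The {{{libruby_report}}} helper is hypothetical.

```ruby
require "rbconfig"

# Summarizes where the running Ruby expects its shared library. When Ruby is
# built with --enable-shared, native extensions link against this file, so the
# dynamic loader must be able to find it.
def libruby_report
  cfg = RbConfig::CONFIG
  path = File.join(cfg["libdir"], cfg["LIBRUBY_SO"])
  {
    shared: cfg["ENABLE_SHARED"] == "yes", # built as a shared library?
    libdir: cfg["libdir"],
    libruby_so: cfg["LIBRUBY_SO"],
    present: File.exist?(path)             # does the expected file exist in libdir?
  }
end

libruby_report.each { |key, value| puts "#{key}: #{value}" }
```

If an extension still fails to load, running {{{ldd}}} on its {{{.so}}} file shows which libruby name the loader is actually looking for.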
| | 55 | |
| | 56 | === am4 -- Development Server === |
| | 57 | |
| | 58 | Development machine only; all source repositories live under {{{/home/seskar/omf-*}}}. It does not host production services. |
| | 59 | |
| | 60 | === am5 (10.50.0.45) -- Core Services === |
| | 61 | |
| | 62 | ||= Service =||= Port =||= Version =||= Description =|| |
| | 63 | || omf-cmc || 5013 || v1-8 || Chassis management controller (power control via IPMI, HTTP CM, SNMP PDU) || |
| | 64 | || omf-scheduler || 5016 || -- || Reservation scheduler and auto-approver (ActiveRecord + MySQL) || |
| | 65 | || omf-rfmatrix || 5020 || v1-1 || RF matrix switch control || |
| | 66 | || omf-status || 5021 || -- || Testbed status aggregation || |
| | 67 | |
| | 68 | === repository2 (10.50.0.22) -- Image & Account Services === |
| | 69 | |
| | 70 | ||= Service =||= Port =||= Version =||= Description =|| |
| | 71 | || omf-account-mgmt || 5017 || v1-2 || User/group registration, approval, LDAP lifecycle (ActiveRecord + MySQL) || |
| | 72 | || omf-frisbee || 5011 || v1-3 || Frisbee daemon management for disk image multicasting || |
| | 73 | || omf-pxe || 5010 || v1-5 || PXE boot configuration (aggregate manager) || |
| | 74 | || omf-saveimage || 5012 || v1-3 || Disk image save via netcat receiver || |
| | 75 | || omf-user-stats || 5015 || -- || User disk usage and scheduler usage statistics || |
| | 76 | |
| | 77 | === Infrastructure Servers === |
| | 78 | |
| | 79 | ||= Host =||= IP =||= Role =|| |
| | 80 | || mgmt1 || 10.250.0.8 || Primary DHCP (ISC DHCP4, 2,145 static hosts) + Primary DNS (BIND9, 2,563 A records) || |
| | 81 | || mgmt2 || 10.250.0.9 || DHCP failover peer + DNS slave || |
| | 82 | || db1 || 10.0.0.51 || LibreNMS monitoring (190 devices, SNMP polling every 5 min) || |
| | 83 | || mysql1 || -- || Shared MySQL server for scheduler, account-mgmt, user-stats || |
| | 84 | || amqp.orbit-lab.org || -- || RabbitMQ MQTT broker (MQTT 1883, WebSocket 15675) || |
| | 85 | || web1 || -- || cosmos-portal (React SPA), Apache reverse proxy || |
| | 86 | || gitlab.orbit-lab.org || 10.50.0.20 || GitLab (24 repos under {{{orbit/}}}) || |
| | 87 | || ldap1.orbit-lab.org || -- || Primary OpenLDAP server (port 389) || |
| | 88 | || ldap2.orbit-lab.org || -- || Secondary OpenLDAP server (port 389) || |
| | 89 | |
| | 90 | === ORBIT Console Servers (9 hosts) === |
| | 91 | |
| | 92 | All consoles run '''omf-cmonitor''' on port 5000; grid, sb1, sb2 and sb3 also run '''omf-expctl'''.[[BR]] |
| | 93 | '''Platform:''' Ubuntu 16.04, RVM Ruby 3.2.3, omf-cmonitor v1-1 |
| | 94 | |
| | 95 | ||= Console =||= omf-cmonitor =||= omf-expctl =||= Notes =|| |
| | 96 | || grid.orbit-lab.org || Yes || Yes (v1-19) || Main 20x20 grid || |
| | 97 | || sb1.orbit-lab.org || Yes || Yes (v1-19) || Sandbox 1 || |
| | 98 | || sb2.orbit-lab.org || Yes || Yes (v1-19) || Sandbox 2 || |
| | 99 | || sb3.orbit-lab.org || Yes || Yes (v1-19) || Sandbox 3 || |
| | 100 | || sb4.orbit-lab.org || Yes || -- || Sandbox 4 || |
| | 101 | || sb7.orbit-lab.org || Yes || -- || Sandbox 7 || |
| | 102 | || sb9.orbit-lab.org || Yes || -- || Sandbox 9 || |
| | 103 | || outdoor.orbit-lab.org || Yes || -- || Outdoor testbed || |
| | 104 | || instrument.orbit-lab.org || Yes || -- || Instrument cluster || |
| | 105 | |
| | 106 | === Unreachable Consoles === |
| | 107 | |
| | 108 | The following consoles were unreachable as of the last update: {{{vgrid1-4.orbit-lab.org}}} and {{{instrument.cosmos-lab.org}}}. |
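You can check whether a console is reachable by attempting a TCP connection to its omf-cmonitor port (5000). This is a minimal sketch using Ruby's {{{Socket.tcp}}} with a connect timeout. The helper name and the 2-second default are assumptions.

```ruby
require "socket"

CMONITOR_PORT = 5000 # omf-cmonitor port on every console

# True if a TCP connection to host:port succeeds within +timeout+ seconds.
# Refused connections, timeouts and DNS failures all count as unreachable.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true } # socket is closed after the block
rescue SystemCallError, SocketError, IOError
  false
end
```

Calling {{{port_open?}}} with each unreachable console's hostname and {{{CMONITOR_PORT}}} returns false until the host answers again. An open port only shows that something is listening, not that omf-cmonitor is healthy.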
| | 109 | |
| | 110 | == 4. New York City Hosts == |
| | 111 | |
| | 112 | === COSMOS Console Servers (9 hosts) === |
| | 113 | |
| | 114 | All consoles run '''omf-cmonitor''' on port 5000; sb1 also runs '''omf-expctl'''.[[BR]] |
| | 115 | '''Platform:''' Ubuntu 16.04, RVM Ruby 3.2.3, omf-cmonitor v1-1 |
| | 116 | |
| | 117 | ||= Console =||= omf-cmonitor =||= omf-expctl =||= Notes =|| |
| | 118 | || osc.cosmos-lab.org || Yes || -- || Open-access sandbox || |
| | 119 | || indigo.cosmos-lab.org || Yes || -- || || |
| | 120 | || accord.cosmos-lab.org || Yes || -- || || |
| | 121 | || sb1.cosmos-lab.org || Yes || Yes (v1-19) || Sandbox 1 || |
| | 122 | || sb2.cosmos-lab.org || Yes || -- || Sandbox 2 || |
| | 123 | || weeks.cosmos-lab.org || Yes || -- || || |
| | 124 | || rrail.cosmos-lab.org || Yes || -- || || |
| | 125 | || bed.cosmos-lab.org || Yes || -- || || |
| | 126 | || nebula.cosmos-lab.org || Yes || -- || || |
| | 127 | |
| | 128 | === COSMOS Raspberry Pis === |
| | 129 | |
| | 130 | ||= Host =||= IP =||= Services =|| |
| | 131 | || pi1-auden.sb1.cosmos-lab.org || 10.37.25.15 || omf-cosmos-cm (5018, v1-1), omf-auden (5019, v1-1) || |
| | 132 | || pi2-auden.sb1.cosmos-lab.org || 10.37.25.16 || omf-cosmos-cm (5018, v1-1), omf-auden (5019, v1-1) || |
| | 133 | |
| | 134 | === XY Table Controllers === |
| | 135 | |
| | 136 | ||= Host =||= IP =||= Service =||= Notes =|| |
| | 137 | || xytable1 || 10.1.37.221 || omf-xytable-ctrl (port 80) || Raspberry Pi, sb1.cosmos-lab.org || |
| | 138 | || xytable2 || 10.1.37.222 || omf-xytable-ctrl (port 80) || Raspberry Pi, sb1.cosmos-lab.org || |
| | 139 | |
| | 140 | Each controller publishes position telemetry over MQTT to {{{xy/<fqdn>/position}}} every 200 ms through the broker at {{{amqp.orbit-lab.org:1883}}}. |
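A client that follows every XY table can subscribe with a single-level wildcard filter such as {{{xy/+/position}}}. The sketch below builds the topic for one controller and implements MQTT's topic-filter matching rules in plain Ruby, which helps when dispatching messages after subscribing. It assumes the FQDN contains no {{{/}}}; the exact FQDN form, e.g. {{{xytable1.sb1.cosmos-lab.org}}}, is an assumption. It also omits the spec's special handling of topics that begin with {{{$}}}.

```ruby
# Topic a controller publishes to, per the convention above.
def position_topic(fqdn)
  raise ArgumentError, "FQDN must not contain '/'" if fqdn.include?("/")
  "xy/#{fqdn}/position"
end

# MQTT topic-filter matching: "+" matches exactly one level; "#" matches the
# parent level plus any number of further levels. The filter is assumed to be
# well-formed ("#" only as the last level); "$"-topic rules are not implemented.
def topic_matches?(filter, topic)
  f = filter.split("/", -1)
  t = topic.split("/", -1)
  f.each_with_index do |level, i|
    return true if level == "#"
    return false if i >= t.length
    return false unless level == "+" || level == t[i]
  end
  f.length == t.length
end
```

At one message every 200 ms, each controller produces 5 messages per second. A {{{xy/+/position}}} subscription covering both tables therefore receives about 10 per second.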
| | 141 | |
| | 142 | == 5. Shared Infrastructure == |
| | 143 | |
| | 144 | ||= Component =||= Host =||= Details =|| |
| | 145 | || cosmos-portal || web1 || React 18 + Vite SPA, Tailwind CSS, static files served by Apache || |
| | 146 | || GitLab || gitlab.orbit-lab.org (10.50.0.20) || 24 repos under {{{orbit/}}} namespace || |
| | 147 | || NetBox || 10.50.0.93 || v2.9.10 (needs upgrade to 4.x), 290 devices, data from 2020-2021 || |
| | 148 | || Proxmox (ORBIT) || mgmt-vmhost1..5 || 5x Dell R740 (48 cores, 187GB each), 98 VMs (57 running), Ceph RBD + NFS || |
| | 149 | || Proxmox (COSMOS) || mgmt-vmhost1..5-co1 || 3x R430 + 2x R740, 13 VMs (8 running) || |
| | 150 | |
| | 151 | == 6. Service Dependency Map == |
| | 152 | |
| | 153 | This section documents which services call which other services. |
| | 154 | |
| | 155 | === omf-expctl (Experiment Controller CLI) === |
| | 156 | |
| | 157 | All calls are routed through the AM proxy at {{{am1:5054}}}. |
| | 158 | |
| | 159 | * {{{omf-expctl}}} -> {{{omf-cmc}}} -- Power control (on/off/reset nodes) |
| | 160 | * {{{omf-expctl}}} -> {{{omf-pxe}}} -- PXE boot setup (set boot image) |
| | 161 | * {{{omf-expctl}}} -> {{{omf-frisbee}}} -- Disk imaging (load images onto nodes) |
| | 162 | * {{{omf-expctl}}} -> {{{omf-saveimage}}} -- Save disk images from nodes |
| | 163 | * {{{omf-expctl}}} -> {{{omf-scheduler}}} -- Permission/reservation check |
| | 164 | |
| | 165 | === Inter-Service Dependencies === |
| | 166 | |
| | 167 | ||= Caller =||= Callee =||= Purpose =||= Via =|| |
| | 168 | || omf-cmc || omf-cmonitor || Wake-on-LAN packet generation || Direct HTTP (CM_wolurl) || |
| | 169 | || omf-auden || omf-rf-control || RF signal generator setup || Direct (should use AM proxy) || |
| | 170 | || omf-status || omf-cmc || Node power state || Direct || |
| | 171 | || omf-status || omf-scheduler || Reservation info || Direct || |
| | 172 | || omf-status || omf-frisbee || Imaging status || Direct || |
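The table gives Wake-on-LAN packet generation as the reason omf-cmc calls omf-cmonitor. A standard WoL magic packet is 6 bytes of 0xFF followed by the target MAC address repeated 16 times. The sketch below builds and broadcasts one. It illustrates the packet format only and is not omf-cmonitor's code. The broadcast address and UDP port 9 are common defaults assumed here.

```ruby
require "socket"

# Builds a WoL magic packet: 6 x 0xFF, then the 6-byte MAC repeated 16 times (102 bytes).
def magic_packet(mac)
  hex = mac.gsub(/[:-]/, "")
  raise ArgumentError, "invalid MAC: #{mac}" unless hex.match?(/\A\h{12}\z/)
  ("\xFF".b * 6) + ([hex].pack("H*") * 16)
end

# Sends the packet as a UDP broadcast (assumed defaults: limited broadcast, port 9).
def send_wol(mac, broadcast: "255.255.255.255", port: 9)
  sock = UDPSocket.new
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_BROADCAST, true)
  sock.send(magic_packet(mac), 0, broadcast, port)
ensure
  sock&.close
end
```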
| | 173 | |
| | 174 | === External Dependencies === |
| | 175 | |
| | 176 | ||= Service =||= External System =||= Purpose =|| |
| | 177 | || omf-scheduler || LDAP (ldap1/ldap2) || Host attribute management (LdapHostManager) || |
| | 178 | || omf-scheduler || MySQL || Reservation persistence || |
| | 179 | || omf-scheduler || SMTP (mail.orbit-lab.org:25) || Reservation notifications || |
| | 180 | || omf-account-mgmt || LDAP (ldap1/ldap2) || User/group lifecycle management || |
| | 181 | || omf-account-mgmt || MySQL || Account persistence || |
| | 182 | || omf-account-mgmt || SMTP (mail.orbit-lab.org:25) || Account notifications || |
| | 183 | || omf-user-stats || MySQL (multiple databases) || Usage data aggregation || |
| | 184 | || omf-user-stats || LDAP || User lookups || |
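Together, the three dependency lists form a directed graph. A topological sort of that graph gives one possible order for bringing services back after maintenance, with dependencies first; Ruby's standard {{{tsort}}} library provides one. In this sketch, external systems are collapsed into the hypothetical node names {{{mysql}}}, {{{ldap}}} and {{{smtp}}}. The AM proxy, through which omf-expctl's calls pass, is left out.

```ruby
require "tsort"

# Caller => callees, transcribed from the omf-expctl, inter-service and
# external dependency lists above.
DEPENDS_ON = {
  "omf-expctl"       => %w[omf-cmc omf-pxe omf-frisbee omf-saveimage omf-scheduler],
  "omf-cmc"          => %w[omf-cmonitor],
  "omf-auden"        => %w[omf-rf-control],
  "omf-status"       => %w[omf-cmc omf-scheduler omf-frisbee],
  "omf-scheduler"    => %w[ldap mysql smtp],
  "omf-account-mgmt" => %w[ldap mysql smtp],
  "omf-user-stats"   => %w[mysql ldap]
}.freeze

class DependencyGraph
  include TSort

  def initialize(edges)
    @edges = edges
  end

  # Every node, including ones that only appear as callees.
  def tsort_each_node(&block)
    (@edges.keys | @edges.values.flatten).each(&block)
  end

  def tsort_each_child(node, &block)
    @edges.fetch(node, []).each(&block)
  end
end

# TSort#tsort lists children (dependencies) before their parents.
ORDER = DependencyGraph.new(DEPENDS_ON).tsort
puts ORDER
```

{{{tsort}}} raises {{{TSort::Cyclic}}} if a dependency cycle is ever introduced, which makes it a cheap consistency check on the map.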
| | 185 | |
| | 186 | === cosmos-portal (Web UI) === |
| | 187 | |
| | 188 | All API calls are proxied through Apache on web1: |
| | 189 | |
| | 190 | ||= Portal Route =||= Backend =||= Service =|| |
| | 191 | || {{{/account/*}}} || repository2:5017 || omf-account-mgmt || |
| | 192 | || {{{/scheduler/*}}} || am5:5016 || omf-scheduler || |
| | 193 | || {{{/rfmatrix/*}}} || am5:5020 || omf-rfmatrix || |
| | 194 | || {{{/status/*}}} || am5:5021 || omf-status || |
| | 195 | || {{{/inventory/*}}} || am5:5012 || omf-newinventory (legacy) || |
| | 196 | || {{{/user-stats/*}}} || repository2:5015 || omf-user-stats || |
| | 197 | || {{{/mqtt/ws}}} || amqp.orbit-lab.org:15675/ws || RabbitMQ WebSocket || |
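You can mirror the route table in a small lookup to answer which backend serves a given portal URL. This sketch applies longest-prefix matching over the prefixes above. It does not reproduce web1's actual Apache directives, which are not documented here, and it does not model any path rewriting. The {{{backend_for}}} helper is hypothetical.

```ruby
# Portal route prefix => [backend, service], from the table above.
PORTAL_ROUTES = {
  "/account/"    => ["repository2:5017", "omf-account-mgmt"],
  "/scheduler/"  => ["am5:5016", "omf-scheduler"],
  "/rfmatrix/"   => ["am5:5020", "omf-rfmatrix"],
  "/status/"     => ["am5:5021", "omf-status"],
  "/inventory/"  => ["am5:5012", "omf-newinventory"],
  "/user-stats/" => ["repository2:5015", "omf-user-stats"],
  "/mqtt/ws"     => ["amqp.orbit-lab.org:15675/ws", "RabbitMQ WebSocket"]
}.freeze

# Returns [backend, service] for the longest matching prefix, or nil if unrouted.
def backend_for(path)
  prefix = PORTAL_ROUTES.keys.select { |p| path.start_with?(p) }.max_by(&:length)
  prefix && PORTAL_ROUTES[prefix]
end
```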
| | 198 | |
| | 199 | == 7. Port Registry == |
| | 200 | |
| | 201 | ||= Port =||= Service =||= Deployment =|| |
| | 202 | || 5000 || omf-cmonitor || Console servers (18 hosts: 9 ORBIT + 9 COSMOS) || |
| | 203 | || 5001 || omf-rf-control || am1 || |
| | 204 | || 5002 || omf-rf-switch || am1 || |
| | 205 | || 5003 || omf-xy-table || am1 || |
| | 206 | || 5004 || omf-array-mgmt || am1 || |
| | 207 | || 5010 || omf-pxe || repository2 || |
| | 208 | || 5011 || omf-frisbee || repository2 || |
| | 209 | || 5012 || omf-saveimage (the legacy omf-newinventory portal backend also uses 5012, on am5) || repository2 || |
| | 210 | || 5013 || omf-cmc || am5 || |
| | 211 | || 5015 || omf-user-stats || repository2 || |
| | 212 | || 5016 || omf-scheduler || am5 || |
| | 213 | || 5017 || omf-account-mgmt || repository2 || |
| | 214 | || 5018 || omf-cosmos-cm || COSMOS Pis || |
| | 215 | || 5019 || omf-auden || COSMOS Pis || |
| | 216 | || 5020 || omf-rfmatrix || am5 || |
| | 217 | || 5021 || omf-status || am5 || |
| | 218 | || 5054 || omf-agg-mgr-proxy || am1 || |
| | 219 | |
| | 220 | '''Next available port: 5022''' |
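A machine-readable copy of the registry makes allocations easy to check. In the sketch below, the next port is one past the highest port in the sequential block, which matches the 5022 stated above. Two assumptions are inferred from that statement: 5054 sits outside the block, and gaps such as 5005-5009 and 5014 are not reused. The helper names are hypothetical.

```ruby
# Port => [service, deployment], transcribed from the registry table above.
PORT_REGISTRY = {
  5000 => %w[omf-cmonitor consoles],
  5001 => %w[omf-rf-control am1],
  5002 => %w[omf-rf-switch am1],
  5003 => %w[omf-xy-table am1],
  5004 => %w[omf-array-mgmt am1],
  5010 => %w[omf-pxe repository2],
  5011 => %w[omf-frisbee repository2],
  5012 => %w[omf-saveimage repository2],
  5013 => %w[omf-cmc am5],
  5015 => %w[omf-user-stats repository2],
  5016 => %w[omf-scheduler am5],
  5017 => %w[omf-account-mgmt repository2],
  5018 => %w[omf-cosmos-cm cosmos-pis],
  5019 => %w[omf-auden cosmos-pis],
  5020 => %w[omf-rfmatrix am5],
  5021 => %w[omf-status am5],
  5054 => %w[omf-agg-mgr-proxy am1]
}.freeze

PROXY_PORT = 5054 # AM proxy, treated as outside the sequential block

# One past the highest port in the sequential block; gaps are not reused.
def next_available_port(registry = PORT_REGISTRY)
  (registry.keys - [PROXY_PORT]).max + 1
end

# Port registered for +service+, or nil if it is not listed.
def port_for(service, registry = PORT_REGISTRY)
  registry.find { |_port, (name, _deployment)| name == service }&.first
end
```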
| | 221 | |
| | 222 | == 8. Technology Stack == |
| | 223 | |
| | 224 | === Backend === |
| | 225 | |
| | 226 | * '''Ruby 3.2.x''' -- Primary language for all microservices except omf-array-mgmt |
| | 227 | * '''Sinatra''' (v4.x) -- Web framework (all Ruby services inherit from the {{{OMFService}}} base class) |
| | 228 | * '''Puma''' -- Application server |
| | 229 | * '''ActiveRecord''' -- ORM for MySQL-backed services (scheduler, account-mgmt, user-stats) |
| | 230 | * '''Ox''' -- Fast XML parser/generator for OMF XML responses |
| | 231 | * '''Python 3.x / Flask''' -- omf-array-mgmt only |
| | 232 | * '''sinatra-param''' -- Request parameter validation via DSL |
| | 233 | |
| | 234 | === Data Stores === |
| | 235 | |
| | 236 | * '''MySQL / MariaDB''' -- Persistence for scheduler, account-mgmt, user-stats, rfmatrix |
| | 237 | * '''OpenLDAP''' -- User/group directory (ldap1/ldap2, port 389) |
| | 238 | * '''RabbitMQ''' -- MQTT broker for node communication and XY table telemetry |
| | 239 | |
| | 240 | === Frontend === |
| | 241 | |
| | 242 | * '''React 18''' -- cosmos-portal SPA |
| | 243 | * '''Vite''' -- Build tool |
| | 244 | * '''Tailwind CSS''' -- Styling |
| | 245 | * '''Apache''' -- Static file serving + reverse proxy on web1 |
| | 246 | |
| | 247 | === Operations === |
| | 248 | |
| | 249 | * '''Debian packaging''' via FPM ({{{make deb}}}) |
| | 250 | * '''systemd''' service units (auto-enabled on package install) |
| | 251 | * '''Prometheus''' metrics at {{{/metrics}}} on all services |
| | 252 | * '''LibreNMS''' -- Network device monitoring (190 devices via SNMP) |
| | 253 | * '''Git submodules''' -- {{{omf-common}}} (shared framework), {{{omf-logging-db}}} (database helpers), {{{omf-ldap}}} (LDAP helpers) |
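Because every service exposes Prometheus metrics at {{{/metrics}}}, a quick check needs only an HTTP GET and a parser for the text exposition format. This sketch uses only the standard library. It handles the common subset: comment lines, optional labels, the special values {{{NaN}}}/{{{+Inf}}}/{{{-Inf}}}, and an optional timestamp. The metric names each OMFService exports are not listed here, so the sketch assumes none.

```ruby
require "net/http"
require "uri"

SPECIAL_VALUES = { "NaN" => Float::NAN, "+Inf" => Float::INFINITY, "-Inf" => -Float::INFINITY }.freeze

# Parses Prometheus text format into { "name{labels}" => Float }. Lines starting
# with "#" (HELP/TYPE) are skipped; an optional trailing timestamp is ignored.
def parse_metrics(body)
  body.each_line.with_object({}) do |raw, out|
    line = raw.strip
    next if line.empty? || line.start_with?("#")
    m = line.match(/\A(\S+?(?:\{.*\})?)\s+(\S+)(?:\s+-?\d+)?\z/)
    next unless m
    out[m[1]] = SPECIAL_VALUES.fetch(m[2]) { Float(m[2]) }
  end
end

# Fetches and parses the /metrics endpoint of one service.
def fetch_metrics(host, port)
  parse_metrics(Net::HTTP.get(URI("http://#{host}:#{port}/metrics")))
end
```

For example, {{{fetch_metrics("am5", 5016)}}} would read omf-scheduler's metrics, using the host and port from the registry.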
| | 254 | |
| | 255 | === Configuration === |
| | 256 | |
| | 257 | Configuration files are merged in the following order; later files override earlier ones: |
| | 258 | |
| | 259 | 1. {{{default/config.yml}}} -- Built-in defaults (shipped with package) |
| | 260 | 2. {{{/etc/omf-services/config.yml}}} -- Global settings |
| | 261 | 3. {{{/etc/omf-services/<service>.yml}}} -- Service-specific (e.g., {{{cmonitor.yml}}}) |
| | 262 | 4. {{{./config.yml}}} -- Development override (not installed in production) |
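You can sketch the layering as a recursive merge of YAML hashes, applied in the order listed. The document does not say whether omf-common merges nested keys recursively or only at the top level. This sketch assumes a deep merge, and the {{{load_config}}} signature (including the {{{dev_dir}}} parameter) is hypothetical.

```ruby
require "yaml"

# Merges +override+ into +base+: nested hashes are merged key by key,
# any other value in +override+ replaces the one in +base+.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    old_value.is_a?(Hash) && new_value.is_a?(Hash) ? deep_merge(old_value, new_value) : new_value
  end
end

# Loads the four layers in precedence order (later wins), skipping missing files.
def load_config(service, package_dir: ".", etc_dir: "/etc/omf-services", dev_dir: Dir.pwd)
  layers = [
    File.join(package_dir, "default", "config.yml"), # 1. built-in defaults
    File.join(etc_dir, "config.yml"),                # 2. global settings
    File.join(etc_dir, "#{service}.yml"),            # 3. service-specific
    File.join(dev_dir, "config.yml")                 # 4. development override
  ]
  layers.select { |path| File.exist?(path) }
        .map { |path| YAML.safe_load_file(path) || {} } # empty file => {}
        .reduce({}) { |merged, layer| deep_merge(merged, layer) }
end
```

Under this assumption, a service-specific file only needs the keys it changes. Nested sections are merged key by key rather than replaced wholesale.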
| | 263 | |
| | 264 | === PXE Boot Images === |
| | 265 | |
| | 266 | * '''omf-5.8''' (current) -- Alpine 3.23, kernel 6.18 LTS, Ruby 3.4.8, MQTT-based RC, 97MB initfs |
| | 267 | * '''omf-5.7''' (legacy) -- Alpine, kernel 5.15 LTS, Ruby 3.1, XMPP-based RC, 41MB initfs |
| | 268 | * PXE images served from {{{root@repository2:/tftpboot/}}} |