This document provides a comprehensive technical overview of the VisionixAI system, a command-line tool designed for real-time pose detection and zone-based activity analysis from video files.
VisionixAI is a hybrid system utilizing a Node.js Command Line Interface (CLI) to orchestrate a powerful Python-based machine learning core. Its primary function is to process a video stream, identify human poses within it, and track which zones of a predefined grid are occupied. The system reports the active/inactive status of each zone in real-time, making it suitable for applications like security monitoring, retail analytics, or interactive installations.
Core Features:
- CLI-Driven Operation: A simple and scriptable command-line interface (`visionix`) for initiating analysis and setup.
- Automated Environment Setup: A one-time `setup-ml` command creates a dedicated Python virtual environment and installs all necessary dependencies.
- Pose Detection: Leverages Google's MediaPipe library to accurately detect and track 33 body landmarks in real time.
- Grid-Based Zone Tracking: Divides the video frame into a configurable grid (e.g., 3x3) and monitors human presence within each zone.
- Real-time Activity Triggering: Reports the status of each zone as "ON" (active) or "OFF" (inactive) based on a configurable timeout, providing a simple mechanism for triggering external events.
- Visual Feedback: Renders the video with an overlay showing the grid and detected pose landmarks, and provides console output for zone status.
Technology Stack:
- CLI & Orchestration: Node.js, `child_process`
- ML & Video Processing: Python 3
- Computer Vision: OpenCV (`cv2`)
- Pose Estimation: MediaPipe
The system is composed of two main parts: a Node.js CLI layer and a Python ML Core. The CLI acts as the user-facing entry point and process manager, while the Python core handles all the heavy lifting of video processing and machine learning.
- Node.js CLI Layer (`cli/`): This layer is responsible for parsing user commands, validating inputs (like file paths), and spawning the Python script as a separate process. It bridges the gap between the user's shell and the ML logic.
- Python ML Core (`ml-core/`): This self-contained module performs the video analysis. It reads the video file, processes each frame, runs the pose detection model, calculates zone activity, and prints the results to standard output.
The following diagram illustrates the high-level system architecture, including planned integrations, from the moment a user initiates an analysis.
```mermaid
flowchart TD
%% STYLE DEFINITIONS
classDef input fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000;
classDef process fill:#fff3e0,stroke:#fb8c00,stroke-width:2px,color:#000;
classDef decision fill:#ffebee,stroke:#e53935,stroke-width:2px,color:#000;
classDef output fill:#e8f5e9,stroke:#43a047,stroke-width:2px,color:#000;
classDef user fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#000;
classDef tech fill:#ede7f6,stroke:#512da8,stroke-width:2px,color:#000;
classDef log fill:#f5f5f5,stroke:#9e9e9e,stroke-width:2px,color:#000;
%% SIMULATED STICK FIGURE USER (limited to flowchart)
U0(["👤 User"]):::user
%% USERS
subgraph UserRoles
A1["Admin / Developer"]:::user
A2["Classroom / Office Users"]:::user
end
%% INPUT MODULE
U0 --> B1["Configure Zones via CLI"]:::input
U0 --> B2["Start/Stop Camera Stream"]:::input
B1 --> C1["Grid Mapping Module"]:::process
B2 --> C2["Camera Input (Webcam/IP)"]:::input
%% PROCESSING MODULE
C2 --> D1["YOLOv8 Human Detection"]:::process
D1 --> D2["Centroid Tracker"]:::process
D2 --> D3["Zone Mapper"]:::process
D3 --> D4["Zone Timer Logic"]:::process
%% DECISION LOGIC
D4 --> E1{"Zone Occupied > 10s?"}:::decision
E1 -- Yes --> F1["Emit ON Signal"]:::output
E1 -- No --> F2{"Zone Empty > 10s?"}:::decision
F2 -- Yes --> F3["Emit OFF Signal"]:::output
%% DEVICE CONTROL
F1 --> G1["Device Controller / API"]:::output
F3 --> G1
G1 --> G2["Lights / Fans ON/OFF"]:::output
%% LOGGING & MONITORING
D4 --> H1["Zone Occupancy Logs"]:::log
H1 --> I1["CLI Status Display"]:::log
H1 --> I2["Web Dashboard (Future)"]:::log
%% FUTURE INTEGRATION
G1 --> J1["MQTT / Socket.IO"]:::tech
H1 --> J2["Flask / FastAPI Layer"]:::tech
%% USER FEEDBACK LOOP
I1 --> A1
G2 --> A2
%% TECH STACK
subgraph Technologies
T1["Python + OpenCV"]:::tech
T2["YOLOv8 (Ultralytics)"]:::tech
T3["Tkinter (CLI)"]:::tech
T4["Flask / FastAPI"]:::tech
T5["MQTT / Socket.IO"]:::tech
end
T1 --> D1
T2 --> D1
T3 --> I1
T4 --> J2
T5 --> J1
```
Component Breakdown:
| File Path | Language | Role |
|---|---|---|
| `cli/bin/visionix.js` | JS | Main CLI Entry Point. Parses `analyze` and `setup-ml` commands. Spawns the Python process. |
| `cli/bin/scripts/setup-ml.js` | JS | ML Environment Setup. Creates a Python virtual environment and installs dependencies via pip. |
| `cli/lib/runner.js` | JS | A utility module for running the Python analysis script. Provides path resolution and spawning logic. |
| `cli/ml-core/start.py` | Python | Python Bootstrapper. Receives the video path from the CLI and calls the main processing function. |
| `cli/ml-core/app.py` | Python | Core ML Logic. Contains all video processing, pose detection, and zone tracking algorithms. |
This section details the primary operational flows of the VisionixAI system.
Before any analysis can be run, the Python environment must be prepared. This is a one-time operation.
Execution Steps:
- The user runs the command `visionix setup-ml` in their terminal.
- The `visionix.js` script identifies the `setup-ml` command.
- It then `require`s and executes the `cli/bin/scripts/setup-ml.js` script.
- `setup-ml.js` performs two main actions using `child_process.execSync`:
  a. It creates a Python virtual environment at `cli/.visionix-venv`.
  b. It uses the environment's `pip` to install the packages listed in `cli/ml-core/requirements.txt`.
- Output from the setup process is streamed directly to the user's terminal.
```js
// File: cli/bin/scripts/setup-ml.js
// Creates the virtual environment
execSync(`python3 -m venv ${venvDir}`, { stdio: 'inherit' });
// Installs dependencies from requirements.txt
execSync(`${venvDir}/bin/pip install -r ${requirements}`, { stdio: 'inherit' });
```

This is the primary workflow for using the system.
Execution Sequence:
The following diagram illustrates the sequence of events when a user runs the analyze command.
```mermaid
%%{init: {"theme": "base", "themeVariables": {
"actorTextColor": "#ffffff",
"actorBorder": "#ff9f1c",
"actorBackground": "#ff9f1c",
"participantTextColor": "#ffffff",
"participantBackground": "#3a86ff",
"participantBorder": "#3a86ff",
"sequenceNumberColor": "#ff006e",
"primaryColor": "#8338ec",
"primaryTextColor": "#ffffff",
"tertiaryColor": "#ffbe0b",
"tertiaryTextColor": "#000000"
}}}%%
sequenceDiagram
actor User
participant CLI as CLI (visionix.js)
participant Python as Python (start.py)
participant Core as Core (app.py)
User->>CLI: `visionix analyze ./path/to/video.mp4`
CLI->>CLI: Parse `'analyze'` command and video path
CLI->>CLI: Validate video file existence
CLI->>Python: `spawn('python3', ['-u', scriptPath, videoPath])`
Python->>Python: Receive `videoPath` from `sys.argv`
Python->>Core: `run_stream(videoPath)`
Core->>Core: Open video with OpenCV
loop For each frame
Core->>Core: Read frame
Core->>Core: Detect pose with MediaPipe
Core->>Core: Calculate active zones
Core->>Core: Update `zone_last_seen` timestamps
Core->>Core: Check for inactive zones via `UNSEEN_TIMEOUT`
Core->>User: `print(f"Zone {zone_id} active...")`
Core->>User: `print(f"Zone {zone_id} inactive...")`
end
Core-->>Python: Loop ends (video finished)
Python-->>CLI: Process exits
CLI-->>User: Process exited with code `0`
```
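Based on this sequence, the bootstrapper itself is very small. A minimal sketch of what `start.py` plausibly contains follows; the exact contents of the real file may differ:

```python
# File: cli/ml-core/start.py (sketch; the actual file may differ)
import sys

from app import run_stream  # assumes run_stream() is defined in app.py

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python start.py <video_path>")
        sys.exit(1)
    run_stream(sys.argv[1])  # video path passed by the Node.js CLI via spawn()
```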
Data Flow:
- Input: The absolute path to a video file is passed as a command-line argument from `visionix.js` to `start.py`.
- Frame Processing: `app.py` uses `cv2.VideoCapture` to read the video frame by frame. Each frame is an OpenCV image (NumPy array). (The overall loop is sketched after this list.)
- Color Conversion: The frame is converted from BGR (OpenCV's default) to RGB, as this is the format MediaPipe expects.

```python
# File: cli/ml-core/app.py
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
results = pose.process(rgb)
```

- Pose Landmarks: MediaPipe's `pose.process()` returns a `results` object. If a pose is detected, `results.pose_landmarks` contains the normalized `(x, y, z)` coordinates for 33 body landmarks.
- Zone Mapping: The normalized coordinates are converted to pixel coordinates. The `get_zone()` function then maps these pixel coordinates to a specific grid cell ID (e.g., `"1-2"`).
- State Management: The `zone_last_seen` dictionary is the system's state. It stores the last time (`time.time()`) a landmark was detected in each zone.
- Output: The system prints the status of every zone to `stdout` on each frame. `visionix.js` uses the `{ stdio: 'inherit' }` option in `spawn` to ensure this output is displayed directly in the user's terminal.
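Tying these steps together, the per-frame loop in `app.py` presumably looks something like the following minimal sketch. The `get_zone()` signature, the loop structure, and the variable names here are assumptions for illustration, not the actual implementation:

```python
# Sketch of the frame-processing loop (helper names and signatures assumed)
import cv2
import mediapipe as mp

GRID_ROWS, GRID_COLS = 3, 3

def get_zone(px, py, width, height):
    # Map a pixel coordinate to a "row-col" grid cell ID, clamped to the frame
    row = min(max(py * GRID_ROWS // height, 0), GRID_ROWS - 1)
    col = min(max(px * GRID_COLS // width, 0), GRID_COLS - 1)
    return f"{row}-{col}"

pose = mp.solutions.pose.Pose()
cap = cv2.VideoCapture("path/to/video.mp4")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break  # video finished
    h, w, _ = frame.shape
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
    results = pose.process(rgb)

    zones_this_frame = set()
    if results.pose_landmarks:
        for lm in results.pose_landmarks.landmark:
            # Landmarks are normalized [0, 1]; convert to pixels before mapping
            zones_this_frame.add(get_zone(int(lm.x * w), int(lm.y * h), w, h))
    print("zones with landmarks:", sorted(zones_this_frame))

cap.release()
```

Clamping the computed row and column keeps landmarks that fall slightly outside the frame from producing out-of-range zone IDs.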
The app.py script is the heart of the system. It contains all the logic for computer vision and activity analysis.
These constants at the top of app.py allow for easy tuning of the system's behavior.
- `GRID_ROWS = 3`: The number of horizontal divisions (rows) in the grid.
- `GRID_COLS = 3`: The number of vertical divisions (columns) in the grid.
- `UNSEEN_TIMEOUT = 5`: The number of seconds a zone can be empty before it is declared "inactive".
To change the grid to a 5x5 configuration, a developer would simply modify these values:
```python
# File: cli/ml-core/app.py
GRID_ROWS = 5
GRID_COLS = 5
UNSEEN_TIMEOUT = 2  # Make the trigger more sensitive
```

The system tracks the state of each zone (e.g., `"0-0"`, `"0-1"`, etc.) using a combination of a dictionary and a timeout.
- `zone_last_seen = {}`: A dictionary mapping a `zone_id` string to the Unix timestamp of the last detection in that zone.
- `active_zones = set()`: A temporary set on each frame to hold zones where landmarks are currently detected (the per-frame update is sketched below).
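The implementation snippet further below shows only the timeout check; the complementary update, in which zones containing landmarks receive a fresh timestamp, presumably looks something like this minimal sketch. The function name and example values are hypothetical; `zone_last_seen` and `active_zones` are the structures described above:

```python
import time

def mark_zones_seen(zone_ids, zone_last_seen):
    """Refresh the timestamp of every zone that currently contains a landmark (sketch)."""
    now = time.time()
    for zone_id in zone_ids:
        zone_last_seen[zone_id] = now

# Example: on the current frame, landmarks were mapped to these zones
zone_last_seen = {}
active_zones = {"1-1", "1-2"}
mark_zones_seen(active_zones, zone_last_seen)
print(zone_last_seen)  # {'1-1': <current Unix time>, '1-2': <current Unix time>}
```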
The state logic for a single zone can be visualized as follows:
```mermaid
stateDiagram-v2
direction LR
[*] --> Inactive
Inactive --> Active: Landmark detected in zone
Active --> Inactive: time.time() - last_seen > UNSEEN_TIMEOUT
Active --> Active: Landmark detected in zone
```
State Logic Implementation:
This logic is executed for every zone on every frame.
```python
# File: cli/ml-core/app.py
now = time.time()
for i in range(GRID_ROWS):
    for j in range(GRID_COLS):
        zone_id = f"{i}-{j}"
        last_seen = zone_last_seen.get(zone_id, 0)  # Get last seen time, default to 0
        elapsed = now - last_seen
        if elapsed > UNSEEN_TIMEOUT:
            # If the time since last seen exceeds the timeout, the zone is OFF
            print(f"⚠ Zone {zone_id} inactive for {int(elapsed)}s → Trigger: OFF")
        else:
            # Otherwise, the zone is considered ON
            print(f" Zone {zone_id} active → Trigger: ON")
```

The system can be run without a GUI window for server-side or automated environments. To enable headless mode, comment out the `cv2.imshow` and `cv2.waitKey` lines in `app.py`.
```python
# File: cli/ml-core/app.py
# ... inside the while loop ...
# Comment below to run headless
# cv2.imshow("VisionixAI", frame)
# if cv2.waitKey(1) & 0xFF == ord('q'):
# break
```

The Node.js scripts manage the user experience and the execution of the Python core.
This script is configured as an executable in package.json (not provided, but implied). It acts as a router for user commands.
- Command Parsing: It reads `process.argv` to determine the command (`analyze`, `setup-ml`) and any associated arguments (e.g., the video path).
- Input Validation: Before launching the analysis, it performs critical checks using the `fs` module to ensure the Python script and the input video file both exist. This prevents runtime errors in the Python process.

```js
// File: cli/bin/visionix.js
if (!fs.existsSync(scriptPath)) {
  console.error(` start.py not found at: ${scriptPath}`);
  process.exit(1);
}
if (!input || !fs.existsSync(input)) {
  console.error(` Input video not found: ${input}`);
  process.exit(1);
}
```

- Process Spawning: It uses `child_process.spawn` to run `start.py`. Key options used are:
  - `'-u'`: This flag is passed to `python3` to force unbuffered stdout and stderr, ensuring that output from the Python script is printed in real time rather than being held in a buffer.
  - `{ stdio: 'inherit' }`: This is crucial. It pipes the `stdin`, `stdout`, and `stderr` of the child process directly to the main Node.js process, making the Python script's output appear seamlessly in the user's terminal.
The runner.js file provides a more modular way to execute the Python script. While visionix.js contains its own spawning logic, runner.js encapsulates this functionality. It demonstrates a robust way to resolve the path to start.py from its own location, making it resilient to where the script is called from.
```js
// File: cli/lib/runner.js
function runAnalysis(videoPath) {
  // Navigate two levels up from cli/lib/ to root, then into ml-core
  const rootPath = path.resolve(__dirname, '..', '..');
  const pythonPath = path.join(rootPath, 'ml-core', 'start.py');
  // ... spawn logic ...
}
```

This module could be used by other parts of a larger Node.js application that need to trigger the analysis programmatically.
- Error: `start.py not found`
  - Cause: The Node.js script cannot locate the Python entry point. This is usually due to a file being moved or an incorrect path resolution.
  - Solution: Verify that the `ml-core/start.py` file exists relative to the `cli/` directory. The error message prints the exact path that was checked.
- Error: `Input video not found`
  - Cause: The video path provided to the `analyze` command is incorrect or the file does not exist.
  - Solution: Ensure you are providing a correct relative or absolute path to the video file.
- Error: `ModuleNotFoundError: No module named 'cv2'` (or `mediapipe`)
  - Cause: The Python dependencies are not installed, or the script is not being run within the correct virtual environment.
  - Solution: Run the setup command: `visionix setup-ml`. This will create the `.visionix-venv` and install all required packages. The CLI should automatically use this environment, but if running manually, ensure it is activated.
- Error: `Cannot open video.`
  - Cause: This error comes from OpenCV within `app.py`. It means that while the file path was valid, OpenCV was unable to read or decode the video stream. This can happen with corrupted files or unsupported video codecs.
  - Solution: Try converting the video to a standard format like H.264 MP4. Verify the file is playable in a standard video player. (A quick standalone check is sketched below.)
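For the last case, a quick way to confirm whether OpenCV itself can open and decode the file, independently of VisionixAI, is a small standalone check like the following sketch (not part of the project):

```python
# check_video.py (sketch): verify OpenCV can open and decode a video file
import sys

import cv2

cap = cv2.VideoCapture(sys.argv[1])
ok, frame = cap.read()
print(f"opened: {cap.isOpened()}, first frame decoded: {ok}")
cap.release()
```

If the file cannot be opened or the first frame cannot be decoded here, the problem lies with the video container or codec rather than with the CLI or the ML core.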