Artifacts

Artifacts are files stored in a workspace. Most are produced by graph executions; others can be uploaded directly.

Artifact Types

Type         Extension   Description
result       .json       Graph execution result
report       .md         Generated markdown report
screenshot   .png        Browser screenshots
pdf          .pdf        Generated PDF documents
data         .json       Structured data output
transcript   .txt        Video transcripts
embeddings   .npy        Vector embeddings
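
The mapping in the table above can be written as a small lookup. This is a minimal sketch, not part of the documented API: `ARTIFACT_EXTENSIONS` and `candidate_types` are hypothetical names. Because `result` and `data` share the `.json` extension, an extension alone cannot always identify a single type, so the helper returns every candidate.

```python
from pathlib import PurePosixPath

# Extensions copied from the Artifact Types table.
ARTIFACT_EXTENSIONS = {
    "result": ".json",
    "report": ".md",
    "screenshot": ".png",
    "pdf": ".pdf",
    "data": ".json",
    "transcript": ".txt",
    "embeddings": ".npy",
}


def candidate_types(path: str) -> list[str]:
    """Return every artifact type whose extension matches the path (hypothetical helper)."""
    suffix = PurePosixPath(path).suffix.lower()
    return sorted(t for t, ext in ARTIFACT_EXTENSIONS.items() if ext == suffix)
```

For ambiguous extensions, passing `artifact_type` explicitly (as the upload examples do) avoids guessing.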

Artifact Model

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Artifact:
    artifact_id: str           # Unique identifier
    workspace_id: str          # Parent workspace
    run_id: Optional[str]      # Parent run; None if not tied to a run
    path: str                  # Storage path
    artifact_type: str         # Type from the table above
    size_bytes: int            # File size in bytes
    content_type: str          # MIME type
    created_at: datetime       # Creation timestamp
    metadata: dict             # Additional metadata
    file_url: str              # Download URL

Storage Layout

workspaces/{workspace_id}/
├── runs/
│   └── {run_id}/
│       ├── result.json         # Graph result
│       ├── report.md           # Generated report
│       └── artifacts/
│           ├── screenshots/
│           │   ├── 001_example_com.png
│           │   └── 002_example_org.png
│           ├── data/
│           │   ├── entities.json
│           │   └── chunks.json
│           └── pdfs/
│               └── document.pdf
└── outputs/
    ├── weekly_report.md
    └── entity_network.json
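
Paths for run artifacts follow a fixed shape in the layout above, so they can be built rather than typed by hand. The helper below is a sketch under that assumption; `run_artifact_path` is a hypothetical name, not a function provided by the library, and it only covers files under a run's `artifacts/` directory.

```python
from pathlib import PurePosixPath


def run_artifact_path(workspace_id: str, run_id: str, category: str, filename: str) -> str:
    """Build a storage path under a run's artifacts/ directory (hypothetical helper)."""
    for part in (workspace_id, run_id, category, filename):
        # Reject empty segments and anything that could escape the layout.
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"invalid path segment: {part!r}")
    return str(
        PurePosixPath(
            "workspaces", workspace_id, "runs", run_id, "artifacts", category, filename
        )
    )
```

Using `PurePosixPath` keeps forward slashes regardless of the operating system the code runs on.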

Working with Artifacts

List Artifacts

workspace = await manager.get("ws_abc123")
artifacts = await workspace.list_artifacts()

for artifact in artifacts:
    print(f"{artifact.path}")
    print(f"  Type: {artifact.artifact_type}")
    print(f"  Size: {artifact.size_bytes} bytes")
    print(f"  URL: {artifact.file_url}")

Filter by Type

# Get all screenshots
screenshots = await workspace.list_artifacts(
    artifact_type="screenshot"
)

# Get all artifacts from a specific run
run_artifacts = await workspace.list_artifacts(
    run_id="run_xyz789"
)

Download Artifact

artifact = await workspace.get_artifact("art_abc123")

# Get URL
print(f"Download: {artifact.file_url}")

# Download content
content = await workspace.download_artifact("art_abc123")

Upload Artifact

# Upload file
artifact = await workspace.upload_artifact(
    path="reports/custom_report.md",
    content=report_content.encode(),
    artifact_type="report",
    metadata={"author": "analyst"}
)

Artifact Creation During Graph Execution

Automatic Artifacts

When a graph executes, the following artifacts are created automatically:

# result.json - always created
{
  "graph_name": "research_pipeline",
  "stages_completed": 5,
  "outputs": {...}
}

# report.md - created only if the to_report operation is used
# Research Report
## Summary
...

Custom Artifacts

Use the store_artifact operation to store a custom artifact from within a graph:

- op: store_artifact
  input: data
  params:
    path: "analysis/entities.json"
    type: "data"
  output: artifact_ref

Artifact Metadata

Standard Metadata

{
  "artifact_id": "art_abc123",
  "created_at": "2024-01-15T10:30:00Z",
  "content_type": "image/png",
  "size_bytes": 125000,
  "checksum": "sha256:abc123..."
}
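
The `checksum` field can be used to confirm that downloaded bytes match what was stored. The sketch below assumes the value is the literal prefix `sha256:` followed by the lowercase hex digest of the file content; the example above truncates the value, so this format is an assumption. `sha256_checksum` and `verify_checksum` are hypothetical helper names.

```python
import hashlib


def sha256_checksum(content: bytes) -> str:
    """Return a checksum string in the assumed 'sha256:<hex digest>' form."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify_checksum(content: bytes, expected: str) -> bool:
    """Compare content against a stored checksum value."""
    return sha256_checksum(content) == expected
```

A typical use would be to call `verify_checksum` on the bytes returned by `download_artifact`, with the checksum value from the artifact's metadata as `expected`.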

Custom Metadata

await workspace.upload_artifact(
    path="screenshots/homepage.png",
    content=image_data,
    artifact_type="screenshot",
    metadata={
        "url": "https://example.com",
        "viewport": "1920x1080",
        "full_page": True
    }
)

Artifact Lifecycle

stateDiagram-v2
    [*] --> Created: upload/auto-create
    Created --> Available: processing complete
    Available --> Indexed: index enabled
    Available --> Archived: archive workspace
    Indexed --> Archived: archive workspace
    Archived --> Available: unarchive
    Available --> [*]: delete
    Archived --> [*]: delete with workspace
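
The diagram can be read as a transition table. The following sketch encodes only the edges drawn above; `TRANSITIONS` and `can_transition` are hypothetical names, and `"Deleted"` is an assumed label for the diagram's terminal `[*]` state. The diagram as drawn has no direct edge from Indexed to deletion, and the sketch mirrors that without claiming it is intentional.

```python
# Allowed transitions, copied edge by edge from the state diagram.
TRANSITIONS = {
    "Created": {"Available"},
    "Available": {"Indexed", "Archived", "Deleted"},
    "Indexed": {"Archived"},
    "Archived": {"Available", "Deleted"},
    "Deleted": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram has an edge from current to target."""
    return target in TRANSITIONS.get(current, set())
```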

Querying Artifacts

By Path Pattern

# Get all markdown reports
reports = await workspace.list_artifacts(
    path_pattern="**/*.md"
)

# Get screenshots from a specific run
screenshots = await workspace.list_artifacts(
    path_pattern="runs/run_xyz789/artifacts/screenshots/*"
)

By Date Range

from datetime import datetime, timedelta, timezone

# Last 7 days (timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12)
recent = await workspace.list_artifacts(
    created_after=datetime.now(timezone.utc) - timedelta(days=7)
)

By Metadata

# Screenshots of a specific URL
screenshots = await workspace.list_artifacts(
    artifact_type="screenshot",
    metadata_filter={"url": "https://example.com"}
)
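
The matching rules behind `metadata_filter` are not spelled out here. A plausible reading is that an artifact matches when every key in the filter is present in its metadata with an equal value. The sketch below shows that interpretation as a client-side check; `matches_metadata` is a hypothetical helper, and the actual server-side semantics may differ.

```python
from typing import Any, Mapping


def matches_metadata(metadata: Mapping[str, Any], metadata_filter: Mapping[str, Any]) -> bool:
    """True when every filter key is present in metadata with an equal value."""
    missing = object()  # sentinel so a filter value of None is not confused with "absent"
    return all(
        metadata.get(key, missing) == value for key, value in metadata_filter.items()
    )
```

Under this reading an empty filter matches every artifact, and extra metadata keys on the artifact are ignored.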

Artifact Size Limits

Artifact Type   Max Size
screenshot      10 MB
pdf             50 MB
result          10 MB
report          5 MB
data            100 MB
embeddings      500 MB
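
Checking content size before calling `upload_artifact` avoids sending data that would exceed these limits. The following is a sketch only: `MAX_SIZE_BYTES` and `check_size` are hypothetical names, and it assumes "MB" means 2**20 bytes, which the table does not specify. Types without a listed limit, such as `transcript`, are not checked.

```python
MB = 1024 * 1024  # assumption: the table's "MB" means 2**20 bytes

# Limits copied from the Artifact Size Limits table.
MAX_SIZE_BYTES = {
    "screenshot": 10 * MB,
    "pdf": 50 * MB,
    "result": 10 * MB,
    "report": 5 * MB,
    "data": 100 * MB,
    "embeddings": 500 * MB,
}


def check_size(artifact_type: str, content: bytes) -> None:
    """Raise ValueError if content exceeds the listed limit for its type."""
    limit = MAX_SIZE_BYTES.get(artifact_type)
    if limit is None:
        return  # no limit listed for this type
    if len(content) > limit:
        raise ValueError(
            f"{artifact_type} artifact is {len(content)} bytes; limit is {limit} bytes"
        )
```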

Best Practices

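  1. Use consistent paths - Follow the standard storage layout so path patterns match predictably
  2. Add metadata - Custom metadata lets you query artifacts with metadata_filter
  3. Clean up - Delete artifacts that are no longer needed
  4. Monitor size - Track total workspace size as well as the per-type limits
  5. Use checksums - Compare the checksum metadata field against downloaded content to verify integrity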

Error Handling

from cbintel.workspace import ArtifactNotFoundError, StorageError

try:
    artifact = await workspace.get_artifact("art_abc123")
except ArtifactNotFoundError:
    print("Artifact not found")
except StorageError as e:
    print(f"Storage error: {e}")