Artifacts

Artifacts are files stored in a workspace, either produced by graph executions or uploaded directly.

Artifact Types

Type	Extension	Description
`result`	.json	Graph execution result
`report`	.md	Generated markdown report
`screenshot`	.png	Browser screenshots
`pdf`	.pdf	Generated PDF documents
`data`	.json	Structured data output
`transcript`	.txt	Video transcripts
`embeddings`	.npy	Vector embeddings
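Before uploading, the table can be used as a lookup to check that a path's extension fits its declared type. The sketch below is illustrative only: `EXTENSION_BY_TYPE`, `extension_matches`, and `candidate_types` are hypothetical helpers, not part of cbintel. Because `result` and `data` both use `.json`, the type cannot be inferred from the extension alone.

```python
from pathlib import PurePosixPath

# Extension expected for each artifact type, copied from the table above.
EXTENSION_BY_TYPE = {
    "result": ".json",
    "report": ".md",
    "screenshot": ".png",
    "pdf": ".pdf",
    "data": ".json",
    "transcript": ".txt",
    "embeddings": ".npy",
}


def extension_matches(path: str, artifact_type: str) -> bool:
    """Return True if `path` ends with the extension listed for `artifact_type`."""
    try:
        expected = EXTENSION_BY_TYPE[artifact_type]
    except KeyError:
        raise ValueError(f"unknown artifact type: {artifact_type!r}") from None
    return PurePosixPath(path).suffix.lower() == expected


def candidate_types(path: str) -> list[str]:
    """List every type whose extension matches `path` (may be more than one)."""
    suffix = PurePosixPath(path).suffix.lower()
    return [t for t, ext in EXTENSION_BY_TYPE.items() if ext == suffix]
```

For example, `candidate_types("entities.json")` returns `["result", "data"]`, so passing `artifact_type` explicitly is safer than guessing from the file name.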

Artifact Model

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

@dataclass
class Artifact:
    artifact_id: str            # Unique identifier
    workspace_id: str           # Parent workspace
    run_id: Optional[str]       # Parent run, or None if not tied to a run
    path: str                   # Storage path
    artifact_type: str          # One of the types in the table above
    size_bytes: int             # File size in bytes
    content_type: str           # MIME type
    created_at: datetime        # Creation timestamp
    metadata: dict[str, Any]    # Additional metadata
    file_url: str               # Download URL

Storage Layout

workspaces/{workspace_id}/
├── runs/
│   └── {run_id}/
│       ├── result.json         # Graph result
│       ├── report.md           # Generated report
│       └── artifacts/
│           ├── screenshots/
│           │   ├── 001_example_com.png
│           │   └── 002_example_org.png
│           ├── data/
│           │   ├── entities.json
│           │   └── chunks.json
│           └── pdfs/
│               └── document.pdf
└── outputs/
    ├── weekly_report.md
    └── entity_network.json
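Paths in this layout can be built with plain string formatting. The helpers below are hypothetical, and the screenshot naming rule (a three-digit sequence number followed by the URL's host with dots replaced by underscores) is only inferred from the two example file names above, not a documented convention.

```python
import re
from urllib.parse import urlparse


def run_dir(workspace_id: str, run_id: str) -> str:
    """Directory that holds one run's files."""
    return f"workspaces/{workspace_id}/runs/{run_id}"


def output_path(workspace_id: str, filename: str) -> str:
    """Path for a workspace-level file under outputs/."""
    return f"workspaces/{workspace_id}/outputs/{filename}"


def screenshot_path(workspace_id: str, run_id: str, index: int, url: str) -> str:
    """Build e.g. .../artifacts/screenshots/001_example_com.png.

    Assumed naming rule: zero-padded sequence number, then the URL's host
    with every run of non-alphanumeric characters replaced by "_".
    """
    host = urlparse(url).hostname or "unknown"
    slug = re.sub(r"[^a-z0-9]+", "_", host.lower()).strip("_")
    return f"{run_dir(workspace_id, run_id)}/artifacts/screenshots/{index:03d}_{slug}.png"
```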

Working with Artifacts

List Artifacts

workspace = await manager.get("ws_abc123")
artifacts = await workspace.list_artifacts()

for artifact in artifacts:
    print(artifact.path)
    print(f"  Type: {artifact.artifact_type}")
    print(f"  Size: {artifact.size_bytes} bytes")
    print(f"  URL: {artifact.file_url}")

Filter by Type or Run

# Get all screenshots
screenshots = await workspace.list_artifacts(
    artifact_type="screenshot"
)

# Get all artifacts from a specific run
run_artifacts = await workspace.list_artifacts(
    run_id="run_xyz789"
)

Download Artifact

artifact = await workspace.get_artifact("art_abc123")

# Print the download URL
print(f"Download: {artifact.file_url}")

# Download the artifact's content
content = await workspace.download_artifact("art_abc123")

Upload Artifact

# Upload a file (report_content holds the report text as a str)
artifact = await workspace.upload_artifact(
    path="reports/custom_report.md",
    content=report_content.encode(),
    artifact_type="report",
    metadata={"author": "analyst"}
)

Artifact Creation During Graph Execution

Automatic Artifacts

When a graph executes, the following artifacts are created automatically.

result.json is always created:

{
  "graph_name": "research_pipeline",
  "stages_completed": 5,
  "outputs": {...}
}

report.md is created only if the graph uses the to_report operation:

# Research Report
## Summary
...
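The rule that result.json is always written while report.md depends on to_report can be mimicked locally, for example when testing code that consumes run directories. This is a hypothetical sketch that writes plain files; it does not call cbintel, and `write_run_outputs` is an invented name.

```python
import json
from pathlib import Path
from typing import Any, Optional


def write_run_outputs(run_dir: Path, result: dict[str, Any],
                      report_markdown: Optional[str] = None) -> list[Path]:
    """Write the automatic run artifacts to a local directory.

    result.json is always written; report.md only when report text is given,
    standing in for a graph that used the to_report operation.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    written = []

    result_path = run_dir / "result.json"
    result_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    written.append(result_path)

    if report_markdown is not None:
        report_path = run_dir / "report.md"
        report_path.write_text(report_markdown, encoding="utf-8")
        written.append(report_path)

    return written
```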

Custom Artifacts

To store additional outputs during a run, use the store_artifact operation in the graph definition:

- op: store_artifact
  input: data
  params:
    path: "analysis/entities.json"
    type: "data"
  output: artifact_ref
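Conceptually, store_artifact serializes its input, writes it to the given path, and hands back a reference. A minimal in-memory sketch of that idea follows; `InMemoryStore`, the JSON serialization, and the keys of the returned reference are all assumptions for illustration, not the operation's documented behavior.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for workspace storage: path -> bytes."""
    files: dict[str, bytes] = field(default_factory=dict)


def store_artifact(store: InMemoryStore, data: Any, path: str,
                   artifact_type: str) -> dict[str, Any]:
    """Serialize `data` as JSON, save it under `path`, and return a reference.

    The returned dict plays the role of `artifact_ref` in the YAML above;
    its keys are illustrative only.
    """
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    store.files[path] = payload
    return {"path": path, "type": artifact_type, "size_bytes": len(payload)}
```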

Artifact Metadata

Standard Metadata

{
  "artifact_id": "art_abc123",
  "created_at": "2024-01-15T10:30:00Z",
  "content_type": "image/png",
  "size_bytes": 125000,
  "checksum": "sha256:abc123..."
}
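Most of these fields can be computed locally, which is useful for checking a download. The sketch assumes the checksum is the lowercase hex SHA-256 digest of the raw bytes prefixed with `sha256:` (the example above is truncated, so the exact encoding is an assumption). `artifact_id` is left out because it is presumably assigned by the workspace; `standard_metadata` and `verify_checksum` are hypothetical helpers.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from typing import Any


def standard_metadata(content: bytes, path: str) -> dict[str, Any]:
    """Compute timestamp, MIME type, size, and checksum for raw content."""
    content_type, _ = mimetypes.guess_type(path)
    return {
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "content_type": content_type or "application/octet-stream",
        "size_bytes": len(content),
        "checksum": "sha256:" + hashlib.sha256(content).hexdigest(),
    }


def verify_checksum(content: bytes, checksum: str) -> bool:
    """Check bytes against a 'sha256:<hex digest>' checksum string."""
    algorithm, _, expected = checksum.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError(f"unsupported checksum format: {checksum!r}")
    return hashlib.sha256(content).hexdigest() == expected.lower()
```

If `download_artifact` returns bytes, its result can be passed straight to `verify_checksum` together with the recorded checksum.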

Custom Metadata

await workspace.upload_artifact(
    path="screenshots/homepage.png",
    content=image_data,
    artifact_type="screenshot",
    metadata={
        "url": "https://example.com",
        "viewport": "1920x1080",
        "full_page": True
    }
)

Artifact Lifecycle

stateDiagram-v2
    [*] --> Created: upload/auto-create
    Created --> Available: processing complete
    Available --> Indexed: index enabled
    Available --> Archived: archive workspace
    Indexed --> Archived: archive workspace
    Archived --> Available: unarchive
    Available --> [*]: delete
    Archived --> [*]: delete with workspace
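The diagram can be encoded as a transition table, which makes invalid moves explicit. The sketch copies the edges exactly as drawn, using the edge labels as event names; `START` and `DELETED` are invented names for the diagram's `[*]` pseudo-states, and `next_state` and `replay` are hypothetical helpers. Read literally, the diagram has no direct delete edge from `Indexed`, and unarchiving returns an artifact to `Available` rather than `Indexed`.

```python
# (current state, event) -> next state, taken edge by edge from the diagram.
TRANSITIONS = {
    ("START", "upload/auto-create"): "Created",
    ("Created", "processing complete"): "Available",
    ("Available", "index enabled"): "Indexed",
    ("Available", "archive workspace"): "Archived",
    ("Indexed", "archive workspace"): "Archived",
    ("Archived", "unarchive"): "Available",
    ("Available", "delete"): "DELETED",
    ("Archived", "delete with workspace"): "DELETED",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached by `event`, or raise if the diagram forbids it."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def replay(events: list[str], state: str = "START") -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```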

Querying Artifacts

By Path Pattern

# Get all markdown reports
reports = await workspace.list_artifacts(
    path_pattern="**/*.md"
)

# Get screenshots from specific run
screenshots = await workspace.list_artifacts(
    path_pattern="runs/run_xyz789/artifacts/screenshots/*"
)
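The exact matching rules for `path_pattern` are not specified here. A common glob convention, assumed in this sketch, is that `*` matches within a single path segment and `**/` matches zero or more directories, so `**/*.md` also matches a top-level `report.md`. `glob_to_regex` and `match_path` are illustrative helpers, not the library's matcher.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=256)
def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with *, ?, and ** into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")           # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")        # anything within one segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")         # exactly one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def match_path(path: str, pattern: str) -> bool:
    """Return True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None
```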

By Date Range

from datetime import datetime, timedelta

# Last 7 days
recent = await workspace.list_artifacts(
    created_after=datetime.utcnow() - timedelta(days=7)
)

By Metadata

# Screenshots of specific URL
screenshots = await workspace.list_artifacts(
    artifact_type="screenshot",
    metadata_filter={"url": "https://example.com"}
)
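When artifacts have already been fetched, the same criteria can be applied client-side. This sketch follows one plausible interpretation of the server-side filters: criteria are combined with AND, `created_after` is strictly later than the cutoff, and metadata values must be equal. `ArtifactInfo` and `filter_artifacts` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Iterable, Optional

_MISSING = object()  # sentinel: distinguishes "key absent" from a None value


@dataclass
class ArtifactInfo:
    """Hypothetical subset of Artifact fields used for filtering."""
    path: str
    artifact_type: str
    created_at: datetime
    metadata: dict[str, Any] = field(default_factory=dict)


def filter_artifacts(
    artifacts: Iterable[ArtifactInfo],
    artifact_type: Optional[str] = None,
    created_after: Optional[datetime] = None,
    metadata_filter: Optional[dict[str, Any]] = None,
) -> list[ArtifactInfo]:
    """Return artifacts matching every supplied criterion (logical AND)."""
    selected = []
    for art in artifacts:
        if artifact_type is not None and art.artifact_type != artifact_type:
            continue
        # Assumed exclusive: an artifact created exactly at the cutoff is dropped.
        if created_after is not None and art.created_at <= created_after:
            continue
        # Each filter key must be present with an equal value.
        if metadata_filter and any(
            art.metadata.get(key, _MISSING) != value
            for key, value in metadata_filter.items()
        ):
            continue
        selected.append(art)
    return selected
```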

Artifact Size Limits

Artifact Type	Max Size
screenshot	10 MB
pdf	50 MB
result	10 MB
report	5 MB
data	100 MB
embeddings	500 MB
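Checking the limit before uploading avoids a rejected request. The sketch assumes 1 MB means 1024 * 1024 bytes (the table does not say), and it accepts `transcript`, which has no listed limit. `MAX_SIZE_BYTES` and `check_size` are hypothetical names.

```python
# Per-type limits from the table above. Assumes binary megabytes;
# the platform may use decimal megabytes (1,000,000 bytes) instead.
MB = 1024 * 1024
MAX_SIZE_BYTES = {
    "screenshot": 10 * MB,
    "pdf": 50 * MB,
    "result": 10 * MB,
    "report": 5 * MB,
    "data": 100 * MB,
    "embeddings": 500 * MB,
}


def check_size(artifact_type: str, size_bytes: int) -> None:
    """Raise ValueError if size_bytes exceeds the limit for artifact_type.

    Types without a listed limit (such as transcript) are not checked.
    """
    limit = MAX_SIZE_BYTES.get(artifact_type)
    if limit is not None and size_bytes > limit:
        raise ValueError(
            f"{artifact_type} artifact is {size_bytes} bytes; limit is {limit} bytes"
        )
```

A typical call is `check_size("report", len(report_content.encode()))` right before `upload_artifact`.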

Best Practices

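Use consistent paths: follow the standard storage layout so artifacts are easy to locate.
Add metadata: descriptive metadata makes artifacts easier to query later.
Clean up: delete artifacts you no longer need.
Monitor size: track total workspace size and stay within the per-type size limits.
Use checksums: compare the checksum in an artifact's standard metadata with the downloaded content to verify integrity.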
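For monitoring size, totalling `size_bytes` per type over a listing is often enough. This hypothetical helper works on any objects that expose `artifact_type` and `size_bytes`, such as the results of `list_artifacts()`.

```python
from collections import Counter
from typing import Iterable, Protocol


class HasSize(Protocol):
    """Anything with the two fields this helper reads."""
    artifact_type: str
    size_bytes: int


def size_by_type(artifacts: Iterable[HasSize]) -> dict[str, int]:
    """Sum size_bytes for each artifact type."""
    totals = Counter()
    for art in artifacts:
        totals[art.artifact_type] += art.size_bytes
    return dict(totals)
```

Usage: `totals = size_by_type(await workspace.list_artifacts())`, then `sum(totals.values())` gives the workspace total.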

Error Handling

from cbintel.workspace import ArtifactNotFoundError, StorageError

try:
    artifact = await workspace.get_artifact("art_abc123")
except ArtifactNotFoundError:
    print("Artifact not found")
except StorageError as e:
    print(f"Storage error: {e}")
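A missing artifact will not reappear on retry, but a storage failure may be transient, so a retry wrapper can treat the two differently. Whether `StorageError` failures are actually transient in this system is an assumption. The sketch defines local stand-in exception classes so it runs on its own; in real code you would import `ArtifactNotFoundError` and `StorageError` from `cbintel.workspace` instead. `with_retries` is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class StorageError(Exception):
    """Stand-in for cbintel.workspace.StorageError (illustration only)."""


class ArtifactNotFoundError(Exception):
    """Stand-in for cbintel.workspace.ArtifactNotFoundError (illustration only)."""


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await call(), retrying StorageError with exponential backoff.

    ArtifactNotFoundError is re-raised immediately, even if the real library
    makes it a subclass of StorageError; other exceptions also propagate.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except ArtifactNotFoundError:
            raise  # retrying cannot make a missing artifact appear
        except StorageError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Usage: `artifact = await with_retries(lambda: workspace.get_artifact("art_abc123"))`.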