Question:

A system with multiple products like Jira, Confluence, and BitBucket. Each customer may have purchased licenses for several of these products. Every product has different types of entities, such as Sprints, Features, and Tasks in Jira, documents in Confluence, and Code Repositories in BitBucket. The owner can assign hashtags to these entities.

Function Requirements: 

  • The basic functionality we need here is that users should be able to add product tags.
  • Now to add any tag first user should be able to create a tag with a name and description.
  • Another important functionality required here is that while the user is typing the hashtag there should be suggestions similar to those hashtags, and the user should be able to use it. This will avoid creating duplicate tags.
  • The user should be able to search all the Jiras, documents with this tag. 

Non-Functional Requirements:

  • The system should be highly scalable
  • Low latency during auto-complete
  • The system should save tags efficiently, i.e. avoid saving duplicate tags
  • Code should be maintainable
  • Consistency: Tags created should be shown to other users as soon as possible.

Cost Estimation:

To estimate the traffic for this tagging system, let’s assume:

Total users: 1 million (1M)
Active users per day: 10% (100K users)
Peak concurrency: 5% of active users at a time (5K concurrent users)
Average API calls per user per day: 10 (search, create, autocomplete, fetch entities, etc.)
Read-heavy system: 80% reads (search, autocomplete, fetching entities), 20% writes (tag creation, assignments)
Query per second: (100K users * 10 query per day) / 86400 sec =  10 QPS

The system should be a read-heavy system.

Data storage estimation:

Hashtag Storage:
Daily tag creation: 100k daily user * 20% writes = 20K tag creation daily
Monthly = 20K * 30 = 600K, Yearly = 600 * 12 = 7200 K = 7.2 M ~10 M
Average tag size: Name(20 char) = 20 Bytes, Description(50 char): 50 Bytes, 
                                                                              Meta Data(30 char): 30 Bytes = 100 Bytes
Storage Needed for 1 year tag created: 100 Byte * 10 M = 1 GB
Storage Needed for 5 year tag created: 100 Byte * 50 M = 5 GB

DB Choice: SQL
Cache: Redis
Cache Storage
Hot usage tag: 10% of yearly tags created = 1M
Size of each tag = 100Byte
Total storage required = 1M * 100Byte = 100 MB ~ 1GB redis storage needed

Why SQL over NoSQL here?

  • The data consists of products, entities and tags, and to fetch all the products of a particular tag JOINS is required and is better served with SQL database instead of NoSQL.
  • Tags should be unique to avoid duplicates. SQL enforces uniqueness using indexes and ensures strong consistency & preventing duplicate tags.
  • B-Tree indexes helps fast retrieval of data.

Why Redis for Frequent queries? 

  • High speed read: Redis is in-memory, making it 100x faster than querying a database. Ideal for autocomplete and most-used hashtags queries.
  • Least Recently Used (LRU) eviction β†’ Ensures old, unused tags are removed automatically.
  • Hot Tags are cached for faster querying through redis instead of reaching till DB.

Schema Design:

Products schema
-- Products Table: Stores different products (Jira, Confluence, BitBucket)
CREATE TABLE products (
 id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
 name VARCHAR(255) NOT NULL UNIQUE,  -- Unique product names
 created_at TIMESTAMP DEFAULT NOW()
);
Tags Schema
-- Tags Table: Stores hashtags with a unique constraint
CREATE TABLE tags (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(100) NOT NULL UNIQUE,  -- Unique hashtags
    description TEXT DEFAULT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
Entities Schema
-- Entities Table: Stores different entity types across products
CREATE TABLE entities (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    product_id UUID REFERENCES products(id) ON DELETE CASCADE,
    customer_id UUID REFERENCES customers(id) ON DELETE CASCADE,
    type VARCHAR(50) NOT NULL,  -- E.g., Sprint, Task, Document, Repo
    data JSONB NOT NULL,  -- Flexible data structure for different products
    created_at TIMESTAMP DEFAULT NOW()
);
Entity-Tag Mapping Table
-- Entity-Tag Mapping Table: Many-to-Many relationship
	CREATE TABLE entity_tags (
    entity_id UUID REFERENCES entities(id) ON DELETE CASCADE,
    tag_id UUID REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (entity_id, tag_id)
);
Tag Usage Table
-- Tag Usage Table: Keeps track of tag frequency for "Top-10 Tags"
CREATE TABLE tag_usage (
    tag_id UUID REFERENCES tags(id) ON DELETE CASCADE,
    usage_count INT DEFAULT 0,  -- Number of times a tag has been used
    last_used TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (tag_id)
);
Indexes:
--Indexes for Fast Searching & Query Optimization
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entities_customer_product ON entities(customer_id, product_id);

APIs: 

Create a Tag
Endpoint: POST /tags
Request Body: 
{
    "name": "#project-xyz",
    "description": "Tags all items related to Project XYZ"
}
Success Response Body:
HTTP/1.1 201 Created
Content-Type: application/json
	
{
    "tag_id": "abc123",
    "name": "#project-xyz",
    "message": "Tag created successfully"
}

Failure Response Body:
HTTP/1.1 409 Conflict
Content-Type: application/json

{
    "error": "Tag already exists"
}
Entity Tagging
Endpoint: POST /entities/{entity_id}/tags
Request Body: 
{
    "tag_id": "abc123"
}
Response Body:
{
    "message": "Tag assigned successfully"
}
Auto-Suggest Tags
Endpoint: GET /tags/suggest?query=pro
Response:
[
    {"tag_id": "abc123", "name": "#project-xyz"},
    {"tag_id": "xyz789", "name": "#product-launch"}
]
Remove a Tag from an Entity
Endpoint: DELETE /entities/{entity_id}/tags/{tag_id}
	Response:
	{
    "message": "Tag removed successfully"
}
Tag-Based Search APIs:
Endpoint: GET /tags/{tag_name}/entities
Response:
[
    {
        "entity_id": "e1",
        "product": "Jira",
        "type": "Sprint",
        "name": "Sprint 23"
    },
    {
        "entity_id": "e2",
        "product": "Confluence",
        "type": "Document",
        "name": "Design Doc"
    }
]
Get Top 10 Hashtags
Endpoint: GET /tags/top10
	Response: 
	[
    {"tag_name": "#bugfix", "count": 200},
    {"tag_name": "#project-xyz", "count": 150},
    {"tag_name": "#release-v1", "count": 100}
]

Concurrency control:

Not required as tag names are made UNIQUE in DB so duplicate tags will be rejected automatically, still locking can be used.

Fault Tolerance:

  • Database Fault Tolerance:
    Solution: Replication & Failover
    Enable Automatic Failover
    Sharding or Partitioning for large datasets (e.g., hashtags per product).
  • API & Server Fault Tolerance:
    Solution: Load Balancing & Auto Scaling
  • Caching & Rate Limiting:
    Solution: Redis for Frequent Queries
    – Store most used hashtags in Redis.
    – Use Rate Limiting to prevent spammy requests.
  • Eventual Consistency & Asynchronous Processing:
    Solution: Queue for Background Processing
    Use Kafka or RabbitMQ for processing tag assignments asynchronously.
    Helps when many users are tagging entities at the same time.

Categorized in:

HLD, Must Know HLD,