[{"data":1,"prerenderedAt":985},["ShallowReactive",2],{"meaningless-code-image-content-product-search":3},{"id":4,"title":5,"body":6,"description":967,"extension":968,"meta":969,"navigation":970,"path":971,"seo":972,"sitemap":977,"slug":981,"stem":982,"tags":983,"__hash__":984},"meaninglessCode\u002Fmeaningless-code\u002Fimage-content-product-search.md","Architecting AI-Powered Image Content Product Search in 5 Hours",{"type":7,"value":8,"toc":963},"minimark",[9,16,20,26,29,40,49,54,56,62,93,96,101,107,124,129,149,180,182,187,209,211,216,223,225,232,292,294,299,305,307,320,328,330,335,341,343,348,353,390,392,397,404,410,412,419,471,479,575,577,585,590,618,623,652,654,659,785,787,792,829,831,836,946,948],[10,11,12],"slice-meta",{},[13,14,15],"p",{},"Last Updated: 2026-07-04 | Tags: System Design, AWS, Bedrock, OpenSearch, Vector Search, AI Architecture",[17,18],"space",{"height":19},16,[21,22],"top-image",{"alt":23,"caption":24,"src":25},"Abstract high tech network node architecture visualization","Architecting scalable AI image processing and hybrid search pipelines.","https:\u002F\u002Fimages.unsplash.com\u002Fphoto-1451187580459-43490279c0fa?auto=format&fit=crop&w=1600&q=80",[17,27],{"height":28},12,[30,31,32],"article-section",{},[13,33,34,35,39],{},"Just ",[36,37,38],"strong",{},"1.5 years into my software engineering career",", I was presented with a senior-level system design interview challenge: designing an end-to-end backend architecture to analyze over 100,000 product images in AWS S3 and enable fine-grained, natural-language visual search.",[30,41,42],{},[13,43,44,45,48],{},"I had ",[36,46,47],{},"only about 5 hours"," to analyze the problem, formulate the technical architecture, design asynchronous processing pipelines, define database schemas, map search execution flows, and evaluate core trade-offs.",[30,50,51],{},[13,52,53],{},"This article captures the complete problem statement and the exact system architecture document I 
produced during that ~5-hour effort.",[17,55],{"height":19},[57,58,59],"sub-title",{},[13,60,61],{},"The Interview Problem Statement",[63,64,65,68,81,84],"quote",{},[13,66,67],{},"We have approximately 100,000 product images stored in AWS S3. We need to design a backend solution that can:",[69,70,71,75,78],"ol",{},[72,73,74],"li",{},"Analyze each product image using an AI image analysis service.",[72,76,77],{},"Extract and store searchable metadata in a datastore.",[72,79,80],{},"Enable users and administrators to search for products based on the image content.",[13,82,83],{},"For example, a user should be able to perform a search such as:",[85,86,87,90],"ul",{},[72,88,89],{},"\"Show all dresses with red flowers\"",[72,91,92],{},"\"Green t shirt with no prints\"",[17,94],{"height":95},20,[57,97,98],{},[13,99,100],{},"The Architecture Solution",[102,103,104],"sub-sub-title",{},[13,105,106],{},"1. Overview",[30,108,109],{},[13,110,111,114,115,119,120,123],{},[36,112,113],{},"Objective:"," Design a scalable, fault-tolerant backend that analyses ~100,000 product images already stored in S3, extracts AI-generated metadata, persists it, and serves natural-language product search by image content (e.g., ",[116,117,118],"em",{},"\"dresses with red flowers\"",", ",[116,121,122],{},"\"green t-shirt with no prints\"",").",[30,125,126],{},[13,127,128],{},"The solution must:",[130,131,132],"list",{},[85,133,134,137,140,143,146],{},[72,135,136],{},"Process product images stored in S3 using an AI image analysis service.",[72,138,139],{},"Extract structured attributes and a semantic embedding per image.",[72,141,142],{},"Store metadata in a durable source of truth and a search index.",[72,144,145],{},"Support natural-language search, including exact filters and negation.",[72,147,148],{},"Remain scalable, asynchronous, and fault tolerant.",[30,150,151],{},[13,152,153,156,157,159,160,164,165,119,168,171,172,175,176,179],{},[36,154,155],{},"Key Insight:"," The query 
",[116,158,122],{}," contains a negation (",[161,162,163],"code",{},"no prints",") plus exact constraints (",[161,166,167],{},"green",[161,169,170],{},"t-shirt","). Pure vector search fails here—an embedding of that text sits close to t-shirts that ",[116,173,174],{},"do"," have prints. The design therefore pairs precise structured attributes (for filtering and negation) with embeddings (for fuzzy intent like ",[116,177,178],{},"\"red flowers\"","). That hybrid model is the spine of the solution.",[17,181],{"height":19},[102,183,184],{},[13,185,186],{},"2. Assumptions",[130,188,189],{},[85,190,191,194,197,200,203,206],{},[72,192,193],{},"Approximately 100,000 product images are already present in AWS S3.",[72,195,196],{},"A product can have multiple images; images are the individual unit of analysis.",[72,198,199],{},"AI analysis runs asynchronously, completely decoupled from any upload path.",[72,201,202],{},"Aurora PostgreSQL is the source of truth. OpenSearch is the search index, rebuildable from Aurora at any time.",[72,204,205],{},"A controlled vocabulary (categories, colours, patterns) is shared by the extraction and query parsing prompts.",[72,207,208],{},"Amazon Bedrock provides both the attribute model (VLM) and the embedding model (Titan Multimodal).",[17,210],{"height":19},[102,212,213],{},[13,214,215],{},"3. 
High Level Architecture",[13,217,218],{},[219,220],"img",{"alt":221,"src":222},"High Level Architecture","\u002Fimages\u002Fcontent-product-search\u002Fcontent-product-search-high-level-architecture.png",[17,224],{"height":28},[30,226,227],{},[13,228,229],{},[36,230,231],{},"Components:",[130,233,234],{},[85,235,236,242,248,262,268,274,280,286],{},[72,237,238,241],{},[36,239,240],{},"S3:"," Stores the raw product images; emits an event when a new image arrives.",[72,243,244,247],{},[36,245,246],{},"SQS (+ DLQ):"," Buffers and decouples analysis jobs, caps concurrency against model quotas, and isolates failures.",[72,249,250,253,254,257,258,261],{},[36,251,252],{},"Worker Lambda:"," Consumes the queue and processes each image (idempotent on ",[161,255,256],{},"s3_key"," + ",[161,259,260],{},"content_hash","): invokes Bedrock, stores the metadata in Aurora, and indexes the document in OpenSearch.",[72,263,264,267],{},[36,265,266],{},"Bedrock:"," Runs the VLM that extracts structured attributes and the Titan model that produces the image embedding; called synchronously and returns both to the caller.",[72,269,270,273],{},[36,271,272],{},"Bedrock Batch Inference + Loader Lambda:"," One-time backfill path: Batch Inference extracts attributes for the 100k seed asynchronously (~50% cheaper, within quota); the Loader Lambda then generates embeddings and writes to Aurora and OpenSearch.",[72,275,276,279],{},[36,277,278],{},"Aurora PostgreSQL + pgvector:"," Relational source of truth for products, images, and analysis; also holds the durable vector copy.",[72,281,282,285],{},[36,283,284],{},"OpenSearch:"," Search index combining exact filters, BM25 full-text, and kNN vector ranking.",[72,287,288,291],{},[36,289,290],{},"API Gateway + Product Service:"," API Gateway is the public entry point; a Node.js microservice (ECS\u002FFargate) runs query understanding (Bedrock) and executes the hybrid search against OpenSearch (chosen over a Lambda for persistent connection pooling and 
microservice stack fit).",[17,293],{"height":19},[102,295,296],{},[13,297,298],{},"4. Image Processing Flow",[13,300,301],{},[219,302],{"alt":303,"src":304},"Image Processing Flow","\u002Fimages\u002Fcontent-product-search\u002Fcontent-product-search-image-processing-flow.png",[17,306],{"height":28},[30,308,309],{},[13,310,311,314,315,257,317,319],{},[36,312,313],{},"Why Asynchronous:","\nAI analysis is slow (hundreds of milliseconds to seconds per image) and rate-limited by model quotas, so it must not block any request path. A queue-based pipeline lets the system absorb the one-time 100k backfill and ongoing additions at a controlled rate, retry transient model\u002Fthrottling errors with backoff, and route permanent failures to a dead-letter queue without losing work. Because processing is idempotent (keyed on ",[161,316,256],{},[161,318,260],{},") and tracks per-image status, retries are safe and coverage is always known.",[30,321,322],{},[13,323,324,327],{},[36,325,326],{},"Backfill Note:","\nThe same SQS + Worker path seeds the catalogue. As a cost optimization, the one-time 100k extraction can run through Bedrock Batch Inference (asynchronous, ~50% cheaper), with a loader then writing results and embeddings.",[17,329],{"height":19},[102,331,332],{},[13,333,334],{},"5. Search Flow",[13,336,337],{},[219,338],{"alt":339,"src":340},"Search Flow","\u002Fimages\u002Fcontent-product-search\u002Fcontent-product-search-search-flow.png",[17,342],{"height":28},[30,344,345],{},[13,346,347],{},"A natural-language query is parsed by a Bedrock LLM (prompted with the shared vocabulary) into structured filters plus a residual semantic query. The Product Service maps that JSON to a query—the model never touches the datastore directly.",[30,349,350],{},[13,351,352],{},"Three search modes combine:",[130,354,355],{},[85,356,357,363,373],{},[72,358,359,362],{},[36,360,361],{},"1. 
Keyword search:"," BM25 over the model-generated description for lexical matches.",[72,364,365,368,369,372],{},[36,366,367],{},"2. Semantic search:"," kNN over the embedding; the query text is embedded with the ",[116,370,371],{},"same"," Titan model used at indexing time (vectors are only comparable within one model's space, so upgrading the embedding model means re-embedding all images).",[72,374,375,378,379,382,383,385,386,389],{},[36,376,377],{},"3. Hybrid search:"," Hard constraints (category, colours, ",[161,380,381],{},"has_print",") are enforced as exact filters, giving reliable negation, while descriptive intent (",[116,384,178],{},") is ranked by the combined semantic + keyword score. Results are collapsed by ",[161,387,388],{},"product_id"," to return one hit per product.",[17,391],{"height":19},[102,393,394],{},[13,395,396],{},"6. Data Model",[30,398,399],{},[13,400,401],{},[36,402,403],{},"Aurora PostgreSQL Schema (Source of Truth):",[13,405,406],{},[219,407],{"alt":408,"src":409},"Aurora PostgreSQL ER Diagram","\u002Fimages\u002Fcontent-product-search\u002Fcontent-product-search-er-diagram.png",[17,411],{"height":28},[30,413,414],{},[13,415,416],{},[36,417,418],{},"Tables at a Glance:",[130,420,421],{},[85,422,423,432,440],{},[72,424,425,431],{},[36,426,427,430],{},[161,428,429],{},"products",":"," Catalogue entity: the merchant-facing product record and its CRUD surface.",[72,433,434,439],{},[36,435,436,430],{},[161,437,438],{},"product_images"," Raw image rows (one per image); each row stays stable and carries the idempotency hash.",[72,441,442,447,448,119,451,119,454,119,457,459,460,463,464,257,467,470],{},[36,443,444,430],{},[161,445,446],{},"image_analysis"," AI-derived metadata, 1:1 with an image and fully regenerable. 
The four fields the example queries filter on (",[161,449,450],{},"category",[161,452,453],{},"colors",[161,455,456],{},"pattern",[161,458,381],{},") are first-class columns; the rest live in ",[161,461,462],{},"attributes"," (JSONB) and are covered by ",[161,465,466],{},"description",[161,468,469],{},"embedding"," for semantic search.",[30,472,473],{},[13,474,475,478],{},[36,476,477],{},"Searchable Attributes:","\nThe AI extracts structured attributes that support exact filtering alongside semantic search.",[480,481,482],"table-wrap",{},[483,484,485,499],"table",{},[486,487,488],"thead",{},[489,490,491,496],"tr",{},[492,493,495],"th",{"align":494},"left","Attribute",[492,497,498],{"align":494},"Purpose",[500,501,502,513,523,533,545,555,565],"tbody",{},[489,503,504,510],{},[505,506,507],"td",{"align":494},[36,508,509],{},"Category",[505,511,512],{"align":494},"Product type (dress, t-shirt, shoes, etc.)",[489,514,515,520],{},[505,516,517],{"align":494},[36,518,519],{},"Colors",[505,521,522],{"align":494},"Exact colour filtering",[489,524,525,530],{},[505,526,527],{"align":494},[36,528,529],{},"Pattern",[505,531,532],{"align":494},"Floral, striped, plain, etc.",[489,534,535,540],{},[505,536,537],{"align":494},[36,538,539],{},"Has Print",[505,541,542,543],{"align":494},"Enables queries such as ",[116,544,122],{},[489,546,547,552],{},[505,548,549],{"align":494},[36,550,551],{},"Description",[505,553,554],{"align":494},"AI-generated text used for keyword (BM25) search",[489,556,557,562],{},[505,558,559],{"align":494},[36,560,561],{},"Embedding",[505,563,564],{"align":494},"Vector representation used for semantic similarity search",[489,566,567,572],{},[505,568,569],{"align":494},[36,570,571],{},"Attributes (JSON)",[505,573,574],{"align":494},"Additional metadata such as sleeve length, neckline, fit, or material that can evolve without schema changes",[17,576],{"height":19},[30,578,579],{},[13,580,581,584],{},[36,582,583],{},"OpenSearch Index Mapping (Search 
Engine):","\nOpenSearch stores an optimized search document for every analysed image.",[30,586,587],{},[13,588,589],{},"Each indexed document contains:",[130,591,592],{},[85,593,594,600,606,609,612,615],{},[72,595,596,597,599],{},"Product identifier (",[161,598,388],{},")",[72,601,602,603,599],{},"Image identifier (",[161,604,605],{},"image_id",[72,607,608],{},"Product information (name, brand)",[72,610,611],{},"Searchable AI attributes",[72,613,614],{},"AI-generated description",[72,616,617],{},"Vector embedding",[30,619,620],{},[13,621,622],{},"This enables three complementary search modes:",[130,624,625],{},[85,626,627,633,639],{},[72,628,629,632],{},[36,630,631],{},"Keyword Search (BM25)"," for lexical matching.",[72,634,635,638],{},[36,636,637],{},"Semantic Search (kNN)"," using vector embeddings.",[72,640,641,644,645,119,647,119,649,651],{},[36,642,643],{},"Hybrid Search"," combining exact filters (",[161,646,450],{},[161,648,453],{},[161,650,381],{},") with semantic ranking to provide accurate and relevant results.",[17,653],{"height":19},[102,655,656],{},[13,657,658],{},"7. Design Decisions",[480,660,661],{},[483,662,663,673],{},[486,664,665],{},[489,666,667,670],{},[492,668,669],{"align":494},"Decision",[492,671,672],{"align":494},"Reason",[500,674,675,685,695,705,718,728,738,748,758,768],{},[489,676,677,682],{},[505,678,679],{"align":494},[36,680,681],{},"S3",[505,683,684],{"align":494},"Durable, cheap object storage for the source images.",[489,686,687,692],{},[505,688,689],{"align":494},[36,690,691],{},"SQS (+ DLQ)",[505,693,694],{"align":494},"Decouples and rate-limits AI processing, isolates failures.",[489,696,697,702],{},[505,698,699],{"align":494},[36,700,701],{},"Worker Lambda",[505,703,704],{"align":494},"Elastic, pay-per-use background AI analysis.",[489,706,707,712],{},[505,708,709],{"align":494},[36,710,711],{},"Bedrock VLM",[505,713,714,715,717],{"align":494},"Zero-shot fine-grained attributes (incl. 
",[161,716,381],{},") with no labelled training data.",[489,719,720,725],{},[505,721,722],{"align":494},[36,723,724],{},"Titan Multimodal Embeddings",[505,726,727],{"align":494},"Shared image\u002Ftext vector space enables semantic recall from text queries.",[489,729,730,735],{},[505,731,732],{"align":494},[36,733,734],{},"Aurora Postgres + pgvector",[505,736,737],{"align":494},"Relational source of truth with clean CRUD, durable vectors for reindexing.",[489,739,740,745],{},[505,741,742],{"align":494},[36,743,744],{},"OpenSearch",[505,746,747],{"align":494},"Single engine for exact filters + BM25 + kNN (hybrid search).",[489,749,750,755],{},[505,751,752],{"align":494},[36,753,754],{},"API Gateway + Product Service",[505,756,757],{"align":494},"Node.js microservice (ECS\u002FFargate) on the read path, persistent OpenSearch pooling, microservice-stack fit.",[489,759,760,765],{},[505,761,762],{"align":494},[36,763,764],{},"Bedrock Batch Inference",[505,766,767],{"align":494},"Cheaper, throughput-friendly path for the one-time 100k backfill.",[489,769,770,775],{},[505,771,772],{"align":494},[36,773,774],{},"Controlled Vocabulary",[505,776,777,778,781,782,123],{"align":494},"Consistent enums (categories, colours, patterns) so filters never silently miss (e.g., ",[161,779,780],{},"red"," vs ",[161,783,784],{},"crimson",[17,786],{"height":19},[102,788,789],{},[13,790,791],{},"8. 
Scalability & Reliability",[130,793,794],{},[85,795,796,799,802,805,808,816,819,826],{},[72,797,798],{},"Queue-based processing absorbs the 100k backfill and ongoing additions at a controlled rate.",[72,800,801],{},"Horizontal worker scaling via Lambda concurrency, capped to respect Bedrock quotas.",[72,803,804],{},"Automatic retries with exponential backoff for transient model\u002Fthrottling errors.",[72,806,807],{},"Dead-letter queue captures permanently failed jobs for inspection and replay.",[72,809,810,811,257,813,815],{},"Idempotent processing (",[161,812,256],{},[161,814,260],{},") makes retries and replays safe.",[72,817,818],{},"OpenSearch is rebuildable from Aurora at any time; no AI re-run required.",[72,820,821,822,825],{},"Model versioning (",[161,823,824],{},"model_version",") enables targeted reprocessing on model or taxonomy changes.",[72,827,828],{},"Batch Inference path keeps the one-time seed cheap and within throughput limits.",[17,830],{"height":19},[102,832,833],{},[13,834,835],{},"9. 
Trade-offs",[480,837,838],{},[483,839,840,853],{},[486,841,842],{},[489,843,844,847,850],{},[492,845,846],{"align":494},"Choice",[492,848,849],{"align":494},"Alternative",[492,851,852],{"align":494},"Why",[500,854,855,867,880,892,905,920,933],{},[489,856,857,861,864],{},[505,858,859],{"align":494},[36,860,734],{},[505,862,863],{"align":494},"DynamoDB",[505,865,866],{"align":494},"Relational catalogue with joins and easy admin CRUD; DynamoDB only wins at millions of items with pure key-value access.",[489,868,869,874,877],{},[505,870,871],{"align":494},[36,872,873],{},"Hybrid (filters + vectors)",[505,875,876],{"align":494},"Pure vector search",[505,878,879],{"align":494},"Pure vectors fail negation and exact colour\u002Fcategory filtering.",[489,881,882,886,889],{},[505,883,884],{"align":494},[36,885,711],{},[505,887,888],{"align":494},"Rekognition \u002F Custom Labels",[505,890,891],{"align":494},"Fine-grained attributes zero-shot, no labelled data or per-attribute training.",[489,893,894,899,902],{},[505,895,896],{"align":494},[36,897,898],{},"Asynchronous (SQS + worker)",[505,900,901],{"align":494},"Synchronous analysis",[505,903,904],{"align":494},"Decoupled, retryable, and resilient to spikes and model throttling.",[489,906,907,911,917],{},[505,908,909],{"align":494},[36,910,744],{},[505,912,913,914],{"align":494},"SQL ",[161,915,916],{},"LIKE",[505,918,919],{"align":494},"Real relevance, semantic recall, and scale.",[489,921,922,927,930],{},[505,923,924],{"align":494},[36,925,926],{},"Batch Inference for seed",[505,928,929],{"align":494},"100k on-demand calls",[505,931,932],{"align":494},"~50% cheaper and respects throughput quotas.",[489,934,935,940,943],{},[505,936,937],{"align":494},[36,938,939],{},"Product Service (ECS\u002FFargate)",[505,941,942],{"align":494},"Search Lambda",[505,944,945],{"align":494},"Persistent OpenSearch connection pooling and microservice stack fit; Lambda is preferable only for scale-to-zero, spiky low 
volume.",[17,947],{"height":28},[30,949,950],{},[13,951,952],{},[116,953,954,955,958,959,962],{},"Note: For the current requirement of approximately 100,000 images, a solution based on PostgreSQL with full-text search (",[161,956,957],{},"tsvector",") and ",[161,960,961],{},"pgvector"," would be sufficient. This design adopts OpenSearch to provide a clear path for hybrid search, richer text relevance, faceted filtering, and future scalability as the product catalogue and search traffic grow.",{"title":964,"searchDepth":965,"depth":965,"links":966},"",2,[],"A senior-level system architecture design for AI image analysis, attribute extraction, and hybrid search across 100,000+ S3 images, designed in ~5 hours by a developer with 1.5 years of experience.","md",{},true,"\u002Fmeaningless-code\u002Fimage-content-product-search",{"title":973,"description":967,"canonical":974,"robots":975,"ogTitle":973,"ogDescription":967,"ogImage":25,"twitterCard":976,"twitterTitle":973,"twitterDescription":967},"Architecting AI-Powered Image Content Product Search in 5 Hours | Meaningless [C]ode","https:\u002F\u002Fnalinda.dev\u002Fslice-of\u002Fmeaningless-code\u002Fimage-content-product-search","index,follow,max-snippet:-1,max-image-preview:large,max-video-preview:-1","summary_large_image",{"loc":978,"lastmod":979,"changefreq":980},"\u002Fslice-of\u002Fmeaningless-code\u002Fimage-content-product-search","2026-07-04","monthly","image-content-product-search","meaningless-code\u002Fimage-content-product-search","system-design|||architecture|||aws|||bedrock|||opensearch|||vector-search","IW79hG_isiUvgEPBm5sjgU3oWP2kVUR6atb_42ENJVg",1783149732292]