AI-driven data platform development for food and beverage
Business Result
- Identified and extracted menu information from over 10,000 establishments
- Processed over 100 GB of data daily
- Analyzed data sources in 45+ countries
- Captured the necessary information with precision and accuracy, resulting in a valuable resource for the beverage industry
Project idea
Our primary objective was to curate a comprehensive database of all beverages available on menus in cafes and restaurants throughout Europe. With it, our client can source, transform, and analyze data sources in 45+ countries.
Tech Stack
AWS, S3, K8s, Argo Workflows, Puppeteer, spaCy, PyTorch, Grafana, Prometheus
Solution
Team Composition: 2 software engineers, 2 data analysts, 1 PM
Project Duration: 16 months
To accomplish this, our team created a tool that continuously fetched places from different sources, visited numerous websites, and extracted information on available menus and beverages.
First, we used various openly available sources to gather and update information about venues in a specific city. Then we fetched information from:
- unstructured lists and PDF/image menus on venue websites;
- aggregator websites;
- delivery services;
- menu photos from aggregation services.
We used AWS S3 to store intermediate results, Argo Workflows to orchestrate the workload, proprietary OCR tools to extract information from images and PDF files, and headless Chrome to perform data extraction. We also created several metrics to ensure data quality and prevent data pollution caused by website changes or downtime.
Key features
- Predictable quality metrics
- Automated website update and menu discovery
- Orchestration of a huge workload, making up to 1,000 requests in a crawl phase
- Support for 12 languages
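As an illustration of how a quality metric can guard against data pollution from website changes or downtime, here is a minimal sketch in Python. It is not the project's actual implementation: the `CrawlResult` structure, the `quality_check` function, and the 50% drop threshold are all assumptions made for this example. The idea is to compare each new crawl of a venue with the previous one and quarantine results that look suspicious instead of overwriting good data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlResult:
    """Outcome of crawling one venue (hypothetical structure)."""
    venue_id: str
    http_ok: bool      # False when the site was down or returned an error
    items: list[str]   # beverage names extracted from the menu


def quality_check(previous: Optional[CrawlResult],
                  current: CrawlResult,
                  max_drop_ratio: float = 0.5) -> str:
    """Return 'accept', or a 'quarantine: ...' reason for the new result.

    max_drop_ratio is an assumed threshold: if the number of extracted
    items falls by more than this fraction versus the previous crawl,
    the site layout has probably changed and the result is held back.
    """
    if not current.http_ok:
        return "quarantine: site unreachable"
    if not current.items:
        return "quarantine: no items extracted"
    if previous is not None and previous.items:
        drop = 1 - len(current.items) / len(previous.items)
        if drop > max_drop_ratio:
            return "quarantine: item count dropped sharply"
    return "accept"
```

For example, if the previous crawl of a venue found four beverages and the new crawl finds only one, the result is quarantined for review; a drop from four to three is accepted. Counts of accepted and quarantined results could then be exported as metrics to a monitoring stack such as Prometheus and Grafana.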