
Freelance Deep Web Crawler Engineer (AI-Integrated Data Pipeline)

Sixteen Alpha AI

2 - 5 years

Delhi

Posted: 10/12/2025


Job Description

About the Project

We're developing a next-generation intelligent web crawling system capable of exploring deep and dynamic web data sources, including sites behind authentication, infinite scrolls, and JavaScript-heavy pages.

The crawler will be integrated with an AI-driven pipeline for automated data understanding, classification, and transformation.

We're looking for a highly experienced engineer who has previously built large-scale, distributed crawling frameworks and integrated AI or NLP/LLM-based components for contextual data extraction.


Key Responsibilities
  • Design, develop, and deploy scalable deep web crawlers capable of bypassing common anti-bot mechanisms.
  • Implement AI-integrated pipelines for data processing, entity extraction, and semantic categorization.
  • Develop dynamic scraping systems for sites that rely on JavaScript, infinite scrolling, or APIs.
  • Integrate with vector databases, LLM-based data labeling, or automated content enrichment modules.
  • Optimize crawling logic for speed, reliability, and stealth across distributed environments.
  • Collaborate on data pipeline orchestration using tools like Airflow, Prefect, or custom async architectures.
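The async crawling pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual stack: the `fetch` stub stands in for a real Playwright or aiohttp call, and the URLs are hypothetical.

```python
import asyncio


async def fetch(url: str) -> str:
    # Stand-in for a real Playwright/aiohttp request (network code elided).
    await asyncio.sleep(0)
    return f"<html>content of {url}</html>"


async def crawl(urls, max_concurrency: int = 5) -> dict:
    # Bound concurrency so a distributed worker stays polite and predictable.
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(url: str):
        async with sem:
            return url, await fetch(url)

    return dict(await asyncio.gather(*(worker(u) for u in urls)))


results = asyncio.run(crawl([f"https://example.com/page/{i}" for i in range(3)]))
```

A production crawler would layer session management, retries, and anti-detection on top of the same semaphore-bounded worker shape.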
Required Expertise
  1. Proven experience building deep or dark web crawlers (Playwright, Scrapy, Puppeteer, or custom async frameworks).
  2. Strong understanding of browser automation, session management, and anti-detection mechanisms.
  3. Experience integrating AI/ML/NLP pipelines (e.g., text classification, entity recognition, or embedding-based similarity).
  4. Skilled in asynchronous Python (asyncio, aiohttp, Playwright async API).
  5. Familiar with database and pipeline systems (PostgreSQL, MongoDB, Elasticsearch, or similar).
  6. Ability to design robust data flows that connect crawling → AI inference → storage/visualization.
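The crawling → AI inference → storage flow in item 6 can be sketched as three swappable stages. A minimal sketch using only the standard library: `classify` is a stand-in for a real NLP/LLM inference step, and the sample pages and in-memory SQLite store are illustrative assumptions.

```python
import sqlite3


def classify(text: str) -> str:
    # Stand-in for an NLP/LLM inference step (real model elided).
    return "crawler" if "crawler" in text.lower() else "other"


def run_pipeline(pages: dict) -> sqlite3.Connection:
    """Crawl output -> AI inference -> storage; each stage is independently swappable."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (url TEXT PRIMARY KEY, label TEXT, body TEXT)")
    for url, body in pages.items():
        db.execute("INSERT INTO docs VALUES (?, ?, ?)", (url, classify(body), body))
    db.commit()
    return db


db = run_pipeline({
    "https://example.com/a": "Deep web crawler architecture notes",
    "https://example.com/b": "Cooking recipes",
})
labels = dict(db.execute("SELECT url, label FROM docs"))
```

Keeping the inference and storage stages behind plain function boundaries is what lets a real build swap in an LLM labeler or Elasticsearch without touching the crawler.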


Nice to Have
  • Knowledge of LLMs (OpenAI, Hugging Face, LangChain, or custom fine-tuned models).
  • Experience with data cleaning, deduplication, and normalization pipelines.
  • Familiarity with distributed crawling frameworks (Ray, Celery, Kafka).
  • Prior experience integrating real-time analytics dashboards or monitoring tools.


What We Offer
  • Competitive freelance pay based on expertise and delivery.
  • Flexible, async-first remote collaboration.
  • Opportunity to shape an AI-first data platform from the ground up.
  • Potential for long-term partnership if the collaboration is successful.

