Freelance Deep Web Crawler Engineer (AI-Integrated Data Pipeline)
Sixteen Alpha AI
2-5 years
Delhi
Posted: 10/12/2025
Job Description
We're developing a next-generation intelligent web crawling system capable of exploring deep and dynamic web data sources, including sites behind authentication, infinite scrolls, and JavaScript-heavy pages.
The crawler will be integrated with an AI-driven pipeline for automated data understanding, classification, and transformation.
We're looking for a highly experienced engineer who has previously built large-scale, distributed crawling frameworks and integrated AI or NLP/LLM-based components for contextual data extraction.
Responsibilities
- Design, develop, and deploy scalable deep web crawlers capable of bypassing common anti-bot mechanisms.
- Implement AI-integrated pipelines for data processing, entity extraction, and semantic categorization.
- Develop dynamic scraping systems for sites that rely on JavaScript, infinite scrolling, or APIs (see the sketch after this list).
- Integrate with vector databases, LLM-based data labeling, or automated content enrichment modules.
- Optimize crawling logic for speed, reliability, and stealth across distributed environments.
- Collaborate on data pipeline orchestration using tools like Airflow, Prefect, or custom async architectures.
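For context, here is a minimal sketch of the kind of dynamic scraping this role involves, assuming Playwright's async API; the target URL and the `.item` selector are hypothetical placeholders, not details of this project:

```python
import asyncio
from playwright.async_api import async_playwright

async def crawl_infinite_scroll(url: str, max_scrolls: int = 10) -> list[str]:
    """Scroll a JavaScript-heavy page and collect item text as it lazy-loads."""
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        seen: set[str] = set()
        for _ in range(max_scrolls):
            # ".item" is a placeholder selector; adapt it to the target markup.
            seen.update(await page.locator(".item").all_inner_texts())
            height = await page.evaluate("document.body.scrollHeight")
            await page.mouse.wheel(0, height)    # scroll to the current bottom
            await page.wait_for_timeout(1500)    # let lazy-loaded content render
            if await page.evaluate("document.body.scrollHeight") == height:
                break                            # nothing new loaded; stop
        await browser.close()
        return sorted(seen)

if __name__ == "__main__":
    items = asyncio.run(crawl_infinite_scroll("https://example.com/feed"))
    print(f"collected {len(items)} items")
```

In a production crawler this loop would sit behind proxy rotation, session reuse, and politeness controls rather than a fixed timeout.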
Requirements
- Proven experience building deep or dark web crawlers (Playwright, Scrapy, Puppeteer, or custom async frameworks).
- Strong understanding of browser automation, session management, and anti-detection mechanisms.
- Experience integrating AI/ML/NLP pipelines (e.g., text classification, entity recognition, or embedding-based similarity).
- Skilled in asynchronous Python (asyncio, aiohttp, Playwright async API).
- Familiarity with database and pipeline systems (PostgreSQL, MongoDB, Elasticsearch, or similar).
- Ability to design robust data flows that connect crawling → AI inference → storage/visualization (a minimal sketch follows this list).
- Knowledge of LLMs (OpenAI, Hugging Face, LangChain, or custom fine-tuned models).
- Experience with data cleaning, deduplication, and normalization pipelines.
- Familiarity with distributed crawling frameworks (Ray, Celery, Kafka).
- Prior experience integrating real-time analytics dashboards or monitoring tools.
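And a compact sketch of the crawling → AI inference → storage/visualization flow referenced above, using bounded asyncio queues for backpressure; `fetch`, `classify`, and `store` are hypothetical stand-ins for a real HTTP client, model call, and database write:

```python
import asyncio

# Hypothetical stand-ins: in production these would be an HTTP/browser fetch,
# an NLP/LLM inference call, and a PostgreSQL/Elasticsearch write.
async def fetch(url: str) -> str:
    await asyncio.sleep(0.1)                  # simulate network latency
    return f"<html>content of {url}</html>"

async def classify(doc: str) -> dict:
    await asyncio.sleep(0.05)                 # simulate model inference
    return {"doc": doc, "label": "news", "score": 0.92}

async def store(record: dict) -> None:
    print("stored:", record["label"], record["doc"][:40])

async def pipeline(urls: list[str], workers: int = 4) -> None:
    """Crawl -> AI inference -> storage, with a bounded queue so a slow
    downstream stage applies backpressure instead of exhausting memory."""
    raw: asyncio.Queue = asyncio.Queue(maxsize=100)

    async def crawl(url: str) -> None:
        await raw.put(await fetch(url))

    async def processor() -> None:
        while True:
            doc = await raw.get()
            await store(await classify(doc))
            raw.task_done()

    consumers = [asyncio.create_task(processor()) for _ in range(workers)]
    await asyncio.gather(*(crawl(u) for u in urls))
    await raw.join()                          # wait until every doc is stored
    for task in consumers:
        task.cancel()
    await asyncio.gather(*consumers, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(pipeline([f"https://example.com/page/{i}" for i in range(10)]))
```

The bounded queue is the design point: if inference or storage slows down, the crawlers block on `put` instead of buffering unboundedly.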
What We Offer
- Competitive freelance pay based on expertise and delivery.
- Flexible, async-first remote collaboration.
- Opportunity to shape an AI-first data platform from the ground up.
- Potential for long-term partnership if the collaboration is successful.