Agent MCP BrightData avec Exploration Itérative

Un agent intelligent utilisant le protocole MCP (Model Context Protocol) pour explorer et analyser des sites web de manière itérative et structurée.

🚀 Fonctionnalités

Exploration itérative : Méthodologie en 4 phases pour une recherche complète
Protection anti-doublons : Évite les appels redondants sur les mêmes URLs
Outils MCP : Utilise scrape_as_markdown, scrape_as_html et search_engine
Gestion intelligente du contexte : Optimise l'utilisation de la mémoire
Interface conversationnelle : Interaction naturelle avec l'agent

📋 Prérequis

Python 3.8+
Compte BrightData avec API token
Clé API OpenAI

🛠️ Installation

Cloner le repository

git clone https://github.com/elonmsk/MCP.git cd MCP

Installer les dépendances

pip install -r requirements.txt

Configurer les variables d'environnement

cp sample.env .env # Éditer .env avec vos clés API

⚙️ Configuration

Créez un fichier .env avec vos clés API :

# BrightData API Configuration API_TOKEN=your_brightdata_api_token_here BROWSER_AUTH=your_browser_auth_here WEB_UNLOCKER_ZONE=your_web_unlocker_zone_here # OpenAI API Configuration OPENAI_API_KEY=your_openai_api_key_here

🎯 Utilisation

Démarrage de l'agent (mode console)

python main.py

Démarrage de l'API web

python app.py

L'API sera disponible sur http://localhost:8000

Documentation de l'API

Une fois l'API démarrée, visitez :

Documentation interactive : http://localhost:8000/docs
Documentation alternative : http://localhost:8000/redoc

Commandes disponibles

Exploration complète : explore [URL] [sujet]
explore https://example.com aides logement
Réinitialisation : reset, clear, ou new
reset
Quitter : exit ou quit
exit

🔍 Méthodologie d'Exploration

L'agent suit une méthodologie structurée en 4 phases :

Phase 1 : Navigation interne structurée

Analyse de la page d'accueil
Identification de la structure de navigation
Création d'une carte mentale du site

Phase 2 : Exploration approfondie

Suivi des chemins de navigation identifiés
Exploration en profondeur des sections pertinentes
Collecte d'informations détaillées

Phase 3 : Validation externe

Recherches externes spécifiques
Vérification de la cohérence des données
Identification d'informations complémentaires

Phase 4 : Recherche externe complémentaire

Validation finale par moteur de recherche
Complémentarité et validation finale

🛡️ Protection Anti-Doublons

Limite d'appels : 15 appels maximum par session
Vérification d'URLs : Évite les appels sur les mêmes pages
Instructions système : L'agent est programmé pour éviter les doublons
Compteur en temps réel : Affichage du nombre d'appels restants

📊 Outils Utilisés

scrape_as_markdown : Extraction du contenu des pages web
scrape_as_html : Obtention du HTML brut si nécessaire
search_engine : Recherches externes via Google

⚠️ Note : L'outil extract n'est pas utilisé car il cause des erreurs.

🔧 Architecture

main.py (mode console) ├── WebExplorer (classe principale) │ ├── scrape_page() - Scraping de pages │ ├── search_site() - Recherches externes │ └── explore_site_iteratively() - Exploration complète ├── ConversationManager - Gestion du contexte └── create_duplicate_protected_tools() - Protection anti-doublons app.py (API web) ├── FastAPI application ├── Routes API (/explore, /search, /scrape, /chat) ├── WebExplorer (version simplifiée) └── Session MCP globale

📝 Exemple d'Utilisation

🤖 Agent BrightData MCP avec exploration itérative activé 💡 Utilisez 'explore [URL] [sujet]' pour une exploration complète 🔄 Utilisez 'reset', 'clear' ou 'new' pour réinitialiser la session 📊 Protection anti-doublons via instructions système You: explore https://actionlogement.fr aides logement réfugiés 🔍 DÉBUT DE L'EXPLORATION ITÉRATIVE 🌐 Site cible: https://actionlogement.fr 📋 Sujet de recherche: aides logement réfugiés 📊 Limite d'appels: 10 🗺️ PHASE 1: NAVIGATION INTERNE STRUCTURÉE ✅ Page d'accueil analysée (15 liens trouvés) ...

🤝 Contribution

Les contributions sont les bienvenues ! N'hésitez pas à :

Fork le projet
Créer une branche pour votre fonctionnalité
Commiter vos changements
Pousser vers la branche
Ouvrir une Pull Request

📄 Licence

Ce projet est sous licence MIT. Voir le fichier LICENSE pour plus de détails.

🆘 Support

Pour toute question ou problème :

Vérifiez que vos clés API sont correctement configurées
Assurez-vous que toutes les dépendances sont installées
Consultez les logs pour identifier les erreurs
Ouvrez une issue sur GitHub si le problème persiste

🚀 Déploiement sur Render

Configuration automatique

Le projet inclut un fichier render.yaml pour un déploiement automatique.

Configuration manuelle

Type de service : Web Services
Build Command : pip install -r requirements.txt
Start Command : gunicorn app:app -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT
Variables d'environnement :
- API_TOKEN : Votre token BrightData
- BROWSER_AUTH : Votre authentification BrightData
- WEB_UNLOCKER_ZONE : Votre zone BrightData
- OPENAI_API_KEY : Votre clé API OpenAI

Endpoints disponibles

GET / : Page d'accueil
GET /health : Statut de santé
POST /explore : Exploration complète d'un site
POST /search : Recherche sur un site
POST /scrape : Scraping d'une page
POST /chat : Chat avec l'agent

🔄 Mise à Jour

Pour mettre à jour le projet :

git pull origin main pip install -r requirements.txt --upgrade

Développé avec ❤️ pour l'exploration intelligente du web

This server cannot be installed

security - not tested

license - not found

quality - not tested

How are these scores calculated?

hybrid server

The server is able to function both locally and remotely, depending on the configuration or use case.

An intelligent agent using the Model Context Protocol to iteratively explore and analyze websites in a structured way, with built-in duplicate protection and conversational interface.

Related MCP Servers

browser-use MCP server
deploya-labs
A
security
A
license
A
quality
AI-driven browser automation server that implements the Model Context Protocol to enable natural language control of web browsers for tasks like navigation, form filling, and visual interaction.
Last updated -
1
2
MIT License
WebSearch-MCP
mnhlt
A
security
F
license
A
quality
A Model Context Protocol server that enables AI assistants to perform real-time web searches, retrieving up-to-date information from the internet via a Crawler API.
Last updated -
1
285
17
Prysm MCP Server
pinkpixel-dev
A
security
A
license
A
quality
A Model Context Protocol server enabling AI assistants to scrape web content with high accuracy and flexibility, supporting multiple scraping modes and content formatting options.
Last updated -
4
26
2
MIT License
Better Fetch
flutterninja9
A
security
F
license
A
quality
A Model Context Protocol server that intelligently fetches and processes web content, transforming websites and documentation into clean, structured markdown with nested URL crawling capabilities.
Last updated -
2
33
4

View all related MCP servers

Agent MCP BrightData