Harvester #
class Harvester()
This is where all the magic happens. Genesis. Root. God.
Normally, you’ll only need to instantiate this once within
your code.
Arguments:
host
str, optional - The address you want to bind to. Defaults to ‘127.0.0.1’.
port
int, optional - The port you want to listen on. Defaults to 5000.
do_not_track
bool, optional - Disables the analytics tracking. Not recommended. Defaults to False.
serveforever #
| serveforever() -> Thread
Wraps the server’s serveforever method in a daemonized thread.
If you’d like to do this yourself, you can call harvester._serveforever()
shutdown #
| shutdown()
Gracefully shuts down the server and waits for all connections to close.
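The pattern described for serveforever and shutdown (a blocking serve loop moved into a daemon thread, then stopped gracefully) can be sketched with the standard library alone. This is not the harvester's own code: the helper names `serve_in_daemon_thread` and `stop` are hypothetical, and a plain `http.server.HTTPServer` stands in for whatever server the harvester wraps.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def serve_in_daemon_thread(server: HTTPServer) -> threading.Thread:
    """Run the blocking serve loop in a daemon thread and return that thread."""
    # daemon=True means the thread will not keep the interpreter alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def stop(server: HTTPServer, thread: threading.Thread) -> None:
    """Stop the serve loop, release the socket, and wait for the thread."""
    server.shutdown()       # blocks until serve_forever() has returned
    server.server_close()   # closes the listening socket
    thread.join(timeout=5)


if __name__ == "__main__":
    # Port 0 asks the OS for any free port (an assumption made for the demo).
    demo_server = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
    demo_thread = serve_in_daemon_thread(demo_server)
    print("daemon thread:", demo_thread.daemon)
    stop(demo_server, demo_thread)
```

Because the thread is a daemon, forgetting to stop it does not hang the program at exit, but an explicit shutdown is still what lets open connections finish cleanly.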
capture #
| capture(captcha: Captcha) -> Intercepter
This method will use the information passed to it to
configure the server to harvest captchas on a certain
domain with the specified captcha type and sitekey.
Arguments:
captcha
Captcha - An object instantiated by ReCaptchaV2, ReCaptchaV3, or hCaptcha.
Returns:
Intercepter
- [description]
Example:
captcha = ReCaptchaV3(
url='https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php',
sitekey='6LdyC2cUAAAAACGuDKpXeDorzUDWXmdqeg-xy696',
action='examples/v3scores'
)
harvester = Harvester()
intercepter = harvester.capture(captcha)
Intercepter #
class Intercepter()
An instance of Intercepter will be returned from Harvester.capture.
From here you’ll be able to create a browser instance which will get
you one step closer to solving captchas.
Attributes:
tokens
TokenQueue - The queue-like object that allows you
to request tokens from the harvester windows.
setup_browser #
| setup_browser(user_data_dir: str, width: int = 400, height: int = 600, browser_args: List[str] = None, extensions: List[str] = None, executable: str = None) -> Browser
This is the only way to instantiate a Browser instance to harvest captchas through.
After calling this method you will be able to access the browser instance via .browser.
This method caches the browser, as only one browser should be tied to
each intercepter. This may change in the future.
Arguments:
user_data_dir
str - This is where Chrome will store all its data.
It can be an empty directory. If you want to load a previous set of profiles,
pass the same path the next time you call the method.
width
int, optional - This sets the width of the browser window. Defaults to 400.
height
int, optional - This sets the height of the browser window. Defaults to 600.
browser_args
List[str], optional - This allows you to pass extra arguments to the
browser process. Defaults to None.
extensions
List[str], optional - This allows you to load any extensions you’d like
to be loaded when starting the Chrome instance. Defaults to None.
executable
str, optional - If for some reason Chrome is not installed in the usual
spot or you want to use a custom binary, pass the path to the binary here, or pass a
program name that will be found in the PATH env variable. Defaults to None.
Returns:
Browser
- A browser instance that you can use to open windows to solve captchas from.
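The executable argument accepts either a path to a binary or a program name looked up on PATH. A minimal sketch of that kind of lookup, using only the standard library, is shown here. The function name `resolve_executable` is hypothetical, and the document does not say how the harvester itself performs the resolution.

```python
import os
import shutil
from typing import Optional


def resolve_executable(executable: str) -> Optional[str]:
    """Return an absolute path for a binary given as a path or a program name.

    Hypothetical helper: it mirrors the documented behaviour of the
    `executable` argument, not the library's internal implementation.
    """
    # An existing file path is used directly.
    if os.path.isfile(executable):
        return os.path.abspath(executable)
    # Otherwise treat it as a program name and search the PATH env variable.
    return shutil.which(executable)
```

Checking the result for None before passing it to setup_browser gives an earlier, clearer failure than letting the browser launch fail.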
TokenQueue #
class TokenQueue()
get #
| get(timeout: float = None, poll: float = 1) -> Optional[str]
Contacts a harvester window to display a captcha to the user.
Arguments:
timeout
float, optional - Timeout the captcha request if the
user takes too long (seconds). Defaults to None.
poll
float, optional - If there are no available captcha harvester
windows, how often you want to check for new ones (seconds). Defaults to 1.
Returns:
str
- A captcha token to be submitted wherever you need it.
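Since get is annotated as returning Optional[str], callers probably need to handle a missing token. The sketch below retries a few times, assuming that None signals a timed-out request. The return type suggests this, but the text does not state it. The helper `get_token_with_retries` is hypothetical and works with any object exposing a compatible get method.

```python
from typing import Optional


def get_token_with_retries(tokens, attempts: int = 3, timeout: float = 60.0) -> Optional[str]:
    """Call tokens.get(timeout=...) up to `attempts` times.

    Assumption: a None result means the request timed out. Returns the
    first token received, or None if every attempt came back empty.
    """
    for _ in range(attempts):
        token = tokens.get(timeout=timeout)
        if token is not None:
            return token
    return None
```

In use, `tokens` would be the intercepter's tokens attribute, for example `get_token_with_retries(intercepter.tokens, attempts=2, timeout=120)`.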
hCaptcha #
hCaptcha(url: str, sitekey: str) -> Captcha
Creates a Captcha instance configured to solve hCaptchas
Arguments:
url
str - The full url on which the captcha is displayed.
sitekey
str - The sitekey of the captcha.
Returns:
Captcha
- A Captcha instance configured with the passed arguments
ReCaptchaV2 #
ReCaptchaV2(url: str, sitekey: str) -> Captcha
Creates a Captcha instance configured to solve ReCaptchaV2
Arguments:
url
str - The full url on which the captcha is displayed.
sitekey
str - The sitekey of the captcha.
Returns:
Captcha
- A Captcha instance configured with the passed arguments
ReCaptchaV3 #
ReCaptchaV3(url: str, sitekey: str, action: str) -> Captcha
Creates a Captcha instance configured to solve ReCaptchaV3
Arguments:
url
str - The full url on which the captcha is displayed.
sitekey
str - The sitekey of the captcha.
action
str - Action parameter passed when loading the captcha via JS.
Returns:
Captcha
- A Captcha instance configured with the passed arguments
Browser #
class Browser()
You can instantiate a browser instance like:
intercepter.setup_browser(user_data_dir="harvester-browser-data")
get_profiles #
| get_profiles() -> List['Profile']
Gets all the profiles that have been created by the harvester.
NOTE: If you’d like to import profiles from any user-data-dir, just
prepend Account- before the profiles you’d like the harvester
to recognize.
Returns:
List[Profile]
- All the profiles created by the harvester.
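The NOTE above implies that the harvester recognizes profiles by an Account- prefix on the directory name. To check which directories in a user-data-dir would qualify, a sketch like the following may help. The function `account_directories` is hypothetical. It only inspects the filesystem, and whether the library strips the prefix from the profile name is not stated in this document.

```python
from pathlib import Path
from typing import List

# Prefix taken from the NOTE above.
ACCOUNT_PREFIX = "Account-"


def account_directories(user_data_dir: str) -> List[str]:
    """List sub-directory names in user_data_dir that start with 'Account-'.

    Hypothetical helper for inspection only; it does not call the harvester.
    """
    root = Path(user_data_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name.startswith(ACCOUNT_PREFIX)
    )
```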
get_profile #
| get_profile(name: str) -> 'Profile'
Returns a profile by name.
Arguments:
name
str - The name of the profile
Raises:
ValueError
- If the profile does not exist.
Returns:
Profile
- A Profile instance from which you can control
the browser.
profile_exists #
| profile_exists(name: str) -> bool
Checks to see if a profile directory exists.
Arguments:
name
str - The name of the profile to look up.
Returns:
bool
- If the directory with the passed name exists.
new_profile #
| new_profile(name: str) -> 'Profile'
Creates a new directory with the profile’s name and
returns a new profile instance.
Arguments:
name
str - The name of the new profile.
Raises:
ValueError
- When the name of an existing profile has been passed.
Returns:
Profile
- Returns a new Profile instance with a name that hasn’t been used.
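Because get_profile raises ValueError for a missing profile and new_profile raises it for an existing one, a small get-or-create wrapper can avoid both errors. The helper below is a sketch with a hypothetical name. It relies only on the three Browser methods documented above and accepts any object that provides them.

```python
def get_or_create_profile(browser, name: str):
    """Return the profile called `name`, creating it if it does not exist yet.

    Hypothetical convenience wrapper around the documented
    profile_exists / get_profile / new_profile methods.
    """
    if browser.profile_exists(name):
        return browser.get_profile(name)
    return browser.new_profile(name)
```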
Profile #
class Profile()
You can instantiate a profile in one of the following ways:
browser.new_profile('Foo')
browser.get_profile('Foo')
browser.get_profiles() # [Profile(Foo)]
set_proxy #
| @verify_browser_not_running
| set_proxy(proxy: Proxy) -> 'Profile'
Makes sure all connections from this profile are
henceforth routed through the proxy server
specified in the Proxy object.
Arguments:
proxy
Proxy - An object describing the proxy connection.
Returns:
Profile
- Useful for chaining.
poll #
| poll() -> bool
Poll to see if the browser process is still running.
Returns:
bool
- True if the browser process is still running,
otherwise False.
kill #
| kill()
Sends SIGKILL to the underlying browser process if
it is still running.
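poll and kill combine naturally into a "wait a while, then force-quit" routine. The sketch below uses a hypothetical helper name and default values chosen for illustration. It gives the browser a grace period to exit on its own before calling kill, and relies only on the two methods documented above.

```python
import time


def wait_then_kill(profile, grace: float = 10.0, interval: float = 0.5) -> bool:
    """Wait up to `grace` seconds for the browser to exit, then kill it.

    Returns True if the process exited on its own, False if kill() was needed.
    Hypothetical helper built on the documented poll() and kill() methods.
    """
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not profile.poll():  # poll() is True while the process is running
            return True
        time.sleep(interval)
    # One last check so a process that exited during the final sleep is not killed.
    if not profile.poll():
        return True
    profile.kill()
    return False
```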
delete #
| @verify_browser_not_running
| delete()
Removes the profile directory from the user_data_dir passed
when calling setup_browser.
harvest #
| @verify_browser_not_running
| harvest()
Opens a browser window pointing to the domain passed to
harvester.capture(...), specially configured to solve
captchas.
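Putting the documented pieces together, one plausible end-to-end flow is sketched below. It uses only method names that appear in this reference, but the call order and two behaviours are assumptions: that the server should be started before the window opens, and that harvest() returns once the window is open rather than blocking. If harvest() blocks in your version, it would need to run in its own thread. The function `harvest_one_token` and its parameters are hypothetical.

```python
from typing import Optional


def harvest_one_token(harvester, captcha, user_data_dir: str,
                      profile_name: str, timeout: float = 120.0) -> Optional[str]:
    """Start the server, open a harvest window, and wait for a single token.

    Sketch under stated assumptions; it accepts any objects that expose the
    documented Harvester / Intercepter / Browser / Profile methods.
    """
    harvester.serveforever()  # documented to run the server in a daemon thread
    try:
        intercepter = harvester.capture(captcha)
        browser = intercepter.setup_browser(user_data_dir=user_data_dir)
        # Reuse the profile when it exists, since new_profile raises otherwise.
        if browser.profile_exists(profile_name):
            profile = browser.get_profile(profile_name)
        else:
            profile = browser.new_profile(profile_name)
        profile.harvest()  # assumed not to block (not stated in the reference)
        try:
            return intercepter.tokens.get(timeout=timeout)
        finally:
            profile.kill()  # documented as a no-op target if already stopped
    finally:
        harvester.shutdown()
```

The nested try/finally blocks make sure the browser process and the server are both cleaned up even when no token arrives.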
launch #
| @verify_browser_not_running
| launch(url: str = None, app: bool = False)
Opens a window for the user to do anything they want, usually before calling .harvest().
NOTE: This method waits for the user to close the browser process.
On macOS this means the whole process must be quit, not just
the window.
Arguments:
url
str, optional - Open the browser to a specific url. Defaults to None.
app
bool, optional - Open the browser window with the --app=<url>
flag. Defaults to False.
login_to_google #
| @verify_browser_not_running
| login_to_google(login_url: str = DEFAULT_LOGIN_URL)
Opens a window for the user to log in to Google via a YouTube endpoint, usually before calling .harvest().
NOTE: This method waits for the user to close the browser process.
On macOS this means the whole process must be quit, not just
the window.
Arguments:
login_url
str, optional - If you’d like to log in via a different endpoint, pass it here. Defaults to
accounts.google.com.
Proxy #
@dataclass
class Proxy()
Create a Proxy object to be passed to Profile.set_proxy().
proxy = Proxy(host='127.0.0.1', port=9436)
authed_proxy = Proxy(host='127.0.0.1', port=9436, username='bar', password='foo')
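Proxy lists are often stored as colon-separated strings. A small parser can turn them into the keyword arguments used in the examples above. This sketch assumes the `host:port` and `host:port:username:password` formats, which are a common convention and not something this library defines. The function name `proxy_kwargs` is hypothetical.

```python
from typing import Any, Dict


def proxy_kwargs(spec: str) -> Dict[str, Any]:
    """Parse 'host:port' or 'host:port:username:password' into keyword arguments.

    Assumed input format; a password containing ':' is not supported.
    The keys match the keyword names used in the Proxy examples above.
    """
    parts = spec.split(":")
    if len(parts) not in (2, 4):
        raise ValueError(
            f"expected host:port or host:port:username:password, got {spec!r}"
        )
    kwargs: Dict[str, Any] = {"host": parts[0], "port": int(parts[1])}
    if len(parts) == 4:
        kwargs["username"] = parts[2]
        kwargs["password"] = parts[3]
    return kwargs
```

The result can then be unpacked into the constructor, for example `Proxy(**proxy_kwargs('127.0.0.1:9436:bar:foo'))`.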