API Documentation

Harvester #

class Harvester()

This is where all the magic happens. Genesis. Root. God.

Normally, you’ll only need to instantiate this once within
your code.

Arguments:

  • host str, optional - The address you want to bind to. Defaults to ‘127.0.0.1’.
  • port int, optional - The port you want to listen on. Defaults to 5000.
  • do_not_track bool, optional - Disables the analytics tracking. Not recommended. Defaults to False.

serveforever #

 | serveforever() -> Thread

Wraps the server’s serveforever method in a daemonized thread.

If you’d like to do this yourself, you can call harvester._serveforever().

shutdown #

 | shutdown()

Gracefully shuts down the server and waits for all connections to close.
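The pattern described for serveforever and shutdown can be sketched with the standard library alone. This is a minimal illustration, not the harvester's own code: the helper names serve_in_background and stop_server are hypothetical, and a plain http.server.HTTPServer stands in for whatever server the Harvester wraps.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def serve_in_background(server: HTTPServer) -> threading.Thread:
    """Run server.serve_forever() in a daemon thread and return the thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def stop_server(server: HTTPServer, thread: threading.Thread) -> None:
    """Stop the serve_forever() loop, wait for the thread, release the socket."""
    server.shutdown()      # blocks until serve_forever() has returned
    thread.join()
    server.server_close()  # closes the listening socket


# Port 0 asks the OS for any free port, so the sketch cannot collide with
# something already listening on 5000.
server = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
worker = serve_in_background(server)
was_daemon = worker.daemon
stop_server(server, worker)
```

Because the thread is a daemon, it will not keep the interpreter alive on its own, so the calling code has to keep running for as long as the server is needed.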

capture #

 | capture(captcha: Captcha) -> Intercepter

This method will use the information passed to it to
configure the server to harvest captchas on a certain
domain with the specified captcha type and sitekey.

Arguments:

  • captcha Captcha - An object instantiated by ReCaptchaV2, ReCaptchaV3,
    or hCaptcha.

Returns:

  • Intercepter - The Intercepter instance for this captcha, from which you can set up a browser.

Example:

captcha = ReCaptchaV3(
    url='https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php',
    sitekey='6LdyC2cUAAAAACGuDKpXeDorzUDWXmdqeg-xy696',
    action='examples/v3scores'
)

harvester = Harvester()

intercepter = harvester.capture(captcha)

Intercepter #

class Intercepter()

An instance of Intercepter will be returned from Harvester.capture.

From here you’ll be able to create a browser instance which will get
you one step closer to solving captchas.

Attributes:

  • tokens TokenQueue - The queue-like object that allows you
    to request tokens from the harvester windows.

setup_browser #

 | setup_browser(user_data_dir: str, width: int = 400, height: int = 600, browser_args: List[str] = None, extensions: List[str] = None, executable: str = None) -> Browser

This is the only way to instantiate a Browser instance to harvest captchas through.

After calling this method you will be able to access the browser instance via .browser

This method caches the browser, as only one browser should be tied to
each intercepter. This may change in the future.

Arguments:

  • user_data_dir str - This is where Chrome will store all its data.
    It can be an empty directory. If you want to load a previous set of profiles,
    pass the same path the next time you call the method.
  • width int, optional - This sets the width of the browser window. Defaults to 400.
  • height int, optional - This sets the height of the browser window. Defaults to 600.
  • browser_args List[str], optional - This allows you to pass extra arguments to the
    browser process. Defaults to None.
  • extensions List[str], optional - This allows you to load any extensions you’d like
    to be loaded when starting the Chrome instance. Defaults to None.
  • executable str, optional - If for some reason Chrome is not installed in the usual
    spot or you want to use a custom binary, pass the path to the binary here, or pass a
    program name that will be found in the PATH env variable. Defaults to None.

Returns:

  • Browser - A browser instance that you can use to open windows to solve captchas from.
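To make the arguments above more concrete, here is a rough sketch of how they could translate into a Chrome command line. The mapping is an assumption, not taken from the harvester's source: build_chrome_args is a hypothetical helper and "google-chrome" is only a placeholder default. The switches themselves (--user-data-dir, --window-size, --load-extension) are standard Chromium switches.

```python
from typing import List, Optional


def build_chrome_args(
    user_data_dir: str,
    width: int = 400,
    height: int = 600,
    browser_args: Optional[List[str]] = None,
    extensions: Optional[List[str]] = None,
    executable: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: turn setup_browser-style arguments into a command line."""
    # "google-chrome" is only a placeholder default for this sketch.
    command = [executable or "google-chrome"]
    command.append(f"--user-data-dir={user_data_dir}")
    command.append(f"--window-size={width},{height}")
    if extensions:
        # Chromium accepts a comma-separated list of unpacked extension paths.
        command.append("--load-extension=" + ",".join(extensions))
    # Extra arguments go last so the caller can add further switches.
    command.extend(browser_args or [])
    return command


args = build_chrome_args(
    "harvester-browser-data",
    browser_args=["--no-first-run"],
    extensions=["./ext-a", "./ext-b"],
)
```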

TokenQueue #

class TokenQueue()

get #

 | get(timeout: float = None, poll: float = 1) -> Optional[str]

Contacts a harvester window to display a captcha to the user.

Arguments:

  • timeout float, optional - Time out the captcha request if the
    user takes too long (seconds). Defaults to None.
  • poll float, optional - If no captcha harvester windows are available,
    how often to check for new ones (seconds). Defaults to 1.

Returns:

  • str - A captcha token to be submitted wherever you need it.
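One way to picture how timeout and poll interact is the polling loop below. It is a simplified stand-in built on queue.Queue, not the TokenQueue implementation: get_token is a hypothetical function, and returning None on timeout is an assumption suggested only by the Optional[str] return type in the signature.

```python
import queue
import time
from typing import Optional


def get_token(
    tokens: "queue.Queue[str]",
    timeout: Optional[float] = None,
    poll: float = 1,
) -> Optional[str]:
    """Hypothetical stand-in: poll `tokens` until one arrives or `timeout` passes."""
    # With timeout=None there is no deadline and the loop waits indefinitely.
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return tokens.get_nowait()
        except queue.Empty:
            pass
        if deadline is not None and time.monotonic() >= deadline:
            return None  # timed out without a token
        time.sleep(poll)  # wait before checking again


pending: "queue.Queue[str]" = queue.Queue()
pending.put("example-token")
first = get_token(pending, timeout=1, poll=0.01)      # token is already queued
second = get_token(pending, timeout=0.05, poll=0.01)  # queue is now empty
```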

hCaptcha #

hCaptcha(url: str, sitekey: str) -> Captcha

Creates a Captcha instance configured to solve hCaptchas.

Arguments:

  • url str - The full url on which the captcha is displayed.
  • sitekey str - The sitekey of the captcha.

Returns:

  • Captcha - A Captcha instance configured with the passed arguments

ReCaptchaV2 #

ReCaptchaV2(url: str, sitekey: str) -> Captcha

Creates a Captcha instance configured to solve ReCaptchaV2.

Arguments:

  • url str - The full url on which the captcha is displayed.
  • sitekey str - The sitekey of the captcha.

Returns:

  • Captcha - A Captcha instance configured with the passed arguments

ReCaptchaV3 #

ReCaptchaV3(url: str, sitekey: str, action: str) -> Captcha

Creates a Captcha instance configured to solve ReCaptchaV3.

Arguments:

  • url str - The full url on which the captcha is displayed.
  • sitekey str - The sitekey of the captcha.
  • action str - Action parameter passed when loading the captcha via JS.

Returns:

  • Captcha - A Captcha instance configured with the passed arguments

Browser #

class Browser()

You can instantiate a browser instance like:

intercepter.setup_browser(user_data_dir="harvester-browser-data")

get_profiles #

 | get_profiles() -> List['Profile']

Gets all the profiles that have been created by the harvester.

NOTE: If you’d like to import profiles from any user-data-dir, just
prefix the names of the profiles you’d like the harvester
to recognize with Account-.

Returns:

  • List[Profile] - All the profiles created by the harvester.

get_profile #

 | get_profile(name: str) -> 'Profile'

Returns a profile by name.

Arguments:

  • name str - The name of the profile

Raises:

  • ValueError - If the profile does not exist.

Returns:

  • Profile - A Profile instance from which you can control
    the browser.

profile_exists #

 | profile_exists(name: str) -> bool

Checks to see if a profile directory exists.

Arguments:

  • name str - The name of the profile to look up.

Returns:

  • bool - If the directory with the passed name exists.

new_profile #

 | new_profile(name: str) -> 'Profile'

Creates a new directory with the profile’s name and
returns a new profile instance.

Arguments:

  • name str - The name of the new profile.

Raises:

  • ValueError - When the name of an existing profile has been passed.

Returns:

  • Profile - Returns a new Profile instance with a name that hasn’t been used.
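The directory-backed behaviour of profile_exists, get_profile, new_profile and get_profiles can be sketched with pathlib. This is an illustrative stand-in, not the Browser class itself: the functions take the data directory as an explicit root argument and return paths or names rather than Profile objects, and naming each directory Account-<name> is an assumption based on the get_profiles note.

```python
import tempfile
from pathlib import Path
from typing import List

PREFIX = "Account-"  # assumed directory prefix, based on the get_profiles note


def profile_dir(root: Path, name: str) -> Path:
    return root / f"{PREFIX}{name}"


def profile_exists(root: Path, name: str) -> bool:
    return profile_dir(root, name).is_dir()


def new_profile(root: Path, name: str) -> Path:
    if profile_exists(root, name):
        raise ValueError(f"profile {name!r} already exists")
    path = profile_dir(root, name)
    path.mkdir(parents=True)
    return path


def get_profile(root: Path, name: str) -> Path:
    if not profile_exists(root, name):
        raise ValueError(f"profile {name!r} does not exist")
    return profile_dir(root, name)


def get_profiles(root: Path) -> List[str]:
    # Only directories carrying the prefix count as harvester profiles.
    return sorted(
        p.name[len(PREFIX):]
        for p in root.iterdir()
        if p.is_dir() and p.name.startswith(PREFIX)
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    new_profile(root, "Foo")
    names = get_profiles(root)
    found = get_profile(root, "Foo").name
    try:
        new_profile(root, "Foo")
        duplicate_rejected = False
    except ValueError:
        duplicate_rejected = True
    try:
        get_profile(root, "Bar")
        missing_rejected = False
    except ValueError:
        missing_rejected = True
```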

Profile #

class Profile()

You can instantiate a profile in one of the following ways:

browser.new_profile('Foo')

browser.get_profile('Foo')

browser.get_profiles()  # [Profile(Foo)]

set_proxy #

 | @verify_browser_not_running
 | set_proxy(proxy: Proxy) -> 'Profile'

Makes sure that, from now on, all connections from this
profile are routed through the proxy server
specified in the Proxy object.

Arguments:

  • proxy Proxy - An object describing the proxy connection.

Returns:

  • Profile - Useful for chaining.
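Two ideas meet in set_proxy: a guard decorator and a return value meant for chaining. The sketch below shows one plausible shape for both. It is not the library's implementation: verify_not_running and ProfileSketch are hypothetical names, and raising RuntimeError while the browser is running is an assumption about what such a guard might do.

```python
import functools


def verify_not_running(method):
    """Hypothetical guard: refuse to run `method` while the browser is alive."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.poll():
            raise RuntimeError(f"{method.__name__}() needs the browser to be closed")
        return method(self, *args, **kwargs)
    return wrapper


class ProfileSketch:
    def __init__(self):
        self.running = False
        self.proxy = None

    def poll(self) -> bool:
        return self.running

    @verify_not_running
    def set_proxy(self, proxy):
        self.proxy = proxy
        return self  # returning self is what makes chaining possible

    @verify_not_running
    def harvest(self):
        self.running = True  # stands in for starting the browser process


profile = ProfileSketch().set_proxy("127.0.0.1:9436")
profile.harvest()
try:
    profile.set_proxy("127.0.0.1:9437")  # browser is "running" now
    blocked = False
except RuntimeError:
    blocked = True
```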

poll #

 | poll() -> bool

Polls to see whether the browser process is still running.

Returns:

  • bool - True if the browser process is still running,
    otherwise False

kill #

 | kill()

Sends SIGKILL to the underlying browser process if
it is still running.
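The poll and kill semantics mirror what the standard subprocess module offers. In the sketch below, a sleeping Python child process stands in for the browser; whether Profile is built on subprocess.Popen is an assumption.

```python
import subprocess
import sys

# A sleeping Python child stands in for the browser process.
process = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# Popen.poll() returns None while the child is alive.
running_before = process.poll() is None

if process.poll() is None:
    process.kill()  # SIGKILL on POSIX, TerminateProcess on Windows
process.wait()      # reap the child so poll() reports its exit code

running_after = process.poll() is None
```

SIGKILL cannot be caught by the child, so the process gets no chance to clean up after itself.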

delete #

 | @verify_browser_not_running
 | delete()

Removes the profile directory from the user_data_dir passed
when calling setup_browser.

harvest #

 | @verify_browser_not_running
 | harvest()

Opens a browser window pointing to the domain passed to
harvester.capture(...), specially configured to solve
captchas.

launch #

 | @verify_browser_not_running
 | launch(url: str = None, app: bool = False)

Opens a window for the user to do anything they want, usually before calling .harvest().

NOTE: This method waits for the user to close the browser process.
On macOS this means the whole process must be quit, not just
the window.

Arguments:

  • url str, optional - Open the browser to a specific url. Defaults to None.
  • app bool, optional - Open the browser window with the --app=<url> flag. Defaults to False.

login_to_google #

 | @verify_browser_not_running
 | login_to_google(login_url: str = DEFAULT_LOGIN_URL)

Opens a window for the user to log in to Google via a YouTube endpoint, usually before calling .harvest().

NOTE: This method waits for the user to close the browser process.
On macOS this means the whole process must be quit, not just
the window.

Arguments:

  • login_url str, optional - If you’d like to log in via a different endpoint, pass it here. Defaults to
    accounts.google.com.

Proxy #

@dataclass
class Proxy()

Create a Proxy object to be passed to Profile.set_proxy().

proxy = Proxy(host='127.0.0.1', port=9436)
authed_proxy = Proxy(host='127.0.0.1', port=9436, username='bar', password='foo')
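For readers unfamiliar with dataclasses, a class that accepts both calls above could look roughly like this. ProxySketch is a hypothetical stand-in: the field names come from the examples, while the None defaults and the two helper methods are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxySketch:
    host: str
    port: int
    # Credentials are assumed optional, since the first example omits them.
    username: Optional[str] = None
    password: Optional[str] = None

    def requires_auth(self) -> bool:
        """True only when both a username and a password were supplied."""
        return self.username is not None and self.password is not None

    def address(self) -> str:
        """Return the host:port pair as a single string."""
        return f"{self.host}:{self.port}"


plain = ProxySketch(host="127.0.0.1", port=9436)
authed = ProxySketch(host="127.0.0.1", port=9436, username="bar", password="foo")
```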