.. highlight:: sh

=================
User Guide
=================


Creating Functions
-------------------

.. currentmodule:: qualipy.reflect.function

.. autofunction:: function

Example 1 - A simple function with no additional arguments::

    import qualipy as qpy
    @qpy.function(return_format=float)
    def mean(data, column):
        return data[column].mean()

Per the rules, ``data`` represents the data passed through, in this case a pandas DataFrame,
 ``column`` is the string name of column is used to access the column from the DataFrame. 
Additionally, the method ``mean`` returns a float value, which is consistent with the
``return_format`` set in the decorator call.

Example 2 - A simple function with additional arguments::

    @qpy.function(return_format=int, allowed_arguments=["standard_deviations"])
    def std_over_limit(data, column, standard_deviations):
        mean = data[column].mean()
        std = data[column].std()
        data = data[
            (data[column] < (mean - standard_deviations * std))
            | (data[column] > (mean + standard_deviations * std))
        ]
        return data.shape[0]


Example 3 - A function when running SQL as backend::

    @qpy.function(return_format=float)
    def mean(data, column):
        return data.engine.execute(
            sa.select([sa.func.avg(sa.column(column))]).select_from(data._table)
        ).scalar()


Creating a mapping
-------------------

.. currentmodule:: qualipy.reflect.column

.. autofunction:: column

Example 1 - Reflect a pandas column with one function::

    price = qpy.column(column_name="price", column_type=FloatType(), functions=[mean])

Here, ``price`` is the name of the pandas column. We want to column to be of float type,
and we're collecting the mean of the price.

Example 2 - Reflect a column, and call a function with arguments::

    price = qpy.column(
        column_name="price",
        column_type=FloatType(),
        functions=[{"function": std_over_limit, "parameters": {"standard_deviations": 3}}],
    )

Example 3 - Reflect multiple columns, and call a function on just one of them::

    num_columns = qpy.column(
        column_name=["price", "some_other_column"],
        column_type=FloatType(),
        functions=[mean],
        extra_functions={
            "price": [
                {"function": std_over_limit, "parameters": {"standard_deviations": 3}},
            ],
        },
    )

In this scenario, ``mean`` will be applied to ``price``, but ``std_over_limit`` will only
be applied ``price``

Project
--------

.. currentmodule:: qualipy.project

.. autoclass:: Project     

   .. automethod:: __init__
   .. automethod:: add_column

Example 1 - Instantiate a project::

    import qualipy as qpy

    project = qpy.Project(project_name='stocks', config_dir='/tmp/.config')

Example 2 - Instantiate a project and add a column to it::

    import qualipy as qpy

    project = qpy.Project(project_name='stocks', config_dir='/tmp/.config')
    # using the price column defined above
    project.add_column(column=price, name='price_analysis')

Supported DataSet Types
------------------------

Currently, there are three different dataset types supported: Pandas, Spark, and SQL

**Pandas**

.. currentmodule:: qualipy.backends.pandas_backend.dataset

.. autoclass:: PandasData     

   .. automethod:: __init__
   .. automethod:: set_stratify_rule

Example 1 - Setting symbol as a stratification::

    from qualipy.backends.pandas_backend.dataset import PandasData

    stocks = PandasData(stocks)
    stocks.set_stratify_rule("symbol")

Example 2 - Setting symbol as a stratification and specifying the subset of stocks to analyze::

    from qualipy.backends.pandas_backend.dataset import PandasData

    stocks = PandasData(stocks)
    stocks.set_stratify_rule("symbol", values=['IBM', 'AAPL'])

**SQL**

.. currentmodule:: qualipy.backends.sql_backend.dataset

.. autoclass:: SQLData     

   .. automethod:: __init__
   .. automethod:: set_custom_where

Example 1 - Instantiating a table::

    import sqlalchemy as sa
    from qualipy.backends.sql_backend.dataset import SQLData

    engine = sa.create_engine('sqlite://')
    data = SQLData(engine=engine, table_name='my_table')

Example 2 - Instantiating a table and setting a custom where clause::

    import sqlalchemy as sa
    from qualipy.backends.sql_backend.dataset import SQLData

    engine = sa.create_engine('sqlite://')
    data = SQLData(engine=engine, table_name='my_table')
    data.set_custom_where("my_col = 'setosa'")

Qualipy
--------

.. currentmodule:: qualipy.run

.. autoclass:: Qualipy     

   .. automethod:: __init__
   .. automethod:: set_dataset
   .. automethod:: set_chunked_dataset


Data types
-----------

There are several data types one can check for, depending on the backend.
For pandas, these include
    * `DateTimeType`
    * `FloatType` - will match against float16-128
    * `IntType` - will match against int0-64
    * `NumericTypeType` - will match with any numeric subtype
    * `ObjectType`
    * `BoolType`

For SQL and SPARK backends, these are generally less important as type is usually
enforced by the framework itself, reducing the need for type checking.