Database Connections¶

class chocolate.SQLiteConnection(url, result_table='results', complementary_table='complementary', space_table='space')[source]¶

Connection to a SQLite database.

Before calling any method, you must explicitly lock() the database, since SQLite does not handle concurrency well.

We use dataset under the hood, which lets us manage a SQLite database just like a list of dictionaries. There is therefore no need to predefine a schema or to maintain one explicitly.

Parameters:

url (str) – Full url to the database, as described in the SQLAlchemy documentation. The url is parsed to find the database path. A lock file will be created in the same directory as the database. In-memory databases (url = "sqlite:///" or url = "sqlite:///:memory:") are not allowed.
result_table (str) – Table used to store the experiments and their results.
complementary_table (str) – Table used to store complementary information necessary to the optimizer.
space_table (str) – Table used to save the optimization Space.

Raises:

RuntimeError – When an invalid name is given; see the error message for details.
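
As a rough illustration of the url rules above, the following sketch extracts the file path from a SQLite url and rejects in-memory databases. It relies only on the standard library. The helper name `sqlite_path_from_url` is hypothetical and not part of chocolate, which may parse the url differently (for instance through SQLAlchemy).

```python
import os


def sqlite_path_from_url(url):
    """Hypothetical helper: return the file path encoded in a SQLite url."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise RuntimeError("not a SQLite url: {!r}".format(url))
    path = url[len(prefix):]
    # An empty path or ":memory:" denotes an in-memory database,
    # which the connection described above does not allow.
    if path in ("", ":memory:"):
        raise RuntimeError("in-memory databases are not allowed")
    return path


path = sqlite_path_from_url("sqlite:///experiments/temp.db")
# Per the description above, the lock file would live next to the database.
lock_dir = os.path.dirname(os.path.abspath(path))
```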

results_as_dataframe()¶

Compile all the results and transform them using the space specified in the database. It is safe to use this method while other experiments are still writing to the database.

Returns:	A `pandas.DataFrame` containing all results, with each result's `"_chocolate_id"` as `"id"`, its parameters, and its loss. Pending results have a loss of `None`.

lock(timeout=-1, poll_interval=0.05)[source]¶

Context manager that locks the entire database.

Parameters:

timeout – If the lock cannot be acquired within timeout seconds, a timeout error is raised. If 0 or less, wait forever.
poll_interval – Number of seconds between lock acquisition attempts.

Raises:	`TimeoutError` – Raised if the lock could not be acquired.

Example:

from chocolate import SQLiteConnection

conn = SQLiteConnection("sqlite:///temp.db")
with conn.lock(timeout=5):
    # The database is locked
    all_ = conn.all_results()
    conn.insert_result({"new_data": len(all_)})

# The database is unlocked
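
To make the timeout and poll_interval semantics concrete, here is a minimal sketch of a lock-file context manager built on the standard library alone. It is an assumption about how such a lock could behave, not chocolate's actual implementation; the name `file_lock` and the lock file path are hypothetical.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def file_lock(path, timeout=-1, poll_interval=0.05):
    """Hypothetical lock-file context manager (not chocolate's code)."""
    start = time.monotonic()
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so only one process can create the lock file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # A timeout of 0 or less means "wait forever".
            if timeout > 0 and time.monotonic() - start >= timeout:
                raise TimeoutError("could not acquire lock on " + path)
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the lock file.
        os.close(fd)
        os.remove(path)


lock_path = os.path.join(tempfile.mkdtemp(), "temp.db.lock")
with file_lock(lock_path, timeout=5):
    held = os.path.exists(lock_path)      # lock file exists while locked
released = not os.path.exists(lock_path)  # and is removed afterwards
```

A second caller that finds the lock file present retries every poll_interval seconds and gives up with `TimeoutError` once a positive timeout has elapsed.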

all_results()[source]¶: Get a list of all entries of the result table. The order is undefined.

insert_result(document)[source]¶: Insert a new document in the result table. The columns need not be defined beforehand, nor all be present. Any new column will be added to the database and any missing column will get value None.
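
The schema-less behaviour described above can be pictured with a plain list of dictionaries. The sketch below is only a stand-in for illustration: `insert_row` is a hypothetical function, it does not touch a real database, and the column names `x` and `loss` are made up.

```python
def insert_row(rows, columns, document):
    """Hypothetical stand-in for a schema-less insert into a list of dicts."""
    # Any column not seen before is added; existing rows get None for it.
    for key in document:
        if key not in columns:
            columns.append(key)
            for row in rows:
                row[key] = None
    # Columns missing from the new document are filled with None.
    rows.append({key: document.get(key) for key in columns})


rows, columns = [], []
insert_row(rows, columns, {"_chocolate_id": 0, "x": 1.5})
insert_row(rows, columns, {"_chocolate_id": 1, "loss": 0.2})
```

After the second call, the first row has gained a `loss` entry set to None, and the second row has an `x` entry set to None.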

update_result(filter, values)[source]¶

Update or add values of given rows in the result table.

Parameters:

filter – An identifier of the rows to update.
values – A mapping of values to update or add.
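
The following sketch shows one plausible reading of these parameters on a plain list of dictionaries. It assumes the row identifier is a mapping of column names to the values a row must match, which the text above does not confirm; `update_rows` and the column names `x` and `loss` are hypothetical.

```python
def update_rows(rows, row_filter, values):
    """Hypothetical stand-in: update every row whose fields match row_filter."""
    updated = 0
    for row in rows:
        if all(row.get(key) == value for key, value in row_filter.items()):
            # Existing keys are overwritten and new keys are added.
            row.update(values)
            updated += 1
    return updated


rows = [{"_chocolate_id": 0, "x": 1.5}, {"_chocolate_id": 1, "x": 0.3}]
n_updated = update_rows(rows, {"_chocolate_id": 1}, {"loss": 0.2})
```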

count_results()[source]¶: Get the total number of entries in the result table.

all_complementary()[source]¶: Get all entries of the complementary information table as a list. The order is undefined.

insert_complementary(document)[source]¶: Insert a new document (row) in the complementary information table.

find_complementary(filter)[source]¶: Find a document (row) from the complementary information table.

get_space()[source]¶

Returns the space used for previous experiments.

Raises:	`AssertionError` – If there is more than one space in the database.

insert_space(space)[source]¶

Insert a space in the database.

Raises:	`AssertionError` – If a space is already present in the database.

clear()[source]¶: Clear all data from the database.

class chocolate.MongoDBConnection(url, database='chocolate', result_col='results', complementary_col='complementary', space_col='space')[source]¶

Connection to a MongoDB database.

Parameters:

url (str) – Full url to the database, including credentials but omitting the database and the collection. When using authenticated databases, however, the url must contain the database name, and that name must match the database argument.
database (str) – The database name in the MongoDB engine.
result_col (str) – Collection used to store the experiments and their results.
complementary_col (str) – Collection used to store complementary information necessary to the optimizer.
space_col (str) – Collection used to save the optimization Space.
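
The rule about authenticated urls can be checked before connecting. The sketch below uses only the standard library to read the database name embedded in a MongoDB url; `database_in_url`, the host, and the credentials are hypothetical, and chocolate itself may validate the url differently.

```python
from urllib.parse import urlparse


def database_in_url(url):
    """Hypothetical helper: return the database named in a MongoDB url, or None."""
    name = urlparse(url).path.lstrip("/")
    return name or None


# Unauthenticated: the url omits the database name.
plain = database_in_url("mongodb://localhost:27017/")

# Authenticated: the url names the database, which should match
# the value passed as the database argument.
database = "chocolate"
authenticated = database_in_url("mongodb://user:secret@localhost:27017/chocolate")
matches = authenticated == database
```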

results_as_dataframe()¶

Compile all the results and transform them using the space specified in the database. It is safe to use this method while other experiments are still writing to the database.

Returns:	A `pandas.DataFrame` containing all results, with each result's `"_chocolate_id"` as `"id"`, its parameters, and its loss. Pending results have a loss of `None`.

lock(timeout=-1, poll_interval=0.05)[source]¶

Context manager that locks the entire database.

Parameters:

timeout – If the lock cannot be acquired within timeout seconds, a timeout error is raised. If 0 or less, wait forever.
poll_interval – Number of seconds between lock acquisition attempts.

Raises:	`TimeoutError` – Raised if the lock could not be acquired.

Example:

from chocolate import MongoDBConnection

conn = MongoDBConnection("mongodb://localhost:27017/")
with conn.lock(timeout=5):
    # The database is locked
    all_ = conn.all_results()
    conn.insert_result({"new_data": len(all_)})

# The database is unlocked

all_results()[source]¶: Get all entries of the result table as a list. The order is undefined.

insert_result(document)[source]¶: Insert a new document in the result table.

update_result(token, values)[source]¶

Update or add values to given documents in the result table.

Parameters:

token – An identifier of the documents to update.
values – A mapping of values to update or add.

count_results()[source]¶: Get the total number of entries in the result table.

all_complementary()[source]¶: Get all entries of the complementary information table as a list. The order is undefined.

insert_complementary(document)[source]¶: Insert a new document in the complementary information table.

find_complementary(filter)[source]¶: Find a document from the complementary information table.

get_space()[source]¶

Returns the space used for previous experiments.

Raises:	`AssertionError` – If there is more than one space in the database.

insert_space(space)[source]¶

Insert a space in the database.

Raises:	`AssertionError` – If a space is already present in the database.

clear()[source]¶: Clear all data from the database.

class chocolate.DataFrameConnection(from_file=None)[source]¶

Connection to a pandas DataFrame.

This connection is meant for situations where it is not possible to use the file system or another type of traditional database (e.g. a Kaggle script). It must absolutely not be used in concurrent processes: using this connection in different processes will result in independent searches that share no information.

Parameters:	from_file – The name of a file containing a pickled data frame connection.

Using this connection requires small adjustments to the proposed main script. When the main process finishes, all data will vanish unless explicitly written to disk. Thus, instead of doing a single evaluation, the main process should run a loop that calls the search/sample next method multiple times. Additionally, at the end of the experiment, either extract the best configuration using results_as_dataframe() or write all the data to disk using pickle.

results_as_dataframe()¶

Compile all the results and transform them using the space specified in the database. It is safe to use this method while other experiments are still writing to the database.

Returns:	A `pandas.DataFrame` containing all results, with each result's `"_chocolate_id"` as `"id"`, its parameters, and its loss. Pending results have a loss of `None`.

lock(*args, **kwargs)[source]¶: This function does not lock anything. Do not use in concurrent processes.

all_results()[source]¶: Get a list of all entries of the result table. The order is undefined.

insert_result(document)[source]¶: Insert a new document in the result data frame. The columns do not need to be defined beforehand, nor all be present. Any new column will be added to the database and any missing column will get value None.

update_result(document, value)[source]¶

Update or add values of given rows in the result data frame.

Parameters:

document – An identifier of the rows to update.
value – A mapping of values to update or add.

count_results()[source]¶: Get the total number of entries in the result table.

all_complementary()[source]¶: Get all entries of the complementary information table as a list. The order is undefined.

insert_complementary(document)[source]¶: Insert a new document (row) in the complementary information data frame.

find_complementary(filter)[source]¶: Find a document (row) from the complementary information data frame.

get_space()[source]¶: Returns the space used for previous experiments.

insert_space(space)[source]¶

Insert a space in the database.

Raises:	`AssertionError` – If a space is already present.

clear()[source]¶: Clear all data.