Database Connections
-
class chocolate.SQLiteConnection(url, result_table='results', complementary_table='complementary', space_table='space')
Connection to a SQLite database.
Before calling any method you must explicitly lock() the database, since SQLite does not handle concurrency well.
We use dataset under the hood, which lets us manage a SQLite database just like a list of dictionaries. There is no need to predefine a schema or to maintain one explicitly.
Parameters:
- url (str) – Full URL to the database, as described in the SQLAlchemy documentation. The URL is parsed to find the database path. A lock file will be created in the same directory as the database. In-memory databases (url = "sqlite:///" or url = "sqlite:///:memory:") are not allowed.
- result_table (str) – Table used to store the experiments and their results.
- complementary_table (str) – Table used to store complementary information needed by the optimizer.
- space_table (str) – Table used to save the optimization Space.
Raises: RuntimeError – When an invalid name is given; see the error message for details.
-
results_as_dataframe()
Compile all the results and transform them using the space specified in the database. It is safe to use this method while other experiments are still writing to the database.
Returns: A pandas.DataFrame containing all results, each with its "_chocolate_id" as "id", its parameters and its loss. Pending results have a loss of None.
-
lock(timeout=-1, poll_interval=0.05)
Context manager that locks the entire database.
Parameters:
- timeout – If the lock cannot be acquired within timeout seconds, a timeout error is raised. If 0 or less, wait forever.
- poll_interval – Number of seconds between lock acquisition attempts.
Raises: TimeoutError – Raised if the lock could not be acquired.
Example:
    from chocolate import SQLiteConnection

    conn = SQLiteConnection("sqlite:///temp.db")
    with conn.lock(timeout=5):
        # The database is locked
        all_ = conn.all_results()
        conn.insert_result({"new_data": len(all_)})
    # The database is unlocked
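The timeout and poll_interval semantics described for lock() can be illustrated with a small standalone sketch. This is not the library's implementation: file_lock and its lock-file strategy are hypothetical, and only the parameter behaviour (wait forever when timeout is 0 or less, raise TimeoutError otherwise) follows the description above.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, timeout=-1, poll_interval=0.05):
    """Hold an exclusive lock file for the duration of the ``with`` block.

    Hypothetical helper: a timeout of 0 or less waits forever, otherwise
    TimeoutError is raised once ``timeout`` seconds have elapsed.
    """
    deadline = None if timeout <= 0 else time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one caller at a time can create it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("could not acquire {}".format(lock_path))
            # Wait before the next acquisition attempt.
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the lock file.
        os.close(fd)
        os.remove(lock_path)
```

A second caller that tries to take the same lock while it is held keeps polling every poll_interval seconds and gives up with TimeoutError once its own timeout expires.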
-
insert_result(document)
Insert a new document in the result table. The columns need not be defined in advance, nor all be present. Any new column will be added to the database and any missing column will get the value None.
-
update_result(filter, values)
Update or add values of given rows in the result table.
Parameters:
- filter – An identifier of the rows to update.
- values – A mapping of values to update or add.
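The "list of dictionaries" behaviour of insert_result and update_result can be pictured with plain Python data. The sketch below does not call the library: insert_row and update_rows are hypothetical helpers, the filter is assumed to be a mapping of column values, and every column name other than "_chocolate_id" is illustrative.

```python
def insert_row(rows, document):
    """Append ``document``; columns missing on either side are set to None."""
    columns = set(document)
    for row in rows:
        columns.update(row)
    # Back-fill existing rows with any column introduced by the new document.
    for row in rows:
        for name in columns:
            row.setdefault(name, None)
    # Columns absent from the document get the value None.
    rows.append({name: document.get(name) for name in columns})


def update_rows(rows, row_filter, values):
    """Update or add ``values`` on every row matching all pairs in ``row_filter``."""
    for row in rows:
        if all(row.get(key) == val for key, val in row_filter.items()):
            row.update(values)
        else:
            # A newly added column exists for all rows, with None elsewhere.
            for name in values:
                row.setdefault(name, None)


rows = []
insert_row(rows, {"_chocolate_id": 0, "x": 0.5})
insert_row(rows, {"_chocolate_id": 1, "y": 2})  # introduces column "y"
update_rows(rows, {"_chocolate_id": 0}, {"_loss": 0.1})
```

After these calls, both rows carry the same set of columns, with None wherever a value was never provided.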
-
all_complementary()
Get all entries of the complementary information table as a list. The order is undefined.
-
insert_complementary(document)
Insert a new document (row) in the complementary information table.
-
get_space()
Returns the space used for previous experiments.
Raises: AssertionError – If there is more than one space in the database.
-
insert_space(space)
Insert a space in the database.
Raises: AssertionError – If a space is already present in the database.
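The rules stated for the url parameter of SQLiteConnection (the URL is parsed for the database path, the lock file sits in the database's directory, in-memory databases are rejected) can be sketched as follows. This is an illustration under assumptions, not the library's code: lock_path_for is a hypothetical name, the ".lock" suffix is invented for the example, and raising RuntimeError for in-memory URLs is an assumption based on the documented exception.

```python
import os

SQLITE_PREFIX = "sqlite:///"


def lock_path_for(url):
    """Return a lock-file path next to the database file named by ``url``."""
    if not url.startswith(SQLITE_PREFIX):
        raise RuntimeError("not a SQLite file URL: {!r}".format(url))
    db_path = url[len(SQLITE_PREFIX):]
    # "sqlite:///" and "sqlite:///:memory:" both denote in-memory databases.
    if db_path in ("", ":memory:"):
        raise RuntimeError("in-memory databases are not allowed")
    directory = os.path.dirname(os.path.abspath(db_path))
    # The ".lock" suffix is a hypothetical naming choice for this sketch.
    return os.path.join(directory, os.path.basename(db_path) + ".lock")
```

For a relative URL such as "sqlite:///temp.db", the returned path lies in the current working directory, alongside the database file.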
-
class chocolate.MongoDBConnection(url, database='chocolate', result_col='results', complementary_col='complementary', space_col='space')
Connection to a MongoDB database.
Parameters:
- url (str) – Full URL to the database, including credentials but omitting the database and the collection. When using authenticated databases, the URL must contain the database name, which must match the database argument.
- database (str) – The database name in the MongoDB engine.
- result_col (str) – Collection used to store the experiments and their results.
- complementary_col (str) – Collection used to store complementary information needed by the optimizer.
- space_col (str) – Collection used to save the optimization Space.
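The rule for authenticated databases (the URL must name the database, and that name must match the database argument) can be checked up front with the standard library. The helper below is a hypothetical sketch and not part of the library or of any MongoDB driver; the host, credentials and the choice of ValueError are illustrative assumptions.

```python
from urllib.parse import urlsplit


def check_mongo_url(url, database="chocolate"):
    """Raise ValueError if an authenticated URL does not name ``database``.

    Returns the database name found in the URL ("" when it is omitted).
    """
    parts = urlsplit(url)
    url_database = parts.path.lstrip("/")
    # Credentials in the URL mean the database must be named and must match.
    if parts.username is not None and url_database != database:
        raise ValueError(
            "authenticated URL names {!r}, expected {!r}".format(
                url_database, database
            )
        )
    return url_database


check_mongo_url("mongodb://localhost:27017/")
check_mongo_url("mongodb://user:secret@localhost:27017/chocolate")
```

Running such a check before constructing the connection surfaces a mismatch between the URL and the database argument early, rather than as an authentication failure later.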
-
results_as_dataframe()
Compile all the results and transform them using the space specified in the database. It is safe to use this method while other experiments are still writing to the database.
Returns: A pandas.DataFrame containing all results, each with its "_chocolate_id" as "id", its parameters and its loss. Pending results have a loss of None.
-
lock(timeout=-1, poll_interval=0.05)
Context manager that locks the entire database.
Parameters:
- timeout – If the lock cannot be acquired within timeout seconds, a timeout error is raised. If 0 or less, wait forever.
- poll_interval – Number of seconds between lock acquisition attempts.
Raises: TimeoutError – Raised if the lock could not be acquired.
Example:
    from chocolate import MongoDBConnection

    conn = MongoDBConnection("mongodb://localhost:27017/")
    with conn.lock(timeout=5):
        # The database is locked
        all_ = conn.all_results()
        conn.insert_result({"new_data": len(all_)})
    # The database is unlocked
-
update_result(token, values)
Update or add values to given documents in the result collection.
Parameters:
- token – An identifier of the documents to update.
- values – A mapping of values to update or add.
-
all_complementary()
Get all entries of the complementary information table as a list. The order is undefined.
-
insert_complementary(document)
Insert a new document in the complementary information table.
-
get_space()
Returns the space used for previous experiments.
Raises: AssertionError – If there is more than one space in the database.
-
insert_space(space)
Insert a space in the database.
Raises: AssertionError – If a space is already present in the database.
-
class chocolate.DataFrameConnection(from_file=None)
Connection to a pandas DataFrame.
This connection is meant for situations where it is not possible to use the file system or another traditional type of database (e.g. Kaggle scripts), and it must absolutely not be used in concurrent processes. Using this connection in different processes will result in independent searches that do not share any information.
Parameters: from_file – The name of a file containing a pickled data frame connection.
Using this connection requires small adjustments to the proposed main script. When the main process finishes, all data will vanish unless explicitly written to disk. Thus, instead of doing a single evaluation, the main process should incorporate a loop calling the search/sample next method multiple times. Additionally, at the end of the experiment, either extract the best configuration using results_as_dataframe() or write all the data using pickle.
-
results_as_dataframe()
Compile all the results and transform them using the space specified in the database. It is safe to use this method while other experiments are still writing to the database.
Returns: A pandas.DataFrame containing all results, each with its "_chocolate_id" as "id", its parameters and its loss. Pending results have a loss of None.
-
lock(*args, **kwargs)
This function does not lock anything. Do not use in concurrent processes.
-
insert_result(document)
Insert a new document in the result data frame. The columns do not need to be defined in advance, nor all be present. Any new column will be added to the database and any missing column will get the value None.
-
update_result(document, value)
Update or add value of given rows in the result data frame.
Parameters:
- document – An identifier of the rows to update.
- value – A mapping of values to update or add.
-
all_complementary()
Get all entries of the complementary information table as a list. The order is undefined.
-
insert_complementary(document)
Insert a new document (row) in the complementary information data frame.
-
find_complementary(filter)
Find a document (row) from the complementary information data frame.
-
insert_space(space)
Insert a space in the database.
Raises: AssertionError – If a space is already present.
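The workflow described for DataFrameConnection (loop inside a single process, then persist everything before exiting) has roughly the following shape. This sketch deliberately avoids the library: a plain list stands in for the in-memory connection, random sampling stands in for the search/sample next method, and objective, run_search and the file name are all hypothetical.

```python
import os
import pickle
import random
import tempfile


def objective(x):
    """Toy loss, minimal at x = 0.3 (illustrative only)."""
    return (x - 0.3) ** 2


def run_search(n_iterations, out_path, seed=0):
    """Evaluate several candidates in one process, then pickle the results."""
    rng = random.Random(seed)
    results = []  # stands in for the in-memory connection
    for token in range(n_iterations):
        # In a real script, the search/sample next method would be called
        # here to obtain the next set of parameters.
        params = {"x": rng.random()}
        results.append({"id": token, "loss": objective(**params), **params})
    # Everything lives in memory, so write it out before the process exits.
    with open(out_path, "wb") as f:
        pickle.dump(results, f)
    # Alternatively, keep only the best configuration.
    return min(results, key=lambda r: r["loss"])


out_path = os.path.join(tempfile.mkdtemp(), "results.pkl")
best = run_search(20, out_path)
```

The key point is that the loop and the final write live in the same process, since nothing survives once that process ends.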
-