Quick-Start
Application for browsing the data in the browser.
- npm 6.4.1
- modern browser

git clone https://github.com/siret/similant.git
cd similant
npm install

- Initialize BROWSER with an empty data folder in the /public folder:

cp empty-data data

npm start

- The browser should open http://localhost:3000/ in a new tab.
All configuration files are stored in the /public/data folder. An example configuration is in the /public/example folder.
The configuration file descriptors.json contains the list of similarity models, represented as a JSON array of JSON objects.
[
{
"id": "<model id>",
"name": "<model name>",
"url": "<url to model configuration>"
},
{
"id": "<model id>",
"name": "<model name>",
"url": "<url to model configuration>"
},
...
]
The configuration file descriptors/<model>.json contains information about the model and its related clustering sizes.
{
"id": "<model id>",
"title": "<model name>",
"type": "<model type>",
"clusters": [
{
"id": "<clustering id>",
"size": <clustering size>,
"url": "<clustering url>"
},
{
"id": "<clustering id>",
"size": <clustering size>,
"url": "<clustering url>"
},
...
],
"data": {
"<record id>": <descriptor related fields>,
"<record id>": <descriptor related fields>,
...
},
...
<type related fields>
...
}
The supported types are:
- time-series with the additional field axis (JSON array containing the labels of time points),
- set-tokens with the additional fields:
  - labels (JSON object containing a translation table from token to label),
  - limit (number of most frequent tokens shown).
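As an illustration of the model file layout above, the following minimal Python sketch assembles a time-series model description. The helper name make_time_series_model, the example ids and the example URL are hypothetical, and the check that every record has one value per axis label is an assumption, since the layout above does not spell out what the descriptor related fields look like.

```python
import json


def make_time_series_model(model_id, title, axis, data, clusterings):
    """Build a dict following the descriptors/<model>.json layout.

    clusterings is a list of (clustering id, size, url) tuples.
    Assumption: each record in data is a list with one value per axis label.
    """
    for record_id, values in data.items():
        if len(values) != len(axis):
            raise ValueError(
                "record %r has %d values, expected %d"
                % (record_id, len(values), len(axis))
            )
    return {
        "id": model_id,
        "title": title,
        "type": "time-series",
        "clusters": [
            {"id": c_id, "size": size, "url": url}
            for c_id, size, url in clusterings
        ],
        "data": data,
        "axis": axis,
    }


# Hypothetical example values, for illustration only.
example_model = make_time_series_model(
    "m1",
    "Model 1",
    ["t0", "t1", "t2"],
    {"r1": [0.1, 0.4, 0.3], "r2": [0.9, 0.2, 0.5]},
    [("c10", 10, "descriptors/m1/10.json")],
)
print(json.dumps(example_model, indent=2))
```

A set-tokens model would follow the same pattern, with labels and limit in place of axis.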
Clustering files are usually placed in descriptors/<model>/<size>.json and contain information about groups of records (clusters).
{
"<cluster id>": {
"id": "<cluster id>",
"pos": [ <x position>, <y position> ],
"radius": <cluster radius>,
"items": [
"<record id>",
"<record id>",
...
]
},
...
}
The database records configuration file is individuals.json. It contains "all" information about every record in the database in the form of a JSON object. This information is shown in the left panel.
{
"<record_id>": {
"id": "<record_id>",
"data": {
"<key>": "<value>",
"<key>": "<value>",
...
}
},
"<record_id>": {
"id": "<record_id>",
"data": {
"<key>": "<value>",
"<key>": "<value>",
...
}
},
...
}
The targets configuration file is targets.json. It contains the list of all available targets. Targets are shown in the right panel.
[
{
"id": "<target id>",
"name": "<target name>",
"url": "<target configuration URL>"
},
{
"id": "<target id>",
"name": "<target name>",
"url": "<target configuration URL>"
},
...
]
A target configuration file contains information about a single target and its value for every record. It is usually located in the /targets folder.
{
"name": "<target name>",
"type": "<target type>",
"data": {
"<record id>": "<value>",
"<record id>": "<value>",
...
},
...
<type related fields>
...
}
The supported types are:
- ordinal with the additional field axis (JSON array containing the expected values),
- histogram with the additional field bins (JSON array containing the limits of the histogram bins).
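To make the two target types more concrete, here is a small Python sketch that builds both kinds of target file. The helper names are hypothetical. The sketch also assumes that bins lists equal-width limits including both outer edges and that histogram values are numeric; the description above does not confirm either point.

```python
def make_ordinal_target(name, values, axis=None):
    """Build an ordinal target; axis defaults to the values in first-seen order."""
    if axis is None:
        axis = []
        for value in values.values():
            if value not in axis:
                axis.append(value)
    unexpected = sorted(set(values.values()) - set(axis))
    if unexpected:
        raise ValueError("values missing from axis: %s" % unexpected)
    return {"name": name, "type": "ordinal", "data": dict(values), "axis": list(axis)}


def make_histogram_target(name, values, bin_count):
    """Build a histogram target with equal-width bin limits (an assumption)."""
    numbers = [float(v) for v in values.values()]
    low, high = min(numbers), max(numbers)
    step = (high - low) / bin_count
    # bin_count bins need bin_count + 1 limits, outer edges included.
    bins = [low + i * step for i in range(bin_count + 1)]
    return {"name": name, "type": "histogram", "data": dict(values), "bins": bins}


# Hypothetical example values, for illustration only.
grades = make_ordinal_target("Grade", {"r1": "low", "r2": "high", "r3": "low"})
ages = make_histogram_target("Age", {"r1": "0", "r2": "10", "r3": "5"}, 2)
print(grades["axis"])
print(ages["bins"])
```

Either dictionary can then be written with json.dump to a file in the /targets folder and listed in targets.json.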
The script.py for quick model generation is located in the python folder.
- python 3.6.8
- numpy 1.16.1
- scikit-learn 0.20.2
- scipy 1.2.0
- pandas 0.24.0

The script expects a CSV file (UTF-8 encoding) in the following format.
id,data
<record id 1>,<descriptor value 1>,<descriptor value 2>,...,<descriptor value n_1>
<record id 2>,<descriptor value 1>,<descriptor value 2>,...,<descriptor value n_2>
...
<record id m>,<descriptor value 1>,<descriptor value 2>,...,<descriptor value n_m>
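Rows in this format may have different lengths (n_1, n_2, ..., n_m), so a reader cannot assume a fixed number of columns. The sketch below shows one way to parse such a file with Python's standard csv module. It is not the parsing code used by the script, the function name read_descriptors is hypothetical, and values are kept as strings because the format does not say whether they are numbers or tokens.

```python
import csv
import io


def read_descriptors(lines):
    """Parse 'id,data' CSV lines into {record id: [descriptor values]}.

    lines is any iterable of text lines, e.g. a file opened with
    encoding="utf-8" and newline="".
    """
    reader = csv.reader(lines)
    next(reader)  # skip the "id,data" header row
    records = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        record_id, values = row[0], row[1:]
        records[record_id] = values
    return records


# Hypothetical sample with rows of different lengths.
sample = io.StringIO("id,data\nr1,a,b,c\nr2,a\n")
print(read_descriptors(sample))
```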
New models can be added using the Python script:
python add_model.py -i <path to CSV file> --add
All options can be listed using the following command:
python add_model.py -h
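After generating or hand-editing the data files, it can be useful to check that a clustering file is consistent with individuals.json. The following Python sketch does this for the clustering layout described earlier. The function name check_clustering is hypothetical, and the rule that every item of a cluster must exist in individuals.json is an assumption rather than a documented requirement.

```python
def check_clustering(clustering, individuals):
    """Return a list of human-readable problems found in a clustering dict.

    clustering follows the descriptors/<model>/<size>.json layout and
    individuals follows the individuals.json layout.
    """
    problems = []
    for key, cluster in clustering.items():
        if cluster.get("id") != key:
            problems.append("cluster %r: id field does not match its key" % key)
        pos = cluster.get("pos")
        if not (isinstance(pos, list) and len(pos) == 2):
            problems.append("cluster %r: pos is not an [x, y] pair" % key)
        for record_id in cluster.get("items", []):
            if record_id not in individuals:
                problems.append(
                    "cluster %r: unknown record %r" % (key, record_id)
                )
    return problems


# Hypothetical example values, for illustration only.
individuals = {"r1": {"id": "r1", "data": {}}}
clustering = {
    "c1": {"id": "c1", "pos": [0.0, 1.0], "radius": 2.0, "items": ["r1", "r2"]}
}
for problem in check_clustering(clustering, individuals):
    print(problem)
```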