This page describes what an _AISOP domain_ is: a project that reflects the course material and how it is used in the AISOP web-app.

### Purpose of a domain

A domain contains all course-specific information used by the AISOP web-app, so that the web-app can be applied to a seminar, display analyses, and classify the portfolios made in that course in a meaningful manner.

### Ingredients of a domain

Minimum (what the AISOP web-app needs in order to run):

* a title and a description
* a set of labels for level 1 and for the other levels
* a concept map (exported in CXL) in which the labels are nodes
* spaCy models for the text classification of paragraphs of portfolios made in this course: one model for level 1, and one for each level-1 topic (to classify its subtopics)
* a sequence of analysis scripts and aggregation scripts that deliver the portfolio explorer and the portfolio dashboard

Optional (what allows others to develop the domain further):

* the source content, which contains the sentences used for training
* the extracted sentences/fragments
* the annotations of these extracted sentences with the labels of level 1 and of the other levels
* annotation statistics and model training results (in the form of statistics)
* test sentences to verify the basic functioning of the classifiers
* test portfolios to verify proper functioning
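
The two-stage arrangement of classifiers listed among the minimum ingredients (one model for level 1, then one model per level-1 topic) can be sketched as follows. This is only an illustration under assumptions: the function names, the `Classifier` type and the toy labels are hypothetical, and the classifiers are plain callables returning a score per label rather than the web-app's actual code.

```python
from typing import Callable, Dict, Optional, Tuple

# A classifier maps a paragraph to a score per label (assumed interface).
Classifier = Callable[[str], Dict[str, float]]


def best_label(scores: Dict[str, float]) -> Optional[str]:
    """Return the highest-scoring label, or None if there are no scores."""
    return max(scores, key=scores.get) if scores else None


def classify_paragraph(
    text: str,
    l1_classifier: Classifier,
    l2_classifiers: Dict[str, Classifier],
) -> Tuple[Optional[str], Optional[str]]:
    """Classify at level 1, then refine with that topic's own classifier."""
    l1 = best_label(l1_classifier(text))
    if l1 is None or l1 not in l2_classifiers:
        return l1, None  # no sub-classifier available for this topic
    return l1, best_label(l2_classifiers[l1](text))


# Toy stand-ins for trained models (hypothetical labels and scores).
def toy_l1(text: str) -> Dict[str, float]:
    return {"didactics": 0.8, "assessment": 0.2}


def toy_didactics(text: str) -> Dict[str, float]:
    return {"group-work": 0.3, "lecture": 0.7}


result = classify_paragraph(
    "We mostly listened to talks.", toy_l1, {"didactics": toy_didactics}
)
print(result)  # ('didactics', 'lecture')
```

With spaCy, a loaded text-classification pipeline `nlp` could be wrapped as `lambda text: nlp(text).cats`, since `Doc.cats` holds a score per label; whether the web-app does it this way is not stated here.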

### Packaging of a domain

We propose that a domain be packaged as a directory, which can be shared as a repository, with the following organization:

* the directory name reflects the course name
* the directory contains a file `about.json` with the properties `title`, `description`, `language` (in ISO 639-3), an array of strings for the authors, `subjects` (an array of strings containing the LC subject classification), and `logo` (the link to a logo)
* the directory contains a file `license.txt` with the license text
* the directory contains a file `labels.txt` with a list of label names, organized in a hierarchy by simple indentation
* the directory contains a file `pipeline.json` with the steps of the analysis and aggregation
* the pipeline refers to the spaCy model directories (one for level 1 and one for the children of each level-1 topic), which are included
* the concept map in use is called `cmap.cxl`, and its source `cmap.cmap` (for development)
* all models:
  * the `l1-model` directory is the spaCy model of the classifier for the l1-topics
  * the `l2-models` directory contains one directory per l1-topic, each holding a spaCy model for the sub-labels of that l1-topic
* any extra file or directory mentioned as a link
* the `tests.txt` file contains the test fragments, one line per fragment, so that the debug tool can be used right away
* the `log.txt` file contains the statistical output of the training and/or statistics: one line per label, one column per dimension
* optionally, any file used for development, documented by a `README.md` (see below)
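
To illustrate how a `labels.txt` hierarchy "by simple indentation" could be read, here is a minimal sketch. It makes several assumptions that this page does not state: one label per line, deeper indentation means "child of the nearest less-indented line above", tabs count as four spaces, and label names are unique. The function name `parse_labels` and the sample labels are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_labels(text: str) -> Dict[str, List[str]]:
    """Map each label to its direct children; top-level labels sit under ""."""
    children: Dict[str, List[str]] = {"": []}
    stack: List[Tuple[int, str]] = []  # (indent width, label) of open ancestors
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expanded = line.expandtabs(4)
        indent = len(expanded) - len(expanded.lstrip())
        name = line.strip()
        # Close every open ancestor indented at least as much as this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else ""
        children.setdefault(parent, []).append(name)
        children.setdefault(name, [])
        stack.append((indent, name))
    return children


sample = """\
didactics
    lecture
    group-work
assessment
    exam
"""
tree = parse_labels(sample)
print(tree[""])           # ['didactics', 'assessment']
print(tree["didactics"])  # ['lecture', 'group-work']
```

The entries under `""` would correspond to the labels of the `l1-model`, and the children of each level-1 label to the labels of the matching directory in `l2-models`.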

All link paths used in the `about.json` and `pipeline.json` files can be resolved relative to the domain directory. For them to be recognized, we recommend writing relative paths with a leading `./`, as in `"logo":"./my-logo.svg"`. This allows the web-app to resolve them in a secure way (not going outside of the domain directory except for known places) before they are handed to the web server or to the analysis scripts.
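
A secure relative resolution of this kind could look like the sketch below. It is an assumption about one reasonable approach, not the web-app's actual code: the function name `resolve_in_domain` is hypothetical, and the "known places" outside the domain directory are not handled.

```python
from pathlib import Path


def resolve_in_domain(domain_dir: Path, link: str) -> Path:
    """Resolve a `./`-prefixed link and refuse paths that leave the domain."""
    if not link.startswith("./"):
        raise ValueError(f"not a relative domain path: {link!r}")
    root = domain_dir.resolve()
    target = (root / link).resolve()
    # resolve() collapses "..", so an escaping path no longer has root as an ancestor.
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes the domain directory: {link!r}")
    return target


domain = Path("my-course")  # hypothetical domain directory
print(resolve_in_domain(domain, "./my-logo.svg").name)  # my-logo.svg
```

A link such as `"./../other-course/logo.svg"` raises a `ValueError` here, because after normalization it points outside the domain directory.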

While the `README.md` should be the main entry point to the source work behind the domain, we propose the following folder names:

- `source-content`: a collection of files (e.g. PDFs, pictures, texts, pptx, ...) that represent the source input from which an extraction is made
- `extracts`: the result of the extraction process, made of JSON files, one file or one folder per source collection
- `annotations`: the annotations exported from Prodigy in the form of JSONL files
- moreover, the instructions used and the log of all processes are visible in the `README.md` file
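
The packaging layout proposed in this section lends itself to a quick mechanical check before a domain is shared. The sketch below is only an illustration: the file and directory names come from the list above, but treating all of them as required, and the helper name `missing_entries`, are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List

# Names taken from the packaging list; treating them all as required is an assumption.
EXPECTED_FILES = ["about.json", "license.txt", "labels.txt", "pipeline.json", "cmap.cxl"]
EXPECTED_DIRS = ["l1-model", "l2-models"]


def missing_entries(domain_dir: Path) -> List[str]:
    """List the expected files and directories that a domain directory lacks."""
    missing = [n for n in EXPECTED_FILES if not (domain_dir / n).is_file()]
    missing += [n + "/" for n in EXPECTED_DIRS if not (domain_dir / n).is_dir()]
    return missing


# Demo on a throwaway, deliberately incomplete domain directory.
with tempfile.TemporaryDirectory() as tmp:
    domain = Path(tmp) / "my-course"
    domain.mkdir()
    (domain / "about.json").write_text("{}")
    (domain / "l1-model").mkdir()
    print(missing_entries(domain))
```

An empty result means that every expected entry is present; it says nothing about whether the contents of the files are valid.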
