This page describes what an _AISOP domain_ is: a project that reflects the course material and how it is used in the AISOP web-app.

### Purpose of a domain

A domain contains all the course-specific information used by the AISOP web-app, so that the web-app can be applied to a seminar, display analyses, and classify the portfolios made in these courses in a meaningful manner.

### Ingredients of a domain

Minimum (this allows the AISOP web-app to run):

* title and description
* set of labels of level 1 and of other levels
* concept map (exported in CXL) where the labels are nodes
* spaCy models that allow the text classification of paragraphs of portfolios made in this course: one model for level 1, and one for each of the level 1 topics (to classify its subtopics)
* sequence of analysis scripts and aggregation scripts to deliver the portfolio explorer and portfolio dashboard

Optional (this allows others to further develop the domain):

* source content which contains the sentences used for training
* extracted sentences/fragments
* annotations of these extracted sentences with the labels of level 1 and of the other levels
* annotation statistics and model training results (in the form of statistics)
* test sentences to verify the elementary function of the classifiers
* test portfolios to verify the proper function

### Packaging of a domain

We propose that a domain be packaged as a directory, which can be shared as a repository, with the following organization:

* the directory name reflects the course name
* the directory contains a file `about.json` with the properties `title`, `description`, `language` (in ISO 639-3), an array of strings for the authors, `subjects` (an array of strings containing the LC subject classification), and `logo` (the link to a logo)
* the directory contains a file `license.txt` with the license text
* the directory contains a `labels.txt` file with a list of label names, organized in a hierarchy by simple indenting
* the directory contains a file `pipeline.json` with the steps of the analysis and aggregation
* the pipeline refers to the spaCy model directories (one for level 1, and one for the children of each level 1 label), which are included
* the concept map used, called `cmap.cxl`, and its source `cmap.cmap` (for development)
* all models:
  * the `l1-model` directory is the spaCy model of the classifier for the l1-topics
  * the `l2-models` directory contains one directory per l1-topic, each holding a spaCy model for the sub-labels of this l1-topic
* any extra file or directory mentioned as a link
* the `tests.txt` file contains the test fragments, one line per fragment, so that the debug tool can be used right away
* the `log.txt` file contains the statistical output of the training and/or statistics: one line per label, one column per dimension
* optionally, any file used for development, documented by a `README.md` (see below)

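The indentation-based hierarchy of `labels.txt` can be read with a small stack-based parser. The following is only a sketch under assumptions: the function name `parse_labels`, the returned dictionary keys, the tab width of 4, and the sample label names are all hypothetical, and the actual AISOP web-app may read the file differently.

```python
def parse_labels(text):
    """Parse indented label names into entries with a level and a parent.

    Assumption: a deeper indentation than the previous open label means
    "child of that label"; the exact indent width does not matter.
    """
    entries = []
    stack = []  # (indent width, label name) of the currently open ancestors
    for raw in text.splitlines():
        line = raw.expandtabs(4)  # assumed tab width
        name = line.strip()
        if not name:
            continue  # skip empty lines
        indent = len(line) - len(line.lstrip())
        # Close every ancestor that is indented at least as much as this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        entries.append({"name": name, "level": len(stack) + 1, "parent": parent})
        stack.append((indent, name))
    return entries


# Hypothetical content of a labels.txt file.
SAMPLE = """\
Topic A
  Subtopic A1
  Subtopic A2
Topic B
  Subtopic B1
"""

if __name__ == "__main__":
    for entry in parse_labels(SAMPLE):
        print(entry["level"], entry["name"], "<-", entry["parent"])
```

With such a structure, the level 1 entries would correspond to the classes of `l1-model`, and the children of each level 1 entry to the classes of the matching directory in `l2-models`.
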
All paths of links used in the `about.json` and `pipeline.json` files can be resolved relative to the domain directory. For them to be recognized, we recommend writing relative paths with a leading `./`, as in `"logo":"./my-logo.svg"`. This allows the web-app to perform relative resolution in a secure way (not going outside of the domain directory except for known places) before the path is given to the web-server or to the analysis scripts.

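One possible way to perform such a resolution is sketched below with Python's `pathlib`. The function name `resolve_in_domain` and the sample `about.json` values are hypothetical, and the sketch deliberately ignores the "known places" exception mentioned above: it simply rejects any `./` link that ends up outside the domain directory.

```python
from pathlib import Path


def resolve_in_domain(domain_dir, link):
    """Resolve a './'-prefixed link against the domain directory.

    Returns an absolute Path, or None when the link does not start with './'
    (assumption: such links are not domain-relative, e.g. they are URLs).
    Raises ValueError when the link would leave the domain directory.
    """
    if not link.startswith("./"):
        return None
    root = Path(domain_dir).resolve()
    target = (root / link).resolve()  # collapses '.' and '..' segments
    try:
        target.relative_to(root)  # raises ValueError if target is outside root
    except ValueError:
        raise ValueError(f"link escapes the domain directory: {link!r}")
    return target


# Hypothetical excerpt of an about.json file, already parsed into a dict.
about = {"title": "My course", "logo": "./my-logo.svg"}

if __name__ == "__main__":
    print(resolve_in_domain("my-course-domain", about["logo"]))
```

Resolving both the root and the target before comparing them is what catches links such as `./../other-domain/file`, which start with `./` but still point outside the directory.
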
While the `README.md` should be the main entry point for the source work of creating the domain, we propose the following folder names:

- `source-content`: a collection of files (e.g. PDFs, pictures, texts, pptx, ...) that represent the source input from which an extraction is made
- `extracts`: the result of the extraction process, made of JSON files, one file or one folder per source collection
- `annotations`: the annotations exported from Prodigy in the form of JSONL files
- moreover, the instructions used and the log of all processes are visible in the `README.md` file
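
The proposed layout can be checked mechanically before a domain is shared. The sketch below is an assumption-laden illustration: the function name `missing_entries` is hypothetical, and the two lists simply collect the file and folder names mentioned on this page without deciding which of them are mandatory.

```python
from pathlib import Path

# Names taken from the packaging description on this page.
EXPECTED_FILES = [
    "about.json", "license.txt", "labels.txt", "pipeline.json",
    "cmap.cxl", "cmap.cmap", "tests.txt", "log.txt", "README.md",
]
EXPECTED_DIRS = [
    "l1-model", "l2-models", "source-content", "extracts", "annotations",
]


def missing_entries(domain_dir):
    """List the expected files and folders that are absent from domain_dir.

    Folder names are returned with a trailing '/' to tell them apart.
    """
    root = Path(domain_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Report what is absent from the current directory.
    for name in missing_entries("."):
        print("missing:", name)
```

Since several of these entries are optional, a missing name is a hint for the domain author and not necessarily an error.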
