Wiki source code of AISOP domains

Last modified by Paul Libbrecht on 2025/04/17 21:43

This page describes what an _AISOP domain_ is: a project that reflects the course material and how it is used in the AISOP web-app.

### Purpose of a domain

A domain contains all course-specific information used by the AISOP web-app so that it can be applied to a seminar, display analyses, and classify the portfolios made in these courses in a meaningful manner.

### Ingredients of a domain

Minimum (this allows the AISOP web-app to run):

* title and description
* set of labels of level 1 and of the other levels
* concept map (exported in CXL) in which the labels are nodes
* spaCy models that allow the text classification of paragraphs of portfolios made in this course: one for level 1, and one for each of the level-1 topics (to classify its subtopics)
* sequence of analysis scripts and aggregation scripts to deliver the portfolio explorer and portfolio dashboard

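The two-level classification described in the list above can be sketched as follows. This is a minimal illustration, not the AISOP implementation: the function names, the stand-in classifiers and the labels `networks`, `databases`, `routing` and `dns` are hypothetical. It assumes that each classifier returns one score per label, as spaCy's `doc.cats` does for a text-categorization pipeline.

```python
from typing import Callable, Dict, Optional, Tuple

# A classifier maps a paragraph to one score per label, like spaCy's `doc.cats`.
Classifier = Callable[[str], Dict[str, float]]


def best_label(scores: Dict[str, float]) -> Optional[str]:
    """Return the highest-scoring label, or None when there are no scores."""
    return max(scores, key=lambda label: scores[label]) if scores else None


def classify_paragraph(
    text: str,
    l1_classifier: Classifier,
    l2_classifiers: Dict[str, Classifier],
) -> Tuple[Optional[str], Optional[str]]:
    """Pick the level-1 topic, then refine with that topic's own classifier."""
    l1 = best_label(l1_classifier(text))
    l2_classifier = l2_classifiers.get(l1)  # None if the topic has no sub-model
    l2 = best_label(l2_classifier(text)) if l2_classifier else None
    return l1, l2


# Stand-in classifiers with hypothetical labels. With spaCy, a loaded
# pipeline `nlp` could be wrapped as `lambda text: nlp(text).cats`.
def demo_l1(text: str) -> Dict[str, float]:
    is_network = "network" in text.lower()
    return {
        "networks": 0.9 if is_network else 0.1,
        "databases": 0.1 if is_network else 0.9,
    }


def demo_l2_networks(text: str) -> Dict[str, float]:
    return {"routing": 0.7, "dns": 0.3}


print(classify_paragraph("How network routing works", demo_l1,
                         {"networks": demo_l2_networks}))
```

A level-1 topic without a sub-model simply yields no level-2 label, which keeps the sketch usable for domains whose hierarchy is only partly trained.
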
Optional (this allows others to develop the domain further):

* source content which contains the sentences used for training
* extracted sentences/fragments
* annotations of these extracted sentences with the labels of level 1 and of the other levels
* annotation statistics and model training results (in the form of statistics)
* test sentences to verify the basic function of the classifiers
* test portfolios to verify the proper function

### Packaging of a domain

We propose that a domain be packaged as a directory, which can be shared as a repository, with the following organization:

* the directory name reflects the course name
* the directory contains a file `about.json` with the properties `title`, `description`, `language` (in ISO 639-3), an array of strings for the authors, `subjects` (an array of strings containing the LC subject classification), and `logo` (the link to a logo)
* the directory contains a file `license.txt` with the license text
* the directory contains a `labels.txt` file with a list of label names, organized in a hierarchy by simple indenting
* the directory contains a file `pipeline.json` with the steps of the analysis and aggregation
* the pipeline refers to the spaCy model directories (level 1 and one for the children of each level-1 label), which are included
* the concept map used, called `cmap.cxl`, and its source `cmap.cmap` (for development)
* all models:
  * the `l1-model` directory is the spaCy model of the classifier for the l1-topics
  * the `l2-models` directory contains one directory for each l1-topic, which holds a spaCy model for the sub-labels of this l1-topic
* any extra file or directory mentioned as a link
* the `tests.txt` file contains the test fragments, one line per fragment, so that the debug tool can be used right away
* the `log.txt` file contains the statistical output of the training and/or statistics: one line per label, one column per dimension
* optionally, any file used for development, documented by a `README.md` (see below)

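As an illustration of the `labels.txt` convention (a hierarchy expressed by simple indenting), the following sketch turns such a file into a nested dictionary. The exact indentation rules are not specified here, so the sketch assumes that deeper indentation means a child label and that a tab counts as four spaces. The function name `parse_labels` and the sample labels are hypothetical.

```python
from typing import Dict, List, Tuple

LabelTree = Dict[str, "LabelTree"]


def parse_labels(text: str) -> LabelTree:
    """Build a nested dict of labels from an indented list of label names."""
    root: LabelTree = {}
    # Stack of (indent width, children dict); the root sits below any indent.
    stack: List[Tuple[int, LabelTree]] = [(-1, root)]
    for raw in text.splitlines():
        line = raw.expandtabs(4)  # assumption: one tab equals four spaces
        name = line.strip()
        if not name:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip())
        # Close every label that is at the same depth or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        children: LabelTree = {}
        stack[-1][1][name] = children
        stack.append((indent, children))
    return root


sample = "Networks\n    Routing\n    DNS\nDatabases\n    SQL\n"
tree = parse_labels(sample)
print(sorted(tree))      # the level-1 labels
print(tree["Networks"])  # the sub-labels of one level-1 label
```

The top-level keys of the result are the level-1 labels, so they could also be compared with the directory names found under `l2-models`.
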
All paths of links used in the `about.json` and `pipeline.json` files can be resolved in a relative manner. For them to be recognized, we recommend expressing relative paths with a leading `./`, as in `"logo":"./my-logo.svg"`. This allows the web-app to perform relative resolution in a secure way (not going outside of the domain directory except for known places) before the path is given to the web server or to the analysis scripts.

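A minimal sketch of such a secure relative resolution is shown below. It is not the web-app's actual code: the function name `resolve_in_domain` and the example directory are hypothetical, and the allowance for "known places" outside the domain directory is left out.

```python
from pathlib import Path


def resolve_in_domain(domain_dir: Path, link: str) -> Path:
    """Resolve a `./`-prefixed link and refuse anything escaping the domain."""
    if not link.startswith("./"):
        raise ValueError(f"not a relative domain link: {link!r}")
    root = domain_dir.resolve()
    # resolve() collapses `..` segments and follows symbolic links.
    target = (root / link).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"link escapes the domain directory: {link!r}")
    return target


domain = Path("/tmp/aisop-demo-domain")  # hypothetical domain directory
print(resolve_in_domain(domain, "./my-logo.svg"))

try:
    resolve_in_domain(domain, "./../outside.txt")
except ValueError as error:
    print("rejected:", error)
```

Checking the fully resolved path, rather than searching the string for `..`, also catches symbolic links that point outside the domain directory.
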
While the `README.md` should be the main entry point to the source work for creating the domain, we propose the following folder names:

- `source-content`: a collection of files (e.g. PDFs, pictures, texts, pptx, ...) that represent the source input from which an extraction is made
- `extracts`: the result of the extraction process, made of JSON files, one file or one folder per source collection
- `annotations`: the result of the annotations, exported from Prodigy in the form of JSONL files
- moreover, the instructions used and the log of all processes are visible in the `README.md` file
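
To show how the JSONL files in `annotations` might be inspected, the sketch below counts how often each label was accepted. The field names `answer` and `accept` are assumptions about the Prodigy export and should be checked against the actual files. The function name and the sample records are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_accepted_labels(jsonl_lines: Iterable[str]) -> Counter:
    """Count the labels of accepted annotations, one JSON record per line."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumed fields: "answer" holds the decision, "accept" the labels.
        if record.get("answer") != "accept":
            continue
        counts.update(record.get("accept", []))
    return counts


sample_lines = [
    '{"text": "A router forwards packets.", "accept": ["Networks"], "answer": "accept"}',
    '{"text": "Unclear fragment.", "accept": [], "answer": "ignore"}',
    '{"text": "DNS maps names to addresses.", "accept": ["Networks"], "answer": "accept"}',
]
print(count_accepted_labels(sample_lines))
```

An open file can be passed directly, for example `count_accepted_labels(open(path, encoding="utf-8"))`, since iterating over a text file yields its lines.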
