Changes for page The AISOP recipe
Last modified by Paul Libbrecht on 2025/06/15 23:32
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
Page properties
Content
@@ -61,12 +61,11 @@

 It is time to endow the fragments with topics so that we can recognize students' paragraphs' topics. In AISOP, we have used the (commercial) [[prodigy>>https://prodi.gy/]] for this task in two steps which, both, iterate through all fragments to give them topics.

-**The first step: top-level-labels:** This is the simple [["text classifier" recipe>>https://prodi.gy/docs/recipes#textcat]] of prodigy: we can invoke the following command for this: prodigy textcat.manual the-course-name-l1 ./fragments.jsonl ~-~-label labels-depth1.txt which will offer a web-interface on which each fragment is annotated with the (top-level) label. This web-interface can be left running for several days.
-Then extract the content into a file: prodigy db-out the-course-name-l1 > the-course-name-dbout.jsonl
+**The first step: top-level-labels:** This is the simple [["text classifier" recipe>>https://prodi.gy/docs/recipes#textcat]] of prodigy: we can invoke the following command for this: prodigy textcat.manual the-course-name ./fragments.jsonl ~-~-label labels-depth1.txt which will offer a web-interface on which each fragment is annotated with the (top-level) label. This web-interface can be left running for several days.

-**The second step is the hierarchical annotation** [[custom recipe>>https://gitlab.com/aisop/aisop-nlp/-/tree/main/hierarchical_annotation?ref_type=heads]] (link to become public soon): The same fragments are now annotated with the top-level annotation and all their children. E.g. using the command python -m prodigy subcat_annotate_with_top2 the-course-name-l2 the-course-name-dbout.jsonl labels-all-depths.txt -F ./subcat_annotate_with_top2.py .
+**The second step is the hierarchical annotation** [[custom recipe>>https://gitlab.com/aisop/aisop-nlp/-/tree/main/hierarchical_annotation?ref_type=heads]] (link to become public soon): The same fragments are now annotated with the top-level annotation and all their children. E.g. using the command xxx

-The resulting data-set can be extracted out of prodigy using the db-out recipe, e.g. prodigy db-out the-course-name-l2 the-course-name-l2 -dbout or can be converted to a spaCy dataset for training e.g. using the command xxxxx (see [[here>>https://gitlab.com/aisop/aisop-nlp/-/tree/main/it3/fundamental-principles]])
+The resulting data-set can be extracted out of prodigy using the db-out recipe, e.g. prodigy db-out the-course-name-l2 the-course-name-l2


 ----
@@ -75,21 +75,19 @@

 === 2.1 Train a Recognition Model ===

- See [[here>>https://gitlab.com/aisop/aisop-nlp/-/tree/main/it3/fundamental-principles]].
+...

 === 2.2 Create a Pipeline ===

-... write down the configuration JSON of the pipeline, get inspired [[pipeline-medieninderlehre.json>>https://gitlab.com/aisop/aisop-webapp/-/blob/main/config/couchdb/pipeline-medieninderlehre.json?ref_type=heads]]
+...

 === 2.3 Create a Seminar and Import Content ===

 ...

-Create a seminar with the web-interface, associate the appropriate pipeline.
-
 === 2.4 Interface with the composition platform ===

- See the Mahara authorization configuration.
+...

 ----

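
Before starting the second annotation step, it can help to check how the top-level labels are distributed in the db-out export. The following is a minimal sketch, not part of the AISOP recipe: it assumes the export is JSONL with one annotation task per line, that accepted tasks have "answer" set to "accept", and that the chosen labels sit either in an "accept" list (multiple-choice tasks) or in a single "label" field. These field names are assumptions about the export layout and should be checked against an actual file; the function name and the sample records are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_top_level_labels(lines: Iterable[str]) -> Counter:
    """Tally accepted top-level labels from JSONL lines of a db-out export."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: only tasks answered "accept" carry a usable annotation.
        if record.get("answer") != "accept":
            continue
        # Assumption: multiple-choice tasks list the chosen labels under
        # "accept"; single-label tasks carry one "label" field instead.
        labels = record.get("accept")
        if labels is None:
            labels = [record["label"]] if "label" in record else []
        counts.update(labels)
    return counts


# Hypothetical sample records standing in for a real export file.
sample = [
    '{"text": "Fragment A", "accept": ["Topic1"], "answer": "accept"}',
    '{"text": "Fragment B", "accept": ["Topic2"], "answer": "accept"}',
    '{"text": "Fragment C", "accept": ["Topic1"], "answer": "ignore"}',
    '{"text": "Fragment D", "label": "Topic1", "answer": "accept"}',
]
print(count_top_level_labels(sample))
```

With a real export, the same function can be fed an open file handle, e.g. `with open("the-course-name-dbout.jsonl", encoding="utf-8") as f: count_top_level_labels(f)`. A label that is missing or very rare in the tally suggests that more fragments need annotating before moving on.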