Sakytinės lietuvių kalbos tekstynas ‒ natūralios vartosenos tyrimų šaltinis (The Corpus of Spoken Lithuanian as a Research Source of Natural Usage)

Bibliographic Details
Published in: Taikomoji kalbotyra (Online), no. 9, pp. 176-198
Main Author: Kamandulytė-Merfeldienė, Laura
Format: Journal Article
Language: Lithuanian
Published: Vilniaus Universiteto Leidykla 2017
Vilnius University Press
Abstract The article describes the Corpus of Spoken Lithuanian: its structure, its compilation stages (collection of recordings, transcription, and grammatical annotation), and the methodology of data collection and digitization. It also discusses how the corpus can be applied to research on natural language usage and reviews the research already carried out using corpus data. At present (2017), the corpus, which is freely accessible online, contains 226,174 word forms. Users of the online version can search for a word or word form, obtain its frequency in the whole corpus or in a part of it, and see its grammatical information. In 2016-2017, the Corpus of Spoken Lithuanian was supplemented with new data from the project “Contemporary Spoken Lithuanian: A Corpus-based Analysis of Grammar and Lexis” (LIP-085/2016), financed by the Research Council of Lithuania under the State Lithuanian Studies and Dissemination Programme for 2016–2024. The project will also create a new online interface offering users more possibilities. The updated corpus consists of 256 conversations (383,587 words) produced by 1,086 speakers (659 females and 427 males) aged 3 to 81. In developing the corpus, much attention was paid to its composition, i.e. its proportions. To make the corpus more universal and suitable for more varied analysis, the principle of a balanced corpus was maintained; several criteria were therefore taken into account when collecting the data: the nature of the spoken language (private vs public speech) and its structure (dialogues vs polylogues), different communication situations (direct vs indirect, e.g. a telephone conversation), demographic indicators, and social relations among the interlocutors.
In 2018, users of the updated version of the corpus will therefore be able to filter results by categories such as gender, age, and the place and structure of the conversation, and to perform more detailed searches. It is expected that, once users have more possibilities for analysing corpus data online, the amount of research on spoken language will increase and will cover different areas of lexis and grammar.
Author Kamandulytė-Merfeldienė, Laura
Author_xml – sequence: 1
  fullname: Kamandulytė-Merfeldienė, Laura
ContentType Journal Article
DBID AE2
BIXPP
REL
DatabaseName Central and Eastern European Online Library (C.E.E.O.L.) (DFG Nationallizenzen)
CEEOL: Open Access
Central and Eastern European Online Library - CEEOL Journals
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Languages & Literatures
DocumentTitleAlternate The Corpus of Spoken Lithuanian as a Research Source of Natural Usage
EISSN 2029-8935
EndPage 198
ExternalDocumentID 609328
GroupedDBID AAFWJ
ACQDZ
AE2
AFPKN
ALMA_UNASSIGNED_HOLDINGS
BIXPP
GROUPED_DOAJ
M~E
OK1
REL
ID FETCH-ceeol_journals_6093283
IngestDate Tue Oct 29 20:35:15 EDT 2024
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 9
Keywords spoken language
Lithuanian
corpus
morphological annotation
spontaneous speech
Language Lithuanian
LinkModel OpenURL
MergedId FETCHMERGED-ceeol_journals_6093283
OpenAccessLink https://www.ceeol.com//search/article-detail?id=609328
PageCount 23
ParticipantIDs ceeol_journals_609328
PublicationCentury 2000
PublicationDate 2017
PublicationDateYYYYMMDD 2017-01-01
PublicationDate_xml – year: 2017
  text: 2017
PublicationDecade 2010
PublicationTitle Taikomoji kalbotyra (Online)
PublicationTitleAlternate Applied Linguistics
PublicationYear 2017
Publisher Vilniaus Universiteto Leidykla
Vilnius University Press
Publisher_xml – name: Vilniaus Universiteto Leidykla
– name: Vilnius University Press
SSID ssj0002783425
Score 4.0531826
SourceID ceeol
SourceType Publisher
StartPage 176
SubjectTerms Applied Linguistics
Baltic Languages
Title Sakytinės lietuvių kalbos tekstynas ‒ natūralios vartosenos tyrimų šaltinis
URI https://www.ceeol.com//search/article-detail?id=609328
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link.rule.ids 315,782,786,4028
linkProvider Directory of Open Access Journals
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Sakytin%C4%97s+lietuvi%C5%B3+kalbos+tekstynas+%E2%80%92+nat%C5%ABralios+vartosenos+tyrim%C5%B3+%C5%A1altinis&rft.jtitle=Taikomoji+kalbotyra+%28Online%29&rft.au=Kamandulyt%C4%97-Merfeldien%C4%97%2C+Laura&rft.date=2017&rft.pub=Vilniaus+Universiteto+Leidykla&rft.eissn=2029-8935&rft.issue=9&rft.spage=176&rft.epage=198&rft.externalDocID=609328