# Introduction

Automatic speech recognizers (ASR) facilitate communication between humans and machines: an ASR system is a machine that understands humans and the words they speak. Segmentation is one of the most important phases in the automatic recognition of speech. Speech can be segmented into various units, but syllables have been found to be among the most efficient units for automatic speech segmentation. The characteristic features of speech can be expressed using the short-term energy (STE) and the zero crossing rate (ZCR). The STE function is known to be the better representative of speech segment boundaries.

Information in the speech signal can be extracted through short-time Fourier analysis. However, because the phase is difficult to compute and the phase function is difficult to process, the features of the Fourier transform (FT) phase have not been fully exploited over the past few decades. The information in the short-time FT phase function can be extracted by processing the derivative of the FT phase.

One component of the syllable is called the nucleus. The nucleus is a vowel, while the onset and coda are usually consonantal in form. In the energy contour, the syllable can be viewed as an energy peak in the nucleus region, with the consonants forming the valleys at both ends. Many languages spoken around the world possess a syllabic structure [10]. In some of them, such as Japanese, a syllable mostly contains two phonetic segments of the CV type. In contrast, English and German possess a far more heterogeneous syllable structure [2].
# Research Background

# a) Language Units of Speech in Punjabi

Punjabi is an Indo-Aryan language spoken by more than a hundred million people, both inhabitants of the historical Punjab region (in north-western India and Pakistan) and members of the diaspora, particularly in Britain, Canada, North America, East Africa and Australasia [8]. Like other Indian languages, Punjabi contains segmental phonemes. The three basic units into which speech can be segmented are words, phonemes and syllables. The syllable is the most important and most widely used unit for automatic speech segmentation. Punjabi is a syllabic language, so syllables are selected as the basic units for segmentation.

# b) Syllables as Basic Unit of Speech

Aksharas are the basic units of the writing system. An akshara is an orthographic representation of a speech sound in an Indian language. Aksharas are essentially syllabic in nature; their typical forms are V, CV, CCV and CCCV, where C and V denote a consonant and a vowel respectively [9]. There are thirty-eight consonants in the Punjabi language, along with ten non-nasal and ten nasal vowels. Vowels can appear alone, but consonants can only appear with vowels. The nasal vowels are equal in number to the non-nasal vowels and are represented by a Bindi or Tippi over the non-nasal vowels. The following is the list of consonants in the Punjabi language (Figure 1):

# Three State Representation of Speech

A continuous speech signal is composed of two elements: one carries the speech information, and the other carries noise or silent sections. The verbal part of the speech can be further divided into two categories, voiced and unvoiced speech. Voiced sound is produced the moment air from the lungs passes through the larynx. Unvoiced speech sounds are produced when the air passes directly through the vocal tract formations. The speech production process is incomplete without the detection of voiced and unvoiced speech, which are separated by silence regions.
In a silence region no excitation is supplied to the vocal tract, and thus no speech is produced. Regular speech is incomplete and inaccurate without silence regions, which help to make the speech understandable [3].

# Characterization of Speech

In order to segment continuous speech, its basic content must first be checked, i.e. whether the signal is voiced or unvoiced. The two characteristic features of voice are the zero crossing rate (ZCR) and the short term energy (STE) [13].

# a) Zero Crossing Rate

The zero crossing rate, i.e. the rate at which the signal crosses zero, provides information about the source of its creation. Unvoiced speech has a higher zero crossing rate, whereas voiced speech has a low zero crossing rate. The amplitude of unvoiced segments is also lower than that of voiced segments. The ZCR can be defined as: (1)

The STE can be defined as follows: (2)

The STE of a voiced signal is always much greater than that of an unvoiced signal. Where the speech signal is voiced, its STE is high; the peaks in the STE represent the nucleus, which is a vowel, whereas the valleys at both ends represent the coda.

# Segmentation of Speech

The syllable is composed of three parts: the onset, the rime (nucleus) and the coda. The rime is also known as the nucleus, whereas the onset and coda consist of consonants. The nuclei appear as high energy regions, whereas the valleys at both ends correspond to syllable boundaries. A vowel region has much higher energy than a consonant region [9]. For spontaneous speech, the definition of a syllable in terms of the short-term energy function is suitable for almost all languages. Because of local energy fluctuations, however, the STE function alone cannot be used directly to perform segmentation. Techniques such as a fixed or even an adaptive threshold will not work when the energy variation across the signal is quite high [1].
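The two features described in this section can be illustrated with a minimal Python sketch. It assumes the commonly used frame-based definitions (STE as the sum of squared samples in a rectangular frame, ZCR as the fraction of adjacent sample pairs that change sign); these are assumptions of the sketch and may differ in detail from the definitions used in the paper. The function names, the 8 kHz sampling rate and the two synthetic test tones are hypothetical and chosen only for illustration.

```python
import math


def split_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Assumed STE definition: sum of squared samples (rectangular window)."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Assumed ZCR definition: fraction of adjacent sample pairs with different sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Synthetic stand-ins (not real speech): a strong low-frequency tone plays the
# role of a voiced segment, a weak high-frequency tone the role of an unvoiced one.
SAMPLE_RATE = 8000  # hypothetical sampling rate in Hz
voiced_like = [1.0 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(400)]
unvoiced_like = [0.1 * math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE) for n in range(400)]

for name, segment in (("voiced-like", voiced_like), ("unvoiced-like", unvoiced_like)):
    print(name, short_term_energy(segment), zero_crossing_rate(segment))
```

On these synthetic tones the voiced-like segment shows the higher energy and the lower zero crossing rate, which is the qualitative contrast the text describes; real speech frames would need windowing and framing parameters chosen for the recording at hand.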
To overcome the problem of local energy fluctuations, the STE function should be smoothed. The information in speech signals can be represented in terms of features derived from short-time Fourier analysis. The information in the short-time FT phase function can be extracted by computing the group delay function [9]. Consider a spectrum that is the product of two components,

$$H(\omega) = H_1(\omega) \cdot H_2(\omega). \tag{3}$$

The group delay function can then be represented as

$$\tau_h(\omega) = -\frac{\partial \left(\arg H(\omega)\right)}{\partial \omega} = \tau_{h_1}(\omega) + \tau_{h_2}(\omega). \tag{4}$$

Equation (3) shows the multiplicative property of the magnitude spectra, whereas equation (4) shows that the same combination becomes an addition in the group delay domain. The group delay spectrum has been found to be better because of this additive property. It was observed that in the magnitude spectra the peaks are clearly visible, but when the two poles are combined the peaks are not resolved. This shows the disadvantage of the multiplicative property of magnitude spectra. In the group delay spectra the peaks and valleys are better resolved when the signal is minimum phase [2].

For any syllable, the STE is quite high in the voiced region and diminishes at the ends, which represent the consonants, subject to local energy fluctuations. If these local variations are smoothed, the minima at both ends of a voiced region correspond to syllable boundaries [9].

# The algorithm for group delay based segmentation

Step 1 - Let x[n] be the continuous speech signal.

Step 2 - Compute N, the length of the input signal x.

Step 3 - Calculate the STE function E[m], where m = 1, 2, ..., M and M is the number of frames.

Step 4 - Invert the STE, i.e. E(i) = 1/E(m).

Step 5 - Compute the IFFT of E(i). The result is a complex-valued function of the form a + ib.

Step 6 - Compute the phase angle from the above values, i.e. $\theta = \tan^{-1}(b/a)$.

Step 7 - Compute the negative derivative of the Fourier transform phase, i.e. the group delay function.

# Results and discussions

The technique of automatic segmentation is applied to continuous Punjabi speech.
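As a rough illustration of the procedure applied here, Steps 3 to 7 of the algorithm can be transcribed into Python as follows. This is a literal sketch of the steps as listed, not the Matlab implementation used for the reported results. The function names, the frame length and hop, the small energy floor that guards against division by zero, and the synthetic input are all assumptions of the sketch; the derivative in Step 7 is approximated by a first difference, and no phase unwrapping or smoothing is applied.

```python
import cmath
import math


def short_term_energy(x, frame_len, hop):
    """Step 3: STE per frame, assumed here to be the sum of squared samples."""
    return [sum(s * s for s in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, hop)]


def inverse_dft(values):
    """Naive O(M^2) inverse DFT, adequate for a few dozen frames."""
    m = len(values)
    return [sum(v * cmath.exp(2j * math.pi * k * n / m)
                for k, v in enumerate(values)) / m
            for n in range(m)]


def group_delay_from_energy(energy, floor=1e-12):
    """Steps 4 to 7 applied to an STE sequence.

    `floor` is a hypothetical guard against dividing by a zero-energy frame.
    """
    inverted = [1.0 / max(e, floor) for e in energy]           # Step 4: invert the STE
    complex_seq = inverse_dft(inverted)                        # Step 5: values a + ib
    phase = [math.atan2(c.imag, c.real) for c in complex_seq]  # Step 6: phase angle
    # Step 7: negative derivative of the phase, taken as a first difference
    return [-(phase[n] - phase[n - 1]) for n in range(1, len(phase))]


# Synthetic amplitude-modulated tone standing in for a speech signal.
x = [(1.0 + 0.9 * math.sin(2 * math.pi * n / 400)) * math.sin(2 * math.pi * n / 16)
     for n in range(1600)]
energy = short_term_energy(x, frame_len=80, hop=40)
gd = group_delay_from_energy(energy)
print(len(energy), len(gd))
```

The `atan2` call is used instead of a plain arctangent of b/a so that the quadrant of the phase is preserved and a zero real part does not cause a division error. Locating boundaries from the resulting sequence is a separate step and is left out of this sketch.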
The method was implemented in Matlab. The group delay algorithm is applied to segment the continuous Punjabi speech waveform. The following sentence is given as input to the system.

Global Journals Inc. (US), Global Journal of Computer Science and Technology, Volume XIII, Issue XII, Version I, © 2013

![Figure 1 : Representation of 38 consonants and 20 vowels in Punjabi language](image-2.png "Figure 1 :")

As already mentioned, syllables are the basic and most recommended units of speech. Syllables are composed of vowels and consonants. Every syllable must have a vowel, also known as its nucleus, whereas the presence of a consonant is optional. The vowel (V) is always the nucleus; the part to its left is the onset and the part to its right is the coda, which is always a consonant. The seven types of syllables recognized in the Punjabi language are represented in the following figure:

![Figure 2 : Syllables in Punjabi language](image-3.png "Figure 2 :")

![Figure 3 : Block diagram of characteristic features of voice](image-4.png "Figure 3 :")

![Short Term Energy (STE)](image-5.png "")

The short-time energy of a speech signal reflects its amplitude variation. Speech can be segmented by processing the STE function. The STE shows the voiced content of the signal [13].

![Automatic Segmentation of Punjabi Speech Signal Using Group Delay](image-6.png "C")

8![Compute the minimum phase of group delay, i.e. phase(n) -phase(n -1), let the signal be of length n. Locate the positive peaks in the minimum phase group delay function, (Ei gd[f]). If Ei gd[f] is positive, and Ei gd[f-1]