How to listen to two other clients in real time over UDP


I have a VoIP app, and I have a problem: when two or more people are talking at the same time, the packets get shuffled among themselves and make the audio sound "stuck". How do I receive packets from two or more senders in real time without the packets mixing?


What exactly are you doing, and what does your code look like? UDP doesn't have connections; using `recvfrom`, for example, you can see the source address of a particular packet, and you might assign some "connection" meaning to a given source ip:port combination.

If you make multiple "connections" between the same two IP addresses, you will need different ports; one side (probably the "client" end that "establishes" the connection) might use an ephemeral port. Also be sure to consider the implications of NAT, which is common on nearly every network that accesses the internet.
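For illustration, a minimal sketch of such a receive loop with POSIX sockets (handle_datagram is a hypothetical function standing in for your per-sender processing):

#include <sys/socket.h>
#include <netinet/in.h>
#include <cstddef>

// Hypothetical: look up or create per-sender state keyed on src, then process.
void handle_datagram(sockaddr_in const &src, char const *data, size_t len);

void receive_loop(int sock) {
	char buf[1500]; // one MTU-sized datagram
	sockaddr_in src{};
	socklen_t srclen = sizeof(src);
	for (;;) {
		ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, (sockaddr *)&src, &srclen);
		if (n < 0) {
			break; // error handling elided
		}
		// src.sin_addr + src.sin_port identify the sender; use them as the
		// key for any "connection" state you keep.
		handle_datagram(src, buf, (size_t)n);
		srclen = sizeof(src); // reset before the next call
	}
}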

I made an image in paint representing the processes (I think you can understand it better). The problem is when a customer is listening to audio packages coming from two other customers, the packages are in real time, so they mix and end up with the voice lock on the two customers who are sending the audio.

As @syncviews says:

On the server, you can tell the two senders apart by looking at the source address using recvfrom(), and assign a separate session to each source address (then implement some kind of session timeout for when you stop receiving data from an address:port source for a little while).
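A sketch of that bookkeeping, assuming IPv4 and an illustrative ten-second timeout (the Session contents are placeholders):

#include <chrono>
#include <cstdint>
#include <map>
#include <utility>

using Clock = std::chrono::steady_clock;

struct Session {
	Clock::time_point last_seen;
	// per-sender decoder state, sourceid, etc.
};

// Key sessions by (ip, port) as seen in recvfrom()'s source address.
std::map<std::pair<uint32_t, uint16_t>, Session> g_sessions;

void touch_session(uint32_t ip, uint16_t port) {
	g_sessions[{ip, port}].last_seen = Clock::now();
}

void expire_sessions() {
	auto cutoff = Clock::now() - std::chrono::seconds(10);
	for (auto it = g_sessions.begin(); it != g_sessions.end();) {
		if (it->second.last_seen < cutoff) {
			it = g_sessions.erase(it); // timed out: drop the session
		} else {
			++it;
		}
	}
}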

On the client, you have to decide whether you want to support one stream or multiple streams. If you only support one stream on the client, you need to mix all the available audio on the server for that client, and then encode and send that one stream to the client. If you want to support multiple streams, you have to tag each packet with a separate stream identifier (add some framing information that includes sender/session source information) and sort them apart on the receiving side.
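If you take the mix-on-the-server route, mixing decoded PCM is typically just a sample-wise sum with clipping; a sketch for signed 16-bit samples (the function name is mine):

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Mix two decoded 16-bit PCM streams sample by sample, clamping the sum
// so loud passages saturate instead of wrapping around.
void mix_pcm16(int16_t const *a, int16_t const *b, int16_t *out, size_t count) {
	for (size_t i = 0; i < count; ++i) {
		int32_t sum = (int32_t)a[i] + (int32_t)b[i]; // widen to avoid overflow
		out[i] = (int16_t)std::clamp<int32_t>(sum, -32768, 32767);
	}
}

Note that this sums corresponding samples; it is not byte interleaving.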

enum Bool { True, False, FileNotFound };


So, I did what you recommended, but now I'm not sure whether I'm combining the audio packets correctly. Should I interleave them byte by byte? For example:

packetfinal[0] = packetOne[0];
packetfinal[1] = packetTwo[0];
packetfinal[2] = packetOne[1];
packetfinal[3] = packetTwo[1];
etc…

Or is there a formula for doing this?

No, what I am suggesting is that you create a higher-level packet format. Something like:

struct packet {
	uint16_t sequence_wrap;
	uint16_t time_wrap;
	uint16_t sourceid;
	uint16_t kind;
	uint8_t data[]; // the rest of the packet
};

You then define a number of “kinds” of packets:

enum PacketKind {
	PK_Null,
	PK_AddSource,
	PK_RemoveSource,
	PK_SourceSpeaking,
};

For each kind of packet, you have a separate structure that goes into the “data” section:

struct packet_Null {
	//	no data needed
};
struct packet_AddSource {
    uint64_t player_id;
    char player_name[MAX_NAME_LENGTH];
};
struct packet_RemoveSource {
	//	no data needed -- sourceid is already in header
};
struct packet_SourceSpeaking {
	//	the data array is the payload for the codec
};

In your packet receive function, you dispatch it as appropriate:

#include <cstdint>
#include <memory>
#include <unordered_map>
using namespace std;

unordered_map<uint16_t, shared_ptr<playback>> g_sources;

void onPacket(packet *p) {
	switch (p->kind) {
	case PK_Null: // do nothing
		break;
	case PK_AddSource:
		onAddSource(p, (packet_AddSource *)p->data);
		break;
	case PK_RemoveSource:
		onRemoveSource(p, (packet_RemoveSource *)p->data);
		break;
	case PK_SourceSpeaking:
		onSourceSpeaking(p, (packet_SourceSpeaking *)p->data);
		break;
	default:
		//	unknown packet kind
		break;
	}
}
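This dispatch assumes the datagram has already been checked to be at least header-sized; a sketch of that glue for the code above (reading straight into the struct assumes both ends agree on byte order and packing):

void onDatagram(char *buf, size_t len) {
	if (len < sizeof(packet)) {
		return; // too short to contain a header; drop it
	}
	// A real protocol would deserialize field by field instead of casting.
	onPacket((packet *)buf);
}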

When the client receives a packet with a source ID, it creates a separate codec/playback stream for each active source id:

void onAddSource(packet *p, packet_AddSource *a) {
	auto pb(g_sources.find(p->sourceid));
	if (pb == g_sources.end()) {
		g_sources[p->sourceid] = make_shared<playback>(p->sourceid, a->player_id, a->player_name);
	}
}
void onRemoveSource(packet *p, packet_RemoveSource *r) {
	auto pb(g_sources.find(p->sourceid));
	if (pb != g_sources.end()) {
		g_sources.erase(pb);
	}
}

Finally, when you receive a packet for a given source, you feed it to the codec.

void onSourceSpeaking(packet *p, packet_SourceSpeaking *s) {
	auto pb(g_sources.find(p->sourceid));
	if (pb == g_sources.end()) {
		//	did I miss an AddSource?
		//	maybe I could add an "anonymous" playback here and
		//	fill in the name and player id later, instead of
		//	ignoring the data?
		return;
	}
	pb->second->gotCodecData((char const *)s);
}
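For reference, the playback class used above isn't shown in the thread; it is assumed to look roughly like this (the jitter buffer is my assumption):

#include <cstdint>
#include <string>

class playback {
public:
	playback(uint16_t sourceid, uint64_t player_id, char const *player_name);
	// Queue one packet's worth of encoded audio; an audio thread drains the
	// queue (a jitter buffer), decodes, and mixes it into the output.
	void gotCodecData(char const *data); // a real API would also take a length
private:
	uint16_t sourceid_;
	uint64_t player_id_;
	std::string player_name_;
	// decoder state, jitter buffer, ...
};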

Now, on the server, when you receive data from some player at an IP address and port, you translate that to a sourceid. (Think of "sourceid" as "index of currently online player," perhaps.) Typically, this happens when players send a "login" message with name + credentials in their first packet, and the server maps that name+credentials to a sourceid (along with the player, player name, and set of players to listen to). For each player that should listen to this player, send a packet_AddSource packet to that player.

Then, for each player listening to the player that sent incoming codec data packets, create a packet_SourceSpeaking packet and send it to those players.

When you detect that a player has logged off (stopped sending and timed out, explicitly logged off, or was kicked), send a packet_RemoveSource packet to all the listening players.
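Putting those three server-side steps together, a sketch (all names here are illustrative; send_packet is a hypothetical helper that fills in the header from the earlier struct and calls sendto()):

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct ServerPlayer {
	uint16_t sourceid;
	std::vector<uint16_t> listeners; // sourceids that should hear this player
	// address, credentials, last-seen time, ...
};

// Hypothetical: builds the packet header (kind, sourceid, ...) and sendto()s it.
void send_packet(uint16_t to_sourceid, uint16_t kind, uint16_t from_sourceid,
		void const *payload, size_t payload_len);

std::unordered_map<uint16_t, ServerPlayer> g_players; // keyed by sourceid

// On login: allocate a sourceid, then announce the new source to its listeners
// (the payload would be a packet_AddSource with player_id and name).
void on_login(ServerPlayer const &p) {
	for (uint16_t l : p.listeners) {
		send_packet(l, PK_AddSource, p.sourceid, nullptr, 0);
	}
}

// On incoming voice data: forward it to everyone listening to this player.
void on_voice(ServerPlayer const &p, void const *codec_data, size_t len) {
	for (uint16_t l : p.listeners) {
		send_packet(l, PK_SourceSpeaking, p.sourceid, codec_data, len);
	}
}

// On logoff, timeout, or kick: tell listeners to tear the stream down.
void on_logoff(uint16_t sourceid) {
	auto it = g_players.find(sourceid);
	if (it == g_players.end()) {
		return;
	}
	for (uint16_t l : it->second.listeners) {
		send_packet(l, PK_RemoveSource, sourceid, nullptr, 0);
	}
	g_players.erase(it);
}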

enum Bool { True, False, FileNotFound };

@hplus0603 I can receive a list with two arrays of bytes (one from each client), but when it comes to playing it back, the audio bursts with interference. When only one client is speaking the audio is OK, but when two people are speaking, everything is inaudible. (I'm probably putting the bytes together wrongly.)

byte[][] vetor = new byte[cTracks][1400];

// Copy one buffer per track out of the incoming list.
for (int i = 0; i < cTracks; i++) {
	Log.i(TAG, "Entered list " + i);
	vetor[i] = aBuf.get(i).getBuf();
}

for (int i = 0; i < cTracks; i++) {
	inBuf.remove(0);
}

// Interleave one byte from each track into the output buffer.
int remove = 0;
byte[] audioF = new byte[cTracks * 1400];

int i = 0;
while (i - remove < (cTracks * 1400)) {
	for (int j = 0; j < cTracks; j++) {
		audioF[i - remove] = vetor[j][remove];
		i = i + 1;
	}
	remove += 1;
	i = i + 1;
}

Log.i(REC, "Final audio size = " + audioF.length);
track.write(audioF, 0, audioF.length);

cTracks is the size of the list; the list can only have one entry per IP.

You need a separate codec playback stream per player. You can't just jam the bytes from all the different senders into a single receiver. The code and instructions I left above should be sufficient to show how you could accomplish this.

enum Bool { True, False, FileNotFound };

